Data liquidity in the age of inference

Water ripples (source: Blazing Firebug via Pixabay)

Save the dates for the Artificial Intelligence Conference in New York, happening April 29-May 2, 2018. The call for speakers is now open. It’s a special time in the evolutionary history of computing. Oft-used terms like big data, machine learning, and artificial intelligence have become popular descriptors of a broader underlying shift in information processing. While traditional rules-based computing isn’t going anywhere, a new computing paradigm is emerging around probabilistic inference, where digital reasoning is learned from sample data rather than hardcoded with boolean logic. This shift is so significant that a new computing stack is forming around it with emphasis on data engineering, algorithm development, and even novel hardware designs optimized for parallel computing workloads, both within data centers and at endpoints. A funny thing about probabilistic inference is that when models work…
Original Post: Data liquidity in the age of inference

You need an Analytics Center of Excellence

Ferris wheel (source: Skeeze via Pixabay)

Check out Carme Artigas’ executive briefing “Analytics Centers of Excellence as a way to accelerate big data adoption by business” at the Strata Data Conference in Singapore, Dec. 5-7. Registration is now open. More than 10 years after big data emerged as a new technology paradigm, it is finally in a mature state and its business value throughout most industry sectors is established by a significant number of use cases. A couple of years ago, the discussion was still about how big data changed our way of capturing, processing, analyzing, and exploiting data in new and meaningful ways for business decision makers. Now many companies undertake analytical projects at a departmental level, redefining the relationship between business and IT by the adoption of Agile and DevOps methodologies. Real-time processing, machine learning algorithms, and even artificial…
Original Post: You need an Analytics Center of Excellence

Query the planet: Geospatial big data analytics at Uber

Uber’s Presto architecture (source: Courtesy of Zhenxiao Luo)

For more on efficient geospatial analysis, check out Zhenxiao Luo and Wei Yan’s session, “Geospatial big data analysis at Uber,” at the Strata Data Conference in NYC, September 25-28, 2017. From determining the most convenient rider pickup points to predicting the fastest routes, Uber aims to use data-driven analytics to create seamless trip experiences. Within engineering, analytics inform decision-making processes across the board. One of the distinct challenges for Uber is analyzing geospatial big data. City locations, trips, and event information, for instance, provide insights that can improve business decisions and better serve users. Geospatial data analysis is particularly challenging in a big data scenario, for example when computing how many rides start at a transit location or how many drivers are crossing state lines. For these analytical requests, we…
Original Post: Query the planet: Geospatial big data analytics at Uber
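
To make the "how many rides start at a transit location" question concrete, here is a minimal Python sketch that counts pickup points falling inside a polygon using a ray-casting test. It is only an illustration, not Uber's Presto-based implementation, which the excerpt does not describe. The names `point_in_polygon`, `count_pickups_in_area`, `station_area`, and `pickups` are hypothetical, the coordinates are made up, and the code treats coordinates as planar, which is only a rough approximation for small areas.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: cast a ray to the right and count edge crossings."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def count_pickups_in_area(pickups: List[Point], area: List[Point]) -> int:
    """Count the pickup points that fall inside the given polygon."""
    return sum(1 for p in pickups if point_in_polygon(p, area))


# Hypothetical data: a rectangular "transit location" and five pickup points.
station_area = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
pickups = [(1.0, 1.0), (3.5, 2.5), (5.0, 1.0), (2.0, 4.0), (0.5, 2.9)]

rides_at_station = count_pickups_in_area(pickups, station_area)
print(rides_at_station)
```

A brute-force loop like this checks every point against every polygon, which hints at why such queries become expensive at big data scale and usually call for spatial indexing.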

Load, search, and secure data in multiple formats

Window frame. (source: Robert on Wikimedia Commons)

Download Part 1 of the free O’Reilly MarkLogic Cookbook for efficient recipes and hands-on solutions to common problems developers face using XQuery in MarkLogic. In this podcast episode, I speak with Dave Cassel, technical community manager at MarkLogic, creator of a multi-model NoSQL database that aims to integrate data silos for a unified view. We talked about integration patterns for loading and exporting data with ease, an architecture that enables efficient search and queries, and layers of security that follow the data from its original source throughout its lifecycle.

Work on applications, as soon as you load the data

The idea of ‘load as-is’ is that your data already exists in some form, and that form can vary dramatically. It can be Word documents or XML or JSON data. It can also be stuff…
Original Post: Load, search, and secure data in multiple formats

What are the challenges in building an anomaly detection system for streaming and live data?

How do you identify the “optimal” trade-off between speed and accuracy in a mission-critical, live-data health care app? Master statistician Arun Kejariwal walks you through such conundrums in this video on the implications of velocity of data in anomaly detection systems. Get wise to the conundrums, trade-offs, and gotchas of streaming anomaly detection in this Safari course by Arun Kejariwal that details anomaly detection history, applications, and state-of-the-art techniques. Article image: What are the challenges in building an anomaly detection system for streaming and live data? (source: O’Reilly).
Original Post: What are the challenges in building an anomaly detection system for streaming and live data?

What are the challenges in building an anomaly detection system?

What may work for anomaly detection today may not work tomorrow. Master statistician Arun Kejariwal helps you understand why in this fascinating walk-through of modern anomaly detection systems: how the definition of “normal” changes as applications, platforms, infrastructure, and algorithms evolve, and how context affects what defines an anomaly. Learn how you, your data, and your decision-making can keep from getting skewed in Kejariwal’s course from Safari on what works, and what doesn’t, when building anomaly detection systems. Article image: What are the challenges in building an anomaly detection system? (source: O’Reilly).
Original Post: What are the challenges in building an anomaly detection system?

Why might an off-the-shelf anomaly detection technique not work in practice?

Off-the-shelf anomaly detection techniques carry the baggage – and traps – of their underlying assumptions. Master statistician Arun Kejariwal demonstrates the flaws of such techniques, teaches you the necessity of always validating assumptions, and shows you why you should keep off-the-shelf off your shelf. Learn to recognize and overcome common problems in anomaly detection techniques and more in this Safari course from Arun Kejariwal that explains what really works (and doesn’t work) in today’s state-of-the-art anomaly detection systems. Article image: Why might an off-the-shelf anomaly detection technique not work in practice? (source: O’Reilly).
Original Post: Why might an off-the-shelf anomaly detection technique not work in practice?

The state of machine learning in Apache Spark

Fractal (source: Pixabay)

To learn more about machine learning in Spark, the upcoming Strata Data Conference in NYC, September 25-28, 2017, features a half-day tutorial on “Natural language understanding at scale with spaCy, Spark ML, and TensorFlow,” and a full-day tutorial on “Analytics and Text Mining with Spark ML.” Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode of the Data Show, we look back to a recent conversation I had at the Spark Summit in San Francisco with Ion Stoica (UC Berkeley professor and executive chairman of Databricks) and Matei Zaharia (assistant professor at Stanford and chief technologist of Databricks). Stoica and Zaharia were core members of UC Berkeley’s AMPLab, which originated Apache Spark, Apache Mesos, and Alluxio.…
Original Post: The state of machine learning in Apache Spark

What does dysfunction look like on a data team?

Join Jesse at the Strata Data Conference New York, September 25-28, 2017, and learn how to identify the top five dysfunctions of a data engineering team. So, you have to build a data team. You are processing data and getting insights, but something seems off. You are not getting back what you expected from your investment. In this short video, Jesse Anderson discusses the types of problems you may be encountering and reasons why you should look inward, at the skills and makeup of your data engineering team, to help solve those problems. Article image: What does dysfunction look like on a data team? (source: O’Reilly).
Original Post: What does dysfunction look like on a data team?

How do use cases benefit from real-time processing?

Attend the Strata Data Conference in New York, September 25-28, 2017, and learn more about implementing real-time systems. Real-time data processing is now a critical task for businesses and their customers. In this brief video, Jesse Anderson provides an overview of batch and real-time processing, and examines use cases in finance, systems monitoring, and more. Article image: How do use cases benefit from real-time processing? (source: O’Reilly).
Original Post: How do use cases benefit from real-time processing?