As organizations become more sophisticated in their use of Big Data, many are struggling to make real-time business decisions before opportunities disappear. Legacy infrastructure holds them back because it cannot scale to the speed and volume required to analyze Big Data in real-time.
By Ryan Betts
Sluggish legacy systems force organizations to make critical business decisions on outdated data. But it is no longer necessary to postpone insight and action until data has been deeply analyzed in a Big Data store. This transformation, toward streaming analytics that enable real-time action, is changing the way enterprises manage both data in motion (fast data streaming in from millions of endpoints) and data at rest (Big Data stored in Hadoop and other data warehouses).
New opportunities to extract value require that enterprises adopt new approaches to data management. But many traditional database architectures and systems are incapable of meeting fast data's challenges. Enterprises must make a number of development decisions to unlock the value of Big Data, and most of those decisions will center on first taking advantage of fast data: streams of data in motion.
Chain of events
It is much easier to extract value from data when you view the data processing pipeline as a chain of events, each with a different value. If, say, you design applications to combine real-time analytics on incoming data with transactions before pushing the data to the data lake, you'll be prepared to extract business value in real time, not only after post-facto analysis in Hadoop or another Big Data store. In this scenario, the front end of the enterprise data architecture needs a component capable of ingesting and interacting with multiple data streams simultaneously, in real time.
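That front-end pattern can be sketched in a few lines. This is a minimal, illustrative Python sketch, not a production ingest engine; the `StreamFrontEnd` class, its rolling-average analytic, and its flagging rule are hypothetical stand-ins for the real-time component described above.

```python
from collections import deque
from statistics import mean

class StreamFrontEnd:
    """Hypothetical front-end: analyze each event on ingest, apply a
    transactional decision, then forward the event to a data lake."""

    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)  # rolling analytics state
        self.ledger = {}                         # per-account totals (the "transaction")
        self.data_lake = []                      # stand-in for Hadoop/warehouse export

    def ingest(self, event):
        # Real-time analytics: maintain a rolling average over recent amounts.
        self.window.append(event["amount"])
        rolling_avg = mean(self.window)

        # Transaction: act on the event before it reaches the data lake,
        # e.g. flag amounts far above the recent average.
        flagged = len(self.window) >= 10 and event["amount"] > 3 * rolling_avg
        acct = event["account"]
        self.ledger[acct] = self.ledger.get(acct, 0) + event["amount"]

        # Only afterwards push the enriched event downstream for deep analysis.
        self.data_lake.append({**event, "rolling_avg": rolling_avg, "flagged": flagged})
        return flagged
```

The key design point is ordering: the analytic and the transactional decision happen on ingest, and the data lake receives data that has already been acted on.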
Not surprisingly, to make this work, developers must understand the difference between high-volume applications, where managing data volume (scale) matters most and rapid reaction (speed) matters less, and high-velocity applications, which require an architecture that can react in real time to incoming data. High-velocity apps need to analyze data as it is ingested so it can drive real-time response and action; high-volume apps may skew toward batch-style Big Data analytics.
Organizations that value acting on incoming data need streaming analytics that inform those actions by accurately correlating multiple streams of real-time data. For this reason, it's important to choose tools suited to the job (for example, a fast, transactional database) to support applications. As a corollary, it never makes sense to write application code to compensate for shortcomings in your database or data pipeline.
App developers dealing with streams of data must also understand the collect->explore->analyze->act cycle. Once you've analyzed historical data, you want to use those insights to optimize real-time customer interactions. Fast ingestion is necessary but not sufficient. An approach that links the fast data stream to historical data stores lets you explore previously analyzed data that may correlate with incoming data; faster analytics deliver immediate understanding and extra insight to inform appropriate actions.
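The link between the fast stream and the historical store can be illustrated with a small sketch. The `historical_profiles` table and the upsell rule below are hypothetical; they stand in for insights previously computed in a Big Data store and consulted per event to choose an action.

```python
# Hypothetical historical store: per-customer averages computed offline
# in a Big Data system and loaded into memory for fast lookup.
historical_profiles = {
    "cust-1": {"avg_purchase": 40.0},
    "cust-2": {"avg_purchase": 900.0},
}

def act_on_event(event, profiles):
    """Correlate an incoming event with analyzed historical data and
    return a real-time action (the 'act' step of the cycle)."""
    profile = profiles.get(event["customer"])
    if profile is None:
        return "collect-only"   # no history yet: just collect the event
    if event["amount"] > 2 * profile["avg_purchase"]:
        return "offer-upsell"   # unusually large purchase: act right now
    return "no-action"
```

Fast ingestion alone would only ever reach the "collect-only" branch; it is the historical lookup that makes the real-time action informed.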
Because data has its greatest value at the moment it enters the pipeline, it's vital to harness its actionable power right then. In too many cases, enterprises push fast data directly into data warehouses, missing the opportunity to extract valuable real-time insights from data streams using in-memory technology. Realizing the benefits of fast data can give your business the edge it needs to gain that elusive competitive advantage.
Ingest and interact
Fast data speeds the payoff for companies that have vast stores of Big Data. Simply collecting data for exploration and analysis does not prepare a business to act in real time, now that data flows into the organization from millions of endpoints. The front end of the enterprise data architecture needs a component that can ingest and interact with data, perform real-time analytics, and make a data-driven decision on each event. Applications can then take action, and data can be exported to the data warehouse for later historical analytics and reporting.
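A minimal sketch of that ingest-and-interact loop, assuming a hypothetical `decide` callback for the per-event decision and an `export` callback standing in for the warehouse load:

```python
def run_pipeline(events, decide, export, batch_size=3):
    """Hypothetical ingest-and-interact loop: decide on every event as
    it arrives, then export processed events to the warehouse in batches."""
    batch = []
    decisions = []
    for event in events:
        decisions.append(decide(event))  # per-event, real-time decision
        batch.append(event)
        if len(batch) >= batch_size:     # warehouse gets data later, in bulk
            export(batch)
            batch = []
    if batch:
        export(batch)                    # flush the final partial batch
    return decisions
```

The split mirrors the architecture in the text: decisions are made synchronously on each event, while export to the warehouse is deferred and batched for historical analysis.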
The missing link between fast and big is a unified enterprise data architecture that connects high-value historical data to fast-moving, inbound data from multiple endpoints. An in-memory operational system that can decide, analyze, and serve results at fast data's speed is key to making Big Data work at enterprise scale. Enterprises that can manage fast data are best positioned to unlock the potential of Big Data: making real-time decisions on data to drive sales, connect with customers, inform business processes, and create new value.
Traditionally, organizations have divided analytics and transaction processing, with dedicated enterprise data warehouses, cubes, and other tools for analytics and dedicated OLTP systems for transactions and operations. As Big Data continues to transform the way we interact with the world, with our customers, and with devices, this division is evolving. The scale and speed of new, diverse flows of data, and the limitations of a purely Big Data focus, are pushing organizations toward emerging high-performance OLTP systems that enable real-time business decisions by linking streaming analytics and transactions in a single real-time system.