Out with the ETLs and in with Data Virtualization

One of the key factors in a business’s success is how quickly and efficiently it can manage data. Effective data management allows businesses to maintain historical records while also growing and developing to compete in today’s market. Traditionally, the process of gathering and using data could be described with three letters: ETL, which stands for extract, transform, and load. These three operations are how most businesses handle their data for any and all purposes (e.g., business intelligence, application development): data is extracted from the original source, transformed for its new purpose, and loaded into the business’s database management system (DBMS). While this method seems straightforward and easy to use, ETL systems have several glaring shortcomings, including but not limited to:

  • Slow processing speeds
  • Data structure format requirements (i.e., ETLs can’t handle both structured and unstructured data)
  • Compatibility issues between DBMS platforms
  • Inefficient protection and masking of sensitive data

Slow processing speeds are primarily a result of the growth in the amount of data available to companies and developers today. With the arrival of the era of Big Data, ever-larger volumes of data flood businesses every day, so faster processing speeds are necessary to use this data effectively. That same growth also explains the non-uniformity of the data being processed. Compatibility issues occur when data was collected by different sources or at different times and was consequently managed on different, incompatible platforms (think about how SQL Server and Oracle schemas don’t interoperate easily). Finally, protecting sensitive data with ETLs requires scripting and manually masking that data, which is time-consuming.
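To see why that last point is painful, here is a minimal sketch of a traditional ETL job against a hypothetical customers table (the table, fields, and masking scheme are illustrative assumptions, not any particular vendor’s pipeline). Notice that masking the sensitive field is hand-written code that every such pipeline has to repeat:

```python
# Minimal ETL sketch: extract, transform (with hand-rolled masking), load.
# Hypothetical schema for illustration only.
import sqlite3

def extract(conn):
    """Extract: pull raw rows from the source system."""
    return conn.execute("SELECT name, email, balance FROM customers").fetchall()

def transform(rows):
    """Transform: reshape records and manually mask the sensitive field."""
    out = []
    for name, email, balance in rows:
        user, _, domain = email.partition("@")
        masked = user[0] + "***@" + domain  # masking scripted by hand
        out.append((name, masked, round(balance, 2)))
    return out

def load(conn, rows):
    """Load: write the transformed rows into the target DBMS."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers_clean (name, email, balance)")
    conn.executemany("INSERT INTO customers_clean VALUES (?, ?, ?)", rows)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (name, email, balance)")
src.execute("INSERT INTO customers VALUES ('Ada', 'ada@example.com', 10.5)")
load(src, transform(extract(src)))
print(src.execute("SELECT email FROM customers_clean").fetchone()[0])
# prints a***@example.com
```

Every new source or new sensitive field means more of this one-off scripting, which is exactly the overhead described above.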

Data virtualization offers fixes for all of these problems, and it appears that businesses are trending towards data management using data virtualization as ETLs become less and less sufficient. First, let’s break down how data virtualization works. Essentially, data virtualization software allows businesses to import data from any source, and the software processes, clones, and stores the data for a variety of purposes like development, reporting, testing, and analytics. Oh, and all of this happens in real time. Delphix is one company putting out software to accomplish these tasks. Their Data as a Service (DaaS) software allows for better data management, to put it simply. An overview of the Delphix DaaS Platform can be seen below.
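The cloning idea can be illustrated with a toy copy-on-write structure (a hypothetical sketch of the general technique, not Delphix’s actual implementation): a clone records only the rows that diverge from a shared base, so each developer’s copy costs almost nothing until it is changed.

```python
# Toy copy-on-write clone, illustrating the virtual-copy idea behind
# data virtualization. Hypothetical sketch, not any vendor's implementation.
class VirtualClone:
    def __init__(self, base):
        self.base = base   # shared, read-only source data
        self.delta = {}    # only changed rows are stored per clone

    def read(self, key):
        # Reads fall through to the shared base unless overwritten locally.
        return self.delta.get(key, self.base.get(key))

    def write(self, key, value):
        # Writes touch only this clone's private delta, never the source.
        self.delta[key] = value

base = {"row1": "alice", "row2": "bob"}
dev_copy = VirtualClone(base)
dev_copy.write("row2", "test-user")

print(dev_copy.read("row1"))  # still served from the shared base
print(dev_copy.read("row2"))  # private change visible only to this clone
print(base["row2"])           # source data untouched
```

Because only the delta is stored, many developers can each hold a “copy” of a large dataset without multiplying storage or pushing full copies across the network.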

Figure 1. Delphix DaaS Platform acceleration workflow.

What does data virtualization do differently than ETLs to make it so great? In short, good data virtualization software eliminates the back-and-forth that comes with traditional database management. Developers no longer need special data permissions or additional analytic capabilities that only managers and system, storage, and database administrators can grant. This streamlined process lets developers access data more quickly and ultimately develop and test newer, better applications that lead to increased profits. Data processing itself is more efficient because physical copies of the data are not being used: virtual copies allow quick access at any time while using less memory. These copies also aren’t pushed across networks like traditional data, leaving more bandwidth available and resulting in faster processing times. When accessing sensitive data, masking and encryption are also streamlined through automated masking applications.
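The automated-masking idea boils down to declaring rules once and applying them everywhere, instead of hand-editing each pipeline. A minimal sketch (the rule names and masking choices here are illustrative assumptions):

```python
# Rule-driven masking sketch: rules are declared once in a table and
# applied uniformly to every record. Field names are hypothetical.
import hashlib
import re

MASKING_RULES = {
    "ssn":   lambda v: "***-**-" + v[-4:],                       # keep last 4 digits
    "email": lambda v: re.sub(r"^[^@]+", "user", v),             # hide local part
    "name":  lambda v: hashlib.sha256(v.encode()).hexdigest()[:8],  # pseudonymize
}

def mask_record(record):
    """Apply every declared rule; fields without a rule pass through."""
    return {k: MASKING_RULES.get(k, lambda v: v)(v) for k, v in record.items()}

row = {"name": "Ada Lovelace", "ssn": "123-45-6789",
       "email": "ada@example.com", "city": "London"}
masked = mask_record(row)
print(masked["ssn"])    # ***-**-6789
print(masked["email"])  # user@example.com
print(masked["city"])   # London (non-sensitive, unchanged)
```

The point is the shape of the solution: sensitive fields are masked by policy rather than by per-pipeline scripts, which is what makes the process fast enough to run on every virtual copy.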

Where data virtualization software is used today, it’s often used for application development; following development, it can also be used for testing and integration into the market. Once data virtualization becomes more commonplace – and all the signs point to that happening – it will likely be used for business intelligence applications like reporting, customer relationship management (CRM), and handling Big Data at greater scale. Until then, expect a decline in traditional DBMSs and ETLs, and expect data virtualization software companies like Delphix to grow and expand.

Rebecca Seasholtz

Rebecca is a senior Materials Science and Engineering major at Georgia Tech. She specializes in soft materials (i.e. plastics and textiles) and has also worked extensively with functional materials for electrical applications. Rebecca is originally from Grayson, GA and likes to spend her free time running, cycling, drinking coffee, or hanging around the campus house of a ministry she attends at Georgia Tech. Contact Rebecca at [email protected]