While writing my last several posts on data science, it occurred to me that I should comment on data virtualization. Still in the adoption phase, data virtualization is nevertheless worth examination by companies working to modernize their systems.
People who have worked in data management software development for many years (two decades in my case) will agree that data virtualization is not a new concept, although it has gone by several names over the years. But as custom database development has evolved into business intelligence, and with the advent of big-data solutions, data virtualization has finally started to gain strong momentum.
In fact, many database professionals are using ETL products without realizing that data virtualization capabilities have already been integrated into them. I believe that as big-data solutions continue to gain popularity and we discover new sources of data, data virtualization will play an immense role in modern enterprise data management and data integration projects.
By concealing the physical implementation of the data layer and the associated metadata from the application layer, data virtualization consolidates multiple data sources into a logical layer. This is in contrast with most data-driven solutions for reporting and business intelligence systems, which are not flexible enough to accommodate small changes in a short timeframe.
Here’s an example. I recently worked with a customer that needed changes to its reporting application. From a data-warehouse design perspective, the change was simple: just add a couple of new data sources to the existing DWH systems. But during project analysis, we discovered that making those small changes would require at least three months of custom development. With enterprises and business processes changing so fast, is that a reasonable timeframe to implement small changes to a mission-critical application? The answer is a resounding “no.”
The biggest challenge for data integration projects is to develop an agile data integration solution or platform that can accommodate the fast-changing requirements of an enterprise. Imagine you are building a complex system that requires a dashboard to show reports from a mission-critical, agile application. If you have to build a data warehouse with dozens of facts, dimensions, and measures governed by several complicated business rules, traditional end-to-end development will take several months. Using data virtualization, you could expedite the project significantly. Because you do not need to care about the physical location of the data sources, you can build solutions much faster.
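To make the idea of a logical layer a little more concrete, here is a minimal sketch in Python. It is not how any particular data virtualization product works; the `VirtualLayer` class, the table names, and the sample data are all hypothetical, and real products add query pushdown, caching, security, and much more. The point is only that the reporting code asks for logical names such as `customers` and `orders` and never sees where the rows physically live (here, an in-memory SQLite database and a CSV payload).

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class VirtualLayer:
    """Hypothetical logical layer: maps logical names to fetch functions."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Adding or swapping a physical source is a one-line change here.
        self._sources[name] = fetch

    def rows(self, name: str) -> List[Row]:
        # Rows are fetched on demand; nothing is copied into a warehouse.
        return list(self._sources[name]())

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Simple hash join across two logical tables on a shared key.
        index: Dict[object, List[Row]] = {}
        for r in self.rows(right):
            index.setdefault(r[key], []).append(r)
        return [{**l, **r} for l in self.rows(left) for r in index.get(l[key], [])]


def make_sqlite_source() -> Callable[[], Iterable[Row]]:
    # Assumed source 1: a relational database (in-memory for the sketch).
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return lambda: (
        dict(r) for r in conn.execute("SELECT customer_id, name FROM customers")
    )


# Assumed source 2: a flat file, represented here as a CSV string.
ORDERS_CSV = "customer_id,amount\n1,250.0\n2,75.5\n1,100.0\n"


def csv_source() -> Iterable[Row]:
    for rec in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {"customer_id": int(rec["customer_id"]), "amount": float(rec["amount"])}


def build_layer() -> VirtualLayer:
    layer = VirtualLayer()
    layer.register("customers", make_sqlite_source())
    layer.register("orders", csv_source)
    return layer


def revenue_by_customer(layer: VirtualLayer) -> Dict[str, float]:
    # Reporting logic refers only to logical names, never to SQLite or CSV.
    totals: Dict[str, float] = {}
    for row in layer.join("customers", "orders", "customer_id"):
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(revenue_by_customer(build_layer()))
```

In this sketch, pointing `orders` at a different physical source only requires another `register` call; `revenue_by_customer` stays untouched, which is the kind of agility described above.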
To summarize, here are the major benefits of data virtualization:
- More agility to accommodate fast-changing business processes
- Lower cost of development for virtual data warehouse/data-mart solutions
- Reduced development time
- Fewer, easier-to-handle data quality issues
- A simplified, streamlined data governance process
- Reduced time to market for data integration projects
- Easier management of back-end and production data (due to reduced downtime of associated consumer and downstream applications)
- Improved customer satisfaction (from the ability to introduce new data sources)
- Reduced need to replicate data (real-time and historical data can be combined in a unified way)
I don’t want to sound like a salesman pitching data virtualization technology, but if we truly want to keep up with the fast pace of data integration projects, data virtualization is the way to go for data management solutions. There’s a lot more to discuss on this topic, so I’ll be expanding on these ideas in later posts.