What is Data Virtualization? Benefits, Case Studies & Top Tools 
Data virtualization enables organizations to increase analytics effectiveness and reduce analytics costs by creating a virtual layer that aggregates data from multiple sources. This layer allows companies to access data from multiple sources without setting up a costly data warehouse or spending time on data preparation. The approach is also called a logical data warehouse (LDW), data federation, a virtual database, or a decentralized data warehouse.
A traditional data warehouse relies heavily on ETL, which requires significant programming effort with specialized tools and scripting languages. A logical data warehouse creates a virtual layer that handles the ETL.
What is data virtualization?
Data virtualization is one of the technologies that enable Data-as-a-Service (DaaS) solutions. It is a data management approach that enables data processing without dealing with the technical aspects of data storage. Wikipedia provides a more formal description:
Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located.
To understand data virtualization, we first need to understand the traditional data warehouse (DW). A data warehouse is a commonly used data integration technique for centralizing data. We define it as: data warehousing is a technology that aggregates structured data from one or multiple sources in order to compare and analyze it for business intelligence. It is effective for gaining a better understanding of a business's overall performance because it makes a wide range of data available for analysis.

The logical data warehouse (LDW) is the most common implementation of data virtualization. The term was coined by Gartner in 2011. An LDW differs from a data warehouse in that it is not monolithic: besides the organization's core data warehouse, its architecture includes external data sources such as enterprise systems, web and cloud data. An LDW connects multiple data sources and makes the data accessible via SQL queries. The data remains in place, and the source systems are accessed in real time. Gartner Research Vice President Mark Beyer defines it as follows: the logical data warehouse is a new data management architecture for analytics, combining the strengths of traditional repository warehouses with alternative data management and access strategies.
Why is it important now?
According to Gartner, data virtualization is the evolution and augmentation of data warehouse practices, not a replacement. 1 These factors drive its growing importance:
- Increasing complexity of businesses: Acquisitions and phases of fast growth leave businesses with a multitude of physical databases that are not integrated. Data virtualization is the fastest approach to merging them for analytics.
- Increasing importance of analytics: As organizations' interest in analytics and data-driven decision making rises, the value of a logical data warehouse becomes more apparent, since it enables a faster analytics process.
- Data-hungry AI algorithms: While analytics used to rely on the relationships between a few variables, modern deep learning algorithms are data hungry and can identify counter-intuitive relationships in data. Aggregating only critical data in a data warehouse is therefore no longer sufficient for advanced analytics applications.
- Increasing data volumes: The rate of data generation keeps increasing, which makes it harder to keep a physical data warehouse up to date. Data virtualization allows some data processing to be completed at the remote data storage units, reducing data communication time.
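The last point is often called predicate pushdown: instead of pulling every row across the network and filtering locally, the virtualization layer ships the filter to the source so only matching rows travel. A minimal sketch of the idea, using an in-memory SQLite database as a stand-in for a remote source (table and column names are invented for illustration):

```python
import sqlite3

# An in-memory database standing in for a remote source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (region TEXT, amount REAL)")
source.executemany("INSERT INTO events VALUES (?, ?)",
                   [("EU", 10.0), ("US", 25.0), ("EU", 5.0), ("APAC", 7.0)])

# Naive approach: fetch every row, then filter on the consumer side.
all_rows = source.execute("SELECT region, amount FROM events").fetchall()
eu_local = [row for row in all_rows if row[0] == "EU"]

# Pushdown: the predicate travels to the source, so only matching rows
# (here 2 instead of 4) would cross the wire.
eu_pushed = source.execute(
    "SELECT region, amount FROM events WHERE region = ?", ("EU",)
).fetchall()

print(len(all_rows), len(eu_pushed))  # 4 2
```

Both paths return the same rows; the difference is where the filtering work happens and how much data moves.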
How does it work?
Data virtualization combines two technologies to offer organizations flexible and scalable data access:
- Data federation: Connects multiple databases and presents them to the user as a single database, providing flexibility in querying data.
- Analytical database management: Provides scalability for the logical data warehouse. Analytical databases are available as data warehouse appliances, and no effort is required to relocate data for analysis in this virtual layer.
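The federation idea can be sketched with SQLite's ATTACH DATABASE, which lets one connection expose two separate database files as a single queryable schema. This is a toy stand-in for a real federation engine, with invented table names; production tools do the same across heterogeneous systems:

```python
import os
import sqlite3
import tempfile

# Two independent "source" databases, standing in for separate systems.
tmp = tempfile.mkdtemp()
sales_path = os.path.join(tmp, "sales.db")
crm_path = os.path.join(tmp, "crm.db")

with sqlite3.connect(sales_path) as db:
    db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 120.0), (1, 80.0), (2, 45.0)])

with sqlite3.connect(crm_path) as db:
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Acme"), (2, "Globex")])

# The "virtual layer": one connection that exposes both sources at once.
virtual = sqlite3.connect(sales_path)
virtual.execute(f"ATTACH DATABASE '{crm_path}' AS crm")

# A single SQL query now joins data that lives in two separate databases.
rows = virtual.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 200.0), ('Globex', 45.0)]
```

The consumer writes one query against one logical schema; the fact that the rows live in two different files (or, in practice, two different systems) is hidden by the layer.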
What are the advantages of an LDW vs a traditional DW?
The benefits of data virtualization (logical data warehouses) fall into two categories:
Ease of use
- A logical data warehouse is up to 90% faster to implement since less effort is required for its setup. 2
- The source file's format does not matter; data can be accessed for analysis regardless of how it is stored
- Data is accessed via a range of services, e.g. SOAP, OData, SharePoint
Improved analytics effectiveness
- Data virtualization minimizes data latency, enabling near-real-time analysis across different data sources. Depending on data transfer speeds, it may not offer truly real-time analysis, but it delivers more up-to-date data than a physical data warehouse (DW), which cannot be updated every minute in a cost-effective manner.
- A logical data warehouse stores no data; data is accessed at the source. Since its features replace extract, transform and load (ETL) processes, data scientists can shift their focus to data querying and analysis.
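The "transform at query time instead of ETL" idea can be sketched as a SQL view: the transformation logic lives in the virtual layer and runs on read, rather than being materialized by a batch job. A minimal sketch with SQLite (table and column names are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_sales (sold_at TEXT, amount_cents INTEGER)")
db.executemany("INSERT INTO raw_sales VALUES (?, ?)",
               [("2024-01-03", 1250), ("2024-01-17", 800), ("2024-02-02", 400)])

# Instead of an ETL job that copies and reshapes the data into a warehouse
# table, a view applies the transformation every time it is queried.
db.execute("""
    CREATE VIEW monthly_sales AS
    SELECT substr(sold_at, 1, 7) AS month,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_sales
    GROUP BY month
""")

monthly = db.execute("SELECT * FROM monthly_sales ORDER BY month").fetchall()
print(monthly)  # [('2024-01', 20.5), ('2024-02', 4.0)]
```

Because the view is evaluated on read, newly inserted raw rows show up in the next query with no load job in between, which is the trade-off a logical DW makes against a physically materialized warehouse.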
What are the challenges of data virtualization?
- A sufficient number of data sources (>10) is needed for a logical data warehouse to pay off in terms of efficiency. Otherwise, the trade-off between cost and speed may not be worth it.
- A logical DW does not provide a single source of truth like a traditional DW. Its stability, availability, data consistency and correctness may challenge organizations.
What are example case studies?
Challenge: Anadolu Hayat is an insurance company in Turkey. It was facing storage shortages because it was, in parallel, planning to move its data center. These shortages led to problems with database creation requests, and exporting/importing a database from production systems would take more than a week.
Solution: Anadolu Hayat partnered with a data virtualization company to ingest production databases, providing its test and solution development teams with more accurate data in less time and with fewer resources. 3
Results: The insurance firm reduced database import/export time from 5+ days to 10+ minutes, and 250 TB of storage was avoided through virtual databases.
What are the leading tools for LDW?
- Delivered as an OVA, VHD, AMI, or as an image from other cloud marketplaces

AtScale Intelligent Data Virtualization

Logical Data Warehouse
- Supports standard APIs such as JDBC, ODBC and REST to deliver data to data consumers
- Connects data sources in XML, JSON, CSV and xSV formats and manages data in SQL
- Available on leading cloud marketplaces such as Amazon Web Services (AWS) and Microsoft Azure, and as a Docker image
- Supports OAuth 2.0, SAML, OpenAPI and OData 4

Cloud Pak for Data
- Embedded governance capabilities such as automated data discovery and classification, data masking, data zones and data lifecycle management
- Compatible with XML, JSON, PDF, Microsoft Office documents and Internet of Things machine data

Oracle Data Service Integrator
- Provides a virtual relational database interface to applications via JDBC or ODBC

Red Hat Virtualization Platform

SAS Federation Server
- Compatible with popular relational databases, including DB2, Oracle, SAP, SQL Server, Teradata and Greenplum

Stone Bond Technologies
- Simple drag & drop interface to auto-generate virtual models that can be consumed by BI reports, web services and applications
Informatica is the leading company in the logical DW market, generating $1.1B annually.
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (per SimilarWeb), including 60% of Fortune 500 companies, every month.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.