Data virtualization enables organizations to increase analytics effectiveness and reduce analytics costs by creating a virtual layer that aggregates data from multiple sources. Companies can then access that data without setting up a costly data warehouse or spending time on data preparation. Data virtualization is also referred to as logical data warehouse (LDW), data federation, virtual database, or decentralized data warehouse.
A traditional data warehouse relies heavily on extract, transform, load (ETL) processes, which require significant programming effort with specialized tools and scripting languages. A logical data warehouse instead creates a virtual layer that takes over these integration tasks.
What is data virtualization?
Data virtualization is one of the technologies that enable Data-as-a-Service (DaaS) solutions. It is a data management approach that enables data processing without dealing with the technical aspects of data storage. Wikipedia provides a more formal description:
Data virtualization is any approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located.
Data warehousing is a technology that aggregates structured data from one or more sources in order to compare and analyze it for business intelligence. It is effective for understanding the overall performance of a business because it makes a wide range of data available for analysis.
A logical data warehouse is a newer data management architecture for analytics that combines the strengths of traditional repository-style warehouses with alternative data management and access strategies.
Why is it important now?
According to Gartner, data virtualization is the evolution and augmentation of data warehouse practices, not a replacement. These factors drive its growing importance:
- Increasing complexity of businesses: Acquisitions and phases of fast growth leave businesses with a multitude of physical databases that are not integrated. Data virtualization is one of the fastest ways to combine them for analytics.
- Increasing importance of analytics: As organizations' interest in analytics and data-driven decision making rises, the value of a logical data warehouse becomes more apparent, since it enables a faster analytics process.
- Data-hungry AI algorithms: While analytics in the past relied on the relationships among a few variables, modern deep learning algorithms are data hungry and can identify counterintuitive relationships in data. Therefore, aggregating only critical data in a data warehouse is no longer sufficient for advanced analytics applications.
- Increasing data volumes: The rate of data generation keeps increasing, which makes it harder to keep a physical data warehouse up to date. Data virtualization addresses this by processing data where it resides: some of the processing can be completed at the remote data storage units, which reduces the amount of data transferred and therefore the communication time.
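To illustrate why processing data at the source reduces communication, here is a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for a remote source (a real source would sit behind a network connection), and the table and function names are hypothetical. It compares pulling every row into the virtual layer with pushing an aggregate query down to the source, and counts how many rows each approach has to move.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory SQLite database standing in for a remote source (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def total_without_pushdown(conn):
    """Fetch every row into the virtual layer, then aggregate locally."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals, len(rows)  # len(rows) = rows transferred


def total_with_pushdown(conn):
    """Push the aggregation down to the source; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    data = [("EU", 10.0), ("EU", 5.0), ("US", 7.5), ("US", 2.5), ("EU", 1.0)]
    source = make_source(data)
    local, moved_local = total_without_pushdown(source)
    pushed, moved_pushed = total_with_pushdown(source)
    print(local == pushed, moved_local, moved_pushed)  # True 5 2
```

Both approaches return the same totals, but with pushdown the number of transferred rows depends on the number of groups rather than on the size of the source table.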
How does it work?
Data virtualization combines two technologies to give organizations flexible and scalable access to data:
- Data federation: Connects multiple databases and presents them to the user as a single database, which provides flexibility in querying data.
- Analytical database management: Provides scalability to a logical data warehouse. Analytical databases are available as data warehouse appliances, and no effort is required to relocate data for analysis in the virtual layer.
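The data federation idea can be sketched with SQLite's `ATTACH DATABASE`, which lets one connection query several independent database files as if they were one. This is a minimal illustration under stated assumptions, not how commercial data virtualization tools are implemented: the two files (`crm.db`, `erp.db`), their tables, and the view name `customer_revenue` are hypothetical, and the analytical database side is not covered.

```python
import os
import sqlite3
import tempfile


def build_sources(directory):
    """Create two independent SQLite files standing in for separate systems."""
    crm_path = os.path.join(directory, "crm.db")  # hypothetical CRM system
    erp_path = os.path.join(directory, "erp.db")  # hypothetical ERP system

    crm = sqlite3.connect(crm_path)
    crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    crm.commit()
    crm.close()

    erp = sqlite3.connect(erp_path)
    erp.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    erp.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])
    erp.commit()
    erp.close()
    return crm_path, erp_path


def federated_connection(crm_path, erp_path):
    """Attach both databases to one connection and expose them through a single view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    conn.execute("ATTACH DATABASE ? AS erp", (erp_path,))
    # A TEMP view may reference tables in attached databases. It stores only
    # the query, so no rows are copied out of crm.db or erp.db.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name AS name, SUM(o.total) AS revenue
        FROM crm.customers AS c
        JOIN erp.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        """
    )
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = federated_connection(*build_sources(tmp))
        rows = conn.execute(
            "SELECT name, revenue FROM customer_revenue ORDER BY name"
        ).fetchall()
        conn.close()  # release the attached files before the directory is deleted
        print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

The user queries `customer_revenue` as one table, even though the customers and orders live in two separate databases.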
What are the advantages of an LDW vs a traditional DW?
The benefits of data virtualization (logical data warehouses) can be grouped into two categories:
Ease of use
- A logical data warehouse is up to 90% faster to implement since less effort is required for its setup.
- The format of the source files does not matter when accessing data for analysis.
- Data can be accessed via a range of services, e.g., SOAP, OData, and SharePoint.
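The point that the source format matters little can be sketched as a set of format adapters that turn each source into the same row representation, so that one query runs over all of them. This is a simplified Python illustration that assumes the sources are small CSV and JSON strings; the `ADAPTERS` registry and the `query` function are hypothetical names, whereas real platforms rely on connectors such as JDBC, ODBC, or OData services.

```python
import csv
import io
import json


def read_csv(text):
    """Adapter: parse CSV text into a list of dicts (all values are strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    """Adapter: parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


# Hypothetical registry mapping a source format to its adapter.
ADAPTERS = {"csv": read_csv, "json": read_json}


def query(sources, predicate):
    """Scan every (format, text) source through its adapter and keep matching rows."""
    results = []
    for fmt, text in sources:
        for row in ADAPTERS[fmt](text):
            if predicate(row):
                results.append(row)
    return results


if __name__ == "__main__":
    csv_text = "id,country\n1,TR\n2,DE\n"
    json_text = '[{"id": "3", "country": "TR"}]'
    rows = query([("csv", csv_text), ("json", json_text)],
                 lambda r: r["country"] == "TR")
    print([r["id"] for r in rows])  # ['1', '3']
```

Adding a new format means registering one more adapter; the query code does not change.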
Improved analytics effectiveness
- Data virtualization minimizes data latency, enabling near real-time analysis across different data sources. Depending on data transfer speeds, it may not always deliver true real-time analysis, but it generally offers more up-to-date data than a physical data warehouse (DW), which cannot be refreshed every minute in a cost-effective manner.
- A logical data warehouse does not store data; the data stays at the source and is accessed there. Because logical data warehouse features replace ETL processes, data scientists can shift their focus to querying and analyzing data.
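The difference between copying data with ETL and reading it at the source can be seen with a plain SQL view. In this minimal SQLite sketch, the table names are hypothetical and a single in-memory database stands in for both the source system and the warehouse (an assumption made to keep the example self-contained). A table loaded ETL-style keeps the old value after the source changes, while a view, which stores only a query, returns the current value.

```python
import sqlite3


def compare_freshness():
    """Return (value seen by an ETL copy, value seen by a virtual view) after a source update."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_orders (id INTEGER, status TEXT)")
    conn.execute("INSERT INTO source_orders VALUES (1, 'open')")

    # ETL-style load: the rows are physically copied into a warehouse table.
    conn.execute("CREATE TABLE dw_orders AS SELECT * FROM source_orders")
    # Virtual-style access: the view stores only the query, not the data.
    conn.execute("CREATE VIEW virtual_orders AS SELECT * FROM source_orders")

    # The source changes after the load.
    conn.execute("UPDATE source_orders SET status = 'shipped' WHERE id = 1")

    copied = conn.execute("SELECT status FROM dw_orders WHERE id = 1").fetchone()[0]
    virtual = conn.execute("SELECT status FROM virtual_orders WHERE id = 1").fetchone()[0]
    conn.close()
    return copied, virtual


if __name__ == "__main__":
    print(compare_freshness())  # ('open', 'shipped')
```

The copy only catches up at the next ETL run, whereas the view reflects the source on every query. The flip side is that every query depends on the source being reachable.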
What are the challenges of data virtualization?
- A sufficient number of data sources (more than 10) is needed for a logical data warehouse to pay off in terms of efficiency. Otherwise, the trade-off between cost and speed may not be worth it.
- A logical DW does not provide a single source of truth the way a traditional DW does. Ensuring the stability, availability, data consistency, and correctness of a logical DW can be challenging for organizations.
What are example case studies?
Challenge: Anadolu Hayat is an insurance company in Turkey. The company was facing storage shortages because it was simultaneously planning to move its data center. These shortages led to problems with database creation requests, and exporting or importing a database from production systems took more than a week.
Solution: Anadolu Hayat partnered with Delphix to ingest production databases, giving its test and solution development teams more accurate data in less time and with fewer resources.
Results: With Delphix, the company reduced database import/export time from 5+ days to 10+ minutes, and its virtual databases avoided 250 TB of storage.
What are the leading tools for LDW?
| Vendor | Tool | Year Founded | IPO Status | Additional Features |
|---|---|---|---|---|
| Actifio | Actifio Sky | 2009 | Private | Delivered as an OVA, VHD, AMI, or as an image from other cloud marketplaces |
| AtScale | AtScale Intelligent Data Virtualization | 2013 | Private | |
| Data Virtuality | Logical Data Warehouse | 2012 | Private | Supports standard APIs such as JDBC, ODBC, and REST to deliver data to data consumers; connects data sources in XML, JSON, CSV, and xSV formats and manages data in SQL |
| Denodo | Denodo Platform | 1999 | Private | Available on leading cloud marketplaces such as Amazon Web Services (AWS), Microsoft Azure, and Docker; supports OAuth 2.0, SAML, OpenAPI, and OData 4 |
| IBM | Cloud Pak for Data | 1911 | Public | Embedded governance capabilities such as automated data discovery and classification, data masking, data zones, and data lifecycle management |
| Informatica | PowerCenter | 1993 | Private | Compatible with XML, JSON, PDF, Microsoft Office, and Internet of Things machine data |
| Oracle | Oracle Data Service Integrator | 1977 | Public | Provides a virtual relational database interface to applications via JDBC or ODBC |
| Red Hat | Red Hat Virtualization Platform | 1999 | Private | |
| SAS | SAS Federation Server | 1976 | Private | Compatible with popular relational databases, including DB2, Oracle, SAP, SQL Server, Teradata, and Greenplum |
| Stone Bond Technologies | Enterprise Enabler | 2001 | Private | Simple drag-and-drop interface to auto-generate virtual models that can be consumed by BI reports, web services, and applications |
Informatica is the leading company in the logical DW market, generating $1.1B in annual revenue.