A long time ago I wrote a blog post where I described three approaches to providing operational reports and compared them against each other. They are 1) traditional DW approach, 2) the abstract views approach and 3) the direct query approach. The blog post can be found here and I keep it handy whenever I am asked about operational reports.
I have been recently looking at Data Virtualization (DV) and started thinking how it can be used in a BI project. This blog is about that. If you are not familiar with DV, this youtube video from one of the DV software vendors provides good introduction. I would highly recommend watching it before proceeding.
This is the wikipedia definition of DV but in a nutshell it is a technique in which multiple data sources are joined together in a logical layer so as to abstract the complexities of the data sources and provide a unified view of the data to the end users. Going back to my original blog post, one of the difficulties of the views approach is in integrating multiple data sources. That’s primarily because views are simply projections on tables in databases. It’s quite difficult to create a view which spans across different databases for e.g. SQL Server view based on data in Oracle unless you some how bring it in the SQL server AKA ETL it. It is even more challenging if the data source is a flat file or XML or JSON. How would you create views on them?
Enter DV. In DV, as mentioned earlier, multiple data sources are joined together in a virtual layer. The type of the data sources can vary from relational databases to files in Hadoop to web services. The ETL is performed on the fly and in-place,if need be, i.e. multiple data sources are integrated in real time. If we can point our reporting tool to this virtual layer, we can provide real time operational reporting. Sweet!! So imagine a quick report you want to knock together which involves SQL Server, a web service and Excel file. DV will gladly connect to three, allow you mash them together and publish this data. You can point the reporting tool to published data and are good to go. And all in real time – that’s added bonus.
This all looks very cool hence a word of caution. It sounds like DV is replacement for ETL and data warehouse but it is NOT. There are far more advantages to having a DW and ETL process in place but they are beyond the scope of this blog. We can certainly think of creative ways of using DV in conjunction with traditional DW and ETL. Providing real time operational reports in a way mentioned above can be one of them.