All the operational systems eventually need some kind of analytic functionality. It can be static reports or ad-hoc analysis using an OLAP tool. This facet is often overlooked at the beginning of the project and rears its head after data has grown in the system. Suddenly everybody wants reports, summarised by x & y, with slice and dice capability and all the colourful charts you can think of. The schedule is very tight by this time and requirements very high. I can think of three approaches in such situations and ,in his blog, I would like to briefly compare them. Each one has its own merits and which one to use depends on many external factors such as budgets,resources, schedule, skills etc. I must say that the approaches discussed are heavily influenced by my experience with the SQL Server stack and there can be alternatives to them on other platforms. Lets first define what I mean in each approach.
1. The Traditional Route : This is the tried and trusted route. In a nutshell, we load data from operational source systems into a common staging area, run the Extract-Transform-Load (ETL) routines and load it into a data warehouse. The data warehouse can use Kimball, Inmon, Data Vault or any hybrid methodology. There is an analytics layer (OLAP) on data warehouse and reports run off this layer.
2. The Views Route : In the views route, we create views on the operational database. They can be SQL Server views or can be materialized as actual tables and populated periodically. I will not call it a data warehouse or even virtual data warehouse because we don’t combine data from various source systems or there is no extensive data cleaning and transformation. If you think about it they are simply a different view on the source data. OLAP cubes can built on the top of this layer.
3. The Direct Reports Route : In this approach, we create reports which run directly off the source database. There is no OLAP functionality. The reports are very static in nature and with very limited ad-hoc analysis capacity. The report queries are directly run on the source system. Mirroring/replication can be used to reduce the stress on the source system.
Lets now compare the three routes. Please see table below.
|Factors||Traditional Route||Views Route||Direct Reports Route|
|Data Latency||High Latency||Low latency. Can be made near real time.||Real time.|
|Database size||For large data volume, this is preferred approach because of the batch processing mode.||Not preferred for large data volumes.||Not preferred for large data volumes.|
|Business Logic, control and governance||Centrally maintained||Somewhat centrally maintained||Resides in each individual report|
|Effort & project resources required||More resources will be required||Comparatively less resources are required||Depends on the number of reports and complexity of reports|
|Effect on source system||Less pressure on the source system||Queries get complex and stress source system. This can be mitigated by using mirroring or replication||Stress on the system is high|
|Ad-hoc analysis||Possible using an OLAP tool on the data warehouse||Possible using an OLAP tool on views||Very limited|
|Flexibility||Changing the cube structure is difficult because we would have to change ETL||Changing the cube structure is comparatively simple||Reports can be changed easily|
|Data Integration||Multiple data sources can be integrated||Difficult to integrate data from different sources||Not possible to integrate different data sources|
|Data History||History can be maintained in DW||No history is maintained except in the source system||No history is maintained except in the source system|
|Data Cleaning||Extensive data cleaning can be done in ETL||Elementary data cleaning such as checking empty values, data types etc. can be done||Elementary data cleaning such as checking empty values, data types etc. can be done|
|Lookups||Lookups can be done for data from various sources||Lookups are limited to within same data source only||Not possible outside of data source|
|Source Systems Data Structure||Doesn’t matter||Have to have business keys, foreign key relationships, constraints, indexes etc well defined||Have to have business keys, foreign key relationships, constraints, indexes etc well defined|
I am very sure there are a number of other factors which should be taken into consideration when deciding what route to take. As said earlier the choice depends on each individual scenario. As usual comments are always welcome.