Because of the endless confusion from decision makers on establishing data driven decision making in their organization at all levels this post seeks to explain one of the fundamentals in mastering business analytics. Again a Data Warehouse is a critical component to any business where insights are required to manage, gain better insights, or create action for the business. It is a look backwards for trending analysis, it is a curated set of information for quick analysis by slicing and dicing relationships in the data, it is a means for basing forecasting/planning activities with actuals side-by-side, and it is a consistent means for comparative analysis across calendars and changes to meaningful data point attributes across time. Regardless of sewing circle conversation of building a data lake storage, using ETL vs ELT, hiring a data scientist or engaging in machine learning, a data warehouse (call it a data mart if you’d like based on size or scale) is a fundamental asset to any organization that uses data to make decisions.
Since a Data Warehouse needs data to aggregate and build relationships, a database is often that source of data for the Data Warehouse. A Data Warehouse can combine multiple sources of data together to one holistic view of the curated need for the analytical power required of the Data Warehouse. One or more data sources for the Data Warehouse can come from a database such as an ERP or CRM system (an example would be customer, financials, GL, accounting, sales, etc. data). And through a transformation process the data elements are combined, aggregated, parsed, and otherwise enriched based on business logic the company needs to facilitate the aforementioned analysis. Traditionally a Data Warehouse is a special type of database that is infrequently updated compared to the transactional nature of most system of record operational databases used to support ERP and CRM systems such as SAP, Oracle, PeopleSoft, SAGE, MS Dynamics, etc. Each of those systems would act as sources for the Data Warehouse for example.
Here is a basic breakdown of some simple characteristics of a Data Warehouse and a Database for your review.
Characteristic Item | Data Warehouse | Database |
---|---|---|
Purpose / Identified By | Trend Analysis, Analytics, Reporting | Transactional Events, Data Entry |
Transactional / Analytical | Analytical | Transactional |
Sources of Information | Structured data, any logical data source, CSV, Flat File, Operational Database | Data input from users via some form based application, usually a single source transactional system |
Automation Type | Transformation process, usually in batch, micro-batch, or near real-time, typically ETL or ELT processes | None, usually. People or triggered events are the means for adding event records to this transactional repository |
Database Model Form | Denormalized, typically to create fast data relationship queries of the stored data. Typically dimensionally structured (Star schemas, satellites, etc.) | Usually Third Normal Form (3NF), a normalized approach to application development |
Storage Form | File system based and often in memory for faster query performance, often columnar storage is used for MPP capabilities | Caters to operations requiring fast write, high-throughput into relational column-row structures baked by file system storage |
Security / Access | Based on model developed, but often follows constraints of database vendor or any attached BI/Analytics/Reporting application used to query the data. | Based on the database vendor, and the application developed, usually a GUI, that simplifies the data writing/input process of the operational requirement, ex: GL account entries. |
Please comment and leave feedback to what you think are some additional characteristics that delineate a typical Database from a Data Warehouse and we’ll seek to add them to the list.