Trying to separate the nuances between a Data Warehouse and a Data Mart is similar to buy a car and try to determine if you want a coupe (2 doors) or a sedan (4 doors). In this loose analogy, regardless of which you purchase, you’ll go places (hopefully fast when you need to), it has most of the same components (it needs gas, it has an engine, tires, etc.) but if choose a coupe, it’s fairly difficult to add 2 extra doors if you’ve outgrown the space (for example, going from single life to married with kids). So, in essence a Data Mart is a type of Data Warehouse, but usually with a very specific focus or break out that identifies itself with a single domain or line of business in the organization, like sales/market, HR, finance, etc. It’s more focused, and may contain different perspectives of data that are unique to how that domain of the business operates.
Through experience, we’ve seen that an organization should have a big picture view of their data and data warehouse initiatives. If only one domain or line of business in the organization is ready to move forward to satisfy their business needs that is an okay approach. As a minimally viable product (MVP) or firm solution without boiling the ocean, an organization may wish to focus on building a smaller subset of what will ultimately be the Enterprise Data Warehouse. This is the recommended approach but with caution that it is designed, and built for the big picture use case which would ultimately include all business units and scalable enough to include future business units yet to exists. By creating a Data Warehouse with the enterprise in mind one can safely spin off a subset, called a Data Mart, knowing that the sub-set Data Mart relates to the main Enterprise Data Warehouse. Therefore, any unique line of business data enrichments, if any, that need to be separate from the Data Warehouse still conform to the core Data Warehouse design. This means all lines of businesses ultimately use the same calendar, the same lists of: products, customer, geography, etc. And the organization still maintains its original vision of having a single source of truth data repository, perhaps though in this data mart off-shoot approach taking baby steps to achieve it, but all the time creating business value. While there are different schools of thought on this the above approach is the most systematic, agile, and productive compared to a large waterfall data warehouse project. YMMV.
Here is a basic breakdown of some simple comparable characteristics of a Data Warehouse and a Data Mart for your review.
Characteristic Item | Data Warehouse | Data Mart |
---|---|---|
Purpose / Identified By | Central single source of truth with conformed ability to integrate multiple domain subject areas/LOBs and cross-functional analytics | Line of business specific domain, and area of expertise still needing relation to remainder of the organization |
Transactional / Analytical / Other | Analytical, Historical Data | Analytical – varying degrees of granularity |
Sources of Information/Data | Structured data, any logical data source, CSV, Flat File, Operational Database | Portioned from the Data Warehouse typically, and may have some enriching data that is LOB specific, so, structured data, any logical data source, CSV, Flat File, Operational Database |
Automation Type | Transformation process, usually in batch, micro-batch, or near real-time, typically ETL or ELT processes | Same |
Size / Performance | Current and historical data often lead the the Data Warehouse storing, aggregating, and implementing business logic on years (typically last 3-7 years) of data which can grow to high gigabytes to terabytes of data | Since typically based on a the Data Warehouse, it can house the same amount of data for its domain/LOB. Usually the LOB has some say on data retention so could by less than but typically not more than the DW |
Storage Form | File system based and often in memory for faster query performance, often columnar storage is used for MPP capabilities | Same |
Security / Access | Based on model developed, but often follows constraints of database vendor or any attached BI/Analytics/Reporting application used to query the data. | Based on object policies, buckets, and other identity access management protocols. |
Price & Performance | Multi-parallel Processing (MPP), and speed query results especially on aggregated data, for hot data and cold data, typically with a higher storage cost. | Same, though typically less data |
Users (typical) | Line of Business Managers, Business Analysts, Data Engineers, Executives, Data Scientists | Same |
Use Cases | Near real-time, intra-day, batch, micro-batch analytics/reporting, Business Intelligence, Data Visualizations, Executive Reporting, Enterprise Business Analytics | Example: LOB leadership needing to add third-party data to enrich their forecast, marketing campaign, etc. for existing customers. This bottom up approach needs to be fast, and no need to disrupt the larger DW. |
The benefits of a data warehouse area clear, it
- Creates a single source of truth for the organization
- Creates a point of confidence in data used to make decisions
- Creates a repeatable process, with a predictable outcome
- Brings together data from multiple data sources (aka: federated data)
- Allows a historical data storage mechanism for operational data, etc. with speedy retrieval
- Offloads operational system reporting which encumbers daily operations, to great improve performance and quality of information
- Takes an organization to the next level of data-driven decision making when they can focus on creating business value instead of wrangling data
Please comment and leave feedback to what you think are some additional characteristics that delineate a typical Data Mart from a Data Warehouse and we’ll seek to add them to the list.