Step by Step Design of Dimensional Model

1. First identify the business process.

In this step you will determine what is your business process that your data warehouse represents. This process will be the source of your metrics or measurements.

2. Identify the Grain

You will determine what does one row of fact table mean. In the previous example you have decided that your grain is ‘monthly sales per location per product’.

3. Identify the Dimensions

Your dimensions should be descriptive (SQL VARCHAR or CHARACTER) as much as possible and confirm to your grain.

4. Finally Identify the facts

In this step you will identify what are your measurements (or metrics or facts). The facts should be numeric and should confirm to the grain defines in step 2.

Mapping of Reports to Star Schema

How does this dimensional model map into an actual report?

Now if you look at a report  generated from your  dimensional modeled data warehouse, it will look like this:

OLAP Report

You can tell from the above report that all your dimensions are the row and column headings in the report.  The facts are the numeric numbers in the report.

Advantages of Star Schema

  • Star Schema is very easy to understand, even for non technical business managers
  • Star Schema provides better performance and smaller query times
  • Star schema is easily extensible and will handle future changes easily

Star Schema of OLAP

Star Schema Designing – Examples

What Is A Star Schema? If you carefully look at our new dimensional modeled schema, it will look like this:

OLAP Star Schema

You can easily tell this looks like a STAR. Hence it is also known as Star Schema.

Advantages of Star Schema

  • Star Schema is very easy to understand, even for non technical business managers
  • Star Schema provides better performance and smaller query times
  • Star schema is easily extensible and will handle future changes easily

A Typical SQL Query Template for the Sales Schema will look like:

--- Select the measurements that you want to aggregate using SUM clause
 SELECT P.Name, SUM(F.Sales)
 --- JOIN the FACT table with Dimension Tables
 FROM Sales F, Time T, Product P, Location L                          
 WHERE F.TM_Dim_Id = T.Dim_Id 
 AND F.PR_Dim_Id = P.Dim_Id 
 AND  F.LOC_Dim_Id = L.Dim_Id
--- Constrains the Dimension Attributes
 AND  T.Month='Jan' AND T.Year='2003' AND L.Country_Name='USA'
-- finally the 'group by' clause identifies the  aggregation level. In this example you are aggregating
 -- all sales within a product category.
GROUP BY P.Category

Fact Table Schema

Fact table contains the actual business process measurements or metrics called facts. Usually these facts are numeric.

These facts  are generally Additive.

Some times the facts are semi additive such as balances

Some times they are non additive such as unit price

In the above example where you are building a data warehouse of monthly sales in dollars, your fact table will contain the actual sales numbers, one row per month. In addition to the data itself, you will have the foreign keys for the various dimensions in this row.

Granularity or Grain of Fact Table

 The level of detail of the fact table is known as the grain of the fact table. In this example the grain of your fact table is monthly sales  number per location per product. Your Fact Table will look like this

MonthlySales Fact Table Schema

Field Name Type
TM_Dim_Id INTEGER (4)
PR_ Dim_Id INTEGER (4)
LOC_ Dim_Id INTEGER (4)
Sales INTEGER (4)

In this table the combination of all three dimension table foreign keys make up the primary key in the fact table. (TM_Dim_Id, PR_ Dim_Id, LOC_ Dim_Id) is our primary key.  This is the unique key into our Sales fact table.

The actual data in your MonthlySales Table will look like this:

MonthlySales Fact Table Data

TM_Dim_Id PR_ Dim_Id LOC_ Dim_Id Sales
1001 1001 1003 435677
1002 1002 1001 451121
1003 1001 1003 98765
1001 1004 1001 6543

A fact table may contain one or more Facts. Usually you create one fact table per business process or event. For example if you want to analyze the sales numbers and also advertising spending, they are two separate business processes. So you will create two separate fact tables, one for sales data and one for advertising cost data. On the other hand if you want to track the sales tax in addition to the sales number, you simply create one more fact column in the Sales fact table called Tax.

Product Dimension Table Schema

After de-normalization, your Product table will look like this:
 

Product Dimension Table Schema

Field Name Type
Dim_Id INTEGER (4)
SKU VARCHAR (10)
Name VARCHAR (30)
Category VARCHAR (30)

In this table PR_Dim_Id is our dimension Id. This is the unique key into our Product dimension table. The actual data in your Product Table may look like this:

Product Dimension Table Data

Dim_Id SKU Name Category
1001 DOVE6K Dove SOAP 6Pk Sanitary
1002 MLK66F# Skim Milk 1 Gal Dairy
1003 SMKSAL55 Smoked Salmon 6oz Meat

Design of the Time dimension table in the dimensional modeling

After de-normalization, your Time table will look like this:

Time Dimension Table Schema

Field Name Type
Dim_Id INTEGER (4)
Month SMALL INTEGER (2)
Month_Name VARCHAR (3)
Quarter SMALL INTEGER (4)
Quarter_Name VARCHAR (2)
Year SMALL INTEGER (2)

The actual data in your Time Table may will look like this:

Time Dimension Data

TM _Dim_Id TM _Month TM _Month_Name TM _Quarter TM _Quarter_Name TM_Year
1001 1 Jan 1 Q1 2003
1002 2 Feb 1 Q1 2003
1003 3 Mar 1 Q1 2003
1004 4 Apr 2 Q2 2003
1005 5 May 2 Q2 2003

What is a Dimension Table in the Dimensional Modeling?

Dimension Tables – Key elements of a Dimension Table

Dimensional modeling allows only one table per dimension. But your OLTP data spans across multiple tables as described.  So we need de-normalize the OLTPschema and export into your Dimension Tables. For example, for the location dimension, you achieve this by joining the three OLTP tables and inserting the data into the single Location table.

Your Location Table will look like this:

Location Dimension Table Schema

Field Name Type
Dim_Id INTEGER (4)
Loc_Code VARCHAR (4)
Name VARCHAR (50)
State_Name VARCHAR (20)
Country_Name VARCHAR (20)

All Dimension tables contain a key column called the dimension key. In this example Dim_Id is our dimension Id. This is the unique key into our Location dimension table.

The actual data in your Location Table may look like this:

Location Dimension Table Data

Dim_Id Loc_Code Name State_Name Country_Name
1001 IL01 Chicago Loop Illinois USA
1002 IL02 Arlington Hts Illinois USA
1003 NY01 Brooklyn New York USA
1004 TO01 Toronto Ontario Canada
1005 MX01 Mexico City Distrito Federal Mexico

You may notice that some of the information is repeated in the above dimension table. The State Name and Country Name are repeated through out the table. You may feel that this is waste of data space and against the normalization principles. But in dimensional modeling this type of design makes the querying very optimized and reduces the query times. Also we will learn later that in a typical data warehouse, the dimension tables make up only 10 to 15 % of the storage as the fact table is by far the largest table and takes up the rest of the storage allocation.

What is Online Transaction Processing (OLTP) Schema?

Online Transaction Processing (OLTP) Schema

In Online Transaction Processing (OLTP), the database is designed to achieve efficient transactions such as INSERT and UPDATE. This is very different from the OLAP design. Unlike OLAP, normalization is very important to reduce duplicates and also cut down on the size of the data. our OLTP schema may look like this

Locations Table

 
Field Name Type
Loc_Id INTEGER (4)
Loc_Code VARCHAR (5)
Loc_Name VARCHAR (30)
State_Id INTEGER (4)
Country_Id INTEGER (4)

States Table

Field Name Type
Sate_Id INTEGER (4)
State_Name VARCHAR (50)

Countries Table

Field Name Type
Country_Id INTEGER (4)
Country_Name VARCHAR (50)

OLTP Schema

In order to query for all locations that are in country ‘USA’ we will have to join these three tables. The SQL will look like:

SELECT * FROM Locations, States, Countries where  Locations.State_Id = States.State_Id AND  Locations.Country_id=Countries.Country_Id and Country_Name='USA'

Dimensional Modeling tutorial – OLAP, data warehouse design

In this dimensional modeling tutorial, we intend to teach people with basic SQL and relational database design skills. In this tutorial we show you the dimensional modeling techniques developed by the legendary Ralph Kimball of the Kimball Group.

Dimensional Modeling – Definition

Many data warehouse designers use Dimensional modeling design concepts to build data warehouses. Dimensional model is the underlying data model used by many of the commercial OLAP products available today in the market. In this dimensional model, we store all data in just two types of tables. They are Fact Tables and Dimension Tables. The Fact table contains the main facts or measures. Fact table links to many dimension tables thru foreign keys. We call this resulting schema as star schema because it looks like a star. Because of these multiple dimension tables, all connecting to single fact table, this design concept is named dimensional modeling.

Dimensional Modeling

Fig 1: Dimensional Modeling Schema, resembles a Star and hence called Star Schema

Dimensional Modeling – Fact Table

In a Dimensional Model, Fact table contains the measurements or metrics or facts of your business processes. If your business process is Sales, then a  measurement of this business process such as “monthly sales number” is captured in the fact table. In addition to the measurements, the only other things a fact table contains are foreign keys for the dimension tables.

Dimensional Modeling – Dimension Table

In a Dimensional Model, context of the measurements are represented in dimension tables. You can also think of the context of a measurement as the characteristics such as who, what, where, when, how  of a measurement (subject ). In your business process Sales, the characteristics of the ‘monthly sales number’ measurement can be a Location (Where), Time (When), Product Sold (What).

The Dimension Attributes are the various columns in a dimension table. In the Location dimension, the attributes can be Location Code, State, Country, Zip code. Generally the Dimension Attributes are used in report labels, and query constraints such as where Country=’USA’. The dimension attributes also contain one or more hierarchical relationships. Before designing your data warehouse, you need to decide what this data warehouse contains. Say if you want to build a data warehouse containing monthly sales numbers across multiple store locations, across time and across products then your dimensions are:

  1. Location
  2. Time
  3. Product

Each dimension table contains data for one dimension. In the above example you get all your store location information and put that into one single table called Location. Your store location data may be spanning across multiple tables in your OLTP system (unlike OLAP). You need to de-normalize all that data into one single dimension table.

Copyright 2005-2016 KnowledgeHills. Privacy Policy. Contact .