Click here to Skip to main content
15,887,175 members
Articles / Programming Languages / SQL

Learn Datawarehouse Concepts

Rate me:
Please Sign up or sign in to vote.
4.57/5 (7 votes)
23 Jul 2014CPOL6 min read 25.5K   303   12   6
This article describes concepts and terminologies used in Datawarehouse. This will help beginners to understand datawarehouse concepts. This will also help developers who wish to design and develop SSAS cube, design data model of datawarehouse system.

Image 1

Introduction

Online transaction processing (OLTP) database contains huge transaction records. Sometimes, it has millions and billions of rows. These data are stored in relational database. When business requires analysis of these transactions, it takes huge amount of time to process these data. Business user wants to analyse these data in small time period. They don't want to have complexity in schema too. Sometimes, business user wants to analyse data from other source too. These other source data could be from different business sources. So database designer comes with a solution to implement Datawarehouse system for business data. This system is separate from OLTP database. So, analysis process becomes easier than before.

Background

Datawarehouse system consist of Fact table, Dimension table and Auxiliary table. Datawarehouse system can have multiple OLTP system as data source. It can be CRM system, ERP system, CMS system or Flat file data. Tables in Datawarehouse system are organised in specific schema. This schema can be either star schema or snowflake schema. We are going to cover each topic of datawarehouse system in the same article.

Using the Code

Image 2

SQL
CREATE TABLE SalesRegion (
                SalesRegionId INTEGER NOT NULL,
                RegionName VARCHAR(50) NOT NULL,
                CONSTRAINT SalesRegion_pk PRIMARY KEY (SalesRegionId)
);

CREATE TABLE DimSalesMan (
                SalesRegionId INTEGER NOT NULL,
                SalesManId INTEGER NOT NULL,
                SallesmanName VARCHAR(50) NOT NULL,
                CONSTRAINT DimSalesMan_pk PRIMARY KEY (SalesRegionId, SalesManId)
);

CREATE TABLE DimDate (
                DateId INTEGER NOT NULL,
                DateValue DATE NOT NULL,
                MonthValue INTEGER NOT NULL,
                YearValue INTEGER NOT NULL,
                DateQuarter INTEGER NOT NULL,
                DateHalfYear INTEGER NOT NULL,
                CONSTRAINT DimDate_pk PRIMARY KEY (DateId)
);

CREATE TABLE Currency (
                CurrencyId INTEGER NOT NULL,
                CurrencyName VARCHAR(50) NOT NULL,
                CurrencyFxRate DECIMAL(18) NOT NULL,
                CONSTRAINT Currency_pk PRIMARY KEY (CurrencyId)
);

CREATE TABLE DimProduct (
                ProductId INTEGER NOT NULL,
                CurrencyID INTEGER NOT NULL,
                ProductName VARCHAR(50) NOT NULL,
                ProductCost NUMERIC(18,2) NOT NULL,
                CONSTRAINT DimProduct_pk PRIMARY KEY (ProductId, CurrencyID)
);

CREATE TABLE FactSales (
                SalesRegionId INTEGER NOT NULL,
                SalesManId INTEGER NOT NULL,
                ProductId INTEGER NOT NULL,
                CurrencyID INTEGER NOT NULL,
                DateId INTEGER NOT NULL,
                SalesId INTEGER NOT NULL,
                SalesDescription VARCHAR(100) NOT NULL,
                CONSTRAINT FactSales_pk PRIMARY KEY _
                (SalesRegionId, SalesManId, ProductId, CurrencyID, DateId)
);

ALTER TABLE DimSalesMan ADD CONSTRAINT SalesRegion_DimSalesMan_fk
FOREIGN KEY (SalesRegionId)
REFERENCES SalesRegion (SalesRegionId)

ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimSalesMan_fk
FOREIGN KEY (SalesRegionId, SalesManId)
REFERENCES DimSalesMan (SalesRegionId, SalesManId)

ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimDate_fk
FOREIGN KEY (DateId)
REFERENCES DimDate (DateId)

ALTER TABLE DimProduct ADD CONSTRAINT Currency_DimProduct_fk
FOREIGN KEY (CurrencyID)
REFERENCES Currency (CurrencyId)

ALTER TABLE FactSales ADD CONSTRAINT FactSales_DimProduct_fk
FOREIGN KEY (ProductId, CurrencyID)
REFERENCES DimProduct (ProductId, CurrencyID)    

Fact Table

Table which contains fact information is known as fact table. This information can be measured in analysis such as Total Sales Amount, Total Order Count, Total Product Sales, etc. are different measures of business. This measure information is stored in Fact table. Normally, Fact table contains transaction of business. e.g. FactSales table is fact table. It contains business transaction.

Dimension Table

Table which contains dimension for analysis of data is known as Dimension table. It can be Date Dimension, Product Dimension, Region Dimension, Currency Dimension, etc. Analysis of Fact table are process by Dimension table. One can analyse business transaction on the basis of period wise, region wise, product wise. Data in dimension table are normally fixed nature. This data normally does not modify or modify in some condition when updated set is required in business. It may be modified in a month or in a year, e.g. DimDate, DimProduct, DimRegion are dimension tables.

Attribute of Dimension Table

Attribute of dimension table could be product name, product cost, product color, product category, product weight, etc.

Natural Key

The default key assigned to attribute is known as Natural key, e.g. ProductCode is natural key for any product.

Surrogate Key

Numeric value used as key which is used to join dimension table with fact table is known as surrogate key, e.g., ProductId is surrogate key for product dimension table. It joins with FactSales fact table to get product sales information.

Database Schema

Datawarehouse is a relational database where Fact and Dimension tables are related to each other. Before defining datawarehouse sytem, its structure must be defined. All tables in datawarehouse are structured in specific schema. It can be Star schema or Snowflake schema.

Star Schema

Database schema in which dimension tables are directly connected with fact table is known as Star Schema. It is the most popular schema used in designing datawarehouse system. It is easy to implement and relate dimension tables with fact tables.

Image 3

Snowflake Schema

Database schema in which dimension tables are connected with fact table but dimension table is itself related with another dimension table to get data is known as Snowflake Schema. Snowflake schema makes datawarehouse designing complex. If possible, we should avoid snowflake schema using view where multiple dimension tables are joined together and used as simple dimension table.

Image 4

Junk Dimension

Some values in data warehouse system are much less in number of rows. We cannot include them as dimension tables and also cannot avoid in system such as Global setting value for datawarehouse system. We can put all require global setting in separate setting table but it is not considered as dimension table. It is termed as Junk Dimension.

Degenerate Dimension

Some tables in datawarehouse system contains attribute of fact table. These attributes are used in analysis of business such as Transaction number in fact table is maintained as degenerate dimension.

Slowly Changing Dimension

It is process managing changes of data in dimension table. When data in dimension table is modified, it is required to maintain change history of data. It gives an accurate picture to business when analysis depends on different duration of time, e.g., Global business sales depends on currency rate. Each country has its different currency and the exchange rate gets updated from time to time. If some sales occurred in Jan 2014 with XXX currency exchange rate and it is modified in July 2014, the correct analysis is only possible when we can evaluate sales value exactly based on that duration exchange rate. If we calculate sales amount on current exchange rate value, then sales figure will not be accurate.

SCD type1

In SCD type1, dimension table overwrites the existing value of dimension attribute with new value. So, no change history is maintained for any changed attribute value. The dimension table will contain only the last updated value, e.g. in dimcustomer table, the customer address can be maintained as SCD type1 so that it will hold only the current address of customer.

SCD type2

In SCD type2, complete track of change history is maintained. So for every change in attribute value, insert new record in dimension table. History is tracked through date range given with record, e.g. DimCurrencyFxRate should be of type SCD type2 because currency exchange rate is valid from specific date to certain date range only.

SCD type3

In SCD type3, only last few change history is maintained. The number of changes tracking is based on business requirement. So such dimension table contains multiple columns for the same attribute, e.g., DimProduct table can contain three column names for Productname naming Productname1, Productname2, Productname3. This will help when business team asks to change the name of any product. Productname3 will contain the latest change name, so on Productname2 and Productname1 can contain previous 2 names of product.

Bridge Table or Factless Fact Table

It is an intermediate table which allows to join dimension table with another dimension table. It does not contain transaction detail, rather it contains key attribute with other columns.

Snapshot Fact Table

Table which contains aggregate of transaction of Fact table is known as snapshot fact table. When business transaction is huge, entry in fact table will be huge. This will make analysis process time consuming so that fact table transactions are aggregated and stored in separate snapshot table.

Transaction Fact Table

Table which contains complete details of transaction is known as transaction fact table. Attribute in transaction fact table contains each details of transaction, e.g., transaction number, productid, transaction date, transaction user, etc.

Points of Interest

This article contains information on terminologies used in datawarehouse. This will help user to understand concepts before designing new datawarehouse system.

History

  • 22nd July, 2014: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Database Developer Microsoft
India India
I am Ajit Kumar Thakur. I am Consultant in Microsoft Global Delivery India. I work on Database (SQL Server), BI (SSAS, SSRS,SSIS),and Windows Powershell. I have 8 years of experience in Database and BI project development.

Comments and Discussions

 
GeneralMy vote of 3 Pin
IdeasPravinh24-Jul-14 2:53
IdeasPravinh24-Jul-14 2:53 
QuestionGood overview Pin
Rarelyamson23-Jul-14 23:41
Rarelyamson23-Jul-14 23:41 
QuestionWhat is the USE of OLAP in datawharehouse Pin
Member 1096784323-Jul-14 21:10
Member 1096784323-Jul-14 21:10 
AnswerRe: What is the USE of OLAP in datawharehouse Pin
Ajit Kumar Thakur Microsoft23-Jul-14 22:51
professionalAjit Kumar Thakur Microsoft23-Jul-14 22:51 
GeneralNice article Pin
Abhishek Singh223-Jul-14 3:30
Abhishek Singh223-Jul-14 3:30 
GeneralRe: Nice article Pin
Ajit Kumar Thakur Microsoft23-Jul-14 6:46
professionalAjit Kumar Thakur Microsoft23-Jul-14 6:46 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.