In last post we discussed about “Implementing Type 1 SCD in SSIS using Slowly Changing Dimension Transformation” and u can find the same here. Let us discuss about “How to define Type 2 SCD in SSIS using Slowly Changing Dimension Transformation” in this post. As most of us know that there are many types of SCDS available, here in this post we will cover only SCD Type 2. Before jumping into the demonstration, first let us know what this SCD Type 2 says –
In Type 2 SCD, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
Now, when the Country is changed to the Employee “Roopesh” from India to USA then the data will be stored in the following way –
The null End_Date in row two indicates the current tuple version. In some cases, a standardized surrogate high date (e.g. 9999-12-31) may be used as an end date, so that the field can be included in an index, and so that null-value substitution is not required when querying. So, in this case we can say the records with NULL as End_Date is the current record for an employee.
The major disadvantage of using this is the table SIZE grows faster as records will get added whenever there is change in data to maintain history and the advantage of using this Type is that we can track historical information accurately. I hope you got some useful info regarding SCD type 2 and now let’s jump into SCD Transformation.
I have created two tables one Test_Stage(Which I am using as Staging table) and another one is Test(Which I am using as Main table) with the following syntax. You can clearly see that Main table is having TWO Extra columns to track history.
Create table Test_Stage
Create table Test
Now I inserted some records into my staging table i.e Test_Stage and PFB the scripts used –
Insert into test_stage Values(1,’Roopesh’,’India’);
Insert into test_stage Values(2,’Lokesh’,’Pakistan’);
Insert into test_stage Values(3,’Vinay’,’USA’);
Insert into test_stage Values(4,’Rakesh’,’China’);
Insert into test_stage Values(5,’Venki’,’Japan’);
Now Source data is ready and PFB the steps you have to follow to use Slowly Changing Dimension Transformation to implement Type 2 SCD.
- Open SSIS Package and drag a dataFlow Task from toolbox to control Flow Pane as shown below.
- Either double click or Right click on Data Flow Task and select EDIT as shown below.
- One the Data Flow pane is opened then drag and drop OLEDB SOURCE from “Data Flow Sources” as shown below.
- Now select EDIT by Right clicking on OLE DB Source and provide the Source details. remember that you have to provide Staging table connection details here.
- Select columns tab and check whether columns are coming correctly or not.
- Now drag and drop Slowly Changing Dimension Transformation in to the Data Flow Pane and provide the connection between OLE DB Source and SCD Transformation.
- Now right click on the SCD Transformation and click on EDIT menu.
- Select the dimension table(in our case it is “Test”) and Keys under “Select Dimension Table and Keys” page as shown below.
- In the above screenshot, the Business Key is the column based on which we will filter the Newly added records from the existing one. In simple this is the column which we use to look up.
- After selecting Next, Under “Slowly Changing Dimension Columns” select the columns that MAY change and select Change Type as “Changing Attribute” for those which we DOESN’T required to maintain HISTORY and select “Historical Attribute” for those whose HISTORY needs to be tracked. This is very important part of SCD and in other words this is the ONLY change we have when compared to SCD Type 1 and SCD Type 2 implementation in SSIS. When we select “Changing Attribute” for any attribute then it WON’T CREATE a new record when there is a change in this value and if you select “Historical Attribute” then if there is any change in this attribute value then a NEW RECORD will be created to track history.
- In the above screenshot we can see that for Country attribute I selected “Historical Attribute” as I wish to track the historical changes for that attribute and hence when ever there is a change in that particular attribute column value then only it tracks history by creating a new record and for other attribute Emp_Name, it will directly apply UPDATE on the table.
- Check the option “Change all the matching records including the outdated records…….” if you wish to have that functionality and click Next.
- Under “Historical Attributes Option” page select “Use Start and End date to identify current and expired records” option and by selecting start date column and End date column click on NEXT.
- Click Next and finally click on Finish by leaving the default options as is.
- Once you click on Finish, SSIS server creates all the functionality that is required to implement SCD Type 1 based on the information provided by us. It will add Data flow tasks for Inserting new records and updating existing records as shown below.
- Now if I run the package then it will check for newly added records and inserts those records and updates the already existing records if there is any change in data. As there is NO RECORDS in TEST table in this run it will load all the records under New Insert as shown below.
- PFB the Screenshot of the data tables output after running the package. You can see data got loaded in to our table “TEST” with Start_Date as current Date.
- Now I updated the existing records in staging table using the following queries and once I run the package again we can see data getting updated into our main table.
- Once I run the package you can see the package showing “1 rows” under “Historical Attributes Inserts Output” section as shown below.
- Now if you run the queries you can see the data updated in TEST table.
Roopesh Babu V