The Percentage Sampling transformation is loosely similar to the TOP keyword in SQL Server (specifically TOP with PERCENT). Just like TOP, Percentage Sampling limits the records that flow through the pipeline, using the integer you provide as a percentage. The difference is that it picks rows by random sampling instead of taking the first rows, so the resulting row count is approximate. Let’s say I have 1000 records in my source table: if I connect that source to a Percentage Sampling transformation and set the percentage of rows to 10, then roughly 10% of the total records (about 100) will flow out of the transformation as the sample. Let us see an example of the same.
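Before the SSIS walk-through, here is a minimal Python sketch of the difference between a TOP-style limit and a sampling-style limit. It is only an illustration, not how SSIS is implemented: the function names `top_percent` and `sample_percent` are made up for this post, and the per-row random draw is an assumption about how a percentage sample can be produced.

```python
import random


def top_percent(rows, percentage):
    """TOP-style limit: keep the first N% of rows, in input order."""
    count = len(rows) * percentage // 100
    return rows[:count]


def sample_percent(rows, percentage, seed=None):
    """Sampling-style limit: keep each row with probability percentage/100.

    Assumption for illustration only; SSIS does not document its exact algorithm here.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percentage / 100]


rows = list(range(1, 1001))  # stand-in for 1000 source records

first_rows = top_percent(rows, 10)
sampled_rows = sample_percent(rows, 10, seed=42)

print(len(first_rows), first_rows[-1])  # 100 100: exactly the first 100 rows
print(len(sampled_rows))                # close to 100, but not guaranteed to be exactly 100
```

The TOP-style function always returns the same leading rows, while the sampling-style function returns rows spread across the whole input and a count that only approximates the requested percentage.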
- Open a new project and drag a Data Flow task from the toolbox onto the Control Flow.
- Edit the Data Flow task by double-clicking the object, or by right-clicking it and selecting Edit.
- Make sure the Data Flow page is open, as shown below.
- Select the OLE DB source from the data flow sources and drag and drop it onto the data flow.
- Double-click the OLE DB source to open a new window where we can set the properties of the connection.
- Select the connection manager and click the New button to set the connection string, as shown below.
- Set the connection to the database by providing the server name, database name and authentication details if required.
- After the connection is set, select the data access mode “Table or view” as shown below, and then select the table we are going to use as input to the Percentage Sampling transformation.
- Now select the columns that need to be part of the source by going to the Columns page of the OLE DB source, as shown below.
- Now drag and drop the Percentage Sampling transformation and connect the OLE DB source output to it as input, as shown below.
- Now edit the Percentage Sampling transformation and enter, in “Percentage of rows”, the percentage of the total records in the source table that you want to use as the sample.
- Give meaningful names to the sample output and the unselected output. The rows are picked at random either way; select the “Use the following random seed” option and enter a seed only if you want the same sample to be produced every time the package runs.
- These are all the properties we can set for the Percentage Sampling transformation. Now let’s create a couple of destinations to store the sampled output and the non-sampled output. I have used an OLE DB destination for the sampled output and a Flat File destination for the non-sampled output.
- Now drag the output of the Percentage Sampling transformation to the OLE DB destination. It will prompt us to choose which output to connect (we have two, one sampled and one not sampled); select the sampled output as shown below.
- Connect the non-sampled output to the Flat File destination and set the connection settings for both the OLE DB and Flat File destinations. (You can see configuring destinations in the post here)
- The package is now ready, so execute it. Make sure all the items turn green.
- You can observe that the records from the source are split into two different outputs based on the percentage we have given.
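The split produced by the package above can be imitated outside SSIS with a short Python sketch. This is a hedged illustration under stated assumptions, not the SSIS implementation: `percentage_sampling` is a hypothetical helper, the rows are made-up dictionaries, and each row is assumed to be routed by an independent random draw.

```python
import random


def percentage_sampling(rows, percentage, seed=None):
    """Route every row to either the sampled or the unselected output.

    Hypothetical helper: each row goes to the sample with probability
    percentage/100, and a fixed seed makes the split repeatable.
    """
    rng = random.Random(seed)
    sampled, unselected = [], []
    for row in rows:
        if rng.random() < percentage / 100:
            sampled.append(row)
        else:
            unselected.append(row)
    return sampled, unselected


source = [{"id": i} for i in range(1, 1001)]  # stand-in for the source table

sampled, unselected = percentage_sampling(source, 10, seed=7)
print(len(sampled) + len(unselected))  # 1000: every row lands in exactly one output

# Running again with the same seed reproduces the same sample.
sampled_again, _ = percentage_sampling(source, 10, seed=7)
print(sampled == sampled_again)  # True
```

Here `sampled` plays the role of the output sent to the OLE DB destination and `unselected` the output sent to the Flat File destination. Leaving `seed` as `None` gives a different split on each run.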
This is it!! This is one of the simplest transformations (to configure) available in SSIS, and it is useful whenever you wish to limit the records flowing to the destination.
Happy Coding !!
Roopesh Babu V