Change Data Capture in SQL Server by Srikanth Manda



Before Microsoft introduced Change Data Capture (CDC) in SQL Server, developers built custom solutions using DML triggers and additional tables (audit tables) to track the data that had been modified. DML triggers are expensive because they execute as part of the same transaction as the change, which degrades the performance of the application and the server. To store the tracked changes, we also need to create additional tables with columns similar to those of the source table. This approach has several drawbacks:

1) It takes time to develop the DML triggers and additional tables.
2) It causes a performance hit.
3) It is a very complex process.

How do we find out which records are being inserted, updated and deleted in one or more SQL Server tables? Microsoft has come up with a feature called Change Data Capture for this purpose. We will focus on how to implement change data capture and how to review the captured information to produce an audit trail of the changes to a database table.
When you enable Change Data Capture on a database table, a shadow of the tracked table (the change table) is created with the same column structure as the existing table, plus a few additional columns that summarize the nature of the change to each row.
Once you enable change data capture, a process is automatically created and scheduled to collect and manage the information. By default, change data capture information is kept for only 3 days.
Enabling Change Data Capture on a Database
Change Data Capture is a table-level feature: it has to be enabled on each table whose changes we want to track. Before enabling it on a table, we need to enable Change Data Capture on the database.
To check whether Change Data Capture is enabled on a database, run the script below.

select name,database_id,is_cdc_enabled from sys.databases

Run the following script to enable CDC at the database level; it enables CDC in the ChangeDataCapture database.
USE ChangeDataCapture
EXEC sys.sp_cdc_enable_db

Check again whether CDC is enabled on the ChangeDataCapture database; the is_cdc_enabled column should now be 1 for it.
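As a small sketch of that check, the query below filters sys.databases down to the one database used in this article. It only assumes that the database is named ChangeDataCapture, as in the scripts above.

```sql
-- Show the CDC flag for the ChangeDataCapture database only
SELECT name, database_id, is_cdc_enabled
FROM sys.databases
WHERE name = N'ChangeDataCapture';
```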

Once CDC is enabled on the database, several system tables are created in the database as part of the cdc schema.

The tables that are created are listed here.
• cdc.captured_columns – This table contains one row for each captured column.
• cdc.change_tables – This table lists all the tables which are enabled for capture.
• cdc.ddl_history – This table contains the history of all DDL changes made to tracked tables since capture was enabled.
• cdc.index_columns – This table contains the index columns associated with each change table.
• cdc.lsn_time_mapping – This table maps log sequence numbers (LSNs) to time.
Additionally, you will see that the cdc schema has been created in the ChangeDataCapture database.
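One way to confirm that these objects exist is to list the tables in the cdc schema. This is a minimal sketch that assumes CDC has already been enabled on the ChangeDataCapture database; the exact set of rows returned can differ between SQL Server versions.

```sql
USE ChangeDataCapture;

-- List every table that lives in the cdc schema
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE s.name = N'cdc'
ORDER BY t.name;
```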

Creating a table:
USE ChangeDataCapture

Create Table dbo.Employee
(
EmpId BigInt Primary Key,
EmpName Varchar(50),
EmpSal Decimal(18,2),
EmpDeptNo Int
)

use ChangeDataCapture
insert into dbo.employee values(1,'sreekanth',1000,10)
insert into dbo.employee values(2,'sagar',2000,20)
insert into dbo.employee values(3,'bala',3000,30)
insert into dbo.employee values(4,'rama',4000,10)
insert into dbo.employee values(5,'sudhakar',5000,20)
insert into dbo.employee values(6,'ramana',6000,30)
insert into dbo.employee values(7,'ravi',7000,10)
insert into dbo.employee values(8,'satyadev',8000,20)
insert into dbo.employee values(9,'venkat',9000,30)
insert into dbo.employee values(10,'prashanth',10000,10)

USE ChangeDataCapture
select * from dbo.Employee

Enabling Change Data Capture on one or more Database Tables:
Once CDC is enabled for the database, it can be enabled at the table level. It has to be enabled for every table that needs to be tracked. First run the following query to show which tables of the database have already been enabled for CDC.
Check whether CDC is enabled on the Employee table

USE ChangeDataCapture
Select name,object_id,is_tracked_by_cdc from Sys.tables

In the result, is_tracked_by_cdc is 0 for dbo.Employee, so CDC is not yet enabled on the table.
To Enable CDC on the Table
You can run the sys.sp_cdc_enable_table stored procedure to enable each table. Before enabling CDC at the table level, make sure the SQL Server Agent service is running. When CDC is enabled on a table, two CDC-related jobs specific to the database are created, and they are executed by SQL Server Agent. Without SQL Server Agent running, these jobs will not execute.
Additionally, it is very important to understand the role of the required parameter @role_name. @role_name is a database role which is used to determine whether a user can access the CDC data; the role will be created if it doesn’t exist, and passing NULL means that no role is used to limit access. You can add users to this role as required; you only need to add users that aren’t already members of the db_owner fixed database role.
Run the script below to enable CDC on the table dbo.Employee.
USE ChangeDataCapture
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL

In our case, enabling CDC on the table raised an error stating:
SQL Server Agent is not currently running.

First, we need to start SQL Server Agent. Then we need to enable CDC on the table again.
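Before re-running the enable script, it can help to confirm that the Agent service is actually running. The sketch below uses the sys.dm_server_services dynamic management view; it assumes a SQL Server version that provides this view and a login with VIEW SERVER STATE permission.

```sql
-- Check the current state of the SQL Server Agent service
SELECT servicename, status_desc, startup_type_desc
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server Agent%';
```

If status_desc is not Running, start the service (for example from SQL Server Configuration Manager) and then run the enable script again.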

Run the following script to enable CDC on the table dbo.Employee.
USE ChangeDataCapture
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@role_name = NULL

The sys.sp_cdc_enable_table system stored procedure takes several parameters. Let’s describe each one (only the first three parameters are required; the rest are optional and only the ones used are shown above):
• @source_schema is the schema name of the table that you want to enable for CDC.
• @source_name is the table name that you want to enable for CDC.
• @role_name is a database role which will be used to determine whether a user can access the CDC data; the role will be created if it doesn’t exist. You can add users to this role as required; you only need to add users that aren’t already members of the db_owner fixed database role.
• @supports_net_changes determines whether you can summarize multiple changes into a single change record; set to 1 to allow, 0 otherwise.
• @capture_instance is a name that you assign to this particular CDC instance; you can have up to two instances for a given table.
• @index_name is the name of a unique index to use to identify rows in the source table; you can specify NULL if the source table has a primary key.
• @captured_column_list is a comma-separated list of column names that you want to enable for CDC; you can specify NULL to enable all columns.
• @filegroup_name allows you to specify the FILEGROUP to be used to store the CDC change tables.
• @allow_partition_switch allows you to specify whether the ALTER TABLE ... SWITCH PARTITION command is allowed on the tracked table (1 to allow, 0 otherwise).
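To illustrate the optional parameters, here is a hedged sketch of a call that tracks only some columns under its own capture instance. The role name cdc_reader and the instance name dbo_Employee_Salary are hypothetical names chosen for this example; they do not appear elsewhere in this article, and running this in addition to the default script would create a second capture instance on the table.

```sql
USE ChangeDataCapture;

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'Employee',
    @role_name            = N'cdc_reader',           -- hypothetical gating role
    @capture_instance     = N'dbo_Employee_Salary',  -- hypothetical instance name
    @supports_net_changes = 1,                       -- needs a primary key or unique index
    @captured_column_list = N'EmpId,EmpSal';         -- key column plus the column to audit
```

The key column EmpId is kept in the column list because net change queries need the columns that identify a row.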

Once we enable Change Data Capture on the table, it creates SQL Server Agent jobs with the following names.
1. cdc.ChangeDataCapture_capture – When this job is executed it runs the system stored procedure sys.sp_MScdc_capture_job, which internally calls the procedure sys.sp_cdc_scan. This procedure cannot be executed explicitly when a change data capture log scan operation is already active or when the database is enabled for transactional replication. The capture job scans the transaction log and writes the changes it finds to the change tables; this is the process that actually populates the CDC data.
2. cdc.ChangeDataCapture_cleanup – When this job is executed it runs the system stored procedure sys.sp_MScdc_cleanup_job. This system SP removes rows from the change tables once they are older than the retention period.
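The two jobs and their settings can be inspected with the sys.sp_cdc_help_jobs system procedure. This sketch assumes CDC is enabled on the ChangeDataCapture database and that at least one table has been enabled, so that the jobs exist.

```sql
USE ChangeDataCapture;

-- Returns one row per CDC job (capture and cleanup),
-- including the polling interval and the retention in minutes
EXEC sys.sp_cdc_help_jobs;
```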

When everything is successfully completed, check the system tables again and you will find a new table called cdc.dbo_Employee_CT. This table will contain all the changes made to the table dbo.Employee. If you expand cdc.dbo_Employee_CT, you will find that, in addition to the columns mirrored from the original table, it has five additional columns:
• __$start_lsn
• __$end_lsn
• __$seqval
• __$operation
• __$update_mask
Two of these columns are very important to us: __$operation and __$update_mask.
The column __$operation contains a value which corresponds to the DML operation. The following is a quick list of the values and their meanings.
• __$operation = 1, i.e. Delete
• __$operation = 2, i.e. Insert
• __$operation = 3, i.e. Values before Update
• __$operation = 4, i.e. Values after Update
The column __$update_mask shows, via a bitmap, which columns were updated in the DML operation that was specified by __$operation. Each bit in the mask corresponds to one captured column. For a DELETE or INSERT operation, all columns are treated as updated, so every bit in the mask is set to 1; for an UPDATE, only the bits of the columns that changed are set.
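The raw values in these columns are not very readable, so the following sketch decodes them. It assumes the default capture instance name dbo_Employee and uses the built-in functions sys.fn_cdc_get_column_ordinal, sys.fn_cdc_is_bit_set and sys.fn_cdc_map_lsn_to_time; the column aliases are only illustrative.

```sql
USE ChangeDataCapture;

-- Position of EmpName within the dbo_Employee capture instance
DECLARE @EmpNameOrdinal int =
    sys.fn_cdc_get_column_ordinal(N'dbo_Employee', N'EmpName');

SELECT
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) AS ChangeTime,
    CASE __$operation
        WHEN 1 THEN 'Delete'
        WHEN 2 THEN 'Insert'
        WHEN 3 THEN 'Update (before)'
        WHEN 4 THEN 'Update (after)'
    END AS Operation,
    -- 1 when the EmpName bit is set in the update mask, 0 otherwise
    sys.fn_cdc_is_bit_set(@EmpNameOrdinal, __$update_mask) AS EmpNameChanged,
    EmpId, EmpName, EmpSal, EmpDeptNo
FROM cdc.dbo_Employee_CT
ORDER BY __$start_lsn, __$seqval;
```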
Example of Change Data Capture
We will test this feature by doing DML operations such as INSERT, UPDATE and DELETE on the table dbo.Employee which we have set up for CDC. We will observe the effects on the CDC table cdc.dbo_Employee_CT.
Before we start let’s first SELECT from both tables and see what is in them.
USE ChangeDataCapture
select * from [dbo].[Employee]

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]

Insert Statement:
Let us execute Insert Operation on the dbo.Employee Table
USE ChangeDataCapture

insert into [dbo].[Employee] values (11,'Krishnaveni',11000,20)
insert into [dbo].[Employee] values (12,'Mahathi',12000,30)
insert into [dbo].[Employee] values (13,'Suma',13000,10)
insert into [dbo].[Employee] values (14,'Jabeen',14000,20)
insert into [dbo].[Employee] values (15,'Ambily',15000,30)

Once the Insert Script is executed, let us query both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]

Because of the INSERT operation, we have five newly inserted rows in the tracked table dbo.Employee. The tracking table cdc.dbo_Employee_CT also shows the same five rows. The value of __$operation is 2, which means that this is an INSERT operation.

Update Statement:
In the Update Operation, we will update a newly inserted row.
USE ChangeDataCapture

Update dbo.Employee
Set EmpName = 'Sumala Yeluri'
Where EmpId = 13

After executing the above script, let us query content of both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]

Executing the UPDATE script results in two different entries in the cdc.dbo_Employee_CT tracking table. The first entry (__$operation = 3) contains the values before the UPDATE was executed. The second entry (__$operation = 4) contains the new data after the UPDATE was executed. The Change Data Capture mechanism always captures all the columns of the table unless it is restricted to track only a few columns.
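Instead of reading the change table directly, the before and after images of an update can also be fetched through the query function that CDC generates for the capture instance. The sketch below assumes the default capture instance dbo_Employee, so the function is named cdc.fn_cdc_get_all_changes_dbo_Employee; the variable names are illustrative.

```sql
USE ChangeDataCapture;

-- Cover the whole range of changes currently available for this instance
DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Employee');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- 'all update old' returns both the before (3) and after (4) rows of each update
SELECT __$start_lsn, __$operation, EmpId, EmpName, EmpSal, EmpDeptNo
FROM cdc.fn_cdc_get_all_changes_dbo_Employee(@from_lsn, @to_lsn, N'all update old')
WHERE __$operation IN (3, 4)
ORDER BY __$start_lsn, __$seqval;
```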
Delete Statement:
In this Delete Operation Scenario, we will run a DELETE operation on a newly inserted row.
USE ChangeDataCapture

Delete from dbo.Employee
Where EmpId = 15

Once again, let us check the content of both the tables
USE ChangeDataCapture
select * from [dbo].[Employee]

USE ChangeDataCapture
select * from [cdc].[dbo_Employee_CT]

Due to the DELETE operation, one row was deleted from the table dbo.Employee. We can see the deleted row in the tracking table cdc.dbo_Employee_CT as a new record. The value of __$operation is 1, meaning that this is a delete operation.
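After several inserts, updates and deletes, it is often more useful to see one summarized row per changed record than every individual change. A minimal sketch, assuming the table was enabled with net change support (the default when the table has a primary key) under the default capture instance dbo_Employee:

```sql
USE ChangeDataCapture;

DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn(N'dbo_Employee');
DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();

-- Summarizes the changes in the LSN range into net changes per source row
SELECT __$operation, EmpId, EmpName, EmpSal, EmpDeptNo
FROM cdc.fn_cdc_get_net_changes_dbo_Employee(@from_lsn, @to_lsn, N'all');
```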

Disabling CDC on a table:
Just as enabling CDC takes two steps, one at the database level and one at the table level, disabling it can also be done at two levels.
Let’s see them one after the other.
In order to disable Change Data Capture on a table we need three values: the source schema, the source table name, and the capture instance. In our case, the schema is dbo and the table name is Employee; however, we don’t know the capture instance. To find the capture instance, run the following script.
USE ChangeDataCapture;
EXEC sys.sp_cdc_help_change_data_capture
This returns a result which contains all three pieces of information required for disabling CDC on a table.

The system procedure sys.sp_cdc_help_change_data_capture provides lots of other useful information as well. Once we have the name of the capture instance, we can disable tracking of the table by running this T-SQL query.
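If the database tracks many tables, the lookup can be narrowed to a single table. The sketch below shows two options: passing the schema and table name to sys.sp_cdc_help_change_data_capture, or reading the capture instance names straight from cdc.change_tables. The column aliases are illustrative.

```sql
USE ChangeDataCapture;

-- Option 1: limit the help procedure to one source table
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'Employee';

-- Option 2: list capture instances with their source tables
SELECT capture_instance,
       OBJECT_SCHEMA_NAME(source_object_id) AS source_schema,
       OBJECT_NAME(source_object_id)        AS source_table
FROM cdc.change_tables;
```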

USE ChangeDataCapture;
EXECUTE sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'Employee',
@capture_instance = N'dbo_Employee';

Once Change Data Capture is disabled on a table, the change table, the associated functions and the related data in the CDC system tables are dropped.
If we check the system tables again, we can see that the capture table cdc.dbo_Employee_CT has been dropped.

Disable CDC on Database:
Run the following script to disable CDC on the whole database.
USE ChangeDataCapture
EXEC sys.sp_cdc_disable_db

The above stored procedure deletes all the data, system functions and tables related to CDC. If this data is needed for any other purpose, you must take a backup before disabling CDC on the database.
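One simple way to keep the captured history is to copy the change table into an ordinary table before disabling CDC. This is only a sketch: the target name dbo.Employee_CT_Archive is a hypothetical name chosen for this example, and the statement has to run while cdc.dbo_Employee_CT still exists.

```sql
USE ChangeDataCapture;

-- Copy all captured rows into a regular table that survives disabling CDC
SELECT *
INTO dbo.Employee_CT_Archive   -- hypothetical archive table name
FROM cdc.dbo_Employee_CT;
```

A full database backup remains the safer option when the CDC metadata itself also needs to be preserved.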

Automatic Cleaning Process:
If we keep tracking changes in the database indefinitely, the change tables grow continuously and consume a large amount of disk space on the server. This would lead to maintenance issues and I/O issues.
CDC has an automatic cleanup mechanism that runs at regular intervals on a schedule. By default, it is configured to keep 3 days of data. When CDC is enabled on the database, the system procedure sys.sp_cdc_cleanup_change_table is also available; it takes care of cleaning up the tracked data and can be called directly when we want to manage the cleanup ourselves.
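If 3 days is too short, the retention period of the cleanup job can be changed with sys.sp_cdc_change_job. The sketch below sets it to 7 days; the value is only an example, and retention is expressed in minutes.

```sql
USE ChangeDataCapture;

-- Keep change data for 7 days (7 * 24 * 60 = 10080 minutes)
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080;

-- Verify the new setting
EXEC sys.sp_cdc_help_jobs;
```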

Hope this helps !!

Best Regards,
Srikanth Manda


