为 RDI 准备 SQL Server
准备 SQL Server 数据库以使用 RDI
要为 Debezium 准备 SQL Server 数据库,您必须首先创建一个专用的 Debezium 用户 运行脚本以全局启用 CDC,然后为要的每个表单独启用 CDC 捕获。您需要管理员权限才能执行此作。
启用 CDC 后,它会捕获所有 INSERT、UPDATE 和 DELETE作 在您选择的牌桌上。然后,Debezium 连接器可以将这些事件发送到 RDI。
1. 创建 Debezium 用户
强烈建议为 RDI 之间的连接创建一个专用的 Debezium 用户 和源数据库。使用现有用户时,请确保所需的 授予权限,并将用户添加到 CDC 角色。
-
使用以下 Transact-SQL 创建用户:
USE master GO CREATE LOGIN MyUser WITH PASSWORD = 'My_Password' GO USE MyDB GO CREATE USER MyUser FOR LOGIN MyUser GO
Replace
MyUser
,My_Password
andMyDB
with your chosen values. -
Grant required permissions:
USE master GO GRANT VIEW SERVER STATE TO MyUser GO USE MyDB GO EXEC sp_addrolemember N'db_datareader', N'MyUser' GO
2. Enable CDC on the database
There are two system stored procedures to enable CDC (you need
administrator privileges to run these). Use sys.sp_cdc_enable_db
to enable CDC for the whole database and then sys.sp_cdc_enable_table
to enable CDC for individual tables.
Before running the procedures, ensure that:
- You are a member of the
sysadmin
fixed server role for the SQL Server.
- You are a
db_owner
of the database.
- The SQL Server Agent is running.
Then, assuming your database is called MyDB
, run the script below to enable CDC:
USE MyDB
GO
EXEC sys.sp_cdc_enable_db
GO
Note:
For SQL Server on AWS RDS, you must use a different stored procedure:
EXEC msdb.dbo.rds_cdc_enable_db 'Chinook'
GO
When you enable CDC for the database, it creates a schema called cdc
and also
a CDC user, metadata tables, and other system objects.
3. Enable CDC for the tables you want to capture
-
You must also enable CDC on the tables you want Debezium to capture using the
following commands (again, you need administrator privileges for this):
USE MyDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@supports_net_changes = 0
GO
Repeat this for every table you want to capture.
Note:
The value for @role_name
can’t be a fixed database role, such as db_datareader
.
Specifying a new name will create a corresponding database role that has full access to the
captured change data.
-
Add the Debezium user to the CDC role:
USE MyDB
GO
EXEC sp_addrolemember N'MyRole', N'MyUser'
GO
4. Check that you have access to the CDC table
You can use another stored procedure sys.sp_cdc_help_change_data_capture
to query the CDC information for the database and check you have enabled
it correctly. To do this, connect as the Debezium user you created previously (MyUser
).
-
Run the sys.sp_cdc_help_change_data_capture
stored procedure to query
the CDC configuration. For example, if your database was called MyDB
then you would
run the following:
USE MyDB;
GO
EXEC sys.sp_cdc_help_change_data_capture
GO
-
The query returns configuration information for each table in the database that
has CDC enabled and that contains change data that you are authorized to
access. If the result is empty then you should check that you have privileges
to access both the capture instance and the CDC tables.
Troubleshooting
If no CDC is happening then it might mean that SQL Server Agent is down. You can check for this using the SQL query shown below:
IF EXISTS (SELECT 1
FROM master.dbo.sysprocesses
WHERE program_name = N'SQLAgent - Generic Refresher')
BEGIN
SELECT @@SERVERNAME AS 'InstanceName', 1 AS 'SQLServerAgentRunning'
END
ELSE
BEGIN
SELECT @@SERVERNAME AS 'InstanceName', 0 AS 'SQLServerAgentRunning'
END
If the query returns a result of 0, you need to need to start SQL Server Agent using the following commands:
EXEC xp_servicecontrol N'START',N'SQLServerAGENT';
GO
SQL Server capture job agent configuration parameters
In SQL Server, the parameters that control the behavior of the capture job agent
are defined in the SQL Server table msdb.dbo.cdc_jobs
. If you experience performance
problems while running the capture job agent then you can adjust the capture jobs
settings to reduce CPU load. To do this, run the sys.sp_cdc_change_job
stored procedure
with your new parameter values.
Note:
A full guide to configuring the SQL Server capture job agent parameters
is outside the scope of the Redis documentation.
The following parameters are the most important ones for modifying the capture agent behavior
of the Debezium SQL Server connector:
pollinginterval
: This specifies the number of seconds that the capture agent
waits between log scan cycles. A higher value reduces the load on the database
host, but increases latency. A value of 0 specifies no wait between scans.
The default value is 5.
maxtrans
: This specifies the maximum number of transactions to process during
each log scan cycle. After the capture job processes the specified number of
transactions, it pauses for the length of time that pollinginterval
specifies
before the next scan begins. A lower value reduces the load on the database host,
but increases latency. The default value is 500.
maxscans
: This specifies a limit on the number of scan cycles that the capture
job can attempt when capturing the full contents of the database transaction log.
If the continuous parameter is set to 1, the job pauses for the length of time
that the pollinginterval
specifies before it resumes scanning. A lower values
reduces the load on the database host, but increases latency. The default value is 10.
See the SQL Server documentation for more information about capture agent parameters.
SQL Server on Azure
You can also use the Debezium SQL Server connector with SQL Server on Azure.
See Microsoft's guide to
configuring SQL Server on Azure for CDC with Debezium
for more information.
Handling changes to the schema
RDI can't adapt automatically when you change the schema of a CDC table in SQL Server. For example,
if you add a new column to a table you are capturing then RDI will generate errors
instead of capturing the changes correctly. See Debezium's
SQL Server schema evolution
docs for more information.
If you have administrator privileges, you can follow the steps below to update RDI after
a schema change and resume CDC. See the
online schema updates
documentation for further details.
-
Make your changes to the source table schema.
-
Create a new capture table for the updated source table by running the sys.sp_cdc_enable_table
stored
procedure with a new, unique value for the parameter @capture_instance
. For example, if the old value
was dbo_MyTable
, you could replace it with dbo_MyTable_v2
(you can see the existing values by running
stored procedure sys.sp_cdc_help_change_data_capture
):
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@role_name = N'MyRole',
@capture_instance = N'dbo_MyTable_v2',
@supports_net_changes = 0
GO
-
When Debezium starts streaming from the new capture table, drop the old capture table by running
the sys.sp_cdc_disable_table
stored procedure with the parameter @capture_instance
set to the old
capture instance name, dbo_MyTable
:
EXEC sys.sp_cdc_disable_table
@source_schema = N'dbo',
@source_name = N'MyTable',
@capture_instance = N'dbo_MyTable'
GO
Note:
RDI will not correctly capture changes that happen in the time gap between changing
the source schema (step 1 above) and updating the value of @capture_instance
(step 2).
Try to keep the gap as short as possible or perform the update at a time when you expect
few changes to the data.
On this page