Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
3 views

(Module 5) Lab Taf

This document outlines a lab scenario for modifying SSIS packages to incrementally extract data from various sources, focusing on using datetime columns and Change Data Capture (CDC) methods. It includes detailed tasks for setting up the environment, defining variables, and implementing data extraction techniques to optimize ETL processes. The lab also covers creating stored procedures to retrieve modified rows and testing the package to ensure only updated records are processed.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

(Module 5) Lab Taf

This document outlines a lab scenario for modifying SSIS packages to incrementally extract data from various sources, focusing on using datetime columns and Change Data Capture (CDC) methods. It includes detailed tasks for setting up the environment, defining variables, and implementing data extraction techniques to optimize ETL processes. The lab also covers creating stored procedures to retrieve modified rows and testing the package to ensure only updated records are processed.
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Module 05 : Lab

Scenario
You have developed SSIS packages that extract data from various data sources and load it into a staging
database. However, the current solution extracts all source records each time the ETL process is run. This
results in unnecessary processing of records that have already been extracted and consumes a sizeable
amount of network bandwidth to transfer a large volume of data. To resolve this problem, you must modify
the SSIS packages to extract only data that has been added or modified since the previous extraction.

Objectives
After completing this lab, you will be able to:
• Use a datetime column to extract modified rows.
• Use Change Data Capture to extract modified rows.
• Use the CDC Control Task to extract modified rows.
• Use Change Tracking to extract modified rows.

Exercise 1: Using a Datetime Column to Incrementally Extract Data

Task 1: Prepare the Lab Environment


1. Start SQL Server Management Studio and connect to the (local) instance of the SQL Server database
engine by using Windows authentication.
2. Restore files all files in the « LABS-Atelier SID\Lab05\BackupFiles » folder. Indication : Refer to
«Préparation Environnement Labxx » file in the « LABS-Atelier SID » folder for help.
3. Run the « SetupAfterRestore.sql » file.

Task 2: View Extraction Data


1. Start SQL Server Management Studio and connect to the (local) instance of the database engine by
using Windows authentication.
2. In Object Explorer, expand Databases, Staging, and Tables.
3. Right-click dbo.ExtractLog and click Select Top 1000 Rows.
4. View the data in the ExtractLog table, noting the values in the LastExtract column, which indicate
the date and time of the last extract operations for the InternetSales and ResellerSales databases.
This is initially set to January 1st 1900.
5. In Object Explorer, under Databases, expand InternetSales and Tables.
6. Right-click dbo.SalesOrderHeader and click Select Top 1000 Rows. Then note that the
LastModified column indicates the date and time that the sales record was last modified.
7. Minimize SQL Server Management Studio.

Task 3: Examine an Incremental Data Extraction


1. Start Visual Studio and open the AdventureWorksETL.sln solution in the « LABS-Atelier
SID\Lab05\Ex1 » folder.
2. In Solution Explorer, under SSIS Packages, double-click Extract Reseller Data.dtsx.
3. On the SSIS menu, click Variables and then in the Variables pane, note that the following variables
have been defined with a data type of DateTime:
• CurrentTime
• ResellerSalesLastExtract
4. On the Control Flow tab of the design surface, double-click the Get Current Time task, and note
that it uses the GETDATE() function to assign the current data and time to the CurrentTime variable.
Then click Cancel to close the Expression Builder dialog box.
5. Double-click the Get Last Extract Time task to open the Execute SQL Task Editor dialog box, and
note the following configuration settings. Then click Cancel:
• On the General tab, the task is configured to return a single row from the Staging database by
executing the following Transact-SQL statement:
SELECT MAX(LastExtract) LastExtract
FROM ExtractLog
WHERE DataSource = 'ResellerSales'
• On the Result Set tab, the LastExtract column in the result returned by the query is assigned to
the User::ResellerSalesLastExtract variable.
6. Double-click the Extract Reseller Sales task to view its data flow, and then on the Data Flow tab,
double-click the Reseller Sales source to view its settings.
7. On the OLE DB Source Editor dialog box, note that the SQL command text used to retrieve the
reseller sales data includes the following parameterized WHERE clause :
WHERE LastModified > ?
AND LastModified <= ?
8. Click the Parameters button, and note that the parameters in the WHERE clause are mapped to the
User::ResellerSalesLastExtract and User::CurrentTime variables. Then click Cancel to close the Set
Query Parameters dialog box, and click Cancel again to close the OLE DB Source Editor dialog box.
9. Click the Control Flow tab, and then double-click the Update Last Extract Time task and note the
following configuration settings. Then click Cancel:
• On the General tab, the task is configured to update the Staging database by executing the
following parameterized Transact-SQL statement:
UPDATE ExtractLog
SET LastExtract = ?
WHERE DataSource = 'ResellerSales'
• On the Parameter Mapping tab, the parameter in the query is mapped to the
User::CurrentTime variable.
10. On the Debug menu, click Start Debugging, and observe the control flow as the package runs. When
package execution is complete, on the Debug menu, click Stop Debugging.
11. Maximize SQL Server Management Studio and in Object Explorer, in the Tables folder for the Staging
database, right-click dbo.ExtractLog and click Select Top 1000 Rows. Note that the last extract time
for the ResellerSales data source has been updated. Then minimize SQL Server Management Studio.

Task 4: Define Variables for Extraction Times


1. In Visual Studio, in Solution Explorer, double-click Extract Internet Sales Data.dtsx.
2. If the Variables pane is not visible, on the SSIS menu, click Variables. Then, in the Variables pane,
click the Add Variable button and add a variable with the following settings:
• Name: CurrentTime
• Data Type: DateTime
3. Click the Add Variable button again, and add a second variable with the following settings:
• Name: InternetSalesLastExtract
• Data Type: DateTime
4. In the SSIS Toolbox, drag an Expression Task inside the Extract Customer Sales Data sequence
container on the control surface. Arrange the tasks in this container so that the new expression task
is above the Extract Customers task. Then right-click the expression task, click Rename, and change
the name to Get Current Time.
5. Double-click the Get Current Time task, and in the Expression Builder dialog box, in the Expression
box, specify the following expression and then click OK:
@[User::CurrentTime] = GETDATE()
6. In the SSIS Toolbox, drag an Execute SQL task to the control flow surface and drop it inside the
Extract Customer Sales Data sequence container, immediately below the Get Current Time task.
Then right-click the Execute SQL task, click Rename, and change the name to Get Last Extract Time.
7. Double-click the Get Last Extract Time task and in the Execute SQL Task Editor dialog box,
configure the following settings. Then click OK:
• On the General tab, in the Connection drop-down list, select localhost.Staging.
• On the General tab, in the SQLStatement box, click the ellipsis (…) button and then in the Enter
SQL Query dialog box, enter the following Transact-SQL query and click OK:
SELECT MAX(LastExtract) LastExtract
FROM ExtractLog
WHERE DataSource = 'InternetSales'
• On the General tab, in the ResultSet drop-down list, select Single row.
• On the Result Set tab, click Add, and then in the Result Name column, change NewResultName
to LastExtract, and in the Variable Name drop-down list, select User::InternetSalesLastExtract.
8. On the control flow surface, click the Get Current Time task and drag its green precedence constraint
to the Get Last Extract Time task. Then click the Get Last Extract Time task and drag its green
precedence constraint to the Extract Customers task.

Task 5: Modify a Data Source to Filter Data


1. On the control flow surface, double-click the Extract Internet Sales task to display its data flow.
2. On the data flow surface, double-click the Internet Sales source, and in the OLE DB Source Editor
dialog box, review the existing SQL command text used to extract sales data. Then add the following
parameterized WHERE clause to the SQL Command text:
WHERE LastModified > ?
AND LastModified <= ?
3. Click the Parameters button, and in the Set Query Parameters dialog box, specify the following
parameter mappings with a Param Direction of Input, and click OK:
• Parameter0: User::InternetSalesLastExtract
• Parameter1: User:CurrentTime
4. In the OLE DB Source Editor dialog box, click OK.

Task 6: Add a Task to Update the Extraction Log


1. Click the Control Flow tab, and then in the SSIS Toolbox, drag an Execute SQL Task to the Extract
Customer Sales Data sequence under the Extract Internet Sales task on the control flow surface.
2. Right-click Execute SQL Task and click Rename. Then change the name to Update Last Extract
Time.
3. Double-click Update Last Extract Time and configure the following settings. Then click OK:
• On the General tab, in the Connection drop-down list, select localhost.Staging.
• On the General tab, in the SQLStatement box, click the ellipsis (…) button and then in the Enter
SQL Query dialog box, enter the following Transact-SQL query and click OK:
UPDATE ExtractLog
SET LastExtract = ?
WHERE DataSource = 'InternetSales'
• On the Parameter Mapping tab, click Add and create the following parameter mapping:
o Variable Name: User::CurrentTime
o Direction: Input
o Data Type: DATE
o Parameter Name: 0
o Parameter Size: -1
4. On the control flow surface, click the Extract Internet Sales task and then drag its green precedence
constraint to the Update Last Extract Time task.

Task 7: Test the Package


1. Click the Data Flow tab and view the Extract Internet Sales data flow.
2. On the Debug menu, click Start Debugging and observe the package as it executes, noting the
number of rows transferred.
3. When execution is complete, on the Debug menu, click Stop Debugging.
4. Maximize SQL Server Management Studio, and in Object Explorer, in the Staging database, rightclick
the dbo.ExtractLog table and click Select Top 1000 Rows.
5. View the data in the ExtractLog table, noting the value in the LastExtract column for the
InternetSales database has been updated to the date and time when you ran the package.
6. Right-click the dbo.InternetSales table and click Select Top 1000 Rows. The sales records in this
table were extracted from the InternetSales database, where the SalesOrderHeader table had a
LastModified column value between the previous LastExtract value, and the date and time when
the package was executed.
7. Minimize SQL Server Management Studio.
8. In Visual Studio, with the Extract Internet Sales data flow displayed in the designer, on the Debug
menu, click Start Debugging to execute the package again, noting that no rows are transferred
during this execution
9. When execution is complete, on the Debug menu, click Stop Debugging. Then close Visual Studio.
Results: After this exercise, you should have an SSIS package that uses the high water mark technique
to extract only records that have been modified since the previous extraction.

Exercise 2: Using Change Data Capture

Scenario
The Internet Sales database contains a Customers table that does not include a column to indicate when
records were inserted or modified. You plan to use the Change Data Capture feature of SQL Server Enterprise
Edition to identify records that have changed between data warehouse refresh cycles, and restrict data
extractions to include only modified rows.
The main tasks for this exercise are as follows:
1. Enable Change Data Capture
2. Create a Stored Procedure to Retrieve Modified Rows
3. Use the Stored Procedure in a Data Flow
4. Test the Package

Task 1: Enable Change Data Capture


1. Maximize SQL Server Management Studio, and open the Enable CDC.sql file in the « LABS-Atelier
SID\Lab05\Ex2 » folder.
2. Examine the Transact-SQL code in this script, noting that it enables CDC in the InternetSales
database, and for the Customers table. Then click Execute to run the script. Two jobs should be
started.
3. Open the Test CDC.sql file in the « LABS-Atelier SID\Lab05\Ex2» folder, and examine the query,
noting that it contains statements to perform the following tasks:
• Retrieve data changes between 1/1/1900 and the current date by using a CDC function.
• Modify the data in the Customers table.
• Retrieve data changes between 1/1/1900 and the current date again.
4. Select the code under the comment Select all changed customer records between 1/1/1900 and
today and click Execute. Note that no records are returned because there have been no changes in
the database since Change Data Capture was enabled.
5. Select the two UPDATE statements under the comment Make a change to all customers (to create
CDC records) and click Execute. This statement modifies the data in the Customers table by
reversing the FirstName value and then reversing it back to its original value.
6. Select the code under the comment Now see the net changes and click Execute. Note that the query
returns all records in the Customers table, because they have all been changed within the specified
time period.

Task 2: Create a Stored Procedure to Retrieve Modified Rows


1. In SQL Server Management Studio, open the Create SP.sql file in the « LABS-Atelier SID\Lab05\Ex2 »
folder.
2. Examine the Transact-SQL code in the query file, and note that it creates a stored procedure with
StartDate and EndDate parameters. The stored procedure performs the following tasks:
• Retrieves the log sequence numbers for the dates specified in the StartDate and EndDate
parameters.
• If neither of the log sequence numbers is null, then at least one transaction has occurred in the
database within the specified time period. The stored procedure uses a CDC function to return
all records that have changed in the Customers table.
• If no transactions have taken place in the specified time period, the stored procedure returns an
empty rowset.
3. Click Execute to run the Transact-SQL code and create the stored procedure.
4. Click New Query, and type the following Transact-SQL in the new query window. Then click Execute
to test the stored procedure:
USE InternetSales
GO
EXEC GetChangedCustomers '1/1/1900', '1/1/2099';
GO

Task 3: Use the Stored Procedure in a Data Flow


1. In SQL Server Management Studio, open the Reset Staging.sql file in the « LABS-Atelier
SID\Lab05\Ex2 » folder.
2. Click Execute to reset the staging database.
3. Minimize SQL Server Management Studio.
4. Start Visual Studio and open the AdventureWorksETL.sln solution in the « LABS-Atelier
SID\Lab05\Ex2 » folder.
5. In Solution Explorer, under SSIS Packages, double-click Extract Internet Sales Data.dtsx.
6. On the control flow surface, double-click the Extract Customers task.
7. On the data flow surface, double-click the Customers source.
8. In the OLE DB Source Editor dialog box, make the following changes to the configuration of the
Customers source. Then click OK:
• In the Data access mode drop-down list, select SQL Command.
• In the SQL command text box, type the following Transact-SQL statement:
EXEC GetChangedCustomers ?, ?
• Click the Parameters button, and in the Set Query Parameters dialog box, create the following
parameter mappings with a Param direction of Input, and then click then OK.
o @StartDate: User::InternetSalesLastExtract
o @EndDate: User::CurrentTime

Task 4: Test the Package


1. With the Extract Customers data flow displayed in the designer, on the Debug menu, click Start
Debugging and observe the package as it executes, noting the number of rows transferred.
2. When execution is complete, on the Debug menu, click Stop Debugging.
3. Maximize SQL Server Management Studio, and in Object Explorer, in the Staging database, rightclick
the dbo.ExtractLog table and click Select Top 1000 Rows.
4. View the data in the ExtractLog table, noting the value in the LastExtract column for the
InternetSales database has been updated to the date and time when you ran the package.
5. Right-click the dbo.Customers table and click Select Top 1000 Rows. The customer records in this
table were extracted from the InternetSales database, where no row has been changed between the
previous LastExtract value, and the date and time when the package was executed.
6. Minimize SQL Server Management Studio.
7. In Visual Studio, with the Extract Customers data flow displayed in the designer, on the Debug
menu, click Start Debugging to execute the package again, noting that no rows are transferred
during this execution.
8. When execution is complete, on the Debug menu, click Stop Debugging. Then close Visual Studio.

Results: After this exercise, you should have a database in which Change Data Capture has been enabled, and
an SSIS package that uses a stored procedure to extract modified rows based on changes monitored by
Change Data Capture.

Exercise 3: Using the CDC Control Task

Scenario
The HumanResources database contains an Employee table in which employee data is stored. You plan to
use the Change Data Capture feature of SQL Server Enterprise Edition to identify modified rows in this table.
You also plan to use the CDC Control Task in SSIS to manage the extractions from this table by creating a
package to perform the initial extraction of all rows, and a second package that uses the CDC data flow
components to extract rows that have been modified since the previous extraction.
The main tasks for this exercise are as follows:
1. Enable Change Data Capture
5. View Staging Tables
6. Create Connection Managers for CDC Components
7. Create a Package for Initial Data Extraction
8. Test Initial Extraction
9. Create a Package for Incremental Data Extraction
10. Test Incremental Extraction

Task 1: Enable Change Data Capture


1. Maximize SQL Server Management Studio, and open the Enable CDC.sql file in the « LABS-Atelier
SID\Lab05\Ex3 » folder.
2. Examine the query, noting that it enables CDC in the HumanResources database, and for the
Employee table. Then click Execute to run the query. Two jobs should be started.

Task 2: View Staging Tables


1. In SQL Server Management Studio, in Object Explorer, under the Tables folder for the Staging
database, right-click the dbo.EmployeeDeletes table and click Select Top 1000 Rows. Note that the
table is empty.
2. Repeat the previous step for the dbo.EmployeeInserts and dbo.EmployeeUpdates tables.
3. Minimize SQL Server Management Studio.

Task 3: Create Connection Managers for CDC Components


1. Start Visual Studio and open the AdventureWorksETL.sln solution in the « LABS-Atelier
SID\Lab05\Ex3 » folder.
2. In Solution Explorer, right-click the Connection Managers folder and click New Connection
Manager. Then in the Add SSIS Connection Manager dialog box, click the ADO.NET connection
manager type, and click Add.
3. In the Configure ADO.NET Connection Manager dialog box, click New. Then in the Connection
Manager dialog box, in the Server name box type localhost, ensure Use Windows authentication
is selected, and in the Select or enter a database name drop-down list, select HumanResources.
4. Click OK to close the Connection Manager dialog box, and click OK again to close the Configure
ADO.NET Connection Manager dialog box.
5. In Solution Explorer, right-click the localhost.HumanResources.conmgr connection manager and
click Rename. Then rename the connection manager to
localhost.HumanResources.ADO.NET.conmgr.
6. In Solution Explorer, right-click the Connection Managers folder and click New Connection
Manager. Then in the Add SSIS Connection Manager dialog box, click the ADO.NET connection
manager type and click Add.
7. In the Configure ADO.NET Connection Manager dialog box, click New. Then in the Connection
Manager dialog box, in the Server name box type localhost, ensure Use Windows authentication
is selected, and in the Select or enter a database name drop-down list, select Staging.
8. Click OK to close the Connection Manager dialog box, and click OK again to close the Configure
ADO.NET Connection Manager dialog box.
9. In Solution Explorer, right-click the localhost.Staging 1.conmgr connection manager and click
Rename. Then rename the connection manager to localhost.Staging.ADO.NET.conmgr.

Task 4: Create a Package for Initial Data Extraction


In Solution Explorer, right-click the SSIS Packages folder and click New SSIS Package.
1. When the new package is created, in Solution Explorer, right-click Package1.dtsx and click Rename.
Then rename the package to Extract Initial Employee Data.dtsx.
2. In the SSIS Toolbox, in the Other Tasks section, drag a CDC Control Task to the control flow surface
of the Extract Initial Employee Data.dtsx package.
3. On the control flow surface, right-click CDC Control Task and click Rename. Then rename it to Mark
Initial Load Start.
4. Double-click the Mark Initial Load Start task, and in the CDC Control Task Editor dialog box, set
the following properties. Then click OK:
• SQL Server CDC database ADO.NET connection manager: localhost HumanResources ADO
NET.
• CDC control operation: Mark initial load start.
• Variable containing the CDC state: Click New and then in the Add New Variable dialog box,
click OK to create a variable named CDC_State in the Extract Initial Employee Data container.
• Automatically store state in a database table: Selected.
• Connection manager for the database where the state is stored: localhost Staging ADO NET.
• Table to use for storing state: Click New, and in the Create New State Table dialog box, click
Run to create a table named [dbo].[cdc_states] in the Staging database.
• State name: CDC_State.
5. In the SSIS Toolbox, in the Favorites section, drag a Data Flow Task to the control flow surface of
the Extract Initial Employee Data.dtsx package, under the Mark Initial Load Start task.
6. On the control flow surface, right-click Data Flow Task and click Rename. Then rename it to Extract
Initial Employee Data. Then drag a green precedence constraint from the Mark Initial Load Start
task to the Extract Initial Employee Data task.
7. Double-click the Extract Initial Employee Data task to view its data flow surface.
8. In the SSIS Toolbox, in the Other Sources section, drag an ADO NET Source to the data flow surface.
Then on the data flow surface, right-click ADO NET Source, click Rename, and rename it to
Employees.
9. Double-click Employees and in the ADO.NET Source Editor dialog box, set the following properties.
Then click OK:
• ADO.NET connection manager: localhost HumanResources ADO NET.
• Data access mode: Table or view.
• Name of the table or view: dbo"."Employee".
10. In the SSIS Toolbox, in the Other Destinations section, drag an ADO NET Destination to the data
flow surface, below the Employees data source. Then on the data flow surface, right-click ADO NET
Destination, click Rename, and rename it to Employee Inserts.
11. Click the Employees source and drag the blue data flow path connection to the Employee Inserts
destination.
12. Double-click Employee Inserts and in the ADO.NET Destination Editor dialog box, set the following
properties. Then click OK:
• Connection manager: localhost Staging ADO NET.
• Use a table or view: "dbo"."EmployeeInserts".
• Mappings: On the Mappings tab, ensure all available input columns are mapped to destination
columns of the same name.
13. Click the Control Flow tab, and in the SSIS Toolbox, in the Other Tasks section, drag a CDC Control
Task to the control flow surface, below the Extract Initial Employee Data task.
14. On the control flow surface, right-click CDC Control Task and click Rename. Then rename it to Mark
Initial Load End. Drag a “success” precedence constraint from the Extract Initial Employee Data
task to the Mark Initial Load End task.
15. Double-click the Mark Initial Load End task, and in the CDC Control Task Editor dialog box, set the
following properties. Then click OK:
• SQL Server CDC database ADO.NET connection manager: localhost HumanResources ADO
NET.
• CDC control operation: Mark initial load end.
• Variable containing the CDC state: User:: CDC_State.
• Automatically store state in a database table: Selected.
• Connection manager for the database where the state is stored: localhost Staging ADO NET.
• Table to use for storing state: [dbo].[cdc_states].
• State name: CDC_State.
16. On the File menu, click Save All.

Task 5: Test Initial Extraction


1. In Visual Studio, ensure that the control flow for the Extract Initial Employee Data.dtsx package is
open, and on the Debug menu, click Start Debugging. Then, when package execution is complete,
on the Debug menu, click Stop Debugging.
2. Minimize Visual Studio and maximize SQL Server Management Studio.
3. In Object Explorer, right-click the dbo.EmployeeInserts table in the Staging database and click
Select Top 1000 Rows. Note that the table now contains employee records.
4. In Object Explorer, under the Staging database, right-click the Tables folder and click Refresh. Note
that a new table named dbo.cdc_states has been created in the staging database.
5. Right-click the dbo.cdc_states table and click Select Top 1000 Rows. Note that the table contains
an encoded string that indicates the CDC state.
6. Minimize SQL Server Management Studio.

Task 6: Create a Package for Incremental Data Extraction


1. Maximize Visual Studio, and then in Solution Explorer, right-click the SSIS Packages folder and click
New SSIS Package.
2. When the new package is created, in Solution Explorer, right-click Package1.dtsx and click Rename.
Then rename the package to Extract Changed Employee Data.dtsx.
3. In the SSIS Toolbox, in the Other Tasks section, drag a CDC Control Task to the control flow surface
of the package.
4. On the control flow surface, right-click CDC Control Task and click Rename. Then rename it to Get
Processing Range.
5. Double-click the Get Processing Range task, and in the CDC Control Task Editor dialog box, set
the following properties. Then click OK:
• SQL Server CDC database ADO.NET connection manager: localhost HumanResources ADO
NET.
• CDC control operation: Get processing range.
• Variable containing the CDC state: Click New and then click OK to create a variable named
CDC_State in the Extract Changed Employee Data container.
• Automatically store state in a database table: Selected.
• Connection manager for the database where the state is stored: localhost Staging ADO NET.
• Table to use for storing state: [dbo].[cdc_states].
• State name: CDC_State.
6. In the SSIS Toolbox, drag a Data Flow Task to the control flow surface, and drop it under the Get
Processing Range task.
7. On the control flow surface, right-click Data Flow Task and click Rename. Then rename it to Extract
Changed Employee Data.
8. Click Get Processing Range and drag its green precedence constraint to the Extract Changed
Employee Data task.
9. Double-click the Extract Changed Employee Data task to view its data flow surface.
10. In the SSIS Toolbox, in the Other Sources section, drag a CDC Source to the data flow surface of the
Extract Changed Employee Data task. Then on the data flow surface, right-click CDC Source, click
Rename, and rename it to Employee Changes.
11. Double-click Employee Changes, and in the CDC Source dialog box, set the following properties.
Then click OK:
• ADO.NET connection manager: localhost HumanResources ADO NET.
• CDC enabled table: [dbo].[Employee].
• Capture instance: dbo_Employee.
• CDC processing mode: Net.
• Variable containing the CDC state: User::CDC_State.
12. In the SSIS Toolbox, in the Other Transforms section, drag a CDC Splitter to the data flow surface,
below the Employee Changes data source. Then click Employee Changes and drag the blue data
flow path connection to the CDC Splitter transformation.
13. In the SSIS Toolbox, in the Other Destinations section, drag an ADO NET Destination to the data
flow surface, below the CDC Splitter transformation. Then on the data flow surface, right-click ADO
NET Destination, click Rename, and rename it to Employee Inserts.
14. Click the CDC Splitter transformation and drag the blue data flow path connection to the Employee
Inserts destination. In the Input Output Selection dialog box, select the InsertOutput output and
click OK.
15. Double-click Employee Inserts and in the ADO.NET Destination Editor dialog box, set the following
properties. Then click OK:
• Connection manager: localhost Staging ADO NET.
• Use a table or view: "dbo"."EmployeeInserts".
• Mappings: On the Mappings tab, verify that all available input columns other than
_$start_lsn,_$operation, and _$update_mask are mapped to destination columns of the same
name.
16. In the SSIS Toolbox, in the Other Destinations section, drag an ADO NET Destination to the data
flow surface, directly below and to the left of the CDC Splitter transformation. Then on the data flow
surface, right-click ADO NET Destination, click Rename, and rename it to Employee Updates.
17. Click the CDC Splitter transformation and drag the blue data flow path connection to the Employee
Updates destination. In the Input Output Selection dialog box, select the UpdateOutput output
and click OK.
18. Double-click Employee Updates and in the ADO.NET Destination Editor dialog box, set the
following properties. Then click OK:
• Connection manager: localhost Staging ADO NET.
• Use a table or view: "dbo"."EmployeeUpdates"
• Mappings: On the Mappings tab, verify that all available input columns other than
_$start_lsn,_$operation, and _$update_mask are mapped to destination columns of the same
name.
19. In the SSIS Toolbox, in the Other Destinations section, drag an ADO NET Destination to the data
flow surface, below and to the right of the CDC Splitter transformation. Then on the data flow surface,
right-click ADO NET Destination, click Rename, and rename it to Employee Deletes.
20. Click the CDC Splitter transformation and drag the blue data flow path connection to the Employee
Deletes destination. The DeleteOutput output should be selected automatically.
21. Double-click Employee Deletes and in the ADO.NET Destination Editor dialog box, set the
following properties. Then click OK:
• Connection manager: localhost Staging ADO NET.
• Use a table or view: "dbo"."EmployeeDeletes".
• Mappings: On the Mappings tab, verify that all available input columns other than _$start_lsn,
_$operation, and _$update_mask are mapped to destination columns of the same name.
22. Click the Control Flow tab, and in the SSIS Toolbox, in the Other Tasks section, drag a CDC Control
Task to the control flow surface, below the Extract Changed Employee Data task.
23. On the control flow surface, right-click CDC Control Task and click Rename. Then rename it to Mark
Processed Range.
24. Click Extract Changed Employee Data and drag its green precedence constraint to the Mark
Processed Range task.
25. Double-click the Mark Processed Range task, and in the CDC Control Task Editor dialog box, set
the following properties. Then click OK:
• SQL Server CDC database ADO.NET connection manager: localhost HumanResources ADO
NET.
• CDC control operation: Mark processed range.
• Variable containing the CDC state: User:: CDC_State.
• Automatically store state in a database table: Selected.
• Connection manager for the database where the state is stored: localhost Staging ADO NET.
• Table to use for storing state: [dbo].[cdc_states]
• State name: CDC_State.
26. On the File menu, click Save All.

Task 7: Test Incremental Extraction


1. In Visual Studio, ensure that the control flow for the Extract Changed Employee Data.dtsx package
is open, and on the Debug menu, click Start Debugging.
2. When package execution is complete, double-click the Extract Changed Employee Data task to
verify that no rows were extracted (because no changes have been made to the source data since the
initial extraction). Then, on the Debug menu, click Stop Debugging.
3. Maximize SQL Server Management Studio, and open the Change Employees.sql file in the « LABS-
Atelier SID\Lab05\Ex3» folder.
4. Review the Transact-SQL code and note that it truncates the dbo.EmployeeInserts,
dbo.EmployeeUpdates, and dbo.EmployeeDeletes tables in the Staging database, and then makes
the following changes to the dbo.Employee table in the HumanResources database:
• Inserts a new employee record.
• Updates employee 281 to change the Title column value.
• Deletes employee 273.
5. Click Execute to run the Transact-SQL code, and then minimize SQL Server Management Studio and
maximize Visual Studio.
6. In Visual Studio, ensure that the data flow for the Extract Changed Employee Data task is open, and
on the Debug menu, click Start Debugging.
7. When package execution is complete, double-click the Extract Changed Employee Data task to
verify that three rows were extracted and split into one insert, one update, and one delete. Then, on
the Debug menu, click Stop Debugging.
Note: If no rows were transferred, stop debugging, wait for a few seconds, and then repeat the
previous two steps.
8. Close Visual Studio and maximize SQL Server Management Studio.
9. In Object Explorer, under the Tables folder for the Staging database, right-click the
dbo.EmployeeInserts table and click Select Top 1000 Rows. Note that the table contains the row
that was inserted.
10. Repeat the previous step for the dbo.EmployeeUpdates and dbo.EmployeeDeletes tables, and
verify that they contain the updated and deleted records respectively.
11. Minimize SQL Server Management Studio.

Results: After this exercise, you should have a HumanResources database in which Change Data Capture has
been enabled, and an SSIS package that uses the CDC Control to extract the initial set of employee records.
You should also have an SSIS package that uses the CDC Control and CDC data flow components to extract
modified employee records based on changes recorded by Change Data Capture.

Exercise 4: Using Change Tracking

Task 1: Enable Change Tracking


1. Maximize SQL Server Management Studio, and open the Enable CT.sql file in the « LABS-Atelier
SID\Lab05\Ex4 » folder.
2. Examine the query, noting that it enables Change Tracking in the ResellerSales database, and for the
Resellers table. Then click Execute to run the query.
3. Open the Test CT.sql file in the « LABS-Atelier SID\Lab05\Ex4 » folder, and note that it contains
statements to perform the following tasks:
• Get the current change tracking version number.
• Retrieve all data from the Resellers table.
• Store the current version number as the previously-retrieved version.
• Update the Resellers table.
• Get the new current version number.
• Get all changes between the previous and current versions.
• Store the current version number as the previously-retrieved version.
• Update the Resellers table again.
• Get the new current version number.
• Get all changes between the previous and current versions.
4. Click Execute and view the results. Note that:
• The first resultset shows all reseller records.
• The second resultset indicates that the previously-retrieved version was numbered 0, and the
current version is numbered 1.
• The third resultset indicates that the previously-retrieved version was numbered 1, and the
current version is numbered 2.

Task 2: Create a Stored Procedure to Retrieve Modified Rows


1. In SQL Server Management Studio, open the Create SP.sql file in the « LABS-Atelier SID\Lab05\Ex4 »
folder.
2. Examine the Transact-SQL code in the query file, and note that it enables snapshot isolation and
creates a stored procedure with a single parameter named LastVersion. The stored procedure
performs the following tasks:
• Sets the isolation level to snapshot.
• Retrieves the current change tracking version number.
• If the LastVersion parameter is -1, the stored procedure assumes that no previous versions have
been retrieved, and returns all records from the Resellers table.
• If the LastVersion parameter is not -1, the stored procedure retrieves all changes between
LastVersion and the current version.
• The stored procedure updates the LastVersion parameter to the current version, so the calling
application can store the last version retrieved for next time.
• Sets the isolation level back to read “committed”.
3. Click Execute to run the Transact-SQL code and create the stored procedure.
4. Click New Query, and type the following Transact-SQL in the new query window. Then click Execute
to test the stored procedure:
USE ResellerSales
GO
DECLARE @p BigInt = -1;
EXEC GetChangedResellers @p OUTPUT;
SELECT @p LastVersionRetrieved;
EXEC GetChangedResellers @p OUTPUT;

Task 3: Modify a Data Flow to use the Stored Procedure


1. In SQL Server Management Studio, open the Reset Staging.sql file in the « LABS-Atelier
SID\Lab05\Ex4 » folder.
2. Click Execute to reset the staging database.
3. Minimize SQL Server Management Studio.
4. Start Visual Studio and open the AdventureWorksETL.sln solution in the « LABS-Atelier
SID\Lab05\Ex4 » folder.
5. In Solution Explorer, under SSIS Packages, double-click Extract Reseller Data.dtsx.
6. If the Variables pane is not visible, on the SSIS menu, click Variables. Then, in the Variables pane,
click the Add Variable button and add a variable with the following settings :
• Name: PreviousVersion
• Data Type: Decimal
• Value: 0
7. In the SSIS Toolbox, drag an Execute SQL Task to the control flow surface, above the Extract
Resellers task. Then right-click the expression task, click Rename, and change the name to Get
Previously Extracted Version.
8. Double-click the Get Previously Extracted Version task and in the Execute SQL Task Editor dialog
box, configure the following settings. Then click OK:
• On the General tab, in the ResultSet drop-down list, select Single row.
• On the General tab, in the Connection drop-down list, select localhost.Staging.
• On the General tab, in the SQLStatement box, click the ellipsis (…) button and then in the Enter
SQL Query, dialog box, enter the following Transact-SQL query and click OK:
SELECT MAX(LastVersion) LastVersion
FROM ExtractLog
WHERE DataSource = 'ResellerSales'
• On the Result Set tab, click Add, and then in the Result Name column, change NewResultName
to LastVersion, and in the Variable Name drop-down list, select User::PreviousVersion.
9. On the control flow surface, right-click the green precedence constraint between Get Last Extract
Time and Extract Resellers, and click Delete.
10. Click the Get Last Extract Time task and drag its green precedence constraint to the Get Previously
Extracted Version task. Then click the Get Previously Extracted Version task and drag its green
precedence constraint to the Extract Resellers task.
11. In the SSIS Toolbox, drag an Execute SQL Task under the Update Last Extract Time task on the
control flow surface.
12. Right-click Execute SQL Task and click Rename. Then change the name to Update Previous
Version.
13. Double-click Update Previous Version and configure the following settings. Then click OK:
• On the General tab, in the Connection drop-down list, select localhost.Staging.
• On the General tab, in the SQLStatement box, click the ellipsis (…) button and then in the Enter
SQL Query, dialog box, enter the following Transact-SQL query and click OK:
UPDATE ExtractLog
SET LastVersion = ?
WHERE DataSource = 'ResellerSales'
• On the Parameter Mapping tab, click Add and create the following parameter mapping:
Variable Name: User::PreviousVersion
Direction: Input
Data Type: LARGE_INTEGER
Parameter Name: 0
Parameter Size: -1
14. On the control flow surface, right-click the green precedence constraint between Update Last
Extract Time and Send Success Notification, and click Delete. Then click the Update Last Extract
Time task and drag its green precedence constraint to the Update Previous Version task. Click the
Update Previous Version task and drag its green precedence constraint to the Send Success
Notification task.
15. On the control flow surface, double-click the Extract Resellers task.
16. On the data flow surface, double-click the Resellers source.
17. In the OLE DB Source Editor dialog box, make the following changes to the configuration of the
Customers source. Then click OK.
• In the Data access mode drop-down list, select SQL Command.
• In the SQL command text box, type the following Transact-SQL statement:
EXEC GetChangedResellers ? OUTPUT
• Click the Parameters button, and in the Set Query Parameters dialog box, create the following
parameter mappings, and then click OK:
Parameters: @LastVersion
Variables: User::PreviousVersion
Param direction: InputOutput

Task 4: Test the Package


1. In Visual Studio, with the Extract Resellers data flow displayed in the designer, on the Debug menu,
click Start Debugging and observe the package as it executes, noting the number of rows
transferred.
2. When execution is complete, on the Debug menu, click Stop Debugging.
3. Maximize SQL Server Management Studio, and in Object Explorer, in the Staging database, rightclick
the dbo.ExtractLog table and click Select Top 1000 Rows.
4. View the data in the ExtractLog table, noting the value in the LastVersion column for the
ResellerSales database has been updated to the latest version retrieved from the source database.
5. Right-click the dbo.Resellers table and click Select Top 1000 Rows. The customer records in this
table were extracted from the ResellerSales database, where no row has been changed between the
previous LastVersion value, and the current version.
6. Close SQL Server Management Studio without saving any changes.
7. In Visual Studio, with the Extract Resellers data flow displayed in the designer, on the Debug menu,
click Start Debugging to execute the package again, noting that no rows are transferred during this
execution.
8. When execution is complete, on the Debug menu, click Stop Debugging. Then close Visual Studio.

Results: After this exercise, you should have a database in which Change Tracking has been enabled, and an
SSIS package that uses a stored procedure to extract modified rows based on changes recorded by Change
Tracking.

You might also like