SAP Data Integration Using Azure Data Factory
[Diagram: data sources (on-premises data, cloud data, SaaS data, SAP data); stages INGEST, PREPARE, TRANSFORM & ENRICH, SERVE, VISUALIZE; STORE; Data Pipeline Orchestration & Monitoring]
Typical SAP data integration scenarios:
• Ongoing batch ETL from SAP to data lake
• Historical migration from SAP to Azure
Azure Data Factory
A fully-managed data integration service
for cloud-scale analytics in Azure
Supported connectors by category:
• Azure: Blob Storage, Cosmos DB – SQL API, Cosmos DB – MongoDB API, ADLS Gen1, ADLS Gen2, Data Explorer, Database for MariaDB, Database for MySQL, Database for PostgreSQL, File Storage, SQL Database, SQL Managed Instance, Synapse Analytics, Search Index, Table Storage
• Database & DW: Amazon Redshift, DB2, Drill, Google BigQuery, Greenplum, HBase, Hive, Impala, Informix, MariaDB, Microsoft Access, MySQL, Netezza, Oracle, Phoenix, PostgreSQL, Presto, SAP BW Open Hub, SAP BW MDX, SAP HANA, SAP Table, Snowflake, Spark, SQL Server, Sybase, Teradata, Vertica
• File Storage: Amazon S3, File System, FTP, Google Cloud Storage, HDFS, SFTP
• File Formats: Avro, Binary, Common Data Model, Delimited Text, Excel, JSON, ORC, Parquet
• NoSQL: Cassandra, Couchbase, MongoDB
• Services & Apps: Amazon MWS, CDS for Apps, Concur, Dynamics 365, Dynamics AX, Dynamics CRM, Google AdWords, HubSpot, Jira, Magento, Marketo, Office 365, Oracle Eloqua, Oracle Responsys, Oracle Service Cloud, PayPal, QuickBooks, Salesforce, SF Service Cloud, SF Marketing Cloud, SAP C4C, SAP ECC, ServiceNow, SharePoint List, Shopify, Square, Web Table, Xero, Zoho
• Generic: HTTP, OData, ODBC, REST
SAP Data Integration Overview
SAP HANA Connector
SAP Table Connector
SAP BW Open Hub Connector
SAP ECC Connector
SAP BW MDX Connector
More about Azure Data Factory Copy Activity
Resources
“I want to extract data from SAP HANA database” →
ADF connector:
(Connector deep-dive)
“I want to extract data from SAP BW” →
Suggested decision direction
Objects to extract:
• Table (Transparent, Pooled, Cluster Table) and View
• OData entities exposed via SAP Gateway (BAPI, ODP)
[Diagram components: SAP HANA ODBC Driver; ADF Self-hosted Integration Runtime]
For each copy activity run, ADF issues the specified query to the source to retrieve the data.
Out-of-box optimization for SAP HANA:
• Built-in parallel copy by partitions to boost performance for large table ingestion.
• Options of HANA physical table partition and dynamic range partition.
[Diagram: source data table with columns C1, C2, PartitionCol, read by a single Copy Activity execution, e.g. set Parallel Copy = 4; PartitionCol values are split at 10000/10001, 30000/30001, 50000/50001, 70000/70001, ...]
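The partitioned parallel copy described above can be illustrated with a minimal Python sketch. It is not how ADF implements the feature internally; it only shows the idea of splitting an integer partition column into contiguous ranges and building one query per range, which could then run in parallel. The function name `range_partition_queries`, the table name `MyTable`, and the bounds 1 to 80000 are hypothetical.

```python
def range_partition_queries(table, column, lower, upper, partitions):
    """Split [lower, upper] into contiguous integer ranges, one query per range.

    Illustrative only: names and bounds are assumptions, not ADF internals.
    """
    if partitions < 1 or upper < lower:
        raise ValueError("need partitions >= 1 and upper >= lower")
    total = upper - lower + 1
    partitions = min(partitions, total)   # never create empty ranges
    base, extra = divmod(total, partitions)
    queries = []
    start = lower
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        end = start + size - 1
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {column} >= {start} AND {column} <= {end}"
        )
        start = end + 1
    return queries


RANGE_QUERIES = range_partition_queries("MyTable", "PartitionCol", 1, 80000, 4)
for query in RANGE_QUERIES:
    print(query)
```

Each generated query covers a disjoint slice of the partition column, so the slices can be read concurrently without overlap.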
Delta extraction with a dynamic query:
SELECT * FROM MyTable
WHERE LastModifiedDate >= '@{formatDateTime(pipeline().parameters.windowStartTime, 'yyyy/MM/dd')}'
AND LastModifiedDate < '@{formatDateTime(pipeline().parameters.windowEndTime, 'yyyy/MM/dd')}'
• Execution start time: 2019/03/19 00:00:00 (window end time). Delta extraction: last modified time between 2019/03/18 – 2019/03/19
• Execution start time: 2019/03/20 00:00:00 (window end time). Delta extraction: last modified time between 2019/03/19 – 2019/03/20
[Diagram: source table with columns C1, C2, …, LastModifiedDate; rows dated 2019/03/18 and 2019/03/19]
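The windowed delta extraction above can be sketched in Python to show how each run maps to a half-open date window. This is a simulation of the idea, not ADF expression code: the helper names `daily_windows` and `delta_query` are hypothetical, and the table and column names are taken from the example query.

```python
from datetime import datetime, timedelta


def daily_windows(first_start, runs):
    """Yield consecutive (window_start, window_end) pairs, one day each."""
    for i in range(runs):
        start = first_start + timedelta(days=i)
        yield start, start + timedelta(days=1)


def delta_query(table, column, window_start, window_end, fmt="%Y/%m/%d"):
    """Build a query for rows modified in [window_start, window_end)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {column} >= '{window_start.strftime(fmt)}' "
        f"AND {column} < '{window_end.strftime(fmt)}'"
    )


WINDOWS = list(daily_windows(datetime(2019, 3, 18), 2))
DELTA_QUERIES = [
    delta_query("MyTable", "LastModifiedDate", start, end)
    for start, end in WINDOWS
]
for query in DELTA_QUERIES:
    print(query)
```

Because the lower bound is inclusive and the upper bound is exclusive, adjacent windows neither overlap nor leave gaps.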
Workflow Pipeline
Supported versions:
• SAP ECC or other applications in Business Suite version 7.01 and above, on-prem or in the cloud
• S/4 HANA
Supported SAP objects:
• SAP Transparent Table, Pooled Table, Cluster Table and View
Supported server type:
• Connect to Application Server or Message Server
Supported authentications:
• Basic – username & password
• SNC (Secure Network Communications)
Mechanism and prerequisites:
• Built on top of SAP .NET Connector 3.0, pull data via NetWeaver RFC w/ field selection & row filter
• Run on Self-hosted Integration Runtime
Capabilities:
✓ Field selection
✓ Row filter (SAP query operators)
✓ Default or custom RFC func
✓ Built-in partition + parallel load
[Diagram components: SAP table; SAP .NET Connector; ADF Self-hosted Integration Runtime; ADF]
Tips:
• Enable partitioning when ingesting a large dataset, e.g. tens of millions of rows.
• To speed up, choose the proper partition column and partition numbers, and adjust parallel copies.
[Diagram: SAP table with columns C1, C2, PartitionCol (values 201809, 201810, 201811, 201812, ...), read by a single Copy Activity execution, e.g. set Parallel Copy = 4]
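To make the partition tips concrete, here is a minimal Python sketch of grouping a calendar-month partition column (values such as 201809) into a limited number of partitions. It illustrates the general idea only and is not the connector's actual algorithm; `month_range` and `assign_partitions` are hypothetical helper names.

```python
def month_range(lower, upper):
    """List every YYYYMM value from lower to upper, inclusive."""
    year, month = int(lower[:4]), int(lower[4:])
    end_year, end_month = int(upper[:4]), int(upper[4:])
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}{month:02d}")
        month += 1
        if month > 12:          # roll over into the next year
            year, month = year + 1, 1
    return months


def assign_partitions(months, max_partitions):
    """Group consecutive months into at most max_partitions chunks."""
    if max_partitions < 1:
        raise ValueError("max_partitions must be >= 1")
    count = min(max_partitions, len(months))
    if count == 0:
        return []
    base, extra = divmod(len(months), count)
    chunks, pos = [], 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        chunks.append(months[pos:pos + size])
        pos += size
    return chunks


MONTHS = month_range("201809", "201812")
PARTITIONS = assign_partitions(MONTHS, 4)
print(PARTITIONS)
```

With four months and four partitions, each partition holds one month, which lines up with a parallel copy setting of 4 in the example above.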
Pattern I: “my data has timestamp column e.g. calendar date”
Solution: tumbling window trigger + dynamic query with system variables via SAP table option (filter)
Pattern II: “my data has an incremental column e.g. id/last copied date”
Solution: external control table/file + high watermark.
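Pattern II can be sketched with an in-memory SQLite database standing in for both the source and the external control table. This is a simplified illustration under stated assumptions, not an ADF pipeline: the table names `source`, `sink`, and `watermark`, and the function `copy_increment`, are all hypothetical.

```python
import sqlite3


def copy_increment(conn):
    """Copy rows above the stored watermark, then advance the watermark."""
    cur = conn.cursor()
    (old_mark,) = cur.execute(
        "SELECT value FROM watermark WHERE table_name = 'source'"
    ).fetchone()
    # Fix the upper bound first so rows arriving mid-copy wait for the next run.
    (new_mark,) = cur.execute(
        "SELECT COALESCE(MAX(id), ?) FROM source", (old_mark,)
    ).fetchone()
    rows = cur.execute(
        "SELECT id, payload FROM source WHERE id > ? AND id <= ? ORDER BY id",
        (old_mark, new_mark),
    ).fetchall()
    cur.executemany("INSERT INTO sink (id, payload) VALUES (?, ?)", rows)
    cur.execute(
        "UPDATE watermark SET value = ? WHERE table_name = 'source'", (new_mark,)
    )
    conn.commit()
    return rows


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE sink (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE watermark (table_name TEXT PRIMARY KEY, value INTEGER);
        INSERT INTO watermark VALUES ('source', 0);
        INSERT INTO source VALUES (1, 'a'), (2, 'b');
    """)
    first = copy_increment(conn)            # copies ids 1 and 2
    conn.execute("INSERT INTO source VALUES (3, 'c')")
    second = copy_increment(conn)           # copies only id 3
    third = copy_increment(conn)            # nothing new to copy
    return first, second, third


FIRST_RUN, SECOND_RUN, THIRD_RUN = demo()
print(FIRST_RUN, SECOND_RUN, THIRD_RUN)
```

Storing the watermark outside the source keeps the source read-only, and updating it only after the copy succeeds means a failed run is simply retried from the old value.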
Mechanism and prerequisites:
• Built on top of SAP .NET Connector 3.0, pull data via NetWeaver RFC
• Run on ADF Self-hosted Integration Runtime
• SAP side config: create SAP OHD in SAP BW to expose data
OHD types:
• InfoObject
[Diagram labels: Master Data; DTP; Activate; OHD; Open Hub Destination Table; SAP .NET Connector; ADF Self-hosted Integration Runtime]
[Diagram: SAP BW OHD table with columns Request ID, Package ID, Record ID, …; each SAP BW DTP execution writes rows under a unique Request ID (execution #1: Request ID 1, execution #2: Request ID 2); ADF reads the table in a single Copy Activity execution, e.g. set Parallel Copy = 4]
Exclude Last request ID:
• Applicable if DTP and Copy may run at the same time
[Diagram: OHD table with columns Request ID, Package ID, Record ID, …; rows with Request IDs 100, 200, 300]
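The request-ID-based extraction can be simulated with a short Python sketch. It mimics the selection logic only and does not call SAP BW or ADF: the function `select_requests`, its parameters `base_request_id` and `exclude_last_request`, and the sample package and record IDs are assumptions for illustration.

```python
def select_requests(rows, base_request_id=0, exclude_last_request=True):
    """Keep rows above the base request ID; optionally drop the newest request.

    Dropping the newest request avoids reading a request that a DTP run
    may still be writing while the copy executes.
    """
    candidates = [r for r in rows if r["request_id"] > base_request_id]
    if exclude_last_request and candidates:
        last = max(r["request_id"] for r in rows)
        candidates = [r for r in candidates if r["request_id"] < last]
    return candidates


# Sample OHD rows; the package and record IDs are made up for the demo.
OHD_ROWS = [
    {"request_id": 100, "package_id": 1, "record_id": 1},
    {"request_id": 200, "package_id": 1, "record_id": 1},
    {"request_id": 300, "package_id": 1, "record_id": 1},
    {"request_id": 300, "package_id": 1, "record_id": 2},
]

SAFE_DELTA = select_requests(OHD_ROWS, base_request_id=100)
FULL_DELTA = select_requests(OHD_ROWS, base_request_id=100,
                             exclude_last_request=False)
print([r["request_id"] for r in SAFE_DELTA])
print([r["request_id"] for r in FULL_DELTA])
```

In this sketch, a later run would pass the highest request ID already copied as the new base, so the excluded request is picked up once a newer one exists.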
Supported versions:
• SAP ECC version 7.0 and above
Supported SAP objects:
• Any entities exposed by SAP ECC OData services
Pattern I: “my data has timestamp column e.g. last modified time”
Solution: tumbling window trigger + dynamic query with system variables via OData query
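For the OData case, the windowed filter can be sketched in Python as follows. This assumes the service accepts OData V2 literal syntax (`datetime'...'`); the field name `LastModified` and the helper names are hypothetical, so check the actual entity metadata before relying on it.

```python
from datetime import datetime
from urllib.parse import quote


def odata_window_filter(field, window_start, window_end):
    """Build an OData V2 $filter expression for a half-open time window."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    return (
        f"{field} ge datetime'{window_start.strftime(fmt)}' "
        f"and {field} lt datetime'{window_end.strftime(fmt)}'"
    )


def odata_query(field, window_start, window_end):
    """Return a URL-encoded query string carrying the window filter."""
    expression = odata_window_filter(field, window_start, window_end)
    return "$filter=" + quote(expression, safe="'")


FILTER = odata_window_filter(
    "LastModified", datetime(2019, 3, 18), datetime(2019, 3, 19)
)
QUERY = odata_query("LastModified", datetime(2019, 3, 18), datetime(2019, 3, 19))
print(FILTER)
print(QUERY)
```

The `ge` / `lt` pair gives the same inclusive-start, exclusive-end window as the tumbling window trigger, so consecutive runs do not re-read the boundary rows.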
[Diagram components: SAP NetWeaver library; ADF Self-hosted Integration Runtime]
Flexible control flow & scheduling to scale out (multiple copy activities, concurrency, partitions).
[Diagram: Data Factory with Azure IR; Self-hosted IR on premises inside the corporate firewall boundary; Self-hosted IR deployed on Azure VM inside an Azure Virtual Network; data stores, HDInsight, and Databricks; VNet service endpoints; Express Route (private peering)]
Copy Data Tool
Solution Template
ADF Copy Activity Overview https://docs.microsoft.com/azure/data-factory/copy-activity-overview
• Analytics and Integration for SAP Global Instance running on-premises with ADF
Customer case study:
• Reckitt Benckiser (RB): https://customers.microsoft.com/story/reckitt-benckiser-consumer-goods-power-bi
• Newell: https://customers.microsoft.com/story/newell-brands-consumer-goods-azure