SAP Data Integration Using Azure Data Factory

This document discusses using Azure Data Factory to integrate SAP data. It describes how ADF can ingest data from SAP HANA, SAP BW, SAP ECC and other SAP applications. The SAP HANA and SAP BW connectors in ADF support fast and scalable data extraction from large SAP databases using parallel loading. ADF provides a single tool for orchestrating data movement from various SAP and non-SAP sources to Azure data stores and services.


SAP Data Integration

Using Azure Data Factory


Updated: Jun 28, 2020
[Diagram: modern data warehouse flow (INGEST → PREPARE → TRANSFORM & ENRICH → SERVE → VISUALIZE, backed by STORE), ingesting SAP data, on-premises data, cloud data, and SaaS data, all governed by Data Pipeline Orchestration & Monitoring.]

Typical SAP data integration scenarios:
• Ongoing batch ETL from SAP to data lake
• Historical migration from SAP to Azure
Azure Data Factory
A fully managed data integration service for cloud-scale analytics in Azure

• Connected & Integrated: rich connectivity; built-in transformation; flexible orchestration; full integration with Azure data services
• Scalable & Cost-Effective: serverless scalability without infrastructure management; pay for use
• Secure & Compliant: certified compliance; enterprise-grade security; MSI and AKV support
• Productive: drag & drop UI; single-pane-of-glass monitoring; CI/CD model; code-free data transformation

SAP data ingestion
A single tool to enable data ingestion from SAP as well as various other sources, and data transformation via built-in Data Flow or integration with Databricks/HDInsight/etc., with Azure Machine Learning integration downstream.

Supported connectors:

Azure: Blob Storage, Cosmos DB – SQL API, Cosmos DB – MongoDB API, ADLS Gen1, ADLS Gen2, Data Explorer, Database for MariaDB, Database for MySQL, Database for PostgreSQL, File Storage, SQL Database, SQL Managed Instance, Synapse Analytics, Search Index, Table Storage

Database & DW: Amazon Redshift, DB2, Drill, Google BigQuery, Greenplum, HBase, Hive, Impala, Informix, MariaDB, Microsoft Access, MySQL, Netezza, Oracle, Phoenix, PostgreSQL, Presto, SAP BW Open Hub, SAP BW MDX, SAP HANA, SAP Table, Snowflake, Spark, SQL Server, Sybase, Teradata, Vertica

File Storage: Amazon S3, File System, FTP, Google Cloud Storage, HDFS, SFTP

File Formats: Avro, Binary, Common Data Model, Delimited Text, Excel, JSON, ORC, Parquet

NoSQL: Cassandra, Couchbase, MongoDB

Services & Apps: Amazon MWS, CDS for Apps, Concur, Dynamics 365, Dynamics AX, Dynamics CRM, Google AdWords, HubSpot, Jira, Magento, Marketo, Office 365, Oracle Eloqua, Oracle Responsys, Oracle Service Cloud, PayPal, QuickBooks, Salesforce, SF Service Cloud, SF Marketing Cloud, SAP C4C, SAP ECC, ServiceNow, SharePoint List, Shopify, Square, Web Table, Xero, Zoho

Generic: HTTP, OData, ODBC, REST
SAP Data Integration Overview
SAP HANA Connector
SAP Table Connector
SAP BW Open Hub Connector
SAP ECC Connector
SAP BW MDX Connector
More about Azure Data Factory Copy Activity
Resources
“I want to extract data from SAP HANA database” → ADF connector: SAP HANA (connector deep-dive below)
“I want to extract data from SAP BW” → suggested decision direction across ADF connector options:

SAP Table
• Objects to extract: Table (Transparent, Pooled, Cluster Table) and View
• SAP side configuration: N/A
• Performance: fast, with built-in parallel loading based on configurable partitioning
• Suitable workload: well-thought-through, large volume

SAP BW Open Hub
• Objects to extract: DSO, InfoCube, MultiProvider, DataSource, etc.
• SAP side configuration: SAP Open Hub Destination
• Performance: fast, with built-in parallel loading based on OHD-specific schema
• Suitable workload: well-thought-through, large volume

SAP BW via MDX
• Objects to extract: InfoCubes, QueryCubes
• SAP side configuration: N/A
• Performance: slower
• Suitable workload: exploratory, small volume

(Connector deep-dives follow.)


“I want to extract data from SAP ECC, S/4 HANA, or other SAP applications” → suggested decision direction across ADF connector options:

SAP Table
• Objects to extract: Table (Transparent, Pooled, Cluster Table) and View
• SAP side configuration: N/A
• Performance: fast, with built-in parallel loading
• Suitable workload: large volume

SAP ECC
• Objects to extract: OData entities exposed via SAP Gateway (BAPI, ODP)
• SAP side configuration: SAP Gateway
• Performance: slower
• Suitable workload: small volume

(Connector deep-dives follow.)


SAP HANA Connector

Supported versions
• All SAP HANA versions, on-prem or in the cloud

Supported SAP objects
• HANA Information Models (Analytic/Calculation views)
• Row & Column Tables

Supported authentications
• Basic – username & password
• Windows – Single Sign-On via Kerberos-constrained delegation

Mechanism and prerequisites
• Built on top of SAP’s HANA ODBC driver; pull data via custom query
• Run on Self-hosted Integration Runtime

Performance & Scalability
• Built-in parallel loading option based on configurable data partitioning (NEW)
• Handles TB-level data with hundreds of millions to billions of rows per run; observed several to several dozen MB/s (varies per customer’s data/environment)
[Diagram: SAP HANA → HANA ODBC driver → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]

For each copy activity run, ADF issues the specified query to the source to retrieve the data.

Source data
Single Copy Activity execution
Out-of-box optimization for SAP HANA: C1 C2 PartitionCol
e.g. set Parallel Copy = 4
… … … 10000
• Built-in parallel copy by partitions to
… … … 10001
boost performance for large table
ingestion. … … … …
… … … 30000
• Options of HANA physical table partition … … … 30001
and dynamic range partition.
… … … …
… … … 50000
… … … 50001
… … … …
… … … 70000
… … … 70001
… … … …
......
… … … …
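As a minimal sketch, a Copy activity source using the SAP HANA connector with dynamic range partitioning could look like the following (the schema, table, and partition column names are hypothetical placeholders; ADF replaces ?AdfHanaDynamicRangePartitionCondition with each partition's range at run time):

    "source": {
        "type": "SapHanaSource",
        "query": "SELECT * FROM \"MYSCHEMA\".\"MYTABLE\" WHERE ?AdfHanaDynamicRangePartitionCondition",
        "partitionOption": "SapHanaDynamicRange",
        "partitionSettings": {
            "partitionColumnName": "ID"
        }
    }

Setting partitionOption to PhysicalPartitionsOfTable instead derives the parallel ranges from the table's existing HANA partitions.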
Delta extraction based on a timestamp column (e.g. LastModifiedDate), using a dynamic query whose window bounds come from pipeline parameters:

    SELECT * FROM MyTable
    WHERE LastModifiedDate >= '@{formatDateTime(pipeline().parameters.windowStartTime, 'yyyy/MM/dd')}'
      AND LastModifiedDate < '@{formatDateTime(pipeline().parameters.windowEndTime, 'yyyy/MM/dd')}'

• Execution start time 2019/03/19 00:00:00 (window end time): delta extraction covers rows last modified between 2019/03/18 and 2019/03/19.
• Execution start time 2019/03/20 00:00:00 (window end time): delta extraction covers rows last modified between 2019/03/19 and 2019/03/20.
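A tumbling window trigger can supply those window parameters automatically. A minimal sketch, assuming a pipeline named CopySapHanaDelta with windowStartTime/windowEndTime parameters (the names are hypothetical):

    {
        "name": "DailySapHanaDeltaTrigger",
        "properties": {
            "type": "TumblingWindowTrigger",
            "typeProperties": {
                "frequency": "Hour",
                "interval": 24,
                "startTime": "2019-03-18T00:00:00Z",
                "maxConcurrency": 1
            },
            "pipeline": {
                "pipelineReference": {
                    "referenceName": "CopySapHanaDelta",
                    "type": "PipelineReference"
                },
                "parameters": {
                    "windowStartTime": "@trigger().outputs.windowStartTime",
                    "windowEndTime": "@trigger().outputs.windowEndTime"
                }
            }
        }
    }

Each 24-hour window runs the pipeline once with its own start/end boundaries, so a missed or failed window can be rerun without overlapping the others.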
SAP Table Connector

Supported versions
• SAP ECC or other applications in Business Suite version 7.01 and above, on-prem or in the cloud
• S/4 HANA

Supported SAP objects
• SAP Transparent Table, Pooled Table, Cluster Table and View

Supported server type
• Connect to Application Server or Message Server

Supported authentications
• Basic – username & password
• SNC (Secure Network Communications)

Mechanism and prerequisites
• Built on top of SAP .NET Connector 3.0; pull data via NetWeaver RFC with field selection & row filter
• Run on Self-hosted Integration Runtime

Performance & Scalability
• Built-in parallel loading option based on configurable data partitioning
• Handles TB-level data with tens of millions to billions of rows per run; observed several to ~20 MB/s (varies per customer’s data/environment)
Capabilities:
✓ Field/column selection
✓ Row filter using SAP query operators
✓ Use default /SAPDS/RFC_READ_TABLE2 or custom RFC function module to retrieve data
✓ Built-in partition + parallel load

[Diagram: SAP table → SAP .NET Connector → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
[Illustration: a single Copy Activity execution with Parallel Copy = 4 splits the SAP table into ranges on a partition column (e.g. calendar months 201809–201812) and copies the ranges concurrently.]

Tips:
• Enable partitioning when ingesting a large dataset, e.g. tens of millions of rows.
• To speed up, choose the proper partition column and number of partitions, and adjust parallel copies. Learn more.
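A minimal sketch of a Copy activity source combining these capabilities (the table fields, filter, and partition bounds are hypothetical values):

    "source": {
        "type": "SapTableSource",
        "rfcTableFields": "MATNR, MTART, ERSDA",
        "rfcTableOptions": "MTART EQ 'HAWA'",
        "partitionOption": "PartitionOnCalendarDate",
        "partitionSettings": {
            "partitionColumnName": "ERSDA",
            "partitionLowerBound": "20180901",
            "partitionUpperBound": "20181231",
            "maxPartitionsNumber": 4
        }
    }

Here rfcTableFields does the field/column selection, rfcTableOptions applies the row filter with SAP query operators, and the partition settings drive the built-in parallel load.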
Incremental extraction patterns:

Pattern I: “my data has a timestamp column, e.g. calendar date”
Solution: tumbling window trigger + dynamic query with system variables via the SAP Table option (filter)

Pattern II: “my data has an incremental column, e.g. ID / last copied date”
Solution: external control table/file + high watermark (see the sketch below)

Get started via the solution template.

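For Pattern II, the watermark read from the control table can be spliced into the row filter dynamically. A sketch, assuming a preceding Lookup activity named LookupWatermark that returns the last copied ID (all names hypothetical):

    "source": {
        "type": "SapTableSource",
        "rfcTableOptions": "ID GT '@{activity('LookupWatermark').output.firstRow.LastCopiedId}'"
    }

After a successful copy, a final activity writes the new high watermark back to the control table so the next run starts where this one ended.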

SAP BW Open Hub Connector

Supported versions
• SAP BW version 7.01 and above, on-prem or in the cloud

Supported SAP objects
• Open Hub Destination (OHD) local table
• Underlying objects can be DSO, InfoCube, MultiProvider, DataSource, etc.

Supported server type
• Connect to Application Server or Message Server (NEW)

Supported authentications
• Basic – username & password

Mechanism and prerequisites
• Built on top of SAP .NET Connector 3.0; pull data via NetWeaver RFC
• Run on ADF Self-hosted Integration Runtime
• SAP side config: create an SAP OHD in SAP BW to expose the data

Performance & Scalability
• Built-in parallel loading option based on OHD-specific schema
• Handles TB-level data with tens of millions to billions of rows per run; observed several to ~20 MB/s (varies per customer’s data/environment)
Capabilities:
✓ Base request ID for incremental copy, to filter out already-copied data
✓ Exclude last request, to avoid partial data
✓ Built-in parallel copy to boost performance based on OHD’s specific schema
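A minimal sketch of a Copy activity source using these options (the base request ID value is illustrative):

    "source": {
        "type": "SapOpenHubSource",
        "excludeLastRequest": true,
        "baseRequestId": 100
    }

With baseRequestId set, only requests whose ID is larger than 100 are extracted; a typical incremental pipeline looks up the highest request ID copied last time and passes it in dynamically.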
• What is OHD: the Open Hub Destination is the SAP BW object that exposes BW data for extraction by external tools.

[Diagram: SAP BW data flow: data is extracted from ECC via InfoPackage into the PSA, transformed via DTPs into Data Store Objects (DSO, with Activate) and further into Cubes (E-fact/F-fact tables), with InfoObject master data alongside; DTPs from these objects load the data into OHDs for external consumption.]
[Diagram: SAP BW Open Hub Destination table → SAP .NET Connector → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
[Illustration: each SAP BW DTP execution writes rows into the OHD table stamped with a unique, increasing Request ID (plus a Package ID and Record ID per row); a single Copy Activity execution with Parallel Copy = 4 reads the packages concurrently.]

Incremental copy guidance:
• Base request ID: set it to the highest request ID already copied (e.g. 100) so only newer requests (e.g. 200, 300) are extracted.
• Exclude last request ID: applicable if a DTP and the Copy may run at the same time, so a partially written request is not read.
SAP ECC Connector

Supported versions
• SAP ECC version 7.0 and above

Supported SAP objects
• Any entities exposed by SAP OData services
• BAPI, ODP (Data Extractors/DataSource), etc.

Supported authentications
• Basic – username & password

Mechanism and prerequisites
• Through OData + SAP Gateway
• Run on Self-hosted Integration Runtime if SAP is in a private network
• SAP side config: set up SAP Gateway, activate the OData service, and expose entities
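A minimal sketch of a Copy activity source for the ECC connector; the query carries standard OData query options, and the entity properties named here are hypothetical:

    "source": {
        "type": "SapEccSource",
        "query": "$top=1000&$select=Id,Name,LastModified"
    }

The OData entity itself is chosen in the dataset, as a path relative to the OData service URL in the linked service.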
[Diagram: SAP ECC → SAP Gateway → OData → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]

• If your ECC is publicly accessible, you can use the managed Azure Integration Runtime instead of a Self-hosted Integration Runtime.
• Tip: limit each run to under 1 million rows.
Incremental extraction patterns (in general, the same as for SAP HANA in earlier slides):

Pattern I: “my data has a timestamp column, e.g. last modified time”
Solution: tumbling window trigger + dynamic query with system variables via OData query

Pattern II: “my data has an incremental column, e.g. ID”
Solution: external control table/file + high watermark

Pattern III: “my data is small in size, e.g. dimension data”
Solution: full copy and overwrite
SAP BW MDX Connector

Supported versions
• SAP BW version 7.x, on-prem or in the cloud (e.g. on Azure)

Supported server type
• Connect to Application Server

Supported SAP objects
• InfoCubes and QueryCubes (including BEx queries)

Supported authentications
• Basic – username & password

Mechanism and prerequisites
• Built on top of the SAP NetWeaver library; pull data via RFC
• Run on Self-hosted Integration Runtime
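A minimal sketch of a Copy activity source for the MDX connector (the cube, measure, and dimension names are hypothetical):

    "source": {
        "type": "SapBwSource",
        "query": "SELECT [Measures].MEMBERS ON COLUMNS, NON EMPTY [0MATERIAL].[LEVEL01].MEMBERS ON ROWS FROM [$0D_SD_C03]"
    }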
[Diagram: SAP BW → SAP NetWeaver library → ADF Self-hosted Integration Runtime → pipeline → Azure data stores]
Scaling the Copy Activity:

• Flexible control flow & scheduling to scale out: a pipeline can run multiple copy activities with concurrency and partitions.
• Azure IR (cloud data stores → Azure data stores): elastic managed infrastructure to handle data at scale, with configurable DIUs per run.
• Self-hosted IR (on-prem data stores → Azure data stores): customer-managed infrastructure with scaling options (machine power, concurrency).
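A sketch of the scaling knobs on a single Copy activity (dataset and activity names are hypothetical; dataIntegrationUnits only takes effect on the Azure IR, while Self-hosted IR throughput is scaled via machine size and concurrent jobs):

    {
        "name": "CopySapTableToLake",
        "type": "Copy",
        "inputs": [ { "referenceName": "SapTableDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "AdlsParquetDataset", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "SapTableSource" },
            "sink": { "type": "ParquetSink" },
            "parallelCopies": 8,
            "dataIntegrationUnits": 16
        }
    }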
Self-hosted IR deployed on premises

[Diagram: Data Factory orchestrates a Self-hosted IR installed on premises, next to the data stores inside the corporate firewall boundary; the IR pushes data to Azure Storage, Azure SQL DB, Azure SQL DW, HDInsight, and Databricks, which can be locked down via VNet/ACL rules.]
Self-hosted IR deployed on Azure VM

[Diagram: the Self-hosted IR runs on an Azure VM inside an Azure Virtual Network; on-prem data stores behind the corporate firewall boundary are reached over ExpressRoute private peering, Azure Storage / Azure SQL DB / Azure SQL DW are reached via VNet service endpoints and ACLs, and HDInsight/Databricks sit in the VNet.]
Self-hosted IR deployed on Azure VM (cloud-only variant)

[Diagram: same layout, with the data stores themselves inside the Azure Virtual Network alongside the Self-hosted IR on an Azure VM; Azure Storage / Azure SQL DB / Azure SQL DW via VNet service endpoints and ACLs; HDInsight/Databricks in the VNet.]
Get started via the Copy Data Tool or solution templates.

Resources
ADF Copy Activity Overview https://docs.microsoft.com/azure/data-factory/copy-activity-overview

SAP HANA Connector https://docs.microsoft.com/azure/data-factory/connector-sap-hana

SAP Table Connector https://docs.microsoft.com/azure/data-factory/connector-sap-table

SAP BW Open Hub Connector https://docs.microsoft.com/azure/data-factory/connector-sap-business-warehouse-open-hub

SAP BW MDX Connector https://docs.microsoft.com/azure/data-factory/connector-sap-business-warehouse

SAP ECC Connector https://docs.microsoft.com/azure/data-factory/connector-sap-ecc

SAP C4C Connector https://docs.microsoft.com/azure/data-factory/connector-sap-cloud-for-customer

Customer case studies
• Analytics and Integration for SAP Global Instance running on-premises with ADF
• Reckitt Benckiser (RB): https://customers.microsoft.com/story/reckitt-benckiser-consumer-goods-power-bi
• Newell: https://customers.microsoft.com/story/newell-brands-consumer-goods-azure
