DP203 - 216 Questions
1
You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to
an Azure Blob Storage account. You need to output the count of records received from the last five
minutes every minute. Which windowing function should you use?
a) Session
b) Tumbling
c) Sliding
d) Hopping
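For reference, a minimal Stream Analytics sketch of the hopping-window approach, assuming a hypothetical input alias EventHubInput, output alias BlobOutput, and event-time column EventEnqueuedUtcTime:

SELECT COUNT(*) AS RecordCount
INTO BlobOutput
FROM EventHubInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY HoppingWindow(minute, 5, 1)  -- a 5-minute window that hops forward every 1 minute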
DP-203: Exam Q&A Series – Part 2
2
You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Users will
query data by using a variety of services including Azure Databricks and Azure Synapse Analytics
serverless SQL pools. The data will be secured by subject area. Most queries will include data from
the current year or current month. Which folder structure should you recommend to support fast
queries and simplified folder security?
a) /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv
b) /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
c) /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
d) /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
DP-203: Exam Q&A Series – Part 2
3
You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The
solution must meet the customer sentiment analytic requirements. Which three Transact-SQL DDL
commands should you run in sequence?
To answer, move the appropriate commands from the list of commands to the answer area and
arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the
correct orders you select.
Commands:
• CREATE EXTERNAL DATA SOURCE
• CREATE EXTERNAL FILE FORMAT
• CREATE EXTERNAL TABLE
• CREATE EXTERNAL TABLE AS SELECT
• CREATE EXTERNAL SCOPED CREDENTIALS
Answer Area (in order):
1. CREATE EXTERNAL DATA SOURCE
2. CREATE EXTERNAL FILE FORMAT
3. CREATE EXTERNAL TABLE AS SELECT
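A minimal T-SQL sketch of that DDL order; the storage location, credential handling, and object names below are hypothetical:

CREATE EXTERNAL DATA SOURCE TwitterStorage
WITH (TYPE = HADOOP, LOCATION = 'abfss://twitterdata@mystorageaccount.dfs.core.windows.net');

CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- CREATE EXTERNAL TABLE AS SELECT (CETAS) writes the query result to storage and defines the external table
CREATE EXTERNAL TABLE dbo.TwitterFeedExternal
WITH (LOCATION = '/curated/tweets/', DATA_SOURCE = TwitterStorage, FILE_FORMAT = CsvFileFormat)
AS SELECT * FROM dbo.StagedTweets;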
DP-203: Exam Q&A Series – Part 2
4
You have created an external table named ExtTable in Azure Data Explorer. A database user now needs to run a KQL (Kusto Query Language) query on this external table. Which of the following functions should the user use to refer to this table?
a) external_table()
b) access_table()
c) ext_table()
d) None of the above
DP-203: Exam Q&A Series – Part 2
5
You work as a data engineer at a company that wants you to ingest data onto cloud data platforms in Azure. Which data processing framework will you use?
a) Online transaction processing (OLTP)
b) Extract, transform, and load (ETL)
c) Extract, load, and transform (ELT)
ELT is a typical process for ingesting data from an on-premises database into the Azure cloud.
DP-203: Exam Q&A Series – Part 2
6
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb. You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace:

CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet

You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data:

EmployeeName: Peter, EmployeeID: 1001, EmployeeStartDate: 28-July-2022

One minute later, you execute the following query from a serverless SQL pool in MyWorkspace. What will be returned by the query?

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE name = 'Peter';

a) 24
b) an error
c) a null value
DP-203: Exam Q&A Series – Part 2
7
In structured data, you define the data type at query time.
True False
8
In unstructured data, you define the data type at query time.
True False
The schema of unstructured data is typically defined at query time. This means
that data can be loaded onto a data platform in its native format.
Parquet stores data in columns. By their very nature, column-oriented data stores are optimized for read-heavy analytical workloads. Avro stores data in a row-based format. Row-based databases are best for write-heavy transactional workloads. An Avro schema is created using JSON format. Avro format supports timestamps.
You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a session window that uses a timeout size of 10 seconds.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 3
14
You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a sliding window, and you set the window size to 10 seconds. Does this meet the
goal?
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 3
15
You are designing an Azure Stream Analytics solution that will analyze Twitter data. You need to
count the tweets in each 10-second window. The solution must ensure that each tweet is counted
only once.
Solution: You use a tumbling window, and you set the window size to 10 seconds. Does this meet
the goal?
Does this meet the goal?
Yes No
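A sketch of the tumbling-window query that meets this requirement, assuming a hypothetical input alias TwitterStream and event-time column CreatedAt:

SELECT COUNT(*) AS TweetCount
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(second, 10)  -- fixed, non-overlapping 10-second windows, so each tweet is counted once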
DP-203: Exam Q&A Series – Part 3
16
What are the key components of Azure Data Factory? [Select all options that are applicable]
a) Database
b) Connection String
c) Pipelines
d) Activities
e) Datasets
f) Linked services
g) Data Flows
h) Integration Runtimes
DP-203: Exam Q&A Series – Part 3
17
Which of the following are valid trigger types of Azure Data Factory? [Select all options that are applicable]
a) Monthly Trigger
b) Schedule Trigger
c) Overlap Trigger
d) Tumbling Window Trigger
e) Event-based Trigger
DP-203: Exam Q&A Series – Part 3
18
You are designing an Azure Stream Analytics solution that receives instant messaging data from
an Azure Event Hub. You need to ensure that the output from the Stream Analytics job counts the
number of messages per time zone every 15 seconds. How should you complete the Stream
Analytics query? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
SELECT TimeZone, COUNT(*) AS MessageCount
FROM MessageStream [answer choice] CreatedAt
Drop-down options: Last, Over, SYSTEM.TIMESTAMP(), TIMESTAMP BY
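One way the completed query could read, assuming TIMESTAMP BY is selected and that the GROUP BY clause (not shown in this extract) uses a 15-second tumbling window:

SELECT TimeZone, COUNT(*) AS MessageCount
FROM MessageStream TIMESTAMP BY CreatedAt
GROUP BY TimeZone, TumblingWindow(second, 15)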
Yes No
20
Duplicating customer content for redundancy and meeting service-level agreements (SLAs) is
Azure High availability.
Yes No
DP-203: Exam Q&A Series – Part 3
21
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts.
Contacts contains a column named Phone. You need to ensure that users in a specific role only
see the last four digits of a phone number when querying the Phone column. What should you
include in the solution?
a) column encryption
b) dynamic data masking
c) a default value
d) table partitions
e) row level security (RLS)
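A hedged T-SQL sketch of dynamic data masking on the Phone column; the mask pattern shown is an illustrative assumption:

ALTER TABLE dbo.Contacts
ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0,"XXX-XXX-",4)');
-- Users without the UNMASK permission see only the last four digits of Phone.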
DP-203: Exam Q&A Series – Part 3
22
A company has a data lake that is accessible only via an Azure virtual network. You are building a SQL pool in Azure Synapse that will use data from the data lake, and data is planned to be loaded into the SQL pool every hour. You need to make sure that the SQL pool can load the data from the data lake. Which TWO actions should you perform?
a) Create a service principal
b) Create a managed identity
c) Add an Azure Active Directory Federation Service ( ADFS ) account
d) Configure managed identity as credentials for the data loading process
DP-203: Exam Q&A Series – Part 3
23
You have an Azure Data Lake Storage Gen2 container. Data is ingested into the container, and then
transformed by a data integration application. The data is NOT modified after that. Users can read files in
the container but cannot modify the files. You need to design a data archiving solution that meets the
following requirements:
• New data is accessed frequently and must be available as quickly as possible.
• Data that is older than five years is accessed infrequently but must be available within one second when
requested.
• Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the
lowest cost possible.
• Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
Five-Year-old data Seven-Year-old data
Delete the Blob Delete the Blob
Move to Hot storage Move to Hot storage
Move to Cool storage Move to Cool storage
Move to Archive storage Move to Archive storage
DP-203: Exam Q&A Series – Part 3
24
As a data engineer, you need to suggest a Stream Analytics data output format that ensures the queries from Databricks and PolyBase against the files encounter as few errors as possible. The solution should ensure that the files can be queried quickly and that the data type information is kept intact. What should you suggest?
a) JSON
b) XML
c) Avro
d) Parquet
DP-203: Exam Q&A Series – Part 3
25
Which role works with Azure Cognitive Services, Cognitive Search, and the Bot Framework?
a) A data engineer
b) A data scientist
c) An AI engineer
DP-203: Exam Q&A Series – Part 4
26
Which role is responsible for the provisioning and configuration of both on-premises and cloud data platform technologies?
a) A data engineer
b) A data scientist
c) An AI engineer
27
Who performs advanced analytics to help drive value from data?
a) A data engineer
b) A data scientist
c) An AI engineer
DP-203: Exam Q&A Series – Part 4
28
Choose the valid examples of structured data.
a) Microsoft SQL Server
b) Binary Files
c) Azure SQL Database
d) Audio Files
e) Azure SQL Data Warehouse
f) Image Files
DP-203: Exam Q&A Series – Part 4
29
Choose the valid examples of unstructured data.
a) Microsoft SQL Server
b) Binary Files
c) Azure SQL Database
d) Audio Files
e) Azure SQL Data Warehouse
f) Image Files
DP-203: Exam Q&A Series – Part 4
30
Azure Databricks is a:
a) data analytics platform
b) AI platform
c) Data ingestion platform
DP-203: Exam Q&A Series – Part 4
31
Azure Databricks encapsulates which Apache Storage technology?
a) Apache HDInsight
b) Apache Hadoop
c) Apache Spark
Shared Access Keys are a security feature used within Azure storage accounts.
Azure Active Directory and Role-based access are supported security features in
Azure Databricks.
DP-203: Exam Q&A Series – Part 4
33
Which of the following Azure Databricks components provides support for R, SQL, Python, Scala, and Java?
a) MLlib
b) GraphX
c) Spark Core API
a) MLlib is the Machine Learning library consisting of common learning algorithms and
utilities, including classification, regression, clustering, collaborative filtering,
dimensionality reduction, as well as underlying optimization primitives.
b) GraphX provides graphs and graph computation for a broad scope of use cases from
cognitive analytics to data exploration.
c) The Spark Core API provides support for R, SQL, Python, Scala, and Java in Azure Databricks.
DP-203: Exam Q&A Series – Part 4
34
Which Notebook format is used in Databricks?
a) DBC
b) .notebook
c) .spark
DBC file types are the supported Databricks notebook format. There is no .notebook or .spark file format available.
DP-203: Exam Q&A Series – Part 4
35
You configure version control for an Azure Data Factory instance as shown in the following exhibit
Use the drop-down menus to select the answer
choice that completes each statement based on
the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Azure Resource Manager (ARM) templates for the pipeline's assets are stored in: [answer choice]
• adf_publish
• main
• Parameterization template
A Data Factory Azure Resource Manager (ARM) template named contososales can be found in: [answer choice]
• /contososales
• /dw_batchetl/adf_publish/contososales
• /main
DP-203: Exam Q&A Series – Part 4
36
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file
contains the same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
• Provide the fastest possible query times.
• Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the
answer area.
NOTE: Each correct selection is worth one point.
Copy behavior: [answer choice]
• Flatten hierarchy
• Merge files
• Preserve hierarchy
Sink file type: [answer choice]
• csv
• json
• Parquet
• TXT
DP-203: Exam Q&A Series – Part 4
37
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as
shown in the following exhibit. All the dimension tables will be less than 2 GB after compression, and the
fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts
and updates. Which type of table should you use for each table? To answer, select the appropriate options
in the answer area. NOTE: Each correct selection is worth one point.
Dim_Customer: Hash Distributed / Round Robin / Replicated
Dim_Employee: Hash Distributed / Round Robin / Replicated
Dim_Time: Hash Distributed / Round Robin / Replicated
Fact_DailyBookings: Hash Distributed / Round Robin / Replicated
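For context, a brief dedicated SQL pool sketch of the pattern the answer choices describe (small, static dimension tables replicated; the large fact table hash-distributed); the column definitions are hypothetical:

CREATE TABLE dbo.Dim_Customer
(   CustomerKey  int           NOT NULL,
    CustomerName nvarchar(100) NOT NULL )
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);

CREATE TABLE dbo.Fact_DailyBookings
(   BookingKey    bigint        NOT NULL,
    CustomerKey   int           NOT NULL,
    BookingAmount decimal(18,2) NOT NULL )
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX);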
DP-203: Exam Q&A Series – Part 4
38
You are designing a data engineering solution for data stream processing. You need to recommend a
solution for data ingestion, in order to meet the following requirements:
• Ingest millions of events per second
• Easily scale from streaming megabytes of data to terabytes while keeping control over when and how
much to scale
• Integrate with Azure Functions
• Natively connected with Stream Analytics to build an end-to-end serverless streaming solution.
What would you recommend?
a) Azure Cosmos DB
b) Apache Spark
c) Azure Synapse Analytics
d) Azure Event Hubs
DP-203: Exam Q&A Series – Part 4
39
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source
big data solution to collect, process, and maintain data. The analytical data store performs poorly.
You must implement a solution that meets the following requirements:
• Provide data warehousing
• Reduce ongoing management activities
• Deliver SQL query responses in less than one second
You need to create an HDInsight cluster to meet the requirements. Which type of cluster should you create?
a) Apache HBase
b) Apache Hadoop
c) Interactive Query
d) Apache Spark
Apache Spark supports:
• Interactive queries through Spark SQL
• Data-warehousing capabilities
• Less management because these are out-of-the-box features
DP-203: Exam Q&A Series – Part 4
40
Which data platform technology is a globally distributed, multi-model database that can perform queries in
less than a second?
a) SQL Database
b) Azure SQL database
c) Apache Hadoop
d) Cosmos DB
e) Azure SQL Synapse
Yes No
43
Azure Storage is the least expensive choice when you want to store data but don't need to query it?
Yes No
DP-203: Exam Q&A Series – Part 5
44
Unstructured data is stored in nonrelational systems, commonly called unstructured or NoSQL systems.
Yes No
Examples of unstructured data include binary, audio, and image files. Unstructured data
is stored in nonrelational systems, commonly called unstructured or NoSQL systems. In
nonrelational systems, the data structure isn't defined at design time, and data is
typically loaded in its raw format. The data structure is defined only when the data is
read.
DP-203: Exam Q&A Series – Part 5
45
You are designing an Azure Stream Analytics job to process incoming events from sensors in retail
environments. You need to process the events to produce a running average of shopper counts
during the previous 15 minutes, calculated at five-minute intervals. Which type of window should
you use?
a) snapshot
b) tumbling
c) hopping
d) sliding
DP-203: Exam Q&A Series – Part 5
46
You are implementing an Azure Data Lake Gen2 storage account. You need to ensure that data will
be accessible for both read and write operations, even if an entire data center (zonal or non-zonal)
becomes unavailable. Which kind of replication would you use for the storage account? (Choose
the solution with minimum cost)
a) Locally-redundant storage (LRS)
b) Zone-redundant storage (ZRS)
c) Geo-redundant storage (GRS)
d) Geo-zone-redundant storage (GZRS)
DP-203: Exam Q&A Series – Part 5
47
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data. You need to
ensure that the data in the container is available for read workloads in a secondary region if an
outage occurs in the primary region. The solution must minimize costs. Which type of data
redundancy should you use?
a) geo-redundant storage (GRS)
b) read-access geo-redundant storage (RA-GRS)
c) zone-redundant storage (ZRS)
d) locally-redundant storage (LRS)
DP-203: Exam Q&A Series – Part 5
48
You plan to implement an Azure Data Lake Gen 2 storage account. You need to ensure that the
data lake will remain available if a data center fails in the primary Azure region. The solution must
minimize costs. Which type of replication should you use for the storage account?
The ABS-AQS connector provides an optimized file source that uses Azure Queue
Storage (AQS) to find new files written to an Azure Blob storage (ABS) container without
repeatedly listing all of the files. This provides two advantages:
a) Lower latency: no need to list nested directory structures on ABS, which is slow and
resource intensive.
b) Lower costs: no more costly LIST API requests made to ABS.
DP-203: Exam Q&A Series – Part 5
54
You have a partitioned table in an Azure Synapse Analytics dedicated SQL pool. You need to design
queries to maximize the benefits of partition elimination. What should you include in the Transact-
SQL queries?
a) JOIN
b) WHERE
c) DISTINCT
d) GROUP BY
When you add the "WHERE" clause to your T-SQL query it allows the query optimizer
accesses only the relevant partitions to satisfy the filter criteria of the query - which is
what partition elimination is all about
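As an illustration, assuming a hypothetical fact table partitioned on an OrderDateKey column, a filter on the partitioning column lets the optimizer skip all other partitions:

SELECT SUM(SalesAmount) AS MonthlySales
FROM dbo.FactSales
WHERE OrderDateKey >= 20240101 AND OrderDateKey < 20240201;  -- only the partitions covering January 2024 are scanned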
DP-203: Exam Q&A Series – Part 5
55
You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The
table contains 50 columns and 5 billion rows and is a heap. Most queries against the table
aggregate values from approximately 100 million rows and return only two columns. You discover
that the queries against the fact table are very slow. Which type of index should you add to provide
the fastest query times?
a) nonclustered columnstore
b) clustered columnstore
c) nonclustered
d) clustered
Clustered columnstore indexes are one of the most efficient ways you can store your data in a dedicated SQL pool. Columnstore tables won't benefit a query unless the table has more than 60 million rows.
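A one-line sketch of converting such a heap to a clustered columnstore index (the table and index names are hypothetical):

CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales;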
DP-203: Exam Q&A Series – Part 6
56
You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool. How
should you complete the Transact-SQL statement? To answer, drag the appropriate values to the
correct targets. Each value may be used once, more than once, or not at all. You may need to drag
the split bar between panes or scroll to view content.
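The drag-and-drop values are not shown in this extract, but as a hedged reference, a representative partitioned-table definition in a dedicated SQL pool looks like the following (all names and boundary values are hypothetical):

CREATE TABLE dbo.FactTransactions
(   TransactionID   bigint        NOT NULL,
    TransactionDate date          NOT NULL,
    Amount          decimal(18,2) NOT NULL )
WITH
(   DISTRIBUTION = HASH(TransactionID),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( TransactionDate RANGE RIGHT FOR VALUES ('2023-01-01', '2023-02-01', '2023-03-01') ) );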
When moving data between Azure data platform technologies, the Azure integration runtime is used when copying data between two Azure data platforms.
DP-203: Exam Q&A Series – Part 6
61
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two
outputs. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 6
62
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one query, and two outputs. Does this
meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 6
63
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input,
two queries, and four outputs. Does this meet the goal?
Yes No
• We need one reference data input for LocationIncomes, which rarely changes.
• We need two queries, one for in-store customers, and one for online customers.
• For each query, two outputs are needed. That makes a total of four outputs.
DP-203: Exam Q&A Series – Part 6
64
You have an Azure Data Lake Storage account that contains a staging zone. You need to design a
daily process to ingest incremental data from the staging zone, transform the data by executing an
R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data
to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 6
65
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
DP-203: Exam Q&A Series – Part 6
66
You have an Azure Data Factory that contains 10 pipelines. You need to label each pipeline with its
main purpose of either ingest, transform, or load. The labels must be available for grouping and
filtering when using the monitoring experience in Data Factory. What should you add to each
pipeline?
a) a resource tag
b) a user property
c) an annotation
d) a run group ID
e) a correlation ID
• Annotations are additional, informative tags that you can add to specific
factory resources: pipelines, datasets, linked services, and triggers. By
adding annotations, you can easily filter and search for specific factory
resources.
DP-203: Exam Q&A Series – Part 6
67
You have an Azure Storage account and an Azure SQL data warehouse in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data
Factory. The solution must meet the following requirements:
• Ensure that the data remains in the UK South region at all times.
• Minimize administrative effort.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Self-hosted integration runtime
c) Azure-SSIS integration runtime
DP-203: Exam Q&A Series – Part 6
68
You are planning to use Azure Databricks clusters for a single user. Which type of Databricks
cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 6
69
You are planning to use Azure Databricks clusters that provide fine-grained sharing for maximum
resource utilization and minimum query latencies. It should also be a managed cloud resource.
Which type of Databricks cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 6
70
You are planning to use an Azure Databricks cluster that has no workers and runs Spark jobs on the driver node. Which type of Databricks cluster should you use?
a) Standard
b) Single Node
c) High Concurrency
DP-203: Exam Q&A Series – Part 7
71
Which Azure Data Factory component orchestrates a transformation job or runs a data movement
command?
a) Linked Services
b) Datasets
c) Activities
Linked Services are objects that are used to define the connection to data stores or
compute resources in Azure.
DP-203: Exam Q&A Series – Part 7
72
You have an Azure virtual machine that has Microsoft SQL Server installed. The server contains a
table named Table1. You need to copy the data from Table1 to an Azure Data Lake Storage Gen2
account by using an Azure Data Factory V2 copy activity.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Self-hosted integration runtime
c) Azure-SSIS integration runtime
DP-203: Exam Q&A Series – Part 7
73
Which browsers are recommended for best use with Azure Databricks?
a) Google Chrome
b) Firefox
c) Safari
d) Microsoft Edge
e) Internet Explorer
f) Mobile browsers
DP-203: Exam Q&A Series – Part 7
74
How do you connect your Spark cluster to the Azure Blob?
a) By calling the .connect() function on the Spark Cluster.
b) By mounting it
c) By calling the .connect() function on the Azure Blob
DP-203: Exam Q&A Series – Part 7
75
How does Spark connect to databases like MySQL, Hive and other data stores?
a) JDBC
b) ODBC
c) Using the REST API Layer
JDBC stands for Java Database Connectivity. It is a Java API for connecting to databases
such as MySQL, Hive, and other data stores. ODBC is not an option, and the REST API
Layer is not available
DP-203: Exam Q&A Series – Part 7
76
You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake
Storage Gen2 container. Which resource provider should you enable?
a) Microsoft.Sql
b) Microsoft.Automation
c) Microsoft.EventGrid
d) Microsoft.EventHub
DP-203: Exam Q&A Series – Part 7
77
You plan to perform batch processing in Azure Databricks once daily. Which Azure Databricks
Cluster should you choose?
a) High Concurrency
b) interactive
c) automated
Azure Databricks has two types of clusters: interactive and automated.
• You use interactive clusters to analyze data collaboratively with interactive notebooks.
• You use automated clusters to run fast and robust automated jobs.
DP-203: Exam Q&A Series – Part 7
78
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
DP-203: Exam Q&A Series – Part 7
79
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be
stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and
PolyBase in Azure Synapse Analytics. You need to recommend a Stream Analytics data output
format to ensure that the queries from Databricks and PolyBase against the files encounter the
fewest possible errors. The solution must ensure that the files can be queried quickly, and that the
data type information is retained. What should you recommend?
a) JSON
b) Parquet
c) CSV
d) Avro
DP-203: Exam Q&A Series – Part 7
80
You have a self-hosted integration runtime in Azure Data Factory. The current status of the integration runtime has the following configurations:
• Status: Running
• Type: Self-Hosted
• Version: 4.4.7292.1
• Running / Registered Node(s): 1/1
• High Availability Enabled: False
• Linked Count: 0
• Queue Length: 0
• Average Queue Duration: 0.00s
The integration runtime has the following node details:
• Name: X-M
• Status: Running
• Version: 4.4.7292.1
• Available Memory: 7697MB
• CPU Utilization: 6%
• Network (In/Out): 1.21KBps/0.83KBps
• Concurrent Jobs (Running/Limit): 2/14
• Role: Dispatcher/Worker
• Credential Status: In Sync
If the X-M node becomes unavailable, all executed pipelines will: [answer choice]
• fail until the node comes back online
• switch to another integration runtime
• exceed the CPU limit
The number of concurrent jobs and the CPU usage indicate that the Concurrent Jobs (Running/Limit) value should be: [answer choice]
• Raised
• Lowered
• Left AS-IS
Answers: fail until the node comes back online; Lowered. High Availability Enabled is False, so if the X-M node becomes unavailable, executed pipelines fail until the node comes back online. You are paying for 14 concurrent jobs but are only using 2, and you are using only 6% of the CPU you have purchased, so you are paying for 94% that you do not use.
DP-203: Exam Q&A Series – Part 7
81
You have an Azure Databricks resource. You need to log actions that relate to compute changes
triggered by the Databricks resources. Which Databricks services should you log?
a) workspace
b) SSH
c) DBFS
d) clusters
e) jobs
An Azure Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads.
Incorrect answers:
a) An Azure Databricks workspace is an environment for accessing all of your Azure Databricks assets. The workspace organizes objects (notebooks, libraries, and experiments) into folders, and provides access to data and computational resources such as clusters and jobs.
b) SSH allows you to log into Apache Spark clusters remotely.
c) Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters.
e) A job is a way of running a notebook or JAR either immediately or on a scheduled basis.
DP-203: Exam Q&A Series – Part 7
82
Which Azure data platform is commonly used to process data in an ELT framework?
a) Azure Data Factory
b) Azure Databricks
c) Azure Data Lake Storage
DP-203: Exam Q&A Series – Part 7
83
Which Azure service is the best choice to manage and govern your data?
a) Azure Data Factory
b) Azure Purview
c) Azure Data Lake Storage
DP-203: Exam Q&A Series – Part 7
84
Applications that publish messages to Azure Event Hub very frequently will get the best
performance using Advanced Message Queuing Protocol (AMQP) because it establishes a
persistent socket.
True False
DP-203: Exam Q&A Series – Part 7
85
You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a
partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching
table and partition definitions. You need to overwrite the content of the first partition in dbo.Sales
with the content of the same partition in stg.Sales. The solution must minimize load times.
What should you do?
a) Insert the data from stg.Sales into dbo.Sales.
b) Switch the first partition from dbo.Sales to stg.Sales.
c) Switch the first partition from stg.Sales to dbo.Sales.
d) Update dbo.Sales from stg.Sales.
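For context, the T-SQL partition-switch pattern that loads a staged partition into a target partition is sketched below; the partition number is an assumption:

ALTER TABLE stg.Sales SWITCH PARTITION 1 TO dbo.Sales PARTITION 1 WITH (TRUNCATE_TARGET = ON);
-- TRUNCATE_TARGET = ON replaces the existing content of the target partition in dbo.Sales.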
DP-203: Exam Q&A Series – Part 8
86
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
contain the following three workloads:
• A workload for data engineers who will use Python and SQL
• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
• A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data
engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120
minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a High Concurrency cluster for each data scientist, a High Concurrency cluster
for the data engineers, and a Standard cluster for the jobs. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 8
87
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
contain the following three workloads:
• A workload for data engineers who will use Python and SQL
• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
• A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data
engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120
minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the
data engineers, and a High Concurrency cluster for the jobs. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 8
88
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will
contain the following three workloads:
• A workload for data engineers who will use Python and SQL
• A workload for jobs that will run notebooks that use Python, Spark, Scala, and SQL
• A workload that data scientists will use to perform ad hoc analysis in Scala and R
The enterprise architecture team identifies the following standards for Databricks environments:
• The data engineers must share a cluster.
• The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
• All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a Standard cluster for the jobs. Does this meet the goal?
Yes No
There is no need for a High Concurrency cluster for each data scientist. Standard clusters are recommended for a single user and can run workloads developed in any language: Python, R, Scala, and SQL. A High Concurrency cluster is a managed cloud resource; the key benefit of High Concurrency clusters is that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.
DP-203: Exam Q&A Series – Part 8
89
If an Event Hub goes offline before a consumer group can process the events it holds, those
events will be lost.
True False
Each consumer group has its own cursor maintaining its position within the
partition. The consumer groups can resume processing at their cursor
position when the Event Hub is again available.
DP-203: Exam Q&A Series – Part 8
90
You are a Data Engineer for Contoso. You want to view key health metrics of your Stream Analytics
jobs. Which tool in Streaming Analytics should you use?
a) Dashboards
b) Alerts
c) Diagnostics
a) Dashboards are used to view the key health metrics of your Stream Analytics jobs.
b) Alerts enable proactive detection of issues in Stream Analytics.
c) Diagnostic logging is turned off by default and can help with root-cause
analysis in production deployments.
DP-203: Exam Q&A Series – Part 8
91
You are designing a real-time dashboard solution that will visualize streaming data from remote
sensors that connect to the internet. The streaming data must be aggregated to show the average
value of each 10-second interval. The data will be discarded after being displayed in the
dashboard. The solution will use Azure Stream Analytics and must meet the following
requirements:
- Minimize latency from an Azure Event hub to the dashboard.
- Minimize the required storage.
- Minimize development effort.
What should you include in the solution?
Azure Stream Analytics input type: Azure Event Hub / Azure SQL Database / Azure Stream Analytics / Azure Power BI
Azure Stream Analytics output type: Azure Event Hub / Azure SQL Database / Azure Stream Analytics / Azure Power BI
Aggregation query location: Azure Event Hub / Azure SQL Database / Azure Stream Analytics / Azure Power BI
DP-203: Exam Q&A Series – Part 8
92
Publishers can use either HTTPS or AMQP. AMQP opens a socket and can send multiple messages
over that socket. How many default partitions are available?
a) 1
b) 2
c) 4
d) 8
e) 12
You need to distribute the large fact table across multiple nodes to optimize performance of the
table. Which technology should you use?
a) hash distributed table with clustered index
b) hash distributed table with clustered Columnstore index
c) round robin distributed table with clustered index
d) round robin distributed table with clustered Columnstore index
DP-203: Exam Q&A Series – Part 8
95
You have an enterprise data warehouse in Azure Synapse Analytics. Using PolyBase, you create an
external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2
without importing the data to the data warehouse. The external table has three columns. You
discover that the Parquet files have a fourth column named ItemID. Which command should you
run to add the ItemID column to the external table?
[Answer choices a, b, c, and d are shown as code exhibits.]
DP-203: Exam Q&A Series – Part 8
96
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a
complex SELECT query that contains multiple JOIN and CASE statements to transform data for use
in inventory reports. The inventory reports will use the data and additional WHERE parameters
depending on the report. The reports will be produced once daily. You need to implement a
solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
a) an ordered clustered columnstore index
b) a materialized view
c) result set caching
d) a replicated table
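A hedged sketch of a materialized view in a dedicated SQL pool; the view, table, and column names are hypothetical:

CREATE MATERIALIZED VIEW dbo.mvInventorySummary
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT ProductKey,
       COUNT_BIG(*)             AS RowCnt,
       SUM(ISNULL(Quantity, 0)) AS TotalQuantity
FROM dbo.FactInventory
GROUP BY ProductKey;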
[Exhibit: a folder structure under /topfolder/ that contains File1.csv, File2.csv, File3.csv, and File4.csv.]
You create an external table named ExtTable that has LOCATION='/topfolder/’. When you query
ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
a) File2.csv and File3.csv only
b) File1.csv and File4.csv only
c) File1.csv, File2.csv, File3.csv, and File4.csv
d) File1.csv only
DP-203: Exam Q&A Series – Part 9
108
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics.
SalesFact contains sales data from the past 36 months and has the following characteristics:
• Is partitioned by month
• Contains one billion rows
• Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible. Which three actions should you perform in sequence in a stored procedure?
Commands:
• Switch the partition containing the stale data from SalesFact to SalesFact_Work.
• Create an empty table named SalesFact_Work that has the same schema as SalesFact.
• Truncate the partition containing the stale data.
• Drop the SalesFact_Work table.
• Execute a DELETE statement where the value in the Date column is more than 36 months ago.
• Copy the data to a new table by using CREATE TABLE AS SELECT (CTAS).
Answer Area (in order):
1. Create an empty table named SalesFact_Work that has the same schema as SalesFact.
2. Switch the partition containing the stale data from SalesFact to SalesFact_Work.
3. Drop the SalesFact_Work table.
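A hedged T-SQL sketch of that three-step sequence; the distribution column, partition boundaries, and partition number are assumptions, and the work table's partitioning must match SalesFact for the switch to succeed:

-- 1. Create an empty work table with the same schema and partitioning as SalesFact.
CREATE TABLE dbo.SalesFact_Work
WITH
(   DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( DateKey RANGE RIGHT FOR VALUES (20220101, 20220201) ) )
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;

-- 2. Switch the partition that holds the stale data out of SalesFact.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;

-- 3. Drop the work table, discarding the stale data.
DROP TABLE dbo.SalesFact_Work;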
DP-203: Exam Q&A Series – Part 9
109
You develop data engineering solutions for a company. A project requires analysis of real-time
Twitter feeds. Posts that contain specific keywords must be stored and processed on Microsoft
Azure and then displayed by using Microsoft Power BI. You need to implement the solution. Which
five actions should you perform in sequence?
Actions:
• Create a Jupyter Notebook
• Run a job that uses the Spark Streaming API to ingest data from Twitter
• Create a Runbook
• Create an HDInsight cluster with the Spark cluster type
• Create a table
• Load the hvac table to Power BI Desktop
Answer Area (in order):
1. Create an HDInsight cluster with the Spark cluster type
2. Create a Jupyter Notebook
3. Create a table
4. Run a job that uses the Spark Streaming API to ingest data from Twitter
5. Load the hvac table to Power BI Desktop
DP-203: Exam Q&A Series – Part 9
110
You have an Azure SQL database named DB1 in the East US 2 region. You need to build a
secondary geo-replicated copy of DB1 in the West US region on a new server. Which three actions
should you perform in sequence?
Actions:
• On the secondary server, create logins that match the SIDs on the primary server
• Create a target server and select a pricing tier
Answer Area:
1. From the geo-replication settings of DB1, select West US
2. Create a target server and select a pricing tier
Yes No
DP-203: Exam Q&A Series – Part 10
119
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You apply an Azure Blob storage lifecycle policy. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 10
120
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK
South region. You need to copy blob data from the storage account to the data warehouse by using
Azure Data Factory. The solution must meet the following requirements:
• Ensure that the data remains in the UK South region at all times.
• Minimize administrative effort.
Which type of integration runtime should you use?
a) Azure integration runtime
b) Azure-SSIS integration runtime
c) Self-hosted integration runtime
DP-203: Exam Q&A Series – Part 10
121
You want to ingest data from a SQL Server database hosted on an on-premises Windows Server.
What integration runtime is required for Azure Data Factory to ingest data from the on-premises
server?
a) Azure integration runtime
b) Azure-SSIS integration runtime
c) Self-hosted integration runtime
DP-203: Exam Q&A Series – Part 10
122
By default, how long are the Azure Data Factory diagnostic logs retained for?
a) 15 days
b) 30 days
c) 45 days
DP-203: Exam Q&A Series – Part 10
123
You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake
Storage Gen2 container. Which resource provider should you enable?
a) Microsoft.Sql
b) Microsoft.Automation
c) Microsoft.EventGrid
d) Microsoft.EventHub
Data Factory natively integrates with Azure Event Grid, which lets you trigger pipelines on
such events.
DP-203: Exam Q&A Series – Part 10
124
You have an Azure Data Factory instance that contains two pipelines named Pipeline1 & Pipeline2.
Pipeline1 has the activities shown in the following exhibit. Pipeline2 has the activities shown in the following exhibit.
You execute Pipeline2, and Stored procedure1 in Pipeline1 fails. What is the status of the pipeline
runs?
a) Pipeline1 and Pipeline2 succeeded.
b) Pipeline1 and Pipeline2 failed.
c) Pipeline1 succeeded and Pipeline2 failed.
d) Pipeline1 failed and Pipeline2 succeeded.
DP-203: Exam Q&A Series – Part 10
125
You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The
table will have a clustered columnstore index and will include the following columns:
• TransactionType: 40 million rows per transaction type
• CustomerSegment: 4 million rows per customer segment
• TransactionMonth: 65 million rows per month
• AccountType: 500 million rows per account type
You have the following query requirements:
• Analysts will most commonly analyze transactions for a given month.
• Transactions analysis will typically summarize transactions by transaction type, customer segment,
and/or account type
You need to recommend a partition strategy for the table to minimize query times. On which column should
you recommend partitioning the table?
a) CustomerSegment
b) AccountType
c) TransactionType
d) TransactionMonth
DP-203: Exam Q&A Series – Part 10
126
Your company wants to route data rows to different streams based on matching conditions. Which
transformation in the Mapping Data Flow should you use?
a) Conditional Split
b) Select
c) Lookup
A Conditional Split transformation routes data rows to different streams based on matching
conditions. The conditional split transformation is like a CASE decision structure in a
programming language.
A Lookup transformation is used to add reference data from another source to your Data
Flow.
DP-203: Exam Q&A Series – Part 10
127
Which transformation is used to load data into a data store or compute resource?
a) Source
b) Destination
c) Sink
d) Window
A Sink transformation allows you to choose a dataset definition for the destination output data.
You can have as many sink transformations as your data flow requires.
Actions:
• Specify a temporary folder to stage the data
• Write the results to Data Lake Storage
• Drop the data frame
• Read the file into a data frame
• Write the results to a table in Azure Synapse
• Perform transformations on the data frame
• Mount the Data Lake Storage onto DBFS
• Perform transformations on the file
Answer Area (in order):
1. Mount the Data Lake Storage onto DBFS
2. Read the file into a data frame
3. Perform transformations on the data frame
4. Specify a temporary folder to stage the data
5. Write the results to a table in Azure Synapse
DP-203: Exam Q&A Series – Part 11
134
You are designing an Azure Databricks interactive cluster. You need to ensure that the cluster
meets the following requirements:
- Enable auto-termination
- Retain cluster configuration indefinitely after cluster termination.
What should you recommend?
a) Start the cluster after it is terminated.
b) Pin the cluster
c) Clone the cluster after it is terminated.
d) Terminate the cluster manually at process completion.
DP-203: Exam Q&A Series – Part 11
135
You are designing an Azure Databricks table. The table will ingest an average of 20 million
streaming events per day. You need to persist the events in the table for use in incremental load
pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load
times. What should you include in the solution?
a) Partition by DateTime fields.
b) Sink to Azure Queue storage.
c) Include a watermark column.
d) Use a JSON format for physical data storage.
DP-203: Exam Q&A Series – Part 11
136
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier.
You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must
meet the following requirements:
- Automatically scale down workers when the cluster is underutilized for three minutes.
- Minimize the time it takes to scale to the maximum number of workers.
- Minimize costs.
What should you do first?
a) Enable container services for workspace1.
b) Upgrade workspace1 to the Premium pricing tier.
c) Set Cluster Mode to High Concurrency.
d) Create a cluster policy in workspace1.
DP-203: Exam Q&A Series – Part 11
137
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The
size of the files will vary based on the number of events that occur per hour. File sizes range from
4 KB to 5 GB. You need to ensure that the files stored in the container are optimized for batch
processing. What should you do?
a) Convert the files to JSON
b) Convert the files to Avro
c) Compress the files
d) Merge the files
DP-203: Exam Q&A Series – Part 11
138
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is
output to Azure Data Lake Storage Gen2. The developers who will implement the stream
processing solution use Java. Which service should you recommend using to process the
streaming data?
a) Azure Event Hubs
b) Azure Data Factory
c) Azure Stream Analytics
d) Azure Databricks
DP-203: Exam Q&A Series – Part 11
139
You need to implement an Azure Databricks cluster that automatically connects to Azure Data
Lake Storage Gen2 by using Azure Active Directory (Azure AD) integration. How should you
configure the new cluster?
Tier: [answer choice]
• Premium
• Standard
Advanced option to enable: [answer choice]
• Azure Data Lake Storage credential passthrough
• Table access control
Credential passthrough requires an Azure Databricks Premium plan. You can access Azure Data Lake Storage using Azure Active Directory credential passthrough. When you enable your cluster for Azure Data Lake Storage credential passthrough, commands that you run on that cluster can read and write data in Azure Data Lake Storage without requiring you to configure service principal credentials for access to storage.
DP-203: Exam Q&A Series – Part 11
140
Which Azure Data Factory process involves using compute services to produce data to feed
production environments with cleansed data?
a) Connect and collect
b) Transform and enrich
c) Publish
d) Monitor
DP-203: Exam Q&A Series – Part 11
141
You have a new Azure Data Factory environment. You need to periodically analyze pipeline
executions from the last 60 days to identify trends in execution durations. The solution must use
Azure Log Analytics to query the data and create charts. Which diagnostic settings should you
configure in Data Factory? To answer, select the appropriate options in the answer area.
Log type: ActivityRuns / AllMetrics / PipelineRuns / TriggerRuns
Storage location: An Azure event hub / An Azure storage account / Azure Log Analytics
DP-203: Exam Q&A Series – Part 11
142
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer
choice that completes each statement based
on the information presented in the graphic.
DimProduct is a [answer choice] slowly changing
dimension (SCD)
Type 0
Type 1
Type 2
The key column in the table is [answer choice]
A surrogate key
A business key
An audit column
DP-203: Exam Q&A Series – Part 11
142 Explanation
Type 2 SCD: supports versioning of dimension members. Often the source system doesn't store
versions, so the data warehouse load process detects and manages changes in a dimension
table. In this case, the dimension table must use a surrogate key to provide a unique reference
to a version of the dimension member. It also includes columns that define the date range
validity of the version.
Business key: A business key or natural key is an index which identifies uniqueness of a row
based on columns that exist naturally in a table according to business rules. For example,
business keys are customer code in a customer table, composite of sales order header number
and sales order item line number within a sales order details table.
DP-203: Exam Q&A Series – Part 11
143
You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure
Data Lake Storage Gen2 container. Which type of trigger should you use?
a) on-demand
b) tumbling window
c) schedule
d) event
DP-203: Exam Q&A Series – Part 11
144
You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an
Azure DevOps Git repository. You publish changes from the main branch of the Git repository to
ADFdev. You need to deploy the artifacts from ADFdev to ADFprod. What should you do first?
a) From ADFdev, modify the Git configuration.
b) From ADFdev, create a linked service.
c) From Azure DevOps, create a release pipeline.
d) From Azure DevOps, update the main branch.
DP-203: Exam Q&A Series – Part 11
145
You have an Azure data factory. You need to examine the pipeline failures from the last 60 days.
What should you use?
a) the Activity log blade for the Data Factory resource
b) the Monitor & Manage app in Data Factory
c) the Resource health blade for the Data Factory resource
d) Azure Monitor
DP-203: Exam Q&A Series – Part 12
146
Your company is building a Datawarehouse where they want to keep track of changes in customer
mailing address. You want to keep the current mailing address and the previous one. Which SCD
type should you use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
147
Your company is building a Datawarehouse where they want to keep only the latest vendor’s
company name from whom your company purchases raw materials. Which SCD type should you
use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
148
Your company is building a Datawarehouse where they want to keep track of changes in customer
mailing address. You want to keep the current mailing address and the previous one. Both new and
old mailing address should be stored as different rows. Which SCD type should you use?
a) Type 1 SCD
b) Type 2 SCD
c) Type 3 SCD
d) Type 6 SCD
DP-203: Exam Q&A Series – Part 12
Cheat Sheet
Type 1 SCD: Maintains only the latest value of each record; each record always has exactly one row.
Type 2 SCD: Maintains versions of a record using columns that define the date-range validity of each version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members. Changes are stored as different rows.
Type 3 SCD: Maintains two versions of a dimension member as separate columns. It uses additional columns to track one key instance of history, rather than storing additional rows to track each change as in a Type 2 SCD.
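A minimal sketch of a Type 2 SCD dimension table in a dedicated SQL pool, using hypothetical column names:

CREATE TABLE dbo.DimCustomer
(   CustomerKey    int IDENTITY(1,1) NOT NULL,  -- surrogate key
    CustomerCode   nvarchar(20)      NOT NULL,  -- business (natural) key
    MailingAddress nvarchar(200)     NULL,
    StartDate      date              NOT NULL,  -- start of this version's validity
    EndDate        date              NULL,      -- end of this version's validity
    IsCurrent      bit               NOT NULL ) -- flag for the current row
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);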
DP-203: Exam Q&A Series – Part 12
149
You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write
the results to Azure Blob storage. You need to calculate the difference in readings per sensor per
hour. How should you complete the query?
SELECT sensorId,
growth = reading - [answer choice](reading) OVER (PARTITION BY sensorId [answer choice](hour, 1))
FROM input
First drop-down options: LAG, LAST, LEAD
Second drop-down options: LIMIT DURATION, OFFSET, WHEN
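The completion generally cited for this item uses LAG with LIMIT DURATION; a sketch:

SELECT sensorId,
       growth = reading - LAG(reading) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1))
FROM input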
DP-203: Exam Q&A Series – Part 12
150
You have an Azure Synapse Analytics dedicated SQL pool. You need to ensure that data in the pool
is encrypted at rest. The solution must NOT require modifying applications that query the data.
What should you do?
a) Enable encryption at rest for the Azure Data Lake Storage Gen2 account.
b) Enable Transparent Data Encryption (TDE) for the pool.
c) Use a customer-managed key to enable double encryption for the Azure Synapse workspace.
d) Create an Azure key vault in the Azure subscription and grant access to the pool.
Transparent Data Encryption (TDE) helps protect against the threat of malicious activity by
encrypting and decrypting your data at rest. When you encrypt your database, associated
backups and transaction log files are encrypted without requiring any changes to your
applications. TDE encrypts the storage of an entire database by using a symmetric key called the
database encryption key.
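For reference, a minimal sketch of enabling TDE with T-SQL, run against the master database of the logical server (the pool name is hypothetical):

ALTER DATABASE [Pool1] SET ENCRYPTION ON;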
DP-203: Exam Q&A Series – Part 12
151
You have an Azure subscription that contains a logical Microsoft SQL server named Server1.
Server1 hosts an Azure Synapse Analytics SQL dedicated pool named Pool1. You need to
recommend a Transparent Data Encryption (TDE) solution for Server1. The solution must meet the
following requirements:
- Track the usage of encryption keys.
- Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that
affects the availability of the encryption keys.
What should you include in the recommendation?
To track encryption key usage: [answer choice]
• Always Encrypted
• TDE with customer-managed keys
• TDE with platform-managed keys
To maintain client app access in the event of a datacenter outage: [answer choice]
• Create and configure Azure key vaults in two Azure regions
• Enable Advanced Data Security on Server1
• Implement the client apps by using a Microsoft .NET Framework data provider
DP-203: Exam Q&A Series – Part 12
152
You plan to create an Azure Synapse Analytics dedicated SQL pool. You need to minimize the time
it takes to identify queries that return confidential information as defined by the company's data
privacy regulations and the users who executed the queries. Which two components should you
include in the solution?
a) sensitivity-classification labels applied to columns that contain confidential information
b) resource tags for databases that contain confidential information
c) audit logs sent to a Log Analytics workspace
d) dynamic data masking for columns that contain confidential information
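A hedged sketch of applying a sensitivity-classification label with T-SQL; the table, column, label, and information type are illustrative:

ADD SENSITIVITY CLASSIFICATION TO dbo.Customers.EmailAddress
WITH (LABEL = 'Confidential', INFORMATION_TYPE = 'Contact Info');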
DP-203: Exam Q&A Series – Part 12
153
While using Azure Data Factory you want to parameterize a linked service and pass dynamic values
at run time. Which supported connector should you use?
a) Azure Data Lake Storage Gen2
b) Azure Data Factory variables
c) Azure Synapse Analytics
d) Azure Key Vault
DP-203: Exam Q&A Series – Part 12
154
Which file formats does Azure Data Factory support?
a) Avro format
b) Binary format
c) Delimited text format
d) Excel format
e) JSON format
f) ORC format
g) Parquet format
h) XML format
i) ALL OF THE ABOVE
DP-203: Exam Q&A Series – Part 12
155
Which property indicates the parallelism you want the copy activity to use?
a) parallelCopies
b) stagedCopies
c) multiCopies
DP-203: Exam Q&A Series – Part 12
156
Using the Azure Data Factory user interface (UX) you want to create a pipeline that copies and
transforms data from an Azure Data Lake Storage (ADLS) Gen2 source to an ADLS Gen2 sink using
mapping data flow. Choose the correct steps in the right order.
a) Create a data factory account
b) Create a data factory (1)
c) Create a copy activity
d) Create a pipeline with a Data Flow activity (2)
e) Validate copy activity
f) Build a mapping data flow with four transformations (3)
g) Test run the pipeline (4)
h) Monitor the Data Flow activity (5)
DP-203: Exam Q&A Series – Part 12
157
In Azure Data Factory: What is an example of a branching activity used in control flows?
a) The If-condition
b) Until-condition
c) Lookup-condition
DP-203: Exam Q&A Series – Part 12
158
Which activity can retrieve a dataset from any of the data sources supported by data factory and
Synapse pipelines?
a) Find activity
b) Lookup activity
c) Validate activity
DP-203: Exam Q&A Series – Part 12
159
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a
complex SELECT query that contains multiple JOIN and CASE statements to transform data for use
in inventory reports. The inventory reports will use the data and additional WHERE parameters
depending on the report. The reports will be produced once daily. You need to implement a
solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
a) an ordered clustered columnstore index
b) a materialized view
c) result set caching
d) a replicated table
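A minimal sketch of a materialized view in a dedicated SQL pool (the view, table, and column names are hypothetical); the joined and aggregated result is precomputed and maintained automatically, so the daily reports only apply their additional WHERE filters:

    CREATE MATERIALIZED VIEW dbo.mvInventoryReport
    WITH (DISTRIBUTION = HASH(ProductKey))
    AS
    SELECT ProductKey,
           WarehouseKey,
           COUNT_BIG(*)  AS RowCnt,        -- an aggregate is required in a materialized view
           SUM(Quantity) AS TotalQuantity
    FROM   dbo.FactInventory
    GROUP BY ProductKey, WarehouseKey;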
DP-203: Exam Q&A Series – Part 12
160
Which Azure service should you use to provide customer-facing reports, dashboards, and analytics
in your own applications?
a) Azure reports
b) Azure Power BI
c) Azure Monitor
DP-203: Exam Q&A Series – Part 13
161
You have an Azure subscription that contains an Azure Storage account. You plan to implement
changes to a data storage solution to meet regulatory and compliance standards.
Every day, Azure needs to identify and delete blobs that were NOT modified during the last 100
days.
Solution: You apply an expired tag to the blobs in the storage account. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
162
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
163
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
164
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text
and numerical values. 75% of the rows contain description data that has an average length of 1.1
MB. You plan to copy the data from the storage account to an enterprise data warehouse in Azure
Synapse Analytics. You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 13
165
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool
named Pool1. You plan to create a database named DB1 in Pool1. You need to ensure that when
tables are created in DB1, the tables are available automatically as external tables to the built-in
serverless SQL pool. Which format should you use for the tables in DB1?
a) CSV
b) ORC
c) JSON
d) Parquet
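As a sketch, a Parquet-backed table created from a Synapse Spark notebook (hypothetical table and column names) is synchronized through the shared metadata model and appears automatically as an external table to the built-in serverless SQL pool:

    %%sql
    -- Created in the Spark pool database DB1; exposed to the serverless SQL pool.
    CREATE TABLE DB1.SalesRaw (SaleId INT, Amount DECIMAL(10,2), SaleDate DATE)
    USING PARQUET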
DP-203: Exam Q&A Series – Part 13
166
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is
output to Azure Data Lake Storage Gen2. The developers who will implement the stream
processing solution use Java. Which service should you recommend using to process the
streaming data?
a) Azure Event Hubs
b) Azure Data Factory
c) Azure Stream Analytics
d) Azure Databricks
DP-203: Exam Q&A Series – Part 13
167
You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse
Analytics dedicated SQL pool. You plan to keep a record of changes to the available fields.
The supplier data contains the following columns: SupplierSystemID, SupplierName,
SupplierDescription, SupplierCategory, SupplierAddress1, SupplierAddress2, SupplierCity,
SupplierCountry, SupplierPostalCode.
Which three additional columns should you add to the data to create a Type 2 SCD?
a) surrogate primary key
b) effective start date
c) business key
d) last modified date
e) effective end date
f) foreign key
DP-203: Exam Q&A Series – Part 13
168
You have a Microsoft SQL Server database that uses a third normal form schema. You plan to
migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL
pool. You need to design the dimension tables. The solution must optimize read operations. What
should you include in the solution?
Transform data for dimension tables by:
    Maintaining to a third normal form
    Normalizing to a fourth normal form
    Denormalizing to a second normal form
For primary key columns in dimension tables, use:
    New IDENTITY columns
    A new computed column
    The business key column from the source system
Denormalization is the process of transforming higher normal forms into lower normal forms by
storing the join of higher-normal-form relations as a base relation. Denormalization increases
data-retrieval performance at the cost of introducing update anomalies into the database. The
collapsing relations strategy can be used in this step to collapse classification entities into
component entities, producing flat dimension tables with single-part keys that connect directly
to the fact table. The single-part key is a surrogate key generated to ensure it remains unique
over time.
DP-203: Exam Q&A Series – Part 13
169
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer
choice that completes each statement based on
the information presented in the graphic.
DimProduct is a ---- slowly changing dimension (SCD).
    Type 1
    Type 2
    Type 3
The ProductKey column is ----.
    a surrogate key
    a business key
    an audit column
DP-203: Exam Q&A Series – Part 13
170
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL
pool. You create a table by using the Transact-SQL statement shown in the following exhibit.
Which two columns should you add to the table
so that the table supports storing two versions
of a dimension member as separate columns?
Each correct answer presents part of the solution.
a) [EffectiveStartDate] [datetime] NOT NULL,
b) [CurrentProductCategory] [nvarchar] (100) NOT
NULL,
c) [EffectiveEndDate] [datetime] NULL,
d) [ProductCategory] [nvarchar] (100) NOT NULL,
e) [OriginalProductCategory] [nvarchar] (100) NOT
NULL,
DP-203: Exam Q&A Series – Part 13
171
You are designing a data mart for the human resources (HR) department at your company. The
data mart will contain employee information and employee transactions. From a source system,
you have a flat extract that has the following fields:
● EmployeeID
● FirstName
● LastName
● Recipient
● GrossAmount
● TransactionID
● GovernmentID
● NetAmountPaid
● TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool
for the data mart. Which two tables should you create?
a) a dimension table for Transaction
b) a dimension table for EmployeeTransaction
c) a dimension table for Employee
d) a fact table for Employee
e) a fact table for Transaction
DP-203: Exam Q&A Series – Part 13
172
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL
pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the
following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.

SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101 AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized

Which table distribution will minimize query times?
a) replicated
b) hash-distributed on PurchaseKey
c) round-robin
d) hash-distributed on IsOrderFinalized
DP-203: Exam Q&A Series – Part 13
173
You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool. You need
to create a surrogate key for the table. The solution must provide the fastest query performance.
What should you use for the surrogate key?
a) a GUID column
b) a sequence object
c) an IDENTITY column
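A minimal sketch of a dimension table that uses an IDENTITY column as its surrogate key (table, column, and distribution choices are hypothetical):

    CREATE TABLE dbo.DimSupplier
    (
        SupplierSK   INT IDENTITY(1,1) NOT NULL,   -- surrogate key; values are unique but not guaranteed contiguous
        SupplierBK   NVARCHAR(20)      NOT NULL,   -- business key carried over from the source system
        SupplierName NVARCHAR(100)     NOT NULL
    )
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);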
DP-203: Exam Q&A Series – Part 13
174
You are implementing a batch dataset in the Parquet format. Data files will be produced by using
Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an
Azure Synapse Analytics serverless SQL pool. You need to minimize storage costs for the solution.
What should you do?
a) Use Snappy compression for the files.
b) Use OPENROWSET to query the Parquet files.
c) Create an external table that contains a subset of columns from the Parquet files.
d) Store all data as string in the Parquet files.
DP-203: Exam Q&A Series – Part 13
175
Which Azure Data Factory component contains the transformation logic or the analysis commands
of the Azure Data Factory’s work?
a) Linked Services
b) Datasets
c) Activities
d) Pipelines
• Linked Services are objects that are used to define the connection to data stores or
compute resources in Azure.
• Datasets represent data structures within the data store that is being referenced
by the Linked Service object.
• Activities contain the transformation logic or the analysis commands of the Azure
Data Factory’s work.
• Pipelines are a logical grouping of activities.
DP-203: Exam Q&A Series – Part 14
176
You have an Azure subscription that contains an Azure Blob Storage account named storage1 and
an Azure Synapse Analytics dedicated SQL pool named Pool1. You need to store data in storage1.
The data will be read by Pool1. The solution must meet the following requirements:
• Enable Pool1 to skip columns and rows that are unnecessary in a query.
• Automatically create column statistics.
• Minimize the size of files.
Which type of file should you use?
a) JSON
b) Parquet
c) Avro
d) CSV
DP-203: Exam Q&A Series – Part 14
177
You plan to create a table in an Azure Synapse Analytics dedicated SQL pool. Data in the table will
be retained for five years. Once a year, data that is older than five years will be deleted. You need
to ensure that the data is distributed evenly across partitions. The solution must minimize the
amount of time required to delete old data. How should you complete the Transact-SQL statement?
a) CustomerKey
b) Hash
c) Round_Robin
d) Replicate
e) OrderDateKey
f) SalesOrderNumber
Hash
OrderDateKey
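Putting the selections shown above (Hash distribution, partitioned on OrderDateKey) into a sketch; the table name, the non-key columns, the hash column choice, and the boundary values are illustrative only:

    CREATE TABLE dbo.FactSales
    (
        OrderDateKey     INT          NOT NULL,
        CustomerKey      INT          NOT NULL,
        SalesOrderNumber NVARCHAR(20) NOT NULL,
        SalesAmount      MONEY        NOT NULL
    )
    WITH
    (
        DISTRIBUTION = HASH(CustomerKey),
        CLUSTERED COLUMNSTORE INDEX,
        -- Yearly boundaries: removing a year of old data becomes a fast partition
        -- switch or truncate instead of a large DELETE.
        PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20210101, 20220101, 20230101, 20240101))
    );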
DP-203: Exam Q&A Series – Part 14
178
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one
container and has the hierarchical namespace enabled. The system has files that contain data
stored in the Apache Parquet format. You need to copy folders and files from Storage1 to Storage2
by using a Data Factory copy activity. The solution must meet the following requirements:
• No transformations must be performed.
• The original folder structure must be retained.
• Minimize time required to perform the copy activity.
How should you configure the copy activity?
Source dataset type:
    Binary
    Parquet
    Delimited text
Copy activity copy behavior:
    FlattenHierarchy
    MergeFiles
    PreserveHierarchy
DP-203: Exam Q&A Series – Part 14
179
You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use
in an analytical workload. You need to recommend a format for the transformed files. The solution
must meet the following requirements:
• Contain information about the data types of each column in the files.
• Support querying a subset of columns in the files.
• Support read-heavy analytical workloads.
• Minimize the file size.
What should you recommend?
a) JSON
b) CSV
c) Apache Avro
d) Apache Parquet
DP-203: Exam Q&A Series – Part 14
180
From a website analytics system, you receive data extracts about user interactions such as
downloads, link clicks, form submissions, and video plays. Data contains the following columns.
You need to design a star schema to support analytical
queries of the data. The star schema will contain four tables
including a date dimension.
To which table should you add each column? To answer,
select the appropriate options in the answer area.
Stream Analytics is a cost-effective event-processing engine that helps uncover real-time
insights from devices, sensors, infrastructure, applications, and data quickly and easily. You
can monitor and manage Stream Analytics resources with Azure PowerShell cmdlets and PowerShell
scripts that execute basic Stream Analytics tasks.
DP-203: Exam Q&A Series – Part 14
182
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts.
Contacts contains a column named Phone. You need to ensure that users in a specific role only
see the last four digits of a phone number when querying the Phone column. What should you
include in the solution?
a) column encryption
b) dynamic data masking
c) a default value
d) table partitions
e) row level security (RLS)
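A minimal sketch of dynamic data masking on the Phone column (the masking pattern and role name are assumptions); users who are not granted UNMASK see only the last four digits:

    -- Mask everything except the last four characters of Phone.
    ALTER TABLE dbo.Contacts
    ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0, "XXX-XXX-", 4)');

    -- Only members of this (hypothetical) role see the unmasked values.
    GRANT UNMASK TO CallCenterManagers;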
DP-203: Exam Q&A Series – Part 14
183
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be
stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and
PolyBase in Azure SQL Data Warehouse. You need to recommend a Stream Analytics data output
format to ensure that the queries from Databricks and PolyBase against the files encounter the
fewest possible errors. The solution must ensure that the files can be queried quickly and that the
data type information is retained. What should you recommend?
a) Avro
b) CSV
c) Parquet
d) JSON
The Avro format is great for data and message preservation. The Avro schema, with its support
for evolution, is essential for making the data robust for streaming architectures like Kafka,
and with the metadata that the schema provides, you can reason about the data.
DP-203: Exam Q&A Series – Part 14
184
You have an Azure Storage account. You plan to copy one million image files to the storage
account. You plan to share the files with an external partner organization. The partner organization
will analyze the files during the next year. You need to recommend an external access solution for
the storage account. The solution must meet the following requirements:
- Ensure that only the partner organization can access the storage account.
- Ensure that access of the partner organization is removed automatically after 365 days.
P1 and P2 (select one option for each):
    Set the Copy method to Bulk insert
    Set the Copy method to PolyBase
    Set the Isolation level to Repeatable read
    Set the Partition option to Dynamic range
DP-203: Exam Q&A Series – Part 16
190
You plan to monitor an Azure data factory by using the Monitor & Manage app. You need to identify
the status and duration of activities that reference a table in a source database. Which three
actions should you perform in sequence? To answer, move the actions from the list of actions to
the answer area and arrange them in the correct order.
a) From the Data Factory monitoring app, add the Source user property to the Activity Runs table (2)
b) From the Data Factory monitoring app, add the Source user property to the Pipeline Runs table
c) From the Data Factory authoring UI, publish the pipelines (3)
d) From the Data Factory monitoring app, add a linked service to the Pipeline Runs table
e) From the Data Factory authoring UI, generate a user property for Source on all activities (1)
f) From the Data Factory authoring UI, generate a user property for Source on all datasets
DP-203: Exam Q&A Series – Part 16
191
Your company has two Microsoft Azure SQL databases named db1 and db2. You need to move
data from a table in db1 to a table in db2 by using a pipeline in Azure Data Factory. You create an
Azure Data Factory named ADF1. Which two types of objects should you create in ADF1 to
complete the pipeline? Each correct answer presents part of the solution. NOTE: Each correct
selection is worth one point.
a) a linked service
b) an Azure Service Bus
c) sources and targets
d) input and output datasets
e) transformations
DP-203: Exam Q&A Series – Part 16
192
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
Table1 contains the following:
• One billion rows
• A clustered columnstore index
• A hash-distributed column named Product Key
• A column named Sales Date that is of the date data type and cannot be null
Thirty million rows will be added to Table1 each month. You need to partition Table1 based on the
Sales Date column. The solution must optimize query performance and data loading. How often
should you create a partition?
DP-203: Exam Q&A Series – Part 16
195
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You
have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1. You plan to insert data from the files in container1 into Table1 and transform the data.
Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is
stored as an additional column in Table1.
Solution: You use an Azure Synapse Analytics serverless SQL pool to create an external table that
has an additional DateTime column.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 16
196
You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. You
have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1. You plan to insert data from the files in container1 into Table1 and transform the data.
Each row of data in the files will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is
stored as an additional column in Table1.
Solution: In an Azure Synapse Analytics pipeline, you use a data flow that contains a Derived
Column transformation.
Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
197
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has two streaming inputs, one query, and two
outputs. Does this meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
198
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one query, and two outputs. Does this
meet the goal?
Yes No
DP-203: Exam Q&A Series – Part 17
199
You are developing a solution that will use Azure Stream Analytics. The solution will accept an
Azure Blob storage file named Customers. The file will contain both in-store and online customer
details. The online customers will provide a mailing address. You have a file in Blob storage named
‘LocationIncomes’ that contains median incomes based on location. The file rarely changes. You
need to use an address to look up a median income based on location. You must output the data
to Azure SQL Database for immediate use and to Azure Data Lake Storage Gen2 for long-term
retention.
Solution: You implement a Stream Analytics job that has one streaming input, one reference input,
two queries, and four outputs. Does this meet the goal?
Yes No
• We need one reference data input for LocationIncomes, which rarely changes.
• We need two queries, one for in-store customers, and one for online customers.
• For each query, two outputs are needed. That makes a total of four outputs.
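A sketch of the query logic with hypothetical input, output, and column names: the LocationIncomes reference input is joined to the Customers streaming input, and each logical query appears twice because a Stream Analytics SELECT ... INTO statement targets exactly one output:

    WITH EnrichedCustomers AS
    (
        SELECT c.CustomerId, c.CustomerType, c.PostalCode, li.MedianIncome
        FROM Customers c              -- streaming input
        JOIN LocationIncomes li       -- reference input (no temporal join condition needed)
          ON c.PostalCode = li.PostalCode
    )
    SELECT * INTO SqlDbOnline  FROM EnrichedCustomers WHERE CustomerType = 'Online'
    SELECT * INTO LakeOnline   FROM EnrichedCustomers WHERE CustomerType = 'Online'
    SELECT * INTO SqlDbInStore FROM EnrichedCustomers WHERE CustomerType = 'InStore'
    SELECT * INTO LakeInStore  FROM EnrichedCustomers WHERE CustomerType = 'InStore'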
DP-203: Exam Q&A Series – Part 17
200
An on-premises data warehouse has the following fact tables, both having the columns DateKey,
ProductKey, and RegionKey. There are 120 unique product keys and 65 unique region keys.
Table     Comments
Sales     The table is 600 GB in size. DateKey is used extensively in the WHERE clause in
          queries. ProductKey is used extensively in join operations. RegionKey is used for
          grouping. Seventy-five percent of records relate to one of 40 regions.
Invoice   The table is 6 GB in size. DateKey and ProductKey are used extensively in the WHERE
          clause in queries. RegionKey is used for grouping.
Queries that use the data warehouse take a long time to complete. You plan to migrate the
solution to use Azure Synapse Analytics. You need to ensure that the Azure-based solution
optimizes query performance and minimizes processing skew. What should you recommend?
202
If an Event Hub goes offline before a consumer group can process the events it holds, those
events will be lost
True False
DP-203: Exam Q&A Series – Part 17
203
By default, how many partitions will a new Event Hub have?
1 2 3 4 8
204
What is the maximum number of activities per pipeline in Azure Data Factory?
40 60 80 100 150
DP-203: Exam Q&A Series – Part 17
205
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the
data to an Azure Blob storage account. You need to output the count of tweets during the last five
minutes every five minutes. Which windowing function should you use?
a) a five-minute Sliding window
b) a five-minute Session window
c) a five-minute Tumbling window
d) a five-minute Hopping window that has a one-minute hop
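As a sketch, a Stream Analytics query that emits one non-overlapping count every five minutes using a tumbling window (input, output, and timestamp field names are assumptions):

    SELECT COUNT(*) AS TweetCount
    INTO   BlobOutput
    FROM   TwitterInput TIMESTAMP BY CreatedAt
    -- A tumbling window produces exactly one result per five-minute interval.
    GROUP BY TumblingWindow(minute, 5)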
DP-203: Exam Q&A Series – Part 17
206
You are creating a new notebook in Azure Databricks that will support R as the primary language
but will also support Scala and SQL. Which switch should you use to switch between languages?
a) %
b) @
c) []
d) ()
• %python
• %r
• %scala
• %sql
DP-203: Exam Q&A Series – Part 18
207
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Use Azure Data Factory to convert the parquet files to CSV files
2. Create an external data source pointing to the Azure storage account
3. Create an external file format and external table using the external data source
4. Load the data using the INSERT…SELECT statement
Yes No
DP-203: Exam Q&A Series – Part 18
208
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure storage account
2. Create an external file format and external table using the external data source
3. Load the data using the INSERT…SELECT statement
Yes No
DP-203: Exam Q&A Series – Part 18
209
You develop a data ingestion process that will import data to a Microsoft Azure SQL Data
Warehouse. The data to be ingested resides in parquet files stored in an Azure Data Lake Gen 2
storage account.
You need to load the data from the Azure Data Lake Gen 2 storage account into the Azure SQL
Data Warehouse.
Solution:
1. Create an external data source pointing to the Azure Data Lake Gen 2 storage account
2. Create an external file format and external table using the external data source
3. Load the data using the CREATE TABLE AS SELECT statement
Yes No
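A sketch of those steps in T-SQL (the names, location, and column list are placeholders; depending on the authentication method, a database-scoped credential may also be required on the external data source):

    -- 1. External data source pointing to the Azure Data Lake Storage Gen2 account
    CREATE EXTERNAL DATA SOURCE LakeSource
    WITH (TYPE = HADOOP, LOCATION = 'abfss://data@mydatalake.dfs.core.windows.net');

    -- 2. External file format for the Parquet files
    CREATE EXTERNAL FILE FORMAT ParquetFormat
    WITH (FORMAT_TYPE = PARQUET);

    -- 3. External table over the files, then load with CREATE TABLE AS SELECT
    CREATE EXTERNAL TABLE ext.Sales
    (
        SaleId   INT,
        Amount   DECIMAL(10,2),
        SaleDate DATE
    )
    WITH (LOCATION = '/sales/', DATA_SOURCE = LakeSource, FILE_FORMAT = ParquetFormat);

    CREATE TABLE dbo.Sales
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM ext.Sales;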
DP-203: Exam Q&A Series – Part 18
210
You are moving data from an Azure Data Lake Gen2 storage to Azure Synapse Analytics. Which
Azure Data Factory integration runtime would you use in a data copy activity?
a) Azure - SSIS
b) Azure IR
c) Self-hosted
d) Pipelines
DP-203: Exam Q&A Series – Part 18
211
You have an enterprise data warehouse in
Azure Synapse Analytics that contains a
table named FactOnlineSales. The table
contains data from the start of 2009 to the
end of 2012. You need to improve the
performance of queries against
FactOnlineSales by using table partitions.
The solution must meet the following
requirements:
- Create four partitions based on the order
date.
- Ensure that each partition contains all the
orders placed during a given calendar year.
How should you complete the T-SQL
command?
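Since the exhibit is not reproduced here, the following is only a sketch of the partitioning idea: with RANGE RIGHT and three yearly boundary values, order-date keys for 2009 through 2012 land in four partitions, one per calendar year (the integer yyyymmdd key format, the hash column, and the CTAS approach are assumptions):

    CREATE TABLE dbo.FactOnlineSales_Partitioned
    WITH
    (
        DISTRIBUTION = HASH(ProductKey),
        CLUSTERED COLUMNSTORE INDEX,
        -- Partitions: (< 20100101) = 2009, [20100101, 20110101) = 2010,
        --             [20110101, 20120101) = 2011, (>= 20120101) = 2012
        PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20100101, 20110101, 20120101))
    )
    AS SELECT * FROM dbo.FactOnlineSales;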
DP-203: Exam Q&A Series – Part 18
212
You are performing exploratory
analysis of bus fare data in an
Azure Data Lake Storage Gen2
account by using an Azure
Synapse Analytics serverless
SQL pool. You execute the
Transact-SQL query shown in the
following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on
the information presented in the graphic.
DP-203: Exam Q&A Series – Part 18
213
You have an Azure subscription that is linked to a hybrid Azure Active Directory (Azure AD) tenant.
The subscription contains an Azure Synapse Analytics SQL pool named Pool1. You need to
recommend an authentication solution for Pool1. The solution must support multi-factor
authentication (MFA) and database-level authentication. Which authentication solution or
solutions should you include in the recommendation? To answer, select the appropriate options in
the answer area.
MFA:
    Azure AD authentication
    Microsoft SQL Server authentication
    Password-less authentication
    Windows authentication
Database-level authentication:
    Application roles
    Contained database users
    Database roles
    Microsoft SQL Server logins
DP-203: Exam Q&A Series – Part 18
214
You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool.
The table will have a clustered columnstore index and will include the following columns:
Column Comment
EventDate One million records are added to the table each day
EventTypeID The table contains 10 million records for each event type
WarehouseID The table contains 100 million records for each warehouse
ProductCategoryTypeID The table contains 25 million records for each product category type