Exam DP-203 All Questions
Question 156

HOTSPOT -

You have an Azure Data Factory instance named ADF1 and two Azure Synapse Analytics workspaces named WS1 and WS2.

ADF1 contains the following pipelines:

✑ P1: Uses a copy activity to copy data from a nonpartitioned table in a dedicated SQL pool of WS1 to an Azure Data Lake Storage Gen2 account

✑ P2: Uses a copy activity to copy data from text-delimited files in an Azure Data Lake Storage Gen2 account to a nonpartitioned table in a dedicated SQL pool of WS2

You need to configure P1 and P2 to maximize parallelism and performance.

Which dataset settings should you configure for the copy activity of each pipeline? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Set the Copy method to PolyBase

    While dedicated SQL pool supports many loading methods, including non-PolyBase options such as BCP and the SQL BulkCopy API, the fastest and most scalable way to load data is through PolyBase. PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.

    Box 2: Set the Copy method to Bulk insert

    PolyBase is not possible for the text files in this scenario, so Bulk insert has to be used.

    Reference:

    https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
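
    As a rough sketch, the "Copy method" choice surfaces in the copy activity sink JSON for a dedicated SQL pool roughly as follows (property names follow the Azure Synapse Analytics connector documentation; the reject thresholds are illustrative placeholders):

        "sink": {
            "type": "SqlDWSink",
            "allowPolyBase": true,
            "polyBaseSettings": {
                "rejectType": "percentage",
                "rejectValue": 10.0,
                "rejectSampleValue": 100,
                "useTypeDefault": true
            }
        }

    Choosing Bulk insert instead simply means leaving "allowPolyBase" (and "allowCopyCommand") unset, so the copy activity falls back to batched inserts.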

Discussion
marcin1212

How do you use PolyBase when copying data from Synapse to a file? I have no idea. Moreover, the PolyBase option is available only when the target is Synapse. It should be: P1: Set the partition option to "Dynamic range"; P2: PolyBase. Regarding P1, see https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse?tabs=data-factory#parallel-copy-from-synapse-analytics - scenario: "Full load from large table, without physical partitions" -> suggested settings: Partition option: Dynamic range partition.
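
A minimal sketch of that source-side setting, assuming the P1 copy activity reads from the dedicated SQL pool with a SqlDWSource (the partition column and bounds are placeholders that would normally come from a numeric or date column in the source table):

    "source": {
        "type": "SqlDWSource",
        "partitionOption": "DynamicRange",
        "partitionSettings": {
            "partitionColumnName": "<partition_column>",
            "partitionLowerBound": "<lower_bound>",
            "partitionUpperBound": "<upper_bound>"
        }
    }

Combined with the parallelCopies setting on the copy activity, this lets ADF issue multiple range-bounded queries against the nonpartitioned table in parallel.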

Matt2000

It should be: P1: PolyBase P2: PolyBase "PolyBase is the best choice when you are loading or exporting large volumes of data, or you need faster performance." Ref: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool Regarding "dynamic range partitions": " As repartitioning data takes time, Use [sic] current partitioning is recommended in most scenarios." -> dynamic partitioning is NOT selected Ref: https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-performance

mav2000

P1: Set the partition option to "Dynamic range"; P2: PolyBase. The reason for P1 is that PolyBase is a technology in Azure Synapse Analytics that can read from external sources but can't insert data into them, so it is able to read from Data Lake Storage but won't be able to write there. In the second case, since we want to write to a SQL pool, it will work, which is why P2 is PolyBase. Besides, dynamic range partitioning is a technique for partitioning a nonpartitioned table that allows the reading of the source data to be parallelized, which makes it faster.

Canary_2021

P1: Copy data from SQL to Data Lake.
• Bulk insert and PolyBase are not a choice on the Sink tab if the target is Data Lake, so they are not correct.
• Isolation level can be set up if a SQL database is the source. Repeatable Read means that locks are placed on all data that is used in a query; I don't think it maximizes parallelism and performance.
• Set the Partition option to Dynamic range: this can be set up if the source is SQL in a copy activity, and it maximizes parallelism and performance, so I select this option.
P2: Copy data from Data Lake to SQL. It is for sure PolyBase.

vrodriguesp

I tried to create a copy activity in ADF and these were the results:
P1) Synapse to ADLS --> Source Partition option: None / Dynamic range; Sink Copy behavior: Add dynamic content / None / Flatten hierarchy / Merge files / Preserve hierarchy
P2) ADLS to Synapse --> Source Copy method: N/A; Sink Copy method: Copy command / PolyBase / Bulk insert / Upsert
So I think the correct answers should be:
P1) Set the partition option to Dynamic range
P2) Set the copy method to PolyBase
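
For reference, the ADLS Gen2 sink "Copy behavior" options listed above map roughly to the storeSettings of a file-based sink like this (a sketch only, assuming a delimited-text sink dataset):

    "sink": {
        "type": "DelimitedTextSink",
        "storeSettings": {
            "type": "AzureBlobFSWriteSettings",
            "copyBehavior": "PreserveHierarchy"
        }
    }

Copy behavior mainly affects how the output files are laid out; the parallelism gain for P1 comes mainly from the source partition option rather than from this setting.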

Deeksha1234

Agree with marcin1212; it should be P1: Set the partition option to "Dynamic range", P2: PolyBase.

evangelist

P1:
Set the Partition option to Dynamic range: 171 votes
Set the Copy method to PolyBase: 26 votes
Set the Copy method to Bulk insert: 3 votes
Set the Isolation level to Repeatable read: 0 votes
P2:
Set the Copy method to PolyBase: 95 votes
Set the Copy method to Bulk insert: 8 votes
Set the Isolation level to Repeatable read: 0 votes
Set the Partition option to Dynamic range: 1 vote

JezWalters

There's a really interesting video regarding PolyBase/COPY INTO here: https://microsoft.github.io/PartnerResources/skilling/modern-analytics-academy/vignettes/polybase-vs-copy This video indicates that PolyBase can actually be used to pull/push data from/to Azure Data Lake Storage to/from Azure Synapse Analytics Dedicated SQL Pool tables (via CTAS & CETAS statements).

Momoanwar

Correct, per ChatGPT:
For P1, where data is copied from a non-partitioned table in a SQL pool to Azure Data Lake Storage Gen2:
- **Set the Copy method to PolyBase**: PolyBase is designed to efficiently transfer large amounts of data between SQL-based data stores and Azure Data Lake Storage.
For P2, which copies data from text-delimited files in Azure Data Lake Storage Gen2 to a non-partitioned table in a SQL pool:
- **Set the Copy method to Bulk insert**: Bulk insert is an efficient way to load data from files into SQL tables, especially when dealing with non-partitioned tables where PolyBase might not be applicable or the most optimal choice.

positivitypeople

Got this question today on the exam

Rossana

For P1, you should set the copy method to PolyBase, and for P2, you should set the copy method to Bulk insert. The reason is that PolyBase is better suited for copying data between Azure Synapse Analytics and Azure Data Lake Storage Gen2 and can achieve better performance than Bulk copy in this scenario. On the other hand, Bulk copy is the fastest method for copying data from text-delimited files in Azure Data Lake Storage Gen2 to Azure Synapse Analytics. Setting the partition option to Dynamic range for both pipelines can help to maximize parallelism and performance by allowing the copy activity to split the data into multiple partitions based on the data range.

XiltroX

I think you can put both as PolyBase. PolyBase is much faster and supports text delimited files as well now. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse?tabs=data-factory#use-polybase-to-load-data-into-azure-synapse-analytics

e56bb91

ChatGPT: To maximize parallelism and performance in your Azure Data Factory pipelines, configure the dataset settings as follows: For Pipeline P1 (copying from WS1 to Azure Data Lake Storage Gen2): Copy Method: Set to PolyBase. This method is optimized for large data loads from SQL pools to Azure Data Lake Storage. For Pipeline P2 (copying from Azure Data Lake Storage Gen2 to WS2): Copy Method: Set to Bulk Insert. This is efficient for loading data into SQL pools from Azure Data Lake Storage, especially with text-delimited files. These settings will help enhance performance by leveraging the most efficient data transfer methods for each scenario.

6d954df

For the Azure Data Factory pipelines P1 and P2, the dataset settings for the copy activity should be configured as follows: P1: b. Set the Copy method to PolyBase. PolyBase is a technology that accesses data outside of the database via the T-SQL language. It's designed to leverage parallelism, which can lead to significant performance improvements when copying large amounts of data. P2: a. Set the Copy method to Bulk insert. Bulk insert is a process that can be used to import large amounts of data into a SQL Server table. It's a highly efficient way to push data into a table, especially when dealing with text-delimited files. Please note that the actual performance may vary depending on the specific requirements and the structure of your data.

d046bc0

P1: Dynamic range according to https://techcommunity.microsoft.com/t5/fasttrack-for-azure/leverage-copy-data-parallelism-with-dynamic-partitions-in-adf/ba-p/3692133

fahfouhi94

P1: Set the partition option to Dynamic range (see here: https://techcommunity.microsoft.com/t5/fasttrack-for-azure/leverage-copy-data-parallelism-with-dynamic-partitions-in-adf/ba-p/3692133). P2: PolyBase gives the best performance. PolyBase loads data from UTF-8 and UTF-16 encoded delimited text files. PolyBase also loads from the Hadoop file formats RC File, ORC, and Parquet. PolyBase can also load data from Gzip and Snappy compressed files. PolyBase currently does not support extended ASCII, fixed-width format, and nested formats such as WinZip, JSON, and XML.
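
When the source files do not meet those PolyBase requirements (encoding, format, or compression), the copy activity can still use PolyBase through a staged copy; a hedged sketch of the activity-level properties (the staging linked service and path are placeholders):

    "enableStaging": true,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "<staging_storage_linked_service>",
            "type": "LinkedServiceReference"
        },
        "path": "<staging-container/path>"
    }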

kkk5566

P1) Set the partition option to Dynamic range. P2) Set the copy method to PolyBase.

faabbasi

P1: Dynamic range; the link is pretty clear: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse?tabs=data-factory#parallel-copy-from-azure-synapse-analytics

DAYENKAR

Both answers are PolyBase.