DP-201 Exam Questions

DP-201 Exam - Question 86


You are designing a solution that will copy Parquet files stored in an Azure Blob storage account to an Azure Data Lake Storage Gen2 account.

The data will be loaded daily to the data lake and will use a folder structure of {Year}/{Month}/{Day}/.

You need to design a daily Azure Data Factory data load to minimize the data transfer between the two accounts.

Which two configurations should you include in the design? Each correct answer presents part of the solution.

NOTE: Each correct selection is worth one point.

Correct Answer: BD

To minimize data transfer and adhere to the specified folder structure, you should filter by the last modified date of the source files and specify a file naming pattern for the destination. Filtering by the last modified date ensures that only new or updated files are copied each day, minimizing the amount of data transferred. Specifying a file naming pattern allows the data to be correctly placed into the folder structure {Year}/{Month}/{Day}/ in the destination Azure Data Lake Storage Gen2 account.
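
As a rough sketch of what the last-modified filter (option B) can look like in the copy activity's source settings: the property names below come from the ADF copy activity for blob-based sources, while the one-day window anchored to the trigger time is an assumption about the daily schedule.

"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "modifiedDatetimeStart": "@addDays(pipeline().TriggerTime, -1)",
        "modifiedDatetimeEnd": "@pipeline().TriggerTime"
    }
}

With a window like this in place, each daily run copies only the blobs written since the previous run instead of re-copying the whole container.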

Discussion

phi618t
Jun 8, 2021

If you choose C (delete the source files after they are copied), why would you also choose B (filter by the last modified date of the source files)? I prefer BD.

Bhagya123456
Aug 19, 2021

How is a naming pattern going to minimize the data transfer? BC should be the correct answer.

Marcus1612
Sep 30, 2021

This is a basic question: copy data from one place to another. The requirements are: 1) minimize the transfer, and 2) adapt the data to the destination folder structure. Filtering on LastModifiedDate will copy everything that has changed since the last load while minimizing the data transfer. Specifying the file naming pattern allows the data to be copied to the right place in the destination Data Lake. The answer is BD.
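
To make the folder-structure half of this concrete: one way to land files under {Year}/{Month}/{Day}/ is a parameterized sink dataset whose folder path the pipeline computes from the trigger time. This is a sketch; the dataset name, linked service name, and filesystem name are placeholders.

{
    "name": "SinkDataLakeParquet",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": { "referenceName": "AdlsGen2LinkedService", "type": "LinkedServiceReference" },
        "parameters": { "dayPath": { "type": "string" } },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "datalake",
                "folderPath": { "value": "@dataset().dayPath", "type": "Expression" }
            }
        }
    }
}

The copy activity then fills in the date path when it references the dataset:

"outputs": [{
    "referenceName": "SinkDataLakeParquet",
    "type": "DatasetReference",
    "parameters": { "dayPath": "@formatDateTime(pipeline().TriggerTime, 'yyyy/MM/dd')" }
}]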

Wendy_DK
May 13, 2021

Correct answer is BC. In the source options of the copy activity there are three choices: 1. No action 2. Delete source files 3. Move.
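
For comparison, the copy activity also exposes a delete-after-copy option on blob sources; a minimal sketch, assuming that behaviour is what option C refers to:

"storeSettings": {
    "type": "AzureBlobStorageReadSettings",
    "recursive": true,
    "deleteFilesAfterCompletion": true
}

Files are removed from the source only after they have been copied successfully.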

BigMF
Jun 11, 2021

A is obviously out, and you're not going to do both B and C, so D is in by default. Your only choice at that point is B or C to go along with D. In my experience, you cannot rely 100% on any job to run every single day (assuming this process is daily). Therefore, if the job does not run for one or more days and you chose B, you would only copy over the most recent files and there would be files left behind in the storage account. So my choice would be to not filter, load everything that is in the storage account, and then delete the files once they have been copied. C and D are my choices.

YLiu
Sep 14, 2021

B ensures minimized data transfer. If it copies everything every time, then data transfer is not minimized.

maciejt
Apr 8, 2021

There was no requirement about what to do with the original files, so why in the world is answer C - delete them???

BobFar
Jun 5, 2021

I guess to make sure you don't read the file again!

mter2007
Apr 21, 2021

I would like to choose CD.

Nik71
Mar 24, 2021

C seems not correct, as for deletion you can use lifecycle management in storage, so D should be the second answer.

AlexD332
Mar 12, 2021

I thought it was the only logical choice, but they said copy activity, not moving files.

H_S
Mar 16, 2021

I think it"s BD

Anonymous
Mar 20, 2021

Wildcard path: using a wildcard pattern will instruct ADF to loop through each matching folder and file in a single Source transformation. This is an effective way to process multiple files within a single flow. Add multiple wildcard matching patterns with the + sign that appears when hovering over your existing wildcard pattern. From your source container, choose a series of files that match a pattern. Only the container can be specified in the dataset; your wildcard path must therefore also include your folder path from the root folder.
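
As a sketch of the same idea in copy-activity terms (the folder and file patterns below are placeholders), the wildcard goes into the source store settings:

"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFolderPath": "landing/*/*/*",
        "wildcardFileName": "*.parquet"
    }
}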

Anonymous
Mar 20, 2021

Yes, BD... I think you are right.

maciejt
Apr 8, 2021

But this applies to finding the source files, and D was about the destination file naming pattern... for which there was no requirement to change the file name.

cadio30
May 26, 2021

Agree with answers B and D, as this kind of setup doesn't perform any deletion in either storage, which lessens the processing.
