Exam DP-203
Question 16

HOTSPOT -

You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The system has files that contain data stored in the Apache Parquet format.

You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:

✑ No transformations must be performed.

✑ The original folder structure must be retained.

✑ Minimize time required to perform the copy activity.

How should you configure the copy activity? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Parquet -

    For Parquet datasets, the type property of the copy activity source must be set to ParquetSource.

    Box 2: PreserveHierarchy -

    PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of the source file to the source folder is identical to the relative path of the target file to the target folder.

    Incorrect Answers:

    ✑ FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.

    ✑ MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
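To make the three copy behaviors concrete, here is a minimal Python sketch (not Data Factory itself) of how each mode maps source-relative paths to sink paths. The function name, the placeholder autogenerated-name pattern, and the sample paths are all illustrative assumptions, not ADF internals.

```python
def map_paths(relative_paths, behavior):
    """Illustrate how each ADF copyBehavior maps source-relative paths to the sink."""
    if behavior == "PreserveHierarchy":
        # The relative path under the sink folder matches the source exactly.
        return list(relative_paths)
    if behavior == "FlattenHierarchy":
        # Every file lands in the first level of the sink folder; the service
        # autogenerates names (shown here as a placeholder pattern).
        return [f"autogen_{i}.parquet" for i, _ in enumerate(relative_paths)]
    if behavior == "MergeFiles":
        # All source files are merged into a single sink file.
        return ["merged.parquet"]
    raise ValueError(f"unknown copy behavior: {behavior}")

paths = ["2024/01/a.parquet", "2024/02/b.parquet"]
print(map_paths(paths, "PreserveHierarchy"))  # → ['2024/01/a.parquet', '2024/02/b.parquet']
print(map_paths(paths, "MergeFiles"))         # → ['merged.parquet']
```

Only PreserveHierarchy satisfies the requirement that the original folder structure be retained.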

    Reference:

https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

    https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage

Discussion
EddyRoboto

This could be Binary as source and sink, since no transformations are performed on the files. I tend to believe Binary is the correct answer.

michalS

I agree. If it's just copying, then Binary is fine and would probably be faster.

jed_elhak

No, it must be Parquet. For a binary copy, the type property of the dataset must be set to Binary, but the data here is Parquet, so the given answer is correct.

iooj

Agree. I've checked it: with Binary source and sink datasets it works.

Sr18

This was on my test on 26-Jun; I passed with a 930+ score. The answers are Binary and PreserveHierarchy.

AbhiGola

The answer seems correct: the data is already stored as Parquet and the requirement is to do no transformation, so the answer is right.

NintyFour

As the question mentions, we must minimize the time required to perform the copy activity, and Binary is faster than Parquet. Hence, Binary is the answer.

anto69

No: req1 "no transformation", req2 "Minimize time required to perform the copy activity". Both must be met hence it's Parquet cause it's the second fastest choice and it requires no transformations.

mhi

When doing a binary copy, you're not doing any transformation!

Lestrang

According to ChatGPT: while the Binary dataset type would be the fastest for copying the data from one Azure storage account to another, it would not be correct in this scenario because it does not retain the original format of the files. If the files contain data stored in the Apache Parquet format, specifying the source dataset type as Binary would cause Data Factory to treat the files as generic binary files and copy the data as-is, without recognizing the original format. This would risk losing the format and possibly the structure of the data, and could make the data harder to read. Also, when you copy files with the Binary dataset type, Data Factory cannot detect changes in the files and copies all of the data each time, which can be inefficient in terms of time and storage. It really gives shoddy Azure answers in general, but I'll go for Parquet on this one.

mtc9

ChatGPT is plainly wrong: the Binary type retains the original Parquet format, because it copies the files exactly as they are, and it's faster than a Parquet dataset because it doesn't require parsing the files. Binary is correct.

Ajjjiii

This was in my exam today. The answer is Binary and PreserveHierarchy.

biafko

you the mvp

JustAnotherDBA

The answer is correct, for three reasons: the file format is Parquet; Parquet has the second-fastest load time; and no data transformations should happen. If we are going to quote articles, please read the WHOLE article before posting. Check out the formats that Binary can handle: "When using Binary dataset in copy activity, you can only copy from Binary dataset to Binary dataset."

JustAnotherDBA

https://learn.microsoft.com/en-us/azure/data-factory/format-binary

mtc9

Binary to binary copies the files as they are, retaining the same content and hence the same format, and it's faster than Parquet because it doesn't require any parsing at all, just a copy.
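The point above can be sketched locally: a byte-for-byte copy neither parses nor rewrites a file, so a Parquet file arrives with identical bytes and its format intact. This is a plain Python illustration of that property, not ADF code; the stand-in payload merely mimics the `PAR1` magic bytes that frame a real Parquet file.

```python
import os
import shutil
import tempfile

src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

# Stand-in payload; a real Parquet file begins and ends with the b"PAR1" magic bytes.
src_file = os.path.join(src_dir, "data.parquet")
with open(src_file, "wb") as f:
    f.write(b"PAR1" + b"\x00" * 16 + b"PAR1")

# A binary copy moves raw bytes with no parsing or serialization step.
dst_file = os.path.join(dst_dir, "data.parquet")
shutil.copyfile(src_file, dst_file)

with open(src_file, "rb") as a, open(dst_file, "rb") as b:
    assert a.read() == b.read()  # bytes unchanged, so the Parquet format is retained
```

Because no deserialization happens, nothing about the columnar layout or compression can be lost in transit.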

phydev

Was on my exam today (31.10.2023).

SenMia

mind helping with the right option? binary or parquet for the first box? thanks!! :)

tonyfig

Binary & PreserveHierarchy. The Parquet option is used when you want to copy data stored in the Apache Parquet format and perform transformations on the data during the copy activity. However, in this scenario, the requirement is to perform no transformations and to minimize the time required for the copy. The Binary option is better suited here, as it copies the data as-is, without any transformations, and minimizes the copy time.

rocky48

The answer seems correct: the data is already stored as Parquet and the requirement is to do no transformation. Source dataset type: Parquet. Copy activity copy behavior: PreserveHierarchy.

kkk5566

Binary & PreserveHierarchy

auwia

Quoting an email reply I received from Massimo Manganiello: when it comes to efficiency, copying data from a Parquet file to another Parquet file is generally more efficient than copying to a binary format. This is because Parquet is a columnar storage format specifically designed for efficient data compression and query performance; it leverages advanced compression techniques and data encoding to minimize storage size and optimize query execution. Copying data from a Parquet file to a binary format may require additional steps and conversions. Binary formats, such as plain text or custom binary formats, may not have the same level of built-in compression and optimization as Parquet, so the copy process may involve additional serialization and deserialization steps, resulting in increased processing overhead and potentially larger storage requirements. In summary, when the source and destination formats are both Parquet, copying between Parquet files is generally more efficient in terms of storage utilization and query performance. In my opinion, the provided answers are correct!

Fusejonny1

Source dataset type should be set to Binary. The reason is that you're not performing any transformations on the data; you're simply copying it from one location to another while retaining the original folder structure. The Binary dataset in Azure Data Factory is used for copying files as-is without parsing the file data.

temacc

Binary: copies the files as-is in the fastest way. PreserveHierarchy: retains the folder structure.

ELJORDAN23

Got this question on my exam on January 17. I firmly believe that Binary and PreserveHierarchy are the most accurate answers, but I answered Parquet just to be safe. Still, I passed :)

klayytech

The answer is still Source dataset type: Parquet; Copy activity copy behavior: PreserveHierarchy. Even though Binary can be used as the source dataset type, it is not the best option in this scenario. This ensures the files are copied in their original format and that the original folder structure is preserved in the destination container, meeting all of the requirements.

trantrongw

Agree. I've checked it.

Rrk07

Answer is correct .

OldSchool

Answer is correct. No transformation and preserve hierarchy