DP-200 Exam - Question 55


HOTSPOT -

You have two Azure Storage accounts named Storage1 and Storage2. Each account contains an Azure Data Lake Storage file system. The system has files that contain data stored in the Apache Parquet format.

You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:

✑ No transformations must be performed.

✑ The original folder structure must be retained.

How should you configure the copy activity? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

[Image: Exam DP-200 Question 55, answer area]
Correct Answer:
[Image: Exam DP-200 Question 55, correct answer]

Box 1: Parquet -

For Parquet datasets, the type property of the copy activity source must be set to ParquetSource.

Box 2: PreserveHierarchy -

PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of the source file to the source folder is identical to the relative path of the target file to the target folder.
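
As a rough illustration of the two selections, the copy activity definition could be assembled as below. This is a minimal sketch, not a tested pipeline: the activity and dataset names are hypothetical, and the `AzureBlobFSReadSettings`/`AzureBlobFSWriteSettings` store settings and the `*.parquet` wildcard are assumptions about how the Data Lake Storage datasets are set up. Only `ParquetSource` and `PreserveHierarchy` come from the answer above.

```python
import json

# Sketch of a copy activity definition matching the given answer.
# Names such as "Storage1ParquetDataset" are hypothetical placeholders.
copy_activity = {
    "name": "CopyStorage1ToStorage2",  # hypothetical activity name
    "type": "Copy",
    "inputs": [{"referenceName": "Storage1ParquetDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "Storage2ParquetDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "ParquetSource",  # Box 1: Parquet
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",  # assumed: ADLS Gen2 source
                "recursive": True,                   # walk subfolders
                "wildcardFileName": "*.parquet",     # assumed file filter
            },
        },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings",   # assumed: ADLS Gen2 sink
                "copyBehavior": "PreserveHierarchy",  # Box 2
            },
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```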

Incorrect Answers:

FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.

MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
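
The difference between the three copy behaviors can be sketched as a mapping from source paths to target paths. This is a toy model for illustration only, not how Data Factory implements the copy: the function `map_targets` is hypothetical, and the UUID-based names merely stand in for the "autogenerated" names mentioned above, whose real format is not given here.

```python
import uuid
from pathlib import PurePosixPath


def map_targets(source_folder, source_files, target_folder, behavior, merged_name=None):
    """Toy model: return {source_path: target_path} for a copy behavior."""
    src = PurePosixPath(source_folder)
    dst = PurePosixPath(target_folder)
    if behavior == "PreserveHierarchy":
        # Relative path under the source folder is reused under the target folder.
        return {f: str(dst / PurePosixPath(f).relative_to(src)) for f in source_files}
    if behavior == "FlattenHierarchy":
        # Every file lands in the first level of the target folder.
        # The UUID name is a stand-in for an autogenerated name.
        return {
            f: str(dst / (uuid.uuid4().hex + PurePosixPath(f).suffix))
            for f in source_files
        }
    if behavior == "MergeFiles":
        # All source files end up in one target file.
        name = merged_name or uuid.uuid4().hex
        return {f: str(dst / name) for f in source_files}
    raise ValueError(f"unknown copy behavior: {behavior}")


files = ["landing/2021/05/a.parquet", "landing/2021/06/b.parquet"]
for behavior in ("PreserveHierarchy", "FlattenHierarchy", "MergeFiles"):
    print(behavior, map_targets("landing", files, "archive", behavior))
```

Only `PreserveHierarchy` keeps the `2021/05` and `2021/06` subfolders in the target, which is why it is the option that satisfies the "original folder structure must be retained" requirement.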

Reference:

https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage

Discussion

9 comments
SorinXp
May 10, 2021

The first box should be "Binary". It says - no transformation.

[Removed]
Jun 16, 2021

Binary is only for Binary format: https://docs.microsoft.com/en-us/azure/data-factory/format-binary

lgtiza
Jun 22, 2021

Every parquet file is also a binary file. I think the key is "no transformations", so why the extra work of interpreting a parquet file?! Binary and preserve hierarchy should do it imo.

cadio30
Apr 30, 2021

Agree with the answer, as both source and sink can accommodate ".parquet" files with the settings below. Try it in ADFv2. File format: Parquet (source and sink). Copy behavior: Preserve Hierarchy.

Hinzzz
Jun 20, 2021

The given answer is correct: Parquet and preserve hierarchy.

eliabsbueno
Apr 16, 2021

The first box should be "Binary". You can't use a parquet data source to load different parquet files.

dangal95
Apr 29, 2021

Answer is correct. https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

medsimus
Oct 6, 2021

First box should be "Binary". I tested it with both options. Using Parquet, I got an error with the following message: "Dataset Parquet1 location is a folder, the wildcard file name is required for Copy data1"

Dark12arrow
Apr 17, 2021

Do you have any reference? And if you can't use Parquet to load Parquet files, what's the point of ever choosing Parquet?

maciejt
May 19, 2021

It should be Binary - it copies the files as they are, no need to parse the parquet format if you don't need to transform them.

CarNama_IG
Jun 17, 2021

You can use a Binary dataset in a Copy activity, GetMetadata activity, or Delete activity. When using a Binary dataset, ADF does not parse file content but treats it as-is. When using a Binary dataset in a copy activity, you can only copy from a Binary dataset to a Binary dataset, so the answer should be Parquet.

hello_there_
Jun 23, 2021

Why does it need to be Parquet? Just configure the sink dataset as Binary as well. This way ADF doesn't need to parse the files. You only need Parquet if you want to do some transformation or when the sink dataset is an existing Parquet dataset.
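
For comparison, the Binary-to-Binary alternative argued for in this thread could be sketched as follows. This is only an illustration of that suggestion, not a confirmed answer to the question: the helper `binary_copy_type_properties` is hypothetical, and the `AzureBlobFSReadSettings`/`AzureBlobFSWriteSettings` store settings assume both datasets point at Data Lake Storage file systems.

```python
import json


def binary_copy_type_properties(copy_behavior="PreserveHierarchy"):
    """Hypothetical helper: typeProperties for a Binary-to-Binary copy."""
    return {
        "source": {
            "type": "BinarySource",  # file content is not parsed
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",  # assumed: ADLS Gen2 source
                "recursive": True,                   # include subfolders
            },
        },
        "sink": {
            "type": "BinarySink",  # a Binary source requires a Binary sink
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings",  # assumed: ADLS Gen2 sink
                "copyBehavior": copy_behavior,
            },
        },
    }


print(json.dumps(binary_copy_type_properties(), indent=2))
```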