DP-200 Exam Questions

DP-200 Exam - Question 106


DRAG DROP -

You have an Azure Data Lake Storage Gen2 account that contains JSON files for customers. The files contain two attributes named FirstName and LastName.

You need to copy the data from the JSON files to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values.

You create the following components:

✑ A destination table in Azure Synapse

✑ An Azure Blob storage container

✑ A service principal

Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Select and Place:

Correct Answer:

Step 1: Read the file into a data frame.

You can load the JSON files as a data frame in Azure Databricks.
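
For illustration, a minimal PySpark sketch of this step, assuming the ADLS Gen2 files are already reachable from the notebook (for example through a mount or an OAuth configuration) and that /mnt/customers is a hypothetical path:

# Read the customer JSON files into a Spark data frame.
# "/mnt/customers" is a placeholder; an abfss:// URI to the ADLS Gen2 container works as well.
customer_df = spark.read.option("multiline", "true").json("/mnt/customers/")
customer_df.printSchema()  # expect the FirstName and LastName attributes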

Step 2: Perform transformations on the data frame.
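
For example, the required concatenation could look like this sketch (the column name FullName is an assumption; the question only requires a new concatenated column):

from pyspark.sql.functions import concat_ws

# Add a column that concatenates FirstName and LastName with a space in between.
transformed_df = customer_df.withColumn("FullName", concat_ws(" ", "FirstName", "LastName"))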

Step 3: Specify a temporary folder to stage the data.

Specify a temporary folder to use while moving data between Azure Databricks and Azure Synapse.
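
Following the pattern in the referenced tutorial, this could be sketched as below (the storage account, container, and secret names are placeholders):

# Grant Spark access to the Blob storage container used for staging.
blob_account = "<blob-storage-account-name>"   # placeholder
blob_container = "<blob-container-name>"       # placeholder
blob_key = dbutils.secrets.get(scope="<scope>", key="<blob-account-key>")  # placeholder secret

spark.conf.set("fs.azure.account.key." + blob_account + ".blob.core.windows.net", blob_key)

# Temporary folder used to stage data between Azure Databricks and Azure Synapse.
temp_dir = "wasbs://" + blob_container + "@" + blob_account + ".blob.core.windows.net/tempDirs"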

Step 4: Write the results to a table in Azure Synapse.

You upload the transformed data frame into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to upload a data frame directly as a table in Azure Synapse.
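
With the Azure Synapse connector, the write could look roughly like this (the JDBC URL and destination table name are placeholders; temp_dir is the staging folder from the previous step):

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<dw-name>;user=<user>;password=<password>"  # placeholder

(transformed_df.write
    .format("com.databricks.spark.sqldw")
    .option("url", jdbc_url)
    .option("forwardSparkAzureStorageCredentials", "true")
    .option("dbTable", "dbo.Customers")   # placeholder destination table
    .option("tempDir", temp_dir)          # staging folder in Blob storage
    .mode("overwrite")
    .save())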

Step 5: Drop the data frame.

Clean up resources. You can also terminate the cluster: from the Azure Databricks workspace, select Clusters on the left, and under Actions, point to the ellipsis (...) and select the Terminate icon.

Reference:

https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse

Discussion

cadio30
May 3, 2021

It requires mounting the ADLS Gen2 first, so the correct sequence is "FHEAB".

niwe
May 21, 2021

Can you explain what "FHEAB" is?

maciejt
May 23, 2021

The letter labels of the steps to choose, in order.

niwe
Jun 14, 2021

Thanks!

hoangton
Jun 12, 2021

Correct answer should be: Step 1: Mount the Data Lake Storage onto DBFS. Step 2: Read the file into a data frame. Step 3: Perform transformations on the data frame. Step 4: Specify a temporary folder to stage the data. Step 5: Write the results to a table in Azure Synapse.

Aragorn_2021
Apr 20, 2021

I would go for FHEAB. Mount the storage -> read the file into a data frame -> transform it further -> write the data to a temporary folder in storage -> load it into the DWH.

111222333
May 14, 2021

Agree. The service principal (which is given in the task) is used for mounting. Mount an Azure Data Lake Gen2 to the Databricks File System (DBFS) using a service principal: https://kyleake.medium.com/mount-an-adls-gen-2-to-databricks-file-system-using-a-service-principal-and-oauth-2-0-ep-5-73172dd0ddeb
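
For reference, a mount with a service principal typically looks like this sketch (the application ID, tenant ID, secret scope, and account/container names are placeholders):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<client-secret>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container at a hypothetical mount point.
dbutils.fs.mount(
    source="abfss://<container>@<adls-account>.dfs.core.windows.net/",
    mount_point="/mnt/customers",
    extra_configs=configs)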

vrmei
Jun 6, 2021

Mount Data Lake Storage onto DBFS (service principal) -> read the file into a data frame -> perform transformations on the data frame -> specify the temp folder to stage data -> write results to the Synapse table. https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse

vrmei
Jun 6, 2021

Small correction: I don't see the mount option in the ADLS account configuration in the given URL. I feel the given answer might be correct. The last one should be "Drop the data frame", which does the cleanup...

alf99
Apr 5, 2021

Wrong, should be F, H, E, A, B. The Data Lake storage has to be mounted onto DBFS before reading the file.

DongDuong
Apr 7, 2021

Based on the provided link, I think the keyword here is "mounted". The Data Lake storage is not mounted onto DBFS; instead, it is accessed by Databricks via the API. So the given answer is correct.

DongDuong
Apr 9, 2021

After revising, I think FHEAB makes more sense

tucho
Apr 11, 2021

I agree with HEAB, but I don't know which one is missing. I think there is no need to "drop the DF" or to "mount the DL storage"... :-( Does anybody know the right full answer?

unidigm
May 27, 2021

Do we really need to stage the data? We could directly write the dataframe to Synapse. https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/synapse-analytics

Rob77
May 27, 2021

Yes, we do. tempDir (that stages data) MUST be specified for Synapse write method.

Bhagya123456
Aug 7, 2021

The answer is perfect. Mounting is not required, and "Drop the data frame" should be there. The question never mentioned that you have to use the service principal. Had it been 6 steps, I would have added the mounting step. But considering only 5 steps, these 5 steps have higher priority than mounting (which is not essential).

satyamkishoresingh
Sep 27, 2021

Why drop the data frame? Cleaning up resources is about the cluster, not the DF.