Data Engineering on Microsoft Azure

Microsoft DP-203 practice exam questions

There are 370 questions in total.
These questions were last updated on February 13, 2025.

Question 1 of 370

You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.

You need to alter the table to meet the following requirements:

✑ Ensure that users can identify the current manager of employees.

✑ Support creating an employee reporting hierarchy for your entire company.

✑ Provide fast lookup of the managers' attributes such as name and job title.

Which column should you add to the table?

[ManagerEmployeeID] [smallint] NULL

[ManagerEmployeeKey] [smallint] NULL

[ManagerEmployeeKey] [int] NULL

[ManagerName] [varchar](200) NULL

Correct Answer: C

To identify each employee's current manager, support a reporting hierarchy for the entire company, and provide fast lookup of manager attributes such as name and job title, add a column that references the key of the same table. [ManagerEmployeeKey] [int] is the right choice because it joins directly to the [EmployeeKey] column, which is also an [int]. Keeping the two data types identical makes the self-join consistent and follows relational database best practices, and the manager's name and job title are read from the manager's own row instead of being duplicated in every employee row.
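The following is a minimal sketch of the self-referencing design, using Python's built-in sqlite3 module as a stand-in for the dedicated SQL pool. The table name DimEmployee, its columns other than EmployeeKey and ManagerEmployeeKey, and the sample employees are hypothetical, because the original table definition is not shown. The recursive CTE is SQLite syntax and is not guaranteed to run unchanged in Azure Synapse Analytics.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; it does not model
# Synapse-specific features such as distribution or columnstore indexes.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimEmployee (
        EmployeeKey        INTEGER PRIMARY KEY,
        EmployeeName       TEXT NOT NULL,
        JobTitle           TEXT NOT NULL,
        -- Self-referencing key: same type as EmployeeKey, NULL at the top of the hierarchy
        ManagerEmployeeKey INTEGER REFERENCES DimEmployee (EmployeeKey)
    );
    INSERT INTO DimEmployee VALUES
        (1, 'Dana', 'CEO', NULL),
        (2, 'Eli', 'VP Sales', 1),
        (3, 'Fay', 'Sales Rep', 2);
    """
)

# Current manager and the manager's attributes: one self-join on the key.
manager_rows = conn.execute(
    """
    SELECT e.EmployeeName, m.EmployeeName AS ManagerName, m.JobTitle AS ManagerTitle
    FROM DimEmployee AS e
    LEFT JOIN DimEmployee AS m ON m.EmployeeKey = e.ManagerEmployeeKey
    ORDER BY e.EmployeeKey
    """
).fetchall()

# Reporting hierarchy: walk the self-reference down from the top employee.
hierarchy = conn.execute(
    """
    WITH RECURSIVE chain(EmployeeKey, EmployeeName, Depth) AS (
        SELECT EmployeeKey, EmployeeName, 0
        FROM DimEmployee WHERE ManagerEmployeeKey IS NULL
        UNION ALL
        SELECT e.EmployeeKey, e.EmployeeName, c.Depth + 1
        FROM DimEmployee AS e
        JOIN chain AS c ON e.ManagerEmployeeKey = c.EmployeeKey
    )
    SELECT EmployeeName, Depth FROM chain ORDER BY Depth
    """
).fetchall()

print(manager_rows)  # [('Dana', None, None), ('Eli', 'Dana', 'CEO'), ('Fay', 'Eli', 'VP Sales')]
print(hierarchy)     # [('Dana', 0), ('Eli', 1), ('Fay', 2)]
```

Storing only the key, rather than the manager's name, is what lets both queries work: the name and title live in one place and are reached through the join.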

Question 2 of 370

You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb.

You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.

CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet

You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.

One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.

SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE EmployeeName = 'Alice';

What will be returned by the query?

an error

a null value

Correct Answer: B

The correct answer is an error. Tables that Spark creates in a shared Spark database are synchronized to the serverless SQL pool and exposed in the dbo schema of the matching database, so the mytestdb.dbo prefix itself is valid. The problem is the table name: Spark converts table names to lowercase when it creates them, and the synchronized table must be queried by its lowercase name (myparquettable). Because the query refers to myParquetTable, the serverless SQL pool cannot find the table and returns an error.

Question 3 of 370

DRAG DROP

You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. SalesFact contains sales data from the past 36 months and has the following characteristics:

✑ Is partitioned by month

✑ Contains one billion rows

✑ Has a clustered columnstore index

At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible.

Which three actions should you perform in sequence in a stored procedure? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Correct Answer:

Step 1: Create an empty table named SalesFact_Work that has the same schema as SalesFact.

Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.

SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions between two tables, you must ensure that the partitions align on their respective boundaries and that the table definitions match.

Partition switching is also a convenient way to load data: you can stage new data in a table that is not visible to users and then switch the new data in.

Step 3: Drop the SalesFact_Work table.

Reference:

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
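The three steps can be illustrated with a small in-memory model in Python. This is a sketch under stated assumptions, not Synapse behavior: the Table class, the switch_out_stale_partition function, and the sample rows are hypothetical, and the retention rule (the partition exactly 36 months before the current month is stale) is an assumed reading of "older than 36 months". What it shows is that the switch moves a whole partition as one unit, so no rows are deleted individually.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Table:
    """Hypothetical model: month partition (first day of month) -> list of rows."""
    name: str
    partitions: dict[date, list[str]] = field(default_factory=dict)


def months_back(d: date, n: int) -> date:
    """Return the first day of the month that is n months before d's month."""
    total = d.year * 12 + (d.month - 1) - n
    return date(total // 12, total % 12 + 1, 1)


def switch_out_stale_partition(sales_fact: Table, today: date, keep_months: int = 36) -> Table:
    # Assumption: the partition exactly keep_months before the current month is stale.
    stale = months_back(today, keep_months)
    # Step 1: empty work table with the same partition boundaries as SalesFact.
    work = Table("SalesFact_Work", {p: [] for p in sales_fact.partitions})
    # Step 2: "switch" reassigns the whole partition; rows are not deleted one by one.
    if stale in sales_fact.partitions:
        work.partitions[stale] = sales_fact.partitions[stale]
        sales_fact.partitions[stale] = []
    return work


# Usage: 37 monthly partitions, February 2022 through February 2025 (hypothetical data).
sales = Table("SalesFact", {months_back(date(2025, 2, 1), n): [f"row{n}"] for n in range(37)})
work = switch_out_stale_partition(sales, today=date(2025, 2, 1))
stale_rows = work.partitions[date(2022, 2, 1)]  # ["row36"]
del work  # Step 3: dropping SalesFact_Work discards the stale rows
```

In the real table the switch is a metadata operation, which is why this sequence is faster than running DELETE against one billion rows in a clustered columnstore table.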

Question 4 of 370

You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit.

You create an external table named ExtTable that has LOCATION='/topfolder/'.

When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?

File2.csv and File3.csv only

File1.csv and File4.csv only

File1.csv, File2.csv, File3.csv, and File4.csv

File1.csv only

Correct Answer: B

When you query an external table in an Azure Synapse Analytics serverless SQL pool, it does not traverse subfolders unless you specify a wildcard pattern such as /** at the end of the LOCATION path. In this scenario, without the /** wildcard, only files directly within the specified LOCATION='/topfolder/' will be returned. Thus, only File1.csv and File4.csv, which are directly under /topfolder/, will be included in the query result.
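As a rough illustration, the following Python function models the folder rule described above for a hypothetical layout. The function name files_returned and the subfolder names folder1 and folder2 are assumptions, since the exhibit is not shown; the model only compares parent folders and ignores any other file-selection rules of the serverless SQL pool.

```python
from pathlib import PurePosixPath


def files_returned(location: str, files: list[str]) -> list[str]:
    """Model: without a trailing /**, only files directly in LOCATION are read;
    with /**, files in all subfolders are read as well."""
    recursive = location.endswith("/**")
    # PurePosixPath drops the trailing slash, so "/topfolder/" becomes "/topfolder".
    root = PurePosixPath(location[:-3] if recursive else location)
    selected = []
    for f in files:
        path = PurePosixPath(f)
        if recursive:
            match = root in path.parents  # any ancestor folder is the root
        else:
            match = path.parent == root   # the immediate folder is the root
        if match:
            selected.append(path.name)
    return sorted(selected)


# Hypothetical layout: File2.csv and File3.csv sit in subfolders of /topfolder/.
layout = [
    "/topfolder/File1.csv",
    "/topfolder/folder1/File2.csv",
    "/topfolder/folder2/File3.csv",
    "/topfolder/File4.csv",
]
print(files_returned("/topfolder/", layout))    # ['File1.csv', 'File4.csv']
print(files_returned("/topfolder/**", layout))  # ['File1.csv', 'File2.csv', 'File3.csv', 'File4.csv']
```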

Question 5 of 370

HOTSPOT

You are planning the deployment of Azure Data Lake Storage Gen2.

You have the following two reports that will access the data lake:

✑ Report1: Reads three columns from a file that contains 50 columns.

✑ Report2: Queries a single record based on a timestamp.

You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times.

What should you recommend for each report? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Correct Answer:

Report1: Parquet

Parquet is a columnar format, so Report1 can read only the three columns it needs instead of scanning all 50 columns of every row.

Report2: AVRO

AVRO is a row-based format, which suits retrieving a single complete record, and it supports timestamps.

Reference:

https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Destinations/ADLS-G2-D.html