Question 6 of 370

You are designing the folder structure for an Azure Data Lake Storage Gen2 container.

Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month.

Which folder structure should you recommend to support fast queries and simplified folder security?

    Correct Answer: D

    To support fast queries and simplified folder security, the folder structure should be organized first by SubjectArea and DataSource and only then by date. Putting SubjectArea at the top of the hierarchy allows permissions to be granted once at the SubjectArea folder rather than repeated on many date folders. Organizing files as YYYY/MM/DD within each SubjectArea also enables efficient pruning for date-range queries, which matters because most queries target the current year or month.
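    The recommended layout can be sketched as a small path-building helper. This is an illustrative sketch only; the subject area "Sales" and data source "POS" are hypothetical names, not from the question.

```python
from datetime import date
from pathlib import PurePosixPath

def partition_path(subject_area: str, data_source: str, day: date) -> str:
    """Build a {SubjectArea}/{DataSource}/YYYY/MM/DD path.

    Because SubjectArea is the top-level folder, an ACL granted once on
    that folder secures every date partition beneath it.
    """
    return str(PurePosixPath(subject_area, data_source,
                             f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"))

print(partition_path("Sales", "POS", date(2024, 5, 17)))  # Sales/POS/2024/05/17
```

    Queries scoped to the current month then only have to list one YYYY/MM subtree instead of scanning the whole subject area.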

Question 7 of 370

HOTSPOT -

You need to output files from Azure Data Factory.

Which file format should you use for each type of output? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Parquet -

    Parquet stores data in columns, while Avro stores data in a row-based format. By their very nature, column-oriented data stores are optimized for read-heavy analytical workloads, while row-based databases are best for write-heavy transactional workloads.
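    The row-versus-column trade-off can be illustrated with a toy sketch in plain Python (this is not Parquet or Avro itself, just the two layouts side by side):

```python
# The same three records, stored row-wise (Avro-like)...
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 30.0},
]

# ...and column-wise (Parquet-like): one contiguous list per attribute.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An analytical aggregate over one attribute reads only that column...
total = sum(columns["amount"])

# ...whereas the row layout must touch every record in full.
total_rows = sum(r["amount"] for r in rows)

print(total, total_rows)  # 60.0 60.0
```

    Appending a new record, by contrast, is a single write in the row layout but one write per column in the columnar layout, which is why row formats suit write-heavy workloads.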

    Box 2: Avro -

    An Avro schema is created using JSON format.

    AVRO supports timestamps.
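    A minimal Avro record schema, written as JSON from Python, might look like the sketch below. The record and field names are illustrative; the `timestamp-millis` logical type (a `long` holding epoch milliseconds) is how Avro represents timestamps.

```python
import json

# Illustrative Avro schema: a record with an id and a timestamp field.
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "occurred_at",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```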

    Note: Azure Data Factory supports the following file formats (not GZip or TXT):

    ✑ Avro format

    ✑ Binary format

    ✑ Delimited text format

    ✑ Excel format

    ✑ JSON format

    ✑ ORC format

    ✑ Parquet format

    ✑ XML format

    Reference:

    https://www.datanami.com/2018/05/16/big-data-file-formats-demystified

Question 8 of 370

HOTSPOT -

You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.

Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company.

You need to move the files to a different folder and transform the data to meet the following requirements:

✑ Provide the fastest possible query times.

✑ Automatically infer the schema from the underlying files.

How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Preserve hierarchy -

    Compared to the flat namespace on Blob storage, the hierarchical namespace greatly improves the performance of directory management operations, which improves overall job performance.

    Box 2: Parquet -

    The Parquet format in Azure Data Factory is supported for Azure Data Lake Storage Gen2.

    Parquet files embed their own schema, so the schema can be inferred automatically from the underlying files.

    Reference:

    https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction

    https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

Question 9 of 370

HOTSPOT -

You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit.

All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates.

Which type of table should you use for each table? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Replicated -

    Replicated tables are ideal for small star-schema dimension tables, because the fact table is often distributed on a column that is not compatible with the connected dimension tables. If this case applies to your schema, consider changing small dimension tables currently implemented as round-robin to replicated.

    Box 2: Replicated -

    Box 3: Replicated -

    Box 4: Hash-distributed -

    For fact tables, use hash distribution with a clustered columnstore index. Performance improves when two hash-distributed tables are joined on the same distribution column.
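    The guidance above reduces to a simple rule of thumb, sketched here as a hypothetical helper (the thresholds mirror the question: dimensions under 2 GB compressed, a multi-terabyte fact table):

```python
def recommend_distribution(table_kind: str, compressed_gb: float) -> str:
    """Rule of thumb distilled from the answer above:
    replicate small, static dimension tables; hash-distribute large
    fact tables (paired with a clustered columnstore index)."""
    if table_kind == "dimension" and compressed_gb < 2:
        return "REPLICATE"
    if table_kind == "fact":
        return "HASH"
    return "ROUND_ROBIN"

print(recommend_distribution("dimension", 0.5))  # REPLICATE
print(recommend_distribution("fact", 6000))      # HASH
```

    Replicating a small dimension puts a full copy on every compute node, so joins against it never require data movement.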

    Reference:

    https://azure.microsoft.com/en-us/updates/reduce-data-movement-and-make-your-queries-more-efficient-with-the-general-availability-of-replicated-tables/

    https://azure.microsoft.com/en-us/blog/replicated-tables-now-generally-available-in-azure-sql-data-warehouse/

Question 10 of 370

HOTSPOT -

You have an Azure Data Lake Storage Gen2 container.

Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.

You need to design a data archiving solution that meets the following requirements:

✑ New data is accessed frequently and must be available as quickly as possible.

✑ Data that is older than five years is accessed infrequently but must be available within one second when requested.

✑ Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.

✑ Costs must be minimized while maintaining the required availability.

How should you manage the data? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Move to cool storage -

    Box 2: Move to archive storage -

    Archive - Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours.
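    The tiering decision in this question maps file age to the cheapest tier that still meets the availability requirement. A minimal sketch of that mapping (the age thresholds come from the question; tier names match the Azure access tiers):

```python
def target_tier(age_years: float) -> str:
    """Map a file's age to the cheapest access tier that still meets
    the stated availability requirements."""
    if age_years >= 7:
        return "Archive"  # lowest cost; retrieval latency on the order of hours
    if age_years >= 5:
        return "Cool"     # online tier: data remains readable within a second
    return "Hot"          # frequently accessed new data

print([target_tier(y) for y in (1, 6, 10)])  # ['Hot', 'Cool', 'Archive']
```

    In practice this schedule would be applied automatically with a blob lifecycle management policy rather than per-file code.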

    The following table shows a comparison of premium performance block blob storage, and the hot, cool, and archive access tiers.

    Reference:

    https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers