
DP-201 Exam - Question 85


HOTSPOT -

You have an Azure Storage account that generates 200,000 new files daily. The file names have a format of {YYYY}/{MM}/{DD}/{HH}/{CustomerID}.csv.

You need to design an Azure Data Factory solution that will load new data from the storage account to an Azure Data Lake once hourly. The solution must minimize load times and costs.

How should you configure the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

Exam DP-201 Question 85
Correct Answer:
Exam DP-201 Question 85

Box 1: Incremental load -

When you build an end-to-end data integration flow, the first challenge is extracting data from different data stores; incrementally (or delta) loading data after an initial full load is widely used at this stage. ADF can incrementally copy only new or changed files from a file-based store based on their LastModifiedDate. With this capability, you do not need to partition the data by time-based folder or file name: new or changed files are selected automatically by their LastModifiedDate metadata and copied to the destination store.
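As a sketch of how this looks in a Copy activity, the Blob source's store settings can filter files by last-modified time. This is a minimal, hypothetical fragment (the pipeline parameters `windowStart` and `windowEnd` are assumed names, typically bound to trigger window times):

```json
"source": {
  "type": "DelimitedTextSource",
  "storeSettings": {
    "type": "AzureBlobStorageReadSettings",
    "recursive": true,
    "modifiedDatetimeStart": "@pipeline().parameters.windowStart",
    "modifiedDatetimeEnd": "@pipeline().parameters.windowEnd"
  }
}
```

Only files whose LastModifiedDate falls inside the window are copied, so each hourly run scans metadata rather than reloading all 200,000 daily files.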

Box 2: Tumbling window -

Tumbling window triggers are a type of trigger that fires at a periodic time interval from a specified start time, while retaining state. Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. A tumbling window trigger has a one-to-one relationship with a pipeline and can only reference a singular pipeline.
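A tumbling window trigger definition along these lines could fire the pipeline once per hour and pass the window boundaries in as parameters. This is a hedged sketch: the trigger and pipeline names (`HourlyTumblingTrigger`, `CopyNewFilesPipeline`) and the start time are illustrative assumptions.

```json
{
  "name": "HourlyTumblingTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "runtimeState": "Started",
    "pipeline": {
      "pipelineReference": {
        "referenceName": "CopyNewFilesPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime",
        "windowEnd": "@trigger().outputs.windowEndTime"
      }
    },
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2021-01-01T00:00:00Z",
      "maxConcurrency": 1
    }
  }
}
```

Because the windows are fixed-size and non-overlapping, each run processes exactly one hour of new files, and `maxConcurrency: 1` keeps runs from overlapping if one takes longer than an hour.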

Reference:

https://azure.microsoft.com/en-us/blog/incrementally-copy-new-files-by-lastmodifieddate-with-azure-data-factory/
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

Discussion

8 comments
Amy007
Apr 29, 2021

But the schedule is mentioned as once hourly, so why would it be a tumbling window?

cadio30
May 25, 2021

The appropriate selections are 'incremental load' and 'fixed schedule', since the requirement is just a one-hour trigger and a tumbling window requires more configuration than the schedule mentioned. It would be better if there were an option to use 'storage events', so that ADF triggers whenever a blob is created or deleted. Reference: https://www.mssqltips.com/sqlservertip/6061/create-tumbling-window-trigger-in-azure-data-factory-adf/

cadio30
Jun 16, 2021

If the requirement also needs to take the load processing time into account, then a tumbling window is the appropriate configuration, as its triggered runs won't overlap.

mbravo
Jun 9, 2021

According to the MS documentation, incremental loads are used together with tumbling windows. A tumbling window trigger is used in both of these examples, where an incremental load is performed from Blob Storage: https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-lastmodified-copy-data-tool and https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-partitioned-file-name-copy-data-tool

BobFar
Jun 5, 2021

The appropriate selections are 'incremental load' and, since the schedule is once hourly, 'tumbling window'. The answer is correct!

Debjit
Mar 26, 2021

If it's a tumbling window, then why not trigger on new individual files as they arrive? A tumbling window works only when a new event occurs.

Debjit
Mar 27, 2021

Ignore my earlier comment. The answer is correct.

tamil1006
May 14, 2021

tumbling window will be used for stream analytics...

MMM777
Jun 6, 2021

Tumbling Window trigger is a "smarter" run - what if the pipeline takes longer than an hour to run? https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#trigger-type-comparison

satyamkishoresingh
Aug 16, 2021

I believe it can be a tumbling window as well as a fixed schedule, as both can fire on a fixed hourly interval.