
DP-201 Exam - Question 42


You are designing a log storage solution that will use Azure Blob storage containers.

CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than 5,000 customers. Typically, the customers will query data generated on the day the data was created.

You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.

What naming convention should you recommend?

Correct Answer: D

To optimize the query performance for customers accessing their log files, it's important to structure the directory hierarchy in a way that allows them to quickly locate their specific logs. By starting with {CustomerID}, each customer can quickly access their dedicated directory without having to search through unnecessary entries belonging to other customers. The subsequent structure of {year}/{month}/{day}/{hour}/{minute}.csv ensures that logs are further organized chronologically within each customer's directory, facilitating efficient access to specific time frames. This pattern minimizes the time to query because customers can directly navigate to their own logs before filtering by date and time.
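As a rough illustration of the pattern described above, the following Python sketch builds a blob name in the {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv layout, plus the prefix a customer would use to list one day of logs. The function names, the sample customer ID, and the zero-padded date parts are assumptions made for this example; the question itself does not specify them.

```python
from datetime import datetime


def log_blob_name(customer_id: str, ts: datetime) -> str:
    """Return a blob name in the customer-first layout.

    Zero-padded date and time parts are an assumption of this sketch.
    """
    return f"{customer_id}/{ts:%Y/%m/%d/%H/%M}.csv"


def day_prefix(customer_id: str, ts: datetime) -> str:
    """Return the prefix that covers one customer's logs for one day."""
    return f"{customer_id}/{ts:%Y/%m/%d}/"


# Hypothetical customer ID and timestamp, for illustration only.
ts = datetime(2021, 3, 15, 9, 5)
print(log_blob_name("cust-0042", ts))  # cust-0042/2021/03/15/09/05.csv
print(day_prefix("cust-0042", ts))     # cust-0042/2021/03/15/
```

A single prefix such as the one returned by `day_prefix` is enough to scope a listing to one customer and one day, which is the typical query described in the question.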

Discussion

15 comments
Geo_Barros
Mar 15, 2021

In my opinion, option "D" would be the right one.

cadio30
May 24, 2021

Referring to the link provided in the solution, the blob path there starts with the "profile name" and then proceeds with the datetime stamp. It makes sense that 'D' is the appropriate answer to this question.

rahul_t
Apr 10, 2021

I think B is correct. We want to minimize the time it takes for customers to query log files. 'Typically, the customers will query data generated on the day the data was created'. So it makes sense to put the path for a particular day, i.e. {Year}/{Month}/{Day}, close to the start. Once we have reached a particular day, we will want to filter for a particular customer, so {Year}/{Month}/{Day}/{CustomerID}. Then we will want to drill down to hour and minute. The only other viable option is D. The reason I think {CustomerID} should NOT be at the beginning of the path is the case where a customer wants to query data related to multiple CustomerIDs on the same day.

AlexD332
Mar 12, 2021

Still not clear, as the query should be optimized for customers, and they won't request data that isn't theirs.

Apox
Apr 27, 2021

I am certain that B is wrong. Why should Customer ID be put randomly in between the data formats? I think D is the right answer and the reason is that each "/" takes you to a new directory (folder). As a hierarchy it would make the most sense to have a folder per customer, and then sort by date/time. Source: "Blob Path Format" Section here: https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs#blob-path-format

KRV
May 22, 2021

On the face of it, your argument holds. However, if you read the question carefully, it says: 1. Customers will query data generated on the day the data was created, which means the path should start with year-to-day granularity. 2. Log files will be generated for each customer at five-minute intervals. That leaves two options: organize by CustomerID/hour/minute or by hour/minute/CustomerID. Since nothing is explicitly stated, it is safe to assume that queries will be customer-centric first and then narrowed to a point in time within that customer. Hence answer A is logically more correct in the context of the question: {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv

tanza
May 13, 2021

I think answer is A

Alekx42
Jun 12, 2021

Since it is stated that this is a multi-tenant application, customers would not (and probably should not be able to) query data of other customers. This makes D the right answer. Moreover, while it says that queries are typically done on the same day the data is created, this does not exclude the possibility of queries that range across multiple days or months. With solution B this becomes unpleasant, since you cannot just query year/month: that will return data of all customers for that month. With solution D all queries are easier, since customerID/year/month immediately returns all the data for that customer for that month. Basically, while it is true that both B and D allow rapid querying of data for a single customer for a single day, B is worse for all queries that want more than one day of data.
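The prefix argument in this comment can be checked with a small Python sketch that simulates prefix listing over an in-memory list of blob names. The two layouts are labelled "customer-first" and "date-first" instead of by option letter, because the option text is not reproduced on this page. The customer IDs, dates, and helper names are hypothetical, and `list_by_prefix` only mimics prefix filtering; it is not an Azure SDK call.

```python
def customer_first(cid: str, y: int, m: int, d: int, hh: int, mm: int) -> str:
    """Blob name with the customer ID as the leading path segment."""
    return f"{cid}/{y:04d}/{m:02d}/{d:02d}/{hh:02d}/{mm:02d}.csv"


def date_first(cid: str, y: int, m: int, d: int, hh: int, mm: int) -> str:
    """Blob name with the date leading and the customer ID after the day."""
    return f"{y:04d}/{m:02d}/{d:02d}/{cid}/{hh:02d}/{mm:02d}.csv"


def list_by_prefix(names: list[str], prefix: str) -> list[str]:
    """Stand-in for a prefix-filtered blob listing."""
    return [n for n in names if n.startswith(prefix)]


# Two hypothetical customers, two days, one log file per day each.
customers = ["c1", "c2"]
days = [1, 2]
blobs_cf = [customer_first(c, 2021, 6, d, 0, 0) for c in customers for d in days]
blobs_df = [date_first(c, 2021, 6, d, 0, 0) for c in customers for d in days]

# Single customer, single day: one prefix works in both layouts.
day_cf = list_by_prefix(blobs_cf, "c1/2021/06/01/")
day_df = list_by_prefix(blobs_df, "2021/06/01/c1/")

# Single customer, whole month: one prefix works only in the customer-first layout.
month_cf = list_by_prefix(blobs_cf, "c1/2021/06/")
# In the date-first layout the month prefix also matches other customers' blobs.
month_df = list_by_prefix(blobs_df, "2021/06/")

print(len(day_cf), len(day_df), len(month_cf), len(month_df))
```

In the date-first layout, a month-long query for one customer needs either one listing per day or client-side filtering of every customer's blobs.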

Anonymous
Jun 19, 2021

"this does not exclude the possibility of making queries": that is an additional assumption made by the person answering the question.

Neha14n
Mar 14, 2021

"Typically, the customers will query data generated on the day the data was created." This line makes it clear that queries will be specific to the date, not the customer. Otherwise D would be the correct answer.

DongDuong
Apr 10, 2021

Agreed, in this case B is more suitable.

Kevin89
Apr 8, 2021

The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json, so it should actually be answer A.

Mandar77
Jun 7, 2021

I think answer B is correct. This is how you would want to restrict access. The question says customers will access log information on the same day. So if you organize containers as year - month - day - customer - hour - minute, every customer goes to the day folder of that year and month and then into their own container to get the logs for the day. If you organize containers as customer - year - month - day - hour - minute, every customer has to traverse a long search path to reach the day and get the logs. With option B, the search path would be optimal given the requirement.

BigMF
Jun 9, 2021

This logic is flawed because the customer still has to traverse a long search path when they drill down into the folder structure. You either traverse it to begin with or later in the drill down.

Marcus1612
Sep 24, 2021

I think the key word is "Multi-tenant". It appears to me that the logs for a single customer need to be under its own branch. D is the right answer

maynard13x8
Apr 9, 2021

Answer is correct. D is wrong because you duplicate year and month folders. It is also worse option because consumers query data of the day so, when you set the name, you already have all the data you are interested in.

Nik71
Mar 23, 2021

I'm confused between A and B. After reviewing https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, I don't see why we would avoid A here.

BigMF
Jun 9, 2021

All of these options are poor in my opinion and therefore hard to choose a “best” option. If it were me, I’d go with this: {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv. This allows a customer to go directly to their folder and drill down quickly to the day they need. It also has the added benefit of the files being named intelligently and not just a “single bit of info”.csv. It also allows for easier maintenance down the road when customers leave by allowing you to easily archive or delete their data simply by archiving or deleting their folder. All that being said, I would go with D because I don’t think it is any slower for a customer to search for their data following that path than any of the others and in fact probably quicker. Also, it would provide easier maintenance down the road.
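The alternative naming scheme proposed in this comment, {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv, can be sketched in Python as follows. The helper names and the sample customer ID are hypothetical, and zero-padded date parts are an assumption of this sketch. The parser shows the benefit the comment mentions: the file name alone identifies the customer and timestamp.

```python
from datetime import datetime


def self_describing_name(customer_id: str, ts: datetime) -> str:
    """Build a blob name whose file name repeats the customer ID and timestamp."""
    folder = f"{customer_id}/{ts:%Y/%m/%d}"
    filename = f"{customer_id}_{ts:%Y%m%d%H%M}.csv"
    return f"{folder}/{filename}"


def parse_filename(blob_name: str) -> tuple[str, datetime]:
    """Recover the customer ID and timestamp from the file name alone."""
    filename = blob_name.rsplit("/", 1)[-1]
    stem = filename[: -len(".csv")]
    # Split on the last underscore so customer IDs may contain underscores.
    customer_id, stamp = stem.rsplit("_", 1)
    return customer_id, datetime.strptime(stamp, "%Y%m%d%H%M")


# Hypothetical customer ID and timestamp, for illustration only.
name = self_describing_name("cust_7", datetime(2021, 6, 9, 14, 35))
print(name)  # cust_7/2021/06/09/cust_7_202106091435.csv
print(parse_filename(name))
```

Because every blob for a customer shares the `cust_7/` prefix, archiving or deleting a departed customer's data reduces to operating on that one prefix.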

msn1712
Jun 26, 2021

Why can't A be the correct answer? On the link https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, it's mentioned: The name of the blob follows the following naming convention: resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json

J4C7
Aug 30, 2021

What is the correct answer? I'm confused between B and D.