DP-201 Exam - Question 42

Question

You are designing a log storage solution that will use Azure Blob storage containers.

CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than 5,000 customers. Typically, the customers will query data generated on the day the data was created.

You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.

What naming convention should you recommend?

Examice · Accepted Answer

To optimize the query performance for customers accessing their log files, it's important to structure the directory hierarchy in a way that allows them to quickly locate their specific logs. By starting with {CustomerID}, each customer can quickly access their dedicated directory without having to search through unnecessary entries belonging to other customers. The subsequent structure of {year}/{month}/{day}/{hour}/{minute}.csv ensures that logs are further organized chronologically within each customer's directory, facilitating efficient access to specific time frames. This pattern minimizes the time to query because customers can directly navigate to their own logs before filtering by date and time.
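As a rough illustration of the pattern described above, here is a minimal sketch that builds a blob name and the per-day prefix for one customer. The function names, the `cust-0042` ID, the zero-padded date parts, and the rounding of the minute to the five-minute interval are assumptions made for the example; they are not stated in the question.

```python
from datetime import datetime, timezone


def log_blob_name(customer_id: str, ts: datetime) -> str:
    """Blob name in the form {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv."""
    # Assumption: files are stamped with the start of their five-minute interval.
    minute = ts.minute - ts.minute % 5
    return f"{customer_id}/{ts:%Y/%m/%d/%H}/{minute:02d}.csv"


def day_prefix(customer_id: str, ts: datetime) -> str:
    """Virtual directory that holds one customer's logs for one day."""
    return f"{customer_id}/{ts:%Y/%m/%d}/"


# Hypothetical customer ID and timestamp, for illustration only.
ts = datetime(2021, 3, 9, 14, 37, tzinfo=timezone.utc)
print(log_blob_name("cust-0042", ts))  # cust-0042/2021/03/09/14/35.csv
print(day_prefix("cust-0042", ts))     # cust-0042/2021/03/09/
```

A same-day query for one customer then comes down to listing blobs whose names start with the value returned by `day_prefix`.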

Geo_Barros · Answer

In my opinion, option "D" would be the right one.

rahul_t · Answer

I think B is correct. We want to minimize the time it takes for customers to query log files, and 'Typically, the customers will query data generated on the day the data was created'. So it makes sense to put the path for a particular day, i.e. {Year}/{Month}/{Day}, close to the start. Once we have reached a particular day, we will want to filter for a particular customer, giving {Year}/{Month}/{Day}/{CustomerID}. Then we will want to drill down to hour and minute. The only other viable option is D. The reason I think {CustomerID} should NOT be at the beginning of the path is the case where a customer wants to query data related to multiple CustomerIDs on the same day.

AlexD332 · Answer

Still not clear, as the query should be optimized for customers, and they won't request data that isn't theirs.

Apox · Answer

I am certain that B is wrong. Why should the Customer ID be put randomly in between the date components?

I think D is the right answer and the reason is that each "/" takes you to a new directory (folder). As a hierarchy it would make the most sense to have a folder per customer, and then sort by date/time. Source: "Blob Path Format" Section here: https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs#blob-path-format

tanza · Answer

I think the answer is A.

Alekx42 · Answer

Since it is stated that this is a multi-tenant application, customers would not (and probably should not be able to) query data of other customers. This makes D the right answer.
Moreover, while it is said that queries are typically made on the same day the data is created, this does not exclude queries that range across multiple days or months. With solution B this becomes unpleasant: you cannot just query year/month, since that will return the data of all customers for that month. With solution D all queries are easier, since customerID/year/month immediately returns all the data for that customer for that month.
Basically, while it is true that both B and D allow rapid querying of data for a single customer for a single day, B is worse for all queries that want more than one day of data.
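The difference can be sketched with a small in-memory simulation of prefix listing. This is only an illustration under stated assumptions: the customer IDs, dates, and the `list_by_prefix` helper are made up, the two layouts are written as the posts in this thread describe B and D, and a plain `str.startswith` filter stands in for a real prefix listing against the storage service.

```python
from itertools import product

# Hypothetical sample data: 3 customers, 2 days, 2 five-minute slots each.
customers = ["c1", "c2", "c3"]
days = ["2021/03/08", "2021/03/09"]
slots = ["14/00", "14/05"]

# Layout D: {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv
layout_d = [f"{c}/{d}/{s}.csv" for c, d, s in product(customers, days, slots)]
# Layout B: {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
layout_b = [f"{d}/{c}/{s}.csv" for c, d, s in product(customers, days, slots)]


def list_by_prefix(names, prefix):
    """Stand-in for a prefix listing: keep names that start with the prefix."""
    return [n for n in names if n.startswith(prefix)]


# One customer, one day: both layouts need a single prefix.
d_day = list_by_prefix(layout_d, "c2/2021/03/09/")
b_day = list_by_prefix(layout_b, "2021/03/09/c2/")
print(len(d_day), len(b_day))      # 2 2

# One customer, whole month: D still needs one prefix, while the only
# single prefix available in B also matches every other customer.
d_month = list_by_prefix(layout_d, "c2/2021/03/")
b_month = list_by_prefix(layout_b, "2021/03/")
print(len(d_month), len(b_month))  # 4 12
```

With layout B, the month query would either have to filter the 12 results down to the 4 that belong to `c2`, or issue one prefix listing per day.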

Neha14n · Answer

Typically, the customers will query data generated on the day the data was created.
This line makes it clear that queries will be specific to a date, not to a customer. Otherwise D would be the correct answer.

Kevin89 · Answer

The name of the blob follows the following naming convention:

resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json

So it should actually be answer A.

Mandar77 · Answer

I think answer B is correct. This is how you would want to restrict access. The question says customers will access log information on the same day. So if you organize the path as year/month/day/customer/hour/minute, every customer goes to the day folder for that year and month and then into their own folder to get the logs for the day.
If you organize the path as customer/year/month/day/hour/minute, every customer has to traverse a long search path to reach the day and get the logs. With option B, the search path would be optimal given the requirement.

Marcus1612 · Answer

I think the key word is "Multi-tenant". It appears to me that the logs for a single customer need to be under its own branch. D is the right answer

maynard13x8 · Answer

The answer is correct. D is wrong because you duplicate the year and month folders. It is also the worse option because consumers query the data of the day, so when you set the name, you already have all the data you are interested in.

Nik71 · Answer

Confused between A and B. After reviewing https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs, I wonder why we would avoid A here.

BigMF · Answer

All of these options are poor in my opinion and therefore hard to choose a “best” option. If it were me, I’d go with this: {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv. This allows a customer to go directly to their folder and drill down quickly to the day they need. It also has the added benefit of the files being named intelligently and not just a “single bit of info”.csv. It also allows for easier maintenance down the road when customers leave by allowing you to easily archive or delete their data simply by archiving or deleting their folder. All that being said, I would go with D because I don’t think it is any slower for a customer to search for their data following that path than any of the others and in fact probably quicker. Also, it would provide easier maintenance down the road.
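A minimal sketch of the alternative convention proposed in this post, {CustomerID}/{year}/{month}/{day}/{CustomerID}_{year}{month}{day}{hour}{minute}.csv, might look like the following. The function names and the sample customer ID are hypothetical, and zero-padded date parts are an assumption.

```python
from datetime import datetime


def descriptive_blob_name(customer_id: str, ts: datetime) -> str:
    """Per-customer, per-day folder with a self-describing file name."""
    stamp = ts.strftime("%Y%m%d%H%M")
    return f"{customer_id}/{ts:%Y/%m/%d}/{customer_id}_{stamp}.csv"


def parse_file_name(blob_name: str):
    """Recover the customer ID and timestamp from the file name alone."""
    stem = blob_name.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    # Split on the last underscore so customer IDs may contain underscores.
    customer_id, stamp = stem.rsplit("_", 1)
    return customer_id, datetime.strptime(stamp, "%Y%m%d%H%M")


# Hypothetical customer ID and timestamp, for illustration only.
name = descriptive_blob_name("cust-0042", datetime(2021, 3, 9, 14, 35))
print(name)                   # cust-0042/2021/03/09/cust-0042_202103091435.csv
print(parse_file_name(name))
```

Because the file name repeats the customer and timestamp, a downloaded file stays identifiable even after it is moved out of its folder.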

msn1712 · Answer

Why can't A be the correct answer? The page at https://docs.microsoft.com/en-us/azure/cdn/cdn-azure-diagnostic-logs states:

The name of the blob follows the following naming convention:

resourceId=/SUBSCRIPTIONS/{Subscription Id}/RESOURCEGROUPS/{Resource Group Name}/PROVIDERS/MICROSOFT.CDN/PROFILES/{Profile Name}/ENDPOINTS/{Endpoint Name}/ y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json

y={Year}/m={Month}/d={Day}/h={Hour}/m={Minutes}/PT1H.json

J4C7 · Answer

What is the correct answer? I'm confused between B and D.
