DP-201 Exam - Question 132


HOTSPOT -

You use Azure Data Lake Storage Gen2 to store data that data scientists and data engineers will query by using Azure Databricks interactive notebooks. The folders in Data Lake Storage will be secured, and users will have access only to the folders that relate to the projects on which they work.

You need to recommend which authentication methods to use for Databricks and Data Lake Storage to provide the users with the appropriate access. The solution must minimize administrative effort and development effort.

Which authentication method should you recommend for each Azure service? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

[Answer area image: select an authentication method for Databricks and for Data Lake Storage]

Correct Answer:
[Answer image: the selections are transcribed below]

Databricks: Personal access tokens

To authenticate to and access Databricks REST APIs, you use personal access tokens. Tokens are similar to passwords; you should treat them with care. Tokens expire and can be revoked.
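
For context, this is the REST-API scenario a PAT covers. A minimal sketch of such a call, with a hypothetical workspace URL and a placeholder token value:

```python
# Minimal sketch: calling the Databricks REST API with a personal access
# token. The workspace URL and token below are placeholders, not real values.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical workspace
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # personal access token; treat it like a password

# List the clusters in the workspace; the PAT is sent as a Bearer credential.
resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
print(resp.json())
```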

Data Lake Storage: Azure Active Directory

Azure Data Lake Storage Gen2 supports Azure Active Directory (OAuth 2.0) for authentication.
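
As an illustration of AAD authentication against ADLS Gen2 from client code, here is a minimal sketch using the azure-identity and azure-storage-file-datalake packages; the account, file-system, and folder names are hypothetical.

```python
# Minimal sketch: authenticating to ADLS Gen2 with an Azure AD identity.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()  # picks up the caller's AAD identity
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",  # hypothetical account
    credential=credential,
)

# Access is then evaluated against the caller's RBAC roles and folder ACLs.
fs = service.get_file_system_client("projects")
for path in fs.get_paths(path="project-a"):
    print(path.name)
```

Once the caller presents an AAD identity, authorization is evaluated against RBAC role assignments and the POSIX-style ACLs on each folder, which is what makes per-project folder security possible.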

References:

https://docs.azuredatabricks.net/dev-tools/api/latest/authentication.html
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lakes-store-authentication-using-azure-active-directory

Discussion

12 comments
remz
Jun 7, 2020

Answer is correct.
https://docs.databricks.com/dev-tools/api/latest/authentication.html
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-datalake-gen2#adls2-aad-credentials

dip17
Jul 11, 2020

To minimize admin effort, the best option would be to create a High Concurrency cluster with AD credential passthrough enabled, use RBAC to assign the Contributor role on the Databricks workspace to the AD users (data engineers and data analysts), and apply ACLs on the specific folders for those AD users. Active Directory authentication works perfectly for both; see the sketch below.
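
For illustration, here is a minimal sketch of what a notebook cell looks like under credential passthrough: no keys or tokens appear in the code, and ADLS evaluates the folder ACLs against the signed-in user's own AAD identity. The storage account, container, and folder names are hypothetical, and `spark` is the session a Databricks notebook predefines.

```python
# Hypothetical ADLS Gen2 path; with AD credential passthrough enabled on the
# cluster, this read is authorized with the notebook user's own AAD identity,
# so the per-folder ACLs decide what each user can see.
df = spark.read.parquet(
    "abfss://projects@mydatalake.dfs.core.windows.net/project-a/"
)
df.show()
```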

lastname
Dec 30, 2020

All the answers above are wrong. The correct answers are:
1. Databricks: Azure Active Directory
2. Data Lake Storage: Azure Active Directory

1. There is no mention of connecting with the Databricks API; instead, the description says that users will query ADLS using interactive notebooks. For that they will have to log in to Databricks itself, which is done with their AD accounts.
2. Shared access signatures and shared access keys do not use ACLs but RBAC, and are applied at the container or storage-account level, NOT at the directory or file level. I quote from https://docs.microsoft.com/nl-nl/azure/storage/blobs/data-lake-storage-access-control: "ACLs apply only to security principals in the same tenant, and they don't apply to users who use Shared Key or shared access signature (SAS) token authentication. That's because no identity is associated with the caller and therefore security principal permission-based authorization cannot be performed."
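
To make the ACL point concrete, here is a minimal sketch of granting one AAD group read/execute on a single folder with the azure-storage-file-datalake package; the account, container, folder, and group object ID are all hypothetical.

```python
# Minimal sketch: per-folder access control via ADLS Gen2 ACLs.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",  # hypothetical
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("projects").get_directory_client("project-a")

# ACL entries use the form "[scope:]type:id:permissions"; this adds r-x on the
# folder and its contents for one AAD group (hypothetical object ID).
directory.update_access_control_recursive(
    acl="group:aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee:r-x"
)
```

An entry like this is exactly what Shared Key and SAS callers bypass, since no security principal is attached to those requests.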

zarga
Jan 24, 2021

1. Databricks: Azure Active Directory (minimizes administrative effort)
2. Data Lake Storage: Azure Active Directory

M0e
Oct 24, 2020

Where in the question does it talk about accessing the REST API? Personal access tokens are used to access the Databricks REST API. For interactive notebooks, AAD is the way to authenticate the users!

lastname
Dec 30, 2020

Indeed.

Luke97
Apr 14, 2020

I think for ADLS Gen2, it should use SAS rather than AAD (RBAC). Shared Key is not quite suitable, as it makes the user effectively gain 'super-user' access, meaning full access to all operations on all resources, including setting the owner and changing ACLs.

HCL1991
Apr 28, 2020

I concur. I also think that AAD should be the authentication method for Databricks, since personal access tokens are used to access the Databricks REST API, not interactive notebooks.

Leonido
May 1, 2020

Won't work - ADLS can only have account-level SAS, and you need at least container-level granularity.

azurearch
May 11, 2020

AAD can't do folder-level permissions in ADLS by itself; it needs ACLs to do that.

pawhit
May 31, 2020

Re: ADLS, the question concerns authentication only, not authorisation, so AAD for authentication and then RBAC roles for authorisation.

Ash666
Aug 6, 2020

Databricks: Azure Key Vault (https://docs.microsoft.com/en-us/azure/databricks/security/secrets/example-secret-workflow)
ADLS Gen2: AAD
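
For reference, the linked secret workflow boils down to reading values from a Key Vault-backed secret scope inside a notebook. A minimal sketch, with hypothetical scope and key names (`dbutils` exists only on a Databricks cluster):

```python
# Minimal sketch: reading a credential from a Key Vault-backed Databricks
# secret scope. Scope and key names are hypothetical.
secret = dbutils.secrets.get(scope="kv-backed-scope", key="adls-sp-secret")

# Notebook output redacts secret values, so printing shows "[REDACTED]".
print(secret)
```

A secret retrieved this way can then feed a service-principal configuration like the one sketched further down this thread.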

Needium
Mar 10, 2021

This looks a lot like Azure Active Directory in both boxes to me. First of all, I am authenticating to the Databricks UI itself to create and run notebooks, not to the REST API. I would rather use standard AAD to access Databricks and use the same AAD credentials to access the files in ADLS Gen2. Of course, I will be implementing ACLs so that each user can access only the folders they should. Ref: https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough

Leonido
May 1, 2020

It looks like bad wording in the question. The requirements are not to secure the notebook, but only the storage access. So what I do in those cases: define access using Key Vault (so the user of that notebook won't see the credentials) and secure ADLS Gen2 with a service identity in AAD - that allows granular authorization and project scope; a sketch follows.
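
A minimal sketch of that pattern, assuming a service principal whose client secret lives in a Key Vault-backed secret scope; every name and ID below is hypothetical, and the `fs.azure.account.oauth2.*` settings are the documented ABFS OAuth options:

```python
# Minimal sketch: OAuth (service principal) access to ADLS Gen2 from a
# Databricks notebook. The secret scope, account, application ID, and tenant
# ID are all hypothetical placeholders.
client_secret = dbutils.secrets.get(scope="kv-backed-scope", key="sp-client-secret")

account = "mydatalake.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{account}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}", client_secret)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{account}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)
```

Note this authenticates as the service principal rather than as each user, so per-user folder restrictions would still need separate scoping (e.g., one principal per project).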

syu31svc
Dec 7, 2020

I would say the answer is correct.
https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication: "To authenticate to and access Databricks REST APIs, you can use Azure Databricks personal access tokens or Azure Active Directory (Azure AD) tokens."
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control: "Always use Azure AD security groups as the assigned principal in an ACL entry"

lastname
Dec 30, 2020

I see no mention of an API; it's logging in to Databricks and then querying ADLS.

muni53
Sep 22, 2021

Both should be AAD. Azure Databricks automatically authenticates with AAD.