Exam DP-203
Question 269

HOTSPOT

You use Azure Data Lake Storage Gen2 to store data that data scientists and data engineers will query by using Azure Databricks interactive notebooks. Users will have access only to the Data Lake Storage folders that relate to the projects on which they work.

You need to recommend which authentication methods to use for Databricks and Data Lake Storage to provide the users with the appropriate access. The solution must minimize administrative effort and development effort.

Which authentication method should you recommend for each Azure service? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

    Correct Answer:

    Box 1: Personal access tokens

    You can use storage shared access signatures (SAS) to access an Azure Data Lake Storage Gen2 storage account directly. With SAS, you can restrict access to a storage account using temporary tokens with fine-grained access control.

    You can add multiple storage accounts and configure respective SAS token providers in the same Spark session.
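The SAS setup described above amounts to three Spark configuration entries per storage account. A minimal sketch of building them, assuming a fixed SAS token and the Hadoop ABFS `FixedSASTokenProvider` (the account name and token value here are placeholders, not real credentials):

```python
# Hedged sketch: Spark configuration entries that switch one ADLS Gen2
# account to SAS authentication with a fixed token provider.
# "mystorageaccount" and "<sas-token>" are illustrative placeholders.

def sas_spark_conf(account: str, sas_token: str) -> dict:
    """Return the Spark conf entries for fixed-SAS access to an ADLS Gen2 account."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "SAS",
        f"fs.azure.sas.token.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        f"fs.azure.sas.fixed.token.{suffix}": sas_token,
    }

conf = sas_spark_conf("mystorageaccount", "<sas-token>")
for key, value in conf.items():
    print(key, "=", value)
# In a Databricks notebook you would apply each entry with spark.conf.set(key, value);
# repeating this for several accounts configures multiple SAS providers in one session.
```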

    Box 2: Azure Active Directory credential passthrough

    You can authenticate automatically to Azure Data Lake Storage Gen1 (ADLS Gen1) and Azure Data Lake Storage Gen2 (ADLS Gen2) from Azure Databricks clusters using the same Azure Active Directory (Azure AD) identity that you use to log into Azure Databricks. When you enable your cluster for Azure Data Lake Storage credential passthrough, commands that you run on that cluster can read and write data in Azure Data Lake Storage without requiring you to configure service principal credentials for access to storage.

    After configuring Azure Data Lake Storage credential passthrough and creating storage containers, you can access data directly in Azure Data Lake Storage Gen1 using an adl:// path and Azure Data Lake Storage Gen2 using an abfss:// path.
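The two URI schemes mentioned above follow a fixed shape. A small sketch of constructing them (the account, container, and file names are invented for illustration):

```python
# Hedged sketch: building adl:// (ADLS Gen1) and abfss:// (ADLS Gen2) URIs.
# Account, container, and path values below are illustrative placeholders.

def adls_gen1_path(account: str, path: str) -> str:
    """ADLS Gen1 path: adl://<account>.azuredatalakestore.net/<path>"""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"

def adls_gen2_path(container: str, account: str, path: str) -> str:
    """ADLS Gen2 path: abfss://<container>@<account>.dfs.core.windows.net/<path>"""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

print(adls_gen1_path("myadls1", "/projects/alpha/data.csv"))
print(adls_gen2_path("projects", "myadls2", "alpha/data.csv"))
# With credential passthrough enabled, a notebook call such as
# spark.read.csv(adls_gen2_path(...)) succeeds only if the signed-in
# user's Azure AD identity has been granted access to that folder.
```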

    Reference:

    https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sas-access
    https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough

Discussion
ItHYMeRIsh

Accessing the ADLS via Databricks should be using Azure Active Directory with Passthrough. Accessing the files in ADLS should be SAS, based on the options provided. The explanation provided for this question is incorrect.

edba

To be more clear, for that box it should be a user delegation SAS, which is secured with Azure AD credentials.

Billybob0604

This is it. Correct

vivekazure

1. Accessing Databricks should use personal access tokens. 2. Accessing ADLS should use shared access signatures (because users need controlled access to only the project folders they work on).

Pais

Both should be Azure Active Directory with passthrough.
1. Shared Key and SAS authorization grant access to a user (or application) without requiring them to have an identity in Azure Active Directory (Azure AD). With these two forms of authentication, Azure RBAC and ACLs have no effect. ACLs let you grant "fine-grained" access, such as write access to a specific directory or file. https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model Azure AD provides superior security and ease of use over Shared Key for authorizing requests to Blob storage. For more information, see Authorize access to data in Azure Storage. https://learn.microsoft.com/en-us/azure/storage/blobs/security-recommendations
2. Azure AD passthrough will ensure a user can only access the data that they have previously been granted access to via Azure AD in ADLS Gen2. https://www.databricks.com/blog/2019/10/24/simplify-data-lake-access-with-azure-ad-credential-passthrough.html
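The "fine-grained" ACL idea referenced above can be sketched as a toy model: ADLS Gen2 ACLs are POSIX-style rwx entries per user or group, evaluated against the caller's identity. The folder layout and principal names below are invented for illustration, not part of any real API:

```python
# Hedged toy model of POSIX-style ACL checks as ADLS Gen2 applies them
# per folder. Real ACLs live in the storage service; this only sketches
# the evaluation idea with invented folders and principals.

ACLS = {
    "/projects/alpha": {"user:alice": "rwx", "group:alpha-team": "r-x"},
    "/projects/beta":  {"user:bob": "rwx"},
}

def can_read(principals: list, folder: str) -> bool:
    """True if any of the caller's principals (user or groups) has 'r' on the folder."""
    entries = ACLS.get(folder, {})
    return any("r" in entries.get(p, "") for p in principals)

print(can_read(["user:alice"], "/projects/alpha"))                    # direct user entry
print(can_read(["user:bob", "group:alpha-team"], "/projects/alpha"))  # granted via group
print(can_read(["user:bob"], "/projects/alpha"))                      # no matching entry
```

This is why passthrough pairs naturally with per-project folders: the identity reaching storage is the notebook user's own, so these per-folder grants are what actually gets enforced.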

sunil_smile

The question is about how to authenticate to the ADLS Gen2 dataset both in Databricks and in ADLS Gen2; it's not about how you authenticate to Databricks itself. 1) Credential passthrough 2) SAS

vrodriguesp

I agree with you; plus, looking at the definitions here:
- SAS = A shared access signature provides secure delegated access to resources in your storage account. With a SAS, you have granular control over how a client can access your data.
- Azure Active Directory with passthrough = Credential passthrough allows you to authenticate automatically to Azure Data Lake Storage from Azure Databricks clusters using the identity that you use to log in to Azure Databricks.
- Shared access key = Access keys give you full rights to everything in your storage account.
The more explicit question would be: which authentication method should you recommend for each Azure service to provide the users with the appropriate access?
1) How to authenticate to the ADLS Gen2 dataset using Databricks? -> Credential passthrough
2) How to authenticate to the ADLS Gen2 dataset using Data Lake Storage? -> SAS

vrodriguesp

Sorry, but I completely missed one definition:
- Personal access token = Personal access tokens (PATs) can be used to authenticate to the Databricks REST API, allowing for programmatic access to your Databricks workspace.
So by using a PAT, you can automate data movement between Databricks and Data Lake Storage Gen2 and control user permissions for appropriate access. The correct answer should be:
1) How to authenticate to the ADLS Gen2 dataset using Databricks? -> personal access token
2) How to authenticate to the ADLS Gen2 dataset using Data Lake Storage? -> SAS
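The PAT usage described above is just a bearer token on REST calls. A minimal sketch, assuming a placeholder workspace URL and a redacted token (no real request is made here):

```python
# Hedged sketch: authenticating to the Databricks REST API with a
# personal access token (PAT). Token and workspace URL are placeholders.

def databricks_auth_header(pat: str) -> dict:
    """Databricks REST APIs accept a PAT as an OAuth-style bearer token."""
    return {"Authorization": f"Bearer {pat}"}

headers = databricks_auth_header("dapi<redacted>")
print(headers["Authorization"])
# A GET to https://<workspace>.azuredatabricks.net/api/2.0/clusters/list
# sent with these headers would list the clusters visible to the token's owner.
```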

kkk5566

Box 1: Azure Active Directory with passthrough. Box 2: SAS.

Ram9198

Box 1 - Pass through Databricks Box 2 - SAS - DL Gen 2

KR8055

Databricks: Azure Active Directory with passthrough (https://learn.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough)
Data Lake Storage: SAS (https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model)

auwia

Databricks: Azure Active Directory credential passthrough or personal access tokens. Data Lake Storage: Azure Active Directory credential passthrough. Please note that while shared access keys and shared access signatures are valid authentication methods for Data Lake Storage, they do not meet the requirement of minimizing administrative effort and providing granular access control based on projects in this scenario.

vishal10

Azure Data Lake Storage Gen2 also supports Shared Key and SAS methods for authentication. To authenticate to and access Databricks REST APIs, you can use Azure Databricks personal access tokens or Azure Active Directory (Azure AD) tokens

e56bb91

ChatGPT-4o says Azure Active Directory (Azure AD) passthrough for both.

Souvik_79

The whole community is confused. Everyone has their own answers and explanations. No consensus whatsoever :(

ageorgieva

1. Personal access token 2. Shared access signatures

Alongi

1. Personal Token 2. Passthrough

Alongi

1. Personal access token 2. Credential passthrough

gogosgh

I think the answers given are correct. The question is which authentication to use "for" Databricks and Gen2, so we look at authenticating for (or "into") each of them. The question then becomes: which authentication can you use to access Databricks, and then, through that, which authentication can you use for Gen2?

JG1984

Personal Access Tokens are an alternative authentication method for Azure Databricks that can be used to authenticate to the Databricks REST API and to access Databricks resources. While PATs can provide a high level of security, they require more administrative effort to manage and maintain than Azure Active Directory Credential Passthrough.

OldSchool

As we need to access Databricks via ADLS, use Azure Databricks access tokens or AAD tokens, as explained here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/ Data Lake Storage with passthrough.

Deeksha1234

Given answer seems correct, agree with HaBroNounen's explanation