Exam Certified Data Engineer Professional All QuestionsBrowse all questions from this exam
Question 9

A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.

Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

    Correct Answer: E

    Cmd 1 will succeed because it correctly uses PySpark to filter and collect the list of countries from the 'geo_lookup' table where the continent is Africa ('AF'). This will result in 'countries_af' being a Python list of country names. However, Cmd 2 will fail because the SQL command does not recognize the 'countries_af' Python variable, leading to an error. SQL and Python operate in separate execution contexts within Databricks; therefore, Python variables are not accessible in SQL commands. Hence, 'countries_af' being a Python list cannot be used within the SQL 'IN' clause in Cmd 2.

Discussion
aragorn_bregoOption: E

Cmd 1 is a PySpark command that collects the list of countries from the 'geo_lookup' table where the continent is Africa ('AF'). This command will execute successfully, resulting in countries_af being a list of country names (strings) in Python's local memory. Cmd 2 is an SQL command intended to create a view named 'sales_af' from the 'sales' table, filtered by the cities in the countries_af list. However, this will fail because the countries_af variable exists in the Python environment and is not recognized in the SQL context. SQL does not have access to Python variables directly; they are two separate execution contexts within a Databricks notebook. There is no table or view named countries_af that SQL can reference; it is merely a Python list variable. The other options are incorrect because they either assume cross-contextual operation between Python and SQL within a Databricks notebook (which is not possible in the way described in the commands), or they do not correctly interpret the outcome of running the commands.

juliom6Option: E

E is correct. %sql create table geo_lookup (continent varchar(2), country varchar(15)); insert into geo_lookup (continent, country) values ('AF','Nigeria'), ('AF','Kenya'); create table sales (city varchar(15), continent varchar(2)); insert into sales (city, continent) values ('Nigeria','AF'), ('Kenya','AF'); %python countries_af = [x[0] for x in spark.table('geo_lookup').filter("continent='AF'").select('country').collect()] %sql create view sales_af as select * from sales where city in countries_af and continent = "AF"; ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'in'.(line 4, pos 11) i.e. countries_af is a python list of strings and can't be used inside a sql statement

PrashantTiwariOption: E

E is correct

Jay_98_11Option: E

vote for E

lucasasterioOption: E

correct

EertyyOption: E

E is right nswer

imatheushenriqueOption: E

E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.

leopedroso1Option: C

By simulating this code in databricks we can see an error being thrown in the SQL statement ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'.(line 1, pos 38) == SQL == SELECT * FROM backup.sales WHERE CITY IN countries_af AND CONTINENT = "AF"

RiktRikt007Option: B

B shows the actual flow of spark sql, where E shows the question context, i mean from databricks point of view E never looked, it's true that question state that database has no other tables, so ?? that mean databricks will not check for that particular table ? it will right ? i also confused by "database has no other database statement" and E and B both are right, but again B state "if countries table exists then command 2 will run" here "if" used, but question want to describe the language interoperability, so most of us selected E

kz_dataOption: E

E is correct answer

ismoshkovOption: B

https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language

Karen1232123

even if it exists, a table or a view won't work in cmd 2

Naveenkm

It is mentioned there exists only 2 objects in database. so B is not an option

sturcuOption: E

correct