Certified Data Engineer Professional Exam QuestionsBrowse all questions from this exam

Certified Data Engineer Professional Exam - Question 9


A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.

Before executing the code, running SHOW TABLES on the current database indicates the database contains only two tables: geo_lookup and sales.

Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

Show Answer
Correct Answer: E

Cmd 1 will succeed because it correctly uses PySpark to filter and collect the list of countries from the 'geo_lookup' table where the continent is Africa ('AF'). This will result in 'countries_af' being a Python list of country names. However, Cmd 2 will fail because the SQL command does not recognize the 'countries_af' Python variable, leading to an error. SQL and Python operate in separate execution contexts within Databricks; therefore, Python variables are not accessible in SQL commands. Hence, 'countries_af' being a Python list cannot be used within the SQL 'IN' clause in Cmd 2.

Discussion

12 comments
Sign in to comment
aragorn_bregoOption: E
Nov 21, 2023

Cmd 1 is a PySpark command that collects the list of countries from the 'geo_lookup' table where the continent is Africa ('AF'). This command will execute successfully, resulting in countries_af being a list of country names (strings) in Python's local memory. Cmd 2 is an SQL command intended to create a view named 'sales_af' from the 'sales' table, filtered by the cities in the countries_af list. However, this will fail because the countries_af variable exists in the Python environment and is not recognized in the SQL context. SQL does not have access to Python variables directly; they are two separate execution contexts within a Databricks notebook. There is no table or view named countries_af that SQL can reference; it is merely a Python list variable. The other options are incorrect because they either assume cross-contextual operation between Python and SQL within a Databricks notebook (which is not possible in the way described in the commands), or they do not correctly interpret the outcome of running the commands.

EertyyOption: E
Aug 27, 2023

E is right nswer

lucasasterioOption: E
Sep 6, 2023

correct

Jay_98_11Option: E
Jan 13, 2024

vote for E

PrashantTiwariOption: E
Feb 8, 2024

E is correct

juliom6Option: E
Apr 8, 2024

E is correct. %sql create table geo_lookup (continent varchar(2), country varchar(15)); insert into geo_lookup (continent, country) values ('AF','Nigeria'), ('AF','Kenya'); create table sales (city varchar(15), continent varchar(2)); insert into sales (city, continent) values ('Nigeria','AF'), ('Kenya','AF'); %python countries_af = [x[0] for x in spark.table('geo_lookup').filter("continent='AF'").select('country').collect()] %sql create view sales_af as select * from sales where city in countries_af and continent = "AF"; ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'in'.(line 4, pos 11) i.e. countries_af is a python list of strings and can't be used inside a sql statement

sturcuOption: E
Oct 11, 2023

correct

ismoshkovOption: B
Nov 4, 2023

https://docs.databricks.com/en/notebooks/notebooks-code.html#mix-languages Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language

Karen1232123
Nov 5, 2023

even if it exists, a table or a view won't work in cmd 2

Naveenkm
Nov 27, 2023

It is mentioned there exists only 2 objects in database. so B is not an option

kz_dataOption: E
Jan 10, 2024

E is correct answer

RiktRikt007Option: B
Feb 10, 2024

B shows the actual flow of spark sql, where E shows the question context, i mean from databricks point of view E never looked, it's true that question state that database has no other tables, so ?? that mean databricks will not check for that particular table ? it will right ? i also confused by "database has no other database statement" and E and B both are right, but again B state "if countries table exists then command 2 will run" here "if" used, but question want to describe the language interoperability, so most of us selected E

leopedroso1Option: C
Feb 14, 2024

By simulating this code in databricks we can see an error being thrown in the SQL statement ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near 'IN'.(line 1, pos 38) == SQL == SELECT * FROM backup.sales WHERE CITY IN countries_af AND CONTINENT = "AF"

imatheushenriqueOption: E
Jun 5, 2024

E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.