Certified Data Engineer Associate Exam - Question 71

Question

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Examice · Accepted Answer

JSON is a text-based format that does not carry a schema describing column data types. When Auto Loader processes JSON data without type inference enabled or schema hints supplied, it infers every column as a string to avoid schema evolution issues caused by type mismatches. As a result, all columns in the target table are of the string type, even if some fields contain only float or boolean values.

meow_akk · Answer

The correct answer is: B. JSON data is a text-based format

JSON data is a text-based format that does not encode column data types in a schema. By default, when Auto Loader infers the schema of JSON data, it treats all columns as strings (including nested fields). This is a deliberate default that avoids schema evolution issues due to type mismatches, not a limit on what Auto Loader is able to detect.

https://docs.databricks.com/en/ingestion/auto-loader/schema.html

For example, the following JSON value is a boolean:

true

However, with default settings Auto Loader infers the type of the column holding this value as string. The type is recoverable from the data, but Auto Loader does not attempt type inference for JSON unless the option cloudFiles.inferColumnTypes is set to true.
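As a side check on the example above, here is a minimal sketch using Python's standard json module (not Auto Loader itself). It only illustrates that a JSON parser distinguishes an unquoted literal from a quoted string, so the all-string result in Auto Loader comes from its default inference setting rather than from the format hiding the type. The sample values are made up for illustration.

```python
import json

# Illustrative sample values (assumptions, not taken from the exam question).
# An unquoted JSON literal carries a basic type; quoting it makes it a string.
samples = ['true', '"true"', '1.5', '"1.5"']

# Map each raw JSON text to the name of the Python type it parses to.
parsed_types = {text: type(json.loads(text)).__name__ for text in samples}

for text, type_name in parsed_types.items():
    print(f"{text!r:10} -> {type_name}")
```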

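To get typed columns, the data engineer can enable type inference or provide schema hints. Setting the option cloudFiles.inferColumnTypes to true makes Auto Loader infer column types from the sampled data. Schema hints (the option cloudFiles.schemaHints) specify the types of particular columns and leave the rest to inference. A complete schema can also be supplied explicitly on the stream reader, in which case nothing is inferred.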
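A rough sketch of how those two options might be passed to an Auto Loader stream is below. The option names (cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.inferColumnTypes, cloudFiles.schemaHints) are the documented Auto Loader options; the helper functions, the paths, and the column names price and is_active are hypothetical and only for illustration. Actually starting the stream needs a Databricks runtime, so the read is wrapped in a function that is not called here.

```python
def autoloader_json_options(schema_location, infer_types=False, schema_hints=None):
    """Build an options dict for an Auto Loader JSON stream (hypothetical helper)."""
    options = {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
    }
    if infer_types:
        # Ask Auto Loader to infer column types instead of defaulting to string.
        options["cloudFiles.inferColumnTypes"] = "true"
    if schema_hints:
        # DDL-style string naming the type of specific columns only.
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path, options):
    """Create the streaming read; `spark` is the SparkSession on Databricks."""
    return spark.readStream.format("cloudFiles").options(**options).load(source_path)


# Hypothetical schema location and column names, for illustration only.
inferred = autoloader_json_options("/tmp/example/_schema", infer_types=True)
hinted = autoloader_json_options(
    "/tmp/example/_schema", schema_hints="price float, is_active boolean"
)
```

On a cluster, read_json_stream(spark, "/tmp/example/input", hinted) would be the call that creates the stream. With neither option set, the same read falls back to the all-string behavior described in the question.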

Therefore, the correct answer is B. JSON data is a text-based format.

55f31c8 · Answer

https://docs.databricks.com/en/ingestion/auto-loader/schema.html#how-does-auto-loader-schema-inference-work

nedlo · Answer

It's B. "By default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files). For formats with typed schema (Parquet and Avro), Auto Loader samples a subset of files and merges the schemas of individual files." https://docs.databricks.com/en/ingestion/auto-loader/schema.html

AndreFR · Answer

https://docs.databricks.com/en/ingestion/auto-loader/schema.html#how-does-auto-loader-schema-inference-work

By default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don’t encode data types (JSON and CSV), Auto Loader infers all columns as strings (including nested fields in JSON files).
