Cloudera CCD-410 Exam Practice Questions and Answers

Question 6 of 60

Assuming default settings, which of the following best describes the order of data provided to a reducer's reduce method?

A.

The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.

B.

Both the keys and values passed to a reducer always appear in sorted order.

C.

Neither keys nor values are in any predictable order.

D.

The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.
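
To make the ordering question concrete, here is a small plain-Python model of the shuffle under default settings (no secondary sort). It is only a sketch: the `shuffle` function and the two mapper outputs are hypothetical and do not use any Hadoop API. The sort compares keys only, so the values for a key keep whatever order they happened to arrive in.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(map_outputs):
    """Merge per-mapper (key, value) lists and group them by sorted key.

    Only the key takes part in the sort, so the values for a key stay in
    arrival order, which depends on which mapper's output came first.
    """
    merged = [pair for output in map_outputs for pair in output]
    merged.sort(key=itemgetter(0))  # stable sort on the key only
    return [(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


# Two hypothetical mappers whose outputs may arrive in either order.
mapper_a = [("b", 3), ("a", 9)]
mapper_b = [("a", 1), ("b", 7)]

run_1 = shuffle([mapper_a, mapper_b])
run_2 = shuffle([mapper_b, mapper_a])
print(run_1)  # [('a', [9, 1]), ('b', [3, 7])]
print(run_2)  # [('a', [1, 9]), ('b', [7, 3])]
```

In both runs the keys come out sorted, while the value lists differ between runs.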

Question 7 of 60

You wrote a map function that throws a runtime exception when it encounters a control character in input data. The input supplied to your mapper contains twelve such characters in total, spread across five file splits. The first four file splits each have two control characters and the last split has four control characters.
Identify the number of failed task attempts you can expect when you run the job with mapred.max.map.attempts set to 4:

A.

You will have forty-eight failed task attempts

B.

You will have seventeen failed task attempts

C.

You will have five failed task attempts

D.

You will have twelve failed task attempts

E.

You will have twenty failed task attempts
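
The arithmetic behind this question can be sketched as follows. The sketch assumes one map task per file split and that every failing task is retried until it reaches the attempt limit. On a real cluster the framework can fail the whole job once one task exhausts its attempts, so the observed count may differ. The function name `failed_attempts` is hypothetical.

```python
MAX_MAP_ATTEMPTS = 4  # mapred.max.map.attempts from the question

# Control characters per file split, as stated in the question.
control_chars_per_split = [2, 2, 2, 2, 4]


def failed_attempts(control_chars, max_attempts):
    """Count failed attempts for the one map task that reads a split.

    An attempt dies on the first control character it meets, so a split
    containing any control characters fails every attempt, no matter how
    many such characters it holds.
    """
    return max_attempts if control_chars > 0 else 0


total = sum(failed_attempts(n, MAX_MAP_ATTEMPTS)
            for n in control_chars_per_split)
print(total)  # 20
```

Note that the total number of control characters (twelve) does not enter the result; only the number of affected splits and the attempt limit do.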

Question 8 of 60

You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array.

A.

combine

B.

map

C.

init

D.

configure
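
The lifecycle this question relies on is "one-time setup, then one call per record". In the older org.apache.hadoop.mapred API the one-time hook is configure(JobConf); in the newer org.apache.hadoop.mapreduce API the equivalent is setup(Context). The plain-Python sketch below only mimics that lifecycle: the `JoinMapper` class, the tab-separated file layout, and the sample records are all hypothetical and not part of Hadoop.

```python
import os
import tempfile


class JoinMapper:
    """Hypothetical stand-in for a Mapper; this is not the Hadoop API."""

    def __init__(self):
        self.lookup = {}

    def configure(self, cache_path):
        # One-time setup: load the cached text file into the associative array.
        with open(cache_path) as f:
            for line in f:
                key, value = line.rstrip("\n").split("\t", 1)
                self.lookup[key] = value

    def map(self, key, value):
        # Per-record work: join against the in-memory table, drop unmatched keys.
        if key in self.lookup:
            yield key, (value, self.lookup[key])


# Write a small lookup file standing in for the DistributedCache entry.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("u1\tAlice\nu2\tBob\n")
    cache_path = tmp.name

mapper = JoinMapper()
mapper.configure(cache_path)  # runs once, before any record is mapped
records = [("u1", "login"), ("u3", "logout"), ("u2", "click")]
joined = [out for k, v in records for out in mapper.map(k, v)]
os.remove(cache_path)
print(joined)  # [('u1', ('login', 'Alice')), ('u2', ('click', 'Bob'))]
```

Loading the file inside map instead would reread it for every record, which is why the setup hook is the natural place for it.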

Question 9 of 60

You've written a MapReduce job that will process 500 million input records and generate 500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will create a significant amount of intermediate data that it needs to transfer between mappers and reducers, which is a potential bottleneck. A custom implementation of which interface is most likely to reduce the amount of intermediate data transferred across the network?
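
The answer options for this question are missing from the source. As an illustration of one common way to shrink intermediate data, the plain-Python sketch below applies map-side pre-aggregation in the style of a combiner to a word count. It assumes the aggregation is associative and commutative (a sum here); the function names and sample lines are hypothetical, and no Hadoop API is used.

```python
from collections import Counter


def map_words(line):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Sum counts per key on the map side, before anything is shuffled."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


# Hypothetical lines belonging to a single input split.
split = ["to be or not to be", "to be is to do"]

raw = [pair for line in split for pair in map_words(line)]
combined = combine(raw)
print(len(raw), len(combined))  # 11 6
print(combined)
```

Here eleven intermediate pairs shrink to six before crossing the network. The saving depends on how often keys repeat within one mapper's output.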

Question 10 of 60

Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.

A.

Yes.

B.

Yes, but only if one of the tables fits into memory.

C.

Yes, so long as both tables fit into memory.

D.

No, MapReduce cannot perform relational operations.

E.

No, but it can be done with either Pig or Hive.
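
For reference, a reduce-side join is one standard way to join two tables in MapReduce without holding either table in memory: mappers tag each row with its source table, the shuffle groups rows by join key, and the reducer pairs rows from the two sides for one key at a time. The plain-Python sketch below models this for an inner join. The tags "L" and "R", the sample tables, and the dictionary standing in for the shuffle are all hypothetical.

```python
from collections import defaultdict


def map_table(tag, csv_lines):
    """Emit (join_key, (tag, rest_of_row)) for each comma-separated line."""
    for line in csv_lines:
        key, rest = line.split(",", 1)
        yield key, (tag, rest)


def reduce_join(key, tagged_values):
    """Pair every left-table row with every right-table row for one key."""
    left = [v for tag, v in tagged_values if tag == "L"]
    right = [v for tag, v in tagged_values if tag == "R"]
    for l_row in left:
        for r_row in right:
            yield key, l_row, r_row


# Hypothetical tables; the first field is the shared join key.
customers = ["1,Alice", "2,Bob"]
orders = ["1,book", "1,pen", "3,lamp"]

# The dict stands in for the shuffle, which groups values by key.
groups = defaultdict(list)
for key, tagged in list(map_table("L", customers)) + list(map_table("R", orders)):
    groups[key].append(tagged)

joined = [row for key in sorted(groups) for row in reduce_join(key, groups[key])]
print(joined)  # [('1', 'Alice', 'book'), ('1', 'Alice', 'pen')]
```

In this model the reducer buffers only the rows for a single key, not a whole table.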