Professional Data Engineer Exam Questions

Professional Data Engineer Exam - Question 284


You have a network of 1000 sensors. The sensors generate time series data: one metric per sensor per second, along with a timestamp. You already have 1 TB of data, and expect the data to grow by 1 GB every day. You need to access this data in two ways. The first access pattern requires retrieving the metric from one specific sensor stored at a specific timestamp, with a median single-digit millisecond latency. The second access pattern requires running complex analytic queries on the data, including joins, once a day. How should you store this data?

Correct Answer: B

When time series sensor data must serve both fast point lookups and complex analytics, Bigtable is an optimal storage choice. Concatenating the sensor ID and timestamp into the row key turns "metric from one specific sensor at a specific timestamp" into a single-row lookup, meeting the single-digit millisecond latency requirement. Bigtable's design also handles large-scale, low-latency workloads, so the 1 TB of existing data and 1 GB of daily growth pose no problem. For the once-daily complex analytic queries, a daily export to BigQuery is the right fit, since BigQuery is built for scalable analytical SQL and handles joins effectively. This combination leverages the strengths of both services: Bigtable for real-time lookups, BigQuery for analytics.
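A minimal sketch of the row-key scheme described above, using an in-memory dict as a stand-in for a Bigtable table (the `"#"` delimiter, sensor ID format, and helper names are illustrative assumptions, not part of any Google client library):

```python
# Sketch of the sensorID + timestamp row-key design. Timestamps are
# zero-padded so that lexicographic order (how Bigtable sorts row keys)
# matches chronological order within each sensor.

def make_row_key(sensor_id: str, epoch_seconds: int) -> str:
    # Fixed-width padding: string order == numeric order.
    return f"{sensor_id}#{epoch_seconds:012d}"

# Stand-in for a Bigtable table: row key -> metric value.
table: dict[str, float] = {}

def write_metric(sensor_id: str, epoch_seconds: int, value: float) -> None:
    table[make_row_key(sensor_id, epoch_seconds)] = value

def read_metric(sensor_id: str, epoch_seconds: int) -> float:
    # Point lookup by exact row key -- the access pattern that gives
    # Bigtable its single-digit-millisecond median latency.
    return table[make_row_key(sensor_id, epoch_seconds)]

write_metric("sensor-0042", 1700000000, 21.5)
print(read_metric("sensor-0042", 1700000000))  # 21.5
```

Leading with the sensor ID (rather than the timestamp) also spreads writes from the 1000 sensors across the key space, avoiding the hotspotting that a timestamp-first key would cause.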

Discussion

6 comments
raaad — Option: B
Jan 10, 2024

- Bigtable excels at incredibly fast lookups by row key, often reaching single-digit millisecond latencies.
- Constructing the row key from sensor ID and timestamp enables efficient retrieval of specific sensor readings at exact timestamps.
- Bigtable's wide-column design effectively stores time series data, allowing flexible addition of new metrics without schema changes.
- Bigtable scales horizontally to accommodate massive datasets (petabytes or more), easily handling the expected data growth.

scaenruy — Option: B
Jan 4, 2024

B. Store your data in Bigtable. Concatenate the sensor ID and timestamp and use it as the row key. Perform an export to BigQuery every day.

Smakyel79
Jan 7, 2024

Based on your requirements, Option B seems most suitable. Bigtable's design caters to the low-latency access of time-series data (your first requirement), and the daily export to BigQuery enables complex analytics (your second requirement). The use of sensor ID and timestamp as the row key in Bigtable would facilitate efficient access to specific sensor data at specific times.

Matt_108 — Option: B
Jan 13, 2024

Option B - agree with raaad

JyoGCP — Option: B
Feb 21, 2024

Option B

hanoverquay — Option: B
Mar 15, 2024

Voted B

fitri001 — Option: B
Jun 17, 2024

agree with raaad