
Professional Machine Learning Engineer Exam - Question 155


You are training an object detection machine learning model on a dataset of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and one NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

Correct Answer: D

Given a dataset of three million 2 GB X-ray images, using the tf.distribute.Strategy API to run a distributed training job is the most effective solution. With such a large dataset and computational load, distributed training splits the workload across multiple machines or GPUs, significantly speeding up training without reducing model performance. Because it scales out the available compute rather than altering the model or cutting training short, it is the best option among those provided.
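
As a rough illustration (not part of the original answer), a minimal tf.distribute sketch might look like the following; build_detection_model() is a hypothetical helper standing in for whatever model-building code the training application actually uses:

```python
import tensorflow as tf

# Minimal sketch of synchronous data-parallel training with tf.distribute.
# MirroredStrategy replicates the model across all GPUs visible on the VM;
# it still runs on a single-GPU machine, but the speedup comes from adding GPUs.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_detection_model()              # hypothetical model-building helper
    model.compile(optimizer="adam", loss="mse")  # placeholder loss, for illustration only

# train_dataset is assumed to be a batched tf.data.Dataset of (image, target) pairs.
# model.fit(train_dataset, epochs=10)
```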

Discussion

9 comments
fitri001Option: D
Apr 22, 2024

Large dataset: With millions of images, training on a single machine can be very slow. Distributed training allows you to split the training data and workload across multiple machines, significantly speeding up the process.

Vertex AI Training and tf.distribute: Vertex AI Training supports TensorFlow, and the tf.distribute library provides tools for implementing distributed training strategies. By leveraging this functionality, you can efficiently distribute the training tasks across the available cores and GPU on your Compute Engine instance (32 cores and one NVIDIA P100 GPU).
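
To make the "available cores" point concrete, here is an illustrative tf.data input pipeline; the Cloud Storage path and the parse_example function are hypothetical. Parallel reading and decoding on the CPU cores keeps the GPU fed:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

# Hypothetical TFRecord location for the X-ray images.
files = tf.data.Dataset.list_files("gs://example-bucket/xray/*.tfrecord")

dataset = (
    files.interleave(tf.data.TFRecordDataset,
                     cycle_length=16,                       # read several shards at once
                     num_parallel_calls=AUTOTUNE)
         .map(parse_example, num_parallel_calls=AUTOTUNE)   # hypothetical decode/parse fn
         .batch(8)                                          # small batch to fit GPU memory
         .prefetch(AUTOTUNE)                                # overlap input prep with training
)
```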

guilhermebutzkeOption: D
Feb 4, 2024

D. Use the tf.distribute.Strategy API and run a distributed training job. Here's why:

A. Increase instance memory and batch size: This might not be helpful. While more memory could help load more images at once, the main bottleneck is likely processing these large images, and increasing the batch size can worsen the problem by further straining the GPU's memory.

B. Replace the P100 with a K80 GPU: A weaker GPU would likely slow down training instead of speeding it up.

C. Enable early stopping: This can save time but might stop training before reaching optimal performance.

D. Use tf.distribute.Strategy: This lets you distribute the training workload across multiple GPUs or cores, significantly accelerating training without changing the model itself and making efficient use of the available hardware.

[Removed]Option: B
Jul 25, 2023

The same comment as in Q96. If we look at our training infrastructure, the bottleneck is obviously the GPU, which has 12 GB or 16 GB of memory depending on the model (https://www.leadtek.com/eng/products/ai_hpc(37)/tesla_p100(761)/detail). This means we can afford a batch size of only 6-8 images (2 GB each), even if we assume the GPU is utilized 100% and the model weights take zero memory. And remember the training set has 3M images, which means each epoch will take 375-500K steps even in this unlikely best case. With 32 cores and 128 GB of memory we could afford higher batch sizes (e.g., 32), so moving to a K80 GPU that has 24 GB of memory would accelerate the training. A is wrong because we can't afford a larger batch size with the current GPU. D is wrong because you don't have multiple GPUs and your current GPU is saturated. C is a viable option, but it seems less optimal than B.
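
A quick back-of-the-envelope check of the step counts quoted above (illustrative only):

```python
dataset_size = 3_000_000  # images in the training set

for batch_size in (6, 8, 32):
    steps_per_epoch = dataset_size // batch_size
    print(f"batch size {batch_size}: ~{steps_per_epoch:,} steps per epoch")

# batch size 6:  ~500,000 steps per epoch
# batch size 8:  ~375,000 steps per epoch
# batch size 32: ~93,750 steps per epoch
```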

tavva_prudhvi
Jul 26, 2023

But the tf.distribute.Strategy API is not limited to multi-GPU configurations. Although the current setup has only one GPU, you can still use the API to distribute training across multiple Compute Engine instances, each with its own GPU. By running a distributed training job in this way, you can effectively decrease training time without sacrificing model performance.
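
For example, a multi-worker Vertex AI job could use MultiWorkerMirroredStrategy. This is a hedged sketch; build_detection_model() and detection_loss are hypothetical stand-ins for the actual training code:

```python
import tensorflow as tf

# MultiWorkerMirroredStrategy synchronizes gradients across several worker VMs.
# Vertex AI Training sets the TF_CONFIG environment variable on each worker,
# so the strategy can discover the cluster automatically.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = build_detection_model()                        # hypothetical model-building helper
    model.compile(optimizer="adam", loss=detection_loss)   # hypothetical loss function

# Scale the global batch size with the number of replicas so each GPU
# still receives a per-replica batch that fits in its memory.
global_batch_size = 8 * strategy.num_replicas_in_sync
# model.fit(train_dataset.batch(global_batch_size), epochs=10)
```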

tavva_prudhvi
Nov 15, 2023

Also, replacing the NVIDIA P100 GPU with a K80 is not recommended: the K80 is an older, less powerful GPU than the P100, so it might actually slow down the training process.

Zemni
Aug 27, 2023

What you say makes sense for the most part, except that the K80 GPU has only 12 GB of GDDR5 memory, not 24 (https://cloud.google.com/compute/docs/gpus#nvidia_k80_gpus). So that leaves me with the only viable option, which is C.

powerby35Option: A
Jul 13, 2023

A. Since we have only one GPU, we could not use tf.distribute.Strategy as in D.

powerby35
Jul 13, 2023

And C, early stopping, may hurt the performance.

TLampr
Nov 27, 2023

An increased batch size can also hurt performance if it is not followed by further tuning, with regard to the learning rate for example. If early stopping is applied according to common convention, i.e. stopping when the validation loss starts increasing, it should not hurt performance. Sadly, however, that is not specified in the answer.
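
The convention described here maps directly onto Keras' EarlyStopping callback; a minimal sketch, assuming train_ds and val_ds are the training and validation datasets:

```python
import tensorflow as tf

# Stop once validation loss stops improving, and keep the best weights,
# so training time is saved without degrading the final model.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,                  # allow a few non-improving epochs before stopping
    restore_best_weights=True,
)

# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
```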

PST21Option: D
Jul 20, 2023

To decrease training time without sacrificing model performance, the best approach is to use the tf.distribute.Strategy API and run a distributed training job, leveraging the capabilities of the available GPU(s) for parallelized training.

ciro_liOption: D
Jul 25, 2023

https://www.tensorflow.org/guide/gpu ?

ciro_li
Jul 27, 2023

I was wrong. It's A.

bcamaOption: D
Aug 30, 2023

Perhaps it is implied that a second GPU (or more) is provisioned, in which case the answer is D: https://codelabs.developers.google.com/vertex_multiworker_training#2

pinimichele01Option: D
Apr 21, 2024

https://www.tensorflow.org/guide/distributed_training#onedevicestrategy

Prakzz
Jul 3, 2024

Same question as 96?