Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 212


You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

Correct Answer: CD

For pre-training a large language model with custom TensorFlow operations and a large batch size on Google Cloud, a configuration of multiple high-GPU machines with tf.distribute.MultiWorkerMirroredStrategy is the better fit. Although TPUs offer excellent performance, the Google Cloud documentation notes that they are not recommended for workloads with custom TensorFlow operations inside the main training loop. Among the remaining options, 16 workers of a2-highgpu-8g machines balance training speed and cost: tf.distribute.MultiWorkerMirroredStrategy distributes the workload across the GPUs on all workers and handles the large batch size efficiently.
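As a rough illustration of the suggested setup, the sketch below shows how a Keras training loop might be wrapped in tf.distribute.MultiWorkerMirroredStrategy. It assumes each worker receives a TF_CONFIG environment variable from the training service (e.g., Vertex AI custom training); build_model and build_dataset are hypothetical placeholders for the model (including its custom ops) and the input pipeline.

```python
import tensorflow as tf

# Sketch only: synchronous multi-worker data-parallel training on GPU machines.
# Assumes TF_CONFIG is set on every worker (cluster addresses + task index),
# e.g. by Vertex AI custom training. build_model/build_dataset are placeholders.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

PER_REPLICA_BATCH = 32  # illustrative value
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

with strategy.scope():
    model = build_model()  # placeholder: the LLM, including its custom TF ops
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

train_ds = build_dataset().batch(global_batch)  # placeholder input pipeline
model.fit(train_ds, epochs=1)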

Discussion

6 comments
pikachu007 (Option: B)
Jan 13, 2024

TPU advantages:
- Highly specialized: TPUs (Tensor Processing Units) are custom-designed hardware accelerators optimized for machine learning workloads, particularly those with large batch sizes and matrix-heavy computations, which are common in large language models.
- Exceptional performance: TPUs can significantly outperform CPUs and GPUs in speed and efficiency for these kinds of tasks.
- Cost-effective: while TPUs have a higher hourly cost, their performance often leads to lower overall cost thanks to faster training times and reduced resource usage.
- Scalability: a TPU Pod slice lets you distribute training across multiple TPU v4 chips for even greater performance and scalability.
- Custom operations: tf.distribute.TPUStrategy is meant to ensure compatibility with custom TensorFlow operations.
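For comparison with the GPU setup above, a minimal sketch of the TPU path this comment describes might look like the following; "my-tpu-pod-slice" is a placeholder TPU resource name, and build_model / build_dataset are hypothetical helpers.

```python
import tensorflow as tf

# Sketch only: connect to a Cloud TPU Pod slice and train with TPUStrategy.
# "my-tpu-pod-slice" is a placeholder name; the builders are placeholders too.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu-pod-slice")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = build_model()  # placeholder model definition
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Large global batch: per-core batch times the number of TPU cores in the slice.
train_ds = build_dataset().batch(128 * strategy.num_replicas_in_sync)
model.fit(train_ds, epochs=1)
```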

b1a8fae (Option: B)
Jan 15, 2024

B. NGL, I'm quite lost on this one, but if training is going to span several weeks I would go with the most powerful resource (TPUs). I might be completely wrong.

BlehMaks (Option: B)
Jan 16, 2024

It should be TPU, but I'm a bit concerned about this point from the Google documentation: "Models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop" https://cloud.google.com/tpu/docs/intro-to-tpu#TPU
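To illustrate the kind of construct that documentation caveat is about, here is a hypothetical example of a custom operation inside the training step: a tf.py_function runs arbitrary Python on the host CPU, so it cannot be compiled by XLA for the TPU and forces device-to-host round trips. The names custom_token_filter, model, optimizer, and loss_fn are made up for illustration.

```python
import tensorflow as tf

def custom_token_filter(ids):
    # Placeholder for arbitrary Python-side logic (e.g. custom filtering).
    return ids

@tf.function
def train_step(ids, labels, model, optimizer, loss_fn):
    # tf.py_function escapes the TF graph and runs on the host CPU, which is
    # the "custom operation inside the training loop" the TPU docs warn about.
    ids = tf.py_function(custom_token_filter, inp=[ids], Tout=ids.dtype)
    with tf.GradientTape() as tape:
        logits = model(ids, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```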

fitri001 (Option: B)
Apr 18, 2024

TPU acceleration: TPUs are purpose-built for machine learning workloads and offer significant speedups over GPUs or CPUs, especially for large models like this one. A TPU Pod slice provides a collection of interconnected TPUs for efficient parallel training.
tf.distribute.TPUStrategy: this strategy is designed to work with TPUs in TensorFlow. It handles data distribution, model replication, and gradient aggregation across the TPU cores, enabling efficient training with custom TensorFlow operations.

fitri001
Apr 18, 2024

Why not the others?
A. MultiWorkerMirroredStrategy with GPUs: while GPUs offer some acceleration, TPUs are generally better suited for large language model pre-training due to their architectural optimizations. Additionally, managing 8 workers across separate machines introduces communication overhead compared to a tightly coupled TPU Pod.
C. MirroredStrategy with high-CPU machines: CPU-based training would be significantly slower than TPUs or even GPUs for a large language model. While the high CPU count might seem beneficial for custom operations, the overall training speed would still be limited.
D. MultiWorkerMirroredStrategy with multiple high-GPU machines: similar to option A, using multiple high-GPU machines with this strategy would incur communication overhead and potentially be less cost-effective than a single TPU Pod slice.

ccb23cc (Option: A)
Jun 14, 2024

B. TPU acceleration: the question says the model uses custom TensorFlow operations in the main training loop, and the Google documentation literally says TPUs are suited for "models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop."
C. High-CPU machines: makes no sense, because it tells you to use a CPU, which does not help us in this case.
So the correct answer is between A and D. However, the question says they plan to use a large batch size, so we need memory, and we should take the option with more of it. Correct answer: Option A.

info_appsatori (Option: A)
Jun 17, 2024

Should be A or D. TPU would otherwise be fine, but TPUs are not well suited for custom TensorFlow operations.