Professional Machine Learning Engineer Exam Questions

Professional Machine Learning Engineer Exam - Question 212


You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

Correct Answer: CD

For pre-training a large language model with custom TensorFlow operations and a large batch size on Google Cloud, a configuration of multiple high-GPU machines with tf.distribute.MultiWorkerMirroredStrategy is the better fit. Although TPUs offer excellent performance, the Google Cloud documentation notes that they are not recommended for workloads with custom TensorFlow operations inside the main training loop. Among the remaining options, 16 workers of a2-highgpu-8g machines balance training speed and cost: tf.distribute.MultiWorkerMirroredStrategy distributes the workload across the GPUs on all workers and handles the large batch size efficiently.
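As a rough illustration of the suggested setup, the sketch below shows how a Keras training loop might be wrapped in tf.distribute.MultiWorkerMirroredStrategy. It assumes each worker receives a TF_CONFIG environment variable from the training service (e.g., Vertex AI custom training); build_model and build_dataset are hypothetical placeholders for the model (including its custom ops) and the input pipeline.

```python
import tensorflow as tf

# Sketch only: synchronous multi-worker data-parallel training on GPU machines.
# Assumes TF_CONFIG is set on every worker (cluster addresses + task index),
# e.g. by Vertex AI custom training. build_model/build_dataset are placeholders.
strategy = tf.distribute.MultiWorkerMirroredStrategy()

PER_REPLICA_BATCH = 32  # illustrative value
global_batch = PER_REPLICA_BATCH * strategy.num_replicas_in_sync

with strategy.scope():
    model = build_model()  # placeholder: the LLM, including its custom TF ops
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

train_ds = build_dataset().batch(global_batch)  # placeholder input pipeline
model.fit(train_ds, epochs=1)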

Discussion

6 comments
pikachu007 (Option: B)
Jan 13, 2024

TPU advantages:
- Highly specialized: TPUs (Tensor Processing Units) are custom-designed hardware accelerators optimized for machine learning workloads, particularly those with large batch sizes and matrix-heavy computations, which are common in large language models.
- Exceptional performance: TPUs can significantly outperform CPUs and GPUs in speed and efficiency for these kinds of tasks.
- Cost-effective: while TPUs have a higher hourly cost, their performance often leads to lower overall cost thanks to faster training times and reduced resource usage.
- Scalability: a TPU Pod slice lets you distribute training across multiple TPU v4 chips for even greater performance and scalability.
- Custom operations: tf.distribute.TPUStrategy is meant to ensure compatibility with custom TensorFlow operations.
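For comparison with the GPU setup above, a minimal sketch of the TPU path this comment describes might look like the following; "my-tpu-pod-slice" is a placeholder TPU resource name, and build_model / build_dataset are hypothetical helpers.

```python
import tensorflow as tf

# Sketch only: connect to a Cloud TPU Pod slice and train with TPUStrategy.
# "my-tpu-pod-slice" is a placeholder name; the builders are placeholders too.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu-pod-slice")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = build_model()  # placeholder model definition
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Large global batch: per-core batch times the number of TPU cores in the slice.
train_ds = build_dataset().batch(128 * strategy.num_replicas_in_sync)
model.fit(train_ds, epochs=1)
```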

b1a8fae (Option: B)
Jan 15, 2024

B. NGL, I'm quite lost on this one, but if training is going to span several weeks I would go with the most powerful resource (TPUs). I might be completely wrong.

BlehMaks (Option: B)
Jan 16, 2024

It should be TPU, but I'm a bit concerned about this point from the Google documentation: "Models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop" https://cloud.google.com/tpu/docs/intro-to-tpu#TPU
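To illustrate the kind of construct that documentation caveat is about, here is a hypothetical example of a custom operation inside the training step: a tf.py_function runs arbitrary Python on the host CPU, so it cannot be compiled by XLA for the TPU and forces device-to-host round trips. The names custom_token_filter, model, optimizer, and loss_fn are made up for illustration.

```python
import tensorflow as tf

def custom_token_filter(ids):
    # Placeholder for arbitrary Python-side logic (e.g. custom filtering).
    return ids

@tf.function
def train_step(ids, labels, model, optimizer, loss_fn):
    # tf.py_function escapes the TF graph and runs on the host CPU, which is
    # the "custom operation inside the training loop" the TPU docs warn about.
    ids = tf.py_function(custom_token_filter, inp=[ids], Tout=ids.dtype)
    with tf.GradientTape() as tape:
        logits = model(ids, training=True)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```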

fitri001 (Option: B)
Apr 18, 2024

TPU acceleration: TPUs are purpose-built for machine learning workloads and offer significant speedups over GPUs or CPUs, especially for large models like this one. A TPU Pod slice provides a collection of interconnected TPUs for efficient parallel training.
tf.distribute.TPUStrategy: this strategy is designed to work with TPUs in TensorFlow. It handles data distribution, model replication, and gradient aggregation across the TPU cores, enabling efficient training with custom TensorFlow operations.

fitri001
Apr 18, 2024

Why not the others?
A. MultiWorkerMirroredStrategy with GPUs: while GPUs offer some acceleration, TPUs are generally better suited for large language model pre-training due to their architectural optimizations. Additionally, managing 8 workers across separate machines introduces communication overhead compared to a tightly coupled TPU Pod.
C. MirroredStrategy with high-CPU machines: CPU-based training would be significantly slower than TPUs or even GPUs for a large language model. While the high CPU count might seem beneficial for custom operations, the overall training speed would still be limited.
D. MultiWorkerMirroredStrategy with multiple high-GPU machines: similar to option A, using multiple high-GPU machines with this strategy would incur communication overhead and potentially be less cost-effective than a single TPU Pod slice.

ccb23cc (Option: A)
Jun 14, 2024

B. TPU acceleration: the question says the model uses custom TensorFlow operations in the main training loop, and the Google documentation literally says TPUs are suited for "models with no custom TensorFlow/PyTorch/JAX operations inside the main training loop."
C. High-CPU machines: makes no sense, because it tells you to use a CPU, which does not help us in this case.
So the correct answer is between A and D. However, the question says they plan to use a large batch size, so we need memory, and we should take the option with more of it. Correct answer: Option A.

info_appsatori (Option: A)
Jun 17, 2024

Should be A or D. TPU would otherwise be fine, but TPUs are not well suited for custom TensorFlow operations.