✅ Explanation:
Requirements Recap:
Real-time inference: Needs low-latency predictions.
Accelerated instances: Likely GPU-backed, costly to scale inefficiently.
No cold starts: Endpoints must always be warm and responsive.
Each model has different scaling needs: Must support independent scaling of each model.
✅ Why Option C is correct:
Inference components are a SageMaker hosting feature that allows:
Hosting multiple models on a single endpoint.
Independent scaling of each model (component).
Avoiding cold starts by keeping a minimum number of copies running.
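The points above can be sketched as request payloads for deploying two models as inference components on one endpoint. This is a minimal sketch assuming boto3's SageMaker `create_inference_component` call; the endpoint, model, and component names are hypothetical, and only the request dictionaries are built here (no AWS call is made):

```python
# Sketch: one inference component per model on a shared endpoint.
# Each dict matches the shape of boto3 sagemaker.create_inference_component(**req);
# all names ("my-endpoint", "model-a", ...) are placeholder assumptions.

def inference_component_request(name, endpoint, model, copies, accelerators):
    """Build a create_inference_component request for one model."""
    return {
        "InferenceComponentName": name,
        "EndpointName": endpoint,
        "VariantName": "AllTraffic",
        "Specification": {
            "ModelName": model,
            "ComputeResourceRequirements": {
                # Share of the endpoint's accelerators reserved for this model
                "NumberOfAcceleratorDevicesRequired": accelerators,
                "MinMemoryRequiredInMb": 4096,
            },
        },
        # CopyCount >= 1 keeps this model loaded and warm (no cold starts)
        "RuntimeConfig": {"CopyCount": copies},
    }

# Two models, same endpoint, different resource needs.
req_a = inference_component_request("ic-model-a", "my-endpoint", "model-a", 1, 1)
req_b = inference_component_request("ic-model-b", "my-endpoint", "model-b", 2, 2)
```

Because each model is its own component with its own copy count and resource requirements, scaling one model does not force the other to scale with it.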
Setting each component's minimum copy count to ≥ 1 keeps that model always loaded and warm, eliminating cold starts.
This solution meets all requirements efficiently.