To ensure that your machine learning model remains accurate and effective for both existing and new products, extend your test dataset with images of the newer products as they are introduced into the retraining process. The model can then be evaluated against a representative mix of old and new products, ensuring comprehensive coverage and keeping the evaluation metrics relevant.
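As a minimal sketch of the idea, the snippet below merges newly introduced product images into an existing test set; the file names and labels are hypothetical, and in practice the pairs would come from your labeling pipeline:

```python
import random

def extend_test_set(existing_test, new_product_images, seed=42):
    """Merge newly introduced product images into the existing test set.

    existing_test / new_product_images: lists of (image_path, label) pairs.
    Shuffling avoids ordering artifacts during batched evaluation.
    """
    combined = list(existing_test) + list(new_product_images)
    random.Random(seed).shuffle(combined)
    return combined

# Hypothetical file names and labels, for illustration only.
old = [("img/shirt_001.jpg", "shirt"), ("img/mug_002.jpg", "mug")]
new = [("img/scarf_001.jpg", "scarf")]

test_set = extend_test_set(old, new)
labels = {label for _, label in test_set}
# The combined set now covers both old and new product classes.
```

Evaluating on this combined set is what keeps the reported metrics representative of the full product catalog.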
To build classification workflows over structured datasets stored in BigQuery without writing code for tasks such as exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving, AutoML Tables is the best option. It provides a user-friendly interface with automated machine learning capabilities that handle each of these steps without requiring the user to write any code.
Kubeflow Pipelines is an ideal choice for this scenario. It offers end-to-end orchestration, allowing you to automate the entire workflow from training to deployment, which is crucial for keeping predictive models relevant when they require frequent retraining. Kubeflow also integrates well with other Google Cloud services, providing a streamlined, scalable solution that follows best practices.
To minimize computation costs and manual intervention while keeping your code under version control, use Cloud Build linked with Cloud Source Repositories. With a build trigger on the repository, each new code push automatically starts retraining, meeting the requirements for automation, cost efficiency, and version control. By contrast, combining Cloud Functions with sensors in Cloud Composer adds unnecessary complexity and provides no built-in version control, while the gcloud command-line tool requires manual intervention.
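This setup can be sketched as a `cloudbuild.yaml` in the repository root; the bucket path, trainer package layout, region, and runtime versions below are assumptions you would replace with your own:

```yaml
# Hypothetical cloudbuild.yaml: a Cloud Build trigger on the linked
# Cloud Source Repository runs this on every push, submitting a
# retraining job with no manual intervention.
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'ai-platform'
      - 'jobs'
      - 'submit'
      - 'training'
      - 'retrain_$SHORT_SHA'            # unique job name per commit
      - '--region=us-central1'          # assumed region
      - '--module-name=trainer.task'    # assumed trainer package layout
      - '--package-path=./trainer'
      - '--job-dir=gs://my-bucket/retrain'  # assumed staging bucket
      - '--runtime-version=2.1'
      - '--python-version=3.7'
```

Because the trigger fires on repository pushes, the commit history itself serves as the version-control record for each retraining run.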
For this problem, where each image can belong to exactly one of three classes (driver's license, passport, or credit card), the appropriate loss function is categorical cross-entropy. This loss function is specifically designed for multi-class classification with mutually exclusive classes and measures the dissimilarity between the predicted class probabilities and the true class labels. Using categorical cross-entropy trains the model to produce probability distributions over the three classes, which helps achieve accurate predictions.
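The formula can be sketched in plain Python; framework losses such as Keras's `CategoricalCrossentropy` implement the same idea, and the probability values below are made up for illustration:

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities.

    y_true: one-hot vector, e.g. [1, 0, 0] for "driver's license".
    y_pred: the model's predicted probability distribution over the classes.
    eps guards against log(0) for zero-probability predictions.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# The document is a driver's license; the model assigns it probability 0.7.
loss = categorical_cross_entropy([1, 0, 0], [0.7, 0.2, 0.1])
# loss = -ln(0.7) ≈ 0.3567; it shrinks toward 0 as the model grows
# more confident in the correct class.
```

Note that the loss depends only on the probability assigned to the true class, which is exactly why the model is pushed to concentrate probability mass on the correct label.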