Certified Generative AI Engineer Associate Exam - Question 44

Question

A Generative AI Engineer has built an LLM-based system that will automatically translate user text between two languages. They now want to benchmark multiple LLMs on this task and pick the best one. They have an evaluation set with known high-quality translation examples. They want to evaluate each LLM using the evaluation set with a performant metric.

Which metric should they choose for this evaluation?

Examice · Accepted Answer

DavidMiller · Answer

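It's in the name, really: Bilingual Evaluation Understudy (BLEU). BLEU scores a candidate translation by its n-gram overlap with reference translations, which is exactly what an evaluation set of known high-quality translations supports.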
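To make the metric concrete, here is a minimal sketch of corpus-level BLEU in plain Python: clipped n-gram precision for n = 1..4, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per example, and no smoothing. The function names and the `model_a` / `model_b` outputs are hypothetical, for illustration only. For a real benchmark, a maintained implementation such as sacreBLEU is the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per candidate (assumption).

    candidates, references: parallel lists of token lists.
    Returns a score in [0, 1]; no smoothing is applied, so a corpus
    with zero matches at any n-gram order scores 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize candidates shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


# Hypothetical evaluation set and model outputs, for illustration only.
references = ["the cat is on the mat".split()]
outputs = {
    "model_a": ["the cat is on the mat".split()],
    "model_b": ["the cat sat on a mat".split()],
}
scores = {name: corpus_bleu(cands, references) for name, cands in outputs.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

The same loop extends to any number of LLMs: generate translations for the evaluation set with each model, score each against the references, and pick the highest BLEU. On very short texts the unsmoothed score drops to 0 as soon as one n-gram order has no match, so a larger evaluation set (or a smoothed variant) gives more useful comparisons.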

Discussion