AI-102 Exam - Question 69

Question

You are building an internet-based training solution. The solution requires that a user's camera and microphone remain enabled.

You need to monitor a video stream of the user and detect when the user asks an instructor a question. The solution must minimize development effort.

What should you include in the solution?

Examice · Accepted Answer

The solution must monitor a video stream of the user and detect when the user asks an instructor a question while minimizing development effort. Among the provided options, using speech-to-text in the Azure AI Speech service is the most relevant because it can transcribe spoken words into text in real-time. This allows the system to detect verbal cues or keywords that indicate a question is being asked. Facial recognition or object detection services do not directly address the requirement to detect when a user asks a question, and language detection is not specifically designed for this purpose.
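As a rough illustration of the "detect verbal cues or keywords" step, here is a minimal sketch that works on text already transcribed by speech-to-text. The function name `looks_like_question` and the `QUESTION_OPENERS` word list are assumptions made for this example, not part of any Azure SDK, and a real solution would likely need a more robust check.

```python
import re

# Hypothetical cue list: words that often open an English question.
QUESTION_OPENERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "could", "would", "should", "is", "are", "do", "does", "did", "may",
}


def looks_like_question(utterance: str) -> bool:
    """Heuristically decide whether a transcribed utterance is a question."""
    text = utterance.strip().lower()
    if not text:
        return False
    # If the transcript carries punctuation, a trailing "?" is the strongest cue.
    if text.endswith("?"):
        return True
    # Otherwise fall back to checking the first word against the cue list.
    words = re.findall(r"[a-z']+", text)
    return bool(words) and words[0] in QUESTION_OPENERS
```

A call such as `looks_like_question("Could you repeat the last step?")` would be made on each recognized phrase as it arrives; the heuristic is English-only and will misclassify some statements.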

Murtuza · Answer

The other options are not directly relevant to detecting user questions in a video stream:

Speech-to-text (Option A): Converts spoken language into text. While useful for transcribing audio, it doesn’t directly address identifying user questions.
Language detection (Option B): Determines the language of text. It’s not specifically designed for monitoring video streams or detecting questions.
Object detection (Option D): Identifies objects within images, but it’s not suitable for detecting user interactions or questions.
Therefore, Option C (the Face service in Azure AI Vision) is the most appropriate choice for your scenario.

Murtuza · Answer

The correct CHOICE is C. I made a silly typo but my explanations are right on point.

chandiochan · Answer

speech-to-text in the Azure AI Speech service

This service can transcribe spoken words into written text in real-time, allowing you to monitor the audio for specific triggers, like questions, which can then be further processed or flagged for response. This solution is efficient and requires minimal development effort for integrating audio streaming and speech recognition capabilities.
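The "flag for response" idea could be sketched like this, assuming the transcription step already yields phrases with a time offset. The `Utterance` type, the `flag_triggers` helper, and the default trigger rule are all hypothetical names introduced for this example; nothing here calls the Speech service itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Utterance:
    offset_seconds: float  # where in the session the phrase was heard
    text: str              # transcribed text of the phrase


def ends_with_question_mark(text: str) -> bool:
    """Default trigger (an assumption): the transcript ends with '?'."""
    return text.rstrip().endswith("?")


def flag_triggers(
    utterances: Iterable[Utterance],
    is_trigger: Callable[[str], bool] = ends_with_question_mark,
) -> Iterator[Utterance]:
    """Yield only the utterances whose text matches the trigger rule."""
    for utterance in utterances:
        if is_trigger(utterance.text):
            yield utterance
```

Because the trigger rule is passed in as a parameter, it can be swapped for a keyword match or a language-model call without changing the monitoring loop.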

Murtuza · Answer

The best option for this scenario would be A. speech-to-text in the Azure AI Speech service.

This service can transcribe the user’s spoken words into written text, which can then be analyzed to detect when a question is being asked. This would be more efficient and direct for detecting questions in a video stream, compared to the other options which focus on language detection, face recognition, and object detection. These other services might not be as effective for this specific use-case.

Barry123456 · Answer

It says video stream. It doesn't say the video stream has audio. I deal with video-only streams all day. Don't assume.

reiwanotora · Answer

The user's camera and microphone remain enabled, so A is right.

Belicova · Answer

Go with D
From Copilot:
To monitor a video stream of the user and detect when the user asks an instructor a question while minimizing development effort, consider using object detection. Specifically, you can leverage existing models or frameworks (such as YOLOv3) to detect people in real-time from the video stream. Once you identify a person asking a question, you can trigger further actions or alerts. This approach avoids the complexity of speech-to-text or language detection and focuses on the specific task at hand. Therefore, go with D. object detection in Azure AI Custom Vision!

Murtuza · Answer

Face Service (Azure AI Vision):
The Face service provides facial recognition capabilities, which can be used to identify when a user is facing the camera (e.g., looking at the instructor).
By analyzing facial features, expressions, and head movements, you can detect when a user is likely to be asking a question.
This approach minimizes development effort because it directly addresses the requirement of monitoring the video stream for user interactions.

NullVoider_0 · Answer

A. speech-to-text in the Azure AI Speech service

This service can transcribe the spoken words into text in real-time, which can then be analyzed to detect questions. It’s an efficient way to monitor for specific verbal cues or keywords that indicate a question is being asked, without the need for extensive programming or manual review. This approach minimizes development effort while providing a robust solution for the requirement.

sivapolam90 · Answer

A. speech-to-text in the Azure AI Speech service

anntv252 · Answer

Because the user's camera and microphone remain enabled, the Azure AI Speech service is recommended.

anto69 · Answer

To minimize effort: A is enough

HaraTadahisa · Answer

A is correct answer.

SAMBIT · Answer

Definitely it's not A. That's a bunker
