Exam AI-102
Question 69

You are building an internet-based training solution. The solution requires that a user's camera and microphone remain enabled.

You need to monitor a video stream of the user and detect when the user asks an instructor a question. The solution must minimize development effort.

What should you include in the solution?

    Correct Answer: A

    The solution must monitor a video stream of the user and detect when the user asks an instructor a question while minimizing development effort. Among the provided options, using speech-to-text in the Azure AI Speech service is the most relevant because it can transcribe spoken words into text in real-time. This allows the system to detect verbal cues or keywords that indicate a question is being asked. Facial recognition or object detection services do not directly address the requirement to detect when a user asks a question, and language detection is not specifically designed for this purpose.
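A minimal sketch of the idea in the explanation above: transcribe the microphone audio, then check each utterance for signs of a question. The `looks_like_question` heuristic, the `QUESTION_OPENERS` word list, and the `monitor_microphone` helper are hypothetical names made up for illustration; they are not part of the exam question or of the Speech service. The SDK calls assume the `azure-cognitiveservices-speech` Python package and a valid key and region, and the sketch assumes the recognized text usually carries punctuation.

```python
import re

# Hypothetical heuristic: words that commonly open an English question.
QUESTION_OPENERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "can", "could", "would", "should", "is", "are",
    "do", "does", "did", "will", "may",
}


def looks_like_question(transcript: str) -> bool:
    """Return True if a transcribed utterance looks like a question."""
    text = transcript.strip().lower()
    if not text:
        return False
    # A trailing question mark in the transcript is the strongest signal.
    if text.endswith("?"):
        return True
    # Otherwise fall back to checking the first word of the utterance.
    words = re.findall(r"[a-z']+", text)
    return bool(words) and words[0] in QUESTION_OPENERS


def monitor_microphone(key: str, region: str, on_question) -> None:
    """Transcribe the default microphone and report question-like utterances.

    Assumption: the Speech SDK is installed and key/region are valid.
    The import is local so the heuristic above works without the SDK.
    """
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # With no audio config, the recognizer uses the default microphone.
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    def handle_recognized(evt):
        # Called once per finalized utterance during continuous recognition.
        text = evt.result.text
        if looks_like_question(text):
            on_question(text)

    recognizer.recognized.connect(handle_recognized)
    recognizer.start_continuous_recognition()
    try:
        input("Listening; press Enter to stop.\n")
    finally:
        recognizer.stop_continuous_recognition()
```

Calling `monitor_microphone(key, region, print)` would print each utterance that the heuristic flags. The keyword check is deliberately crude; a real solution might replace it with a language model or intent classifier, which is outside what the question asks.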

Discussion
Murtuza (Option: C)

The correct CHOICE is C. I made a silly typo but my explanations are right on point.

Murtuza (Option: C)

The other options are not directly relevant to detecting user questions in a video stream:

Speech-to-text (Option A): Converts spoken language into text. While useful for transcribing audio, it doesn’t directly address identifying user questions.
Language detection (Option B): Determines the language of text. It’s not specifically designed for monitoring video streams or detecting questions.
Object detection (Option D): Identifies objects within images, but it’s not suitable for detecting user interactions or questions.

Therefore, Option C (the Face service in Azure AI Vision) is the most appropriate choice for your scenario.

Belicova (Option: D)

Go with D. From Copilot: To monitor a video stream of the user and detect when the user asks an instructor a question while minimizing development effort, consider using object detection. Specifically, you can leverage existing models or frameworks (such as YOLOv3) to detect people in real time from the video stream. Once you identify a person asking a question, you can trigger further actions or alerts. This approach avoids the complexity of speech-to-text or language detection and focuses on the specific task at hand. Therefore, go with D. object detection in Azure AI Custom Vision!

reiwanotora (Option: A)

The user's camera and microphone remain enabled, so A is right.

Barry123456 (Option: C)

It says video stream. It doesn't say the video stream has audio. I deal with video-only streams all day. Don't assume.

Murtuza (Option: A)

The best option for this scenario would be A. speech-to-text in the Azure AI Speech service. This service can transcribe the user’s spoken words into written text, which can then be analyzed to detect when a question is being asked. This would be more efficient and direct for detecting questions in a video stream, compared to the other options which focus on language detection, face recognition, and object detection. These other services might not be as effective for this specific use-case.

chandiochan (Option: A)

Speech-to-text in the Azure AI Speech service. This service can transcribe spoken words into written text in real time, allowing you to monitor the audio for specific triggers, like questions, which can then be further processed or flagged for response. This solution is efficient and requires minimal development effort for integrating audio streaming and speech recognition capabilities.

AlviraTony

[ChatGPT] A. Speech-to-text in the Azure AI Speech service. Explanation: Speech-to-text functionality can convert spoken words into text, allowing you to analyze the content of the speech. By using speech-to-text, you can transcribe the user's spoken questions and then analyze the text to detect if a question is being asked to the instructor. This option aligns with the requirement to monitor the user's speech in real-time without significant development effort.

SAMBIT (Option: B)

Definitely it's not A. That's a bunker

HaraTadahisa (Option: A)

A is the correct answer.

anto69 (Option: A)

To minimize effort, A is enough.

anntv252 (Option: A)

Because the user's camera and microphone remain enabled, the Azure AI Speech service is the recommended choice.

sivapolam90 (Option: A)

A. speech-to-text in the Azure AI Speech service

NullVoider_0 (Option: A)

A. speech-to-text in the Azure AI Speech service. This service can transcribe the spoken words into text in real time, which can then be analyzed to detect questions. It’s an efficient way to monitor for specific verbal cues or keywords that indicate a question is being asked, without the need for extensive programming or manual review. This approach minimizes development effort while providing a robust solution for the requirement.

Murtuza (Option: A)

Face Service (Azure AI Vision): The Face service provides facial recognition capabilities, which can be used to identify when a user is facing the camera (e.g., looking at the instructor). By analyzing facial features, expressions, and head movements, you can detect when a user is likely to be asking a question. This approach minimizes development effort because it directly addresses the requirement of monitoring the video stream for user interactions.

Mehe323

The user can talk, but it doesn't have to be a question. I think the focus should be on detecting whether something is a question or not and for that, you need speech to text first. Face doesn't make sense as identifying questions is not the purpose of that service: 'The Azure AI Face service provides AI algorithms that detect, recognize, and analyze human faces in images. Facial recognition software is important in many different scenarios, such as identification, touchless access control, and face blurring for privacy.' https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview-identity