Professional Cloud DevOps Engineer Exam - Question 80

Question

You support a user-facing web application. When analyzing the application's error budget over the previous six months, you notice that the application has never consumed more than 5% of its error budget in any given time window. You hold a Service Level Objective (SLO) review with business stakeholders and confirm that the SLO is set appropriately. You want your application's SLO to more closely reflect its observed reliability. What steps can you take to further that goal while balancing velocity, reliability, and business needs? (Choose two.)

Examice · Accepted Answer

If your application consistently consumes a very small portion of its error budget, you can afford to take more frequent or potentially risky application releases. This can accelerate feature development without compromising reliability. Announcing planned downtime to consume more of the error budget also helps to prevent users from becoming overly dependent on an exceptionally high availability that exceeds the agreed-upon SLO, aligning user expectations more closely with the SLO.
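To make the "small portion of its error budget" concrete, here is a minimal sketch of the usual request-based arithmetic. The question does not state the SLO target or traffic volume, so the 99.9% target and the request counts below are assumptions chosen only for illustration, and `error_budget_consumed` is a hypothetical helper name.

```python
def error_budget_consumed(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget used in one window.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    The error budget is the number of failures the SLO tolerates.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return failed_requests / allowed_failures


# Assumed numbers: 99.9% SLO, 1,000,000 requests, 50 failures in the window.
consumed = error_budget_consumed(0.999, 1_000_000, 50)
print(f"Error budget consumed: {consumed:.1%}")  # roughly 5% with these inputs
```

With these assumed inputs the SLO tolerates about 1,000 failed requests, so 50 failures leave roughly 95% of the budget unspent, which is the situation the question describes.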

Trony · Answer

I would go for B+D:
-A: no, there's no reason to add capacity if we are barely scratching the error budget;
-B: everything seems fine, so it's OK to be bolder with more innovative/risky releases;
-C: no, stakeholders said the SLO is OK;
-D: adding additional SLIs (and so SLOs) might be a way to reflect observed reliability more closely;
-E: taking the servers down for no reason is a no-no.

Sekierer · Answer

I vote for D+E if you read "The Global Chubby Planned Outage"
https://sre.google/sre-book/service-level-objectives/

PhilipKoku · Answer

B - You can increase the frequency of your releases and take higher risks as you have never exceeded your error budget.
E - Planned downtime to use some of your error budget will help to make sure end users don't get used to a higher availability of your service than the SLO promises.

[Removed] · Answer

D+E
You want the application's SLO to more closely reflect its observed reliability. The key here is that error budget consumption never goes over 5%. This means they can have additional downtime and still stay within their budget.
E is correct as per Google SRE handbook (https://sre.google/sre-book/service-level-objectives/) 
'You can avoid over-dependence by deliberately taking the system offline occasionally (Google’s Chubby service introduced planned outages in response to being overly available)'
D is a good answer because more SLIs may reflect the system's reliability more accurately.
A is wrong because adding more serving capacity would make the system even more available.
C is wrong because:  The question states 'The SLO is set appropriately'.
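The claim that there is room for additional downtime can be checked with a time-based version of the error budget. This is only a sketch: the 99.9% availability target and the 30-day window are assumptions (the question gives neither), and `downtime_budget_minutes` is a hypothetical helper.

```python
def downtime_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability SLO allows per window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


# Assumed: 99.9% availability SLO over a 30-day window.
budget = downtime_budget_minutes(0.999)   # about 43.2 minutes
used = 0.05 * budget                      # the 5% peak consumption from the question
remaining = budget - used                 # about 41 minutes left for planned outages
print(f"budget={budget:.1f} min, used={used:.1f} min, remaining={remaining:.1f} min")
```

Under these assumptions, a short planned outage would still fit comfortably inside the budget.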

eks4x · Answer

B+E 
B because if you constantly have a lot of spare error budget, it is an indication that you are not taking enough risk, i.e. not releasing enough new features. You are ultimately depriving users of new functionality by being too cautious.
E: Everyone agrees on E, as it is mentioned in the SRE book as part of "The Global Chubby Planned Outage".

Re: why not D) The review indicated that the existing SLOs are good, so adding more SLIs is not useful here and does nothing for user-perceived reliability.

JayDeng · Answer

B and E.
When you consistently consume only 5% of your error budget, it means that you can take more risk by releasing features more often (B) and/or bring the service down to set user expectations closer to the SLO (E), since the business has confirmed that this SLO is appropriate.

TNT87 · Answer

These are the correct choices

eliC · Answer

B & D are correct.

shefalia · Answer

This was asked on 12/24/22, and I passed the exam. I opted for D & E.

TNT87 · Answer

This is the link to the answers to this question: https://cloud.google.com/blog/products/management-tools/sre-error-budgets-and-maintenance-windows

zygomar · Answer

Check the link from Sekierer for why E is valid (https://sre.google/sre-book/service-level-objectives/).
Then D is logical as well.

Greg123123 · Answer

B and E:
A. Not relevant.
B. Yes, because we have a lot of budget. "Risky" isn't necessarily a negative word in SRE, because what we learn from SRE is to embrace risk and failure.
C. The SLO is set appropriately, they say.
D. Adding more SLIs doesn't necessarily help.
E. SRE practice suggests that we can have planned downtime.

Catweazle1983 · Answer

B is correct because when you don't use your error budget, you can increase the release frequency. The question even mentions "balancing velocity, reliability, and business needs". The balance here can shift from reliability to velocity and business needs.
D is correct as multiple other users already mentioned because of:  "The Global Chubby Planned Outage" https://sre.google/sre-book/service-level-objectives/

dobby_elf · Answer

DE - You want your application's SLO to more closely reflect its observed reliability.

JonathanSJ · Answer

I will go with D and E.
Option B sounds good, but introducing new changes could add errors, which does not match the stated objective: "You want your application's SLO to more closely reflect its observed reliability."

A doesn't make sense.
C doesn't either, because the SLO has been reviewed and it's OK.

izekc · Answer

BE is correct

jomonkp · Answer

Option B and D
