In this podcast, Jason and Chander talk about incidents, outages, high availability, Site Reliability Engineering, and more. They cover the following questions:
- How do you define incidents and outages?
- In your experience, what do you think are the major reasons behind these incidents and outages?
- How can we prevent incidents and outages?
- What kind of tooling do you use? Can you go into detail about why you picked those tools and which features you love most?
- How do we achieve high availability with complex systems?
- What do you mean by Site Reliability Engineering and why is it becoming a very popular role in organizations lately?
- Is there any advice you would like to give our listeners on making their applications more responsive and highly available?