Every website experiences downtime from time to time, and that is absolutely OK. What matters is how that downtime is communicated to users, customers, and everyone else who matters to the company.
In today’s world, where users can quickly switch between services, providing the best possible user experience is key to success. Part of that experience is, of course, how issues are communicated. Slack is a great example of a company that handles its system problems and outages well. By communicating with its users honestly and swiftly, it builds trust in the service even when the service is not working.
Transparency is one of the most highly regarded traits of successful internet companies. Because of that, even downtime can be an opportunity for a company to build its brand, and often to come out of the incident ahead. Let’s have a look at the best incident management practices for communicating downtime.
Dedicated status pages
Number one is, of course, a dedicated status page. If you don’t know what a dedicated status page is, check out the ones run by wallmine or Stripe. It is a place where you can see whether the website, APIs, or other systems of a given company are working as they should. The status page also provides a historical overview of downtime (incidents), so you can see how reliable the services are in the long term.
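One common way to summarize that historical overview is an uptime percentage for a reporting window. The snippet below is only a minimal sketch of the arithmetic: the `Incident` record and the `uptime_percent` function are hypothetical names, and it assumes incidents do not overlap each other.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One past downtime window (hypothetical record shape)."""
    started: datetime
    resolved: datetime


def uptime_percent(incidents, window_start, window_end):
    """Share of the window, in percent, that no incident covers.

    Assumes the incidents do not overlap each other.
    """
    window = (window_end - window_start).total_seconds()
    down = 0.0
    for inc in incidents:
        # Clip each incident to the reporting window before counting it.
        start = max(inc.started, window_start)
        end = min(inc.resolved, window_end)
        if end > start:
            down += (end - start).total_seconds()
    return 100.0 * (window - down) / window
```

For example, about 43 minutes of downtime in a 30-day window works out to roughly 99.9% uptime.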
Once an incident happens, the status page serves as the single source of truth for everyone interested. It is best practice to connect your status page to whatever incident management tool you use, so that incident updates are shared automatically and immediately. Those updates can be posted as short text updates or pulled in from social media such as Twitter.
Since some incidents last a long time, it is best practice to offer a subscription option on the status page, so that anyone interested can provide their email address and be notified as soon as there is an update. This saves you a lot of time, since you don’t have to notify everyone manually. Especially in larger companies, this “subscribe to status page” feature helps a great deal in getting everyone around the world up to speed on the situation. So forget forwarding email updates and switch to subscribable status pages.
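To make the subscription idea concrete, here is a minimal in-memory sketch. The `StatusPage` class and its `send` callback are hypothetical; a real setup would use a status page service or an email provider in place of the callback.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class StatusPage:
    # `send` stands in for a real email sender: send(address, message).
    send: Callable[[str, str], None]
    subscribers: Set[str] = field(default_factory=set)
    updates: List[str] = field(default_factory=list)

    def subscribe(self, email: str) -> None:
        # Normalize so the same address is not stored twice.
        self.subscribers.add(email.strip().lower())

    def post_update(self, message: str) -> int:
        """Record the update and notify every subscriber.

        Returns the number of subscribers notified.
        """
        self.updates.append(message)
        for address in sorted(self.subscribers):
            self.send(address, message)
        return len(self.subscribers)
```

An incident management tool would call `post_update` once per update, and everyone who subscribed receives it without any manual forwarding.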
Embedded system status
For minor incidents, where most systems are functioning but the service has slowed down, the best communication channel is the embedded status. The embedded status is a clickable widget, usually placed at the top of the website, that states the basic information users or visitors need to be aware of. Apart from being honest, its main purpose is to let users know that there is no need to contact support about the issues they are experiencing, because the response team is already looking into them.
In most cases, the embedded system status contains a single sentence describing what some users might experience. Sometimes an emoji is added as well to create a more easygoing feel. No matter what the embedded status says, the best practice is to link it to a status page so that users can see all the updates and details of the incident.
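As a rough illustration, such a widget can be as small as one link wrapped around one sentence. The function name, CSS class, and URL below are assumptions made for this sketch, not part of any particular status page product.

```python
from html import escape
from typing import Optional


def embedded_status_banner(message: Optional[str], status_page_url: str) -> str:
    """Return HTML for a one-sentence banner, or "" when there is no incident."""
    if not message:
        return ""
    # The whole banner is a link, so one click leads to the full status page.
    return '<a class="status-banner" href="{url}">{text}</a>'.format(
        url=escape(status_page_url, quote=True),
        text=escape(message),
    )


print(embedded_status_banner(
    "Some users may see slow page loads. We are on it.",
    "https://status.example.com",
))
```

Returning an empty string when there is no message keeps the banner out of the page entirely while everything is working.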
Twitter status
Twitter and other social media are now a pillar of incident communication at many modern companies. The main benefit of Twitter is that updates are very easy to write and publish. But the ease of crafting an incident update is not the only advantage. People can essentially “subscribe” to status updates by following your Twitter status profile, without entering their email address. Since users are often reluctant to share their email, and some are not even aware that a status page exists where they can subscribe to downtime updates, a Twitter status profile offers an easy solution.
When it comes to creating a Twitter status profile, the best practice is to use a standardized handle: @companynamestatus. This is easy to find and recognize, and it can be cross-linked with the main company profile.
And yes, good communication on its own won’t improve your incident management or decrease your MTTR (mean time to recovery), but it will build trust with your users and create a better user experience.