Recently, one of our clients, Circonus, published a blog post about a fault they experienced in one of their small but mission-critical components. We were excited to see Backtrace deeply involved in shortening their time to resolution. The experience is a great example of why sophisticated software organizations need more than just monitoring and reporting to build a robust error management process. Deep introspection, analysis, automation, and workflow integration are all critical to ensuring that systems come back up and stay up with minimal impact.
Error Management Use Case
Circonus uses the Backtrace Platform to detect and capture faults throughout their global network. As soon as a fault happens, the Backtrace Platform notifies Circonus directly in their Slack channel. Messages sent by Backtrace (example below) contain detailed information captured from the application at the time of the fault, along with highlighted signals, so that the Circonus team can immediately start investigating and work toward the root cause.
“Backtrace was able to reduce a highly inconsistent hours-to-days process to minutes. We now have a 2 minute MTTD and 23 minute MTTR for a tested fix involving a core library.”
– Theo Schlossnagle, Circonus
We’re fortunate to have technically savvy customers integrating our debugging platform as a critical part of their error management workflow. Having them publicly share their experiences is icing on the cake.