From 10:23 AM to 11:18 AM UTC, the Peach API did not respond to database-backed API requests. Basic API functionality that does not depend on the database remained available. The root cause of the outage was an automated database failover that the API application did not handle gracefully. The issue affected several PeachWorks consumer-facing sites, including go.peachworks.com, developer.peachworks.com, and apps.peachworks.com.
At 10:23 AM UTC, Amazon RDS initiated an automatic failover procedure due to a low-level node failure. The failover caused the Peach API to drop all active database connections.
When the backup Amazon RDS instance came online at 10:24 AM UTC, the Peach API did not reconnect to the database. As a result, database-backed API functionality stayed unavailable even though the database itself had recovered.
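The missing reconnect behavior can be illustrated with a minimal Python sketch. It is not the Peach API's actual code: the `ReconnectingClient` class, its parameters, and the `connect` factory are hypothetical names chosen for illustration, and the sketch assumes the database driver signals a lost connection with an exception type the caller can name (here `ConnectionError` by default).

```python
import time
from typing import Any, Callable, Optional, Tuple, Type


class ReconnectingClient:
    """Hypothetical wrapper that reconnects when a database operation fails."""

    def __init__(
        self,
        connect: Callable[[], Any],
        retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
        max_attempts: int = 5,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._connect = connect          # factory that opens a new connection
        self._retry_on = retry_on        # errors treated as "connection lost"
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep
        self._conn: Optional[Any] = None

    def run(self, operation: Callable[[Any], Any]) -> Any:
        """Run operation(conn); on a connection error, reconnect and retry."""
        last_error: Optional[BaseException] = None
        for attempt in range(self._max_attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                return operation(self._conn)
            except self._retry_on as exc:
                last_error = exc
                self._conn = None  # discard the dead connection
                if attempt < self._max_attempts - 1:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    self._sleep(self._base_delay * (2 ** attempt))
        raise ConnectionError("database unavailable after retries") from last_error
```

With a wrapper along these lines, a request that hits a dropped connection opens a new one against the promoted instance instead of failing until the service is restarted.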
At 10:44 AM UTC, the monitoring system alerted PeachWorks engineers, who immediately began diagnosing the issue. Diagnosis took longer than expected because the Peach API reported itself as healthy despite being unable to connect to the database.
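A health check that probes the database directly would have surfaced this condition sooner. The following Python sketch shows the general idea only; `health_status` and `db_probe` are hypothetical names, and the status codes and response body are assumptions rather than the Peach API's actual health endpoint.

```python
from typing import Callable, Dict, Tuple


def health_status(db_probe: Callable[[], None]) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status code, body) based on whether the database answers.

    db_probe is a caller-supplied function that performs a cheap round trip
    to the database (for example, running "SELECT 1") and raises on failure.
    """
    try:
        db_probe()
    except Exception as exc:
        # Report unhealthy so monitoring can alert on the dependency failure.
        return 503, {"status": "unhealthy", "database": f"unreachable: {exc}"}
    return 200, {"status": "ok", "database": "reachable"}
```

Tying the reported status to a real database round trip means the monitoring system sees a failure as soon as the dependency is lost, rather than only when the process itself stops responding.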
After deeper log analysis, the database connection issue was identified and the API services were restarted at 11:17 AM UTC. Full service was restored shortly afterward.
An internal review was conducted, and the following actions are being taken to prevent a recurrence and to shorten the time needed to diagnose future incidents: