Elevated URL API Errors
Incident Report for Uploadcare
Postmortem

Incident Summary

On 26 November 2024, an issue with our URL API was identified, causing a partial outage of the service from 14:31 to 15:00 UTC.

Timeline

  • 14:20 UTC – A CDN configuration change was made to improve the Auto Format feature in the URL API.
  • 14:31 UTC – Service degradation began as increased traffic overwhelmed our origin servers due to cache key changes.
  • 14:37 UTC – Alert notifications were received by our team.
  • 14:58 UTC – The CDN configuration was rolled back.
  • 15:00 UTC – The incident was fully resolved, and service was restored.

Root Cause

The issue was caused by a CDN configuration change that altered cache key behavior, significantly increasing traffic to our origin servers. This overwhelmed our autoscaling groups, which hit their scaling limits, resulting in a service outage.

Impact

URL API degradation lasted for approximately 29 minutes.

Challenges During Resolution

This type of issue is difficult to catch in a staging environment because the traffic profile does not reflect production-level demand. Although the alert system worked as intended, delays in responding to the alert contributed to the resolution time.

Resolution

The configuration change was identified as the root cause and rolled back. The service stabilized immediately after the rollback, and full functionality was restored by 15:00 UTC.

Action Items

Short-term

  • Review and improve escalation processes for alert notifications to reduce resolution time.
  • Adjust staging environment testing to better simulate production traffic patterns for similar changes.

Long-term

  • Assess autoscaling limits to ensure sufficient capacity for handling unexpected traffic spikes.

We sincerely apologize for the disruption and are committed to learning from this incident to improve the reliability of our services. Thank you for your understanding and continued support.

Posted Nov 27, 2024 - 09:39 UTC

Resolved
This incident has been resolved.
Posted Nov 26, 2024 - 18:00 UTC
Update
Systems are back to normal
Posted Nov 26, 2024 - 15:16 UTC
Investigating
We're experiencing an elevated level of URL API errors and are currently looking into the issue.
Posted Nov 26, 2024 - 14:56 UTC
This incident affected: CDN.