Due to an outage of Google Cloud Networking (https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5REh), the API returned 404 errors with a Google Load Balancer HTML error page.
The issue occurred on Nov 16th, 2021 between 17:35 and 18:10. All times are UTC.
Bitmovin’s API uses the Google Load Balancer in front of the API ingress. The Load Balancer handles SSL termination and provides a stable ingress for the API.
All API calls returned a 404 error with a Google Load Balancer HTML error page. Running encodings may also have stalled or failed to finish successfully, as the encoder couldn’t patch updates back to the API during the incident.
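For API clients, one way to tell this kind of failure apart from a genuine "resource not found" response is to look at the content type of the 404. The following is a minimal sketch, not part of Bitmovin's SDKs: it assumes the API itself answers with JSON, so an HTML 404 most likely comes from a proxy in front of it. The names is_infrastructure_error and call_with_retry are hypothetical, and the retry counts and delays are illustrative only.

```python
import time
from typing import Callable, Tuple

# (status code, Content-Type header value, body) as a plain tuple so the
# sketch stays independent of any particular HTTP library.
Response = Tuple[int, str, str]


def is_infrastructure_error(status: int, content_type: str) -> bool:
    """Heuristic: treat an HTML 404 as an infrastructure error.

    Assumption: the API returns JSON error bodies, so a text/html 404
    was probably produced by a load balancer or proxy, not by the API.
    """
    media_type = content_type.split(";")[0].strip().lower()
    return status == 404 and media_type == "text/html"


def call_with_retry(
    send: Callable[[], Response],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry with exponential backoff on infrastructure errors.

    Any other response, including a JSON 404, is returned immediately.
    If every attempt fails, the last response is returned to the caller.
    """
    response = send()
    for attempt in range(1, attempts):
        if not is_infrastructure_error(response[0], response[1]):
            return response
        # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
        response = send()
    return response
```

A caller would wrap its own request function in send, for example a function that performs the PATCH and returns the status code, Content-Type header, and body. Retrying only on HTML 404 responses keeps real "not found" answers from the API from being retried needlessly.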
Once Bitmovin's engineering team suspected that Google’s Load Balancer was the cause, the team decided to set up a fallback solution using Bitmovin’s DNS provider to bypass the Load Balancer for the duration of the service disruption. However, such a change takes some time, as the DNS record for api.bitmovin.com also needs to be updated. In the meantime, Google’s Load Balancer returned to normal operation and the incident was resolved.
17:35 - Google Load Balancer issue occurred
17:48 - Investigation started
17:55 - Issue was found and mitigation work was underway
18:10 - Google Load Balancer recovered and the API was fully operational again
As Bitmovin’s API is built on cloud services, it is hard to prevent issues in the underlying infrastructure. However, Bitmovin’s team identified possible improvements that will lead to faster reactions in similar situations: