Monitoring - We’re happy to report that the system has fully recovered and all encodings are now being processed normally across all regions and infrastructure environments.

As a next step, we are conducting a full root cause analysis to understand what led to the incident and to define measures that will prevent similar issues in the future.
We will provide a detailed postmortem once the analysis is completed.

Thank you again for your patience and trust throughout this incident.

Jul 15, 2025 - 16:41 UTC
Update - We have now manually recovered all previously stuck encodings and successfully cleaned up the instances that were consuming our GCP SSD quota.

As a result, GCP-based encoding jobs are once again processing successfully. We are currently performing a controlled ramp-up of system capacity to work through the backlog of affected jobs and ensure platform stability.

We continue to monitor closely and will provide further updates as full performance is restored.

Thank you for your patience during this incident.

Jul 15, 2025 - 15:10 UTC
Update - We have successfully recovered encoding operations for all non-GCP accounts and fully processed the existing backlog.

We are now beginning a controlled and gradual recovery of our GCP (Google Cloud Platform) operations. Scheduling of encoding jobs on GCP remains halted as we work to stabilize the environment and resolve the underlying SSD quota constraints.

We will continue to monitor the situation closely and provide further updates as we increase GCP capacity and restore full service.

Thank you for your continued patience and understanding.

Jul 15, 2025 - 10:15 UTC
Identified - We have identified the root cause of the current encoding disruption as an exhaustion of SSD quota within our Google Cloud Platform (GCP) infrastructure. This quota issue is preventing us from provisioning additional instances required to scale encoding jobs, which is causing GCP-based workloads to become extremely slow and trigger our overall system safeguards.
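
For context, Compute Engine enforces SSD capacity as a per-region quota (for example, the SSD_TOTAL_GB metric); once a region's quota is fully consumed, requests to provision additional SSD-backed instances fail until capacity is freed or a quota increase is granted. As a minimal sketch, assuming the google-cloud-compute Python client and placeholder project and region names, such a quota can be inspected like this:

```python
# Illustrative sketch only: list a region's SSD-related quotas with the
# google-cloud-compute client. Project and region values are placeholders.
from google.cloud import compute_v1


def print_ssd_quotas(project_id: str, region: str) -> None:
    client = compute_v1.RegionsClient()
    region_info = client.get(project=project_id, region=region)
    for quota in region_info.quotas:
        # Quotas such as SSD_TOTAL_GB cap the total SSD persistent disk
        # capacity that can be provisioned in the region.
        if "SSD" in str(quota.metric):
            print(f"{quota.metric}: {quota.usage:.0f} / {quota.limit:.0f} GB")


if __name__ == "__main__":
    print_ssd_quotas("example-project", "us-central1")
```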

To mitigate the impact, we have paused scheduling of encoding jobs on GCP and are actively working to restore service functionality for customers using non-GCP infrastructure.

Thank you for your patience as we work to fully restore encoding operations.

Jul 15, 2025 - 09:14 UTC
Monitoring - We are currently experiencing scheduling delays across our encoding platform. Our monitoring systems detected increased queue times and longer job scheduling intervals starting at approximately 04:00 UTC today.

Our engineering team has been notified and is actively investigating the root cause. We are seeing signs of recovery with scheduling times beginning to improve.

We will provide updates as more information becomes available and will post a follow-up once the issue has been fully resolved.
We apologize for any inconvenience this may have caused and appreciate your patience as we work to restore normal scheduling performance.

Jul 15, 2025 - 06:37 UTC

Component status (uptime over the past 90 days):

Bitmovin API: Operational, 99.99% uptime
Account Service: Operational, 100.0% uptime
Input Service: Operational, 100.0% uptime
Encoding Service: Operational, 99.99% uptime
Output Service: Operational, 99.99% uptime
Statistics Service: Operational, 100.0% uptime
Infrastructure Service: Operational, 100.0% uptime
Configuration Service: Operational, 100.0% uptime
Manifest Service: Operational, 100.0% uptime
Player Service: Operational, 100.0% uptime
Player Licensing: Operational, 100.0% uptime
Analytics Service: Operational, 100.0% uptime
Analytics Ingress: Operational, 100.0% uptime
Query Service: Operational, 100.0% uptime
Export Service: Operational, 100.0% uptime
Bitmovin Dashboard: Operational, 100.0% uptime

Jul 15, 2025
Resolved - This incident has been fully resolved.
All services are operational, and all buffered data has been successfully backfilled as of 06:57 UTC. No data was lost during the incident.

Jul 15, 07:59 UTC
Monitoring - We have resolved the underlying issues affecting the Analytics backend database. All services have fully recovered, and query error rates have returned to normal levels.

We are now actively backfilling the buffered data to ensure all historical events are written and available in the system. No data has been lost.

We will continue to monitor the system closely and provide a final update once the backfill is fully complete.

Jul 15, 06:47 UTC
Investigating - We are currently investigating elevated query error rates and backend insert failures affecting our main Analytics database.
While queries may intermittently fail and analytics dashboards may show incomplete or delayed data, no data is being lost — all incoming events that cannot currently be written are being safely buffered.

Our engineering team is actively working to identify the root cause and restore full functionality.
We will continue to provide updates as the situation evolves.

Jul 15, 06:40 UTC
Jul 14, 2025

No incidents reported.

Jul 13, 2025

No incidents reported.

Jul 12, 2025

No incidents reported.

Jul 11, 2025

No incidents reported.

Jul 10, 2025
Resolved - On 2025-07-10 between 00:06 UTC and 00:26 UTC, and again from 01:19 UTC to 01:39 UTC, our API experienced elevated latency and an increased rate of 503 Service Unavailable and 504 Gateway Timeout errors. The issue was detected by our monitoring systems during the first incident window, prompting an immediate investigation.

The issue was traced to a customer workflow that generated unexpectedly high traffic volumes, placing excessive load on our API gateway. The gateway was unable to keep up with the volume of rate-limiting decisions required, leading to memory pressure on the gateway nodes. This memory pressure resulted in longer request processing times and upstream timeouts.

Once identified, additional API gateway capacity was added and memory pressure on the gateway nodes was alleviated. Response times and error rates returned to normal levels as of 01:39 UTC.

The high traffic volume during the incident also resulted in a significant backlog of encoding jobs. Queue times and processing throughput have been impacted and are slowly returning to normal levels as the system works through the accumulated backlog.

We have already tweaked our API gateway configuration to increase the number of available nodes and allocate more memory per node to better handle traffic spikes. Additionally, we will be implementing auto-scaling capabilities over the coming weeks to further prevent similar incidents in the future.

This incident also triggered a comprehensive review of our rate limiting configuration. As a result of this analysis, we have adjusted our rate limits to better balance system protection with customer workflow requirements. To see our current API rate limits, please check the following documentation: https://developer.bitmovin.com/encoding/reference/introduction-of-api-rate-limits

As a general best practice, we recommend implementing retries with exponential backoff in workflows that depend on our API, to gracefully handle occasional transient errors like 503/504 responses.
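
As a minimal sketch of that pattern, assuming a plain HTTP client and placeholder endpoint and header values (adapt them to your own API calls), a retry loop with exponential backoff and jitter could look like this:

```python
# Minimal sketch of retries with exponential backoff for transient 503/504 errors.
# The URL and header below are placeholders, not a specific Bitmovin endpoint.
import random
import time

import requests

RETRYABLE_STATUS = {503, 504}


def get_with_backoff(url: str, api_key: str, max_retries: int = 5) -> requests.Response:
    delay = 1.0  # initial backoff in seconds
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers={"X-Api-Key": api_key}, timeout=30)
        if response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt == max_retries:
            break
        # Sleep for an exponentially growing interval with random jitter
        # so that many clients do not retry in lockstep.
        time.sleep(delay + random.uniform(0, delay))
        delay *= 2
    response.raise_for_status()
    return response
```

Capping the number of attempts and adding jitter keeps clients from retrying simultaneously and overwhelming the API again the moment it recovers.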

We apologize for any inconvenience this may have caused.

Jul 10, 00:30 UTC
Jul 9, 2025

No incidents reported.

Jul 8, 2025

No incidents reported.

Jul 7, 2025

No incidents reported.

Jul 6, 2025

No incidents reported.

Jul 5, 2025

No incidents reported.

Jul 4, 2025

No incidents reported.

Jul 3, 2025

No incidents reported.

Jul 2, 2025

No incidents reported.

Jul 1, 2025

No incidents reported.