Summary
On April 14, 2026, starting at approximately 06:00 UTC, historic data in Bitmovin Observability became partially unavailable. API queries for timeframes before 06:00 UTC returned empty or incomplete results, and minute-level granularity in the Dashboard was unavailable for affected timeframes. Hourly granularity for historic data in the Dashboard remained unaffected. API queries for data after 06:00 UTC were fully operational throughout the incident.
Root Cause
On April 13, 2026, we upgraded our database software to a new version. This version had been running successfully in our QA environment for over one month prior to the production rollout. On April 14, our daily retention job, which removes data that has exceeded its retention period, executed normally but triggered a pre-existing bug in the new database version. This bug caused a metadata corruption that led the database to misidentify the storage location of existing data. As a result, queries against historic data returned empty results even though the underlying data was still intact on disk.
The bug was specific to tables that had been created on an older version of the database and subsequently upgraded, which is why it did not surface in our QA environment. We worked with our database vendor to identify the root cause and determine the best recovery path. The vendor has confirmed the issue and developed a fix, which we expect to be released today (April 15, 2026).
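To illustrate the operation that triggered the bug: a daily retention job conceptually drops data partitions older than the retention window. The sketch below is a minimal Python model with hypothetical names (`run_retention_job`, daily partitions keyed by date, a 30-day window); it is not our actual implementation, and the real deletion is a database-level partition drop rather than a set removal.

```python
from datetime import date, timedelta

# Illustrative retention window; the real system runs 30-day and 90-day stores.
RETENTION_DAYS = 30

def expired_partitions(partitions, today, retention_days=RETENTION_DAYS):
    """Return the daily partitions whose data has exceeded its retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [day for day in partitions if day < cutoff]

def run_retention_job(partitions, today):
    """Drop expired daily partitions.

    On the buggy database version, this kind of partition deletion corrupted
    partition metadata for tables created on an older version and later
    upgraded, so queries could no longer locate intact on-disk data.
    """
    for day in expired_partitions(sorted(partitions), today):
        partitions.remove(day)  # stands in for a DROP PARTITION statement
    return partitions
```

For example, running the job on April 14 with a 30-day window drops the March 1 partition while keeping anything newer than March 15.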
Impact
- API queries for timeframes before 06:00 UTC on April 14, 2026 returned empty or incomplete results.
- Minute-level granularity in the Observability Dashboard was unavailable for affected timeframes.
- Session details and unique viewers metrics were unavailable for affected timeframes.
- Data exports for timeframes before 06:00 UTC were also affected.
- During the final recovery steps, brief data inconsistencies were visible in the Dashboard while recovered data was being reconciled with live data.
- No data was lost.
All real-time monitoring, alerting, and data ingestion remained fully operational throughout the incident. Hourly granularity for historic data in the Dashboard was unaffected. All functionality related to data after 06:00 UTC, including API queries, exports, and Dashboard views at all granularities, was fully operational. The impact was limited to minute-level granularity, session details, unique viewers, and exports for timeframes before 06:00 UTC.
Timeline (all times UTC)
| Time | Event |
|------|-------|
| April 13 | Database software upgraded to new version |
| April 14, 06:00 | Daily retention job runs, triggering the metadata corruption |
| April 14, 06:35 | Routine data integrity check triggers an alarm; investigation begins |
| April 14, 07:08 | Issue escalated to database vendor support |
| April 14, 07:18 | Database vendor joins investigation call |
| April 14, 09:20 | Retention job identified as the trigger for the issue |
| April 14, 10:00 | Issue successfully reproduced |
| April 14, 12:19 | Root cause identified and fix developed by the vendor. Analysis concludes that recovery via a code change alone is not feasible; recovery plan using backups and secondary data stores initiated |
| April 14, 13:00 | Data recovery begins for the 30-day retention data store |
| April 14, 19:10 | 30-day retention data store recovery completed; 90-day retention data store recovery begins |
| April 15, 12:20 | 90-day retention data store recovery completed; incident resolved |
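The routine integrity check that raised the alarm at 06:35 can be thought of as comparing what ingestion recorded against what queries return for each timeframe. The sketch below is purely illustrative (`integrity_alarm` and its per-timeframe row counts are hypothetical, not our production tooling):

```python
def integrity_alarm(expected_counts, queried_counts):
    """Flag timeframes where a query returns fewer rows than ingestion recorded.

    expected_counts: hypothetical mapping of timeframe -> rows ingested
    queried_counts:  mapping of timeframe -> rows a query actually returned
    Returns the list of timeframes that should trigger an alarm.
    """
    alarms = []
    for timeframe, expected in expected_counts.items():
        got = queried_counts.get(timeframe, 0)
        if expected > 0 and got < expected:
            alarms.append(timeframe)
    return alarms
```

A check of this shape fires as soon as historic timeframes start returning empty results even though the underlying data was ingested, which is what surfaced the corruption 35 minutes after the retention job ran.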
Mitigation & Recovery
Once the root cause was identified, we worked with our database vendor, who developed a fix to prevent future occurrences. Because the corrupted metadata was already present in the system, recovery via a code change alone was not feasible. We therefore performed a full data recovery using backups and secondary data stores, carried out in two phases: first the 30-day retention data store, then the 90-day retention data store. Both phases completed successfully, and all historic data was fully restored.
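The reconciliation step that produced the brief Dashboard inconsistencies can be pictured as overlaying live rows on top of rows restored from backup, with live data taking precedence for any overlapping minute. This is a minimal sketch under those assumptions (minute-keyed rows and the precedence rule are illustrative, not the actual recovery tooling):

```python
def recover_and_reconcile(backup_rows, live_rows):
    """Restore historic rows from backup, then overlay live rows.

    backup_rows: hypothetical mapping of minute timestamp -> row restored
                 from backups / secondary data stores
    live_rows:   rows written by live ingestion, which stay authoritative
    Returns the merged view served once reconciliation is complete.
    """
    restored = dict(backup_rows)  # stands in for the bulk restore phase
    restored.update(live_rows)    # live data wins for overlapping minutes
    return restored
```

While the restore and the overlay run, readers can briefly observe either version of an overlapping row, which matches the transient inconsistencies noted above.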
Preventive Measures
- The database vendor has developed a fix that prevents the metadata corruption from occurring during partition deletions. We expect the fix to be released today (April 15, 2026).
- We will apply the fix to our production systems once it is available.
- We will ensure our QA environment includes tables that mirror production conditions, including tables originally created on older database versions and subsequently upgraded, to catch similar edge cases before production rollouts.
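The QA coverage measure above can be sketched as a check that, for every database version our upgraded production tables originate from, QA holds at least one table with the same origin. Table names and version labels below are hypothetical:

```python
def qa_covers_upgrade_paths(qa_tables, required_origin_versions):
    """Check QA coverage of upgrade paths.

    qa_tables: set of (table_name, created_on_version) pairs describing the
               QA environment (hypothetical records, not our real inventory)
    required_origin_versions: set of database versions that production tables
               were originally created on
    Returns True only if every required origin version is represented in QA.
    """
    present = {created_on for _name, created_on in qa_tables}
    return required_origin_versions <= present
```

Run as a gate before production rollouts, a check like this would have flagged that no QA table shared the created-on-an-older-version history that triggered the bug.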