Bitmovin - Delayed Observability Data Ingestion (US Region) – Incident details

Experiencing partially degraded performance

Delayed Observability Data Ingestion (US Region)

Identified
Major outage
Started 21 minutes ago

Affected

Observability/Analytics

Degraded performance from 4:37 PM to 12:00 AM

Data Ingress

Degraded performance from 4:37 PM to 12:00 AM

Query Service

Degraded performance from 4:37 PM to 12:00 AM

Updates
  • Identified

    We have identified the cause of the connectivity issue between our US datacenter and our central Observability data store and have failed over to an alternative transport protocol. Real-time ingestion of US analytics data has been fully restored.

    We are now draining the backlog of buffered data into the database. Some recently buffered records may appear with a short delay until this process completes. No data loss is expected.

    We will continue to monitor the system and will post a final update once all buffered data has been backfilled.

  • Investigating

    We are investigating connectivity issues between our US datacenter and our central Observability data store that began at 16:26 UTC. As a result, approximately 20% of analytics requests originating in the US are not being ingested in real time.

    Affected data is being buffered and will be inserted into the database once connectivity is restored, so no data is being lost. Querying of previously ingested data is unaffected.

    We are working with our cloud providers to identify the root cause and will provide an update as soon as we have more information.