Friday 24th January 2020

Logs System: Logs ingestion delayed

We are investigating an issue that is causing noticeable delays in the logs collection pipeline.

14:17 UTC: A component of the "live logs" part of the pipeline became overloaded and gradually slowed down the whole pipeline until the delay became noticeable. The component has been restarted, and the pipeline is now working through the backlog of delayed logs.

14:21 UTC: The load rose again soon after the restart. We are working on bringing it down and may have to shut the component down temporarily to scale it up. (Note: we are working on a new pipeline that can be scaled at will without any downtime.)

14:25 UTC: We are temporarily shutting down the Logs API to ease the recovery.

14:34 UTC: The Logs API is back up and the delay is back under 5 seconds. We are still monitoring the situation closely.

14:58 UTC: We can confirm that everything is back to normal.