Some systems are experiencing issues

Past Incidents

Saturday 14th March 2020

No incidents reported

Friday 13th March 2020

No incidents reported

Thursday 12th March 2020

No incidents reported

Wednesday 11th March 2020

Deployments Deployments issue

We are experiencing issues on internal systems and have disabled deployments to limit the potential impact on them.

EDIT 16:25 UTC: fixed.

N.B. Between the start of the issues and the deactivation of deployments, some applications were responding with HTTP 503. This is now fixed.

Infrastructure PAR unreachable from multiple networks

From 00:14:40 UTC to 00:23:10 UTC, the PAR zone was unreachable from multiple networks.

We don't know exactly what happened at this time, but the impact on actual users appears to have been fairly minimal: we can't see any meaningful dip in the aggregated incoming bandwidth usage of the load balancers.

This post will be updated once we get more details from our network operator.

Tuesday 10th March 2020

No incidents reported

Monday 9th March 2020

Console Status of app not correctly displayed

The status of applications is not correctly displayed in the console, but this has no impact on whether they are actually up or down.

EDIT: it's now fixed; app status and SSH access are operational.

Access logs Metrics unavailability

Metrics cannot currently be queried; any request will return an empty result.

This is caused by multiple instances of the same component crashing at the same time.

We are working on fixing this. A definitive fix may take a while (30 minutes at best, 1h30 at worst).

14:41 UTC: Metrics are currently available, but this will probably not last, as there is only partial redundancy on the affected component and the cause of the crash is not fixed.

15:23 UTC: Metrics once again cannot be queried.

15:33 UTC: Metrics can be queried, but issues may still arise from time to time; the underlying issue is still not fixed.

15:45 UTC: Two nodes of the storage backend crashed under the load caused by the reload of the first components. This caused a delay in ingestion and a pause in the reload of the first components. At this time, ingestion is catching up on the delay and queries are running fine despite the issues. You will most likely encounter issues as we work our way through this.

16:48 UTC: We have complete redundancy; this issue is now fixed.

Sunday 8th March 2020

No incidents reported