Tuesday 5th March 2019

Cellar: Elevated errors and response times on Cellar

We are investigating an elevated error rate and elevated response times on Cellar. Only some buckets / files are affected by this issue.

EDIT 14:01 UTC: Error rate is back to normal. Response times are going down; we are still watching the situation closely.

EDIT 15:40 UTC: We are seeing an elevated error rate again. It was caused by the restart of a node, which triggered a very high load on other nodes (this is not supposed to happen). We are investigating.

EDIT 16:30 UTC: The error rate went down significantly, but the incident is not over yet. We sadly cannot give any meaningful ETA as of now.

EDIT 16:55 UTC: The error rate is close to normal. One node is still in trouble and it's causing a few errors; it should resolve quickly.

EDIT 17:15 UTC: The failing node went back to normal at 17:02. We are still seeing a few errors for write requests as of now.

EDIT 17:23 UTC: The error rate is back to normal. A few nodes are still a bit slower than usual, so performance is a bit hit or miss, but it should be completely back to normal within an hour.