the matrix.org homeserver is having problems: https://status.matrix.org/incidents/mm9hdm78svgv apologies for the inconvenience…
But this is also a good reminder to use your own server. I am totally new to Matrix but started using it with my own instance (based on Synapse) from the beginning.
So: the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26. We're trying to restore the primary DB FS (which could be fastish), while also doing a point-in-time backup restore from last night (which takes >10h). We believe the incremental DB traffic since last night is intact however. Apologies for the downtime; folks on their own homeserver are of course not impacted.
(J/k, of course. Good luck with the recovery and thanks!)
Sorry, but it's bad news: we haven't been able to restore the DB primary filesystem to a state we're confident in running as a primary (especially given our experiences with slow-burning postgres db corruption). So we're having to do a full 55TB DB snapshot restore from last night, which will take >10h to recover the data, and then >4h to actually restore, and then >3h to catch up on missing traffic. Huge apologies for the outage. Again, folks using their own homeservers are not impacted.
When you restore an old save state, every key generated and shared in the meantime is gone.
Right?
Status update: we’re 47TB through restoring the 55TB db snapshot of the matrix.org DB, but then have to rebuild the DB and replay the subsequent 17h of DB traffic, which will take several hours. Thank you for your patience, and apologies once again for the outage.
Status update: we've restored the 55TB snapshot and subsequent incremental backups, and are about to replay the remaining traffic since the backup. There are still several unknowns, but if things go well the matrix.org instance should be back in 3-4 hours.
Right, matrix.org is back online as of 17:00 UTC. The server is struggling a bit as it catches up. Huge apologies again for the outage; postmortem + ways to avoid a repeat will be forthcoming. See also https://www.theregister.com/2025/09/03/matrixorg_raid_failure/ & https://www.heise.de/en/news/Matrix-main-server-down-millions-of-users-affected-10630524.html. Thanks all for your patience.
Congratulations on the recovery, @matrix
While the postmortem should focus on what went wrong and how any likely reoccurrence of failures can be mitigated at acceptable cost, be sure to celebrate the successful recovery from catastrophic failure in production, including meaningful communication to us.
Many organisations with far more resources and responsibilities fail to achieve even a fraction of this.
That must have been rough and tough.
We love you! 🧡
Not long to go now... what a pain.
@matrix@mastodon.matrix.org