- Resolved
Everything seems to be running smoothly now. It is still unknown when the issues started, so Fire may have been offline for quite a while. I have systems in place to alert me if a cluster dies, but due to a misconfiguration the alerting was only set to recognize 2 of the 4 clusters, so it did not see the one offline cluster and no alert was triggered. This misconfiguration has since been fixed, and any future issues like this should be resolved much sooner.
I apologise for the inconvenience. I strive for near-perfect uptime for Fire, as poor uptime is one of the issues I've had with other bots. While this issue was caused by something not 100% in my control, I consider it unacceptable that Fire wasn't able to recover automatically. I will be working to improve detection of issues like this, which alongside the fixed alerting should ensure this doesn't happen again.
An interesting note on this incident: the ongoing rewrite of the Fire website was able to correctly identify and display the outage status for cluster 3, which means the issue was detected. It is therefore possible that it was actually an issue on Discord's end that led the cluster to not reconnect.
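To illustrate the alerting gap described above, here is a minimal sketch of a coverage check. It is not Fire's actual monitoring code: the function names, the zero-based cluster ids, and the shape of the status mapping are all assumptions made for the example. The idea is to derive the watched set from the expected cluster count, so a cluster that the alert config does not know about is itself reported as a problem.

```python
from typing import Dict, Iterable, List


def unmonitored_clusters(expected_count: int, monitored: Iterable[int]) -> List[int]:
    """Return ids of expected clusters that the alert config does not cover.

    Assumes (hypothetically) that cluster ids run from 0 to expected_count - 1.
    """
    monitored_set = set(monitored)
    return [cid for cid in range(expected_count) if cid not in monitored_set]


def clusters_to_alert(expected_count: int, statuses: Dict[int, bool]) -> List[int]:
    """Return every expected cluster that is offline or has no status at all.

    `statuses` maps cluster id -> True if online. A missing entry is treated
    as offline, so an unmonitored cluster cannot fail silently.
    """
    return [cid for cid in range(expected_count) if not statuses.get(cid, False)]


# Alerting that only knows about 2 of 4 clusters leaves two unwatched.
gap = unmonitored_clusters(4, [0, 1])

# With full coverage, a single offline cluster is reported.
offline = clusters_to_alert(4, {0: True, 1: True, 2: True, 3: False})
```

Running a check like `unmonitored_clusters` at startup would surface this kind of misconfiguration before an outage does.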
- Monitoring
Fire should now be online in all servers. If you still see Fire as offline, first try restarting Discord. If that does not resolve the issue, let me know in the #fire-help channel in Fire's Discord server (https://inv.wtf/fire).
- Update
While attempting to get cluster 3 back online, it got assigned a different cluster ID and shard, which indicates another cluster may be having issues. I will instead perform a full clean restart of Fire (taking all clusters offline and then bringing them up one by one to ensure each process gets the correct ID), so it will go offline temporarily in all servers.
- Identified
It seems that Fire's VPS had some intermittent network issues and that specific cluster did not recover (the process is still alive but not connected to Discord). It should come back online in ~30 seconds.
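The failure mode above (process alive, but not connected to Discord) is one that a plain process check cannot see. As a rough sketch under stated assumptions, a health check could classify each cluster from two signals: whether the process is running, and how long ago it last received anything from the gateway. The type names, fields, and the 60-second threshold below are hypothetical and are not taken from Fire's code.

```python
from dataclasses import dataclass
from enum import Enum


class ClusterState(Enum):
    HEALTHY = "healthy"
    DISCONNECTED = "disconnected"  # process alive, no recent gateway traffic
    DEAD = "dead"                  # process not running


@dataclass
class ClusterReport:
    process_alive: bool
    seconds_since_gateway_event: float  # hypothetical metric reported by the cluster


def classify(report: ClusterReport, stale_after: float = 60.0) -> ClusterState:
    """Classify a cluster using both process liveness and gateway activity."""
    if not report.process_alive:
        return ClusterState.DEAD
    if report.seconds_since_gateway_event > stale_after:
        return ClusterState.DISCONNECTED
    return ClusterState.HEALTHY


# A cluster whose process survived a network blip but never reconnected.
stuck = classify(ClusterReport(process_alive=True, seconds_since_gateway_event=300.0))
```

Treating `DISCONNECTED` as an alertable state, or as a trigger for an automatic reconnect, would cover the case where the process never exits.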
- Investigating
It seems a cluster (specifically cluster 3) has crashed and didn't automatically recover. I am investigating the cause and will get it back online as soon as possible.