Fire - Fire is offline in some guilds – Incident details

Resolved
Partial outage
Started over 3 years ago
Lasted 27 minutes
Updates
  • Resolved

    Everything seems to be running smoothly now. It is still unknown when the issues started, so Fire may have been offline for quite a while.

    I have systems in place to alert me if a cluster dies, but due to a misconfiguration the alerting was only set to recognize 2 of the 4 clusters. It therefore did not see the one offline cluster, and 1 out of 4 clusters crashing did not trigger the alert. This has since been fixed, and any future issues like this should be resolved much sooner.

    I apologise for the inconvenience. I strive to have near-perfect uptime for Fire, as poor uptime is one of the issues I've had with other bots. While this issue was caused by something not 100% in my control, I do consider it unacceptable that Fire wasn't able to recover automatically. I will be working to improve detection of issues like this, which, alongside the fixed alerting, should ensure this doesn't happen again.

    An interesting note on this incident: the ongoing rewrite of the Fire website was able to correctly identify and display the outage status for cluster 3, which means the issue was detected. It is therefore possible that it was actually an issue on Discord's end that led the cluster to not reconnect.

  • Monitoring

    Fire should now be online in all servers. If you still see Fire as offline, first try restarting Discord. If that does not resolve the issue, let me know in the #fire-help channel in Fire's Discord server (https://inv.wtf/fire).

  • Update

    While I was attempting to get cluster 3 back online, it was assigned a different cluster ID and shard, which indicates another cluster may be having issues. I will instead perform a full clean restart of Fire (taking all clusters offline and then bringing them up one by one to ensure each process gets the correct ID), so Fire will go offline temporarily in all servers.

  • Identified

    It seems that Fire's VPS had some intermittent network issues and that specific cluster did not recover (the process is still alive but not connected to Discord). It should come back online in ~30 seconds.

  • Investigating

    It seems a cluster (specifically cluster 3) has crashed and has not automatically recovered. I am investigating the cause and will get it back online as soon as possible.
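
Two details from the updates above lend themselves to a small illustration: the alert only recognized 2 of the 4 clusters, and the affected process was still alive but not connected to Discord. The following is a minimal sketch of a check that covers both cases, not Fire's actual monitoring code. Everything in it is a hypothetical assumption made for illustration: the names `ClusterStatus`, `clusters_needing_alert` and `unwatched_clusters`, and the cluster ids 0 to 3.

```python
from dataclasses import dataclass

# Assumption: four clusters with ids 0-3. The real ids are not stated in the report.
EXPECTED_CLUSTERS = {0, 1, 2, 3}


@dataclass
class ClusterStatus:
    """Hypothetical status report for one cluster."""
    cluster_id: int
    process_alive: bool      # the OS process is still running
    gateway_connected: bool  # the process has a live connection to Discord


def clusters_needing_alert(expected, reports):
    """Return sorted ids of clusters that are missing, dead, or disconnected.

    A cluster with no report at all is treated as unhealthy, so a cluster
    the monitor has never heard from cannot go unnoticed.
    """
    by_id = {report.cluster_id: report for report in reports}
    unhealthy = []
    for cluster_id in sorted(expected):
        report = by_id.get(cluster_id)
        if report is None or not report.process_alive or not report.gateway_connected:
            unhealthy.append(cluster_id)
    return unhealthy


def unwatched_clusters(expected, watched):
    """Return sorted ids of expected clusters that the alert config does not cover."""
    return sorted(set(expected) - set(watched))
```

Checking `gateway_connected` separately from `process_alive` is the design choice that matters here: a cluster whose process survives a network blip but never reconnects would pass a process-only check. Running `unwatched_clusters` against the alert configuration at startup would surface the kind of coverage gap described in the Resolved update before an outage does.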