Fire - Notice history

All systems operational

Bot - Operational

100% uptime
Oct 2021: 100.0% · Nov 2021: 100.0% · Dec 2021: 100.0%

Website - Operational

100% uptime
Oct 2021: 100.0% · Nov 2021: 100.0% · Dec 2021: 100.0%

Notice history

Nov 2021

Oct 2021

Fire is offline in some guilds, responding twice in others
  • Resolved

    This issue has been resolved. It was caused by the queue that clusters are placed into after they identify with Aether. The queue is paced according to the max concurrency that Discord returns from /gateway/bot, so that too many shards do not start at once. Some clusters waited in this queue too long before being assigned an ID and shards, so they disconnected. This triggered a race condition in which two clusters were assigned the same ID and shards, which caused the double responses, while the shard that the misassigned cluster should have received was left stuck offline. I noticed this issue fairly quickly, but I have added more robust monitoring for this exact scenario so that I am alerted almost immediately.
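    The update above describes two mechanisms: a start-up queue paced by Discord's max concurrency, and the assignment of cluster IDs and shards. A minimal sketch of both follows. It is not Aether's actual code; `identify_waves`, `ClusterAllocator`, the connection tokens and the fixed `shards_per_cluster` layout are hypothetical. The wave grouping assumes Discord's documented rule that a shard's rate limit key is `shard_id % max_concurrency`. The allocator checks for a free ID and claims it in one critical section, and only lets the current owner release an ID, which is one way to rule out two clusters holding the same ID.

```python
from __future__ import annotations

import threading


def identify_waves(shard_ids: list[int], max_concurrency: int) -> list[list[int]]:
    """Group shards into start-up waves (hypothetical helper).

    Assumes Discord's rate limit key for a shard is shard_id % max_concurrency
    and each key may identify once per 5 seconds. Grouping by
    shard_id // max_concurrency puts at most one shard per key in each wave.
    """
    waves: dict[int, list[int]] = {}
    for sid in sorted(shard_ids):
        waves.setdefault(sid // max_concurrency, []).append(sid)
    return [waves[w] for w in sorted(waves)]


class ClusterAllocator:
    """Hypothetical allocator handing out cluster ids and shard ranges."""

    def __init__(self, total_clusters: int, shards_per_cluster: int) -> None:
        self.total_clusters = total_clusters
        self.shards_per_cluster = shards_per_cluster
        self._owners: dict[int, str] = {}  # cluster id -> connection token
        self._lock = threading.Lock()

    def assign(self, token: str) -> tuple[int, list[int]]:
        # Check for a free id and claim it in one critical section, so two
        # clusters connecting at the same moment cannot claim the same id.
        with self._lock:
            for cid in range(self.total_clusters):
                if cid not in self._owners:
                    self._owners[cid] = token
                    first = cid * self.shards_per_cluster
                    return cid, list(range(first, first + self.shards_per_cluster))
        raise RuntimeError("no free cluster id")

    def release(self, cid: int, token: str) -> None:
        with self._lock:
            # Only the current owner may free an id, so a late disconnect from
            # a stale connection cannot free an id that was reassigned.
            if self._owners.get(cid) == token:
                del self._owners[cid]
```

    For example, with a max concurrency of 16, shards 0 to 15 fall into the first wave and shard 16 waits for the next one.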

  • Monitoring

    A fix has been implemented and is currently being deployed.

  • Identified

    The issue has been identified and a fix is being developed. Unfortunately, I cannot bring cluster 3 online without this fix, as the issue would just occur again.

  • Investigating

    It appears that a recent deploy has caused issues with assigning cluster and shard IDs. Cluster 2 (Fire will report this as 3/4 if your server is on this cluster, because cluster IDs are zero-indexed whereas the bot's status is not) is currently responding twice, and shard 3 (which should be on cluster 3) is offline because cluster 3 is currently assigned the ID 2. I will attempt to resolve the issue without rolling back the changes, but it may be necessary to roll back and then make a more permanent fix.

Most if not all services unavailable
  • Resolved

    This incident has been resolved.

  • Monitoring

    Everything that was running before is running again (after some pain with Poetry). I am now checking to make sure nothing broke in the process.

  • Update

    All services listed on the status page are up and running! I am now working on internal services. While these are down, some features of external services may be affected, so you may encounter some errors.

  • Update

    Some services are back online, continuing to work on the rest!

  • Update

    Restoring from the PM2 dump was unsuccessful, so I will be bringing all processes back manually. This may take a while, but I will try to start the most important processes first. I apologise for the inconvenience.

  • Update

    While restoring, I noticed that PM2 was still trying to use the old Node version. After a couple of minutes I have figured out why, so restoring should hopefully not take much longer!

  • Identified

    While I was switching from NodeSource to nvm for managing Node versions, PM2 was killed and did not restore its process list when it restarted. I am working on restoring everything now.

Oct 2021 to Dec 2021
