Fire - Services offline/slow – Incident details

Services offline/slow

Resolved
Major outage
Started over 1 year ago
Lasted about 5 hours
Updates
  • Resolved

    Everything is back to normal!

  • Monitoring

    Services have been restored and I am monitoring to ensure they perform as expected!

  • Identified

    With only Aether's data removed, InfluxDB booted up without any issues, so my assumption was indeed correct. Now it's time to figure out *why* that data caused this issue, whether the data can be restored, and what preventative measures can be put in place to ensure this does not happen again. Services will begin to come back online soon.

  • Update

    I've managed to download the data from InfluxDB and am now going to try to get it working again. With InfluxDB currently stopped and the machine no longer compressing/uploading all the data, the machine is running as smoothly as it normally would, so I think my assumption is correct. I've taken Aether & Fire offline again temporarily while I work on this.

  • Update

    The issue with Fire & Aether using the wrong Node version has been resolved, and they should come back online shortly. The backup of the InfluxDB data is still ongoing, so unfortunately no progress can be made on getting everything back to normal just yet.

  • Update

    After getting some things back online, I noticed that they're using the wrong version of Node, so I'm working on rectifying that and will bring them back up as soon as I can.

  • Update

    I've updated Fire/Aether to add a toggle for using InfluxDB, disabled it, and got them back up and running. Performance will be degraded, though.

  • Update

    The investigation is still ongoing. My current suspect is InfluxDB, as it has been failing to start since I rebooted the system, so I am currently trying to download all the data from it so that I can clear it out and see if that changes anything. I have a lot of data stored in InfluxDB and have not cleared it out recently, so this process will likely take some time. Services are currently offline, as a lot of them make use of InfluxDB (especially Fire/Aether) and I don't want anything impacting this process. Unfortunately, they will be staying offline until I can get everything downloaded.

  • Investigating

    There are currently some issues impacting services running on the VPS that hosts Fire, which may cause services to go offline or be unstable while they're online. I'm currently investigating the cause and will work on resolving this as soon as possible.