Hacker News Clone


		Fly.io is having a complete outage (status.flyio.net)
		111 points by punkpeye 1 hour ago | 55 comments

benhoyt 1 hour ago [-]

My fly.io-hosted website went down for 5 minutes (6 hours ago), but then came right back up, and has been up ever since. I use a free monitoring service that checks it every 5 minutes, so it's possible it missed another short bit of downtime. But fly.io has been pretty reliable overall for me!
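How much a 5-minute check can miss is easy to estimate. The sketch below assumes the probe is instantaneous and that an outage starts at a random point relative to the check schedule; `miss_probability` is a hypothetical helper name, not part of any monitoring service.

```python
def miss_probability(outage_s: float, interval_s: float) -> float:
    """Chance that an outage of outage_s seconds falls entirely between
    two checks spaced interval_s seconds apart.

    Assumes instantaneous checks and a uniformly random outage start.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # An outage is caught only if a check lands inside it, which happens
    # with probability outage_s / interval_s (capped at 1).
    return max(0.0, 1.0 - outage_s / interval_s)


# A one-minute blip against a five-minute check interval.
print(f"{miss_probability(60, 300):.0%} chance of going unnoticed")
```

Under those assumptions, only outages at least as long as the check interval are guaranteed to be seen.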

nomilk 24 minutes ago [-]

Would be fascinated to see your data over a period of months.

Application uptime is flaky, but what was worse was fly deploys failing for no clear reason. Sometimes layers would just hang and eventually fail; I'd run the same command an hour or two later without any changes and it would just work as expected.

I'd love to make a monitoring service to deploy a basic app (i.e. run the fly deploy command) every 5 minutes and see how often those deploys fail or hang. I'd guess ~5% inexplicably fail, which is frustrating unless you've got a lot of spare time.
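A rough sketch of that kind of deploy monitor, using only the Python standard library. The function names and the "ok" / "failed" / "hung" labels are illustrative assumptions, and the stand-in command below is a harmless placeholder rather than a real deploy.

```python
import subprocess
import sys
from collections import Counter

# Placeholder command that always succeeds; swap in the real deploy command.
STAND_IN_CMD = [sys.executable, "-c", "raise SystemExit(0)"]


def run_once(cmd: list[str], timeout_s: float) -> str:
    """Run cmd once and classify the result as 'ok', 'failed', or 'hung'."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


def failure_rate(outcomes: list[str]) -> float:
    """Fraction of runs that did not end in 'ok'."""
    if not outcomes:
        return 0.0
    return 1 - Counter(outcomes)["ok"] / len(outcomes)


outcomes = [run_once(STAND_IN_CMD, timeout_s=30) for _ in range(3)]
print(outcomes, failure_rate(outcomes))
```

To approximate the idea above, one would pass `["fly", "deploy"]` as `cmd`, choose a generous timeout, and call `run_once` from a loop or cron job every 5 minutes, logging each outcome with a timestamp.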

Joel_Mckay 6 minutes ago [-]

Doesn't Rust make Firecracker VM more robust, and thus more reliable... =3

LorenzoGood 1 minute ago [-]

What does rust have to do with fly.io?

akshayshah 20 minutes ago [-]

The series of outages early in 2023 also had some Corrosion-related pain: https://community.fly.io/t/reliability-its-not-great/11253

HellsMaddy 1 hour ago [-]

Suspiciously, Turso started having issues around the same time. Their CEO confirmed on Discord it's due to the Fly outage:

> Ok. I caught up with our oncall and this seems related to the Fly.io incident that is reported in our status page. Our login does call things in the Fly.io API

> we are already in touch with Fly and will see if we can speed this up

pier25 9 minutes ago [-]

My apps on Fly have not gone down this time.

arusahni 1 hour ago [-]

Oof, hugops to the team.

redslazer 1 hour ago [-]

fly.io just has the weirdest outages. It has issues so regularly we don't even need to run mock outages to make sure our system failovers work.

duxup 1 hour ago [-]

When I worked for a company that worked with big banks / financial institutions we used to run disaster recovery tests. Effectively a simulated outage where the company would try to run off their backup sites. They ran everything from those sites, it was impressive.

Once in a while we'd have a real outage that matched the test we ran as recently as the weekend before.

I was helping a bank switch over to the DR site(s) one day during such a real outage and I left my mic open when someone asked me what the commotion was on the upper floors of our HQ. I said "super happy fun surprise disaster recovery test for company X".

VP of BIG bank was on the line monitoring and laughed "I'm using that one on the executive call in 15, thanks!" Supposedly it got picked up at the bank internally after the VP made the joke and was an unofficial code for such an outage for a long time.

NetOpWibby 14 minutes ago [-]

Thankfully your comment was positive!

benreesman 58 minutes ago [-]

In fairness to the fly.io folks (who are extremely serious hackers), they’re standing up a whole cloud provider and they’ve priced it attractively and they’re much customer-friendlier than most alternatives.

I don’t envy the difficulty of doing this, but I’m quite confident they’ll iron the bugs out.

redslazer 28 minutes ago [-]

The tech is impressive and the pricing is attractive which is why we use them. I just wish there was less black magic.

E.g. we had an issue last year where about half the machines allocated to us would only sporadically be able to connect to our Neon database. They insisted it was on our side, so we just hot swapped to DO for a couple of months, and went back to fly.io once the issue disappeared.

stevefan1999 1 hour ago [-]

Yep... can confirm my self-hosted Bitwarden there is completely FUBAR connection-wise even though it is in EA, so it should be a worldwide outage... lemme guess: some internal tooling error, consensus split brain, or did someone leak BGP routes again?

satoru42 24 minutes ago [-]

Mine is in Asia and it's still accessible.

jasonjayr 1 hour ago [-]

DNS. It's always DNS. /s

jart 1 hour ago [-]

https://github.com/jart/cosmopolitan/blob/master/third_party...

monkaiju 1 hour ago [-]

Might be! Shameless plug of a DNS tool I wrote years ago for anyone this pushes to learn more about DNS

https://dug.unfrl.com/

punkpeye 1 hour ago [-]

It is not reflected in their status page, but fly.io itself is not even loading.

nomilk 22 minutes ago [-]

https://fly.io/ loading for me

duxup 1 hour ago [-]

Confirmation ;)

teaearlgraycold 38 minutes ago [-]

I'm grateful to HN for keeping me well aware of Fly's issues. I'll never use them.

kachapopopow 33 minutes ago [-]

It's still 99.99+% SLA? Would you really pay 100% more for <0.01% more uptime?

runako 4 minutes ago [-]

No dog in this fight, all props to the Fly.io team for having the gumption to do what they are doing, I genuinely hope they are successful...

> It's still 99.99+% SLA

But this is simply not accurate. 99.99% uptime is < 52m 9.8s annually of downtime. They apparently blew well through that today. Looks like they essentially used up the equivalent of 4 years of 99.99% downtime budget this evening.

Four nines is so unforgiving that it's almost the case that if people are required to be in the loop at any point during an incident, you will blow the fourth nine for the whole year in a single incident.

Again, I know it's hard. I would not want to be in the space. That fourth nine is really difficult to earn.

In the meanwhile, <hugops> to the Fly team as they work to resolve this (and hopefully get some rest).
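The downtime budget arithmetic can be checked in a few lines. This is a sketch that assumes a 365-day year, so its result differs slightly from the figure quoted above depending on the year length a calculator uses. The 3.5-hour outage is a hypothetical input, not a measured duration of this incident.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability_pct: float, days: float = 365.0) -> float:
    """Downtime allowed over `days` while still meeting availability_pct."""
    return (1 - availability_pct / 100) * days * SECONDS_PER_DAY


def years_of_budget_used(outage_seconds: float, availability_pct: float) -> float:
    """How many years' worth of downtime budget a single outage consumes."""
    return outage_seconds / downtime_budget_seconds(availability_pct)


budget = downtime_budget_seconds(99.99)
print(f"99.99% allows about {budget / 60:.1f} minutes of downtime per year")

# Hypothetical outage length, chosen only to illustrate the scale.
hypothetical_outage_s = 3.5 * 3600
used = years_of_budget_used(hypothetical_outage_s, 99.99)
print(f"A 3.5-hour outage uses about {used:.1f} years of that budget")
```

With a budget under an hour per year, any incident that needs a human to be paged, diagnose, and fix is likely to consume most or all of it.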

cj 16 minutes ago [-]

I think what a lot of people fail to understand is that there are certain categories of apps that simply “can never go down”

Examples include basically any PaaS, IaaS, or any company that provides a mission-critical service to another company (B2B SaaS).

If you run a basic B2C CRUD app, maybe it’s not a big deal if your service goes down for 5 minutes. Unfortunately there are quite a few categories of companies where downtime simply isn’t tolerated by customers. (I operate a company with a “zero downtime” expectation from customers - it’s no joke, and I would never use any infrastructure abstraction layer other than AWS, GCP or Azure - preferably AWS us-east-1 because, well, if you know the joke…)

mrcwinn 15 minutes ago [-]

This is not my experience at all, as a former paying customer.

MaxfordAndSons 1 hour ago [-]

Kinda funny that they've named their global state store "Corrosion"... not really a word I'd associate with stability and persistence.

lordofgibbons 56 minutes ago [-]

It's an internal project based on Rust, not a product. So I don't think it matters too much what they name it. It's open source which is great, but still not a product that they need to market.

SOLAR_FIELDS 36 minutes ago [-]

And to be fair, it’s a bit of a cute meme to name rust projects things that relate to it. Oxide, etc

dumah 17 minutes ago [-]

I take your point but corrosion-resistant metals such as Aluminum, Titanium, Weathering Steel and Stainless Steel don’t avoid corrosion entirely but form a thin and extremely stable corrosion layer (under the right conditions).

kermatt 1 hour ago [-]

https://community.fly.io/t/reliability-its-not-great/11253

https://github.com/superfly/corrosion

EGreg 48 minutes ago [-]

What exactly does flyio.net do?

HellsMaddy 32 minutes ago [-]

If you mean specifically flyio.net and not just fly.io the company, I'm guessing they host their status page on a separate domain in case of DNS/registrar issues with their primary domain.

vachina 6 minutes ago [-]

It’s basically what Heroku used to be but with CDN-like presence.

stackghost 28 minutes ago [-]

IIRC their value prop is that they let you rapidly spin up deployments/machines in regions that are closest to your users, the idea being that it will be lower latency and thus better UX.

michaelbuckbee 34 minutes ago [-]

Hosting service that has a lot of interesting distributed features.

eek2121 26 minutes ago [-]

WEB 2.0. SEE. TOLD YA! THEY SHOULDA UPGRADED TO THAT NEWFANGLED 3.0! ;)

shubhamjain 1 hour ago [-]

This is probably the 5th or 6th major outage from Fly.io that I have personally seen. Pretty sure there were many others and some just went unnoticed. I recommended the service to a friend, and within two days he faced two outages.

Fly.io seriously needs to get it together. Why it hasn’t happened yet is a mystery to me. They have a good product but stability needs to be the absolute top priority for a hosting service. Everything else is secondary.

SOLAR_FIELDS 35 minutes ago [-]

I get this but I think if people can give GitHub a pass for shitting the bed every two weeks maybe Fly should get a bit of goodwill here. I am not affiliated with Fly at all but I do think that people should temper their expectations when even a megacorp can’t get it right

I guess the secret is to be the incumbent with no suitable replacement. Then you can be complete garbage in terms of reliability and everyone will just hand wave away your poor ops story

ojame 21 minutes ago [-]

The biggest difference is GitHub in your infrastructure is (nearly always) internal. Fly in your infrastructure is external. Users generally don't see when you have issues with GitHub, but they do generally see when you have issues with Fly.

That's the core difference.

fragmede 5 minutes ago [-]

Who's giving GitHub a pass on shitting the bed? They go down often enough that if you don't have an internal git server setup for your CICD to hit, that's on you.

adityapatadia 1 hour ago [-]

We left it about a year ago due to reliability issues. We now use DigitalOcean apps and they're working like a charm. Zero downtime with DO.

mcqueenjordan 1 hour ago [-]

Reliability is hard when your volume is (presumably) scaling geometrically.

paxys 52 minutes ago [-]

Can't use the "reliability is hard" excuse when you are quite literally in the business of selling reliability.

mcqueenjordan 36 minutes ago [-]

It’s just not that big of a mystery. It’s not an excuse; it’s just true. Also, they’re not especially selling reliability as much as they’re selling small geo-distributed deployments.

ilrwbwrkhv 1 hour ago [-]

Does anyone use them beyond the free tier? Same with Vercel for example.

gk1 1 hour ago [-]

Vercel has revenue of over $100M. So yes at least a few companies use them beyond the free tier.

DataOverload 58 minutes ago [-]

We switched from Fly to CF workers a while ago, and never looked back

punkpeye 26 minutes ago [-]

They are fundamentally different. If Cloudflare provided a way to host Docker containers with volumes though, that would be game over for so many PaaS platforms.

frakkingcylons 23 minutes ago [-]

I switched from apples to oranges and never looked back.

rstupek 25 minutes ago [-]

How are they equivalent?

eek2121 28 minutes ago [-]

congrats on not developing a playbook for the time you have to 'look back'.

Providers will fail. good contingencies won't.

...hears faint sound...I SAID GOOD, QUIET YOU!

mrcwinn 48 minutes ago [-]

I tried Fly early. I was very excited about this service, but I've never had a worse hosting experience. So I left. Coincidentally I tried it again a few days ago. Surely things must be better. Nope. Auth issues in the CLI, frustrations deploying a Docker app to a Fly machine. I wouldn't recommend it to anyone.

veggieWHITES 1 hour ago [-]

I was considering these guys the other day until I saw their pricing page: https://fly.io/pricing/

(There's not a single price on there, why even create the page?)

rascul 1 hour ago [-]

There's a link to what appears to be the actual pricing page https://fly.io/docs/about/pricing/

There's also a link to the pricing calculator https://fly.io/calculator

totetsu 44 minutes ago [-]

Is that calculator hourly or monthly?

schmichael 1 hour ago [-]

The prices are just one click deeper. Hardly a nefarious dark pattern.
