Dear Heroku: Quit blaming all of us when you fail. Do this instead…

Dear Heroku,

We developers think you do a good job keeping our apps running smoothly.

But when you have the inevitable outage from time to time, you need to stop telling our customers it’s our fault.

Currently, when I fail (for example, with a typo in my source code), a visitor to my site sees this:

And currently, when you fail (like this morning), a visitor to my site sees this:

Same message. In other words, when you fail, you’re telling the world it is your customer’s fault.

You apparently assume it has to be this way because many of your platform errors are indistinguishable from actual app errors, so you have no way to differentiate which error message to display.

But that is NOT true. Here’s your fix.

AFTER you have confirmed you are having trouble and posted that on your status screen, MANUALLY flip a switch that redirects the default “app error” screen (which you host on AWS) to a different file:

Then, once you have things back to normal, flip the switch back to your normal app error message. (Of course, flipping the which-app-error-screen-do-we-show switch doesn’t have to be manual; it could happen automatically whenever your internal alarms are triggered. My point is that even doing it manually is really simple and ‘good enuff’.)
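To make the proposal concrete, here is a minimal Python sketch of the switch. It assumes that whatever serves the default error page looks up a page URL each time it returns a generic error. `ErrorPageSwitch`, its methods, and both URLs are hypothetical illustrations, not Heroku's actual router or configuration. The optional H99 check borrows a commenter's suggestion from the thread (H99 is the code Heroku uses for platform errors).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical URLs: the post only says the default error page is hosted on AWS.
DEFAULT_APP_ERROR_PAGE = "https://error-pages.example.com/application-error.html"
PLATFORM_INCIDENT_PAGE = "https://error-pages.example.com/platform-incident.html"


@dataclass
class ErrorPageSwitch:
    """Sketch of the which-app-error-screen-do-we-show switch (off by default)."""

    platform_incident: bool = False

    def flip_on(self) -> None:
        # Flipped by an operator once the incident is posted on the status page,
        # or automatically when an internal alarm fires.
        self.platform_incident = True

    def flip_off(self) -> None:
        # Flipped back once the platform is healthy again.
        self.platform_incident = False

    def error_page(self, error_code: Optional[str] = None) -> str:
        # Optional refinement (hypothetical): a code that already identifies a
        # platform failure, such as H99, always gets the incident page.
        if error_code == "H99":
            return PLATFORM_INCIDENT_PAGE
        # While the switch is on, every generic error points at the incident page,
        # even for apps that really did break (the acceptable worst case).
        if self.platform_incident:
            return PLATFORM_INCIDENT_PAGE
        return DEFAULT_APP_ERROR_PAGE


if __name__ == "__main__":
    switch = ErrorPageSwitch()
    print(switch.error_page("H10"))  # default application-error page
    switch.flip_on()
    print(switch.error_page("H10"))  # platform-incident page
    switch.flip_off()
```

The design choice is deliberately blunt: while the flag is on, it overrides everything, so a genuinely broken app also gets the incident page. That is exactly the worst case the proposal accepts in exchange for simplicity.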

The worst-case scenario is that an app with an actual app error shows a message suggesting it might not be the app’s fault. But while your underlying platform is unstable, so what?

It’s simple. It’s the right thing to do.

Other than that, keep up the good work.

P.S. Why I love Heroku: To my surprise, this blog post hit the top spot on HN at least briefly. My blog started throwing some app errors. I went to the command line and typed
heroku ps:scale web=10
to throw more dynos at it and was back up. Gotta love that.

This entry was posted in Business.

37 Responses to Dear Heroku: Quit blaming all of us when you fail. Do this instead…

  1. daudi says:

    Agree with Pardner: my users do need to know that the platform is temporarily down and they should check back in xx minutes. Totally different than an actual application error. When the PG&E line transformer blows up, the lights are out for a couple of hours until it’s fixed. Stuff happens; non-techie users can understand that and adjust. Heroku is great for what Heroku is built for. It doesn’t claim to be a 5 nines operation.

  2. Nicolas says:

    You care whether the problem comes from Heroku or your application. But your users don’t care. They care that your service is down.

    That’s the point. The rest is just technical details.

    So yes, you need a means to check Heroku’s status to see what the problem is (so you know whether you need to try to fix something or just wait).

    I think with some clever JS you could even check whether it is Heroku’s fault or not and display that to users if you really care. I’d prefer that Heroku take time to improve its uptime rather than its error page.

    What’s more, I don’t want Heroku to change or tweak my error page; I want it to be mine, with all the design and relevant information (like a phone number, if applicable, or whatever).

    • Pardner says:

      In the case of a Heroku platform failure you don’t get to specify a custom error page – Heroku controls the error page in that case. So the issue is pretty simple – do you want them to display an error message that says YOUR app is THE problem, or do you want them to display a message that the problem is upstream?

  3. Lewis says:

    The architecture of your application is your decision.
    Whether or not you pass the buck back up the line for a failure, it is still a failure of the application you are responsible for.

    • Pardner says:

      An incorrect error message should be fixed, period. The current Heroku error message says an application error has occurred even when it is a platform error. I suggest one way they can make their error messages more accurate. You say “pass the buck” I say “tell the truth”.

  4. I love Heroku, but yesterday was a difficult day. It wasn’t until after complaining on Twitter that I found out about the @herokustatus account. I get that Heroku has a way of doing things where they communicate status updates through that account and not @heroku, but in a serious outage like this there should have been a post on the regular @heroku Twitter account asking people to follow @herokustatus. Over 15k people follow @heroku; fewer than 4k follow @herokustatus. It is easy to feel like Heroku is not communicating with anyone if you don’t know that other account exists.

  5. Robert Ross says:

    It is the application provider’s responsibility to determine what should be shown. The real solution is for Heroku to give us the option to serve our own page for their errors. Give us the option to pick our error page.

  6. Customers don’t really care whose fault it is. It either works or it doesn’t; the rest is irrelevant.

    And if you dig a little deeper – what should Heroku do if this is the fault of someone they depend on? Would you tolerate an error message like this: “This application is hosted at Heroku, which is experiencing technical difficulties due to temporary Amazon EC2 unavailability related to the recent tornado in Northern Virginia”?

    • Pardner says:

      Wouldn’t bother me one bit if they provided a link to their status page so my users could see what the prognosis is.

  7. Michael Y says:

    In my opinion, your perception here is wrong. It is always your (our) fault when the app breaks.

    When your site fails to work the customer doesn’t care about whether it’s hosted on provider A or provider B, or whether it’s using language X or language Y with bindings to runtime Z. It’s 100% your problem that you are using something that sometimes breaks and do nothing about it. Prepare a backup site (in AWS, Rackspace, your PC, whatever) that can handle some of the traffic but still have the service online.

    If you are using libfoo in your app’s code, and something breaks because of a bug in it, will you display a message saying “Oh, it looks like libfoo is broken, try again after foocorp releases a patch”?
    No, you obviously won’t. You’ll fix it somehow.

    Whether it’s the code, the hardware or the platform that fails, it doesn’t matter. It’s your site and your responsibility. The customer will always complain to you.

    Obviously, the world is not perfect, and using a single platform is a great idea (with the right SLA), but _you_ chose Heroku, not the customer. So you ought to give the answers, not Heroku.

  8. Tyler Johnson says:

    I agree it would be nice to have a slightly different error message, but I think the only person that it will cheer up is the website admin. Everyone else will still be wasting time waiting for a page to load, trying to do something (hopefully) productive on your website, and not getting it. As Dave said, none of your customers know or care who Heroku is, they just know your website isn’t working.

    That said, should they even really have to “flip a switch”, be it manual or not? Can’t they just see that it’s an H99 error before serving the appropriate error page?

    • Pardner says:

      Might depend on the customer. But my main point to Heroku is: this is a free solution, so it does not need to be perfect to be an improvement over the status quo. If a service I use is down because of a widespread hosting issue, I cut THEM more slack than if it’s down because they pushed a bad build. Yeah, it’s down either way, and yeah, they chose the hosting provider, but if they made a reasonable hosting choice (Heroku vs El-cheapo Host Company), yeah, I’ll cut them some extra slack. So, IMO, Heroku ought to ‘own’ the app errors when it’s a platform issue and ten thousand sites stop working.

      • Ben says:

        Heroku is not a free service. We are paying them upwards of a thousand dollars a month at our startup. It’s unacceptable to have outages of 2 hours on a production system with no fail over system, regardless of the cause.

        It’s only free if you are running a small scale app with no extra resources.

        • Pardner says:

          We run quite a few sites, and all the production sites run multiple dynos and workers, so it’s far from free… and the cost for us to establish the same 24×7 reliability as Heroku would be prohibitive. So while I gnash my teeth when things get flaky for 30+ minutes like today, it hasn’t been an issue with our current customer base as yet. (On the other hand, at one of my prior gigs we ran everything ourselves because we needed two more 9s of reliability than Heroku provides.) So it’s always a tradeoff.

        • Terry says:

          I can’t express how much I second this comment.

          I’m so tired of hearing about Heroku getting slack because their service is free — it is not! It sure as hell isn’t for my company. Not only do we pay for dynos, workers, but we also pay for Heroku’s affiliates. We pay, they provide. ish.

          Two days ago Postgres had a credential malfunction, leaving me powerless, without a database, with my only recourse being a support ticket, unresolved for nearly 30 hours. And my customers/employer thought it was my fault.

          Unacceptable. And paying the $500 monthly support fee for 2-hour response times is also completely crap.

          Pull it together, Heroku. Segregate your customers, I don’t give a shit, just give me what I pay for!

  9. Eric Boyer says:

    Wow, you’re paying the price of 10 dynos to host a WordPress blog?

    • Pardner says:

      And THAT’s why I love Heroku. I’m paying for 10 dynos (total: 45 cents/hour) but ONLY while the traffic is ridiculous, then I’ll dial it right back down to normal. Say it’s 4 hours… it costs me $2 to handle the spike instead of crashing.

      • Emil Hajric says:

        Why not just use WPEngine?

        • Pardner says:

          Because to me it’s not worth paying $99/month to be able to handle 100k visits/month. Setting up WP on Heroku was trivial, and free, except when I crank up the dynos to handle brief spikes like today.

  10. Johnathan says:

    It’s your fault for not having a fault tolerant site that runs on another service provider. This is what happens when you put your eggs in one basket and that basket bursts into flames.

    If reliability is so important, make it a priority instead of just expecting stuff to work or for a more politically correct error message — which leads me to my next point: who cares about the ERROR message?

    • Pardner says:

      I DO have a fault tolerant site – Heroku. It’s not perfect, but it “takes a licking and keeps ticking” the vast majority of the time. Nor do I expect it to be perfect. For most of my web apps, the reliability level has been adequate and cost effective.

  11. sten frigs says:

    Why do web people insist on putting crap like “oops” in error messages? It’s not cute, it’s irritating and unprofessional. If a word in your error message doesn’t directly convey information, get it the hell off my screen.

    • Pardner says:

      Guilty as charged. You would have *really* hated the Haiku version of the error message I almost used. ;)

    • Alan Hogan says:

      Agreed so hard, Sten. It’s lazy, unprofessional, and perhaps even flippant. If your users care about your app and its availability, take uptime and stability seriously — and reflect that in your messaging.

    • Patrick says:

      However, “Application Error” is no more informational than “Oops” and just makes the error feel enterprise-y and not all that personal, which isn’t a good feeling. There’s no harm in shared surprise in these cases.

    • Florian says:

      Unforeseen errors can’t really be indicated in an informative way, so I believe you’re asking for this:

      Error! [retry]

    • Yurij says:

      No. Keep it informal. It’s a human mistake tripulated by machines. An “Application Error” implies something intangible and rather sterile. Something sterile I can easily despise, judge and neglect. A human is far harder to treat that way. However, “Ooops!” could still be improved upon.

    • Jeremy Russell says:

      In all my conversations with the many different people whose computers I’ve maintained, I haven’t met a single one who would prefer technical jargon over an “Oops” any day.

      There’s also this point to make.

      As a developer, let’s say I prefer happy-go-lucky, laid-back customers. Ones that don’t want technical when they could get simple. Well, how would I get those customers to come and stick around while keeping less desirable customers away? You portray yourself (the company) as an entity that is laid back and prefers the simple over the overly technical.

      TL;DR – People have so many different opinions on that; it comes down to whatever image the people who made it want to project.

  12. Robert Coker says:

    Heroku supports custom error pages. It’s up to you to create them.

    https://devcenter.heroku.com/articles/error-pages

    • Pardner says:

      Thanks, yes, but when the platform itself fails it seems that doesn’t always work. Hence my suggestion they temporarily change the default error message.

  13. Kent Fenwick says:

    Well said.

    This seems like a simple and effective fix for a common problem we run into each time this happens.

    Heroku take notice!

    • Pardner says:

      You’re right, and that seems to address the issue when the app actually has an error. Although I did read a comment on HN that sometimes (today) the app-specific custom error pages don’t work when the platform itself fails.

  14. Dave says:

    In the customer’s eyes, it’s your fault the application is failing and they probably don’t know or care who heroku is. It IS your fault, your fault for depending on Heroku, your fault for not building redundancy. Dear Customer, please stop blaming Heroku for your infrastructure decisions.

    • Pardner says:

      I wish building additional redundancy behind a Heroku app was simpler. As is, Heroku seems extremely cost-effective to me. But when there is a failure at their end, the error message they throw should reflect that.

    • Pardner says:

      Depends on the app, right? Sometimes 3 nines of reliability is okay, sometimes it’s not. You pay through the nose for each additional nine, so there’s a tradeoff to make. But *whatever* you do, there are going to be outages. And when they do occur, IMO, it is fair and reasonable for a hosting platform to distinguish between the cases where it fails and those where the app itself fails.