
So what on earth did I do?


Comments

  • Registered Users Posts: 14,009 ✭✭✭✭wnolan1992


    I tried and failed to get #nowyerecravin trending. :(


  • Closed Accounts Posts: 5,628 ✭✭✭Femme_Fatale


    Right.
    Yeh it is. He gave a very detailed explanation for what happened, which is pretty laudable.
    Sheesh, if he didn't go into detail, no doubt you'd be whingeing about lack of transparency.

    And as someone who has posted a LOT here over the past several years - too much! - I'm at a loss to understand why people would be getting snarky with Boards on Twitter over the outage for a few hours of what isn't actually an essential service, particularly on a Sunday. It's amazing how people take out their annoyance over I.T. issues on people who they know can't do anything about it.


  • Posts: 31,118 ✭✭✭✭ [Deleted User]


    40 million new posts in one of the airsoft forums, that would have been great craic for anyone following that forum with instant email notifications set up. :D
    GAAman wrote: »
    Please PLEASE tell me this happened!!!

    Perhaps someone should post the question in the relevant forum.

    Should soon flush out anyone with a blocked mailbox ;)


  • Registered Users Posts: 41,067 ✭✭✭✭Annasopra


    Is there an issue now with images on the site? I can't see any images in posts!

    It was so much easier to blame it on Them. It was bleakly depressing to think that They were Us. If it was Them, then nothing was anyone's fault. If it was us, what did that make Me? After all, I'm one of Us. I must be. I've certainly never thought of myself as one of Them. No one ever thinks of themselves as one of Them. We're always one of Us. It's Them that do the bad things.

    Terry Pratchett



  • Posts: 31,118 ✭✭✭✭ [Deleted User]


    I just asked this in the AH thread.
    OK DEV slipped up and put the entire 40 million posts in one of the airsoft forums.

    So who had instant email notification set up for new posts?

    Just imagine logging on and seeing "40,000,000 new messages!"

    I doubt that I'll get a serious answer.. :cool:


  • Closed Accounts Posts: 6,925 ✭✭✭RainyDay


    Dav wrote: »
    I was not as careful with the vBulletin tools that manage this as I should have been and I accidentally copied every post from the site into one of the Airsoft forums instead of just the posts from one of the forums that was getting merged.

    As you might imagine, moving 40+ million posts caused something of a catastrophic problem for the site.
    Benny_Cake wrote: »
    I think everyone who works in IT has had at least one horrible moment when you realise that you've done something incredibly f**ked up. The time I ran a test script in our live environment rather than our test environment springs to mind!

    Dav wrote: »
    It's worth pointing out some of the technical reasons (and I'm trying not to be very technical in my explanations mostly cause I'm not fully versed in them myself) why it takes 8 hours to put things back in place. You can't simply "un-do" a mass-move of threads in VBulletin. Even if I had direct database access and could remember how to MySQL properly (and I used to be reasonably competent with it 10+ years ago), I couldn't have rolled this error back. It's yet another on the giant list of reasons why we're breaking away from VBulletin.

    Just a constructive suggestion, and apologies if I'm teaching my granny how to suck eggs - most web environments would have separate development, test and live environments to avoid this kind of problem.

    The kind of changes that were being done to the Airsoft forums should ideally have been done in a development environment, and then once tested and confirmed, migrated to the live environment. The discipline and segregation of duties involved in this kind of situation generally prevents this kind of minor error having a catastrophic effect on a live system.

    Perhaps it is a limitation of vBulletin that prevents this kind of structure. If you're moving to a new architecture, you might like to get this kind of structure as part of the new architecture.


  • Registered Users Posts: 7,819 ✭✭✭fussyonion


    I'd give ya a cuddle, Dav!

    Not a lick, I pwomise. I mean, I'M not a lick...and also, I won't lick you.
    Right, I've dug myself a hole now..I'll shut up.


  • Registered Users Posts: 66 ✭✭Guest0000


    Couldn't you just have switched it off at the wall, and waited a few minutes, before turning back on..:)


  • Registered Users Posts: 2,045 ✭✭✭OzCam


    There are two kinds of computer users: those who have lost data, and those who will.

    That's why God gave us backups. And why you always do stuff like this on a Sunday (of a Bank Holiday weekend if possible).

    Well recovered, and well communicated guys.


  • Closed Accounts Posts: 31,967 ✭✭✭✭Sarky


    I've waited nearly 30 years for this video to become relevant:



  • Registered Users Posts: 10,658 ✭✭✭✭The Sweeper


    /spits tea down screen


  • Registered Users Posts: 10,981 ✭✭✭✭dulpit


    Ultimately this is a great lesson in how to customer service. Boards went down, we were informed via twitter (where else?) and got a prompt and detailed update once it was back online.

    Some larger and (I assume) far more profitable companies should take note.

    Hats off to boards, nice one.

    (I still think a forfeit for Dav is needed, maybe in the form of a video AMA?)


  • Registered Users Posts: 7,934 ✭✭✭Renegade Mechanic


    Hula dance. With the grass skirt and coconut bra :D


  • Registered Users Posts: 4,767 ✭✭✭cython


    RainyDay wrote: »
    Just a constructive suggestion, and apologies if I'm teaching my granny how to suck eggs - most web environments would have separate development, test and live environments to avoid this kind of problem.

    The kind of changes that were being done to the Airsoft forums should ideally have been done in a development environment, and then once tested and confirmed, migrated to the live environment. The discipline and segregation of duties involved in this kind of situation generally prevents this kind of minor error having a catastrophic effect on a live system.

    Perhaps it is a limitation of vBulletin that prevents this kind of structure. If you're moving to a new architecture, you might like to get this kind of structure as part of the new architecture.

    I doubt that this is necessarily a limitation of vBulletin specifically, and sounds more like human error, i.e. Dav clicked a wrong button somewhere in a UI that he has likely used countless times before. While you can use a test/dev environment to "rehearse" this operation, this does not completely preclude the possibility of an error being made on the production run, as ultimately the "migration" of such changes is simply manually carrying out the same set of steps.

    Now maybe I also have the wrong end of the stick here, but unless the changes were being carried out by means of a SQL script of DML updates, then the "migration" of changes from a dev to prod environment is not quite as straightforward as suggested, and could still see the same thing result.


  • Registered Users Posts: 2,094 ✭✭✭Liamario


    NEVER go full retard...


  • Registered Users Posts: 619 ✭✭✭white_westie


    Dav

    I call this the 'o f**k it' command - you realise you should not do it just as you are about to press the enter key.

    Anyway, all real sys admins admit their mistakes, and luckily enough you only ever make a few of them throughout your career.

    Hope it didn't ruin your day

    WW


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    cython wrote: »
    I doubt that this is necessarily a limitation of vBulletin specifically, and sounds more like human error, i.e. Dav clicked a wrong button somewhere in a UI that he has likely used countless times before. While you can use a test/dev environment to "rehearse" this operation, this does not completely preclude the possibility of an error being made on the production run, as ultimately the "migration" of such changes is simply manually carrying out the same set of steps.
    This. I can visualise exactly the interface that Dav had to use for this, and it's a whole pile of boxes where you input values, and if you don't input a value, vBulletin interprets that to mean "all" instead of "none".

    So even if you're careful, you can still make a mistake like this by overlooking one box.

    Add on top of that, the maintenance page is very much a traditional web interface where you click the button and then control is removed from you. The web server goes off and does what it's told, and short of logging into the box and killing the process, there is nothing you can do to stop it.
    No progress bars, no "Cancel" button, you're just left watching a "Page loading" icon while the realisation of what you've done dawns on you as you review the inputs you've entered. They sit there taunting you, you could change them, but it will have no effect now. They're just echoes of the damage which has already been done as soon as you released that mouse button.

    I'm sure removing/upgrading those interfaces is a story in a very long backlog.
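
    The "blank means all" behaviour described above can be contrasted with a stricter default. What follows is only a minimal sketch, not vBulletin's actual code: the `Thread` class, `select_threads_to_move` and `move_threads` are hypothetical names invented for illustration. The idea is simply that an empty source filter is refused instead of being read as "every forum".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    # Hypothetical, simplified stand-in for a forum thread record.
    thread_id: int
    forum_id: int


class EmptyFilterError(ValueError):
    """Raised when a bulk operation is requested without an explicit source."""


def select_threads_to_move(threads: List[Thread],
                           source_forum_id: Optional[int]) -> List[Thread]:
    # A blank source is rejected outright instead of being read as "all forums".
    if source_forum_id is None:
        raise EmptyFilterError(
            "no source forum given; refusing to select every thread")
    return [t for t in threads if t.forum_id == source_forum_id]


def move_threads(threads: List[Thread],
                 source_forum_id: Optional[int],
                 target_forum_id: int) -> int:
    # Select first, so a rejected filter leaves every thread untouched.
    selected = select_threads_to_move(threads, source_forum_id)
    for thread in selected:
        thread.forum_id = target_forum_id
    return len(selected)
```

    With this shape, overlooking the source box produces an error message rather than a site-wide move.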


  • Closed Accounts Posts: 6,925 ✭✭✭RainyDay


    cython wrote: »
    I doubt that this is necessarily a limitation of vBulletin specifically, and sounds more like human error, i.e. Dav clicked a wrong button somewhere in a UI that he has likely used countless times before. While you can use a test/dev environment to "rehearse" this operation, this does not completely preclude the possibility of an error being made on the production run, as ultimately the "migration" of such changes is simply manually carrying out the same set of steps.

    Now maybe I also have the wrong end of the stick here, but unless the changes were being carried out by means of a SQL script of DML updates, then the "migration" of changes from a dev to prod environment is not quite as straightforward as suggested, and could still see the same thing result.
    seamus wrote: »
    This. I can visualise exactly the interface that Dav had to use for this, and it's a whole pile of boxes where you input values, and if you don't input a value, vBulletin interprets that to mean "all" instead of "none".

    So even if you're careful, you can still make a mistake like this by overlooking one box.

    Add on top of that, the maintenance page is very much a traditional web interface where you click the button and then control is removed from you. The web server goes off and does what it's told, and short of logging into the box and killing the process, there is nothing you can do to stop it.
    No progress bars, no "Cancel" button, you're just left watching a "Page loading" icon while the realisation of what you've done dawns on you as you review the inputs you've entered. They sit there taunting you, you could change them, but it will have no effect now. They're just echoes of the damage which has already been done as soon as you released that mouse button.

    I'm sure removing/upgrading those interfaces is a story in a very long backlog.

    I don't know much about vBulletin, but normally, moving from dev to test and test to live is more than just repeating the same set of commands manually. It will often involve moving updated code and config files, rather than repeating the same commands manually.

    Even in worst case scenario, the requirement to rehearse the change in dev and test first reduces the chances of a simple error. Involving other people as the change moves from dev to test reduces the chances of a simple error.


  • Registered Users Posts: 4,767 ✭✭✭cython


    RainyDay wrote: »
    I don't know much about vBulletin, but normally, moving from dev to test and test to live is more than just repeating the same set of commands manually. It will often involve moving updated code and config files, rather than repeating the same commands manually.
    Indeed it does, but nowhere did Dav mention that this was a deployment, nor that there were any config file changes or a release, or anything similar. In fact, he stated that this was done with a tool in vBulletin, and as such it could be at the most rehearsed, as you said, on another environment, but the same thing could still have happened. It's worth noting that this is nothing specific to vBulletin, and would be applicable to any software that allows bulk updates of any sort through the UI.
    Dav wrote: »
    There was some maintenance to be done - the Airsoft forums are getting a bit of a tidy up, so I started on that.

    I was not as careful with the vBulletin tools that manage this as I should have been and I accidentally copied every post from the site into one of the Airsoft forums instead of just the posts from one of the forums that was getting merged.

    As you might imagine, moving 40+ million posts caused something of a catastrophic problem for the site.

    RainyDay wrote: »
    Even in worst case scenario, the requirement to rehearse the change in dev and test first reduces the chances of a simple error. Involving other people as the change moves from dev to test reduces the chances of a simple error.

    I agree with the above for the most part, but in this instance, the only way that this might have been avoided would have been by means of the four eyes principle, as even if rehearsed in another environment, the same mistake could have been made if the prod run was done single-handedly. Similar mistakes have been made in plenty of systems in the past, and will continue to be made. Such is life.
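
    For anyone unfamiliar with the four eyes principle mentioned above, here is a rough sketch of the idea, under the assumption that a bulk change is represented as an object that a second person must sign off before it can run. `BulkChange`, `approve` and `run` are hypothetical names, not part of vBulletin or any real admin tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BulkChange:
    # Hypothetical record of a pending bulk operation.
    description: str
    requested_by: str
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        # The second pair of eyes must belong to someone else.
        if reviewer == self.requested_by:
            raise PermissionError(
                "the requester cannot approve their own change")
        self.approved_by = reviewer

    def run(self, action: Callable[[], object]) -> object:
        # Nothing executes until a different person has approved it.
        if self.approved_by is None:
            raise PermissionError(
                "change has not been approved by a second person")
        return action()
```

    The check does not stop two people both overlooking the same box, but it does mean a single slip on a production run is no longer enough.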


  • Closed Accounts Posts: 6,925 ✭✭✭RainyDay


    cython wrote: »

    I agree with the above for the most part, but in this instance, the only way that this might have been avoided would have been by means of the four eyes principle, as even if rehearsed in another environment, the same mistake could have been made if the prod run was done single-handedly. Similar mistakes have been made in plenty of systems in the past, and will continue to be made. Such is life.

    Yes indeed, mistakes will continue to be made. In fact, the same mistake will continue to be made, unless somebody tries looking at what caused the mistake and changing things so that it doesn't happen again.


  • Registered Users Posts: 6,783 ✭✭✭knucklehead6


    RainyDay wrote: »
    Yes indeed, mistakes will continue to be made. In fact, the same mistake will continue to be made, unless somebody tries looking at what caused the mistake and changing things so that it doesn't happen again.


    which is why boards are moving away from vbull


  • Administrators, Entertainment Moderators, Social & Fun Moderators, Society & Culture Moderators Posts: 18,727 Admin ✭✭✭✭✭hullaballoo


    Just for the sake of clarity, the interface Dav was likely using looks like the image below. It's not the case that it involved any coding or db queries as such. It's literally just select from drop-down menus, radio buttons and check-boxes. Occasionally, the interface will ask for a value, e.g., the forum number (Feedback is f=82, so the value will be 82).

    Unfortunately, vbulletin often assumes "all" instead of "none" where a value is left blank, a box unticked etc. It's literally the easiest thing in the world to make this sort of mistake given the below interface. It has about the same probability as making a typo in a long post.

    Pic:
    Bdr09200.png


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    RainyDay wrote: »
    Even in worst case scenario, the requirement to rehearse the change in dev and test first reduces the chances of a simple error. Involving other people as the change moves from dev to test reduces the chances of a simple error.
    It's not a config change though. It's routine administration. Aside from perhaps some form of system which has massive HA and audit requirements, there isn't a system in the world where people rehearse day-to-day administrative activities in a non-production environment. It would be like having bank tellers lodge every cheque in a pre-prod environment before actually doing it for real.

    You're right, it would probably have avoided this. But it would be a cumbersome pointless exercise for the other 99.9% of tasks. A decent system wouldn't allow this kind of unfettered change in the first place, which is why vB is being ditched.
    RainyDay wrote: »
    Yes indeed, mistakes will continue to be made. In fact, the same mistake will continue to be made, unless somebody tries looking at what caused the mistake and changing things so that it doesn't happen again.
    I bet you're the first person to suggest that.

    Yes, that's sarcasm. The guys in the office are professional developers and overall very smart people. I'm pretty sure the very first thing they said after it was all fixed was, "How do we stop this happening again?".


  • Closed Accounts Posts: 3,407 ✭✭✭lkionm


    Woooop computers.


  • Closed Accounts Posts: 5,797 ✭✭✭KyussBishop


    Out of curiosity, what forum software is being looked at other than VBulletin?

    I would imagine most other software out there would have its own different (but no less significant) set of drawbacks, and I can't (offhand) think of any other forum software that is comparable to VBulletin for user experience (so that kind of a change, might have a big negative impact, greater than any temporary outage).


  • Registered Users Posts: 26,578 ✭✭✭✭Creamy Goodness


    People are actually questioning how this was an honest mistake and one that vbulletin's UX/UI didn't help with. Woah.

    The simple fact that vbulletin has a "copy all posts to this forum" option just shows you it wasn't built to handle 40 million posts!


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Out of curiosity, what forum software is being looked at other than VBulletin?
    They're building their own software rather than relying on generic software.


  • Closed Accounts Posts: 6,925 ✭✭✭RainyDay


    seamus wrote: »
    It's not a config change though. It's routine administration. Aside from perhaps some form of system which has massive HA and audit requirements, there isn't a system in the world where people rehearse day-to-day administrative activities in a non-production environment. It would be like having bank tellers lodge every cheque in a pre-prod environment before actually doing it for real.
    It's by no means clear to me how routine or otherwise this operation was, based on the details provided. Yes, there is a point about the effort that goes into admin activities, which has to be balanced by the time and effort that goes into recovering from screw-ups. If a routine activity has the potential to bring the site down for half-a-day, I'd respectfully suggest that it is worthwhile building appropriate procedures around such activities to limit the risk.
    seamus wrote: »
    I bet you're the first person to suggest that.

    Yes, that's sarcasm. The guys in the office are professional developers and overall very smart people. I'm pretty sure the very first thing they said after it was all fixed was, "How do we stop this happening again?".
    I went through the thread before I posted, assuming that somebody would have posted something about how to stop similar problems happening in future. There was no sign of any intention from the boards folks to take steps to address the root cause, and there was no sign of any suggestions from other posters as to how this might be achieved.

    So I made a constructive suggestion. I really couldn't give two hoots as to whether it is taken on board or not. I'd have thought that if Boards.ie wants to be seen as a professional operation, it would be a fairly obvious and important requirement that somebody explains what steps have been taken to avoid recurrence. But it's really not that big a deal for me personally, so I'll leave the discussion there.


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    Here's a screen shot of what I was looking at.

    287716.png

    You get to that page from a "Mass-Move Threads" link that appears to Admins on every forum. My error was wonderfully simple - I forgot that VBulletin was written and built by software people and has probably never had a UI/UX person ever look at it :)

    When you click into that page from a forum, it doesn't populate the listbox on the bottom of the page with that forum's name as I expected it to (and I don't think it an unreasonable expectation). It's set to "All Forums" by default and that's what got clicked.

    A test environment might have helped, or it might not. I do dozens of changes to the site every week, many of which have the facility to kill the site and all on the live environment (but I'll run it through a test machine internally if it's a procedure I've never done before so that I will know how it works and the tech team will have an indicator as to what the impact of the task is on the servers etc). Normally I'm careful about how I do it and tech team will be keeping an eye so they can monitor the servers for any unwanted fruitiness.

    The simple fact of the matter is I wasn't careful enough on this occasion. The process has quite obviously been reviewed internally (I didn't explicitly state that because I thought that the fact that we'd carry out a post mortem of the entire incident would have been self evident). Rainy Day, you have indeed made plenty of solid and helpful suggestions, but with respect, I would say that you do us a dis-service to suggest that there was no intention to take steps to prevent such things again just because we didn't explicitly post about them here on this thread. But there's no harm done in your suggestions and certainly no offence taken on my part.


  • Subscribers Posts: 19,425 ✭✭✭✭Oryx


    Wonderfully simple is right. For something that can cause such mayhem, it needs an
    "ARE YOU SURE? ARE YOU REALLY SURE??? FOR DEFINITE LIKE? WE'RE NOT MESSING!"
    button.
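
    A confirmation step along those lines might look something like the sketch below. It is an illustration only: `confirm_bulk_move` and the `threshold` of 1000 are assumptions made up for the example, not anything vBulletin provides. The operator has to type back the number of affected threads before a large move goes ahead, which forces a look at the scale of the change.

```python
def confirm_bulk_move(affected_count: int,
                      typed_confirmation: str,
                      threshold: int = 1000) -> bool:
    """Return True only if the move is small or the operator confirmed its size."""
    # Small moves go through without an extra prompt.
    if affected_count <= threshold:
        return True
    # Large moves require the operator to type back the exact count.
    return typed_confirmation.strip() == str(affected_count)
```

    Typing "40000000" into a box is a lot harder to do by accident than leaving a listbox on its default.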

