So what happened this time?

  • 14-01-2014 6:03pm
    #1
    Closed Accounts Posts: 8,840 ✭✭✭


    You go all year with no outages and then 2 come along within 10 days.

    So what happened today?

    First of all, it wasn't me! :D

    One of the database slaves had a major failure with its hard disks. Before anyone asks, yes they were in RAID (1 to be exact), but both disks failed. It's rare that your redundancy fails at the same time as the main device, but not unheard of.

    We all probably noticed the slowdown around lunchtime as a knock-on effect of the disks dying. At 14:30 one of our sys admins was in his car on the way out to Digiweb to see what was wrong. At 15:00, we took the site offline as it was clear that we were risking data loss and corruption if we continued to let the database machines get out of sync.

    After a quick investigation, we realised that if we didn't allow posting, people could still browse Boards, so we got busy making the site Read Only and set that live whilst Chris went about rebuilding the server with new hard disks.
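
    For anyone curious what "Read Only" means in practice, here is a minimal sketch of the idea. It is not the actual Boards code; the Forum class, the read_only flag and the error message are all made up for illustration. Reads carry on as normal while writes are refused up front.

```python
class ReadOnlyError(Exception):
    """Raised when a write is attempted while the site is read-only."""


class Forum:
    def __init__(self):
        self.read_only = False  # hypothetical site-wide switch
        self.posts = []

    def read_thread(self):
        # Reads are always allowed, even while the switch is on.
        return list(self.posts)

    def add_post(self, author, text):
        # Writes are refused while the flag is set, so nothing new has
        # to be replicated between the database machines.
        if self.read_only:
            raise ReadOnlyError("Posting is disabled while we fix a problem.")
        self.posts.append((author, text))


forum = Forum()
forum.add_post("Dav", "So what happened today?")

forum.read_only = True
try:
    forum.add_post("Rovi", "Can I reply?")
except ReadOnlyError as err:
    message = str(err)

visible_posts = forum.read_thread()
```

    The existing post stays readable and the attempted reply is rejected with a message instead of being half-written.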

    So here we are at 5pm and work is ongoing. You can read this, but can't reply yet, but we wanted to let you know what's going on with as much detail as possible.


Comments

  • Registered Users, Registered Users 2 Posts: 22,799 ✭✭✭✭The Hill Billy


    Is a Thank not a write?


  • Closed Accounts Posts: 8,570 ✭✭✭Rovi


    I've found that non-mods can still send PMs :D


  • Registered Users, Registered Users 2 Posts: 26,090 ✭✭✭✭Mrs OBumble


    Do threads all look locked to ordinary users?

    I've had a reported post saying "Why is this thread locked???can you re open it please this applies to me yet again"

    Due to the no-text-only-emails "feature" (grr), I cannot go directly to the thread that's referred to. But am wondering if that's what really happened ... and if so, cue lots of reported posts :-)


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    They're not locked, the person will get a message telling them they can't post because of the problem if they try.

    I've had to close some forums manually because they have weird permissions, so that will say "forum is closed for posting" but that'll all be switched back when we're live again.


  • Closed Accounts Posts: 33,733 ✭✭✭✭Myrddin


    Dav wrote: »
    One of the database slaves had a major failure

    This is why slavery has largely been abolished throughout the world, it's just not damned reliable enough. I recommend getting in some paid workers, yes it costs in the long run, but it balances out with reliability...


  • Closed Accounts Posts: 9,330 ✭✭✭Gran Hermano


    How old was the database server/drives?

    Could one say they were twelve years a slave?


  • Registered Users, Registered Users 2 Posts: 8,584 ✭✭✭TouchingVirus


    How old was the database server/drives?

    Could one say they were twelve years a slave?

    Boooooo :pac:
    Is a Thank not a write?

    My guess is that a thank is a write, but writes aren't really a problem... on a small scale. The problem is that with a slave out of action you'd massively load the other slave(s?), leading to slowdowns and strange "missing" posts, where you post in a thread but your post isn't there when it's displayed. Cue the posts about the "missing" posts, notifications without there being new posts etc etc.

    MySQL's caching is surprisingly efficient but it does get nuked every time you write, and when Boards is down a slave you really can't afford to nuke the caches for posts/threads that often.
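
    To show what "nuked every time you write" looks like, here is a toy sketch of a cache that throws away every cached result touching a table as soon as that table is written to. It only illustrates the idea; the class, its method names and the query text are invented for the example and are not MySQL's internals.

```python
class TableQueryCache:
    """Toy query cache: results are keyed by query text and dropped
    whenever any table they read from is modified."""

    def __init__(self):
        self._results = {}   # query text -> cached rows
        self._by_table = {}  # table name -> set of query texts
        self.hits = 0
        self.misses = 0

    def get(self, query, tables, run_query):
        if query in self._results:
            self.hits += 1
            return self._results[query]
        self.misses += 1
        rows = run_query()
        self._results[query] = rows
        for table in tables:
            self._by_table.setdefault(table, set()).add(query)
        return rows

    def invalidate(self, table):
        # A single write to the table discards every cached result
        # that read from it, however unrelated the row was.
        for query in self._by_table.pop(table, set()):
            self._results.pop(query, None)


posts = ["first post"]
fetch = lambda: list(posts)
query = "SELECT * FROM post WHERE thread = 1"

cache = TableQueryCache()
cache.get(query, ["post"], fetch)         # miss: goes to the database
cache.get(query, ["post"], fetch)         # hit: served from the cache

posts.append("second post")               # someone posts (or thanks)
cache.invalidate("post")
rows = cache.get(query, ["post"], fetch)  # miss again after the write
```

    Every write sends the next read back to the database, which hurts more when there is one fewer slave to share the reads.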

    All hail the new and improved hamster wheels, whenever they arrive. And fair play for implementing the Read-only site on such short notice. Commiserations to Chris, it has been a rough 10 days for him :p


  • Moderators, Recreation & Hobbies Moderators Posts: 27,654 Mod ✭✭✭✭Posy


    All hail the new and improved hamster wheels, whenever they arrive. And fair play for implementing the Read-only site on such short notice. Commiserations to Chris, it has been a rough 10 days for him :p
    Whatever. This sort of thing never happened back when Danny was here. :rolleyes: :pac:


  • Registered Users, Registered Users 2 Posts: 12,438 ✭✭✭✭El Guapo!


    Is it safe to come out now?

    <_<

    >_>


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    Not quite, but it's getting safer...


  • Registered Users, Registered Users 2 Posts: 44,080 ✭✭✭✭Micky Dolenz


    El Guapo! wrote: »
    Is it safe to come out now?

    <_<

    >_>


    We are all open minded here.

    Good on you.


  • Registered Users, Registered Users 2 Posts: 6,843 ✭✭✭knucklehead6


    Are mods the only ones allowed post?


  • Moderators, Education Moderators, Technology & Internet Moderators Posts: 35,101 Mod ✭✭✭✭AlmightyCushion


    I think we all know what really happened here. Dav messed something up again and didn't want to get in trouble so he poured a big jug of coffee all over the server and blamed it on a hardware malfunction. For shame, Dav, for shame.


  • Registered Users, Registered Users 2 Posts: 8,584 ✭✭✭TouchingVirus


    Are mods the only ones allowed post?

    For now :) Enjoy it while it lasts :pac:


  • Registered Users, Registered Users 2 Posts: 6,843 ✭✭✭knucklehead6


    For now :) Enjoy it while it lasts :pac:

    Weird, cos I've been demoted to reg user! And my grumpy baby avatar is AWOL.

    Ah well, I'm only a hosted mod, not like all you real mods. :-(


  • Registered Users, Registered Users 2 Posts: 8,584 ✭✭✭TouchingVirus


    Your avatar is present and accounted for. I also see your HMOD status


  • Registered Users, Registered Users 2 Posts: 6,843 ✭✭✭knucklehead6


    Your avatar is present and accounted for. I also see your HMOD status

    Must be my phone so. Feckin iPhones!!


  • Registered Users, Registered Users 2 Posts: 44,080 ✭✭✭✭Micky Dolenz


    Your avatar is present and accounted for. I also see your HMOD status


    Stop messing with his head.


  • Registered Users, Registered Users 2 Posts: 8,584 ✭✭✭TouchingVirus


    Stop messing with his head.

    I'm expecting you to steal his avatar any moment now


  • Registered Users, Registered Users 2 Posts: 6,843 ✭✭✭knucklehead6


    I'm expecting you to steal his avatar any moment now

    I'm getting onto the mods to report you all for bullying! ;-)


  • Moderators, Entertainment Moderators, Politics Moderators Posts: 14,535 Mod ✭✭✭✭johnnyskeleton


    Any chance that important announcements could be made available on the mobile site as well as the full site?


  • Registered Users, Registered Users 2 Posts: 17,875 ✭✭✭✭MugMugs


    Stop messing with his head.

    Heheh...... Messing with his head.....



    I'll get me coat.


  • Registered Users, Registered Users 2 Posts: 12,438 ✭✭✭✭El Guapo!


    We are all open minded here.

    Good on you.

    I'm fabulous!!!


  • Moderators, Society & Culture Moderators Posts: 24,420 Mod ✭✭✭✭robindch


    Are mods the only ones allowed post?
    Suits me.


  • Moderators, Technology & Internet Moderators Posts: 4,621 Mod ✭✭✭✭Mr. G


    Are mods the only ones allowed post?

    Yep, the announcement said mods and verified reps.


  • Registered Users, Registered Users 2 Posts: 6,871 ✭✭✭CrowdedHouse


    Back ?

    Seven Worlds will Collide



  • Registered Users, Registered Users 2 Posts: 8,584 ✭✭✭TouchingVirus


    Back ?

    Yes. Pats on the back all round for the sysadmins & developers - some effort :)


  • Moderators, Regional East Moderators Posts: 23,231 Mod ✭✭✭✭GLaDOS


    Oh look, Dav's getting all the thanks again. What a coincidence.

    Cake, and grief counseling, will be available at the conclusion of the test



  • Subscribers Posts: 342 ✭✭NicsM


    Fair play Chris and Colm, that was the last thing you needed on a Tuesday :)


  • Subscribers Posts: 32,855 ✭✭✭✭5starpool


    A private forum I post in still isn't available for posting, yet another private one I'm a member of is. Are there still glitches in the matrix?

    Edit: Private forum working ok now - Thanks for getting it all back up and running!


  • Registered Users, Registered Users 2 Posts: 17,797 ✭✭✭✭hatrickpatrick


    Jaysus lads, I thought I'd been sitebanned without explanation.
    Given that I do things on a daily basis which would definitely justify such a ban, this incident unnerved me considerably. :D


  • Registered Users, Registered Users 2 Posts: 3,745 ✭✭✭laugh


    Do you just have one big unsharded schema?

    How many read DBs do you guys use?


  • Closed Accounts Posts: 8,840 ✭✭✭Dav


    Tonight's kudos go to Chris, Colm, Conor and Alvis.

    We're flipping all the switches back to where they were this morning and we'll continue to monitor it for the next couple of hours.

    Our servers sit in Digiweb in Blanchardstown, there is no way I could have poured anything on them and I don't drink coffee :p


  • Posts: 0 [Deleted User]


    Seriously bad luck to have both disks in a RAID 1 fail at the same time. :(

    Well done for getting everything up and running again.


  • Registered Users, Registered Users 2 Posts: 21,730 ✭✭✭✭entropi


    GJ guys! Managed to work hard at it again to return us back to relative normality. Kudos :)


  • Registered Users, Registered Users 2 Posts: 14,009 ✭✭✭✭wnolan1992


    Well done to the tech team again. Certainly earning their paycheques this week. :pac:


    FYI, the Talk To... fora have reverted to the old style instead of the new swanky style.


  • Registered Users, Registered Users 2 Posts: 17,399 ✭✭✭✭r3nu4l


    Dav wrote: »
    ...and I don't drink coffee :p
    Yeah, I'm gonna have to ask you to hand back your nerd badge. Sorry it had to come to this :(

    :pac:


    Fair play to one and all involved. Thanks for the hard work and effort :)


  • Registered Users, Registered Users 2 Posts: 51,054 ✭✭✭✭Professey Chin


    Good work guys :)
    Horrible luck with the disks but nice to be back!


  • Registered Users, Registered Users 2 Posts: 33,664 ✭✭✭✭Princess Consuela Bananahammock


    Dav wrote: »
    You go all year with no outages...

    What, 14 days?

    Everything I don't like is either woke or fascist - possibly both - pick one.



  • Moderators, Technology & Internet Moderators Posts: 4,621 Mod ✭✭✭✭Mr. G


    In fairness it was unexpected; it's very rare for both disks to fail. Fair play for getting it all back up and running.


  • Moderators, Technology & Internet Moderators Posts: 4,621 Mod ✭✭✭✭Mr. G


    entropi wrote: »
    GJ guys! Managed to work hard at it again to return us back to relative normality. Kudos :)

    Here's for another Sheldon pic :D

    [Image: sheldon-cooper-7.jpg]


  • Moderators, Motoring & Transport Moderators Posts: 6,522 Mod ✭✭✭✭Irish Steve


    Dav wrote: »
    You go all year with no outages and then 2 come along within 10 days.

    So what happened today?

    First of all, it wasn't me! :D
    Ya reckon?:D:D

    It was all those 40 million messages being put into one bucket last week.....

    polished all the oxide off the surface of the discs, it was only a matter of time.....:P

    Seriously, that was not good news, though it makes the MTBF concept of new discs a little challenging; they're not supposed to fail quite that close together. Any danger that it was power supply related, rather than mechanical? That could take several drives out at the same time.

    Whatever, well done to get it back that quickly.

    Shore, if it was easy, everybody would be doin it.😁



  • Registered Users, Registered Users 2 Posts: 9,529 ✭✭✭irishgeo


    Are boards not using SSD drives?


  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    laugh wrote: »
    Do you just have one big unsharded schema?

    I don't know about now, but as of 2 years ago sharding the boards.ie database would have been hilariously difficult to do. Pretty much every page served joined against the post table which accounts for the majority of the data. It's quite tricky to identify an axis along which the post table could be efficiently sharded without either rewriting large swaths of the code or creating maintenance nightmares.
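
    A small sketch of the "which axis?" problem, under made-up assumptions: four shards and posts placed by thread ID. None of these names or numbers come from the real schema. A thread page only needs one shard, but a page listing one user's posts has to ask nearly all of them.

```python
NUM_SHARDS = 4  # hypothetical shard count


def shard_for_thread(thread_id):
    # Posts live on the shard chosen by their thread ID.
    return thread_id % NUM_SHARDS


def shards_for_thread_page(thread_id):
    # Showing one thread touches exactly one shard.
    return {shard_for_thread(thread_id)}


def shards_for_user_history(thread_ids_posted_in):
    # A user's posts are scattered over every thread they posted in,
    # so the query fans out to each shard that holds one of them.
    return {shard_for_thread(t) for t in thread_ids_posted_in}


thread_page = shards_for_thread_page(42)
user_history = shards_for_user_history([101, 102, 103, 104, 205])
```

    Sharding by user instead flips the problem around: the post history becomes cheap and the thread page fans out, which is the sort of trade-off that makes a single good axis hard to find.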

    Ross and I learned a lot about sharding the data when Ross was building the search system and that was a much simpler schema, with no joins and no legacy code to convert.
    laugh wrote: »
    How many read DBs do you guys use?

    Two years ago it was one master and two slaves.


  • Registered Users, Registered Users 2 Posts: 20,830 ✭✭✭✭Taltos


    Hi guys.

    When someone gets a chance can you please re-open the "Separation & Divorce" forum? Currently marked as closed.

    Cheers.


  • Registered Users, Registered Users 2 Posts: 4,774 ✭✭✭cython


    Karsini wrote: »
    Seriously bad luck to have both disks in a RAID 1 fail at the same time. :(

    Well done for getting everything up and running again.

    Definitely. I presume that the possibility of a controller issue resulting in an earlier failure somehow not being reported has been ruled out? I've seen a lot stranger happen with RAID 1, to be fair, such as one of the disks being weeks out of date and suddenly being switched over to as the read source. It resulted in the (temporary) apparent loss of all data entered in the meantime until it could be identified that the disks had been out of sync, and the up to date one was still working, just not in use.


  • Registered Users, Registered Users 2 Posts: 1,012 ✭✭✭route66


    Dav wrote: »
    One of the database slaves had a major failure with its hard disks. Before anyone asks, yes they were in RAID (1 to be exact), but both disks failed. It's rare that your redundancy fails at the same time as the main device, but not unheard of.

    For both disks to fail at the same time would be - I guess - a "winning the lotto" type chance.

    More common would be a failed shared component - a backplane, a disk controller, a cable, etc. If this is the case, then the failure may come back. :eek:

    Another common scenario with RAID 1 is that one disk (or bank of disks) fails, goes unnoticed/unreported, then the other disk fails - BANG!
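
    As a sketch of how the "goes unnoticed" case can be caught, here is a check for degraded arrays. It assumes Linux software RAID (md), which nobody in this thread has said Boards uses, and it parses a made-up sample in the style of /proc/mdstat rather than reading a real machine.

```python
import re

# Invented sample: md0 is healthy, md1 has lost one of its two disks.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      976630336 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of arrays running with fewer disks than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = re.match(r"^(md\d+)\s*:", line)
        if name:
            current = name.group(1)
            continue
        # "[2/1] [U_]" means 2 disks expected, 1 active, second slot down.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
    return degraded


result = degraded_arrays(SAMPLE_MDSTAT)
```

    Run on a schedule with an alert attached, a check like this turns a silent first failure into a replacement job rather than a BANG.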

    Must go now and check my Lotto numbers ;)


  • Boards.ie Employee Posts: 12,597 ✭✭✭✭✭Boards.ie: Niamh
    Boards.ie Community Manager


    Taltos wrote: »
    Hi guys.

    When someone gets a chance can you please re-open the "Separation & Divorce" forum? Currently marked as closed.

    Cheers.
    Alvis has re-opened that now :)


  • Registered Users, Registered Users 2 Posts: 18,798 ✭✭✭✭kippy


    Mr. G wrote: »
    In fairness it was unexpected; it's very rare for both disks to fail. Fair play for getting it all back up and running.

    They usually don't, all right.
    What tends to happen is one disk fails...........there is more pressure then on the other disk and that fails also within a short enough period of time, so it's critical to know that one disk failed as soon as possible in order to replace it before things get more awkward!
    Been caught like that myself in the past on a RAID 5.
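
    A back-of-envelope sketch of why the window matters. It assumes a constant failure rate for the surviving disk and a made-up MTBF figure, so the absolute numbers mean nothing; only the comparison between the two windows does.

```python
import math


def p_second_failure(window_hours, mtbf_hours):
    """Chance the surviving disk fails within the window, assuming a
    constant failure rate (exponential lifetime) for that disk."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)


MTBF_HOURS = 500_000  # hypothetical figure, for illustration only

noticed_same_day = p_second_failure(24, MTBF_HOURS)
unnoticed_for_a_month = p_second_failure(24 * 30, MTBF_HOURS)
ratio = unnoticed_for_a_month / noticed_same_day
```

    Leaving the array degraded for a month instead of a day makes the second failure roughly thirty times more likely under this model, and the model is optimistic because it ignores the extra load and shared components mentioned in this thread.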

    Well done on sorting it.


  • Registered Users, Registered Users 2 Posts: 1,012 ✭✭✭route66


    kippy wrote: »
    They usually don't, all right.
    What tends to happen is one disk fails...........there is more pressure then on the other disk and that fails also within a short enough period of time, so it's critical to know that one disk failed as soon as possible in order to replace it before things get more awkward!
    Been caught like that myself in the past on a RAID 5.

    Well done on sorting it.

    With RAID 1, if a disk or bank of disks fails, the remaining healthy one(s) just continue to do their normal work; the extra copy of data just doesn't get written anywhere.

    The exception is read activity on a RAID 1 setup: many make use of both sides of the setup to reduce read time. If there is a failure, then this extra efficiency is no longer available but I would expect this to just increase read time rather than causing the remaining healthy one(s) to die!

    RAID 5 is completely different with data and parity data being written across all disks in the array.
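
    A toy sketch of the difference, using byte strings as stand-in disk blocks. The block contents are arbitrary, and real RAID 5 rotates the parity across the disks rather than keeping it on one.

```python
def xor_blocks(*blocks):
    """XOR equal-length blocks together byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


d0, d1 = b"boards", b"thread"  # two equal-length data blocks

# RAID 1: both disks hold the same block, so the survivor is
# simply read as-is when its partner dies.
mirror_a, mirror_b = d0, d0

# RAID 5 (one stripe over three disks): two data blocks plus parity.
parity = xor_blocks(d0, d1)

# If the disk holding d1 dies, it has to be recomputed from ALL the
# surviving disks in the stripe.
rebuilt_d1 = xor_blocks(d0, parity)
```

    That recomputation is why a RAID 5 rebuild reads every surviving disk end to end, while a RAID 1 survivor just keeps doing its normal work.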

