
Bioinformatics question: How much CPU grunt would it take to do a genome?

Options
  • 20-03-2013 5:47pm
    #1
    Registered Users Posts: 179 ✭✭


    I was at a good lecture last night about how SAP HANA can process Big Data faster than traditional database models thanks to its in-memory approach. Got me thinking: in terms of converting DNA from base pairs to codons, how much time would it take if we could compute 1 codon every second? Say there are 3 billion base pairs, so that's 1 billion codons to convert. Reference tables here: http://en.wikipedia.org/wiki/Genetic_code

    I was at the hospital there yesterday afternoon, thinking about how many CPU cores you'd need to 'number crunch' a genome in a sensible amount of time, say 10 minutes (600 seconds). Start with a 6-core processor at 2.4GHz. We'll bring in the RAM question later.
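
    For fun, the arithmetic behind the question works out like this in a quick Python sketch (the 1 codon/second rate is the question's own thought experiment, not a real benchmark):

    ```python
    # Back-of-the-envelope arithmetic for the question above.
    BASE_PAIRS = 3_000_000_000
    CODONS = BASE_PAIRS // 3        # 3 base pairs per codon -> 1 billion codons
    TARGET_SECONDS = 600            # the 10-minute budget

    # At the (deliberately pessimistic) rate of 1 codon per core per second:
    cores_needed = CODONS // TARGET_SECONDS
    print(cores_needed)  # 1,666,666 cores

    # So at that rate the core count is absurd -- the hypothetical 1 codon/s
    # figure is the bottleneck, not the silicon. A real 2.4GHz core can do a
    # codon table lookup in a handful of cycles, i.e. hundreds of millions
    # of codons per second, which is why memory bandwidth matters more.
    ```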


Comments

  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 91,543 Mod ✭✭✭✭Capt'n Midnight


    It depends on the size of the DNA fragments you are trying to assemble.

    Smaller pieces mean it's harder to do the jigsaw

    Open-source project :)
    http://seqbarracuda.sourceforge.net/
    BarraCUDA can align a paired-end library containing 14 million pairs of 76bp reads to the Human genome in about 27 minutes (from fastq files to SAM alignment) using a £380 NVIDIA Geforce GTX 680*. The alignment throughput can be boosted further by using multiple GPUs (up to 8) at the same time.
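
    For a sense of scale, the throughput implied by that quoted figure is easy to work out (treating the 27 minutes as exact, which it isn't):

    ```python
    # Throughput implied by the BarraCUDA figure quoted above:
    # 14 million read pairs (28 million 76bp reads) aligned in ~27 minutes.
    pairs = 14_000_000
    reads = pairs * 2
    seconds = 27 * 60
    reads_per_second = reads / seconds
    print(round(reads_per_second))  # roughly 17,300 reads/s on one GTX 680
    ```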

    There are also ASIC and FPGA sequencers


  • Registered Users Posts: 179 ✭✭Shtanto


    Cool - this'll be just the thing for my parallelisation class of a Wednesday. We're due to start CUDA after Easter. Thank you :)

    I suppose fragment size speaks to the coarse- or fine-grained approaches to parallelisation. Good to see it's been tackled.
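
    The coarse-grained approach could look something like this toy sketch: split the DNA on codon boundaries so each worker translates an independent slice (the table below is a tiny subset of the standard genetic code from the Wikipedia link above, just for illustration):

    ```python
    # Toy coarse-grained parallelisation: translate DNA to amino acids by
    # handing each worker a codon-aligned slice of the sequence.
    from concurrent.futures import ProcessPoolExecutor

    CODON_TABLE = {  # small subset of the standard genetic code
        "ATG": "M", "TGG": "W", "TTT": "F", "AAA": "K",
        "GGC": "G", "TAA": "*",
    }

    def translate(chunk: str) -> str:
        # Walk the chunk 3 bases at a time; unknown codons become "?".
        return "".join(CODON_TABLE.get(chunk[i:i + 3], "?")
                       for i in range(0, len(chunk) - 2, 3))

    def parallel_translate(dna: str, workers: int = 4) -> str:
        # Coarse-grained split: chunk size is a multiple of 3 so every
        # slice starts on a codon boundary and workers stay independent.
        size = (len(dna) // 3 // workers + 1) * 3
        chunks = [dna[i:i + size] for i in range(0, len(dna), size)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return "".join(pool.map(translate, chunks))
    ```

    The same chunk-on-codon-boundaries idea carries over to a CUDA kernel, where each thread would take a fixed stride of codons instead of a process taking a big slice.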

    Moore's law will probably break before 2020 at this rate, unless we start moving to a type of computer that uses something other than transistors. Still, I have to wonder if we'll ever have a processor that's fast enough to do everything we can humanly require before we notice something taking a while. 27 minutes is great. I remember the SAP HANA man talking about how the speedup was so significant that folks lost the luxury of a coffee break they used to have from the old waiting around part. In silico testing will need a lot of processor grunt. It'll negate the need for test animals though, so it's very worthwhile.


  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 91,543 Mod ✭✭✭✭Capt'n Midnight


    Shtanto wrote: »
    Moore's law will probably break before 2020 at this rate, unless we start moving to a type of computer that uses something other than transistors. Still, I have to wonder if we'll ever have a processor that's fast enough to do everything we can humanly require before we notice something taking a while. 27 minutes is great. I remember the SAP HANA man talking about how the speedup was so significant that folks lost the luxury of a coffee break they used to have from the old waiting around part.
    Roll the clock back even further and people used to wait a week for a response in the post. Then the fax machine was invented and there'd be answers to queries the same day.



    As for Moore's law - International Technology Roadmap for Semiconductors
    http://www.itrs.net/Links/2012ITRS/Home2012.htm
    http://www.itrs.net/Links/2010ITRS/IRC-ITRS-MtM-v2%203.pdf - more than Moore

    http://www.itrs.net/Links/2012ITRS/2012Tables/ORTC_2012Tables.xlsm
    2012_ORTC_2C table
    State of the art: from 22nm in 2011 to 8nm by 2022
    Cost/performance: from 1,102 in 2011 to 35,336 in 2026
    Transistor density: from 768M/cm² to 32,179M/cm²
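
    As a sanity check, those density figures line up roughly with Moore's law, assuming the end year for the density row is 2026 like the cost/performance row (an assumption, since the table isn't quoted in full here):

    ```python
    import math

    # Rough check of the ITRS transistor density figures against Moore's law.
    # Assumes both figures are in M transistors/cm^2 and span 2011 to 2026.
    start, end = 768.0, 32_179.0
    doublings = math.log2(end / start)            # about 5.4 doublings
    years_per_doubling = (2026 - 2011) / doublings
    print(round(years_per_doubling, 1))           # roughly 2.8 years per doubling
    ```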

    One of the tricks is to make repetitive chips from smaller pieces of silicon and bond them together. This means you can get a much higher yield from your 22nm stuff, since an imperfection ruins fewer of them, and you can then bond them to a 32nm base at your existing plants, reducing capital costs (it can cost $10Bn to build a next-generation fab plant).

    The industry has been using clever tricks like this to keep moving forward.


    All this is for silicon.

    If they get DNA or other bio-molecules working it could shift up a gear or two.

