
Bioinformatics question: How much CPU grunt would it take to do a genome?

Options
  • 20-03-2013 5:47pm
    #1
    Registered Users Posts: 179 ✭✭


    I was at a good lecture last night about how SAP HANA can process Big Data faster than traditional database models thanks to its in-memory approach. Got me thinking: in terms of converting DNA from base pairs to codons, how much time would it take if we could compute 1 codon every second? Say there are 3 billion base pairs, so that's 1 billion codons to convert. Reference tables here: http://en.wikipedia.org/wiki/Genetic_code

    I was at the hospital there yesterday afternoon, thinking about how many CPU cores you'd need to 'number crunch' a genome in a sensible amount of time, say 10 minutes (600 seconds). Start with a 6-core processor at 2.4GHz. We'll bring in the RAM question later.
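
    For fun, the arithmetic behind the question works out like this in a quick Python sketch (the 1 codon/second rate is the question's own thought experiment, not a real benchmark):

    ```python
    # Back-of-the-envelope arithmetic for the question above.
    BASE_PAIRS = 3_000_000_000
    CODONS = BASE_PAIRS // 3        # 3 base pairs per codon -> 1 billion codons
    TARGET_SECONDS = 600            # the 10-minute budget

    # At the (deliberately pessimistic) rate of 1 codon per core per second:
    cores_needed = CODONS // TARGET_SECONDS
    print(cores_needed)  # 1,666,666 cores

    # So at that rate the core count is absurd -- the hypothetical 1 codon/s
    # figure is the bottleneck, not the silicon. A real 2.4GHz core can do a
    # codon table lookup in a handful of cycles, i.e. hundreds of millions
    # of codons per second, which is why memory bandwidth matters more.
    ```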


Comments

  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 91,543 Mod ✭✭✭✭Capt'n Midnight


    It depends on the size of the DNA fragments you are trying to assemble.

    Smaller pieces mean it's harder to do the jigsaw

    Open-source project :)
    http://seqbarracuda.sourceforge.net/
    BarraCUDA can align a paired-end library containing 14 million pairs of 76bp reads to the Human genome in about 27 minutes (from fastq files to SAM alignment) using a £380 NVIDIA Geforce GTX 680*. The alignment throughput can be boosted further by using multiple GPUs (up to 8) at the same time.
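
    For a sense of scale, the throughput implied by that quoted figure is easy to work out (treating the 27 minutes as exact, which it isn't):

    ```python
    # Throughput implied by the BarraCUDA figure quoted above:
    # 14 million read pairs (28 million 76bp reads) aligned in ~27 minutes.
    pairs = 14_000_000
    reads = pairs * 2
    seconds = 27 * 60
    reads_per_second = reads / seconds
    print(round(reads_per_second))  # roughly 17,300 reads/s on one GTX 680
    ```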

    There are also ASIC and FPGA sequencers


  • Registered Users Posts: 179 ✭✭Shtanto


    Cool - this'll be just the thing for my parallelisation class of a Wednesday. We're due to start CUDA after Easter. Thank you :)

    I suppose fragment size speaks to the coarse- or fine-grained approaches to parallelisation. Good to see it's been tackled.
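
    The coarse-grained approach could look something like this toy sketch: split the DNA on codon boundaries so each worker translates an independent slice (the table below is a tiny subset of the standard genetic code from the Wikipedia link above, just for illustration):

    ```python
    # Toy coarse-grained parallelisation: translate DNA to amino acids by
    # handing each worker a codon-aligned slice of the sequence.
    from concurrent.futures import ProcessPoolExecutor

    CODON_TABLE = {  # small subset of the standard genetic code
        "ATG": "M", "TGG": "W", "TTT": "F", "AAA": "K",
        "GGC": "G", "TAA": "*",
    }

    def translate(chunk: str) -> str:
        # Walk the chunk 3 bases at a time; unknown codons become "?".
        return "".join(CODON_TABLE.get(chunk[i:i + 3], "?")
                       for i in range(0, len(chunk) - 2, 3))

    def parallel_translate(dna: str, workers: int = 4) -> str:
        # Coarse-grained split: chunk size is a multiple of 3 so every
        # slice starts on a codon boundary and workers stay independent.
        size = (len(dna) // 3 // workers + 1) * 3
        chunks = [dna[i:i + size] for i in range(0, len(dna), size)]
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return "".join(pool.map(translate, chunks))
    ```

    The same chunk-on-codon-boundaries idea carries over to a CUDA kernel, where each thread would take a fixed stride of codons instead of a process taking a big slice.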

    Moore's law will probably break before 2020 at this rate, unless we start moving to a type of computer that uses something other than transistors. Still, I have to wonder if we'll ever have a processor that's fast enough to do everything we can humanly require before we notice something taking a while. 27 minutes is great. I remember the SAP HANA man talking about how the speedup was so significant that folks lost the luxury of a coffee break they used to have from the old waiting around part. In silico testing will need a lot of processor grunt. It'll negate the need for test animals though, so it's very worthwhile.


  • Moderators, Recreation & Hobbies Moderators, Science, Health & Environment Moderators, Technology & Internet Moderators Posts: 91,543 Mod ✭✭✭✭Capt'n Midnight


    Shtanto wrote: »
    Moore's law will probably break before 2020 at this rate, unless we start moving to a type of computer that uses something other than transistors. Still, I have to wonder if we'll ever have a processor that's fast enough to do everything we can humanly require before we notice something taking a while. 27 minutes is great. I remember the SAP HANA man talking about how the speedup was so significant that folks lost the luxury of a coffee break they used to have from the old waiting around part.
    Roll the clock back even further and people used to wait a week for a response in the post. Then the fax machine was invented and there'd be answers to queries the same day.



    As for Moore's law - International Technology Roadmap for Semiconductors
    http://www.itrs.net/Links/2012ITRS/Home2012.htm
    http://www.itrs.net/Links/2010ITRS/IRC-ITRS-MtM-v2%203.pdf - more than Moore

    http://www.itrs.net/Links/2012ITRS/2012Tables/ORTC_2012Tables.xlsm
    2012_ORTC_2C table
    State of the art: from 22nm in 2011 to 8nm by 2022
    Cost/performance: from 1,102 in 2011 to 35,336 in 2026
    Transistor density: from 768M/cm² to 32,179M/cm²
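
    As a sanity check, those density figures line up roughly with Moore's law, assuming the end year for the density row is 2026 like the cost/performance row (an assumption, since the table isn't quoted in full here):

    ```python
    import math

    # Rough check of the ITRS transistor density figures against Moore's law.
    # Assumes both figures are in M transistors/cm^2 and span 2011 to 2026.
    start, end = 768.0, 32_179.0
    doublings = math.log2(end / start)            # about 5.4 doublings
    years_per_doubling = (2026 - 2011) / doublings
    print(round(years_per_doubling, 1))           # roughly 2.8 years per doubling
    ```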

    One of the tricks is to make repetitive chips from smaller pieces of silicon and bond them together. This means you can get a much higher yield from your 22nm stuff, since an imperfection ruins fewer of them, and you can then bond them to a 32nm base at your existing plants, reducing capital costs (it can cost $10Bn to build a next-generation fab plant).

    The industry has been using clever tricks like this to keep moving forward.


    All this is for silicon.

    If they get DNA or other bio-molecules working it could shift up a gear or two.

