Posts

Showing posts from 2009

SHREC mines errors

SHREC: a short-read error correction method. Bioinformatics, doi:10.1093/bioinformatics/btp379 (2009). In their paper, the authors stress the importance of read correction in sequencing applications -- such as resequencing and de novo assembly. They mention that error-correction techniques for Sanger reads are outdated and that new approaches are required. The authors say that assemblers perform better when there are no errors -- that they work well with error-corrected reads. This is incorrect, as both resequencing and de novo assembly allow consensus calling. The authors write that the EULER assembly program is 'established', yet nobody uses it. During the Sanger reign, only very few software tools were introduced for error correction in reads. Accordingly, the authors cite only two such works. They compare SHREC with the error-correction components of EULER-SR and ALLPATHS -- two assemblers -- on simulated and real data. The paper is fun to read, and the desc
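
The objection above rests on consensus calling: where reads overlap, a random sequencing error in one read is out-voted by the other reads covering the same position. A minimal sketch, assuming the reads are already aligned into equal-length rows with "-" marking uncovered columns (a hypothetical input format, not taken from SHREC, EULER-SR or ALLPATHS) and using a plain majority vote; real consensus callers also weigh base qualities and handle indels.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length aligned rows.

    '-' means the read does not cover that column (assumed format).
    Columns with no coverage are called 'N'; ties go to the base seen first.
    """
    calls = []
    for column in zip(*aligned_reads):  # walk the alignment column by column
        bases = [b for b in column if b != "-"]
        if bases:
            calls.append(Counter(bases).most_common(1)[0][0])
        else:
            calls.append("N")
    return "".join(calls)


# The second read carries an error (A instead of T) in column 3.
reads = [
    "ACGTAC--",
    "--GAACGT",
    "ACGTACGT",
]
print(consensus(reads))  # ACGTACGT
```

With three-fold coverage at column 3, the single erroneous A loses two votes to one, which is why uncorrected reads need not ruin an assembly or a resequencing call.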

Polish researchers elucidate genome assembly

Whole genome assembly from 454 sequencing output via modified DNA graph concept. Computational Biology and Chemistry, doi:10.1016/j.compbiolchem.2009.04.005 (2009). The human genome project was a scientific success that allowed bioinformatics to grow. During that project, only Sanger sequencing was in use. Recently, however, new platforms based on pyrosequencing have emerged, and they produce much more data at lower cost, in a few hours. This data storm has created a need for novel assembly algorithms. The authors provide a new computational framework for genome assembly -- SR-ASM (Short Reads ASseMbly). They use the 'recently available' Roche/454 technology, which was released in 2005. They evaluate their tool against Velvet and Newbler. Velvet is designed solely for Illumina reads (although the authors say it runs on 454 data), whereas Newbler is sold along with the 454 sequencer from Roche. The authors say Newbler cannot load FASTA files, but it can. With this argument, they avoided the need
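
The excerpt does not describe how SR-ASM builds its "modified DNA graph". Purely as a generic point of comparison, here is a plain overlap graph in which reads are vertices and each arc carries the length of a suffix-prefix overlap between two reads. This is not the paper's construction; the names `overlap_graph`, `suffix_prefix_overlap` and the `min_len` threshold are hypothetical, and the all-pairs comparison is only suitable for toy data.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Try the longest candidate first so the first hit is the maximal overlap.
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len=3):
    """Directed overlap graph: vertices are read indices, and arc (i, j)
    carries the overlap length of read i's suffix onto read j's prefix."""
    arcs = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = suffix_prefix_overlap(a, b, min_len)
                if k:
                    arcs[(i, j)] = k
    return arcs


reads = ["ACGTAC", "TACGGA", "GGATTC"]
print(overlap_graph(reads))  # {(0, 1): 3, (1, 2): 3}
```

Following the arcs 0 -> 1 -> 2 and merging the overlaps would spell ACGTACGGATTC; choosing such a path in a large, repeat-rich graph is where assemblers differ.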

Beware of bioinformatics

The advent of computing has made research endeavours easier. The whole field of bioinformatics is built on the mindset that once you have data, regardless of its quality, you can go on and publish the data-analysis recipe and the associated observations. Why do computational biologists feel they have to tell everyone that their sequence contains specific sub-sequences? Even more alarming, unless you are publishing in a good journal -- like Nature -- a bioinformatics paper is very unlikely to undergo extensive copy-editing and thus will presumably be super-boring to read, unless you are an outstanding writer. Suppose that a person prepares bread with his own novel recipe (yeah, right -- a novel bread recipe). He mixes the ingredients correctly, puts the mix in the oven, sets it up, waits, and gets his result. This result is void of any scientific value, just as most of the creepy crap published in the bioinformatics sphere is void of discoveries. One could tell that this

Updates on my assembler software

I have been working very hard since December 10th on a novel algorithm for fragment assembly. I have already registered an open-source project on SourceForge. I started with some of the ideas of Pevzner et al. 2001, but I added several enhancements, and I modeled the problems with equations. My software is compatible with the AMOS specification and will be released on SourceForge upon acceptance in a journal. We are planning to submit our work to PNAS. We collected public data sets from the Short Read Archive to assess the performance of our software.
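
Pevzner et al. 2001 is the Eulerian-path approach to fragment assembly. The enhancements and equations mentioned above are not described here, so the following is only a textbook sketch of that starting point, not my assembler: reads are cut into k-mers, each distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and an Eulerian path through the resulting de Bruijn graph (found with Hierholzer's algorithm) spells a candidate sequence. It assumes error-free reads, a non-empty graph that has an Eulerian path, and no repeats longer than k-1; the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Edges are the distinct k-mers in the reads; each k-mer links its
    (k-1)-mer prefix to its (k-1)-mer suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for deterministic output
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else anywhere.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def spell(path):
    """Glue consecutive (k-1)-mers back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ACGTT", "GTTCA", "TCAGG"]
print(spell(eulerian_path(de_bruijn(reads, 4))))  # ACGTTCAGG
```

Collapsing k-mers into a set discards their multiplicity, so this sketch cannot resolve repeats or sequencing errors; a real assembler needs extra machinery for both.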