Illumina FASTQ files - Read Segment Quality Control Indicator

In another quirk to the FASTQ story, recent Illumina FASTQ files don’t actually use the full range of PHRED scores, and a score of 2 has a special meaning: the Read Segment Quality Control Indicator (RSQCI, encoded as ‘B’). Hats off to Dr Torsten Seemann for raising awareness of this issue in his post on the seqanswers.com forum, referring to a presentation by Tobias Mann of Illumina which says: [Read More]
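As a rough illustration of the encoding mentioned above: in Illumina 1.3+ style FASTQ the quality character is chr(score + 64), so a PHRED score of 2 appears as ‘B’. The sketch below assumes that a trailing run of ‘B’ characters marks the flagged segment and simply trims it; the helper name `trim_rsqci` is made up for this example and is not part of any library.

```python
ILLUMINA_OFFSET = 64                     # Illumina 1.3+ FASTQ: chr(PHRED score + 64)
RSQCI_CHAR = chr(2 + ILLUMINA_OFFSET)    # PHRED 2 is written as 'B'


def trim_rsqci(seq, qual):
    """Drop the trailing run of 'B' quality characters (hypothetical helper).

    Assumption: only a run of 'B' at the 3' end of the read is treated as
    the flagged segment; a 'B' elsewhere in the read is left alone.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string lengths differ")
    keep = len(qual.rstrip(RSQCI_CHAR))
    return seq[:keep], qual[:keep]


trimmed_seq, trimmed_qual = trim_rsqci("ACGTACGTAC", "hhhgfdaBBB")
print(trimmed_seq, trimmed_qual)
```

Whether to trim, mask or discard such reads is a pipeline decision; trimming is just the simplest option to show.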

BioRuby 1.4.0 released

We are pleased to announce the release of BioRuby 1.4.0. This new release contains many new features, in addition to bug fixes and improvements. PhyloXML support: Support for reading and writing PhyloXML file format is added, developed by Diana Jaunzeikare, mentored by Christian M Zmasek and co-mentors, supported by Google Summer of Code 2009 in collaboration with the National Evolutionary Synthesis Center (NESCent). FASTQ file format support: Support for reading and writing FASTQ file format is added. [Read More]

Sanger FASTQ format and the Solexa/Illumina variants

I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects: The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) and Peter M. Rice (EMBOSS). Nucleic Acids Research, doi:10. [Read More]

BioPerl 1.6.1 released

We are pleased to announce the immediate availability of BioPerl 1.6.1, the latest release of BioPerl’s core code. You can grab it here: Via CPAN: http://search.cpan.org/~cjfields/BioPerl-1.6.1/ Via the BioPerl website: http://bioperl.org/DIST/BioPerl-1.6.1.tar.bz2 http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz http://bioperl.org/DIST/BioPerl-1.6.1.zip The PPM for Windows should also finally be available this week, ActivePerl problems permitting (we will post more information when it becomes available). Tons of bug fixes and changes have been incorporated into this release. For a more complete change list please see the ‘Changes’ file included with the distribution. [Read More]

Working with FASTQ files in Biopython when speed matters

Biopython 1.51 onward includes support for Sanger, Solexa and Illumina 1.3+ FASTQ files in Bio.SeqIO, which allows a lot of neat tricks very concisely. For example, the tutorial (PDF) has examples of finding and removing primer or adaptor sequences. However, because the Bio.SeqIO interface revolves around SeqRecord objects, there is often a speed penalty. For example, for FASTQ files, the quality string gets turned into a list of integers on parsing, and then re-encoded back to ASCII on writing. [Read More]
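To make the overhead concrete, here is a minimal pure-Python sketch (not Biopython's own code) of the round trip described above: decoding a Sanger quality string (offset 33) into integers and encoding it back, next to a reader that leaves everything as plain strings. The function names are invented for this example, and the reader assumes the common four-lines-per-record layout, which the FASTQ format does not guarantee.

```python
import io

SANGER_OFFSET = 33  # Sanger FASTQ stores PHRED scores as chr(score + 33)


def decode_quality(qual_string, offset=SANGER_OFFSET):
    """Turn an ASCII quality string into a list of integer scores."""
    return [ord(char) - offset for char in qual_string]


def encode_quality(scores, offset=SANGER_OFFSET):
    """Turn a list of integer scores back into an ASCII quality string."""
    return "".join(chr(score + offset) for score in scores)


def iter_fastq_strings(handle):
    """Yield (title, sequence, quality) as plain strings.

    Assumption: every record is exactly four lines (no line wrapping).
    """
    while True:
        title = handle.readline()
        if not title:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is skipped
        qual = handle.readline().rstrip("\n")
        yield title[1:].rstrip("\n"), seq, qual


example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for title, seq, qual in iter_fastq_strings(example):
    scores = decode_quality(qual)            # only needed if scores are used
    assert encode_quality(scores) == qual    # the round trip is lossless
    print(title, seq, scores)
```

If a task only filters or renames reads, the decode and encode steps can be skipped entirely, which is where the time saving would come from.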

Simpler, optimized format conversion with Biopython

As per Peter’s recent post, we are using this space to show off a couple of the new features in Biopython 1.52 before it is released. In this post we’ll look at the new convert() function that both Bio.SeqIO and Bio.AlignIO will get in Biopython 1.52. No one has ever complained that bioinformatics just doesn’t have enough file formats - you probably frequently find yourself converting sequence files to suit particular applications with Bio. [Read More]

Indexing sequence files with Biopython

The forthcoming release of Biopython 1.52 will include a couple of nice improvements to the Bio.SeqIO module, and here we’re going to introduce the new index function. This will of course be covered in the Biopython Tutorial & Cookbook (PDF) once this code is released. Suppose you have a large sequence file with a great many individual sequences in it. This could be next generation sequence data for example, maybe a FASTQ, FASTA or QUAL file. [Read More]
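For readers wondering what indexing a large sequence file can look like, the following is a generic sketch of one common approach: scan the file once, remember the byte offset where each record starts, and seek back on demand. It is written for FASTA only, uses invented function names, and is an illustration of the idea rather than a description of how the Bio.SeqIO index function is implemented.

```python
import io


def build_offset_index(handle):
    """Map each FASTA record id to the byte offset of its '>' line.

    The handle must be opened in binary mode so offsets are exact.
    """
    index = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            # The id is taken as the first word after '>' (an assumption).
            record_id = line[1:].split(None, 1)[0].decode("ascii")
            index[record_id] = offset
    return index


def fetch_record(handle, index, record_id):
    """Seek to one record and return its raw text, header included."""
    handle.seek(index[record_id])
    lines = [handle.readline()]
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        lines.append(line)
    return b"".join(lines).decode("ascii")


data = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGCC\n")
offsets = build_offset_index(data)
print(fetch_record(data, offsets, "seq2"))
```

Only the ids and offsets are held in memory, so the cost grows with the number of records rather than the size of the file.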

Biopython 1.51 released

We are pleased to announce the release of Biopython 1.51. This new stable release enhances version 1.50 (released in April) by extending the functionality of existing modules, adding a set of application wrappers for popular alignment programs and fixing a number of minor bugs. In particular, the SeqIO module can now write GenBank files that include features, and deal with FASTQ files created by Illumina 1.3+. Support for this format allows interconversion between FASTQ files using Solexa, Sanger and Illumina variants using conventions agreed upon with the BioPerl and EMBOSS projects. [Read More]
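The interconversion between variants comes down to two things: the ASCII offset (33 for Sanger, 64 for Solexa and Illumina 1.3+) and, for the old Solexa files, a different score scale. A minimal sketch of both steps is below; it is not Biopython's implementation, the function names are invented here, and rounding the converted score to the nearest integer is an assumption of this example.

```python
import math

SANGER_OFFSET = 33   # Sanger FASTQ: PHRED scores, chr(score + 33)
OFFSET_64 = 64       # Solexa and Illumina 1.3+ FASTQ both use chr(score + 64)


def solexa_to_phred(q_solexa):
    """Map a Solexa score onto the PHRED scale.

    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1)
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_char_to_sanger_char(char):
    """Convert one Solexa quality character to its Sanger equivalent."""
    q_solexa = ord(char) - OFFSET_64
    q_phred = int(round(solexa_to_phred(q_solexa)))  # rounding is an assumption
    return chr(q_phred + SANGER_OFFSET)


def illumina_to_sanger(qual):
    """Illumina 1.3+ already holds PHRED scores, so only the offset changes."""
    return "".join(chr(ord(char) - OFFSET_64 + SANGER_OFFSET) for char in qual)


print(illumina_to_sanger("hhhB"))
print("".join(solexa_char_to_sanger_char(char) for char in "hh;"))
```

High scores are almost identical on the two scales, so the Solexa mapping only matters noticeably for low-quality bases.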

Biopython 1.51 beta released

A beta release for Biopython 1.51 is now available for download and testing. In the two months since Biopython 1.50 was released, we have introduced support for writing features in GenBank files using Bio.SeqIO, extended SeqIO’s support for the FASTQ format to include files created by Illumina 1.3+, added a new set of application wrappers for alignment programs, and made numerous tweaks and bug fixes. All the new features have been tested by the dev team but it’s possible there are cases that we haven’t been able to foresee and test, especially for the GenBank feature writer (as there are just so many possible odd fuzzy feature locations). [Read More]