Obf-Projects

Working with FASTQ files in Biopython when speed matters

Posted on September 25, 2009 | peterc

From Biopython 1.51 onwards, Bio.SeqIO supports Sanger, Solexa and Illumina 1.3+ FASTQ files, which allows a lot of neat tricks to be written very concisely. For example, the tutorial (PDF) has examples of finding and removing primer or adaptor sequences.

However, because the Bio.SeqIO interface revolves around SeqRecord objects, there is often a speed penalty. For example, with FASTQ files the quality string is turned into a list of integers on parsing, and then re-encoded back to ASCII on writing.
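To see where that time goes, here is a minimal stdlib-only sketch of the round trip for Sanger-style FASTQ, assuming each quality character encodes a PHRED score plus an ASCII offset of 33. The helper names decode_sanger and encode_sanger are hypothetical, not Biopython functions; the point is simply that every quality character is converted twice, even if the scores are never inspected.

```python
# Hypothetical helpers illustrating the decode/encode round trip for
# Sanger FASTQ qualities (PHRED score + 33, stored as ASCII).
SANGER_OFFSET = 33


def decode_sanger(qual_string):
    """ASCII quality string -> list of integer PHRED scores (done on parsing)."""
    return [ord(ch) - SANGER_OFFSET for ch in qual_string]


def encode_sanger(scores):
    """List of integer PHRED scores -> ASCII quality string (done on writing)."""
    return "".join(chr(q + SANGER_OFFSET) for q in scores)


qual = "II?5+"
scores = decode_sanger(qual)
print(scores)  # [40, 40, 30, 20, 10]

# Writing the record back out re-encodes the same characters.
assert encode_sanger(scores) == qual
```

When records are only being filtered or copied, working on the raw strings skips both conversions.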


Biopython CVS to git migration

Posted on September 24, 2009 | peterc

The release of Biopython 1.52 earlier this week marked the end of an era: it was our last release using CVS for source code control.

As of now, Biopython is using a git repository hosted on github.com, who kindly provide free git hosting for open source projects. The BioRuby project have been using GitHub for some time, so we are in good company.

Our existing OBF-hosted CVS repository will be maintained in the short to medium term as a backup, but will not be updated.


Biopython 1.52 released

Posted on September 22, 2009 | davidw

We are pleased to announce the availability of Biopython 1.52, a new stable release of the Biopython library.

It may only have been one month since the last release, but in that time we’ve added enough useful features to warrant a new release. In particular, Biopython 1.52 includes more substantial support for population genetics, and adds new functions that will be useful for people working with next generation sequencing.

Tiago Antao’s work on the Population Genetics module brings a command-line wrapper for GenePop, which allows the estimation of F-statistics, null allele frequencies and migration rates, as well as tests for isolation by distance (IBD) and deviation from Hardy-Weinberg equilibrium.


Simpler, optimized format conversion with Biopython

Posted on September 22, 2009 | davidw

As per Peter’s recent post, we are using this space to show off a couple of the new features in Biopython 1.52 before it is released. In this post we’ll look at the new convert() function that both Bio.SeqIO and Bio.AlignIO will get in Biopython 1.52.

No one has ever complained that bioinformatics just doesn’t have enough file formats; you probably often find yourself using Bio.SeqIO to convert sequence files to suit particular applications. At the moment this is usually a two-step process, something like this:
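The two steps are parsing (Bio.SeqIO.parse) and writing (Bio.SeqIO.write), while convert() rolls them into a single call of the form SeqIO.convert(in_file, in_format, out_file, out_format). As a rough illustration of why a dedicated converter can be faster, here is a stdlib-only sketch of FASTQ to FASTA conversion that copies the title and sequence lines as plain strings and never decodes the quality line. It assumes simple, unwrapped four-line FASTQ records, and fastq_to_fasta is a hypothetical name, not Biopython's implementation.

```python
import io


def fastq_to_fasta(in_handle, out_handle):
    """Copy simple four-line FASTQ records to FASTA and return the record count.

    Assumes unwrapped records: '@title', sequence, '+', quality.
    """
    count = 0
    while True:
        title = in_handle.readline()
        if not title:
            break  # end of file
        seq = in_handle.readline().rstrip("\n")
        plus = in_handle.readline()
        in_handle.readline()  # quality line: skipped, never decoded
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("Not a simple four-line FASTQ record: %r" % title)
        out_handle.write(">%s\n%s\n" % (title[1:].rstrip("\n"), seq))
        count += 1
    return count


fastq = "@r1 demo\nACGT\n+\nIIII\n@r2\nGGCA\n+\n!!!!\n"
out = io.StringIO()
n = fastq_to_fasta(io.StringIO(fastq), out)
print(n)                     # 2
print(repr(out.getvalue()))  # '>r1 demo\nACGT\n>r2\nGGCA\n'
```

The design choice is to treat every line as opaque text: nothing is built per record beyond two short strings, which is where the savings over a full parse-then-write loop come from.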


Indexing sequence files with Biopython

Posted on September 21, 2009 | peterc

The forthcoming release of Biopython 1.52 will include a couple of nice improvements to the Bio.SeqIO module, and here we’re going to introduce the new index function. This will of course be covered in the Biopython Tutorial & Cookbook (PDF) once this code is released.

Suppose you have a large sequence file with many, many individual sequences in it. This could be next-generation sequencing data, for example a FASTQ, FASTA or QUAL file. Or it might be a big annotation-rich file, such as the whole of UniProt, or a chunk of GenBank.
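Conceptually, an index like Bio.SeqIO.index(filename, format) scans the file once, remembers where each record starts, and only parses a record when you ask for it, so memory use stays low even for huge files. Below is a minimal stdlib-only sketch of that idea for FASTA, assuming the record ID is the first word of each title line. The functions build_fasta_index and get_record are hypothetical; the real Bio.SeqIO.index handles many formats and returns SeqRecord objects.

```python
import os
import tempfile


def build_fasta_index(path):
    """Scan once, mapping record ID -> byte offset of its '>' title line."""
    offsets = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                record_id = line[1:].split(None, 1)[0].decode()
                if record_id in offsets:
                    raise ValueError("Duplicate ID: %s" % record_id)
                offsets[record_id] = offset
    return offsets


def get_record(path, offsets, record_id):
    """Seek straight to one record and return (title, sequence)."""
    with open(path, "rb") as handle:
        handle.seek(offsets[record_id])
        title = handle.readline().decode().rstrip()[1:]
        seq_parts = []
        for line in handle:
            if line.startswith(b">"):
                break  # start of the next record
            seq_parts.append(line.decode().strip())
    return title, "".join(seq_parts)


with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">alpha first\nACGT\nAC\n>beta second\nGGGG\n")
    path = tmp.name

idx = build_fasta_index(path)
alpha = get_record(path, idx, "alpha")
beta = get_record(path, idx, "beta")
print(beta)  # ('beta second', 'GGGG')
os.remove(path)
```

Only the small dictionary of offsets lives in memory; each lookup reads just the lines of the requested record.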


BioRuby 1.3.1 released

Posted on September 2, 2009 | NaohisaGoto

We are pleased to announce the release of BioRuby 1.3.1. This new release fixes many bugs that existed in 1.3.0.

Here is a brief summary of changes.

Refactoring of BioSQL support.
Bio::PubMed bug fixes.
Bio::NCBI::REST bug fixes.
Bio::GCG::Msf bug fixes.
Bio::Fasta::Report bug fixes and added support for multiple query sequences.
Bio::Sim4::Report bug fixes.
Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.
License of BioRuby is clarified.

In addition, many other changes have been made, mainly bug fixes. For more information, see the ChangeLog.


Biopython 1.51 released

Posted on August 17, 2009 | davidw

We are pleased to announce the release of Biopython 1.51. This new stable release enhances version 1.50 (released in April) by extending the functionality of existing modules, adding a set of application wrappers for popular alignment programs and fixing a number of minor bugs.

In particular, the SeqIO module can now write GenBank files that include features, and can handle FASTQ files created by Illumina 1.3+. Support for this format allows interconversion between the Solexa, Sanger and Illumina variants of FASTQ, using conventions agreed upon with the BioPerl and EMBOSS projects.
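The three variants differ in what the quality characters mean: Sanger stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and the older Solexa files store Solexa scores (a log-odds scale) with an offset of 64. Converting a Solexa score to PHRED therefore needs the formula Q_phred = 10 log10(10^(Q_solexa/10) + 1), not just an offset change. The sketch below is a stdlib-only illustration of those rules with hypothetical function names; Biopython's own helpers for this live in Bio.SeqIO.QualityIO, and its exact rounding behaviour is not reproduced here.

```python
import math


def phred_from_solexa(solexa_q):
    """Convert a Solexa quality score to the equivalent (real-valued) PHRED score."""
    return 10 * math.log10(10 ** (solexa_q / 10.0) + 1)


def solexa_fastq_to_sanger(qual_string):
    """Re-encode a Solexa FASTQ quality string (offset 64) as Sanger (offset 33).

    PHRED scores are rounded to the nearest integer, since FASTQ stores integers.
    """
    phred = [phred_from_solexa(ord(ch) - 64) for ch in qual_string]
    return "".join(chr(int(round(q)) + 33) for q in phred)


def illumina13_to_sanger(qual_string):
    """Illumina 1.3+ already stores PHRED scores, so only the offset changes (64 -> 33)."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in qual_string)


# Solexa scores -5, 0, 10, 40 map to PHRED ~1.19, ~3.01, ~10.41, ~40.00.
print(repr(solexa_fastq_to_sanger(";@Jh")))  # '"$+I'
# Illumina 1.3+ PHRED scores 40, 30, 0.
print(illumina13_to_sanger("h^@"))  # I?!
```

At high qualities the Solexa and PHRED scales almost coincide, which is why the difference mostly matters for low-quality bases.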


Biopython 1.51 beta released

Posted on June 23, 2009 | peterc

A beta release for Biopython 1.51 is now available for download and testing.

In the two months since Biopython 1.50 was released, we have introduced support for writing features in GenBank files using Bio.SeqIO, extended SeqIO’s support for the FASTQ format to include files created by Illumina 1.3+, added a new set of application wrappers for alignment programs, and made numerous tweaks and bug fixes.

All the new features have been tested by the dev team, but there may be cases that we haven’t been able to foresee and test, especially for the GenBank feature writer (as there are just so many possible odd fuzzy feature locations).


Biopython projects chosen for Google Summer of Code

Posted on April 27, 2009 | davidw

Congratulations to Nick Matzke and Eric Talevich who have had Biopython projects accepted for this year’s Google Summer of Code. Both projects were accepted as part of The National Evolutionary Synthesis Center’s (NESCent) involvement as a mentoring organisation with the program.

Nick will spend his summer working on modules that access locality data from biodiversity databases and incorporate this information in biogeographical and phylogenetic analyses (Nick’s abstract), while Eric will be building a parser for the emerging PhyloXML format for storing and sharing phylogenetic trees (Eric’s abstract).


Biopython release 1.50

Posted on April 20, 2009 | peterc

We are pleased to announce Biopython release 1.50, featuring some significant additions since Biopython 1.49 was released late last year.

GenomeDiagram by Leighton Pritchard has been integrated into Biopython as the Bio.Graphics.GenomeDiagram module.

A new module, Bio.Motif, has been added; it is intended to replace the existing Bio.AlignAce and Bio.MEME modules. Also have a look at Bio.SwissProt and Bio.ExPASy and their revised parsers.

As noted in a previous news posting, Bio.SeqIO can now read and write FASTQ and QUAL files used in second generation sequencing work. In connection with this, our SeqRecord object has a new dictionary attribute, letter_annotations, for per-letter annotations such as sequence quality scores or secondary structure predictions. Also, a SeqRecord can now be sliced to give a new SeqRecord covering just part of the sequence.
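The key property of letter_annotations is that every entry has one value per letter, so slicing a record has to slice each annotation in step with the sequence. In Biopython this looks like record.letter_annotations["phred_quality"] and record[1:4]; the toy class below is a hypothetical, much simplified stand-in (not the real SeqRecord) that sketches just that behaviour using only the standard library.

```python
class ToyRecord:
    """Hypothetical stand-in for a record with per-letter annotations."""

    def __init__(self, seq, letter_annotations=None):
        self.seq = seq
        self.letter_annotations = {}
        for key, values in (letter_annotations or {}).items():
            # Each annotation must have exactly one value per letter.
            if len(values) != len(seq):
                raise ValueError(
                    "%s has %d values for %d letters" % (key, len(values), len(seq))
                )
            self.letter_annotations[key] = list(values)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.seq[index]  # a single letter, like indexing a string
        # Slice the sequence and every per-letter list with the same slice.
        return ToyRecord(
            self.seq[index],
            {key: values[index] for key, values in self.letter_annotations.items()},
        )


rec = ToyRecord("ACGTAC", {"phred_quality": [40, 40, 38, 30, 20, 10]})
trimmed = rec[1:4]
print(trimmed.seq, trimmed.letter_annotations["phred_quality"])  # CGT [40, 38, 30]
```

Keeping the length check in the constructor means a sliced record can never end up with qualities that no longer line up with its bases, which is what makes trimming primers or low-quality ends safe.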
