Development

Indexing sequence files with Biopython

Posted on September 21, 2009 | peterc

The forthcoming release of Biopython 1.52 will include a couple of nice improvements to the Bio.SeqIO module, and here we’re going to introduce the new index function. This will of course be covered in the Biopython Tutorial & Cookbook (PDF) once this code is released.

Suppose you have a large sequence file with many, many individual sequences in it. This could be next-generation sequencing data, for example a FASTQ, FASTA or QUAL file. Or it might be a big annotation-rich file, such as the whole of UniProt or a chunk of GenBank.
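To give a feel for why indexing helps with such files, here is a minimal pure-Python sketch of the general idea: scan the file once, remember the byte offset where each record starts, and seek back to a record only when it is asked for. This is an illustration under our own assumptions (plain FASTA, record ID taken as the first word of the title line), not Biopython's code, and the helper names index_fasta and get_sequence are hypothetical. The new function itself is called as Bio.SeqIO.index(filename, format) and returns a dictionary-like object keyed by record ID.

```python
import io


def index_fasta(handle):
    """Scan a binary FASTA handle once, recording where each record starts.

    Returns a dict mapping record ID (first word of the title line) to the
    byte offset of its '>' line. Sequences are NOT kept in memory.
    """
    offsets = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:  # end of file
            break
        if line.startswith(b">"):
            record_id = line[1:].split()[0].decode()
            offsets[record_id] = offset
    return offsets


def get_sequence(handle, offsets, record_id):
    """Jump straight to one record and read just its sequence lines."""
    handle.seek(offsets[record_id])
    handle.readline()  # skip the '>' title line
    chunks = []
    for line in handle:
        if line.startswith(b">"):  # reached the next record
            break
        chunks.append(line.strip().decode())
    return "".join(chunks)


if __name__ == "__main__":
    # A tiny in-memory stand-in for a large file opened with open(path, "rb")
    handle = io.BytesIO(b">seq1 first\nACGT\nGGCC\n>seq2\nTTTT\n")
    offsets = index_fasta(handle)
    print(offsets)                                # {'seq1': 0, 'seq2': 22}
    print(get_sequence(handle, offsets, "seq2"))  # TTTT
```

Because only the IDs and offsets are held in memory, this approach copes with files far too large to load as parsed records, at the cost of one seek per lookup.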

BioRuby 1.3.1 released

Posted on September 2, 2009 | NaohisaGoto

We are pleased to announce the release of BioRuby 1.3.1. This new release fixes many bugs that existed in 1.3.0.

Here is a brief summary of changes.

Refactoring of BioSQL support.
Bio::PubMed bug fixes.
Bio::NCBI::REST bug fixes.
Bio::GCG::Msf bug fixes.
Bio::Fasta::Report bug fixes and added support for multiple query sequences.
Bio::Sim4::Report bug fixes.
Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.
The BioRuby license has been clarified.

Many other changes, mainly bug fixes, have also been made. For more information, see the ChangeLog.

Biopython 1.51 released

Posted on August 17, 2009 | davidw

We are pleased to announce the release of Biopython 1.51. This new stable release enhances version 1.50 (released in April) by extending the functionality of existing modules, adding a set of application wrappers for popular alignment programs and fixing a number of minor bugs.

In particular, the SeqIO module can now write GenBank files that include features, and can handle FASTQ files created by Illumina 1.3+. Support for this format allows interconversion between the Sanger, Solexa and Illumina variants of FASTQ, following conventions agreed upon with the BioPerl and EMBOSS projects.
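The three FASTQ variants differ in two ways: the ASCII offset used to store each quality score, and (for old Solexa files) the score scale itself. The sketch below illustrates both using the standard Solexa/PHRED mapping; the format names in the comments are the ones used by Bio.SeqIO, but the functions are hypothetical helpers, not Biopython's code, and rounding to the nearest integer and clamping at -5 are our own choices.

```python
import math

# ASCII offsets of the three FASTQ variants (Bio.SeqIO format names in comments)
SANGER_OFFSET = 33    # "fastq" / "fastq-sanger": PHRED scores
SOLEXA_OFFSET = 64    # "fastq-solexa": Solexa scores (can be negative, down to -5)
ILLUMINA_OFFSET = 64  # "fastq-illumina" (Illumina 1.3+): PHRED scores


def solexa_to_phred(q_solexa):
    """Map a Solexa score onto the PHRED scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred):
    """Map a PHRED score onto the Solexa scale, clamped at -5 (our assumption)."""
    if q_phred <= 0:
        return -5.0  # the formula is undefined at PHRED 0
    return max(-5.0, 10 * math.log10(10 ** (q_phred / 10) - 1))


def decode(qual, offset):
    """Quality string -> list of integer scores."""
    return [ord(ch) - offset for ch in qual]


def encode(scores, offset):
    """List of (possibly non-integer) scores -> quality string."""
    return "".join(chr(int(round(q)) + offset) for q in scores)


def illumina_to_sanger(qual):
    # Same PHRED scores; only the ASCII offset changes
    return encode(decode(qual, ILLUMINA_OFFSET), SANGER_OFFSET)


def solexa_to_sanger(qual):
    # Different score scale as well as a different offset
    phred = [solexa_to_phred(q) for q in decode(qual, SOLEXA_OFFSET)]
    return encode(phred, SANGER_OFFSET)
```

For example, illumina_to_sanger("hh") gives "II" (PHRED 40 in both encodings). The Solexa and PHRED scales nearly coincide at high qualities but diverge at low ones, which is why a plain offset shift is not enough for older Solexa files.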

Biopython 1.51 beta released

Posted on June 23, 2009 | peterc

A beta release for Biopython 1.51 is now available for download and testing.

In the two months since Biopython 1.50 was released, we have introduced support for writing features in GenBank files using Bio.SeqIO, extended SeqIO’s support for the FASTQ format to include files created by Illumina 1.3+, added a new set of application wrappers for alignment programs, and made numerous tweaks and bug fixes.

All the new features have been tested by the dev team, but it’s possible there are cases we haven’t been able to foresee and test, especially for the GenBank feature writer (as there are just so many possible odd fuzzy feature locations).
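To show why locations make the feature writer fiddly, here is a hypothetical helper that renders a few of the simpler GenBank location forms: exact ranges, single bases, ranges with '<' or '>' fuzzy ends, complement() and join(). It assumes Python-style zero-based, half-open coordinates as input (the convention Biopython uses internally) and is a sketch, not Biopython's writer. It deliberately ignores rarer forms such as (102.110) for one base somewhere within a range, or 123^124 for a site between two bases.

```python
def format_location(start, end, strand=1, start_before=False, end_after=False):
    """Render one feature span in GenBank location syntax.

    start/end use zero-based, half-open coordinates; GenBank uses one-based,
    inclusive coordinates, so only the start shifts by one.
    start_before / end_after mark a fuzzy end ('<' or '>'), e.g. a partial CDS.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    first = ("<" if start_before else "") + str(start + 1)
    last = (">" if end_after else "") + str(end)
    # A single exact base is written as just its position, e.g. "5"
    loc = first if first == last else f"{first}..{last}"
    return f"complement({loc})" if strand == -1 else loc


def format_join(parts, strand=1):
    """Join several exact (start, end) spans, e.g. the exons of a spliced gene."""
    inner = ",".join(format_location(s, e) for s, e in parts)
    loc = f"join({inner})"
    return f"complement({loc})" if strand == -1 else loc


if __name__ == "__main__":
    print(format_location(0, 100, start_before=True, end_after=True))  # <1..>100
    print(format_location(4, 50, strand=-1))                           # complement(5..50)
    print(format_join([(0, 10), (19, 30)]))                            # join(1..10,20..30)
```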

Clever tricks with NCBI Entrez EInfo (& Biopython)

Posted on June 21, 2009 | peterc

Constructing complicated NCBI Entrez searches can be tricky, but it turns out that one of the Entrez Programming Utilities, EInfo, can help.

For example, suppose you want to search for mitochondrial genomes from a given taxon, either just in the Entrez web interface or for use in a script with EFetch.

I knew from past experience about using name[ORGN] in Entrez to search for an organism name - but how would you specify just mitochondria? I actually worked this out from the NCBI help and exploring the Entrez website’s advanced search - but it took a while.
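As a rough sketch of the raw web-service side (in Biopython you would normally call Bio.Entrez.einfo and parse the reply with Bio.Entrez.read), the following builds EInfo and ESearch URLs and pulls the search-field list out of an EInfo XML reply. It makes no network requests. The helper names are hypothetical, "Homo sapiens[ORGN]" is just an example term, and the XML layout assumed by field_names (each search field as a Field element with Name and FullName children) is our reading of NCBI's EInfo output, worth checking against a live reply.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of the NCBI Entrez Programming Utilities (E-utilities)
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def einfo_url(db):
    """EInfo URL for one database; the reply lists its search fields and links."""
    return EUTILS + "einfo.fcgi?" + urlencode({"db": db})


def esearch_url(term, db="nucleotide", retmax=20):
    """ESearch URL; urlencode escapes the spaces and [FIELD] brackets."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def field_names(einfo_xml):
    """Map each search field's short name (e.g. ORGN) to its full name."""
    root = ET.fromstring(einfo_xml)
    return {f.findtext("Name"): f.findtext("FullName") for f in root.iter("Field")}


if __name__ == "__main__":
    print(einfo_url("nucleotide"))
    print(esearch_url("Homo sapiens[ORGN]"))
```

Reading the FieldList this way is how you discover which [FIELD] tags a database accepts, rather than hunting through the advanced-search pages by hand.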

Dropping Python 2.3 Support

Posted on May 6, 2009 | johnm

As previously announced, any last-minute requests to postpone dropping support for Python 2.3 from the next release of Biopython must be posted to the main Biopython mailing list no later than Friday, May 8.

Biopython release 1.50

Posted on April 20, 2009 | peterc

We are pleased to announce Biopython release 1.50, featuring some significant additions since Biopython 1.49 was released late last year.

GenomeDiagram by Leighton Pritchard has been integrated into Biopython as the Bio.Graphics.GenomeDiagram module.

A new module Bio.Motif has been added, which is intended to replace the existing Bio.AlignAce and Bio.MEME modules. Also have a look at Bio.SwissProt and Bio.ExPASy and their revised parsers.

As noted in a previous news posting, Bio.SeqIO can now read and write FASTQ and QUAL files used in second generation sequencing work. In connection with this, our SeqRecord object has a new dictionary attribute, letter_annotations, for per-letter-annotation information like sequence quality scores or secondary structure predictions. Also, the SeqRecord object can now be sliced to give a new SeqRecord covering just part of the sequence.
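The point of slicing a record rather than just its sequence is that the per-letter annotations stay aligned with the letters. The toy class below (hypothetical, not Biopython's SeqRecord) sketches that behaviour under our own simplifications; with the real object you would simply write record[10:50], and FASTQ quality scores are held under the "phred_quality" key of letter_annotations.

```python
class ToyRecord:
    """Toy stand-in for a SeqRecord: a sequence plus per-letter annotations."""

    def __init__(self, seq, id="", letter_annotations=None):
        self.seq = seq
        self.id = id
        self.letter_annotations = dict(letter_annotations or {})
        # Every per-letter annotation must have exactly one entry per letter
        for key, values in self.letter_annotations.items():
            if len(values) != len(seq):
                raise ValueError(f"{key!r} needs one entry per letter")

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the sequence and every per-letter annotation in step
            sliced = {k: v[index] for k, v in self.letter_annotations.items()}
            return ToyRecord(self.seq[index], self.id, sliced)
        return self.seq[index]  # an integer index gives a single letter


if __name__ == "__main__":
    read = ToyRecord("ACGTAC", "read1", {"phred_quality": [40, 40, 30, 20, 10, 5]})
    trimmed = read[1:4]
    print(trimmed.seq, trimmed.letter_annotations["phred_quality"])  # CGT [40, 30, 20]
```

Trimming low-quality read ends is the obvious use: with the real SeqRecord, the quality scores travel with the trimmed sequence, so the result can be written straight back out as FASTQ.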

Biopython 1.50 beta released

Posted on April 3, 2009 | peterc

We are pleased to announce a beta release of Biopython 1.50 for public testing. There have been some significant changes since Biopython 1.49 was released late last year.

GenomeDiagram by Leighton Pritchard has been integrated into Biopython as the Bio.Graphics.GenomeDiagram module.

A new module Bio.Motif has been added, which is intended to replace the existing Bio.AlignAce and Bio.MEME modules. Also have a look at Bio.ExPASy and the revised Prosite and Enzyme parsers.

Biopython and next generation sequencing

Posted on March 26, 2009 | peterc

Those of you doing next-generation sequencing may be pleased to know that the next release of Biopython is expected to include support for reading and writing FASTQ and QUAL files within our Bio.SeqIO interface. These formats, with PHRED quality scores, are used for traditional Sanger capillary sequencing and for Roche 454 sequencing (Roche provides tools to convert from its binary SFF files). Solexa/Illumina sequencers produce a FASTQ variant in which the quality scores are encoded differently, and this is also supported.
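Going from FASTQ to a FASTA plus QUAL pair, for example, is mostly a matter of decoding each quality character back to an integer. The sketch below assumes simple, unwrapped four-line FASTQ records with Sanger-style PHRED scores (ASCII offset 33); the function names are hypothetical and this is not Bio.SeqIO's parser. Illumina 1.3+ files would need offset 64, and older Solexa files use a different score scale that must be converted, not just shifted.

```python
import io


def read_simple_fastq(handle):
    """Yield (title, sequence, quality string) from unwrapped FASTQ text.

    Assumes exactly four lines per record; wrapped FASTQ is not handled.
    """
    while True:
        title = handle.readline().rstrip("\n")
        if not title:  # end of file
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a four-line FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield title[1:], seq, qual


def fastq_to_fasta_and_qual(fastq, fasta_out, qual_out, offset=33):
    """Split FASTQ into FASTA + QUAL; offset=33 means Sanger-style PHRED."""
    count = 0
    for title, seq, qual in read_simple_fastq(fastq):
        scores = [ord(ch) - offset for ch in qual]
        fasta_out.write(f">{title}\n{seq}\n")
        qual_out.write(f">{title}\n{' '.join(map(str, scores))}\n")
        count += 1
    return count


if __name__ == "__main__":
    fastq = io.StringIO("@r1 demo\nACGT\n+\nII5+\n")
    fasta, qual = io.StringIO(), io.StringIO()
    fastq_to_fasta_and_qual(fastq, fasta, qual)
    print(qual.getvalue())
```

The four-line assumption matters because '@' and '+' are both legal quality characters, so a quality line can begin with what looks like a record header; a robust parser has to count letters rather than trust line prefixes.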

Biopython and version control systems

Posted on March 17, 2009 | peterc

Initially for evaluation purposes only, Giovanni and Bartek have set up a mirror of Biopython on GitHub, which is automatically updated from the OBF-hosted Biopython CVS repository. See our git migration wiki page for details. If this is favorably received, moving Biopython from CVS to git seems likely at some point this year.

Originally, all the OBF-hosted projects used CVS for their source code repositories. At the start of 2008, BioPerl and BioJava moved over to Subversion (SVN), followed by BioSQL. Biopython had planned to do the same, but this didn’t actually happen. Having all the Bio* projects use the same version control system would have simplified server administration for the OBF, but using SVN wouldn’t really have made much difference to Biopython development. Discussion on the Biopython development mailing list has since shifted towards next-generation distributed version control systems like git or Bazaar.
