BioPerl at GMOD Meeting 2010

BioPerl developers and users attended the BioPerl satellite meeting on January 13th, just prior to the GMOD Meeting.  Several items were covered on the agenda:

  • In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (i.e. Bio::DB::SeqFeature::Store or similar).  We are also contemplating adding lazy parsing for some parsers, possibly using the Bio::PullParserI methods (or similar) that Sendu Bala created.
  • After a final 1.6 branch point release, we may ‘freeze’ BioPerl in a maintenance mode, primarily so that we can reorganize core into several more easily installed subdistributions on a branch.  New modules will essentially be additional separate repos that will depend on BioPerl core.  This reorganization has been discussed for a few years now, and as we edge closer to starting this (probably this spring) we’ll announce more details.
  • Some initial thoughts on how to handle circular genomes more efficiently.  We essentially do this already, but it isn’t foolproof.
  • Need some significant time dedicated towards GFF3-based coding (reimplement FeatureIO but allow some flexibility).  Rob Buels had started the initial run at splitting out FeatureIO, so next step is a true reimplementation.
  • We don’t plan on including Moose support for the immediate future, feeling that it would be better to reimplement some of the classes from scratch using Moose and similar as a BioPerl 2.0, or possibly await the impending Rakudo Perl 6 alpha and start afresh using that instead of Moose.

Anything we missed?  Anything you would like to address?  Please add comments and we’ll discuss them on list.

[Read More]

BOSC 2010 Request for Input

BOSC 2010 is currently in the planning stages. It will be held for 2 days in conjunction with the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010) in Boston, Massachusetts, USA. The dates of BOSC 2010 are July 9-10; the main ISMB Conference runs July 11-13, 2010.  The BOSC 2010 web site can be accessed here:  /wiki/BOSC_2010.

The BOSC organizing committee is soliciting input on the planning of BOSC 2010 so that we can make it a successful and productive conference for the O|B|F community.  You may send your suggestions to the bosc@open-bio.org e-mail address  or add suggestions to the BOSC 2010 talk/discussion wiki page at: /wiki/Talk:BOSC_2010. Please respond to any or all of the questions below:

[Read More]

Sanger FASTQ format and the Solexa/Illumina variants

I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects:

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) and Peter M. Rice (EMBOSS). Nucleic Acids Research, doi:10.1093/nar/gkp1137

This will hopefully serve as a reference describing the original standard Sanger FASTQ, and the two variants from Solexa/Illumina, and how to inter-convert between them.
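To make the inter-conversion concrete, here is a minimal Python sketch of how the three encodings relate. It assumes the conventions described in the paper: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and Solexa stores Solexa-scaled scores with an offset of 64. The function names are made up for illustration and do not come from any of the OBF libraries.

```python
import math

SANGER_OFFSET = 33    # Sanger FASTQ: PHRED scores, ASCII offset 33
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ: PHRED scores, ASCII offset 64
SOLEXA_OFFSET = 64    # Solexa FASTQ: Solexa scores, ASCII offset 64


def illumina_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string as Sanger (same PHRED scores)."""
    return "".join(
        chr(ord(char) - ILLUMINA_OFFSET + SANGER_OFFSET) for char in quality
    )


def solexa_to_phred(q_solexa):
    """Map a Solexa-scaled score onto the PHRED scale (result is a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def solexa_to_sanger(quality):
    """Re-encode a Solexa quality string as Sanger, rounding to whole PHRED scores."""
    return "".join(
        chr(int(round(solexa_to_phred(ord(char) - SOLEXA_OFFSET))) + SANGER_OFFSET)
        for char in quality
    )


def sanger_to_illumina(quality, max_phred=62):
    """Re-encode Sanger as Illumina 1.3+.

    Assumption for this sketch: scores above max_phred are rejected rather
    than silently truncated, because they do not fit the printable range.
    """
    scores = [ord(char) - SANGER_OFFSET for char in quality]
    if any(score > max_phred for score in scores):
        raise ValueError("PHRED score too large for Illumina 1.3+ encoding")
    return "".join(chr(score + ILLUMINA_OFFSET) for score in scores)


if __name__ == "__main__":
    print(illumina_to_sanger("hhhh"))  # four PHRED 40 scores in Sanger encoding
    print(solexa_to_sanger("h@;"))     # Solexa scores 40, 0 and -5
```

Converting from Solexa loses a little information because the mapped scores are rounded to integers, so a round trip is not guaranteed to reproduce low-quality values exactly.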

[Read More]

BioPerl interview in latest FLOSS Weekly

Two of the core BioPerl developers, Jason Stajich and Chris Fields, were interviewed by FLOSS Weekly.  The interview is now available as an MP3 on the FLOSS Weekly website; several streaming versions (including podcast) are also available.

BioPerl 1.6.1 released

We are pleased to announce the immediate availability of BioPerl 1.6.1, the latest release of BioPerl’s core code. You can grab it here:

Via CPAN:

http://search.cpan.org/~cjfields/BioPerl-1.6.1/

Via the BioPerl website:

http://bioperl.org/DIST/BioPerl-1.6.1.tar.bz2
http://bioperl.org/DIST/BioPerl-1.6.1.tar.gz
http://bioperl.org/DIST/BioPerl-1.6.1.zip

The PPM for Windows should also finally be available this week, ActivePerl problems permitting (we will post more information when it becomes available).

Tons of bug fixes and changes have been incorporated into this release. For a more complete change list please see the ‘Changes’ file included with the distribution.

[Read More]

Working with FASTQ files in Biopython when speed matters

Biopython 1.51 onward includes support for Sanger, Solexa and Illumina 1.3+ FASTQ files in Bio.SeqIO, which allows a lot of neat tricks very concisely. For example, the tutorial (PDF) has examples of finding and removing primer or adaptor sequences.

However, because the Bio.SeqIO interface revolves around SeqRecord objects, there is often a speed penalty. For example, with FASTQ files the quality string gets turned into a list of integers on parsing, and is then re-encoded back to ASCII on writing.
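The round trip described above can be illustrated with a small standalone sketch. It is not Biopython code: `iter_fastq_strings`, `decode_sanger` and `encode_sanger` are hypothetical helpers written for this example, and the parser assumes the simple four-lines-per-record layout with no line wrapping. Biopython ships its own low-level string-based parser, `FastqGeneralIterator` in `Bio.SeqIO.QualityIO`, which is the one to reach for in practice.

```python
from io import StringIO


def iter_fastq_strings(handle):
    """Yield (title, sequence, quality) as plain strings.

    Assumes exactly four lines per record and no line wrapping.
    """
    while True:
        title_line = handle.readline()
        if not title_line:
            return  # end of file
        if not title_line.startswith("@"):
            raise ValueError("Expected a FASTQ title line starting with '@'")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield title_line[1:].rstrip("\n"), sequence, quality


def decode_sanger(quality):
    """Decode a Sanger quality string into a list of PHRED integers."""
    return [ord(char) - 33 for char in quality]


def encode_sanger(scores):
    """Encode a list of PHRED integers back into a Sanger quality string."""
    return "".join(chr(score + 33) for score in scores)


def filter_by_length(in_handle, out_handle, min_length):
    """Copy records of at least min_length bases, never decoding the qualities."""
    kept = 0
    for title, sequence, quality in iter_fastq_strings(in_handle):
        if len(sequence) >= min_length:
            out_handle.write("@%s\n%s\n+\n%s\n" % (title, sequence, quality))
            kept += 1
    return kept


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n@read2\nAC\n+\n!!\n"
    output = StringIO()
    print(filter_by_length(StringIO(example), output, min_length=3))
    print(output.getvalue(), end="")
```

For a task like length filtering, the quality string passes through untouched, so the `decode_sanger` and `encode_sanger` steps that an object-based interface performs for every record are simply skipped.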

[Read More]

Biopython CVS to git migration

The release of Biopython 1.52 earlier this week marked the end of an era: it was our last release using CVS for source code control.

As of now, Biopython is using a git repository hosted on github.com, which kindly provides git hosting for open source projects free of charge. The BioRuby project has been using github for some time, so we are in good company.

Our existing OBF hosted CVS repository will be maintained in the short to medium term as a backup, but will not be updated.

[Read More]

BioRuby 1.3.1 released

We are pleased to announce the release of BioRuby 1.3.1. This new release fixes many bugs that existed in 1.3.0.

Here is a brief summary of changes.

  • Refactoring of BioSQL support.
  • Bio::PubMed bug fixes.
  • Bio::NCBI::REST bug fixes.
  • Bio::GCG::Msf bug fixes.
  • Bio::Fasta::Report bug fixes and added support for multiple query sequences.
  • Bio::Sim4::Report bug fixes.
  • Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.
  • License of BioRuby is clarified.

In addition, many other changes have been made, mainly bug fixes. For more information, see the ChangeLog.

[Read More]

Biopython 1.51 released

We are pleased to announce the release of Biopython 1.51. This new stable release enhances version 1.50 (released in April) by extending the functionality of existing modules, adding a set of application wrappers for popular alignment programs and fixing a number of minor bugs.

In particular, the SeqIO module can now write GenBank files that include features, and deal with FASTQ files created by Illumina 1.3+. Support for this format allows interconversion between FASTQ files using the Solexa, Sanger and Illumina variants, following conventions agreed upon with the BioPerl and EMBOSS projects.

[Read More]