Biopython 1.74 released

Dear Biopythoneers,

Biopython 1.74 has been released and is available from our website and PyPI.

This release of Biopython supports Python 2.7, 3.4, 3.5, 3.6 and 3.7. However, it will be the last release to support Python 3.4, which is now at end-of-life. It has also been tested on PyPy2.7 v6.0.0 and PyPy3.5 v6.0.0.

(Please note we will be dropping support for Python 2.7 in early 2020.)

Over half our code is now explicitly available under either our original “Biopython License Agreement”, or the very similar but more commonly used “3-Clause BSD License”. See the LICENSE.rst file for more details.

[Read More]

Chromosome Diagrams in Biopython

One of the new things coming in Biopython 1.59 is improved chromosome diagrams, something you may have seen via Twitter. I’ve just been updating the Biopython Tutorial (current version here, PDF) to include an example drawing this:

tRNA genes in Arabidopsis thaliana

Here’s a PDF version too. This example just parses the Arabidopsis thaliana GenBank files to get the chromosome lengths and the tRNA gene placements. There are so many tRNA on the forward strand of Chr I that their labels are forced to overlap. Here the figure just uses a different color for each chromosome, but you can color each feature individually.
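As a rough sketch of the parsing step described above (not the Tutorial's own code), the helper below collects the chromosome length and the tRNA placements from an already parsed record. The function name `trna_placements` is hypothetical; it only assumes the record exposes `features`, `feature.type` and `feature.location` with `start`, `end` and `strand`, as Biopython's SeqRecord objects do.

```python
def trna_placements(record):
    """Return (chromosome_length, [(start, end, strand), ...]) for tRNA features.

    `record` is assumed to behave like a Biopython SeqRecord parsed from
    a GenBank file: len(record) is the sequence length and each feature
    has a .type and a .location with .start, .end and .strand.
    """
    placements = []
    for feature in record.features:
        # Keep only the tRNA gene annotations.
        if feature.type != "tRNA":
            continue
        location = feature.location
        placements.append((int(location.start), int(location.end), location.strand))
    return len(record), placements
```

With Biopython installed, this could be called as `trna_placements(SeqIO.read("NC_003070.gbk", "genbank"))`, where the file name is only an illustrative assumption.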

[Read More]

Introduction of OpenID logins for OBF wikis

Due to a huge influx of spam across all OBF wikis, we are in the process of locking down new user account creation and adding OpenID logins for the OBF wikis (BioPerl example). User account creation via the old login system will be disabled, and OpenID will be the default path for new accounts so that users can make wiki changes. So far this appears to have cut the incidence of spam significantly. We will be adding information to the login pages to redirect new users to the new login page.

[Read More]

BioRuby paper published

After 10 years of development, the BioRuby paper is finally published in the Bioinformatics journal.  The article is open access, so please take a look.

BioRuby: Bioinformatics software for the Ruby programming language. Naohisa Goto, Pjotr Prins, Mitsuteru Nakao, Raoul Bonnal, Jan Aerts and Toshiaki Katayama. Bioinformatics 2010; doi: 10.1093/bioinformatics/btq475

BioPerl has moved to GitHub

BioPerl has migrated to git and GitHub!  We have also set up a mirror set of several key repositories at the great public git hosting site repo.or.cz.

If you are a current BioPerl developer (had a previous account for direct access to our prior Subversion repository), please sign up for a GitHub account and let us know your user ID.  Also, add the extra email (where ‘DEVNAME’ is your original Subversion account ID).  This should map any previous commits from the older Subversion and CVS repository to your new GitHub account.

[Read More]

Illumina FASTQ files - Read Segment Quality Control Indicator

In another quirk of the FASTQ story, recent Illumina FASTQ files don’t actually use the full range of PHRED scores, and a score of 2 has a special meaning: the Read Segment Quality Control Indicator (RSQCI, encoded as ‘B’).
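To make the encoding concrete, here is a minimal sketch that decodes an Illumina quality string and trims a trailing run of ‘B’ characters. It assumes the Illumina 1.5+ style encoding with an ASCII offset of 64 (so ‘B’, ASCII 66, is PHRED 2); the function names are hypothetical, and trimming the flagged segment is one possible way to handle it, not a procedure prescribed by Illumina.

```python
ILLUMINA_OFFSET = 64   # assumed ASCII offset for Illumina 1.3+/1.5+ FASTQ
RSQCI_CHAR = "B"       # chr(64 + 2), i.e. PHRED score 2


def illumina_phred_scores(quality):
    """Decode an Illumina-encoded quality string into integer PHRED scores."""
    return [ord(char) - ILLUMINA_OFFSET for char in quality]


def trim_rsqci(sequence, quality):
    """Drop the trailing read segment flagged with a run of 'B' qualities."""
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    # Length of the read once the trailing run of 'B' is removed.
    keep = len(quality.rstrip(RSQCI_CHAR))
    return sequence[:keep], quality[:keep]
```

For example, a read whose quality string ends in `BBB` would lose its last three bases and quality characters.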

Hats off to Dr Torsten Seemann for raising awareness of this issue in his post on the seqanswers.com forum, referring to a presentation by Tobias Mann of Illumina which says:

The Read Segment Quality Control Indicator:

[Read More]

Making Biopython SeqIO and AlignIO easier

One of the small changes coming in Biopython 1.54 (which you can try out already using the Biopython 1.54 beta) is to Bio.SeqIO and Bio.AlignIO. Previously the input and output functions required file handles, but they will now also accept filenames.

This is a case of practicality beats purity (to quote the Zen of Python), and is particularly handy when doing very short scripts or working at the Python prompt.

For example, filtering a FASTA file to take only entries with a minimum length of 100 can be done like this (with handles):
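A minimal sketch of such a filter, assuming Biopython 1.54 or later is installed. It is not the original post's code: the helper names and the file names in the usage note are hypothetical. The first function passes open handles to Bio.SeqIO, the second passes filenames directly.

```python
from Bio import SeqIO

MIN_LENGTH = 100  # minimum sequence length to keep, as in the text


def filter_with_handles(in_path, out_path, min_length=MIN_LENGTH):
    """Filter a FASTA file using explicit file handles."""
    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        records = (rec for rec in SeqIO.parse(in_handle, "fasta")
                   if len(rec) >= min_length)
        # SeqIO.write returns the number of records written.
        return SeqIO.write(records, out_handle, "fasta")


def filter_with_filenames(in_path, out_path, min_length=MIN_LENGTH):
    """Same filter, letting Bio.SeqIO open and close the files itself."""
    records = (rec for rec in SeqIO.parse(in_path, "fasta")
               if len(rec) >= min_length)
    return SeqIO.write(records, out_path, "fasta")
```

Either function could be called as `filter_with_filenames("input.fasta", "filtered.fasta")`; both return the number of records kept.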

[Read More]

Sanger FASTQ format and the Solexa/Illumina variants

I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects:

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) and Peter M. Rice (EMBOSS). Nucleic Acids Research, doi:10.1093/nar/gkp1137

This will hopefully serve as a reference describing the original standard Sanger FASTQ, and the two variants from Solexa/Illumina, and how to inter-convert between them.
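As an illustration of the inter-conversion, here is a small sketch of converting the two variants to Sanger FASTQ quality strings. It assumes the usual conventions: Sanger stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and the older Solexa variant stores Solexa scores with an offset of 64, mapped to PHRED via Q_phred = 10 log10(10^(Q_solexa/10) + 1). The function names are hypothetical, and rounding to the nearest integer is an assumption of this sketch.

```python
import math

SANGER_OFFSET = 33   # Sanger FASTQ: PHRED scores, ASCII offset 33
VARIANT_OFFSET = 64  # Solexa and Illumina 1.3+ variants: ASCII offset 64


def solexa_to_phred(q_solexa):
    """Map a Solexa quality score to the equivalent (float) PHRED score."""
    return 10 * math.log10(10 ** (q_solexa / 10.0) + 1)


def illumina_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string (PHRED+64) as Sanger (PHRED+33)."""
    return "".join(chr(ord(char) - VARIANT_OFFSET + SANGER_OFFSET)
                   for char in quality)


def solexa_to_sanger(quality):
    """Re-encode a Solexa quality string (Solexa+64) as Sanger (PHRED+33)."""
    converted = []
    for char in quality:
        q_phred = int(round(solexa_to_phred(ord(char) - VARIANT_OFFSET)))
        converted.append(chr(q_phred + SANGER_OFFSET))
    return "".join(converted)
```

The two score scales agree closely at high quality and differ mainly at the low end, which is why the Solexa case needs the logarithmic mapping while the Illumina 1.3+ case is a plain offset shift.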

[Read More]

BioPerl core 1.6.1 PPM available

BioPerl 1.6.1 is now available for ActivePerl as a PPM; instructions for downloading can be found on the BioPerl wiki. This has been tested only for ActivePerl 5.10 and above, so any feedback with older versions of ActivePerl would be greatly appreciated.