Biopython 1.52 released

We are pleased to announce the availability of Biopython 1.52, a new stable release of the Biopython library.

It may only have been one month since the last release, but in that time we’ve added enough useful features to warrant a new release. In particular, Biopython 1.52 includes more substantial support for population genetics, and adds new functions that will be useful for people working with next generation sequencing.

Tiago Antao’s work on the Population Genetics module brings a command-line wrapper for GenePop, which allows the estimation of F-statistics, null allele frequencies and migration rates, as well as tests for isolation by distance (IBD) and deviation from Hardy-Weinberg equilibrium.
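Biopython's wrapper hands the actual statistics to GenePop. As a rough illustration of what a Hardy-Weinberg deviation test measures, here is a minimal Pearson chi-square sketch for a single biallelic locus. The function names are hypothetical, and this is not GenePop's method: GenePop offers exact tests, which behave better for small samples.

```python
"""Chi-square test for Hardy-Weinberg equilibrium at one biallelic locus.

Illustrative sketch only (hypothetical helper names); GenePop itself
provides exact tests, which are preferable for small samples.
"""
from typing import Tuple


def hwe_expected(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float, float]:
    """Expected genotype counts under HWE, estimated from observed counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    return (p * p * n, 2 * p * q * n, q * q * n)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for HWE deviation."""
    observed = (n_aa, n_ab, n_bb)
    expected = hwe_expected(n_aa, n_ab, n_bb)
    # Skip empty expected classes to avoid dividing by zero (monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Counts in exact HWE proportions give a statistic of zero.
    print(hwe_chi_square(25, 50, 25))  # 0.0
    # A heterozygote deficit pushes the statistic up.
    print(hwe_chi_square(40, 20, 40))  # 36.0
```

With 1 degree of freedom (three genotype classes, minus one, minus one estimated allele frequency), a statistic above about 3.84 suggests deviation at the 5% level.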

[Read More]

Simpler, optimized format conversion with Biopython

As per Peter’s recent post, we are using this space to show off a couple of the new features in Biopython 1.52 before it is released. In this post we’ll look at the new convert() function that both Bio.SeqIO and Bio.AlignIO will get in Biopython 1.52.

No one has ever complained that bioinformatics just doesn’t have enough file formats; you probably frequently find yourself converting sequence files to suit particular applications with Bio.SeqIO. At the moment this is usually a two-step process, something like this:
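In Biopython terms, that two-step process pairs `SeqIO.parse()` (to read records in one format) with `SeqIO.write()` (to write them in another), and the new `SeqIO.convert()` takes the input, input format, output and output format in a single call. The post promises optimized conversion; one way a dedicated converter can be faster is by not building full record objects for every entry. The sketch below illustrates that idea for FASTQ to FASTA using only the standard library. `fastq_to_fasta` is a hypothetical helper, not Biopython's implementation, and it assumes unwrapped four-line FASTQ records.

```python
"""Streaming FASTQ -> FASTA conversion without building record objects.

Assumes simple four-line FASTQ records (no wrapped sequence or quality
lines); a hypothetical helper, not Biopython's own implementation.
"""
import io
from typing import TextIO


def fastq_to_fasta(in_handle: TextIO, out_handle: TextIO, wrap: int = 60) -> int:
    """Copy titles and sequences from FASTQ to FASTA; return the record count."""
    count = 0
    while True:
        title = in_handle.readline()
        if not title:
            break  # end of input
        if not title.strip():
            continue  # tolerate blank lines between records
        seq = in_handle.readline().strip()
        plus = in_handle.readline()
        # The quality line is only length-checked, never decoded: FASTA drops it.
        qual = in_handle.readline().strip()
        if not title.startswith("@") or not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"Malformed FASTQ record near {title.strip()!r}")
        out_handle.write(">" + title[1:].rstrip() + "\n")
        # Wrap long sequences, as FASTA writers conventionally do.
        for start in range(0, len(seq), wrap):
            out_handle.write(seq[start:start + wrap] + "\n")
        count += 1
    return count


if __name__ == "__main__":
    fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nHHHH\n"
    out = io.StringIO()
    n = fastq_to_fasta(io.StringIO(fastq), out)
    print(n)  # 2
    print(out.getvalue(), end="")
```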

[Read More]

Indexing sequence files with Biopython

The forthcoming release of Biopython 1.52 will include a couple of nice improvements to the Bio.SeqIO module, and here we’re going to introduce the new index function. This will of course be covered in the Biopython Tutorial & Cookbook (PDF) once this code is released.

Suppose you have a large sequence file with many, many individual sequences in it. This could be next-generation sequencing data, for example a FASTQ, FASTA or QUAL file. Or it might be a big annotation-rich file, such as the whole of UniProt or a chunk of GenBank.
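`SeqIO.index()` takes a filename and a format and returns a read-only, dictionary-like object keyed by record ID, parsing each record only when you look it up. The underlying idea can be sketched in plain Python: scan the file once, remember the byte offset where each record starts, and seek straight there on demand. `FastaIndex` below is a hypothetical class for FASTA only, keyed on the first word of each title line; it is not Biopython's code.

```python
"""Minimal offset index for a FASTA file, in the spirit of Bio.SeqIO.index.

Hypothetical class for illustration: it scans the file once, remembers
the byte offset of each '>' title line, and only reads a record when
it is looked up. Keys are the first word of each title line.
"""
import io
from collections.abc import Mapping
from typing import BinaryIO, Dict, Iterator, Tuple


class FastaIndex(Mapping):
    def __init__(self, handle: BinaryIO) -> None:
        self._handle = handle
        self._offsets: Dict[str, int] = {}
        handle.seek(0)
        while True:
            offset = handle.tell()  # byte position before reading the line
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                key = line[1:].split(None, 1)[0].decode()
                if key in self._offsets:
                    raise ValueError(f"Duplicate key {key!r}")
                self._offsets[key] = offset

    def __getitem__(self, key: str) -> Tuple[str, str]:
        """Return (title, sequence) by seeking straight to the record."""
        self._handle.seek(self._offsets[key])  # KeyError if absent
        title = self._handle.readline()[1:].decode().rstrip()
        chunks = []
        for line in iter(self._handle.readline, b""):
            if line.startswith(b">"):
                break  # start of the next record
            chunks.append(line.decode().strip())
        return title, "".join(chunks)

    def __contains__(self, key: object) -> bool:
        return key in self._offsets  # avoid reading the record just to test

    def __iter__(self) -> Iterator[str]:
        return iter(self._offsets)

    def __len__(self) -> int:
        return len(self._offsets)


if __name__ == "__main__":
    data = b">seq1 first\nACGT\nAC\n>seq2\nGGGG\n"
    idx = FastaIndex(io.BytesIO(data))
    print(len(idx))     # 2
    print(idx["seq1"])  # ('seq1 first', 'ACGTAC')
```

Only the offsets dictionary lives in memory, which is why this style of index scales to files far too large to load as a list of records.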

[Read More]

BioRuby 1.3.1 released

We are pleased to announce the release of BioRuby 1.3.1. This new release fixes many bugs that existed in 1.3.0.

Here is a brief summary of changes.

  • Refactoring of BioSQL support.
  • Bio::PubMed bug fixes.
  • Bio::NCBI::REST bug fixes.
  • Bio::GCG::Msf bug fixes.
  • Bio::Fasta::Report bug fixes and added support for multiple query sequences.
  • Bio::Sim4::Report bug fixes.
  • Added unit tests for Bio::GCG::Msf and Bio::Sim4::Report.
  • The license of BioRuby has been clarified.

In addition, many other changes have been made, mainly bug fixes. For more information, see the ChangeLog.

[Read More]

Biopython 1.51 released

We are pleased to announce the release of Biopython 1.51. This new stable release enhances version 1.50 (released in April) by extending the functionality of existing modules, adding a set of application wrappers for popular alignment programs and fixing a number of minor bugs.

In particular, the SeqIO module can now write GenBank files that include features, and can handle FASTQ files created by Illumina 1.3+. Support for this format allows interconversion between the Sanger, Solexa and Illumina variants of FASTQ, using conventions agreed upon with the BioPerl and EMBOSS projects.
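The three FASTQ variants differ in how quality scores are stored: "fastq-sanger" (Biopython's plain "fastq") uses PHRED scores with an ASCII offset of 33, "fastq-illumina" uses PHRED scores with an offset of 64, and "fastq-solexa" uses Solexa scores, also offset by 64. Solexa scores are based on odds rather than probabilities, so Q_phred = 10 log10(10^(Q_solexa/10) + 1). Bio.SeqIO does this conversion for you when you read one variant and write another; the sketch below only shows the arithmetic, and its function names are hypothetical.

```python
"""Re-encode FASTQ quality strings between the three variants.

  fastq-sanger    PHRED scores,  ASCII offset 33
  fastq-illumina  PHRED scores,  ASCII offset 64 (Illumina 1.3+)
  fastq-solexa    Solexa scores, ASCII offset 64
Hypothetical helpers for illustration; no range validation is done.
"""
import math
from typing import List

OFFSETS = {"fastq-sanger": 33, "fastq-illumina": 64, "fastq-solexa": 64}


def solexa_to_phred(q_solexa: float) -> float:
    """Solexa (odds-based) score -> PHRED (probability-based) score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def phred_to_solexa(q_phred: float) -> float:
    """PHRED score -> Solexa score, clamped at -5, the Solexa minimum."""
    if q_phred <= 0:
        return -5.0  # PHRED 0 has no finite Solexa equivalent
    return max(-5.0, 10 * math.log10(10 ** (q_phred / 10) - 1))


def decode(qual: str, variant: str) -> List[float]:
    """Quality string -> list of PHRED scores."""
    scores = [ord(ch) - OFFSETS[variant] for ch in qual]
    if variant == "fastq-solexa":
        return [solexa_to_phred(q) for q in scores]
    return [float(q) for q in scores]


def encode(phred: List[float], variant: str) -> str:
    """List of PHRED scores -> quality string in the given variant."""
    if variant == "fastq-solexa":
        scores = [phred_to_solexa(q) for q in phred]
    else:
        scores = phred
    return "".join(chr(OFFSETS[variant] + round(q)) for q in scores)


def convert_quality(qual: str, src: str, dst: str) -> str:
    return encode(decode(qual, src), dst)


if __name__ == "__main__":
    print(convert_quality("h;", "fastq-solexa", "fastq-sanger"))    # I"
    print(convert_quality("hI", "fastq-illumina", "fastq-sanger"))  # I*
```

Converting into Solexa scores is lossy at the low end: PHRED 0 has no Solexa equivalent, so this sketch clamps to the Solexa minimum of -5.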

[Read More]

Minutes: 2009 BOSC Meeting

Note: this is preliminary and needs some editing.

OBF Business Meeting at BOSC 2009

  • Location: Rica Talk Hotel, Restaurant 2nd floor
  • Present:
    • current BoD members: Hilmar Lapp, Kam Dahlquist
    • Guests:

Called meeting to order at 7.05pm.

  • Explained the purpose of the business meeting and how it differs from the conference call, which will make decisions
  • Introductions: Hilmar, Kam, everyone else around the table (see guests)

Discussion:

  • OBF does a very good job of serving its member projects
  • could look at open source projects more globally; specifically could come up with an open source policy statement that is global
  • Kam: in agreement; we are overdue in coming up with a broader vision statement
    • should maybe open this up and do it in an open and transparent manner
  • Location of incorporation issue
    • Has not been pursued further since the last conference call.
    • Issue of being under US export restrictions, and the open question of whether we can legally accept membership applications from citizens of countries under US trade sanctions.
    • Incorporating in Canada would be an option. Generally, need to look carefully at the local laws as to what they permit in terms of transferring funds.
  • Open-source hosting portals: what are the benefits of the various sites?
    • Github: very much focused on the code, working with other people.
    • SourceForge: temporary inaccessibility problems and interface changes can be nasty, especially for web hosting. They also
  • Shouldn’t OBF take a more outspoken role in software development best practices, patterns, and standards?
    • Historically, OBF has provided a forum for such practice recommendations to be promulgated and for the community to converge on common practices, but has otherwise remained neutral.
    • Is that maybe a role the member projects could fill? For example, EMBOSS and BioJava have done this in their communities.
    • Some mistakes are often repeated. Maybe create a repository of organizational knowledge about practices that have proven efficient, and anti-patterns.
    • What makes a project an Open Bio project? Could formalize some recommendations based on empirical experiences and lessons learned.
    • We could easily be duplicating efforts: there are many other organizations and fields defining design patterns and recommended practices.
    • There are still many new people coming into bioinformatics from different backgrounds. They shouldn’t have to learn the hard way how not to do things.
      • Once people are in the field, they really need to take the first step of looking beyond their own little sphere themselves.
      • However, training and education needs to be built into the undergrad and graduate curricula.
      • Can and should the OBF play a role in this, for example by compiling and providing resources, information, and tutorial material?
      • ISCB actually has an education committee, and it takes simply showing up to participate. Members come from many different perspectives and backgrounds. A subgroup (which seems invitation-only) is working on
      • Should OBF maybe initiate an open resource (as opposed to a purchased book) for education and training? Really needs a critical mass of people to push it forward.
      • OBF could provide a coordination or point of contact or counselor role to help potential authors retain copyright when they publish a book.
    • caBIG infrastructure is increasingly mature, and NIH grant applicants are increasingly required to build on it or to be compliant with it.
    • Everyone reinventing the wheel continues to be an issue. Need to build local partnerships between faculty members to further break down the “do it on our own” attitude.
    • caBIG had to dole out money for “early adopters”.
    • Biologists can be highly biased purely due to the application domain.
  • People may have odd preconceptions about open-source projects solely because they haven’t looked at the supporting community yet.
  • Bioinformaticists often are trying to solve a specific problem, only to move on to something else immediately afterwards.
  • BOSC conference:
    • Should keep requiring abstract submissions (rather than just title, for example), but should not require full papers to remain sufficiently inclusive.
    • Tutorials could be held in a session concurrent with the main ISMB conference.
      • These parallel tracks are separate submissions to ISCB. Lonnie and Peter volunteer to work out a proposal for 2010.
  • OBF remit: OBF should continue to exist, support its member projects, and promote open-source software among biologists.
  • Conference proceedings:
    • Open source software work

Biopython 1.51 beta released

A beta release for Biopython 1.51 is now available for download and testing.

In the two months since Biopython 1.50 was released, we have introduced support for writing features in GenBank files using Bio.SeqIO, extended SeqIO’s support for the FASTQ format to include files created by Illumina 1.3+, added a new set of application wrappers for alignment programs, and made numerous tweaks and bug fixes.

All the new features have been tested by the dev team, but it’s possible there are cases that we haven’t been able to foresee and test, especially for the GenBank feature writer (as there are just so many possible odd fuzzy feature locations).

[Read More]

Clever tricks with NCBI Entrez EInfo (& Biopython)

Constructing complicated NCBI Entrez searches can be tricky, but it turns out one of the Entrez Programming Utilities called Entrez EInfo can help.

For example, suppose you want to search for mitochondrial genomes from a given taxon, either in the Entrez web interface or for use in a script with EFetch.

I knew from past experience about using name[ORGN] in Entrez to search for an organism name - but how would you specify just mitochondria? I actually worked this out from the NCBI help and exploring the Entrez website’s advanced search - but it took a while.
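Calling EInfo for a particular database (for example `db="nucleotide"`) returns XML describing it, including a FieldList of searchable fields whose short names are what go inside the brackets, as in `name[ORGN]`. In Biopython, `Entrez.read(Entrez.einfo(db="nucleotide"))` parses that into a record with the fields under `record["DbInfo"]["FieldList"]` (set `Entrez.email` first). To stay runnable offline, the sketch below parses a hand-trimmed sample with the standard library instead; the sample XML and the helper names are illustrative assumptions, not real NCBI output or Biopython API.

```python
"""List the searchable fields in an Entrez EInfo response.

SAMPLE_EINFO is a hand-trimmed illustration of the response shape;
real NCBI output has many more fields and extra tags per field.
"""
import xml.etree.ElementTree as ET
from typing import Dict

SAMPLE_EINFO = """<eInfoResult>
  <DbInfo>
    <DbName>nucleotide</DbName>
    <FieldList>
      <Field><Name>ALL</Name><FullName>All Fields</FullName></Field>
      <Field><Name>ORGN</Name><FullName>Organism</FullName></Field>
    </FieldList>
  </DbInfo>
</eInfoResult>"""


def search_fields(einfo_xml: str) -> Dict[str, str]:
    """Map each field's short Name (used as term[NAME]) to its FullName."""
    root = ET.fromstring(einfo_xml)
    return {
        field.findtext("Name"): field.findtext("FullName")
        for field in root.iterfind("./DbInfo/FieldList/Field")
    }


def tag_term(text: str, field: str, fields: Dict[str, str]) -> str:
    """Build a search term like 'Homo sapiens[ORGN]', checking the field exists."""
    if field not in fields:
        raise KeyError(f"{field!r} is not a searchable field in this database")
    return f"{text}[{field}]"


if __name__ == "__main__":
    fields = search_fields(SAMPLE_EINFO)
    for name, full_name in fields.items():
        print(f"{name}\t{full_name}")
    print(tag_term("Homo sapiens", "ORGN", fields))  # Homo sapiens[ORGN]
```

Running the same listing over a real EInfo response is a quick way to discover which field tag a query needs, instead of hunting through the help pages.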

[Read More]

BOSC Update: Ruttenberg, Hanmer confirmed as Keynotes, Early Registration Deadline Friday

Alan Ruttenberg of Science Commons and Robert Hanmer of the Hillside Group have been confirmed as Keynote Speakers for BOSC 2009.  For more information, see the BOSC 2009 web site at /wiki/BOSC_2009.

Abstract acceptances went out today; stay tuned for the schedule, which will be posted once the speakers have confirmed their invitations.

The early registration deadline for BOSC is Friday, May 15; don’t forget to take advantage of the discounted fee for early registrants at http://www.iscb.org/ismbeccb2009/registration.php.

[Read More]