BioPerl has moved to GitHub

BioPerl has migrated to git and GitHub! We have also set up mirrors of several key repositories at the public git hosting site repo.or.cz.

If you are a current BioPerl developer (that is, you had an account with direct access to our prior Subversion repository), please sign up for a GitHub account and let us know your user ID. Also, add the extra email (where ‘DEVNAME’ is your original Subversion account ID). This should map any previous commits from the older Subversion and CVS repositories to your new GitHub account.

[Read More]

O|B|F Google Summer of Code Accepted Students

I’m pleased to announce the acceptance of OBF’s 2010 Google Summer of Code students, listed in alphabetical order with their project titles and primary mentors:

Mark Chapman (PM Andreas Prlic) - Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms

Jianjiong Gao (PM Peter Rose) - BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins

Kazuhiro Hayashi (PM Naohisa Goto) - Ruby 1.9.2 support of BioRuby

Sara Rayburn (PM Christian Zmasek) - Implementing Speciation & Duplication Inference Algorithm for Binary and Non-binary Species Tree

[Read More]

Illumina FASTQ files - Read Segment Quality Control Indicator

In another quirk to the FASTQ story, recent Illumina FASTQ files don’t actually use the full range of PHRED scores, and a score of 2 has a special meaning: the Read Segment Quality Control Indicator (RSQCI, encoded as ‘B’).
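As a rough illustration of what this means in practice, here is a minimal Python sketch. It assumes the Illumina encoding of PHRED score plus 64 (which is why a score of 2 appears as ‘B’), and it assumes the indicator shows up as a run of ‘B’ characters at the end of the quality string. The function names are made up for this example and do not come from any OBF library.

```python
ILLUMINA_OFFSET = 64  # assumption: Illumina 1.3+/1.5+ files store PHRED + 64


def decode_illumina_quality(quality):
    """Turn an Illumina quality string into a list of PHRED scores."""
    return [ord(ch) - ILLUMINA_OFFSET for ch in quality]


def trim_rsqci(sequence, quality):
    """Drop the trailing run of 'B' (PHRED 2) from a read.

    Hypothetical helper: it treats every trailing 'B' as part of the
    Read Segment Quality Control Indicator.
    """
    keep = len(quality.rstrip("B"))
    return sequence[:keep], quality[:keep]


seq, qual = trim_rsqci("ACGTACGTAC", "hhhgfeBBBB")
print(seq, qual)                      # ACGTAC hhhgfe
print(decode_illumina_quality("hB"))  # [40, 2]
```

Whether to trim such a segment or merely flag it depends on the downstream analysis; the sketch only shows how to recognise it.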

Hats off to Dr Torsten Seemann for raising awareness of this issue in his post on the seqanswers.com forum, referring to a presentation by Tobias Mann of Illumina which says:

The Read Segment Quality Control Indicator:

[Read More]

Reminder: BOSC Abstract Deadline April 15

Just a friendly reminder that abstracts for BOSC 2010 are due next Thursday, April 15.  See the BOSC web site at /wiki/BOSC_2010 for details.  Submissions will only be accepted electronically at http://events.open-bio.org/BOSC2010/openconf.php.

Graduate students, don’t forget we are offering $250 student travel awards this year. Be sure to check the box indicating that you are a graduate student to be considered for the award.

We are also pleased to announce that Guy Coates, Group Leader of the Informatics Systems Group at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/), and Ross Gardler, Vice President of the Apache Software Foundation, will be giving keynote presentations at BOSC.

[Read More]

O|B|F in Google Summer of Code

The Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer’s Google Summer of Code.  Our list of project ideas and mentors is linked from the O|B|F GSoC page.

Student applications must be submitted to Google by April 9, 2010; see the official GSoC 2010 FAQ. That is less than three weeks away!

[Read More]

BOSC 2010 Call for Abstracts

**Abstract submissions for the 11th Annual Bioinformatics Open Source Conference (BOSC 2010) are now open.**

At-a-glance

  • BOSC is an ISMB 2010 Special Interest Group (SIG)
  • Date: July 9-10, 2010
  • Location: Boston, Massachusetts, USA
  • BOSC 2010 web site: /wiki/BOSC_2010
  • Abstract submission via Open Conference System site: http://events.open-bio.org/BOSC2010/openconf.php
  • E-mail: bosc@open-bio.org
  • Bosc-announce list: http://lists.open-bio.org/mailman/listinfo/bosc-announce

Important Dates

  • April 15: Abstract deadline
  • May 5: Notification of accepted abstracts
  • May 28: Early Registration Discount Cut-off date
  • July 8-9: Codefest 2010
  • July 9-10: BOSC 2010
  • August 15: Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics

[Read More]

BioPerl at GMOD Meeting 2010

BioPerl developers and users attended the BioPerl satellite meeting on January 13th, just prior to the GMOD Meeting.  Several items were covered on the agenda:

  • In order to start addressing whole genome data with more lightweight objects, we are planning on setting up a lightweight Bio::SeqI object that has a flexible DB backend (e.g. Bio::DB::SeqFeature::Store or similar).  We are also contemplating adding lazy parsing for some parsers, possibly using the Bio::PullParserI methods (or similar) that Sendu Bala created.
  • After a final 1.6 branch point release, we may ‘freeze’ BioPerl in a maintenance mode, primarily so that we can reorganize core into several more easily installed subdistributions on a branch.  New modules will essentially be additional separate repos that will depend on BioPerl core.  This reorganization has been discussed for a few years now, and as we edge closer to starting this (probably this spring) we’ll announce more details.
  • Some initial thoughts on how to handle circular genomes more efficiently.  We essentially do this already, but it isn’t foolproof.
  • We need some significant time dedicated to GFF3-based coding (reimplement FeatureIO but allow some flexibility).  Rob Buels has started the initial run at splitting out FeatureIO, so the next step is a true reimplementation.
  • We don’t plan on including Moose support for the immediate future, feeling that it would be better to reimplement some of the classes from scratch using Moose and similar as a BioPerl 2.0, or possibly await the impending Rakudo Perl 6 alpha and start afresh using that instead of Moose.

Anything we missed?  Anything you would like to address?  Please add comments and we’ll discuss them on list.

[Read More]

BOSC 2010 Request for Input

BOSC 2010 is currently in the planning stages. It will be held for 2 days in conjunction with the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010) in Boston, Massachusetts, USA. The dates of BOSC 2010 are July 9-10; the main ISMB Conference runs July 11-13, 2010.  The BOSC 2010 web site can be accessed here:  /wiki/BOSC_2010.

The BOSC organizing committee is soliciting input on the planning of BOSC 2010 so that we can make it a successful and productive conference for the O|B|F community.  You may send your suggestions to bosc@open-bio.org or add them to the BOSC 2010 talk/discussion wiki page at /wiki/Talk:BOSC_2010. Please respond to any or all of the questions below:

[Read More]

Sanger FASTQ format and the Solexa/Illumina variants

I’m delighted to announce an open access publication in Nucleic Acids Research describing the FASTQ file format based on the conventions agreed by the OBF projects:

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
Peter J. A. Cock (Biopython), Christopher J. Fields (BioPerl), Naohisa Goto (BioRuby), Michael L. Heuer (BioJava) and Peter M. Rice (EMBOSS)
Nucleic Acids Research, doi:10.1093/nar/gkp1137

This will hopefully serve as a reference describing the original standard Sanger FASTQ, and the two variants from Solexa/Illumina, and how to inter-convert between them.
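To give a flavour of the inter-conversion, here is a small Python sketch of the quality-string mappings. It assumes the usual conventions: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and the older Solexa variant stores Solexa scores with an offset of 64. The function names are hypothetical and are not the API of Biopython, BioPerl, BioRuby, BioJava or EMBOSS, which each provide their own conversion routines.

```python
import math

SANGER_OFFSET = 33    # assumption: Sanger FASTQ stores PHRED + 33
ILLUMINA_OFFSET = 64  # assumption: Solexa and Illumina 1.3+ store score + 64


def illumina_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string (PHRED + 64) as Sanger."""
    return "".join(
        chr(ord(ch) - ILLUMINA_OFFSET + SANGER_OFFSET) for ch in quality
    )


def solexa_to_phred(q_solexa):
    """Map a Solexa score to the nearest integer PHRED score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def solexa_to_sanger(quality):
    """Re-encode a Solexa quality string (Solexa score + 64) as Sanger."""
    return "".join(
        chr(solexa_to_phred(ord(ch) - ILLUMINA_OFFSET) + SANGER_OFFSET)
        for ch in quality
    )


print(illumina_to_sanger("hB"))  # I#
print(solexa_to_sanger("h@;"))   # I$"
```

The Illumina 1.3+ case is a pure shift of the ASCII offset, while the Solexa case also needs the score transformation, which only matters noticeably for low scores.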

[Read More]