Bioperl 2005 Summary

I just wanted to use the end of the year as a chance to reflect on what we’ve accomplished in 2004 and think about what 2005 holds for Bioperl.

What happened in 2004?

First of all, this year has been productive at a level perhaps only appreciated by the folks who read the bioperl-guts-l list, which lists the CVS commits. New modules, bugfixes and code improvements have been steadily making their way into the codebase. Not only has there been lots of traffic, but more people are contributing code and fixes.

We have also seen increased contributions to the HOWTOs, which we hope will be an effective place to explain how to use sets of modules to complete a particular task. We are continually working to improve the documentation. For a developer, this is a balance between trying to get something accomplished for their own research and wanting other people to use their code (without having to field lots of emails about a particular module). Open source software written solely by volunteers suffers from a reward system which values code over documentation and tutorials. We welcome ideas on changes which would help this and are currently thinking about ways to reward the productive documenters as well as coders.

We had a chance to have a 5-day Bootcamp in June thanks to Sylvain Foisy, the University of Montreal and the Quebec Bioinformatics Network (BioneQ). We hope to do another one of these in 2006. If there is general interest in more widespread Bioperl tutorials, please send your requests to me or to the bioperl list and we can consider how something like this could be organized in conjunction with a conference or meeting.

How popular is Bioperl?

The 2002 paper has 60+ citations according to Web of Science and we’re seeing use in a broader context than just sequence analysis. At least one published paper has appeared about modules which were already part of the codebase, suggesting that software availability and collaboration can happen prior to publication. The website consistently gets around 300,000 hits per month, which isn’t bad considering that the content doesn’t change very much and this is just a site for one toolkit for a specific aspect of science. The bioperl-l mailing list has seen an average of 341 mails per month (not correcting for spam), with a lot of questions answered and ideas hashed out.

How can you help out?

I want to use this chance to also appeal to those of you who use Bioperl and have been sitting on your hands waiting to jump in. It is a collaborative project that only works if new people jump in and contribute ideas and manpower. We’ve had many examples of people who have just jumped on board the project, fixed some bugs, contributed a module and gone on their merry way. We’ve also had other people who have jumped in, contributed code, and found themselves fully engaged in the project and its internal workings almost immediately. Not to wax poetic, but it was about 5 years ago that, fresh out of college, I started reading the mailing list, read Steve Chervitz’s email plea for people to “ask not what Bioperl can do for you, ask what you can do for Bioperl” and just jumped right in. I can only hope to influence some more folks who might have wanted to contribute but were waiting for the invitation. Well, come on over, we’d love to have you taking part.

As for some specifics:

- Parsing of Species information from the ORGANISM lines in SwissProt, GenBank, and EMBL is pretty spotty and could take some work.
- Some more parsers for formats that people have asked for
  - a Spidey parser (NCBI’s mRNA -> genomic alignment tool)
- Work on the Structure modules for dealing with protein structure data
- Integrate new applications into bioperl-run and further clean up the existing modules so they are more consistent
- Volunteer to be the next release master.

What does the future hold for Bioperl?

We expect to have a 1.5 release of Bioperl in the 1st quarter of 2005. This is the domain of Aaron Mackey, who agreed to be the release master (he has his hands full right now, but I’m sure he will ask for help when he needs it). The release should incorporate many new modules and bug fixes while remaining compatible with the 1.4 API. Details on the schedule for 1.5 will follow sometime after the holidays.

The future depends entirely on who steps up to work on the project next year. In 2005, I am resolving to step back from the front guard of mailing list question answering. This is in part to finish my PhD research and focus on building more specific tools to support my research questions, but it is also time for other people to contribute, share the spotlight and be the know-it-all. Bioperl is very much a labor of love and an integral part of the tools I use in my own work, so I expect to focus more directly on those things I need in the coming year and help out where I can.

My hope is that some of the new folks who have stepped up to contribute will help by continuing the course we have set: high quality releases, a full test suite, POD documentation for every module, and overall documentation for using modules in HOWTOs and tutorials. If there are new or unexplored areas the project should consider, I hope that you will speak up and suggest them.

There is discussion afoot that a new Bioperl object model may be born. This has been called Bioperl2 and Bioperl-NG. The idea is that it would try to create a leaner and cleaner code base which does things like event-based parsing and autogenerated code for things like getters/setters, and which could do things faster and more easily than we currently can. Generally there is a lot of legacy code and legacy design in Bioperl and it would be beneficial to have a project that was free of these constraints. At the same time there is an expectation that a project like this would need to achieve something more than what the current Bioperl API can do, so it is incumbent on the new project to set goals that are higher than what Bioperl can do today.

Thank you

Finally, I’d like to thank some people who have done a lot this year. Of course I’m not going to remember to name everyone, but I just wanted to highlight some folks who have endeavored not only to get the toolkit to do what they want, but also to help other people get started with it.

The people who have kept the project going. These are the usual suspects who have labored to do the dirty grunt work: cleaning up boring bugs, adding documentation, preparing a release, keeping the servers going, etc. They also code, but I wanted to highlight that they have really been critical to keeping the project going by doing the things that most people don’t want to bother with.

- Brian Osborne
- Aaron Mackey
- Chris Dagdigian
- Kyle Jenson (mailing list and site searching at http://search.open-bio.org)

Some usual suspects who have been helping maintain their modules and generally being Bioperl knowledgeable on the list:

- Scott Cain
- Steve Chervitz
- Allen Day
- Donald Jackson
- Stefan Kirov
- Hilmar Lapp
- Josh Lauricha
- Heikki Lehvaslaiho
- Chris Mungall
- Jurgen Plentinckx
- Lincoln Stein

There are several new people who have taken up the slack as those before them have drifted on to other commitments (metaphoric slack of course, not trying to accuse anyone of being a ‘slacker’). Thanks for jumping in, fixing bugs, running tests, giving feedback, and just getting involved. It is really encouraging when the project can be a two-way street and not just a one-way flow of information going out from a few people who post answers to the list.

- Richard Adams
- Sean Davis
- Rob Edwards
- Nathan Haigh
- Marc Logghe
- Barry Moore
- Remo Sanges
- James Thompson
- Koen van der Drift (Bioperl available via fink on OS X)

Thanks also to Peter van Heusden and Electric Genetics, who are undertaking a code audit of Bioperl and should have plenty of helpful feedback for us.

I’ve probably forgotten some people. Please post a followup if I have neglected someone, as I would like you to be recognized for your work since we don’t give out a whole lot else right now.

A safe and prosperous New Year to you all.

Jason Stajich on behalf of the Bioperl core developers.