Codefest 2010

From Open Bioinformatics Foundation

Latest revision as of 18:12, 7 September 2015

OpenBio Codefest 2010 will take place July 7th and 8th, 2010, in conjunction with BOSC 2010. This is an opportunity for OpenBio developers from projects like BioPerl, BioJava, Biopython, BioRuby, and EMBOSS to work collaboratively on improving Open Source Bioinformatics code.

Goals

OpenBio projects are typically coordinated remotely, with users from all over the world contributing and organizing themselves through mailing lists and IRC chats. Additionally, contributors work on these projects in their spare time, balancing improvements to the projects with their day jobs and life outside of the computer. The objective of the Codefest is to give these talented developers a chance to focus fully on the projects for a few days, interacting in real time. Previous hackathons have been immensely successful at producing new high-quality code and innovative project developments.

The general aim of the Codefest is to improve the accessibility, functionality and interoperability of the existing libraries. The specific goals are determined by the interests of attending members and input from sponsors. Some current topics of discussion are:

Cloud computing

Improving the presence of OpenBio libraries on distributed computing environments like Amazon Elastic Compute Cloud and Eucalyptus. Ntino has written up an excellent project proposal, available for download in PDF format (http://www.open-bio.org/w/images/8/85/BOSC_2010_Cloud_BioLinux.pdf).

Work has started on an automated build environment that incorporates the Cloud BioLinux and bioperl-max efforts. See the blog post (http://bcbio.wordpress.com/2010/05/08/automated-build-environment-for-bioinformatics-cloud-images/) for full details. Code and configuration files are available from a GitHub repository (http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/). The post outlines several areas for improvement that could be targets for focused work at the Codefest.
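Because the build environment is driven by configuration files, one step in such a build is turning a package list into install commands. The Python sketch below assumes a hypothetical config, already parsed into a dict, that groups Debian/Ubuntu package names by category; the project's real config layout may differ, and the category and package names are only illustrative.

```python
import shlex

# Hypothetical package config, as it might look after parsing a YAML file.
# Categories and package names are illustrative, not the project's real list.
PACKAGE_CONFIG = {
    "alignment": ["bwa", "bowtie"],
    "perl": ["bioperl"],
    "python": ["python-biopython"],
}


def flatten_packages(config, categories=None):
    """Return a sorted, de-duplicated package list for the chosen categories."""
    selected = categories if categories is not None else config.keys()
    packages = set()
    for category in selected:
        packages.update(config.get(category, []))  # unknown categories add nothing
    return sorted(packages)


def install_command(packages):
    """Build a non-interactive apt-get command line for the given packages."""
    return " ".join(["apt-get", "install", "-y"] + [shlex.quote(p) for p in packages])


if __name__ == "__main__":
    print(install_command(flatten_packages(PACKAGE_CONFIG)))
```

Sorting and de-duplicating keeps the generated command identical between builds with the same config, which makes successive image builds easier to compare.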

Semantic Web

The 3rd DBCLS BioHackathon focused on Semantic Web technologies in bioinformatics. As a result, in addition to UniProt, several database providers, including DDBJ, PDBj and KEGG, have started to provide their data in RDF. This Linked Data can be queried with SPARQL, and the Biopython and BioRuby groups made initial attempts to provide a high-level library for biological queries. We propose to extend this effort to all OpenBio projects, building a standard interface (query builder, ontology mapping, etc.) for the major biological SPARQL endpoints and for handling RDF files.
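To make the idea of a shared query builder concrete, here is a minimal Python sketch. It assembles a SPARQL SELECT query from (subject, predicate, object) patterns and encodes it as an HTTP GET request URL using the `query` parameter defined by the SPARQL 1.1 Protocol. The endpoint URL and the helper names `build_select` and `get_request_url` are hypothetical, and the UniProt core prefix and terms in the usage are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import urlencode


def build_select(variables, patterns, prefixes=None, limit=None):
    """Assemble a SPARQL SELECT query from (subject, predicate, object) patterns."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(variables) + " WHERE {")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


def get_request_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request (query= parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Hypothetical endpoint; the prefix is assumed to be UniProt's core namespace.
    endpoint = "http://sparql.example.org/sparql"
    query = build_select(
        ["?protein", "?name"],
        [("?protein", "a", "up:Protein"), ("?protein", "up:mnemonic", "?name")],
        prefixes={"up": "http://purl.uniprot.org/core/"},
        limit=10,
    )
    print(query)
    print(get_request_url(endpoint, query))
```

Keeping query construction separate from transport lets each OpenBio library reuse the same builder with its own HTTP client.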

To achieve this goal, we also need to develop an integrated or distributed triple store such as BioGateway. In our experience, generating and storing large-scale sets of RDF triples is still a major issue, even with standard triple stores. Additionally, we will try to translate natural-language biological queries into SPARQL using NLP techniques.
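At its core, a triple store answers pattern queries in which any position may be a wildcard. The following in-memory Python sketch is a toy, not BioGateway's design or a scalable store: it keeps one index per position so that a pattern with a bound subject, predicate or object only scans matching buckets. The class name and the `ex:` identifiers used with it are hypothetical.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory triple store with subject, predicate and object indexes."""

    def __init__(self):
        self._triples = set()
        self._by_s = defaultdict(set)
        self._by_p = defaultdict(set)
        self._by_o = defaultdict(set)

    def add(self, s, p, o):
        """Add a triple; adding the same triple twice has no effect."""
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_s[s].add(triple)
        self._by_p[p].add(triple)
        self._by_o[o].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        candidates = None
        for value, index in ((s, self._by_s), (p, self._by_p), (o, self._by_o)):
            if value is None:
                continue
            bucket = index.get(value, set())  # .get does not create empty buckets
            # Intersect the buckets of every bound position.
            candidates = bucket if candidates is None else candidates & bucket
        if candidates is None:
            candidates = self._triples  # fully unbound pattern: everything matches
        return sorted(candidates)

    def __len__(self):
        return len(self._triples)
```

For example, `store.match(p="ex:encodedBy", o="ex:g2")` intersects the predicate and object buckets before returning results. Keeping such indexes for very large numbers of triples is where real stores run into the scaling problems mentioned above.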

Location

  • July 7, 2010 (10am to whenever) -- Countway Library of Medicine at the Harvard Medical School, 10 Shattuck St, Boston
    • Getting there: take the Green Line E train (toward Heath St) to the Brigham Circle stop. Cross over to the right side of the street and walk back a bit in the direction from which you came. There is a passageway directly before the Harvard School of Public Health; walk down the steps into the courtyard area behind the Harvard School of Public Health, and continue straight back through the courtyard and up the steps on the other side. Turn left at the top of the stairs and you will see the Countway Library on your left. When you arrive, the security guard should have your name; if not, tell the guard you are attending the Hackathon and you will be let through. If you get lost, use the PDF map (http://www.open-bio.org/w/images/2/25/Brigham_Circle_-_Countway_Map.pdf) as a guide.
  • July 8, 2010 (10:30am to whenever) -- Massachusetts General Hospital (MGH), Simches Research Building, Room 3130, 185 Cambridge Street, Boston, MA
    • Getting there: The three closest MBTA stops are 'Charles/MGH' (Red Line), 'Bowdoin' (Blue Line) and 'Government Center' (Green Line). All three are located next to Cambridge Street. From the Red Line stop, walk away from the river; from either the Blue or Green Line stop, walk towards the river. Simches is located slightly off Cambridge Street; the closest landmark on Cambridge Street is a big yellow Au Bon Pain. Walk up the stairs next to the Au Bon Pain and you'll be in a parking lot: the Simches building is straight in front of you across the parking lot, to the left of the Whole Foods and CVS. Walk into the lobby and the security desk should have a badge for you. Take the elevators up to the 3rd floor. Room 3130 is down the only non-secured corridor, on the left just past the cafeteria.

Resources

  • Amazon Web Services -- We will be able to use EC2 and other Amazon services thanks to Amazon grant support for a proposal put together by Steffan Moellar. Many thanks to Amazon for supporting the initiative and Steffan for putting together the application.
  • Eucalyptus -- The Universitätsklinikum Schleswig-Holstein is sponsoring a 12-node local Eucalyptus cloud for testing and development. Thanks again to Steffan for making this available.

Sponsorship

Space and internet for the Codefest are kindly provided by the Harvard School of Public Health Bioinformatics Core and Massachusetts General Hospital. We are actively seeking sponsors to help supplement the travel, lodging and meal costs for developers. If you're interested in contributing to Open Source development in Bioinformatics and helping to direct the focus of the Codefest, please contact Brad.

ToDo List

Add your goals and plans for the Codefest here. This is a brainstorming section to help us organize ourselves.

Cloud computing

Tasks for the current community bioinformatics image (framework on GitHub: http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/):

  • Perl library support and useful package list
  • Organize Java libraries and expand the list of useful packages
  • Provide packaging for missing programs. See the comments at the end of the package config (packages.yaml) for some targets.
  • Documentation, especially targeted at new users.
  • Produce an automated manifest for an AMI, listing versions of all installed packages and libraries.
  • Provide standard data like indexed next-gen genomes, BLAST databases, and so on via EBS snapshots.
  • Automation to build an AMI from the latest code and roll it out to Amazon on a biweekly or monthly basis.
  • Website with documentation and AMI history.
  • Testing on Eucalyptus clouds.
  • Titus Brown's post on using cloud computing to teach a next-gen sequencing course.
  • Richard Holland's post on getting started with Amazon EC2.
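For the manifest item, one possible approach on a Debian/Ubuntu-based image (an assumption; the list does not say how the manifest should be produced) is to capture the output of `dpkg-query -W` with a `${Package}\t${Version}\n` format and turn it into a package-to-version map written in a stable, diffable order. The sketch below works on captured text so it runs anywhere; the sample versions and the AMI id are made up.

```python
import json


def parse_dpkg_listing(text):
    """Parse 'package<TAB>version' lines into a {package: version} dict."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, version = line.partition("\t")
        manifest[name.strip()] = version.strip()
    return manifest


def render_manifest(manifest, ami_id):
    """Serialize the manifest as sorted, indented JSON tagged with an AMI id."""
    return json.dumps({"ami": ami_id, "packages": manifest}, indent=2, sort_keys=True)


# On a live image the text could come from (with `import subprocess`):
# subprocess.run(["dpkg-query", "-W", "-f=${Package}\t${Version}\n"],
#                capture_output=True, text=True).stdout
SAMPLE = "bwa\t0.5.7-1\nbioperl\t1.6.1-1\n"  # illustrative versions only

if __name__ == "__main__":
    print(render_manifest(parse_dpkg_listing(SAMPLE), "ami-00000000"))
```

Sorted keys mean two manifests can be compared with an ordinary text diff, which also fits the AMI history item.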

Suggested Additions for Cloud computing image

Perl

Ruby

Python

Java

Data

  • UniRef (ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/uniref)
    • UniRef50 and UniRef90, and if you've got the space, UniRef100, too.
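A data-staging script for the image could compute the download locations for the requested UniRef clusters. This sketch assumes each cluster is published as `<name>/<name>.fasta.gz` under ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/uniref/; that layout is an assumption to check against the server before relying on it.

```python
# Assumed base directory and file layout; verify against the FTP server.
UNIREF_BASE = "ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/uniref/"


def uniref_urls(include_uniref100=False):
    """Return FASTA download URLs for UniRef50/90 and, optionally, UniRef100."""
    clusters = ["uniref50", "uniref90"]
    if include_uniref100:  # the largest set: only if the volume has the space
        clusters.append("uniref100")
    return [f"{UNIREF_BASE}{name}/{name}.fasta.gz" for name in clusters]


if __name__ == "__main__":
    for url in uniref_urls(include_uniref100=True):
        print(url)
```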

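Semantic Web

  • Build SPARQL endpoints for DDBJ RDF, PDBj RDF, MEDLINE, KEGG and UniProt.
  • Complete support for SPARQL endpoints in BioRuby and Biopython and, if possible, in BioPerl and BioJava as well.
    • Brief introduction to RDF/SPARQL in OpenBio (http://hackathon3.dbcls.jp/wiki/ImplementationBootcamp)
    • Example queries against Bio2RDF (http://sourceforge.net/apps/mediawiki/bio2rdf/index.php?title=Demo_queries)
    • Example queries against BioGateway (http://www.semantic-systems-biology.org/biogateway/querying)
    • SemWeb:PDB2RDF
    • SemWeb:Bio2RDF_Endpoints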

Bioperl/BioSQL

  • Include SQLite support in BioSQL
  • DBIx::Class integration with BioSQL
  • A bit on Moose and Perl 6
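As a starting point for SQLite support, the sketch below uses Python's built-in sqlite3 module with a heavily simplified, hypothetical subset of two BioSQL tables (biodatabase and bioentry). It is not the official BioSQL schema, which has many more tables and columns, and `open_biosql`/`add_entry` are illustrative names. It mainly shows one SQLite specific a port has to handle: foreign-key enforcement is off unless it is enabled on each connection.

```python
import sqlite3

# Simplified, hypothetical subset modeled on BioSQL's biodatabase/bioentry
# tables; the official schema has many more tables and columns.
SCHEMA = """
CREATE TABLE biodatabase (
    biodatabase_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    authority      TEXT,
    description    TEXT
);
CREATE TABLE bioentry (
    bioentry_id    INTEGER PRIMARY KEY,
    biodatabase_id INTEGER NOT NULL REFERENCES biodatabase(biodatabase_id),
    name           TEXT NOT NULL,
    accession      TEXT NOT NULL,
    version        INTEGER NOT NULL DEFAULT 0,
    description    TEXT,
    UNIQUE (accession, biodatabase_id, version)
);
"""


def open_biosql(path=":memory:"):
    """Open a SQLite database with foreign keys enforced and the schema loaded."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, db_name, name, accession, version=1, description=None):
    """Insert a bioentry, creating its namespace (biodatabase row) if needed."""
    conn.execute("INSERT OR IGNORE INTO biodatabase (name) VALUES (?)", (db_name,))
    (db_id,) = conn.execute(
        "SELECT biodatabase_id FROM biodatabase WHERE name = ?", (db_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO bioentry (biodatabase_id, name, accession, version, description)"
        " VALUES (?, ?, ?, ?, ?)",
        (db_id, name, accession, version, description),
    )
    conn.commit()
```

Inserting the same accession and version twice into one namespace raises `sqlite3.IntegrityError`, as does a bioentry pointing at a missing biodatabase row.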

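Key signing

  • Sign OpenPGP keys for OBF project release managers; see, e.g., The Keysigning Party HOWTO (http://cryptnet.net/fdp/crypto/keysigning_party/en/keysigning_party.html).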
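Key signing hinges on checking that the fingerprint a key owner presents in person matches the key being signed. A small Python helper for that comparison could look like the sketch below. It assumes 40-hex-digit OpenPGP v4 fingerprints, whose long key ID is the last 16 hex digits (the low-order 64 bits of the fingerprint); the function names are illustrative.

```python
import string


def normalize_fingerprint(fpr):
    """Drop spaces/colons and upper-case a fingerprint; reject non-v4 lengths."""
    cleaned = "".join(ch for ch in fpr if not ch.isspace() and ch != ":").upper()
    if len(cleaned) != 40 or any(ch not in string.hexdigits for ch in cleaned):
        raise ValueError(f"not a 40-hex-digit v4 fingerprint: {fpr!r}")
    return cleaned


def fingerprints_match(printed, from_keyring):
    """True if two fingerprints are identical after normalization."""
    return normalize_fingerprint(printed) == normalize_fingerprint(from_keyring)


def long_key_id(fpr):
    """Long (64-bit) key ID of a v4 key: the last 16 hex digits of its fingerprint."""
    return normalize_fingerprint(fpr)[-16:]
```

Normalizing first means a fingerprint copied from a printed sheet (grouped, mixed case) compares cleanly against one pasted from a keyring listing.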

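Attendees

  • Hilmar Lapp
  • Michael Heuer
  • Chris Fields (arriving Wednesday around 2pm)
  • Ntino Krampis
  • Tim Booth
  • Peter Cock (unable to attend)
  • Kam Dahlquist
  • John David N. Dionisio
  • Richard Holland (TBC)
  • Bela Tiwari
  • Oliver Buckley (changed job)
  • Dominique Belhachemi
  • Toshiaki Katayama
  • Mitsuteru NAKAO
  • Dave Messina
  • Christian Zmasek
  • Kimberly Begley
  • Akira KINJO (PDBj)
  • Naohisa Goto
  • Yasunori Yamamoto
  • Raoul J.P. Bonnal
  • Atsuko Yamaguchi
  • Christopher Bottoms (plus wife Marcie and daughter Zebadee)
  • James Taylor
  • Enis Afgan
  • Dannon Baker
  • Shuichi Kawashima
  • Jin-Dong Kim
  • Heikki Lehvaslaiho
  • Eric Talevich
  • Marc-Alexandre Nolin (Bio2RDF)
  • Scott Cain
  • Aaron McKenna
  • Roger A Hall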

Feel free to add yourself if you are interested. We are happy to have you.

BBQ

After two days of hard work, there will be a celebratory BBQ at Brad's house in Somerville the evening of July 8th. All are welcome for drinks and whatever magic I can whip up on my little charcoal grill.

The easiest way to get there is by cab. From Mass General Hospital, walk up Cambridge Street a few blocks to the Liberty Hotel, where there is a cab stand. Ask the cab driver to take you to Medford Street in Somerville via McGrath Highway. Partridge Avenue is off Medford Street, on the right a few blocks after Central Street. I'll pass out my cell phone number to everyone during the coding sessions in case more directions are needed en route.

Discussion

We welcome any thoughts from interested participants. Please direct discussion to the OpenBio mailing list: open-bio-l@lists.open-bio.org.

For short-lived coordination tasks during the hackathon, an IRC channel has been set up on FreeNode: #codefest

Please use the hashtag #bosc2010 on Twitter to help remote folks follow the discussion.