-

Difference between revisions of "Google Summer of Code"

From Open Bioinformatics Foundation
(Biogeographical and community phylogenetics for BioPython)
(GSoC 2016: Link to new site)
 
(283 intermediate revisions by 30 users not shown)
Line 1: Line 1:
The O|B|F is applying for the first time for the [http://socghop.appspot.com/ Google Summer of Code] (GSoC) program as an umbrella organization for all O|B|F-affiliated projects.
+
[[Image:GSoC15-logo-small.jpg|right|frame|link=http://code.google.com/soc]]  
  
On this page we are collecting ideas, possible projects, prerequisites, possible solution approaches, mentors, other people or channels to contact for more information or to bounce ideas off of, etc.
+
Google Summer of Code (GSoC) is a student internship program for open-source projects. The program offers eligible student developers stipends to write code for open source projects over a period of 3 summer months ("flip bits, not burgers"). See the '''[http://code.google.com/soc Google Summer of Code Main Site]''' for general information about the Google Summer of Code program, how to apply, frequently asked general questions, and more.
  
== News ==
 
  
*  08 Mar 2009: The project ideas page (the page you are looking at) is ready for adding project ideas. ''--[[User:Lapp|Lapp]]''
+
== GSoC 2016 ==
  
== Contact ==
+
The Google Summer of Code 2016 is ON! OBF is once again applying as a GSoC mentoring organization this year. Interested mentors and students should subscribe to the OBF/GSoC [http://lists.open-bio.org/mailman/listinfo/gsoc mailing list]. Please announce yourself, so we know who you are!
 +
The details of each of our project ideas are listed below, including potential mentors.
  
Our organization administrators are [[User:Lapp|Hilmar Lapp]] ([mailto:hlapp%40gmx%2enet hlapp@gmx.net]) and [[User:Mauricio|Mauricio Herrera Cuadra]] ([mailto:mauricio%40open-bio%2eorg mauricio@open-bio.org]).
+
See http://obf.github.io/GSoC/ for more information about the GSoC program and additional ways to get in touch with us.
  
If you are a student interested in applying for a Google Summer of Code project with our organization, please send any questions you have, projects you would like to propose, etc., to the developer mailing list of the pertinent O|B|F project.
+
<!--Our GSoC ideas from each project are collected here: '''[[Google Summer of Code 2015 Ideas |OBF Project Ideas for GSoC 2015]]''' -->
 +
=== Facts & Links ===
  
How do you know which project is pertinent and the address of its developer mailing list? The projects under the O|B|F umbrella are listed below, with home page and developer mailing lists. Each project idea lists the O|B|F project it is a part of; look it up in the list below and you have the information you need. If you want to propose your own project idea and the project to which you would contribute isn't obvious, send email to [mailto:open-bio-l%40open-bio%2eorg open-bio-l&#64;open-bio&#46;org].
+
; Time Line :
 +
:* [https://developers.google.com/open-source/gsoc/timeline GSoC time line]
  
Some of us also hang out regularly on IRC; see the list of O|B|F projects below for information on which projects have a channel and the name of the channel. ''(If you do not have an IRC client installed, you might find the [http://en.wikipedia.org/wiki/List_of_IRC_clients comparison on Wikipedia], the [http://directory.google.com/Top/Computers/Software/Internet/Clients/Chat/IRC/ Google directory], or the [http://www.ircreviews.org/clients/ IRC Reviews] helpful. For Macs, [http://en.wikipedia.org/wiki/X-Chat X-Chat Aqua] works pretty well. If you have never used IRC, try the [http://irchelp.org/irchelp/ircprimer.html IRC Primer] at [http://irchelp.org/ IRC Help], which also has links to lots of other material.)''
+
; GSoC 2016 FAQ :
 +
:* For questions of eligibility, see the [https://developers.google.com/open-source/gsoc/faq GSoC 2016 FAQ].
  
For applying, please make sure you read our [[#What_should_prospective_students_know.3F|documentation on information that students should know and guidelines we expect you to follow]] ''before'' you apply. We don't have a format template for application that you need to adhere to, but we do ask that you include specific kinds of information. What those are is documented under "[[#When you apply|When you apply]]."
+
; Info from Google :
 +
:* There is also a [http://groups.google.com/group/google-summer-of-code-discuss Google group for posting GSoC questions] (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
 +
:* Students receive a stipend from Google if accepted. See the  [https://developers.google.com/open-source/gsoc/faq GSoC 2016 FAQ] for full documentation.
 +
:* Development is done entirely remotely and on-line; there is no requirement or expectation for either students or mentors to travel.
  
== Ideas ==
+
=== Why apply? ===
  
''Note: if there is more than one mentor for a project, the primary mentor is in '''bold font'''. Biographical and other information on the mentors is linked to in the [[#Mentors|Mentors section]].''
+
One of the most important features of the program is that students are paired with mentors, who are typically experienced developers from the project to which the student is contributing. The mentor guides the student to work productively within the community, and helps the student avoid obstacles and pitfalls.  The program is global - students and mentors may be located anywhere they have an internet connection (except for countries affected by US trade restrictions), and no travel is required. Thus, aside from the stipend and mentorship aspects, the student's experience in the internship closely mirrors normal work on distributed development projects.  Effective work habits for distributed development are typically not taught as part of computer science curricula, yet are highly desired in the increasingly global and distributed software, IT, and biotechnology industries.
  
''Students: The below are only our project '''ideas''', albeit well thought-out ones. You are welcome to propose your own project if none of those below catches your interest, or if your idea is more exciting to you, provided it is still a contribution to one of the O|B|F member projects (see list below). Just be aware that we can't guarantee finding an appropriate mentor, but if we like your proposal we will try.  Regardless of what you decide to do, make sure you read and follow the [[#What_should_prospective_students_know.3F|guidelines for students]] below.''
+
From the viewpoint of each open-source project, the program not only offers to pay students for contributing, but more importantly, offers an opportunity to recruit new developers who will hopefully go on to become regular, sustaining contributors.
  
=== Write a NEXUS parser in C& ===
+
==Project Ideas==
''This is a template for how the student project ideas could be presented. Feel free to copy & paste & edit, and feel free to adjust the format.''
 
; Rationale : C& is an amp'ed-up programming language that has not been invented yet but in a few years will dominate the programming world. The best way to prevent broken non-compliant NEXUS parsers written in C& from appearing is to write a good one now.
 
; Approach : Re-implementations of NEXUS parsers inevitably tend to be broken or non-compliant. Hence, the best approach is to write a translator that translates a reference implementation to C&.
 
; Challenges : C& has not been invented yet, so a lot of assumptions will have to be made.
 
; Involved toolkits or projects : The [http://www.biocamp.org BioC&] toolkit has much of the needed framework.
 
; Degree of difficulty and needed skills : Hard. The hardest part is probably inventing C&. Writing the parser itself should be medium, unless C& was ill-designed for writing parsers. Knowledge of the BioC& toolkit will obviously help, as well as knowing the NEXUS format.
 
; Mentors : Mike&, founder of BioC&
 
  
  
=== Write a JEE5 webservice interface to BioSQL ===
+
Our GSoC ideas from each project are collected here: http://obf.github.io/GSoC/
; Rationale :  
 
BioSQL is an intelligently designed database schema for storing sequence data and associated metadata. It does, however, lack any kind of user API. A sensible way to design an API for a BioSQL-backed database would be to expose the API as webservices. This would allow the API to be language and database agnostic (unlike an API based on database procedures). It would also allow data in BioSQL to be very loosely coupled into bioinformatics workflows. Once an API is in place, one could even adopt modified SQL schemas underneath, as long as the data access API still conforms to some specification.
 
  
; Approach :
+
== OBF Projects Accepting Applicants ==
Since the introduction of Java EE5 (and EJB3), developing Enterprise Java Beans that interoperate with databases and webservices has become exceptionally easy. In addition, Java Session Beans can be readily exported as webservices by adding simple annotations; often no specific configuration is required. Free and open Java app servers (such as GlassFish) that provide almost all of the management middleware for object relational mapping (ORM) and webservice deployment (and a whole host of other things) are available and relatively simple to use. Finally, the free and open IDE NetBeans has excellent integration with GlassFish and Java EE5 (plus I am most experienced with this IDE, so I can provide more help with its use). For these reasons I would suggest that Java EE5 is the most sensible approach to implementing this project.
 
  
During a development meeting in Tokyo in 2008, a preliminary EJB mapping to BioSQL was generated. What remains to be done is the development of a simple, well-documented and well-tested API specification and implementation that bioinformatics developers can use to perform CRUD (Create, Read, Update, Delete) functions on the database, as well as useful search and retrieval operations.
+
[[Image:BioPerl_logo_tiny.jpg|right|link=bp:Google Summer of Code]]
 +
; [[bp:Google Summer of Code|BioPerl]] :
 +
:* '''[[bp:Google Summer of Code | BioPerl GSoC Page]]''' - project ideas and mentors
 +
:* [[bp:Main Page|Project website]]
 +
:* [[bp:Becoming_a_developer|Information for new developers]]
 +
:* source code browser for [http://code.open-bio.org/svnweb/index.cgi/bioperl/browse/bioperl-live/trunk bioperl-live] (the main BioPerl code base), and [http://code.open-bio.org/svnweb/index.cgi/bioperl/ all BioPerl sub-projects]
 +
:* [[bp:Project_priority_list|Priority list]] of things that need work, as another source for student-conceived project ideas
 +
:* [[bp:Mailing_lists|Mailing lists]]
 +
:* IRC: <code>#bioperl</code> on [http://freenode.net Freenode]
  
In summary, the project will define and document an API and its expected behaviour, and then implement the webservice interface. A set of unit tests will also be developed, along with a proof-of-concept app that demonstrates use of the API.
+
[[Image:Biopython_logo_tiny.png|right|link=biopython:Google Summer of Code]]
 +
; [[biopython:Google Summer of Code |BioPython]] :
 +
:* '''[[biopython:Google Summer of Code | BioPython GSoC Page]]''' - project ideas and mentors
 +
:* [[biopython:Main Page|Project website]]
 +
:* [[biopython:Contributing|Information for contributors]]
 +
:* [[biopython:Mailing lists|Mailing lists]]
 +
:* [[biopython:SourceCode| Source Code]]
 +
:* No IRC channel at present
  
; Challenges :  
+
[[Image:Biojava_logo_tiny.jpg|right|link=http://biojava.org/wiki/Google_Summer_of_Code]]
* Designing and documenting the API so that it is simple and intuitive
+
; [http://biojava.org/wiki/Google_Summer_of_Code BioJava] :
* Making simple queries simple and efficient and complex queries possible.
+
:* '''[http://biojava.org/wiki/Google_Summer_of_Code BioJava GSoC Page]''' - project ideas and mentors
* Making CRUD operations secure (only people with the right credentials should be able to delete the data).
+
:* [http://biojava.org/wiki/BioJava:Modules BioJava modules] as another source for student-conceived project ideas
* Loaders for common file types.
+
:* source code for [http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk biojava-live] (the main BioJava code base) and [http://code.open-bio.org/svnweb/index.cgi/biojava/ all BioJava sub-projects]
* [Nice to have] Making a test application that will call API methods with predefined arguments. This will let people make alternative implementations of the API while testing that they are still compatible with it. For example, someone could make an entire implementation in Perl/BioPerl and still have it validate against the API.
+
:* [http://biojava.org/wiki/BioJava:MailingLists Mailing lists]
 +
:* No IRC channel at present
  
; Involved toolkits or projects :  
+
[[Image:BioRuby_logo_tiny.png|right|link=http://bioruby.org]]
JavaEE5, BioSQL, parts of BioJava would be useful to steal for parsing.
+
; [http://bioruby.org BioRuby] :
 +
:* '''[http://bioruby.open-bio.org/wiki/Google_Summer_of_Code BioRuby GSoC Page]''' - project ideas and mentors
 +
:* [http://bioruby.org Project website]
 +
:* [http://lists.open-bio.org/mailman/listinfo/bioruby developers mailing list]
 +
:* [http://github.com/bioruby/bioruby/tree/master source code]
 +
:* IRC: <code>#bioruby</code> on [http://freenode.net Freenode]
  
; Degree of difficulty and needed skills :  
+
[[Image:BioSQL_logo.png|160px|right|link=biosql:Main Page]]
Medium to Hard. While the use of Java EE5 is now quite easy (especially with IDEs like NetBeans), there are quite a lot of concepts involved in the project (webservices, ORM, EJBs, etc.). The hard part would be getting up to speed with those concepts. If you already know a lot of this, then the project would only be of medium difficulty. At minimum, the student should be confident with Java and at least aware of some of the technologies. This is not the right project for a very new programmer.
+
; [[biosql:Main Page|BioSQL]] :
 +
:* [[biosql:Main Page|Project website]]
 +
:* Current [http://biosql.org/wiki/Enhancement_Requests enhancement requests] as another source for student-conceived project ideas
 +
:* [http://biosql.org/mailman/listinfo/biosql-l developers mailing list]
 +
:* [http://code.open-bio.org/svnweb/index.cgi/biosql/browse/biosql-schema/trunk source code]
 +
:* No IRC channel at present
  
; Mentors : [http://www.linkedin.com/in/markjschreiber Mark Schreiber] (and anyone else who wants to help)
+
; [http://biohaskell.org/ BioHaskell] :
 +
:* [http://biohaskell.org/ Project website]
 +
:* [http://hackage.haskell.org/packages/#cat:Bioinformatics Bioinformatics section on HackageDB]
  
===  Mapping the NCBI toolkit to BioPerl, BioRuby, BioConductor and BioJAVA using BioLib ===
+
; [http://biocaml.org Biocaml] :
 +
:* [http://biocaml.org Project website]
 +
:* [https://groups.google.com/d/forum/biocaml Mailing list]
 +
:* [http://www.open-bio.org/wiki/Google_Summer_of_Code_2015_Ideas#Biocaml Project ideas]
  
; Rationale :
+
== Guide for prospective GSoC students ==
  
The National Center for Biotechnology Information (NCBI) has created a
+
=== Before you apply ===
large collection of utilities developed for the production and
 
distribution of GenBank, Entrez, BLAST, and related services. To
 
support these utilities a large set of C and C++ libraries are
 
maintained and regularly improved by NCBI. These include, for example,
 
sequence alignment algorithms, antigenic determinant prediction,
 
CpG-island finder, ORF finder and string matchers. This functionality
 
is ultimately of great interest to all scientists working in molecular
 
biology with application in biology and biomedical research.
 
  
Unfortunately, few bioinformaticians work with C/C++. To address this,
+
* Proposals should extend one of the affiliated toolkits, not start a new project.
NCBI has made a binding available for Python. This is not enough as
+
* If you want to apply with your own idea, it's best to [[#Contact|contact]] the OBF subproject you're interested in well before the application deadline, so we can work with you to find a mentor and solidify your project idea and application.
bioinformaticians work in many different programming languages, and to
+
* [[#Contact|Ask us questions]] on the subproject mailing lists about the project idea you have in mind.
be fully effective, support should be made available at least for Perl,
+
* Write a project proposal draft, include a project plan (see below), and [[#Contact|send it to a project mailing list]] for comments before submitting it.
R and JAVA. These three together probably represent over 90% of
 
bioinformaticians. The [http://biolib.open-bio.org/ BioLib project]
 
successfully provides the 'mapping' infrastructure to map complex
 
libraries against many computer languages using SWIG. Basically one
 
mapping suffices to support all popular languages.  
 
 
; Approach :
 
  
Special interfaces need to be developed to map the NCBI toolkit
+
Again, '''students are strongly encouraged to [[#Contact|contact us]] as early as possible'''. Frequent and early communication is extremely valuable for putting together successful projects.
libraries against Perl initially. The (outdated) NCBI
 
[http://pypi.python.org/pypi/ncbi/0308 Python mapping] can be used as an initial
 
guide for mapping functionality. Once mapped against Perl, mapping
 
against Ruby and Python is trivial. However, at this point BioLib
 
support for R and JAVA needs to be developed. A proof-of-concept can
 
be part of this project. Finally SWIG mappings can be used to
 
create automated documentation and testing of BioLib code.
 
  
; Challenges :
+
=== When you apply ===
  
The main challenge is to provide nice and consistent interfaces in
+
When applying, please provide the following in your application material (in addition to the information requested by Google).
high-level languages against the NCBI C/C++ toolkit library. This
+
# '''Your complete contact information''', including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.
requires OOP design and unit testing of existing functionality.
+
# Why you are interested in the project you are proposing and are well-suited to undertake it.
Also some SWIG hacking may be involved to provide decent mappings for R and
+
# A summary of your programming experience and skills.
JAVA, as well as SWIG auto-generated documentation and testing.
+
# Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
 +
# A project plan for the project you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.
 +
#* A project plan in principle divides up the whole project into a series of manageable milestones and time-lines that, when all accomplished, logically lead to the end goal(s) of the project. Put another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
 +
#* Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no or a hastily composed project plan have not been competitive, and a more thorough project plan can easily make an applicant outcompete another with more advanced skills.
 +
#* A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
 +
#* We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
 +
#* We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
 +
# Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
 +
#* We expect your GSoC project to be your primary focus over the summer.  It should not be regarded as a part-time occupation.
 +
#* If you feel that you can manage other work obligations concurrently with your Summer of Code project, make your case and support it with evidence.
 +
#* Be honest and open.  If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) our trust in you will be severely degraded. Also, if you are accepted, discuss with your GSoC mentor before taking on additional obligations.
 +
#* One of the most common reasons for students to struggle or fail is being overcommitted. Do not set yourself up for failure!  GSoC summers should be fun and rewarding!
  
; Involved toolkits or projects :
+
== Student Progress Reports ==
  
[http://biolib.open-bio.org/ BioLib], BioPerl, SWIG (and optionally BioRuby, R/Bioconductor, BioJAVA
+
In addition to writing code, accepted students send weekly updates to the OBF community on their project's progress. These updates allow us to keep aware of how GSoC students are doing, give students a forum to ask any questions, and promote overall community bonding.
or BioPython)
 
  
; Degree of difficulty and needed skills :
+
At the beginning of the summer, we ask that you set up a blog for the GSoC project (or a category/tag on your existing blog) which you will use to summarize your progress every week, as well as longer posts about your work if you'd like. (See [http://zruanweb.com/tag/gsoc.html these] [http://www.yeyanbo.com/tag/gsoc.html examples] from 2013.)
  
This is a challenging project as it crosses
+
Then, at the start of each week:
computer languages. It requires experience in C++ and a wish for
 
deeper understanding of at least one high-level OOP language
 
like Perl (did I write OOP?), Python, JAVA, R or Ruby.
 
  
; Mentors :
+
# Post an update on your blog: What did you do last week? What do you plan to do this week? Do you have any unanswered questions, any unsolved problems from the last week, interesting observations or anything else you'd like to mention?
 +
# Email the URL and text of the post (or a short summary) to the host project's mailing list (your mentors will confirm which one to use) ''and'' the main OBF GSoC mailing list (gsoc@lists.open-bio.org).
  
'''Pjotr Prins''', Chris Fields
+
You will be writing under your own name, but with a clear association with your mentors, the OBF and its projects, so please take this seriously and be professional. Remember that your blog will be one of the first things found by anyone interested in the project you're working on, and can be a valuable resource to them &mdash; as well as a significant part of your online presence.
  
=== BioSQL web interface and API on Google App Engine ===
+
== Contact ==
 
 
; Rationale :
 
The [http://www.biosql.org/wiki/Main_Page BioSQL] project provides a
 
robust and well supported database schema for storing sequence data and
 
associated annotations and features. It does not have a standard web interface
 
or web facing API, both of which would provide improved access to scientific
 
data. Deployment of BioSQL currently requires knowledge
 
and administration of relational databases, which can hinder its use in
 
smaller research laboratories that do not have public servers or experienced
 
systems administrators.
 
 
 
This proposal seeks to bridge this gap by providing a rapidly deployable
 
[http://en.wikipedia.org/wiki/Cloud_computing cloud based] solution utilizing
 
the established BioSQL backend. This system will allow scientists to share
 
results in a standard format both early on during research and at the time of
 
publication. By deploying on stable architectures, long term data access is
 
ensured and not dependent on maintenance of local servers. Data archival for
 
replication and expansion of ideas is an important part of the scientific process; this
 
[http://www.portfolio.com/views/blogs/market-movers/2009/02/18/when-academic-papers-arent-replicable?tid=true recent blog review]
 
summarizes some of the problems associated with primary data access.
 
 
 
; Approach :
 
[http://code.google.com/appengine/ Google App Engine] provides a full
 
development stack for rapidly building and deploying web applications. The
 
platform provides free quotas which allow a small lab with a limited budget to
 
make their data available, and also scales for larger projects with popular
 
data sets.
 
 
 
The student project expands an initial demonstration server (under development)
 
to a full-featured web application. The server-side
 
implementation will be programmed in Python, utilizing the Google App Engine
 
[http://code.google.com/appengine/docs/ developers toolkit]
 
supplemented with the [http://biopython.org/wiki/Main_Page Biopython]
 
libraries. The client web interface will be designed using HTML, CSS and

JavaScript; the interface will utilize a full-featured JavaScript
 
library, such as  [http://jquery.com/ jQuery] and [http://jqueryui.com/ jQueryUI]
 
or [http://extjs.com/ ExtJS]. Client to server communication occurs
 
using [http://en.wikipedia.org/wiki/Ajax_(programming) AJAX] techniques
 
with [http://en.wikipedia.org/wiki/JSON JSON] for data exchange.
 
 
 
In addition to the web interface, the server will also provide a programming
 
interface using a [http://en.wikipedia.org/wiki/Representational_State_Transfer REST]
 
API. This involves coordination with other proposed projects,
 
including the proposed JEE5 Java webservice, to design a common interface.
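
To make the REST-plus-JSON idea concrete, here is a minimal sketch of how such an interface might dispatch requests. Everything in it is an assumption for illustration: the <code>/sequences/&lt;accession&gt;</code> URL layout, the <code>handle</code> function, and the in-memory <code>STORE</code> dictionary that stands in for a BioSQL-backed database. It does not use Google App Engine or Biopython, and the common interface itself would still need to be agreed with the other projects.

```python
import json

# Hypothetical in-memory stand-in for a BioSQL-backed store; a real
# deployment would query the database instead.
STORE = {}


def handle(method, path, body=None):
    """Dispatch a REST-style request and return (status, JSON text).

    The URL layout /sequences/<accession> is an assumption of this sketch.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2 or parts[0] != "sequences":
        return 404, json.dumps({"error": "unknown resource"})
    accession = parts[1]

    if method == "PUT":  # create or replace a record
        STORE[accession] = json.loads(body)
        return 200, json.dumps({"accession": accession})
    if method == "GET":  # read a record
        if accession not in STORE:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(STORE[accession])
    if method == "DELETE":  # remove a record
        if accession not in STORE:
            return 404, json.dumps({"error": "not found"})
        del STORE[accession]
        return 200, json.dumps({"deleted": accession})
    return 405, json.dumps({"error": "method not allowed"})
```

A client would send, for example, <code>PUT /sequences/X1</code> with a JSON body and later <code>GET /sequences/X1</code> to read it back. Keeping the payload as plain JSON is what would let a Java webservice and a Python server expose the same interface.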
 
 
 
; Challenges :
 
* Familiarizing the student with Python, JavaScript and AJAX, as well as the Google App Engine environment.
 
* Initial implementation of BioSQL server interface with useful features.
 
* Coordinating input from users on the [http://lists.open-bio.org/mailman/listinfo/biosql-l BioSQL mailing list]. The student will need to solicit desired features from users and prioritize based on implementation time and importance. See [http://lists.open-bio.org/pipermail/biosql-l/2009-January/001464.html this mailing list discussion] for an example of interest and initial ideas.
 
* Designing the web interface for intuitive use.
 
* Coordinating API development with other projects.
 
 
 
; Involved toolkits or projects :
 
* [http://www.biosql.org/wiki/Main_Page BioSQL]
 
* [http://biopython.org/wiki/Main_Page Biopython]
 
* [http://code.google.com/appengine/ Google App Engine]
 
* [http://www.python.org Python]
 
* [http://jquery.com/ jQuery]; [http://jqueryui.com/ jQueryUI]
 
* [http://extjs.com/ ExtJS]
 
 
 
; Degree of difficulty and needed skills :
 
Medium to Hard. This requires a familiarity with current web frameworks and
 
utilizes a number of existing libraries to allow the student to jump right
 
into the development process. This requires the interested student to be comfortable
 
with quickly learning outside libraries. Beyond programming, the project
 
will also involve creative thinking about interface and usability design.
 
 
 
; Mentors :
 
Brad Chapman (plus...)
 
 
 
 
 
=== Biogeographical and community phylogenetics for BioPython ===
 
 
 
(Note: this project is proposed by potential GSoC student [http://www.open-bio.org/wiki/User:Nmatzke Nick Matzke].)
 
 
 
; Rationale :
 
The field of phylogenetics has proliferated, and one new development is that large, phylogenetically explicit datasets are beginning to be used to answer questions about the relationships of ecological communities and biogeographic regions, instead of just individual clades.  The [http://www.phylodiversity.net/phylocom/ phylocom] package (Webb et al., 2008) contains fast C implementations of basic analyses such as alpha- and beta-phylodiversity (Net Relatedness Index and Nearest Taxon Index).  The R package [http://picante.r-forge.r-project.org/ picante], funded by NESCent and Google Summer of Code 2008, contains utilities for processing phylocom inputs/outputs as well as additional tools for applied phylogenetics such as phylogenetic signal, phylosor (a phylogenetic Sørensen index), and lineages-through-time plots.  These tools, developed for evolutionary community ecology, are usable in any context where a collection of lineages is undergoing cladogenesis, dispersal, and extinction in a series of containers (communities, biogeographic regions, gene families undergoing gene conversion, laterally transferring elements in unicell genomes, etc.).
 
 
 
The related field of phylogenetic or historical biogeography -- the estimation of the geographic location of ancestral lineages, the history of their dispersal, and the history of connectivity and vicariance between regions -- has also advanced with a variety of algorithms (Ronquist's Dispersal-Vicariance Analysis, [http://www.ebc.uu.se/systzoo/research/diva/diva.html DIVA]; [http://code.google.com/p/lagrange/ lagrange], a maximum likelihood method implemented in Python, [http://code.google.com/p/lagrange/ available online at Google Code]; [https://www.nescent.org/wg_EvoViz/GeoPhyloBuilder GeoPhyloBuilder], a NESCent-sponsored package for producing GIS files to display biogeographic history in Google Earth; [http://panbiog.infobio.net/croizat/ croizat], a panbiogeographical method and visualization package implemented in python using matplotlib's Basemap module; and older methods derived from traditional ancestral-state reconstruction).
 
 
 
; Approach :
 
 
 
Write BioPython modules/functions to:
 
 
 
("*" indicates some version of this already done independently by [http://www.open-bio.org/wiki/User:Nmatzke nmatzke])
 
 
 
* Improve BioPython's [[Bio.Nexus.Trees]] [[newick]] parser, which currently cannot successfully read the newick files output by Phylocom (although these files are read successfully by a variety of other programs and modules, e.g. [http://www-ab.informatik.uni-tuebingen.de/software/dendroscope/welcome.html Dendroscope], [http://pbil.univ-lyon1.fr/software/alfacinha/ alfacinha] python module).*
 
* Implement Cardona et al.'s [http://www.biomedcentral.com/1471-2105/9/532 Extended Newick format] for reticulating trees etc. (only exists in BioPerl currently)
 
* Develop a series of functions for processing phylocom inputs and outputs*
 
* Provide functions for basic community/geographic relatedness (e.g., NRI, NTI, phylosor)*
 
* Calculating these statistics for large phylogenies requires calculating/processing a large distance matrix with a C or Java library*
 
* Basic graphics for analyzing community/regional phylogenetic history, e.g. lineage-through-time plots*
 
* Downloading sample location data from online databases (e.g. [http://www.gbif.org/ GBIF], although see [http://iphylo.blogspot.com/search?q=latitude here]), combine with phylogenies for input into lagrange, DIVA or other algorithms
 
* Re-creating DIVA in Python; the only available version is 12 years old and currently will only run on certain PCs
 
* Process output from DIVA, lagrange, etc., for display in GISs, Google Earth (KML files), and/or matplotlib's Basemap
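
As a rough illustration of the community relatedness functions listed above, the sketch below computes the two raw statistics that, to my understanding, underlie NRI and NTI: the mean pairwise distance (MPD) and the mean nearest taxon distance (MNTD) among the taxa of one community. The function names, the <code>frozenset</code>-keyed distance dictionary and the toy distances are all assumptions made for this example. The null-model standardization that turns MPD and MNTD into NRI and NTI is deliberately left out, and this is not how phylocom or picante implement it.

```python
from itertools import combinations


def mpd(dist, taxa):
    """Mean pairwise distance among the taxa of one community."""
    pairs = list(combinations(sorted(taxa), 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def mntd(dist, taxa):
    """Mean distance from each taxon to its nearest co-occurring taxon."""
    taxa = list(taxa)
    nearest = [
        min(dist[frozenset((a, b))] for b in taxa if b != a) for a in taxa
    ]
    return sum(nearest) / len(nearest)


# Toy patristic distances between three taxa (made up for illustration).
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}

community = ["A", "B", "C"]
print(mpd(dist, community), mntd(dist, community))
```

A pure-Python dictionary is fine for a handful of taxa; for large phylogenies the distance matrix would need a compiled backend, as the list above already notes.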
 
  
; Challenges :
+
Before applying, please read our [[#What_should_prospective_students_know.3F|documentation on information that students should know and guidelines we expect you to follow]]. We also require that you include certain information, listed below, under "[[#When you apply|When you apply]]."
* Contacting & involving/getting feedback from authors of the mentioned packages (have been in contact with many of them already)
 
* Uncertainty, error, & missing data in geographic location databases (see [http://iphylo.blogspot.com/search?q=latitude here]), and flagging such
 
* Deciding the appropriate number of BioPython modules, etc. will require mentor advice
 
  
; Involved toolkits or projects :  
+
=== Staff and org Admins ===
* [http://biopython.org/wiki/Main_Page Biopython]
+
;Organization administrator: Eric Talevich ([mailto:eric&#46;talevich&#64;gmail&#46;com eric&#46;talevich&#64;gmail&#46;com])
* [http://www.biosql.org/wiki/Main_Page BioSQL]
+
;Backup administrator: Raoul Bonnal ([mailto:ilpuccio&#46;febo&#64;gmail&#46;com email]) (IRC: helius | channels: #obf-soc, #bioruby, #gsoc ) (Skype: ilpuccio)
* [http://www.python.org Python]
+
<!--
* others mentioned above
+
Other organisations relevant for bioinformatics students are: [http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013 Nescent], [https://github.com/SciRuby/sciruby/wiki/Google-Summer-of-Code-2013-Ideas SciRuby], [http://gmod.org/wiki/GSoC GMOD], who took on some of our projects and mentors.
  
; Degree of difficulty and needed skills :  
+
;2013 Organization administrator: [[User:PjotrPrins|Pjotr Prins]] ([mailto:pjotr.gsoc2013@thebird.nl pjotr.gsoc2013@thebird.nl])
Medium. Requires familiarity not just with Python/Biopython but also with some unusual data formats, datasets, and packages (geographic, phylogenetic, metadata, etc.), and with integrating them.  Must be familiar with evolution, phylogenetics, biogeography, and the statistical hazards of oversimplified interpretations of these.
+
;Backup administrators: Chris Fields, [[User:Lapp|Hilmar Lapp]], Robert Buels
 +
-->
  
; Mentors :
+
=== Google Plus ===
[http://bcbio.wordpress.com/ Brad Chapman] (MGH; Biopython) (plus?  Various python/phylogenetics gurus at NESCent etc might be consulted)
 
  
=== phyloXML support in BioRuby ===
+
[https://plus.google.com/communities/103096212020630764091 OBF Summer of Code] on G+
  
; Rationale : Evolutionary trees are central to comparative genomics studies. Trees used in this context are usually annotated with a variety of data elements, such as taxonomic information, genome-related data (gene names, functional annotations) and gene duplication events, as well as information related to the evolutionary tree itself (branch lengths, support values). phyloXML is an XML data exchange standard that can represent this data. Trees in phyloXML format can be displayed and analyzed with [http://www.phylosoft.org/archaeopteryx/ Archaeopteryx] (the successor to [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/17/4/383 ATV]), which also allows manipulation and navigation of the tree. While tools exist to convert other formats (such as the widely used Newick and Nexus formats) to phyloXML, there is currently support for phyloXML in only one of the open source Bio* projects (in [http://www.bioperl.org/wiki/Phyloxml_Project_Demo BioPerl], as a result of Google's Summer of Code 2008).
+
=== Email ===
; Approach : Build phyloXML support in Ruby. More specifically, extend the open source BioRuby project to support phyloXML (BioRuby 1.3.0 has just been released). This will entail (i) the development of objects to represent all the elements of phyloXML (sequences, taxonomic data, annotations, etc), (ii) the development of a parser to read in phyloXML, and (iii) a phyloXML writer.
 
; Challenges : Relating the data elements specific to phyloXML to the tree classes already in BioRuby while maintaining the standards of the BioRuby project. Development of a time and memory efficient phyloXML parser (the parser has to be able to process trees with thousands of external nodes, at least).
 
; Involved toolkits or projects : [http://www.bioruby.org/ BioRuby],  [http://www.phyloxml.org phyloXML]
 
; Degree of difficulty and needed skills : Medium. Requires experience in an object oriented programming language (such as C++, Java, or, ideally, Ruby). Experience in genomics or a related biological field is also critical. Knowledge of  BioRuby will obviously help, as well as familiarity with XML.
 
; Mentors : Christian Zmasek (and anyone else who wants to help)
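The memory concern raised under Challenges can be illustrated with a streaming parse. The sketch below is written in Python with the standard library rather than Ruby, purely to show the idea; it is not the BioRuby design. It assumes the phyloXML namespace <code>http://www.phyloxml.org</code> and that branch lengths appear as <code>branch_length</code> child elements (the format also allows an attribute form, which is not handled here). The function name and the sample tree are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

PHYLOXML_NS = "{http://www.phyloxml.org}"


def iter_external_nodes(source):
    """Yield (name, branch_length) for each external (leaf) clade.

    Elements are cleared as soon as their closing tag has been seen, so
    the names and annotations of processed clades are not kept in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != PHYLOXML_NS + "clade":
            continue
        # A clade with no nested clade element is an external node.
        if elem.find(PHYLOXML_NS + "clade") is None:
            name = elem.findtext(PHYLOXML_NS + "name")
            length = elem.findtext(PHYLOXML_NS + "branch_length")
            yield name, (float(length) if length is not None else None)
        elem.clear()


# Hypothetical three-leaf tree ((A,B),C) in phyloXML.
SAMPLE = b"""<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <clade><name>A</name><branch_length>0.1</branch_length></clade>
        <clade><name>B</name><branch_length>0.2</branch_length></clade>
      </clade>
      <clade><name>C</name><branch_length>0.3</branch_length></clade>
    </clade>
  </phylogeny>
</phyloxml>"""

leaves = list(iter_external_nodes(io.BytesIO(SAMPLE)))
```

A full implementation would build tree objects instead of yielding tuples, but the same event-driven pattern keeps memory use roughly proportional to tree depth plus the retained objects rather than to the size of the XML document.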
 
  
==Mentors==
+
For prospective students, the first point of contact should be the mailing list of the OBF project you are interested in working with:
  
* [http://bcbio.wordpress.com/ Brad Chapman] (MGH; Biopython)
+
;BioPerl: [mailto:bioperl-l@lists.open-bio.org bioperl-l@lists.open-bio.org]
* [[bp:User:Cjfields|Chris Fields]] (U. Illinois, Chicago; BioPerl)
+
;BioPython: [mailto:biopython@lists.open-bio.org biopython@lists.open-bio.org]
* [[bp:User:Majensen|Mark Jensen]] (Fortinbras; BioPerl)
+
;BioJava: [mailto:biojava-l@lists.open-bio.org biojava-l@lists.open-bio.org]
* [[bp:User:Rogerhall|Roger Hall]] (U. of Arkansas; BioPerl)
+
;BioRuby: [mailto:bioruby@lists.open-bio.org bioruby@lists.open-bio.org]
* [[User:Mauricio|Mauricio Herrera Cuadra]] (Yahoo! Inc.; backup org admin)
+
;BioSQL: [mailto:biosql-l@lists.open-bio.org biosql-l@lists.open-bio.org]
* [[User:Lapp|Hilmar Lapp]] (NESCent; org admin)
+
;BioLib: [mailto:biolib-dev@lists.open-bio.org biolib-dev@lists.open-bio.org]
* [http://thebird.nl/pjwiki/wiki.pl Pjotr Prins] (BioLib)
 
* [http://biojava.org/wiki:User:Mark Mark Schreiber] (Novartis Institute for Tropical Diseases, Singapore; BioJava)
 
* Joshua Udall (BioPerl)
 
* Jonathan Warren (Sanger Institute, UK; Biojava)
 
* Scooter Willis (Scripps Florida; Biojava)
 
* [http://monochrome-effect.net/ Christian Zmasek] (Burnham Institute for Medical Research; BioRuby)
 
  
== What should prospective students know? ==
+
Also, it would be a good idea to CC the organization administrator ([[User:EricTalevich|Eric Talevich]], [mailto:eric&#46;talevich&#64;gmail&#46;com eric&#46;talevich&#64;gmail&#46;com]), so he can make sure that you are properly taken care of!
 
 
=== Before you apply ===
 
  
* If you want to apply with your own idea, determine which O|B|F project you would be contributing to, and [[#Contact|contact us]] early on so we can try to find a mentor.
+
If you are not quite sure which project you would like to contribute to, you can email the organization administrator for help. However, do not worry overly much about picking the right OBF project at the outset.  If you are unsure, simply make your best guess, and other members of the email list will help you to find the best organization to suit your idea.
* We will only entertain proposals that extend one of our affiliated toolkits. Project proposals that would create a new stand-alone piece of code are outside of our scope.
 
* We are most interested in students who give us evidence that they have already or might develop a sustained interest in becoming future contributors to one (or more) of our projects.
 
* [[#Contact|Ask us questions]] about the project idea you have in mind.
 
* Write a project proposal draft, include a project plan (see below), and [[#Contact|bounce those off of us]].
 
  
 +
=== IRC - Internet Relay Chat ===
  
Have I mentioned yet that you should [[#Contact|be in touch with us]] ''before'' you apply? The value of frequent and early communication in contributing to a distributed and collaboratively developed project can hardly be overemphasized. The same is true for becoming part of a community, even if only temporarily.  
+
OBF IRC channels are maintained on [http://freenode.net freenode]; connect your IRC client to <code>chat.freenode.net</code>.
  
=== When you apply ===
+
;Main OBF GSoC Channel: <code>#obf-soc</code>
 +
;BioPerl: <code>#bioperl</code>
 +
;BioRuby: <code>#bioruby</code>
  
When applying, (aside from the information requested by Google) please provide the following in your application material.
+
Some mentors and developers can regularly be found on IRC; see the list of OBF projects below for information on which projects have a channel and the name of the channel. You can also join <code>#obf-soc</code> on [http://freenode.net Freenode]. ''(If you do not have an IRC client installed, you might find the [http://en.wikipedia.org/wiki/List_of_IRC_clients comparison on Wikipedia], the [http://directory.google.com/Top/Computers/Software/Internet/Clients/Chat/IRC/ Google directory], or the [http://www.ircreviews.org/clients/ IRC Reviews] helpful. For Macs, [http://en.wikipedia.org/wiki/X-Chat X-Chat Aqua] works pretty well. If you have never used IRC, try the [http://irchelp.org/irchelp/ircprimer.html IRC Primer] at [http://irchelp.org/ IRC Help], which also has links to lots of other material.)''
# Why you are interested in the project you are proposing, why you are uniquely suited to undertake it, and what you anticipate gaining from it.
 
# Why are you interested in contributing to the O|B|F project that your work would be (or become) a part of? To what extent and in which ways do you anticipate to stay involved with the project?
 
# A summary of your programming experience and skills.
 
# Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
 
# A project plan for the project you are proposing, even if your proposed project is directly based on one of the ideas above.
 
#* A project plan in principle divides up the whole project into a series of manageable milestones and timelines that, when all accomplished, logically lead to the end goal(s) of the project. Put in another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
 
#* Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no or a hastily composed project plan have not been competitive, and a more thorough project plan can easily make an applicant outcompete another with more advanced skills.
 
#* A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
 
#* We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
 
#* We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
 
# Your possibly conflicting obligations or plans for the summer during the coding period.
 
#* Although there are no hard and fast rules about how much you can do in parallel to your Summer of Code project, we do expect the project to be your primary focus of attention over the summer. If you look at your Summer of Code project as a part-time occupation, please don't apply to us.
 
#* That notwithstanding, if you have the time-management skills to manage other work obligations concurrent with your Summer of Code project, feel encouraged to make your case and support it with evidence.
 
#* Most important of all, be upfront. If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) it destroys our trust. Also, if you are accepted, don't take on additional obligations before discussing those with your mentor.
 
#* One of the most common reasons for students to struggle or fail is being overstretched. Don't set yourself up for that - at best it would greatly diminish the amount of fun you'll have with your Summer of Code project.
 
 
 
=== Other information ===
 
 
 
* Our [ 2009 application document] with Google's questions and our answers
 
* For questions of eligibility, see the [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_eligibility_83343977761348_13148542340972003 GSoC eligibility requirements for students]. These requirements must be met on April 20, 2009.
 
* There is also a [http://groups.google.com/group/google-summer-of-code-discuss Google group for posting GSoC questions] (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
 
* Students receive a stipend from Google if accepted. See the [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_administrivia_842873138659_49145328697313184 Google SoC FAQ on payments] for full documentation.
 
 
 
== Reference Facts &amp; Links==
 
 
 
=== Open-Bio projects involved ===
 
 
 
; [[bp:Main Page|BioPerl]] :
 
:* [[bp:Becoming_a_developer|Information for new developers]]
 
:* [[bp:Mailing_lists|Mailing lists]]
 
:** [http://bioperl.org/mailman/listinfo/bioperl-l developers mailing list]
 
:* IRC: #bioperl on [http://freenode.net Freenode]
 
  
; [http://www.biojava.org Biojava] :
 
  
; [http://www.biopython.org Biopython] :
 
  
; [http://www.bioruby.org Bioruby] :
+
== Mentor Resources ==
  
; [http://biosql.org BioSQL] :
+
* [http://en.flossmanuals.net/GSoCMentoring/ GSoC Mentoring Guide]
 +
* [[Google_Summer_of_Code_Application_Evaluation|OBF Application Evaluation Guidelines]]
  
; [http://biolib.open-bio.org BioLib] :
+
== Scientific Achievements ==
 +
In this section we report the scientific achievements of our community: scientific papers or grant-funded projects that used the tools developed during the Google Summer of Code over the years.
 +
* [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btv098 Sambamba: fast processing of NGS alignment formats.] Bioinformatics (2015) doi: 10.1093/bioinformatics/btv098
 +
* [http://csw.github.io/bioruby-maf/ bio-maf]: The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Ranzani V et al. Nat Immunol. 2015 Mar;16(3):318-25. doi: 10.1038/ni.3093. Epub 2015 Jan 26.
 +
* [http://www.biomedcentral.com/1471-2105/13/209 Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython.] BMC Bioinformatics 2012, 13:209  doi:10.1186/1471-2105-13-209
 +
* [http://www.open-bio.org/wiki/Google_Summer_of_Code_2014#Loris_Cro vcf-mongo ]: Gene2Farm and WHEALBI European Research projects
 +
* [http://news.open-bio.org/news/2015/02/obf-gsoc-2014-wrapup/ OBF-GSoC-2014-WrapUp] is rich in scientific activities and results.
 +
* [http://www.rcsb.org RCSB PDB] is the North American access point to the Worldwide Protein Data Bank, and uses BioJava extensively
 +
* Publications using BioJava:
 +
** Prlić, Andreas, et al. "BioJava: an open-source framework for bioinformatics in 2012." Bioinformatics 28.20 (2012): 2693-2695.
 +
** Holland, Richard CG, et al. "BioJava: an open-source framework for bioinformatics." Bioinformatics 24.18 (2008): 2096-2097.
 +
** Pocock, Matthew, Thomas Down, and Tim Hubbard. "BioJava: open source components for bioinformatics." ACM Sigbio Newsletter 20.2 (2000): 10-12.
 +
*** 181 citations on Google Scholar
 +
** Myers-Turnbull, Douglas, et al. "Systematic Detection of Internal Symmetry in Proteins Using CE-Symm." Journal of Molecular Biology, (2014) 426:11 pp. 2255–2268.
 +
** Prlić, Andreas, et al. (2010) “Precalculated Protein Structure Alignments at the RCSB PDB website.” Bioinformatics 26(23), 2983-2985
 +
** Bliven, Spencer, et al. (2015) "Detection of circular permutations within protein structures using CE-CP." Bioinformatics. In press.
 +
** Aerts, Stein, et al. "Toucan: deciphering the cis‐regulatory logic of coregulated genes." Nucleic acids research 31.6 (2003): 1753-1764.
 +
** Vaida, Mircea-Florin, Radu Terec, and Lenuta Alboaie. "Alternative DNA Security Using BioJava." Digital Information and Communication Technology and Its Applications. Springer Berlin Heidelberg, 2011. 455-469.
 +
** Ross, Christian, and Qingxi J. Shen. "Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa)." Plant molecular biology 62.1-2 (2006): 233-246.
 +
** Finak, G., et al. "BIAS: bioinformatics integrated application software." Bioinformatics 21.8 (2005): 1745-1746.
 +
** Aerts, Stein, et al. "A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes." Bioinformatics 20.12 (2004): 1974-1976.
 +
** Hanganu, Andrei, et al. "SLIDE: An interactive threading refinement tool for homology modeling." Rom J Biochem 1009.46: 123-127.
 +
** Kaladhar, DSVGK. "BioJava: A Programming Guide." (2012). LAP Lambert Academic Publishing , Germany. ISBN:3659167509 9783659167508
 +
** Prins, J. C. P. "BioLib: Sharing high performance code between BioPerl, BioPython, BioRuby, R/Bioconductor and BioJAVA." 17th Annual International Conference on Intelligent Systems for Molecular Biology, Stockholm, Sweden, June 27-July 2, 2009.
 +
** Tang, Si-Xin, Yi-Bing Li, and Hong-Bo He. "Designing a BioJava-based Software for RNA Sequence Analysis." Journal of Luoyang Institute of Technology 6 (2005): 016.
 +
** Mangalam, Harry. "The Bio* toolkits—a brief overview." Briefings in bioinformatics 3.3 (2002): 296-302.
 +
** Ryu, Taewan. "Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for primitive bioinformatics tasks and choosing a suitable language." International Journal of Contents 5.2 (2009): 6-15.
 +
** McGuffee, James W. "Programming languages and the biological sciences." Journal of Computing Sciences in Colleges 22.4 (2007): 178-183.
  
; [http://emboss.sourceforge.net/ EMBOSS] :
+
== Previous Years ==
  
=== [http://socghop.appspot.com/ Google Summer of Code 2009] ===
+
This section contains links to content related to OBF's participation
 +
in GSoC in previous years.
 +
* [[Google_Summer_of_Code_2014|2014]] - 6 student projects
 +
* Google Summer of Code 2013 - OBF not accepted, some Bio* projects partnered with other organisations
 +
* [[Google_Summer_of_Code_2012|2012]] - 5 student projects
 +
* [[Google_Summer_of_Code_2011|2011]] - 6 student projects
 +
* [[Google_Summer_of_Code_2010|2010]] - 6 student projects
  
* Mentoring organizations apply between March 9-13, 2009. Accepted mentoring organizations will be published March 18. See [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_timeline_5354032302481437_ full set of timelines].
 
* Google expects to accept around 150 mentoring organizations, a bit less than in 2008 (when they accepted 175). If the trend over the past years is any indication, this will be out of at least 3x as many organizations that apply.
 
* Students apply between March 23-April 3, 2009. The [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_eligibility_83343977761348_13148542340972003 eligibility requirements for students] are in the GSoC FAQ.
 
* [http://code.google.com/opensource/gsoc/2009/faqs.html#0_1_development_where_91701355_4247830955169275 Development occurs on-line], there is no requirement or expectation to travel, neither for students nor for mentors.
 
  
[[Category:Summer of Code]]
+
[[Category:Google Summer of Code]]

Latest revision as of 06:39, 18 February 2016

Google Summer of Code (GSoC) is a student internship program for open-source projects. The program offers eligible student developers stipends to write code for open source projects over a period of 3 summer months ("flip bits, not burgers"). See the Google Summer of Code Main Site for general information about the Google Summer of Code program, how to apply, frequently asked general questions, and more.


GSoC 2016

The Google Summer of Code 2016 is ON! OBF is once again applying as a GSoC mentoring organization this year. Interested mentors and students should subscribe to the OBF/GSoC mailing list. Please announce yourself, so we know who you are! The details of each of our project ideas are listed below, including potential mentors.

See http://obf.github.io/GSoC/ for more information about the GSoC program and additional ways to get in touch with us.

Facts & Links

Time Line 
GSoC 2016 FAQ 
Info from Google 
  • There is also a Google group for posting GSoC questions (and receiving answers; note that you will need to sign up for the group) that relate to the program itself (and are not specific to our organization).
  • Students receive a stipend from Google if accepted. See the GSoC 2016 FAQ for full documentation.
  • Development is done entirely remotely and on-line; there is no requirement or expectation for either students or mentors to travel.

Why apply?

One of the most important features of the program is that students are paired with mentors, who are typically experienced developers from the project to which the student is contributing. The mentor guides the student to work productively within the community, and helps the student avoid obstacles and pitfalls. The program is global - students and mentors may be located anywhere where they have an internet connection (except for countries affected by US trade restrictions), and no travel is required. Thus, aside from the stipend and mentorship aspects, the student's experience in the internship closely mirrors normal work on distributed development projects. Effective work habits for distributed development are typically not taught as part of computer science curricula, yet are highly desired in the increasingly global and distributed software, IT, and biotechnology industries.

From the viewpoint of each open-source project, the program not only offers to pay students for contributing, but more importantly, offers an opportunity to recruit new developers who will hopefully go on to become regular, sustaining contributors.

Project Ideas

Our GSoC ideas from each project are collected here: http://obf.github.io/GSoC/

OBF Projects Accepting Applicants

  • BioPerl
  • BioPython
  • BioJava
  • BioRuby
  • BioSQL
  • BioHaskell
  • Biocaml

Guide for prospective GSoC students

Before you apply

  • Proposals should extend one of our affiliated toolkits, not start a new project.
  • If you want to apply with your own idea, it's best to contact the OBF subproject you're interested in well before the application deadline, so we can work with you to find a mentor and solidify your project idea and application.
  • Ask us questions on the subproject mailing lists about the project idea you have in mind.
  • Write a project proposal draft, include a project plan (see below), and send it to a project mailing list for comments before submitting it.

Again, students are strongly encouraged to contact us as early as possible. Frequent and early communication is extremely valuable for putting together successful projects.

When you apply

When applying, (aside from the information requested by Google) please provide the following in your application material.

  1. Your complete contact information, including full name, physical address, preferred email address, and telephone number, plus other pertinent contact information such as IRC handles, etc.
  2. Why you are interested in the project you are proposing and are well-suited to undertake it.
  3. A summary of your programming experience and skills.
  4. Programs or projects you have previously authored or contributed to, in particular those available as open-source, including, if applicable, any past Summer of Code involvement.
  5. A project plan for the project you are proposing, even if your proposed project is directly based on one of the proposed project ideas for member projects.
    • A project plan in principle divides up the whole project into a series of manageable milestones and time-lines that, when all accomplished, logically lead to the end goal(s) of the project. Put in another way, a project plan explains what you expect you will need to be doing, and what you expect you need to have accomplished, at which time, so that at the end you reach the goals of the project.
    • Do not take this part lightly. A compelling plan takes a significant amount of work. Empirically, applications with no or a hastily composed project plan have not been competitive, and a more thorough project plan can easily make an applicant outcompete another with more advanced skills.
    • A good plan will require you to thoroughly think about the project itself and how one might want to go about the work.
    • We don't expect you to have all the experience, background, and knowledge to come up with the final, real work plan on your own at the time you apply. We do expect your plan to demonstrate, however, that you have made the effort and thoroughly dissected the goals into tasks and successive accomplishments that make sense.
    • We strongly recommend that you bounce your proposed project and your project plan draft off of us, using either the pertinent developers mailing list or the IRC channel(s). Through the project plan exercise you will inevitably discover that you are missing a lot of the pieces - we are there to help you fill those in as best as we can.
  6. Any obligations, vacations, or plans for the summer that may require scheduling during the GSoC work period.
    • We expect your GSoC project to be your primary focus over the summer. It should not be regarded as a part-time occupation.
    • If you feel that you can manage other work obligations concurrently with your Summer of Code project, make your case and support it with evidence.
    • Be honest and open. If it turns out later that you weren't clear about other obligations, at best (i.e., if your accomplishment record at that point is spotless) our trust in you will be severely degraded. Also, if you are accepted, discuss with your GSoC mentor before taking on additional obligations.
    • One of the most common reasons for students to struggle or fail is being overcommitted. Do not set yourself up for failure! GSoC summers should be fun and rewarding!

Student Progress Reports

In addition to writing code, accepted students send weekly updates to the OBF community on their project's progress. These updates allow us to keep aware of how GSoC students are doing, give students a forum to ask any questions, and promote overall community bonding.

At the beginning of the summer, we ask that you set up a blog for the GSoC project (or a category/tag on your existing blog) which you will use to summarize your progress every week, as well as longer posts about your work if you'd like. (See these examples from 2013.)

Then, at the start of each week:

  1. Post an update on your blog: What did you do last week? What do you plan to do this week? Do you have any unanswered questions, any unsolved problems from the last week, interesting observations or anything else you'd like to mention?
  2. Email the URL and text of the post (or a short summary) to the host project's mailing list (your mentors will confirm which one to use) and the main OBF GSoC mailing list (gsoc@lists.open-bio.org).

You will be writing under your own name, but with a clear association with your mentors, the OBF and its projects, so please take this seriously and be professional. Remember that your blog will be one of the first things found by anyone interested in the project you're working on, and can be a valuable resource to them — as well as a significant part of your online presence.

Contact

Before applying, please read our documentation on information that students should know and guidelines we expect you to follow. We also require that you include certain information, listed below, under "When you apply."

Staff and org Admins

Organization administrator
Eric Talevich (eric.talevich@gmail.com)
Backup administrator
Raoul Bonnal (email) (IRC: helius | channels: #obf-soc, #bioruby, #gsoc ) (Skype: ilpuccio)

Google Plus

OBF Summer of Code on G+

Email

For prospective students, the first point of contact should be the mailing list of the OBF project you are interested in working with:

BioPerl
bioperl-l@lists.open-bio.org
BioPython
biopython@lists.open-bio.org
BioJava
biojava-l@lists.open-bio.org
BioRuby
bioruby@lists.open-bio.org
BioSQL
biosql-l@lists.open-bio.org
BioLib
biolib-dev@lists.open-bio.org

Also, it would be a good idea to CC the organization administrator (Eric Talevich, eric.talevich@gmail.com), so he can make sure that you are properly taken care of!

If you are not quite sure which project you would like to contribute to, you can email the organization administrator for help. However, do not worry overly much about picking the right OBF project at the outset. If you are unsure, simply make your best guess, and other members of the email list will help you to find the best organization to suit your idea.

IRC - Internet Relay Chat

OBF IRC channels are maintained on freenode; connect your IRC client to chat.freenode.net.

Main OBF GSoC Channel
#obf-soc
BioPerl
#bioperl
BioRuby
#bioruby

Some mentors and developers can regularly be found on IRC; see the list of OBF projects below for information on which projects have a channel and the name of the channel. You can also join #obf-soc on Freenode. (If you do not have an IRC client installed, you might find the comparison on Wikipedia, the Google directory, or the IRC Reviews helpful. For Macs, X-Chat Aqua works pretty well. If you have never used IRC, try the IRC Primer at IRC Help, which also has links to lots of other material.)


Mentor Resources

  • GSoC Mentoring Guide
  • OBF Application Evaluation Guidelines

Scientific Achievements

In this section we report the scientific achievements of our community: scientific papers or grant-funded projects that used the tools developed during the Google Summer of Code over the years.

  • Sambamba: fast processing of NGS alignment formats. Bioinformatics (2015) doi: 10.1093/bioinformatics/btv098
  • bio-maf: The long intergenic noncoding RNA landscape of human lymphocytes highlights the regulation of T cell differentiation by linc-MAF-4. Ranzani V et al. Nat Immunol. 2015 Mar;16(3):318-25. doi: 10.1038/ni.3093. Epub 2015 Jan 26.
  • Bio.Phylo: A unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics 2012, 13:209 doi:10.1186/1471-2105-13-209
  • vcf-mongo : Gene2Farm and WHEALBI European Research projects
  • OBF-GSoC-2014-WrapUp is rich in scientific activities and results.
  • RCSB PDB is the North American access point to the Worldwide Protein Data Bank, and uses BioJava extensively
  • Publications using BioJava:
    • Prlić, Andreas, et al. "BioJava: an open-source framework for bioinformatics in 2012." Bioinformatics 28.20 (2012): 2693-2695.
    • Holland, Richard CG, et al. "BioJava: an open-source framework for bioinformatics." Bioinformatics 24.18 (2008): 2096-2097.
    • Pocock, Matthew, Thomas Down, and Tim Hubbard. "BioJava: open source components for bioinformatics." ACM Sigbio Newsletter 20.2 (2000): 10-12.
      • 181 citations on Google Scholar
    • Myers-Turnbull, Douglas, et al. "Systematic Detection of Internal Symmetry in Proteins Using CE-Symm." Journal of Molecular Biology, (2014) 426:11 pp. 2255–2268.
    • Prlić, Andreas, et al. (2010) “Precalculated Protein Structure Alignments at the RCSB PDB website.” Bioinformatics 26(23), 2983-2985
    • Bliven, Spencer, et al. (2015) "Detection of circular permutations within protein structures using CE-CP." Bioinformatics. In press.
    • Aerts, Stein, et al. "Toucan: deciphering the cis‐regulatory logic of coregulated genes." Nucleic acids research 31.6 (2003): 1753-1764.
    • Vaida, Mircea-Florin, Radu Terec, and Lenuta Alboaie. "Alternative DNA Security Using BioJava." Digital Information and Communication Technology and Its Applications. Springer Berlin Heidelberg, 2011. 455-469.
    • Ross, Christian, and Qingxi J. Shen. "Computational prediction and experimental verification of HVA1-like abscisic acid responsive promoters in rice (Oryza sativa)." Plant molecular biology 62.1-2 (2006): 233-246.
    • Finak, G., et al. "BIAS: bioinformatics integrated application software." Bioinformatics 21.8 (2005): 1745-1746.
    • Aerts, Stein, et al. "A genetic algorithm for the detection of new cis-regulatory modules in sets of coregulated genes." Bioinformatics 20.12 (2004): 1974-1976.
    • Hanganu, Andrei, et al. "SLIDE: An interactive threading refinement tool for homology modeling." Rom J Biochem 1009.46: 123-127.
    • Kaladhar, DSVGK. "BioJava: A Programming Guide." (2012). LAP Lambert Academic Publishing, Germany. ISBN: 3659167509, 9783659167508
    • Prins, J. C. P. "BioLib: Sharing high performance code between BioPerl, BioPython, BioRuby, R/Bioconductor and BioJAVA." 17th Annual International Conference on Intelligent Systems for Molecular Biology, Stockholm, Sweden, June 27-July 2, 2009. 2009.
    • Tang, Si-Xin, Yi-Bing Li, and Hong-Bo He. "Designing a BioJava-based Software for RNA Sequence Analysis." Journal of Luoyang Institute of Technology 6 (2005): 016.
    • Mangalam, Harry. "The Bio* toolkits—a brief overview." Briefings in bioinformatics 3.3 (2002): 296-302.
    • Ryu, Taewan. "Benchmarking of BioPerl, Perl, BioJava, Java, BioPython, and Python for primitive bioinformatics tasks and choosing a suitable language." International Journal of Contents 5.2 (2009): 6-15.
    • McGuffee, James W. "Programming languages and the biological sciences." Journal of Computing Sciences in Colleges 22.4 (2007): 178-183.

Previous Years

This section contains links to content related to OBF's participation in GSoC in previous years.

  • 2014 - 6 student projects
  • 2013 - OBF was not accepted; some Bio* projects partnered with other organisations
  • 2012 - 5 student projects
  • 2011 - 6 student projects
  • 2010 - 6 student projects