Google Summer of Code 2015 Ideas

From Open Bioinformatics Foundation

Revision as of 12:13, 18 February 2015 by Christian Höner zu Siederdissen (talk)

The details of each of our project ideas are listed below, including potential mentors. Interested mentors and students should subscribe to the OBF/GSoC mailing list and announce their interest.

See the main OBF Google Summer of Code page for more information about the GSoC program and additional ways to get in touch with us.

1 Cross-project ideas
2 BioPerl
- 2.1 TITLE
3 BioJava and JSBML
- 3.1 TITLE
4 BioPython
- 4.1 TITLE
5 BioRuby
- 5.1 TITLE
6 BioHaskell
- 6.1 Fast k-mer indexing
- 6.2 Low-level bit and stream-fusion optimizations
7 Biocaml
- 7.1 TITLE
8 Candidate a new project for OBF
- 8.1 TITLE

Cross-project ideas

OBF is an umbrella organization which represents many different programming languages used in bioinformatics. In addition to working with each of the "Bio*" projects (listed below), this year we are also accepting a category of "cross-project" ideas that cover multiple programming languages or projects. These collaborative ideas are broadly defined and can be thought of as "unfinished" — interested students should adapt the ideas to their own strengths and goals, and are responsible for the quality of the final proposed idea in their application.

Feel free to propose your own entirely new idea. You can also draw ideas from Genome Informatics (GMOD) and the National Evolutionary Synthesis Center (NESCent).

Provide Nextflow with a GUI based on NoFlo UI

Rationale: Nextflow's mission is to bridge the gap between state-of-the-art, industry-driven computational tools and the computational requirements of research. The goal is to make data analysis, for next-generation sequencing data and beyond, easy for all researchers. It follows the UNIX philosophy, in which many small tools are composed into efficient computational solutions whose individual parts can be easily replaced, and it is designed to let developers prototype applications quickly by reusing their existing tools and scripts. For these reasons it has been developed primarily as a command-line tool. NoFlo (http://noflojs.org/noflo-ui/) is a flow-based programming (FBP) framework that makes software creation more accessible and collaborative. It provides an interactive interface for building a computational workflow map by dragging, dropping and connecting task components. In effect, you draw your computation as a map, somewhat like a subway map. Other scientists can then understand, share and curate this "map" more easily than endless files of source code.

Approach: The goal of this proposal is to integrate NoFlo with Nextflow in order to provide the latter with a presentation layer that allows researchers to "draw" their computational pipelines instead of programming them, making it easier to handle and communicate complex task interactions in their application logic.

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

Low latency scheduling and in-memory data processing

Rationale: Nextflow does not implement a task scheduling strategy of its own; it delegates scheduling to the underlying processing infrastructure, which in most cases is a grid-engine-like technology. However, these platforms were designed for batch job processing, i.e. a few long-running jobs scheduled sequentially on a compute cluster. This approach suffers from very high latencies, which make it unfit for the highly parallel, short-lived jobs that are increasingly common in bioinformatics data analysis.

Approach: The goal of this proposal is to integrate the Sparrow scheduler (https://github.com/radlab/sparrow) and the Tachyon in-memory file system (https://github.com/amplab/tachyon) with Nextflow.

The first is a high-throughput, low-latency distributed cluster scheduler, while the second is a memory-centric distributed file system. Both are open-source research projects developed at UC Berkeley.

The integration of these technologies would allow Nextflow to manage large distributed workloads more efficiently and in a more timely manner, and to make tasks in Nextflow applications finer-grained, achieving a higher degree of parallelism and better application performance.

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

BioPerl

Mailing lists
IRC: #bioperl on Freenode
Information for new developers
Source code browser for bioperl-live (the main BioPerl code base), and all BioPerl sub-projects
Priority list of things that need work, as another source for student-conceived project ideas

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

BioJava and JSBML

BioJava developer mailing list
JSBML developer mailing list
BioJava modules as another source for student-conceived project ideas
Source code for biojava-live (the main BioJava code base) and all BioJava sub-projects

For GSoC 2014, BioJava is partnering with the Systems Biology Markup Language (SBML) team to bring enhancements to JSBML, the standard Java implementation of SBML, and to bring SBML features to other Java-based systems biology software. See the SBML website for more ideas from the SBML team.

Students working on these projects will interact with both the BioJava and JSBML communities, which overlap. Most development will happen on the JSBML codebase, although BioJava is used as a supporting library for some components.

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

BioPython

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

BioRuby

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

BioHaskell

BioHaskell has its own GSoC page. We currently have 2 (+1) open problems listed there. In addition, we accept people's own ideas and have a number of open problems that are not listed. The latter fall somewhere between a bachelor's thesis and PhD work and are harder to package up nicely.

Fast k-mer indexing

Fast k-mer indexing requires a data structure mapping a short string of k characters to a value. While trivial to do with almost all key-value maps, we also require a very memory-efficient storage system. Knowledge of suffix structures is a definite plus.
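As a rough illustration of the key-encoding side of this problem, the sketch below packs each DNA k-mer into an integer using two bits per base and records where it occurs. It is written in Python purely for brevity and is not taken from any BioHaskell library; the names `encode_kmer` and `index_kmers` are hypothetical, and the plain dictionary stands in for the memory-efficient structure the project would actually need to design.

```python
# Hypothetical sketch: 2-bit packing of DNA k-mers (A, C, G, T only).
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a DNA k-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | _CODE[base]
    return value


def index_kmers(seq, k):
    """Map each packed k-mer in seq to the list of its start positions.

    K-mers containing a character other than A, C, G or T are skipped.
    """
    index = {}
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    value = 0
    valid = 0  # consecutive valid bases seen so far
    for i, base in enumerate(seq):
        code = _CODE.get(base)
        if code is None:
            # Ambiguous base: restart the rolling window.
            value = 0
            valid = 0
            continue
        value = ((value << 2) | code) & mask
        valid += 1
        if valid >= k:
            index.setdefault(value, []).append(i - k + 1)
    return index
```

With this encoding a k-mer of length up to 32 fits in a single 64-bit word, which is one reason the representation is a common starting point; the harder part, and the focus of the project idea, is the storage structure that replaces the dictionary.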

Mentors: Ketil Malde?

Low-level bit and stream-fusion optimizations

In Haskell, we typically don't talk that much about low-level implementation details. For some algorithms, low-level details (especially bitwise operations and SIMD instructions) become important. I have a library dealing with bitsets but it is not yet fully efficient. Getting SIMD instructions to play nice with /generic/ DP recursion schemes is probably really hard.
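To make the kind of bitwise operation meant here a little more concrete, the following sketch shows two standard bitset primitives on plain integers: counting the set bits and listing the members of a set. It is a generic Python illustration under the assumption that a set of small non-negative integers is stored as the bits of one integer; it does not reproduce the bitset library mentioned above, and the function names are hypothetical.

```python
# Hypothetical sketch: a set of small integers stored as bits of one int.

def popcount(bits):
    """Count set bits by repeatedly clearing the lowest one."""
    count = 0
    while bits:
        bits &= bits - 1  # clears the lowest set bit
        count += 1
    return count


def members(bits):
    """Return the positions of all set bits, lowest first."""
    out = []
    while bits:
        low = bits & -bits  # isolates the lowest set bit
        out.append(low.bit_length() - 1)
        bits ^= low  # remove that bit from the set
    return out
```

For example, `members(0b10110)` lists the elements 1, 2 and 4. In a compiled setting these loops typically map to single machine instructions, which is where the low-level tuning described above comes in.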

Mentors: Christian Hoener zu Siederdissen

Biocaml

Mailing List

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

Candidate a new project for OBF

If you want to be part of OBF but your project is not yet listed above, please contact us and tell us about your project and your proposal.

TITLE

Rationale: TEXT HERE TEXT HERE

Approach: TEXT HERE TEXT HERE

Languages and skill: TEXT HERE TEXT HERE

Code: TEXT HERE TEXT HERE

Mentors: TEXT HERE TEXT HERE

Retrieved from "https://www.open-bio.org/w/index.php?title=Google_Summer_of_Code_2015_Ideas&oldid=3877"
