Galaxy Admin 2020 and beyond (guest post by OBF Travel Award recipient Michael Thompson)

The Open Bioinformatics Foundation (OBF) sponsors a Travel Fellowship program aimed at increasing diverse participation at events promoting Open Source bioinformatics software development and open science in the biological research community. Michael Thompson’s participation at the Galaxy Admin Training 2020 workshop in Barcelona was supported by this fellowship. Find more information here.

I had the opportunity to visit the Barcelona Supercomputing Center (B.S.C) in Spain from 2nd to 6th March 2020 to participate in the Galaxy Admin Training 2020, organized by Galaxy Europe in partnership with B.S.C, ELIXIR, and de.NBI.

Purpose

My reason for attending was to gain the skill set required to deploy and administer Galaxy within my university (Kwame Nkrumah University of Science and Technology), which currently has a small group of students and researchers involved in Bioinformatics. I have previous experience with HPC applications, although this was my first time working with Galaxy. Our university’s deployment of Galaxy is also intended to be open to any researcher within my country, Ghana.

The Event

The event was very well organized with training provided by Helena Rasche (Galaxy Europe), Nate Coraor (Galaxy Project, Penn State University, U.S.A), Marius van den Beek (Galaxy Project, Penn State University, Europe), Saskia Hiltemann (Erasmus Medical Center, the Netherlands), and Nicola Soranzo (Earlham Institute).

The training materials are available here.

On the first day, after registration, we dived straight into the setup/installation using Ansible. It was followed by a talk on the advanced setup of the system and configuration of tool-sheds. Later that day, we had a tour of the Barcelona Supercomputing Center to see MareNostrum – a supercomputer with a peak performance of 11.15 Petaflops.

Participants receiving a lecture on the architecture of MareNostrum.

A guided tour of MareNostrum.

The second day of the training hosted talks on deploying scientific tools using Ephemeris, configuring and using authentication methods, access to reference data for scientific analysis, configuring job scheduling, and connecting Galaxy to your existing HPC cluster.

On day three, we continued with job scheduling and connecting to compute clusters, then moved on to connecting to remote clusters using Pulsar, storage management, making queries with gxadmin, and monitoring with Telegraf, InfluxDB and Grafana.

The training session.

The fourth day was about using interactive tools, development (deploying your own tools), build automation and advanced customization of the software (user interface). There was a session on Training Infrastructure as a Service (TIaaS) – a feature that allows you to set aside small groups of dedicated resources within Galaxy for running training sessions, isolated from the production work on the same platform.

On the last day, we had talks on how to deal with issues when they arise, managing different Python versions, and developing tools using Planemo. I had to leave before the very last talk, which was about creating tutorials from the training resources provided.

Takeaways

Every part of the event had hands-on training exercises. During these exercises, the trainers did a very good job of sharing their experiences. These experiences – what I call ‘street wisdom’ – were useful in situating the exercises in real-life scenarios.

In particular, I found the sessions on tool development, generating queries, and monitoring very useful. The ability to integrate existing tools would enable the platform to support the different needs of the user community. The monitoring and reporting utilities also provide an evidence-based and transparent approach to evaluating the usage of resources. This is important for demonstrating to our funders (the university in this case) how the facility is used and, when necessary, for making a case for upgrades or expansion.

Everything about Galaxy is Ansible! It is an example of very extensive use of automation, without which management and maintenance would be far more tedious. It enforces the DRY (Don’t Repeat Yourself) philosophy; the packaging and documentation of the entire software platform are an encouragement to use configuration management software extensively (and in all types of large software deployments).
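To make the DRY idea concrete, here is a minimal Python sketch of my own. It is not Ansible and is not taken from the training material; the host name, port, file names and template contents are all hypothetical. It only illustrates the principle that configuration management tools apply at scale: each setting is defined once, and every configuration file is rendered from that single definition.

```python
from string import Template

# Hypothetical shared settings, defined exactly once (illustrative values only).
SETTINGS = {
    "server_name": "galaxy.example.org",
    "galaxy_port": "8080",
}

# Two hypothetical config templates that reuse the same variables.
PROXY_TEMPLATE = Template(
    "server_name $server_name;\n"
    "proxy_pass http://127.0.0.1:$galaxy_port;\n"
)
APP_TEMPLATE = Template(
    "listen: 127.0.0.1:$galaxy_port\n"
    "hostname: $server_name\n"
)


def render_all(settings):
    """Render every template from the single settings dictionary."""
    return {
        "proxy.conf": PROXY_TEMPLATE.substitute(settings),
        "app.yml": APP_TEMPLATE.substitute(settings),
    }


if __name__ == "__main__":
    for filename, content in render_all(SETTINGS).items():
        print(f"--- {filename} ---")
        print(content)
```

Changing the port in `SETTINGS` updates both rendered files at once, so the two can never drift apart. That is the property that makes a large deployment manageable when it is described in one place rather than edited by hand in many.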

Although a large number of the tools available on Galaxy are for use within the field of Bioinformatics, the platform has been designed to run almost any kind of tool. This means that one deployment of the platform can serve different scientific user groups, provided the required tools can be developed and integrated into the platform. To facilitate this, there is extensive tool development documentation. I find this particularly important in scenarios where there may not be a lot of resources (infrastructure, human, funding) to run different types of HPC platforms – some HPC/HTC installations out there are very limited, especially in developing countries. In my opinion, Galaxy alone, with a bit of effort in tool development, can be used to support a wide range of disciplines.

What Next

The next few months will involve planning and organizing the deployment of Galaxy for local use. I am confident we will have some success stories and, hopefully, interesting use cases of the platform to share in another blog post later.

Workshop participants at BSC.

Thanks to OBF!

The opportunity to participate in the Galaxy Admin 2020 Training in Barcelona would not have been possible without the travel fellowship from OBF. I have had the opportunity to connect with different Galaxy Admins and to join a community of people enthusiastic about providing support to the scientific community. Many thanks to OBF!

I am an IT Manager at the University Information Technology Services (U.I.T.S) of the Kwame Nkrumah University of Science and Technology (K.N.U.S.T). I am also a member of the H3ABioNet project.