Mélanie Courtot

Mélanie Courtot (Ontario Institute for Cancer Research)

The Data Shows We Need Better Data

Big data, AI, LLMs… do they live up to the hype? In a bright and hopeful future, AI accelerates progress, revolutionizes healthcare, alerts us to health risks, and creates fresh career paths. Yet, in a bleaker outlook, it obliterates jobs, fosters rampant misinformation, and increases inequity.

At the root of AI is the data it relies on. In this talk, we will discuss how to steer the course by improving the data AI leverages. We will explore the vast ecosystem formed by data, projects, and infrastructure, and travel along different axes to think about the data we generate and use every day. We will consider data governance – where does it come from, who owns it, and how can we access it? We will investigate open data – how can we leverage health care knowledge for research? Finally, we will share a few thoughts about data quality and data sharing to increase reproducibility and reuse.

Dr Mélanie Courtot is the Director of Genome Informatics at the Ontario Institute for Cancer Research in Toronto, and an Assistant Professor in the Department of Medical Biophysics at University of Toronto. Dr Courtot is passionate about translational informatics – building intelligent systems to gain new insights and impact human health. Her lab aims to build a globally shared knowledge ecosystem to advance science and improve health for all. Her team develops the Overture open source software suite, which supports many active large-scale cancer genomics projects including ICGC and ICGC-ARGO, VirusSeq, and the upcoming Pan-Canadian Genome Library. It also drives the African Pathogen Data Sharing and Archive Platform.

Dr Courtot obtained her PhD in Bioinformatics from the University of British Columbia in 2014, followed by a postdoctoral fellowship in Public Health. Dr Courtot co-leads the Clinical and Phenotypic workstream and the Data Use and Cohort Representation groups for the Global Alliance for Genomics and Health (GA4GH), as well as cohort harmonization efforts for the International HundredK+ Cohorts Consortium. She is an advisory board member for the Public Health Alliance for Genomic Epidemiology coalition, the European Open Science Cloud for Cancer project, and the eLwazi open data science platform.

Andrew Su

Andrew Su (Scripps Research Institute)

Open Data, Knowledge Graphs, and Large Language Models

Bioinformatics is the science of collecting, storing, analyzing, and disseminating biological data and information. As in most domains of data science, bioinformaticians have long focused on structured data – information that is represented using ontologies and controlled vocabularies in well-defined data formats and often stored in databases with predefined schemas. This focus on structured data over the last 30 years has been the most efficient way to convert information into testable hypotheses and new scientific insights.

Recent developments in artificial intelligence, particularly the advent of large language models (LLMs), have started to challenge this traditional focus on structured data. By utilizing massive training sets of unstructured text, LLMs have shown exceptional capabilities not only in tasks like question answering and text generation but also in summarization, translation, and code generation. In this presentation, we will examine how LLMs are changing and will continue to change the practice of bioinformatics, particularly at the interface between structured and unstructured data.

Andrew Su, Ph.D., is the Elden and Verna Strahm Professor at the Scripps Research Institute in the Department of Integrative Structural and Computational Biology (ISCB). Dr. Su earned his PhD in chemistry at Scripps Research in 2002, and was the Associate Director of Bioinformatics at The Genomics Institute of the Novartis Research Foundation (GNF) before returning to Scripps Research as a faculty member in 2011.

The Su lab focuses on building and applying bioinformatics infrastructure for biomedical discovery. Dr. Su has had a long-standing interest in leveraging crowdsourcing to organize and integrate knowledge through projects like the Gene Wiki and Wikidata. In partnership with Chunlei Wu’s lab, he has also worked extensively on creating biomedical APIs and enabling API interoperability through the BioThings project. Most recently, his lab has placed particular emphasis on constructing and mining knowledge graphs for drug repurposing. In all this work, the Su lab has embraced the principles of open science, open data, and open source software.

BOSC keynote speaker selection process

BOSC usually includes two or three keynote talks given by prominent individuals or emerging leaders who are accomplished in areas relevant to the bioinformatics open source community and who represent a diversity of backgrounds and ideas. Please see our invited speaker rubric for more information about our keynote speaker selection process and criteria.