Date: Monday, April 27, 2026
Location: Room 211, Riocentro Convention and Event Center
| Time | Duration | Event |
|---|---|---|
| 9:00 - 9:05 AM | 5 mins | Welcoming Remarks (Organizers) |
| 9:05 - 10:00 AM | 55 mins | Sarah Teichmann (University of Cambridge), Plenary Talk: Mapping the Human Body One Cell at a Time. Abstract: The 37 trillion cells of the human body have a remarkable array of specialised functions and must cooperate in time and space to construct a functioning human. Combining single cell and spatial genomics with AI/ML tools and data science, my lab has been attempting to understand this cellular diversity, how it is generated during development and how it goes wrong in disease. My talk will present recent advances towards a virtual Human Cell Atlas (HCA). I will highlight methods we developed to integrate and annotate cell atlases, and discuss how these models can be used to disentangle the effect of covariates, predict unseen perturbations and decode the regulatory logic of human cell types. I will additionally describe GenAI-based approaches we developed to model the spatial organization of cell types within organs, which can be leveraged to predict the effect of a cell on its neighbours. |
| 10:00 - 10:15 AM | 15 mins | Coffee Break |
| 10:15 - 10:55 AM | 40 mins | Invited Keynote: Generative AI for Multiscale Biology. Abstract: The complexity of biological systems arises from interactions across multiple molecular layers. Understanding these relationships requires models that can integrate high-dimensional, multimodal data and capture how cells behave, interact, and respond to perturbations. In this talk, I will present a suite of generative AI methods we developed to combine information across omics modalities, with the aim of uncovering principles of cellular organization and bridging molecular and tissue scales. I will demonstrate how these approaches can help learn the rules governing spatial tissue structure and reveal links between cellular morphology and molecular state. I will also discuss key limitations of current AI methods in biology, along with ongoing benchmarking efforts and agentic frameworks built on top of these models. Finally, I will argue that challenging biomedical problems do not just benefit from advances in AI; they actively drive the development of new AI methodologies. |
| 10:55 - 11:05 AM | 10 mins | Mithril: Sponsor Talk |
| 11:05 - 11:15 AM | 10 mins | Spotlight 1: GPC - Deep generative model of genetic variation data improves imputation accuracy in private populations. Prateek Anand ⋅ Anji Liu ⋅ Meihua Dang ⋅ Boyang Fu ⋅ Xinzhu Wei ⋅ Guy Van den Broeck ⋅ Sriram Sankararaman |
| 11:15 - 11:25 AM | 10 mins | Spotlight 2: Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design. Prin Phunyaphibarn ⋅ Minhyuk Sung |
| 11:25 - 11:35 AM | 10 mins | Spotlight 3: Effects of Distance Metrics and Scaling on the Perturbation Discrimination Score. Qiyuan Liu ⋅ Qirui Zhang ⋅ Jin-Hong Du ⋅ Siming Zhao ⋅ Jingshu Wang |
| 11:35 AM - 12:20 PM | 45 mins | Panel Discussion. Moderator: Arnaud Doucet |
| 12:20 - 1:10 PM | 50 mins | Lunch Break |
| 1:10 - 1:55 PM | 45 mins | Poster Session #1 (Accepted submissions) |
| 1:55 - 2:35 PM | 40 mins | Nic Fishman (Harvard University), Invited Keynote: Generative Modeling in Distribution Space. Abstract: Learning from distributions rather than individual samples is increasingly important in science, where the objects of interest are often populations observed under a condition, time point, or perturbation. We formalize distribution embeddings: representations of distributions learned from samples by an encoder satisfying a distributional invariance property, which we show yields a central limit theorem for the encoder at the population level. This guarantees that embeddings concentrate around a well-defined population value and that plug-in losses become asymptotically normal estimators. Supervised against labels, distribution embeddings give a standard learning problem over distributions. We introduce Generative Distribution Embeddings (GDE), which learn distribution representations without labels. A distributionally invariant encoder is coupled with a conditional generator and trained so that samples from the generator, conditioned on an embedding, recover the encoded distribution. This enables downstream tasks over distributions — prediction of distribution-level attributes, generation of new samples from a represented distribution, and interpolation between distributions in latent space — to be solved from samples alone. GDE also yields a semi-supervised regime: when labels are available for only some distributions, reconstruction over the rest organizes the embedding space and improves prediction from limited supervision. We then turn to transport. Standard transport models learn a map from a fixed source to a fixed target; source-conditioned variants handle a family of sources paired with targets. We introduce Distribution-Conditioned Transport (DCT), which conditions a transport model on embeddings of both source and target and is trained over arbitrary pairs of distributions. This enables transport between distributions unseen during training — forecasting the effect of a new perturbation, transferring across batches or conditions, and modeling dynamics between populations — without requiring explicit pairings. DCT also yields a semi-supervised regime: when only some distributions are observed as matched pairs, singly observed "orphan" marginals sharpen the embedding space and improve prediction for unpaired source-target combinations. DCT is agnostic to the underlying transport parameterization and can be combined with flow matching or distribution-level divergence objectives such as Wasserstein and MMD losses. We demonstrate these frameworks on synthetic benchmarks and biological applications: donor-attribute prediction from single-cell data, batch effect transfer in single-cell genomics, perturbation prediction from mass cytometry data, clonal transcriptional dynamics in hematopoiesis, and T-cell receptor sequence evolution. |
| 2:35 - 3:15 PM | 40 mins | Invited Keynote: Nona: A unifying multimodal masking framework for functional genomics. Abstract: The non-coding genome encodes complex regulatory logic that orchestrates gene expression and cell identity. While machine learning models for functional genomics have advanced our understanding of the cis-regulatory code, sequence-to-function models, DNA language models, and generative models have evolved as separate paradigms despite probing the same underlying regulatory biology. We introduce Nona, a multimodal masked modeling framework that unifies these paradigms by learning jointly from DNA sequence and base-resolution functional genomics data. Beyond unifying existing modeling paradigms, Nona enables entirely new modeling objectives. We demonstrate its versatility through three applications: (1) a context-aware sequence-to-function model that improves local predictions by up to 13% by correcting systematic errors in sequence-to-function predictions; (2) a functional language model that integrates functional data into language modeling, learns relevant regulatory sequence motifs, and enables regulatory element design through masked discrete diffusion; (3) functional genotyping, which reveals an unrecognized privacy vulnerability in processed ATAC-seq data and re-identifies individuals from genetic databases with perfect accuracy. Together, these results establish masking as a universal interface for integrated modeling of functional genomics data, unifying disparate approaches while opening new directions for understanding and engineering the regulatory genome. |
| 3:15 - 3:30 PM | 15 mins | Coffee Break |
| 3:30 - 4:10 PM | 40 mins | Adam Kosiorek (Google DeepMind): AlphaGenome: Advancing regulatory variant effect prediction with a unified DNA sequence model. Abstract: AlphaGenome is a unified DNA sequence model that takes 1Mb of DNA sequence as input and predicts thousands of functional genomic tracks at up to single-base-pair resolution. By bridging the gap between traditional short-sequence/high-resolution models and long-sequence/low-resolution models, AlphaGenome expands the range of modalities, input lengths, and prediction resolutions compared to prior art. Remarkably, AlphaGenome is trained on just two i.i.d. examples: the reference human and mouse genomes. In this talk, we will explore the modeling assumptions that enable training a state-of-the-art model in this setup, leading to performance that matches or exceeds the strongest available external models in most variant effect prediction evaluations. As a case study, we will demonstrate how AlphaGenome's multimodal capabilities successfully recapitulate the mechanisms of clinically relevant variants near the TAL1 oncogene. We will also discuss the model's current limitations and the need for a joint sequence and function model. |
| 4:10 - 5:00 PM | 50 mins | Poster Session #2 (Accepted submissions) |