Date/time
16 January 2020
10:45 - 11:45
Speaker
Martin Weigt
Event category
Séminaire Probabilités et Statistique
Abstract
Thanks to the sequencing revolution in biology, protein sequence
databases have been growing exponentially in recent years.
Data-driven computational approaches are becoming increasingly
popular for exploring this growing wealth of data. In my talk, I will
show that global statistical modeling approaches, such as (Restricted)
Boltzmann Machines, are able to accurately capture the natural
variability of amino-acid sequences across entire families of
evolutionarily related but distantly diverged proteins. We show that
these models are biologically interpretable; they allow us to extract
information about the three-dimensional protein structure and about
protein-protein interactions from sequence data, and they unveil
distributed sequence motifs. These models can be seen as highly
performant generative models – they capture the natural sequence
variability far beyond fitted quantities, and they allow us to design
novel, fully functional proteins by simple MCMC sampling approaches.