Day 4: Target Discovery and Deconvolution

Jun 18, 2024

Today included some excellent talks that highlighted the importance of data: how it is collected and what kind of modeling that allows. This is also reflected in the Quotes of the Day.

As always, three-point (slide) summaries are below; feel free to comment in the source Google Doc: https://shorturl.at/mGWsl

Quotes of the Day

Anne Carpenter:

“Imagine segmentation of microscopy is relatively solved”
“What we haven’t done is to combine the images and structure prediction to find drugs”
“This is a plea for ML people to go into images, where there are lots of opportunities, an open area with great potential, e.g. via representation learning etc.”

Sébastien Lemieux (LOTS of wisdom here):

“Biggest problem when using complex data is getting stuck criticising it. Just don’t: it’s hard to produce, so appreciate any data”
“Don’t fear the ethical review process for data access, method development is often a perfectly fine justification”
“Linear methods work so well with omics data because genome data lies on a low-dimensional manifold”
“Do PCA, it gets rid of some of the noise” (Reference anyone?)
“The field at the moment has no systematic approach, it’s a mess, no agreed benchmarks, it’s easy to create your own benchmark to outperform other methods”
“There is serious confusion about what data level to work on”
“Biologists love their normalisation, but anytime you see normalised data, think about what information was destroyed in that process”
“Treat your data with respect”
“Fit the architecture to the data, not the other way around”
“ML in biology is hard because biologists love their matrices (because Excel) and graphs (because easy to draw)”
“A lot of the power of Cell Painting assays seems to be in cell count”
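
To make the "Do PCA, it gets rid of some of the noise" quote a bit more concrete, here is a minimal sketch of my own (not from the talk). It assumes synthetic data built as a low-rank signal plus Gaussian noise, which is only a stand-in for the "low-dimensional manifold" claim, and it assumes we know the true rank. The function name pca_denoise and all sizes are hypothetical choices for illustration.

```python
import numpy as np


def pca_denoise(X, n_components):
    """Project X (samples x features) onto its top principal axes and back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components]
    # Keep only the part of the data lying in the top-k subspace.
    return Xc @ V_k.T @ V_k + mean


rng = np.random.default_rng(0)
n_samples, n_features, rank = 200, 50, 3

# Assumption: the "true" signal is exactly low rank, the noise is isotropic.
signal = rng.normal(size=(n_samples, rank)) @ rng.normal(size=(rank, n_features))
noisy = signal + 0.5 * rng.normal(size=signal.shape)

denoised = pca_denoise(noisy, n_components=rank)

err_noisy = np.mean((noisy - signal) ** 2)
err_denoised = np.mean((denoised - signal) ** 2)
print(f"MSE before PCA: {err_noisy:.3f}, after PCA: {err_denoised:.3f}")
```

In this toy setting the reconstruction is closer to the clean signal than the noisy input, because most of the noise lives in the directions that get dropped. Real omics data will not be this clean, and choosing the number of components is its own problem.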

Jonathan Hartford:

“Need to be clear about our assumptions, and what can and can’t be done”
“Often, we are asking implicit causal questions”
“As the causal inference community, we need to pay more attention to unstructured data, where the hard part is how to get structured representations from unstructured data”

8:30 am — 9:30 am: Phenomics in Drug Discovery — Anne E. Carpenter

I really enjoyed this talk because it was very data focused: how image data is produced, what can be done with it, and how we can collect cheaper modalities and do better machine learning.

Cell painting assays can be used for drug discovery:

Image-based profiling can be applied to many different steps:

There are substantial successes of image-based profiling, such as:

9:30 am — 10:30 am: Multi-Modal Omics & AI — Sébastien Lemieux

Just like Anne, Sébastien presented an excellent overview of what it means to be “data first” and, with that, to be more successful with machine learning techniques: