Day 4: Target Discovery and Deconvolution
Today included some excellent talks that highlighted the importance of data: how it is collected and what kind of modeling that allows. This is also reflected in the Quotes of the Day.
As always, three-point (slide) summaries are below; feel free to comment in the source Google Doc: https://shorturl.at/mGWsl
Quotes of the Day
Anne Carpenter:
- “Imagine segmentation of microscopy is relatively solved”
- “What we haven’t done is to combine the images and structure prediction to find drugs”
- “This is a plea for ML people to go into images, where there are lots of opportunities, an open area with great potential, e.g. via representation learning etc.”
Sébastien Lemieux (LOTS of wisdom here):
- “Biggest problem when using complex data is getting stuck criticising it, just don’t, it’s hard to produce, appreciate any data”
- “Don’t fear the ethical review process for data access, method development often is just fine as a justification”
- “Linear methods work so well with omics data because genome data lies on a low-dimensional manifold”
- “Do PCA, it gets rid of some of the noise” (Reference anyone?)
- “The field at the moment has no systematic approach, it’s a mess, no agreed benchmarks, it’s easy to create your own benchmark to outperform other methods”
- “There is serious confusion what data level to work on”
- “Biologists love their normalisation, but anytime you see normalised data, think about what information was destroyed in that process”
- “Treat your data with respect”
- “Fit the architecture to the data, not the other way around”
- “ML in biology is hard because biologists love their matrices (because Excel) and graphs (because easy to draw)”
- “A lot of the power of cell painting assays seems to be in cell count”
Jason Hartford:
- “Need to be clear about our assumptions, and what can and can’t be done”
- “Often, we are asking implicit causal questions”
- “As the causal inference community, we need to pay more attention to unstructured data, where the hard part is how to get structured representations from unstructured data”
8:30 am — 9:30 am: Phenomics in Drug Discovery — Anne E. Carpenter
I really enjoyed this talk as it was very data-focused: how image data is produced, what can be done with it, and how we can collect cheaper modalities and do better machine learning:
- Cell painting assays can be used for drug discovery:
- Image-based profiling can be applied to many different steps:
- There are substantial successes of image-based profiling, such as:
9:30 am — 10:30 am: Multi-Modal Omics & AI — Sébastien Lemieux
Just like Anne, Sébastien presented an excellent overview of what it means to be “data first” and, with that, to be more successful with machine learning techniques:
- What is omics?
- What are the tasks of multimodal learning in omics?
- Take-home messages
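To make the quoted remarks on low-dimensional structure and PCA a bit more concrete, here is a minimal sketch of PCA-based denoising. It is not taken from the talk: the data are synthetic, the matrix sizes, the noise level and the helper names `pca_denoise` and `rmse` are my own assumptions, and the number of components is set to the (here known) number of latent factors, which you would not know for real omics data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (assumption, not real data):
# 200 samples x 1000 genes, generated from 5 latent factors plus noise.
n_samples, n_genes, n_factors = 200, 1000, 5
latent = rng.normal(size=(n_samples, n_factors))
loadings = rng.normal(size=(n_factors, n_genes))
signal = latent @ loadings
noisy = signal + rng.normal(scale=2.0, size=signal.shape)


def pca_denoise(x, k):
    """Project x onto its top-k principal components and map back."""
    mean = x.mean(axis=0, keepdims=True)
    centered = x - mean
    # SVD of the centered matrix yields the principal axes in vt.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the k leading components, then restore the mean.
    return (u[:, :k] * s[:k]) @ vt[:k] + mean


def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


denoised = pca_denoise(noisy, k=n_factors)
err_noisy = rmse(noisy, signal)
err_denoised = rmse(denoised, signal)

print(f"RMSE vs. clean signal, noisy input:      {err_noisy:.3f}")
print(f"RMSE vs. clean signal, rank-{n_factors} PCA output: {err_denoised:.3f}")
```

In this toy setting the low-rank reconstruction sits closer to the clean signal than the noisy input does, because most of the noise lies in directions the leading components do not span. Whether that carries over to a given real dataset depends on how low-dimensional the signal really is and on how the number of components is chosen.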
11:00 am — 12:00 pm: Causal Discovery & Representation Learning — Jason Hartford
Unfortunately, Jason did not have enough time to cover 20+ slides on Causal Representation Learning, but a tutorial is available here: LINK
- What is Causal Inference?
- What is Causal Discovery?
- What is Causal Representation Learning?
1:30 pm — 2:30 pm: Modeling Population Dynamics — Charlotte Bunne
Slides to be released soon.
2:30 pm — 3:00 pm: Lab 4 — Target Deconvolution Explanation + 3:30 pm — 4:30 pm: Lab 4 — Target Deconvolution + 4:30 pm — 5:00 pm: Lab 4 Recap
For our lab on transcriptomics and target deconvolution, please find:
- the slides: https://docs.google.com/presentation/d/1gJvF8BTWwivgFE5R2cDIu5XQqBsQtVaGZc__kxQ8X8A/edit#slide=id.p
- the Colab notebook: https://colab.research.google.com/drive/1k7AWbdAlfUEJbb0Lj6ZcO_bhrpClFICV#scrollTo=HT9J1WKT_e4C