2 billion datapoints on drug-cell interactions

Hi! We are happy to introduce Zafrens by sharing ~2 billion datapoints on drug-cell interactions from 25,000 compounds, synthesized and screened in cell-painting and low-pass mRNA sequencing assays for <$8,000 in under 2 weeks, without using any robots. [25,000 compounds, each imaged in 4 colors and sequenced to ~80,000 mRNA reads.] We feel this sets a new frontier for simplicity, scale and data richness in drug discovery. Rather than just talk about it, we thought we’d put the data in front of people. Is this the complex cell data that’s been missing for AI in therapeutics?
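As a quick sanity check on the headline numbers, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope sketch under one assumption: that the ~2 billion figure comes from the sequencing reads (25,000 compounds × 80,000 reads), with the 4 cell-painting colors as a separate imaging readout. The constant and function names are illustrative only.

```python
# Back-of-envelope arithmetic for the dataset size and cost quoted above.
# Assumption: the "~2 billion" headline counts mRNA sequencing reads only;
# the 4 cell-painting colors are treated as a separate imaging readout.

N_COMPOUNDS = 25_000
READS_PER_COMPOUND = 80_000
IMAGING_COLORS = 4
TOTAL_COST_USD = 8_000  # upper bound quoted in the post ("<$8,000")


def total_reads(n_compounds: int, reads_per_compound: int) -> int:
    """Total mRNA reads across all screened compounds."""
    return n_compounds * reads_per_compound


def cost_per_compound(total_cost: float, n_compounds: int) -> float:
    """Upper-bound cost per compound, in USD."""
    return total_cost / n_compounds


reads = total_reads(N_COMPOUNDS, READS_PER_COMPOUND)
images = N_COMPOUNDS * IMAGING_COLORS  # compound x color combinations

print(f"mRNA reads: {reads:,}")
print(f"compound-color image sets: {images:,}")
print(f"cost per compound: <${cost_per_compound(TOTAL_COST_USD, N_COMPOUNDS):.2f}")
```

Running it prints 2,000,000,000 reads, 100,000 compound-color image sets and an upper-bound cost of about $0.32 per compound.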

You can find the Kaggle repositories here (https://lnkd.in/gsu-Qq-i). Admittedly, these are noisy and sparse data optimized only for speed and cost, but we’ve already been able to validate multiple findings from these datasets because the experiments are innately designed for machine learning (i.e., even if the data are noisy, there are a lot of them). Two chemical libraries are included: one designed to alter RNA methylation, and a mystery library we’d like the community to decipher from the biological data. Diverse cell types are included; we will add matched-cell data after we’ve mined it for insights internally.

Our method bridges drug discovery and development. How? We make combinatorial libraries, but unlike DNA-encoded libraries (DELs), we can profile EVERY compound, not just the hits. When compounds are profiled simultaneously by imaging and whole-transcriptome sequencing, a single experiment can suggest hits, SAR and MoA across multiple indications. You can see it for yourself in the shared data. [Well, since we only collected low-resolution data for these experiments, what you can see are signatures. If the community can build models that extract causal relationships from this data, that would be incredible.]

Next steps: We will create a series of simple Kaggle competitions to make the data accessible to a wider audience. Right now, the full dataset is available to anyone who can work with it in its entirety. We are eager to hear your findings and feedback, and we can generate data along suggested directions to explore improvements. The goal is to see how many programs we can validate for $8,000, and then perhaps for $10,000 and $20,000.

Last, but not least, this has been 2 years of backbreaking work by the Zafrens team. [The magic of this approach extends beyond small molecules to all therapeutic modalities – you can see it here (https://lnkd.in/gZV45WyJ).] I am incredibly grateful to be part of this team, and together we look forward to helping advance medicines and therapeutic insights. We’d love intros to sponsors for the Kaggle competitions, and we will be happy to generate and publicly share data on 1 million molecules if we find the right partners. Thanks! (hello@zafrens.com)
