Introduction

Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. To do so, it uses the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell's position in a precomputed single-cell embedding.
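To make the conditioning idea concrete, here is a toy sketch in plain Python. It is not the scooby implementation: the linear map from cell embedding to decoder weights, the softplus output, and all names (`cell_specific_weights`, `predict_profile`, `hyper_weights`, `hyper_bias`) are assumptions chosen for illustration. It only shows how the same sequence embeddings can yield different per-bin predictions for different cells.

```python
import math
import random


def softplus(x: float) -> float:
    # Numerically stable softplus; keeps the predicted signal positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def cell_specific_weights(cell_embedding, hyper_weights, hyper_bias):
    # Hypothetical linear map: cell embedding -> one weight per sequence channel.
    projected = matvec(hyper_weights, cell_embedding)
    return [p + b for p, b in zip(projected, hyper_bias)]


def predict_profile(seq_embeddings, cell_embedding, hyper_weights, hyper_bias):
    # One prediction per genomic bin, using weights derived from the cell.
    weights = cell_specific_weights(cell_embedding, hyper_weights, hyper_bias)
    return [
        softplus(sum(w * x for w, x in zip(weights, bin_embedding)))
        for bin_embedding in seq_embeddings
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    n_bins, n_channels, n_cell_dims = 6, 4, 3
    # Stand-ins for sequence embeddings (bins x channels); random here.
    seq_embeddings = [[rng.gauss(0, 1) for _ in range(n_channels)] for _ in range(n_bins)]
    hyper_weights = [[rng.gauss(0, 1) for _ in range(n_cell_dims)] for _ in range(n_channels)]
    hyper_bias = [0.0] * n_channels
    for name, cell in {"cell_a": [1.0, 0.0, 0.0], "cell_b": [0.0, 1.0, 0.0]}.items():
        profile = predict_profile(seq_embeddings, cell, hyper_weights, hyper_bias)
        print(name, [round(v, 3) for v in profile])
```

In this sketch the sequence is embedded once, and only the small cell-dependent step is repeated per cell.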

This repository contains the model code, the data-loading code, and a training script. The reproducibility repository contains notebooks to reproduce the results of the manuscript.

Hardware requirements

NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which can be installed with pip.

Note

The custom SnapATAC2 version is best installed in a separate environment, because its numpy version requirements conflict with those of scooby.

pip install snapatac2-scooby
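
Because the two environments are meant to hold different numpy versions, it can help to confirm what is installed in the environment you are currently using. The following is a small sketch using only the Python standard library. The helper names `installed_version` and `report` are made up for this example, and the distribution names are taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    # Return the installed version of a distribution, or None if it is absent.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    # Collect versions for several distributions in the active environment.
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, ver in report(["numpy", "snapatac2-scooby"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this once in each environment shows which numpy version each one resolved to.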

Scooby package installation

pip install git+https://github.com/gagneurlab/scooby.git
Download the file contents from the Zenodo repository.
Use the examples from the scooby reproducibility repository.

Training

We provide a training script, which requires SnapATAC2-preprocessed AnnData objects and embeddings. Training takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB of RAM and 32 CPU cores.

Model architecture

Currently, the model has only been tested with a batch size of 1.