Usage

First, download the reference files from Zenodo.

Then extract the archives with tar xvzf and run SCRIP config on the extracted folders.
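If you prefer to script the extraction step, the sketch below does the equivalent of tar xvzf with Python's standard tarfile module. The archive and folder names in the usage line are hypothetical placeholders; substitute the names of the files you actually downloaded from Zenodo.

```python
import tarfile
from pathlib import Path


def extract_reference(archive: str, dest: str = ".") -> Path:
    """Extract a gzipped tar archive, like `tar xvzf ARCHIVE -C DEST`."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        for name in tar.getnames():
            print(name)  # verbose listing, like the `v` flag of tar
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")  # safer extraction where supported
        else:
            tar.extractall(out)
    return out
```

For example, extract_reference("human_tf_index.tar.gz", "SCRIP_refs") (hypothetical names) extracts the archive into SCRIP_refs; the extracted folder is what you pass to SCRIP config.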

SCRIP provides five main commands:

usage: SCRIP [-h] [--version] {enrich,impute,target,config,index} ...

SCRIP

positional arguments:
{enrich,impute,target,config,index}
   enrich              Main function.
   impute              Imputation Factor function.
   target              Calculate targets based on factor peak count.
   config              Configuration.
   index               Build index with custom intervals.

optional arguments:
-h, --help            show this help message and exit
--version             show program's version number and exit

For the command-line options of each command, run: SCRIP COMMAND -h

Simple usage

SCRIP enrich -i {peak_count.h5} -s hs -p {result_SCRIP_path} -t 32
SCRIP impute -i {peak_count.h5} -s hs -p {result_SCRIP_path} -f h5ad --factor {factor}
SCRIP target -i {result_SCRIP_path}/imputation/{factor}/imputed_{factor}.h5ad -s hs -o {result_SCRIP_path}/target/{factor}_target.h5ad
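The three commands chain together: enrich and impute both read the same peak matrix, and target reads the H5AD file that impute writes under {result_SCRIP_path}/imputation/{factor}/. As a minimal sketch, the Python helper below builds exactly these command lines and runs them in order with subprocess. The function names, the example file name and the factor AR are illustrative only, and creating the target folder beforehand is an assumption, in case SCRIP target does not create it itself.

```python
import os
import shlex
import subprocess


def scrip_commands(peak_h5: str, project: str, factor: str,
                   species: str = "hs", threads: int = 32) -> list[list[str]]:
    """Build the enrich -> impute -> target command lines from Simple usage."""
    imputed = f"{project}/imputation/{factor}/imputed_{factor}.h5ad"
    target = f"{project}/target/{factor}_target.h5ad"
    return [
        ["SCRIP", "enrich", "-i", peak_h5, "-s", species, "-p", project,
         "-t", str(threads)],
        ["SCRIP", "impute", "-i", peak_h5, "-s", species, "-p", project,
         "-f", "h5ad", "--factor", factor],
        ["SCRIP", "target", "-i", imputed, "-s", species, "-o", target],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if dry_run:
            continue
        if "-o" in cmd:
            # Assumption: create the output folder before SCRIP target runs.
            out_dir = os.path.dirname(cmd[cmd.index("-o") + 1])
            if out_dir:
                os.makedirs(out_dir, exist_ok=True)
        subprocess.run(cmd, check=True)  # raise on the first failing step


if __name__ == "__main__":
    # Hypothetical inputs: replace with your own matrix, output folder and factor.
    run_pipeline(scrip_commands("peak_count.h5", "SCRIP_result", "AR"))
```

Running it as-is only prints the commands; pass dry_run=False to execute them once SCRIP and its reference files are configured.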

Detailed usage for each command is listed below.

SCRIP enrich

This command takes a peak count matrix in H5 or MTX format, together with basic quality-control parameters, and writes an output folder containing the following files:

  • beds: BED files for all cells

  • ChIP_result: text files with the Giggle search results

  • qpeaks_length.txt: total peak length of each cell

  • SCRIP_enrichment.txt: the SCRIP enrichment scores

  • dataset_overlap_df.pk: the raw number of overlaps between each cell and each dataset

  • dataset_cell_norm_df.pk: normalized scores

  • dataset_score_source_df.pk: matched reference datasets

  • tf_cell_score_df.pk: the same table as SCRIP_enrichment.txt, but untransposed and in pickle format
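qpeaks_length.txt is described only as the total peak length of each cell. A minimal sketch of that quantity, assuming one BED file per cell in beds/ (named after the cell) and summing end - start over all records without merging overlapping peaks, looks like this; SCRIP's own computation may differ in these details, and the function names are hypothetical.

```python
from pathlib import Path


def peak_length_from_bed(lines) -> int:
    """Sum end - start over BED records (0-based, half-open intervals)."""
    total = 0
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split()
        total += int(fields[2]) - int(fields[1])
    return total


def peak_lengths(beds_dir: str) -> dict[str, int]:
    """Map each cell (assumed to be the BED file stem) to its total peak length."""
    lengths = {}
    for bed in sorted(Path(beds_dir).glob("*.bed")):
        with open(bed) as fh:
            lengths[bed.stem] = peak_length_from_bed(fh)
    return lengths
```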

usage: SCRIP enrich [-h] -i FEATURE_MATRIX -s {hs,mm} [-p PROJECT] [--min_cells MIN_CELLS] [--min_peaks MIN_PEAKS] [--max_peaks MAX_PEAKS]
                 [-t N_CORES] [-m {max,mean}] [-y] [--clean]

optional arguments:
-h, --help            show this help message and exit

Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
                        A cell by peak matrix. REQUIRED.
-s {hs,mm}, --species {hs,mm}
                        Species. "hs"(human) or "mm"(mouse). REQUIRED.

Output arguments:
-p PROJECT, --project PROJECT
                        Project name, which will be used to name the output folder. DEFAULT: randomly generated.

Preprocessing parameter arguments:
--min_cells MIN_CELLS
                        Minimal cell cutoff for features. Auto will take 0.05% of the total cell number. DEFAULT: "auto".
--min_peaks MIN_PEAKS
                        Minimal peak cutoff for cells. Auto will take the mean-3*std of the feature numbers of all cells (raised to 500 if lower than 500). DEFAULT: "auto".
--max_peaks MAX_PEAKS
                        Max peak cutoff for cells. This helps remove doublet cells. Auto will take the mean+5*std of the feature
                        numbers of all cells. DEFAULT: "auto".

Other options:
-t N_CORES, --thread N_CORES
                        Number of cores used to run SCRIP. DEFAULT: 16.
-m {max,mean}, --mode {max,mean}
                        Deduplicate strategy. DEFAULT: max.
-y, --yes             Whether to ask for confirmation. DEFAULT: False.
--clean               Whether to delete tmp files (including bed and search results) generated by SCRIP. DEFAULT: False.
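The "auto" defaults for --min_cells, --min_peaks and --max_peaks can be approximated from the help text as follows. This is a sketch, not SCRIP's code: whether SCRIP uses the population or sample standard deviation, whether the bounds are inclusive, and how it rounds the cutoffs are assumptions here, and the function names are hypothetical.

```python
import statistics


def auto_cutoffs(peaks_per_cell: list[int]) -> tuple[float, float, float]:
    """Return (min_cells, min_peaks, max_peaks) following the help text.

    peaks_per_cell holds the number of detected peaks (features) in each cell.
    """
    n_cells = len(peaks_per_cell)
    mean = statistics.mean(peaks_per_cell)
    std = statistics.pstdev(peaks_per_cell)  # assumption: population std
    min_cells = 0.0005 * n_cells            # 0.05% of the total cell number
    min_peaks = max(mean - 3 * std, 500)    # raised to 500 if it falls below 500
    max_peaks = mean + 5 * std              # high outliers are likely doublets
    return min_cells, min_peaks, max_peaks


def keep_cells(peaks_per_cell: list[int]) -> list[int]:
    """Indices of cells whose peak number lies within [min_peaks, max_peaks]."""
    _, low, high = auto_cutoffs(peaks_per_cell)
    return [i for i, n in enumerate(peaks_per_cell) if low <= n <= high]
```

min_cells is a threshold on peaks rather than cells: a peak detected in fewer cells than this value would be dropped.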

SCRIP impute

This command takes a scATAC-seq peak count matrix in H5 or MTX format, a transcriptional regulator (TR) or histone modification (HM) of interest, and basic quality-control parameters. It outputs a pseudo-ChIP-seq peak matrix in H5AD or MTX format, which can be used as input to SCRIP target.
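When impute is run with -f mtx, the matrix is written in the Matrix Market format. The stdlib-only reader below is a minimal sketch for inspecting such a file, assuming the sparse "coordinate" layout. It does not know which axis holds cells and which holds peaks, or how SCRIP names any accompanying label files, so it only returns the matrix shape and the non-zero entries; read_mtx is a hypothetical name.

```python
def read_mtx(lines):
    """Parse a Matrix Market coordinate file into (n_rows, n_cols, {(i, j): value})."""
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("only the coordinate format is handled in this sketch")
    for line in it:
        if line.startswith("%") or not line.strip():
            continue  # skip comments before the size line
        n_rows, n_cols, _nnz = map(int, line.split())
        break
    else:
        raise ValueError("missing size line")
    entries = {}
    for line in it:
        if not line.strip():
            continue
        i, j, *val = line.split()
        # Matrix Market indices are 1-based; store them 0-based.
        # "pattern" matrices have no value column, so count them as 1.0.
        entries[(int(i) - 1, int(j) - 1)] = float(val[0]) if val else 1.0
    return n_rows, n_cols, entries
```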

usage: SCRIP impute [-h] -i FEATURE_MATRIX -s {hs,mm} [-p PROJECT] [-f {h5ad,mtx}] --factor FACTOR [--ref_baseline REF_BASELINE] [--remove_others] [--min_cells MIN_CELLS] [--min_peaks MIN_PEAKS] [--max_peaks MAX_PEAKS] [-t N_CORES]

optional arguments:
-h, --help            show this help message and exit

Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
                        A cell by peak matrix. h5 or h5ad supported. REQUIRED.
-s {hs,mm}, --species {hs,mm}
                        Species. "hs"(human) or "mm"(mouse). REQUIRED.

Output arguments:
-p PROJECT, --project PROJECT
                        Project name, which will be used to name the output folder. DEFAULT: randomly generated.
-f {h5ad,mtx}, --format {h5ad,mtx}
                        Output format of the peak count matrix. DEFAULT: h5ad.

Peak imputation parameter arguments:
--factor FACTOR       The factor you want to impute. REQUIRED.
--ref_baseline REF_BASELINE
                        Remove datasets whose peak number is less than this value. DEFAULT: 500.
--remove_others       Remove signal not from best match. DEFAULT: False.

Other options:
--min_cells MIN_CELLS
                        Minimal cell cutoff for features. Auto will take 0.05% of the total cell number. DEFAULT: "auto".
--min_peaks MIN_PEAKS
                        Minimal peak cutoff for cells. Auto will take the mean-3*std of the feature numbers of all cells (raised to 500 if lower than 500). DEFAULT: "auto".
--max_peaks MAX_PEAKS
                        Max peak cutoff for cells. This helps remove doublet cells. Auto will take the mean+5*std of the feature numbers of all cells. DEFAULT: "auto".
-t N_CORES, --thread N_CORES
                        Number of cores used to run SCRIP. DEFAULT: 16.
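--ref_baseline drops reference datasets with fewer peaks than the given value (default 500). A minimal sketch of that filter, over a hypothetical mapping from dataset ID to peak count; following the wording "less than", a dataset with exactly 500 peaks is kept.

```python
def filter_reference_datasets(peak_counts: dict[str, int],
                              ref_baseline: int = 500) -> dict[str, int]:
    """Keep reference datasets that have at least ref_baseline peaks.

    peak_counts maps a (hypothetical) dataset ID to its number of peaks.
    """
    return {ds: n for ds, n in peak_counts.items() if n >= ref_baseline}
```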

SCRIP target

This command takes a scATAC-seq peak count matrix in H5 format, or a scChIP-seq peak count matrix such as the output of SCRIP impute, and outputs a regulatory potential (RP) matrix in H5AD format. The output can be used to identify direct target genes.

usage: SCRIP target [-h] -i FEATURE_MATRIX -s {hs,mm} [-o OUTPUT] [-d DECAY] [-m MODEL]

optional arguments:
-h, --help            show this help message and exit

Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
                        A cell by peak matrix. h5 or h5ad supported. REQUIRED.
-s {hs,mm}, --species {hs,mm}
                        Species. "hs"(human) or "mm"(mouse). REQUIRED.

Output arguments:
-o OUTPUT, --output OUTPUT
                        Output h5ad file. DEFAULT: RP.h5ad

Other options:
-d DECAY, --decay DECAY
                        Range of the effect of peaks. DEFAULT: auto.
-m MODEL, --model MODEL
                        RP model chosen. DEFAULT: simple.
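The help text only says that --decay sets the range of a peak's effect. For intuition, the sketch below uses the exponential-decay regulatory potential familiar from MAESTRO-style RP models, where a peak's weight halves every DECAY base pairs away from the gene's TSS. Whether SCRIP's "simple" model uses exactly this weighting, and what value "auto" resolves to, are not stated here; the 10 kb default and the function name are only examples.

```python
def regulatory_potential(tss: int, peaks, decay: float = 10_000) -> float:
    """Sum of peak signals weighted by distance to a gene's TSS.

    peaks: iterable of (start, end, signal); distance uses the peak center.
    decay: half-decay distance in bp (a peak `decay` bp away counts half).
    """
    rp = 0.0
    for start, end, signal in peaks:
        center = (start + end) / 2
        d = abs(center - tss)
        rp += signal * 2 ** (-d / decay)
    return rp
```

A larger decay lets distal peaks contribute more to a gene's score; a smaller one restricts the score to promoter-proximal peaks.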

SCRIP config

This command configures the reference files that SCRIP uses. The reference files can be downloaded from Zenodo, and each index path should point to the folder obtained by extracting the corresponding archive.

usage: SCRIP config [-h] [--show] [--human_tf_index HUMAN_TF_INDEX] [--human_hm_index HUMAN_HM_INDEX] [--mouse_tf_index MOUSE_TF_INDEX] [--mouse_hm_index MOUSE_HM_INDEX]

optional arguments:
-h, --help            show this help message and exit
--show
--human_tf_index HUMAN_TF_INDEX
--human_hm_index HUMAN_HM_INDEX
--mouse_tf_index MOUSE_TF_INDEX
--mouse_hm_index MOUSE_HM_INDEX
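The four index options map one-to-one onto species (hs/mm) and reference type (TF/HM). A small sketch that assembles the SCRIP config call from the extracted folders, using only the flags shown in the usage line; the dictionary layout, function name and example paths are hypothetical, and converting to absolute paths is an assumption made so the stored paths do not depend on the current directory.

```python
import os

# Flag for each (species, reference type) pair, taken from the usage line above.
CONFIG_FLAGS = {
    ("hs", "tf"): "--human_tf_index",
    ("hs", "hm"): "--human_hm_index",
    ("mm", "tf"): "--mouse_tf_index",
    ("mm", "hm"): "--mouse_hm_index",
}


def config_command(index_dirs: dict[tuple[str, str], str]) -> list[str]:
    """Build a SCRIP config call from {(species, kind): extracted_folder}."""
    cmd = ["SCRIP", "config"]
    for key, folder in sorted(index_dirs.items()):
        cmd += [CONFIG_FLAGS[key], os.path.abspath(folder)]
    return cmd
```

The resulting list can be passed to subprocess.run, after which SCRIP config --show can be used to inspect the configuration.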

SCRIP index

This command builds a custom SCRIP index from your own peaks.

usage: SCRIP index [-h] -i INPUT -o OUTPUT

optional arguments:
-h, --help            show this help message and exit
-i INPUT, --input INPUT
                        Path to the folder that includes all your bed files. The bed files should be named as "TRName_ID.bed", e.g. "AR_1.bed".
-o OUTPUT, --output OUTPUT
                        Path to the output folder.
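Before running SCRIP index, it can help to check that every BED file follows the "TRName_ID.bed" convention. The sketch below groups files by TR name and lists names that do not match; treating the ID as the letters and digits after the last underscore is an assumption, and check_bed_names is a hypothetical helper, not part of SCRIP.

```python
import re
from pathlib import Path

# "TRName_ID.bed", e.g. "AR_1.bed". Assumption: the ID follows the last
# underscore and contains only letters and digits.
BED_NAME = re.compile(r"^(?P<tr>.+)_(?P<id>[A-Za-z0-9]+)\.bed$")


def check_bed_names(folder: str) -> tuple[dict[str, list[str]], list[str]]:
    """Group BED files by TR name and collect names that break the pattern."""
    by_tr: dict[str, list[str]] = {}
    bad: list[str] = []
    for bed in sorted(Path(folder).iterdir()):
        if bed.suffix != ".bed":
            continue  # ignore non-BED files in the folder
        m = BED_NAME.match(bed.name)
        if m:
            by_tr.setdefault(m["tr"], []).append(m["id"])
        else:
            bad.append(bed.name)
    return by_tr, bad
```

If the second list is empty, the folder can be passed to SCRIP index -i.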