
sig_queryl1k_tool

Compute similarity of input genesets to perturbational signatures

Synopsis

sig_queryl1k_tool [--up, --uptag UP] [--down, --dn, --dntag DOWN] [--score SCORE] [--rank RANK] [--sig_meta SIG_META] [--metric METRIC] [--es_tail ES_TAIL] [--query_meta QUERY_META] [--ncs_group NCS_GROUP] [--save_minimal SAVE_MINIMAL] [--save_digests SAVE_DIGESTS] [--exemplar_field EXEMPLAR_FIELD] [--max_col MAX_COL]

Arguments

--up, --uptag UP : Geneset(s) to use for the up portion of the query

--down, --dn, --dntag DOWN : Geneset(s) to use for the down portion of the query

--score SCORE : Custom dataset of differential expression scores (e.g. zscores) in GCT(X) format. Use in combination with rank parameter.

--rank RANK : Custom dataset of ranks corresponding to the score matrix in GCT(X) format. Use in combination with score parameter.

--sig_meta SIG_META : Signature metadata for each column in the score matrix. This argument is required. The first field must match the column id field in the score matrix. The following fields are required: [sig_id, is_ncs_sig, is_null_sig]. In addition, the fields specified by the ncs_group and exemplar_field arguments must be present.
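The sig_meta requirements above can be checked before launching a query. The following is a minimal sketch, not part of the tool: the function name check_sig_meta is hypothetical, and it assumes the metadata file is tab-delimited with a header row.

```python
import csv
import io

# Fields the documentation lists as required in the sig_meta file.
REQUIRED_FIELDS = ["sig_id", "is_ncs_sig", "is_null_sig"]


def check_sig_meta(text, score_column_ids, ncs_group=(), exemplar_field="is_exemplar_sig"):
    """Return a list of problems found in a tab-delimited sig_meta table.

    Hypothetical helper: the tab delimiter and the header row are assumptions.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    rows = list(reader)
    problems = []

    # Required fields plus any fields named by ncs_group / exemplar_field.
    needed = list(REQUIRED_FIELDS) + list(ncs_group)
    if exemplar_field:
        needed.append(exemplar_field)
    for field in needed:
        if field not in header:
            problems.append("missing field: " + field)

    # The first field must match the column ids of the score matrix.
    if header:
        known_ids = {row[header[0]] for row in rows}
        unmatched = [c for c in score_column_ids if c not in known_ids]
        if unmatched:
            problems.append("score columns without metadata: " + ", ".join(unmatched))
    return problems
```

An empty list means the table passed these checks. It does not guarantee that the tool will accept the file.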

--metric METRIC : Similarity metric. Default is wtcs. Options are {wtcs|cs}

--es_tail ES_TAIL : Specify two-tailed or one-tailed statistic for enrichment metrics. Default is both. Options are {both|up|down}

--query_meta QUERY_META : Metadata for each query.

--ncs_group NCS_GROUP : Grouping field(s) used to normalize connectivity scores

--save_minimal SAVE_MINIMAL : Save minimal output to optimize storage requirements. For enrichment based metrics only the combined scores are saved. Default is 1

--save_digests SAVE_DIGESTS : Save per-query digests. Default is 1

--exemplar_field EXEMPLAR_FIELD : If defined, the field must exist in the sig_meta file and contain 0/1 values. The per-query digests are filtered to signatures where the value is greater than 0. Default is is_exemplar_sig

--max_col MAX_COL : Maximum number of columns of the score/rank matrices to read at a time. Default is 25000

Description

The tool computes a set-based enrichment similarity between input genesets (aka queries) and a perturbational gene-expression signature dataset. While the tool is optimized for datasets generated by the L1000 platform, any perturbational dataset can be used.

The algorithm operates as follows. First, raw similarity (connectivity) scores between a query and CMap signatures are computed. While the query methodology is agnostic to the specific similarity metric, the default choice is a non-parametric, two-tailed weighted gene-set enrichment score.
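To make the default metric concrete, here is a minimal sketch of a two-tailed weighted enrichment score for one query against one signature. It assumes a GSEA-style weighted running-sum statistic and the commonly published CMap combination rule (half the difference of the up and down enrichment scores when their signs differ, zero otherwise). The function names and the weight parameter are illustrative, and the tool's own implementation may differ in its details.

```python
import numpy as np


def enrichment_score(scores, gene_ids, geneset, weight=1.0):
    """Signed maximum deviation of a weighted running sum (GSEA-style sketch)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # rank genes from high to low score
    ranked_scores = scores[order]
    in_set = np.array([gene_ids[i] in geneset for i in order])
    n_hit = int(in_set.sum())
    n_miss = len(order) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    # Hits step up in proportion to |score|^weight; misses step down uniformly.
    hit_weights = np.where(in_set, np.abs(ranked_scores) ** weight, 0.0)
    total = hit_weights.sum()
    if total == 0:
        return 0.0
    steps = np.where(in_set, hit_weights / total, -1.0 / n_miss)
    running = np.cumsum(steps)
    return float(running[np.argmax(np.abs(running))])


def weighted_connectivity_score(scores, gene_ids, up, down):
    """Combine up and down enrichment scores into one two-tailed score."""
    es_up = enrichment_score(scores, gene_ids, set(up))
    es_dn = enrichment_score(scores, gene_ids, set(down))
    if np.sign(es_up) == np.sign(es_dn):
        return 0.0  # both sets enriched at the same end: no coherent connection
    return (es_up - es_dn) / 2.0
```

In this sketch a signature that up-regulates the up set and down-regulates the down set scores positive, and the reverse pattern scores negative.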

The raw scores are then scaled (normalized) by the signed means to allow comparisons across different queries.
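One reading of signed-mean normalization is sketched below: positive raw scores are divided by the mean of the positive reference scores, and negative raw scores by the absolute mean of the negative reference scores, optionally within groups. The function name and the handling of zeros and empty reference sets are assumptions made for illustration.

```python
import numpy as np


def normalize_scores(raw, is_ncs_sig, groups=None):
    """Scale raw scores by signed means of the reference signatures (sketch)."""
    raw = np.asarray(raw, dtype=float)
    ref = np.asarray(is_ncs_sig, dtype=bool)
    # With no grouping, treat all signatures as one global group.
    groups = np.zeros(len(raw), dtype=int) if groups is None else np.asarray(groups)
    out = np.full(len(raw), np.nan)
    for g in np.unique(groups):
        members = groups == g
        ref_scores = raw[members & ref]
        pos_ref = ref_scores[ref_scores > 0]
        neg_ref = ref_scores[ref_scores < 0]
        # Assumption: with no reference scores of a given sign, the result is NaN.
        pos_mean = pos_ref.mean() if pos_ref.size else np.nan
        neg_mean = -neg_ref.mean() if neg_ref.size else np.nan
        pos = members & (raw > 0)
        neg = members & (raw < 0)
        out[pos] = raw[pos] / pos_mean
        out[neg] = raw[neg] / neg_mean
        out[members & (raw == 0)] = 0.0
    return out
```

Passing a per-signature label array as groups mimics a non-empty ncs_group. Leaving it as None corresponds to normalizing with the global means.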

Finally, the statistical significance of the connections, adjusted for multiple hypotheses, is estimated. FDR q-values are estimated by comparing the distributions of treatment signatures to those of null signatures in the dataset.
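The comparison of treatment and null distributions can be illustrated with a simple empirical estimate: for each score, the fraction of null signatures at least as extreme is divided by the fraction of treatment signatures at least as extreme. This is one common way to estimate an FDR and is shown only as a sketch. The function name is hypothetical and the tool's actual estimator may differ.

```python
import numpy as np


def empirical_fdr(ncs, is_null_sig):
    """Tail-fraction ratio of null to treatment scores, capped at 1 (sketch)."""
    ncs = np.asarray(ncs, dtype=float)
    is_null = np.asarray(is_null_sig, dtype=bool)
    null = ncs[is_null]
    treat = ncs[~is_null]
    q = np.ones(len(ncs))
    if len(null) == 0 or len(treat) == 0:
        return q
    for i, s in enumerate(ncs):
        if s >= 0:
            # Upper tail for positive scores.
            null_frac = np.mean(null >= s)
            treat_frac = np.mean(treat >= s)
        else:
            # Lower tail for negative scores.
            null_frac = np.mean(null <= s)
            treat_frac = np.mean(treat <= s)
        if treat_frac > 0:
            q[i] = min(1.0, null_frac / treat_frac)
    return q
```

Scores that no null signature reaches get an estimate of 0 here, so a floor would be needed before any log transform.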

Outputs:

The tool produces the following output:

arfs/ : Per-query analysis report files (ARFs)

/query_result.gct : a GCT format text file listing the annotations, connectivity scores and q-values for each signature in the dataset. The following fields are computed by the query tool:

  • raw_cs : Raw connectivity scores

  • norm_cs : Normalized connectivity score, computed by dividing the raw connectivity scores by the signed-mean scores of signatures (specified by the is_ncs_sig field in the signature metadata file). If the ncs_group field is not empty, the scores are normalized within each group; otherwise the scores are normalized using the global means across all signatures.

  • fdr_q_nlog10 : Negative log10 transformed FDR q-values estimated relative to the null signatures (specified by the is_null_sig field in the signature metadata file).

matrices/query : Query parameters and result matrices in GCTx format for all queries:

  • up.gmt, dn.gmt: query genesets in GMT format

  • cs.gctx : Raw connectivity scores matrix [signatures x queries]

  • ncs.gctx : Normalized connectivity score matrix [signatures x queries]

  • fdr_qvalue.gctx : Estimated false discovery rate q-values [signatures x queries]

Examples

% Run queries

sig_queryl1k_tool --up 'up.gmt' --down 'down.gmt' --score 'score.gctx' --rank 'rank.gctx' --sig_meta 'sig_meta.txt'
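The up.gmt and down.gmt inputs in the command above are GMT files: one geneset per line, tab-delimited, with the set name, a description, and then the member genes. A minimal sketch for producing such text follows. The helper name format_gmt, the query name, and the gene identifiers are placeholders, and the identifier type must match the row ids of the score and rank matrices.

```python
def format_gmt(genesets, description="query geneset"):
    """Render a {set_name: [gene, ...]} mapping as GMT text (sketch)."""
    lines = []
    for name, genes in genesets.items():
        # Drop duplicate genes while preserving their original order.
        unique = list(dict.fromkeys(genes))
        lines.append("\t".join([name, description] + unique))
    return "\n".join(lines) + "\n"


def write_query_files(up_sets, down_sets, up_path="up.gmt", down_path="down.gmt"):
    """Write the up and down genesets to the file names used in the example."""
    with open(up_path, "w") as handle:
        handle.write(format_gmt(up_sets))
    with open(down_path, "w") as handle:
        handle.write(format_gmt(down_sets))
```

For instance, write_query_files({"query_1": ["GENE_A", "GENE_B"]}, {"query_1": ["GENE_C"]}) would create both files in the working directory.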