sig_queryl1k_tool
Compute similarity of input genesets to perturbational signatures
Synopsis
sig_queryl1k_tool
[--up, --uptag UP] [--down, --dn, --dntag DOWN] [--score SCORE] [--rank RANK]
[--sig_meta SIG_META] [--metric METRIC] [--es_tail ES_TAIL] [--query_meta QUERY_META]
[--ncs_group NCS_GROUP] [--save_minimal SAVE_MINIMAL] [--save_digests SAVE_DIGESTS]
[--exemplar_field EXEMPLAR_FIELD] [--max_col MAX_COL]
Arguments
--up, --uptag
UP
: Geneset(s) to use for the up portion of the query
--down, --dn, --dntag
DOWN
: Geneset(s) to use for the down portion of the query
--score
SCORE
: Custom dataset of differential expression scores (e.g. zscores) in GCT(X)
format. Use in combination with rank parameter.
--rank
RANK
: Custom dataset of ranks corresponding to the score matrix in GCT(X) format. Use
in combination with score parameter.
--sig_meta
SIG_META
: Signature metadata for each column in the score matrix. This is a required
field. The first field must match the column id field in the score matrix. The
following fields are required: sig_id, is_ncs_sig, is_null_sig. In addition,
any fields specified by the ncs_group and exemplar_field arguments must be present.
--metric
METRIC
: Similarity metric. Default is wtcs. Options are {wtcs|cs}
--es_tail
ES_TAIL
: Specify two-tailed or one-tailed statistic for enrichment metrics. Default is
both. Options are {both|up|down}
--query_meta
QUERY_META
: Metadata for each query.
--ncs_group
NCS_GROUP
: Grouping field(s) used to normalize connectivity scores
--save_minimal
SAVE_MINIMAL
: Save minimal output to optimize storage requirements. For enrichment based
metrics only the combined scores are saved. Default is 1
--save_digests
SAVE_DIGESTS
: Save per-query digests. Default is 1
--exemplar_field
EXEMPLAR_FIELD
: If defined, the field should exist in the sig_meta file and contain 0/1 values.
The per-query digests are filtered to signatures where the value is greater than 0.
Default is is_exemplar_sig
--max_col
MAX_COL
: Maximum number of columns of the score/rank matrices to read at a time. Default
is 25000
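The field requirements listed for --sig_meta can be checked before a run. The following is a minimal sketch, not part of the tool: it assumes the metadata is a tab-delimited text file with a header row, and the function name missing_sig_meta_fields and the grouping field names in the usage note (pert_type, cell_id) are hypothetical.

```python
import csv
import io

# Fields the documentation lists as always required in the sig_meta file.
REQUIRED_FIELDS = ("sig_id", "is_ncs_sig", "is_null_sig")


def missing_sig_meta_fields(handle, ncs_group=(), exemplar_field="is_exemplar_sig"):
    """Return the expected field names that are absent from the header row.

    handle         : open text file (assumed tab-delimited with a header row)
    ncs_group      : field name(s) passed to --ncs_group, if any
    exemplar_field : field name passed to --exemplar_field ('' to skip the check)
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    wanted = list(REQUIRED_FIELDS) + list(ncs_group)
    if exemplar_field:
        wanted.append(exemplar_field)
    return [name for name in wanted if name not in header]
```

For example, calling missing_sig_meta_fields(open('sig_meta.txt'), ncs_group=['pert_type', 'cell_id']) returns an empty list when every expected field is present in the header.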
Description
The tool computes a set-based enrichment similarity between input genesets (aka queries) and a perturbational gene-expression signature dataset. While the tool is optimized for datasets generated by the L1000 platform, any perturbational dataset can be used.
The algorithm operates as follows. First, raw similarity (connectivity) scores between a query and the CMap signatures are computed. While the query methodology is agnostic to the specific similarity metric, the default choice (--metric wtcs) is a non-parametric, two-tailed weighted gene-set enrichment score.
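To make the idea of a two-tailed weighted enrichment score concrete, here is a small sketch in Python. It is an illustration under stated assumptions, not the tool's implementation: the running-sum statistic follows the commonly used weighted Kolmogorov-Smirnov form, and the rule for combining the up and down scores (half their difference when the signs disagree, zero otherwise) is assumed. The names enrichment_score and weighted_connectivity_score are hypothetical.

```python
def enrichment_score(scores, geneset, weight=1.0):
    """Signed maximum deviation of a weighted running sum over ranked genes.

    scores  : dict mapping gene id -> differential expression score
    geneset : collection of gene ids forming one side of the query
    """
    ranked = sorted(scores, key=lambda g: scores[g], reverse=True)
    hits = [g in geneset for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    # Genes in the set step the sum up in proportion to |score|^weight.
    hit_weights = [abs(scores[g]) ** weight if h else 0.0
                   for g, h in zip(ranked, hits)]
    total = sum(hit_weights)
    if total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        running += w / total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def weighted_connectivity_score(scores, up, down):
    """Combine up and down enrichment into one two-tailed score (assumed rule)."""
    es_up = enrichment_score(scores, up)
    es_down = enrichment_score(scores, down)
    if es_up * es_down > 0:
        # Both sets enriched at the same end of the ranking: no connection.
        return 0.0
    return (es_up - es_down) / 2.0
```

With a toy signature such as {'a': 3, 'b': 2, 'c': 1, 'd': -1, 'e': -2, 'f': -3}, an up set of the two top-ranked genes and a down set of the two bottom-ranked genes gives the maximum score, and swapping the two sets flips the sign.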
The raw scores are then scaled (normalized) by the signed means so that scores can be compared across different queries.
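The normalization step can be sketched as follows. This is a hedged illustration that assumes "signed means" refers to the mean of the positive raw scores and the mean of the negative raw scores, each computed over the signatures flagged by is_ncs_sig and, optionally, within a group. The function name normalize_scores is hypothetical.

```python
from collections import defaultdict


def normalize_scores(raw_cs, is_ncs_sig, groups=None):
    """Divide each raw score by the magnitude of the same-sign reference mean.

    raw_cs     : list of raw connectivity scores for one query
    is_ncs_sig : list of 0/1 flags marking signatures used as the reference
    groups     : optional list of group labels (None = one global group)
    """
    if groups is None:
        groups = ["_all"] * len(raw_cs)
    pos, neg = defaultdict(list), defaultdict(list)
    for cs, flag, grp in zip(raw_cs, is_ncs_sig, groups):
        if flag:
            if cs > 0:
                pos[grp].append(cs)
            elif cs < 0:
                neg[grp].append(cs)
    normalized = []
    for cs, grp in zip(raw_cs, groups):
        if cs == 0:
            normalized.append(0.0)
            continue
        ref = pos[grp] if cs > 0 else neg[grp]
        if not ref:
            # No same-sign reference scores in this group: left undefined here.
            normalized.append(float("nan"))
        else:
            normalized.append(cs / abs(sum(ref) / len(ref)))
    return normalized
```

Dividing by the magnitude of the mean keeps the sign of each score, so a normalized value near 1 or -1 indicates a connection of typical strength for that query and group.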
Finally, the statistical significance of the connections, adjusted for multiple hypothesis testing, is estimated. FDR q-values are estimated by comparing the distribution of treatment scores to that of the null signatures in the dataset.
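One simple way to picture an empirical FDR estimate of this kind is shown below. It is a sketch under assumptions, not the tool's procedure: for each treatment score, the fraction of null scores at least as extreme (in the same direction) is divided by the fraction of treatment scores at least as extreme, and the ratio is capped at 1. The function names and the floor used before the log transform are hypothetical.

```python
import math


def empirical_fdr_qvalues(treatment, null):
    """Estimate a q-value per treatment score from tail fractions (assumed scheme).

    treatment : list of normalized scores for treatment signatures
    null      : non-empty list of normalized scores for null signatures
    """
    qvalues = []
    for s in treatment:
        if s >= 0:
            null_frac = sum(x >= s for x in null) / len(null)
            trt_frac = sum(x >= s for x in treatment) / len(treatment)
        else:
            null_frac = sum(x <= s for x in null) / len(null)
            trt_frac = sum(x <= s for x in treatment) / len(treatment)
        # trt_frac is never zero because s itself is counted.
        qvalues.append(min(1.0, null_frac / trt_frac))
    return qvalues


def neg_log10(qvalues, floor=1e-10):
    """Negative log10 transform, flooring q-values so that zero stays finite."""
    return [-math.log10(max(q, floor)) for q in qvalues]
```

Scores that exceed every null score receive a q-value of 0 under this scheme, which is why a floor is applied before taking the negative log10.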
Outputs:
The tool produces the following output:
arfs/ : Per-query analysis report files (ARFs)
- raw_cs : Raw connectivity scores
- norm_cs : Normalized connectivity scores, computed by dividing the raw connectivity scores by the signed-mean scores of signatures (specified by the is_ncs_sig field in the signature metadata file). If the ncs_group field is not empty, the scores are normalized within each group; otherwise, the scores are normalized using the global means across all signatures.
- fdr_q_nlog10 : Negative log10-transformed FDR q-values, estimated relative to the null signatures (specified by the is_null_sig field in the signature metadata file).
matrices/query : Query parameters and result matrices in GCTx format for all queries:
- up.gmt, dn.gmt : Query genesets in GMT format
- cs.gctx : Raw connectivity scores matrix [signatures x queries]
- ncs.gctx : Normalized connectivity score matrix [signatures x queries]
- fdr_qvalue.gctx : Estimated false discovery rate q-values [signatures x queries]
Examples
% Run queries
sig_queryl1k_tool --up 'up.gmt' --down 'down.gmt' --score 'score.gctx' --rank 'rank.gctx' --sig_meta 'sig_meta.txt'
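The example above passes query genesets as .gmt files. Assuming these follow the usual GMT layout (one geneset per line: set name, description, then gene ids, all tab-separated), a query file could be prepared with a short Python helper such as the sketch below. The function name format_gmt and the set and gene names in the usage note are hypothetical.

```python
def format_gmt(genesets):
    """Render genesets as GMT text: name<TAB>description<TAB>gene1<TAB>gene2...

    genesets : dict mapping set name -> (description, list of gene ids)
    """
    lines = []
    for name, (description, genes) in genesets.items():
        lines.append("\t".join([name, description, *genes]))
    return "\n".join(lines) + "\n"
```

For instance, writing format_gmt({'query1_up': ('example up set', ['GENE_A', 'GENE_B'])}) to up.gmt produces a one-line file with four tab-separated fields.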