Command-line interface
The GO-PCA command-line interface (CLI) consists of individual scripts that
can be used to run GO-PCA and to process and visualize the results of a run.
Running GO-PCA: go-pca.py
go-pca.py is the command to run GO-PCA. All parameters can be specified
either directly on the command line or in a separate configuration file,
using the -c option.
Note
The configuration file is expected to follow the Windows “INI-style” format,
with a single “[GO-PCA]” section, followed by “parameter=value” entries.
If a configuration file is given, and a parameter is set both in the
configuration file and on the command line, the command line setting takes
precedence.
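The precedence rule in the note above can be illustrated with a minimal sketch using Python's standard configparser module. The key names in the sample configuration (sel_var_genes, pval_thresh) and the merge_settings helper are assumptions made for illustration only; they are not taken from the GO-PCA source, so check the parameter names accepted by your GO-PCA version.

```python
import configparser

# Hypothetical configuration file contents. The key names are assumptions
# chosen to resemble the long option names; they are not confirmed by GO-PCA.
CONFIG_TEXT = """\
[GO-PCA]
sel_var_genes=8000
pval_thresh=1e-6
"""


def merge_settings(config_text, cli_overrides):
    """Return the [GO-PCA] settings, with command-line values taking precedence.

    This is an illustrative helper, not part of GO-PCA.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(parser["GO-PCA"])
    # Only options that were actually given on the command line (not None)
    # replace the corresponding configuration file entries.
    settings.update(
        {key: str(value) for key, value in cli_overrides.items() if value is not None}
    )
    return settings


# sel_var_genes is set in both places; pval_thresh only in the file.
merged = merge_settings(CONFIG_TEXT, {"sel_var_genes": 5000, "pval_thresh": None})
print(merged)
```

In this sketch, the value given on the command line replaces the value from the file, while parameters that appear only in the file are kept unchanged.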
The only required parameters are:
-e (The expression file.)
-s (The gene set file.)
-o (The output file.)
However, if the expression matrix is not pre-filtered to only contain expressed
genes, it is also highly advisable to specify the -G option.
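As a minimal sketch, a typical invocation with the three required parameters plus the -G option could be assembled as follows. The file names and the value 8000 are placeholders chosen for illustration, not recommendations from the GO-PCA documentation.

```python
import shlex

# Hypothetical file names and an arbitrary -G value; replace with your own.
args = [
    "go-pca.py",
    "-e", "expression.tsv",       # expression file
    "-s", "gene_sets.tsv",        # gene set file
    "-o", "gopca_result.pickle",  # output file
    "-G", "8000",                 # keep the 8000 most variable genes
]

# Build the equivalent shell command line for inspection or logging.
command = shlex.join(args)
print(command)

# To actually run it (requires GO-PCA to be installed and on the PATH):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.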
usage: go-pca.py [-h] [--version] [-c <file>] -e <file> -s <file> [-t <file>]
-o <file> [-l <file>] [-q] [-v] [-D <int>] [-G <int>]
[-P <float>] [-E <float>] [-R <float>] [-Xf <float>]
[-Xm <int>] [-L <int>] [--escore-pval-thresh <float>]
[--no-local-filter] [--no-global-filter]
[--go-part-of-cc-only] [-ps <int>] [-pp <int>] [-pz <float>]
[-pm <int>]
- Help
--version |
| Output the GO-PCA version and exit. |
- Separate configuration file
-c, --config-file |
| GO-PCA configuration file. Note: The parameter values specified
as command line arguments (see below) override the
corresponding values in the configuration file. |
- Input and output files
-e, --expression-file |
| Tab-separated text file containing the gene expression matrix. |
-s, --gene-set-file |
| Tab-separated text file containing the gene sets. |
-t, --gene-ontology-file |
| OBO file containing the Gene Ontology. |
-o, --output-file |
| Output pickle file (extension “.pickle” is recommended). |
- Reporting options
-l, --log-file |
| Path of log file (if specified, report to stdout AND file). |
-q=False, --quiet=False |
| Only output errors and warnings. |
-v=False, --verbose=False |
| Enable verbose output. Ignored if --quiet is specified. |
- GO-PCA parameters ([] = default value)
-D=-1, --n-components=-1 |
| Number of principal components to test
(-1 = determine automatically using a permutation test). [-1] |
-G=0, --sel-var-genes=0 |
| Variance filter: Keep G most variable genes (0 = off). [0] |
-P=1e-06, --pval-thresh=1e-06 |
| P-value threshold for GO enrichment test. [1.0e-06] |
-E=2.0, --escore-thresh=2.0 |
| E-score threshold for GO enrichment test. [2.0] |
-R=0.5, --sig-corr-thresh=0.5 |
| Correlation threshold used in generating signatures. [0.50] |
-Xf=0.25, --mHG-X-frac=0.25 |
| X_frac parameter for GO enrichment test. [0.25] |
-Xm=5, --mHG-X-min=5 |
| X_min parameter for GO enrichment test. [5] |
-L=-1, --mHG-L=-1 |
| L parameter for GO enrichment test
(0 = “off”; -1 = # genes / 8). [-1] |
--escore-pval-thresh=0.0001 |
| P-value threshold for XL-mHG E-score calculation (“psi”). [1.0e-04] |
- Manually disable the GO-PCA filters
--no-local-filter=False |
| Disable the “local” filter. |
--no-global-filter=False |
| Disable the “global” filter (only relevant if -t is specified). |
- Legacy options
--go-part-of-cc-only=False |
| Only propagate “part of” GO relations for the CC domain. |
- Parameters for automatically determining the number of PCs to test ([] = default value)
-ps=0, --pc-seed=0 |
| Random number generator seed (-1 = arbitrary value). [0] |
-pp=15, --pc-num-permutations=15 |
| Number of permutations. [15] |
-pz=2.0, --pc-zscore-thresh=2.0 |
| Z-score threshold. [2.00] |
-pm=0, --pc-max-components=0 |
| Maximum number of PCs to test (0 = no maximum). [0] |
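To make the effect of the -G variance filter listed above more concrete, the following is a minimal sketch of keeping the G most variable genes. It illustrates the general idea only and is not GO-PCA's implementation: the select_most_variable function, the dictionary layout of the expression data, and the use of the sample variance are all assumptions.

```python
import statistics


def select_most_variable(expression, g):
    """Keep the g genes with the largest variance; g == 0 turns the filter off.

    `expression` maps gene names to lists of expression values (one value
    per sample). Illustrative helper, not part of GO-PCA.
    """
    if g == 0 or g >= len(expression):
        return dict(expression)
    # Sample variance per gene (an assumption; GO-PCA may compute it differently).
    variances = {
        gene: statistics.variance(values) for gene, values in expression.items()
    }
    keep = set(sorted(variances, key=variances.get, reverse=True)[:g])
    # Preserve the original gene order in the filtered result.
    return {gene: values for gene, values in expression.items() if gene in keep}


# Toy data with made-up gene names and values.
expression = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],
    "GENE_B": [0.0, 5.0, 10.0, 2.0],
    "GENE_C": [3.0, 3.5, 2.5, 3.0],
    "GENE_D": [7.0, 1.0, 4.0, 9.0],
}
filtered = select_most_variable(expression, 2)
print(list(filtered))
```

With this toy data, the two nearly constant genes are dropped and the two genes with the largest spread across samples are kept.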
Inspecting the results: gopca_print_info.py
To get a summary of the results contained in a particular GO-PCA result
file, use the gopca_print_info.py command. It prints information such as
the number of principal components analyzed and the number of signatures
generated.
usage: gopca_print_info.py [-h] [--version] -g <file> [-u] [-s] [-l <file>]
[-q] [-v]
- Help
--version |
| Output the GO-PCA version and exit. |
- Input file (required)
-g, --gopca-file |
| A GO-PCA run or result pickle. |
-u=False, --print-user-config=False |
| Print user-provided GO-PCA config data of the run. |
-s=False, --print-signatures=False |
| Print signatures of the GO-PCA result. |
- Reporting options
-l, --log-file |
| Path of log file (if specified, report to stdout AND file). |
-q=False, --quiet=False |
| Only output errors and warnings. |
-v=False, --verbose=False |
| Enable verbose output. Ignored if --quiet is specified. |
Plotting the signature matrix as a heatmap: gopca_plot_signature_matrix.py
This command generates an interactive plot (embedded into an HTML file) of the
GO-PCA signature matrix, visualized as a heatmap.
The HTML file also allows exporting the figure to the PNG format.