Command-line interface

The GO-PCA command-line interface (CLI) consists of individual scripts for running GO-PCA and for processing and visualizing the results of a run.

Running GO-PCA: go-pca.py

go-pca.py is the command that runs GO-PCA. All parameters can be specified either directly on the command line or in a separate configuration file, which is passed using the -c option.

Note

The configuration file is expected to follow the Windows “INI-style” format, with a single “[GO-PCA]” section, followed by “parameter=value” entries. If a configuration file is given, and a parameter is set both in the configuration file and on the command line, the command line setting takes precedence.
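The precedence rule described in this note can be illustrated with a minimal sketch based on Python's standard configparser module. This is not GO-PCA's own code: the function name load_gopca_config and the key names sel_var_genes and pval_thresh are assumptions made for illustration, since the exact spelling of the configuration keys is not documented here.

```python
import configparser

# Hypothetical configuration file content. The key names are assumptions
# for illustration; only the "[GO-PCA]" section name is documented.
EXAMPLE_CONFIG = """\
[GO-PCA]
sel_var_genes=8000
pval_thresh=1e-6
"""


def load_gopca_config(config_text, cli_overrides):
    """Merge INI-style settings with command-line settings.

    Values from cli_overrides that are not None replace the values
    read from the configuration file (command line takes precedence).
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names case-sensitive
    parser.read_string(config_text)
    params = dict(parser["GO-PCA"])
    for name, value in cli_overrides.items():
        if value is not None:
            params[name] = value
    return params


# pval_thresh is given on the command line, sel_var_genes is not.
params = load_gopca_config(
    EXAMPLE_CONFIG, {"pval_thresh": "1e-5", "sel_var_genes": None}
)
print(params)
```

Here the command-line value of pval_thresh replaces the one from the file, while sel_var_genes keeps its configuration-file value.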

The only required parameters are:

-e  (The expression file.)
-s  (The gene set file.)
-o  (The output file.)

However, if the expression matrix is not pre-filtered to only contain expressed genes, it is also highly advisable to specify the -G option.
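As a rough illustration, a call with the three required parameters plus the -G option could be assembled from Python as sketched below. The helper build_gopca_command, the file names, and the value 8000 are placeholders chosen for this example, not recommendations from the GO-PCA documentation.

```python
import shlex


def build_gopca_command(expression_file, gene_set_file, output_file,
                        sel_var_genes=None):
    """Assemble a go-pca.py command line as a list of arguments."""
    cmd = [
        "go-pca.py",
        "-e", expression_file,  # expression file (required)
        "-s", gene_set_file,    # gene set file (required)
        "-o", output_file,      # output file (required)
    ]
    if sel_var_genes is not None:
        # Variance filter: keep the G most variable genes.
        cmd += ["-G", str(sel_var_genes)]
    return cmd


# All file names and the value 8000 are hypothetical.
cmd = build_gopca_command(
    "expression.tsv", "gene_sets.tsv", "result.pickle", sel_var_genes=8000
)
print(shlex.join(cmd))
# prints: go-pca.py -e expression.tsv -s gene_sets.tsv -o result.pickle -G 8000
```

The resulting list can be passed to subprocess.run(cmd, check=True) to actually start the run, provided go-pca.py is on the PATH.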

usage: go-pca.py [-h] [--version] [-c <file>] -e <file> -s <file> [-t <file>]
                 -o <file> [-l <file>] [-q] [-v] [-D <int>] [-G <int>]
                 [-P <float>] [-E <float>] [-R <float>] [-Xf <float>]
                 [-Xm <int>] [-L <int>] [--escore-pval-thresh <float>]
                 [--no-local-filter] [--no-global-filter]
                 [--go-part-of-cc-only] [-ps <int>] [-pp <int>] [-pz <float>]
                 [-pm <int>]
Help
--version
 Output the GO-PCA version and exit.
Separate configuration file
-c, --config-file
 GO-PCA configuration file. Note: Parameter values specified as command-line arguments (see below) override the corresponding values in the configuration file.
Input and output files
-e, --expression-file
 Tab-separated text file containing the gene expression matrix.
-s, --gene-set-file
 Tab-separated text file containing the gene sets.
-t, --gene-ontology-file
 OBO file containing the Gene Ontology.
-o, --output-file
 Output pickle file (extension “.pickle” is recommended).
Reporting options
-l, --log-file
 Path of log file (if specified, report to stdout AND file).
-q=False, --quiet=False
 Only output errors and warnings.
-v=False, --verbose=False
 Enable verbose output. Ignored if --quiet is specified.
GO-PCA parameters ([] = default value)
-D=-1, --n-components=-1
 Number of principal components to test (-1 = determine automatically using a permutation test). [-1]
-G=0, --sel-var-genes=0
 Variance filter: Keep G most variable genes (0 = off). [0]
-P=1e-06, --pval-thresh=1e-06
 P-value threshold for GO enrichment test. [1.0e-06]
-E=2.0, --escore-thresh=2.0
 E-score threshold for GO enrichment test. [2.0]
-R=0.5, --sig-corr-thresh=0.5
 Correlation threshold used in generating signatures. [0.50]
-Xf=0.25, --mHG-X-frac=0.25
 X_frac parameter for GO enrichment test. [0.25]
-Xm=5, --mHG-X-min=5
 X_min parameter for GO enrichment test. [5]
-L=-1, --mHG-L=-1
 L parameter for GO enrichment test (0 = “off”; -1 = # genes / 8). [-1]
--escore-pval-thresh=0.0001
 P-value threshold for XL-mHG E-score calculation (“psi”). [1.0e-04]
Manually disable the GO-PCA filters
--no-local-filter=False
 Disable the “local” filter.
--no-global-filter=False
 Disable the “global” filter (if -t is specified).
Legacy options
--go-part-of-cc-only=False
 Only propagate “part of” GO relations for the CC domain.
Parameters for automatically determining the number of PCs to test ([] = default value)
-ps=0, --pc-seed=0
 Random number generator seed (-1 = arbitrary value). [0]
-pp=15, --pc-num-permutations=15
 Number of permutations. [15]
-pz=2.0, --pc-zscore-thresh=2.0
 Z-score threshold. [2.00]
-pm=0, --pc-max-components=0
 Maximum number of PCs to test (0 = no maximum). [0]
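To make the effect of the variance filter (-G) listed above more concrete, the following sketch keeps the G most variable genes of a small expression table, with 0 meaning "off" as documented. It is an illustration only and not GO-PCA's implementation: the function name keep_most_variable and the dictionary-based data layout are assumptions, and the population variance is used here because the documentation does not say which variance estimator GO-PCA applies (the ranking is the same for either estimator when all genes have the same number of samples).

```python
from statistics import pvariance


def keep_most_variable(expression, g):
    """Return the g genes with the highest variance across samples.

    expression maps a gene name to its list of expression values.
    g = 0 disables the filter, mirroring the documented "-G 0" default.
    """
    if g <= 0 or g >= len(expression):
        return dict(expression)
    ranked = sorted(
        expression,
        key=lambda gene: pvariance(expression[gene]),
        reverse=True,
    )
    kept = set(ranked[:g])
    # Preserve the original gene order in the filtered result.
    return {gene: values for gene, values in expression.items() if gene in kept}


# Toy data with hypothetical gene names.
expression = {
    "GENE_A": [1.0, 1.0, 1.0],   # constant, variance 0
    "GENE_B": [0.0, 5.0, 10.0],  # most variable
    "GENE_C": [2.0, 3.0, 4.0],   # moderately variable
}
filtered = keep_most_variable(expression, 2)
print(list(filtered))
```

In this toy example the constant gene GENE_A is dropped, which is the kind of unexpressed or uninformative gene the filter is meant to remove.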

Inspecting the results: gopca_print_info.py

To get a summary of the results contained in a GO-PCA result file, use the gopca_print_info.py command. It prints information such as the number of principal components analyzed and the number of signatures generated.

usage: gopca_print_info.py [-h] [--version] -g <file> [-u] [-s] [-l <file>]
                           [-q] [-v]
Help
--version
 Output the GO-PCA version and exit.
Input file (required)
-g, --gopca-file
 A GO-PCA run or result pickle.
-u=False, --print-user-config=False
 Print user-provided GO-PCA config data of the run.
-s=False, --print-signatures=False
 Print signatures of the GO-PCA result.
Reporting options
-l, --log-file
 Path of log file (if specified, report to stdout AND file).
-q=False, --quiet=False
 Only output errors and warnings.
-v=False, --verbose=False
 Enable verbose output. Ignored if --quiet is specified.

Extracting the signature matrix (as tab-delimited text file): gopca_extract_signature_matrix.py

This command generates a tab-delimited text file containing a matrix of signature expression values, with one value for each signature and each sample. (This is the data visualized by the gopca_plot_signature_matrix.py command.)
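A file of this kind can be read back with Python's standard csv module, as sketched below. The sketch assumes that the first row holds the sample names and the first column holds the signature labels. This layout is an assumption and should be checked against an actual output file. The function name read_signature_matrix and the example values are made up for illustration.

```python
import csv
import io


def read_signature_matrix(handle):
    """Parse a tab-delimited signature matrix.

    Assumed layout: header row with sample names, then one row per
    signature whose first field is the signature label.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# In-memory stand-in for an extracted file (values are invented).
example = io.StringIO(
    "Signature\tSample1\tSample2\n"
    "sig_A\t0.5\t-1.25\n"
    "sig_B\t-0.75\t2.0\n"
)
samples, matrix = read_signature_matrix(example)
print(samples)
print(matrix["sig_A"])
```

For a file on disk, the same function can be called with a handle obtained from open(path, newline="").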

usage: gopca_extract_signature_matrix.py [-h] [--version] -g <file> -o <file>
                                         [-l <file>] [-q] [-v]
Help
--version
 Output the GO-PCA version and exit.
Input and output files (required)
-g, --gopca-file
 The GO-PCA result file.
-o, --output-file
 The output file.
Reporting options
-l, --log-file
 Path of log file (if specified, report to stdout AND file).
-q=False, --quiet=False
 Only output errors and warnings.
-v=False, --verbose=False
 Enable verbose output. Ignored if --quiet is specified.

Plotting the signature matrix as a heatmap: gopca_plot_signature_matrix.py

This command generates an interactive plot (embedded into an HTML file) of the GO-PCA signature matrix, visualized as a heatmap.

The HTML file also allows exporting the figure to the PNG format.

Extracting the signatures (as tab-delimited text file): gopca_extract_signatures.py

This command generates a tab-delimited text file in which each row corresponds to a signature. The columns contain detailed information for each signature, e.g., the gene set enrichment it was based on, and the list of genes contained in it.

usage: gopca_extract_signatures.py [-h] [--version] -g <file> -o <file>
                                   [-l <file>] [-q] [-v]
Help
--version
 Output the GO-PCA version and exit.
Input and output files (required)
-g, --gopca-file
 The GO-PCA result file.
-o, --output-file
 The output file.
Reporting options
-l, --log-file
 Path of log file (if specified, report to stdout AND file).
-q=False, --quiet=False
 Only output errors and warnings.
-v=False, --verbose=False
 Enable verbose output. Ignored if --quiet is specified.

Extracting the signatures (as Excel spreadsheet): gopca_extract_signatures_excel.py

This command generates a file with the same information as gopca_extract_signatures.py, but in the form of an Excel spreadsheet.

usage: gopca_extract_signatures_excel.py [-h] [--version] -g <file> -o <file>
                                         [-l <file>] [-q] [-v]
Help
--version
 Output the GO-PCA version and exit.
Input and output files (required)
-g, --gopca-file
 The GO-PCA result file.
-o, --output-file
 The output file.
Reporting options
-l, --log-file
 Path of log file (if specified, report to stdout AND file).
-q=False, --quiet=False
 Only output errors and warnings.
-v=False, --verbose=False
 Enable verbose output. Ignored if --quiet is specified.
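Because the three extraction commands share the same -g and -o options, exporting all outputs from one result file can be scripted. The sketch below only builds the argument lists. The helper build_export_commands, the file names, and the output extensions (.tsv, .xlsx) are assumptions made for this example.

```python
def build_export_commands(gopca_file, prefix):
    """Build one command line per GO-PCA extraction script.

    Every script takes the result file via -g and the output path via -o.
    The output file names derived from prefix are hypothetical.
    """
    outputs = [
        ("gopca_extract_signature_matrix.py", prefix + "_signature_matrix.tsv"),
        ("gopca_extract_signatures.py", prefix + "_signatures.tsv"),
        ("gopca_extract_signatures_excel.py", prefix + "_signatures.xlsx"),
    ]
    return [[script, "-g", gopca_file, "-o", out] for script, out in outputs]


commands = build_export_commands("result.pickle", "run1")
for command in commands:
    print(" ".join(command))
```

Each list can then be executed with subprocess.run(command, check=True), assuming the scripts are installed and available on the PATH.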