The GO-PCA command-line interface (CLI) consists of individual scripts that can be used to process and visualize the results of a GO-PCA run.
Contents
gopca_extract_go_gene_sets.py
go-pca.py
gopca_print_info.py
gopca_extract_signature_matrix.py
gopca_plot_signature_matrix.py
gopca_extract_signatures.py
gopca_extract_signatures_excel.py
gopca_extract_go_gene_sets.py
Generating custom GO-derived gene sets for use with GO-PCA is a two-step process. First, the script ensembl_extract_protein_coding_genes.py from the genometools package is used to create a tab-delimited text file with a list of protein-coding genes. The input for this script is an Ensembl GTF file (see the “Gene sets” column on Ensembl’s FTP Download page):
ensembl_extract_protein_coding_genes.py -a [gtf_file] -o [output_file]
Second, the output file is passed as the “gene file” (-g) to the script gopca_extract_go_gene_sets.py.
usage: gopca_extract_go_gene_sets.py [-h] [--version] -g <file> -t <file> -a
<file> -o <file>
[-e [<evidence code, ...> [<evidence code, ...> ...]]]
[--min-genes-per-term <int>]
[--max-genes-per-term <int>]
[--part-of-cc-only] [-l <file>] [-q] [-v]
--version | Output the GO-PCA version and exit.
-g, --gene-file | File containing list of protein-coding genes (generated using the script ensembl_extract_protein_coding_genes.py).
-t, --gene-ontology-file | Path of ontology file (in OBO format).
-a, --goa-association-file | Path of UniProt-GOA Gene Association file (in GAF format).
-o, --output-file | Path of output file.
-e, --evidence-codes | List of three-letter evidence codes to include. If empty, include all evidence types. Default: IDA, IGI, IMP, ISO, ISS, IC, NAS, TAS
--min-genes-per-term | Exclude GO terms that have fewer than the specified number of genes annotated with them. Set to 0 to disable. Default: 5
--max-genes-per-term | Exclude GO terms that have more than the specified number of genes annotated with them. Set to 0 to disable. Default: 200
--part-of-cc-only | If enabled, ignore Default: False
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False
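The two-step process described above can be sketched in Python as a pair of argument lists, one per script. This is only an illustration: the helper name `build_gene_set_commands` and all file names are hypothetical, and only the flags documented above (-a and -o for the genometools script; -g, -t, -a and -o for gopca_extract_go_gene_sets.py) are used.

```python
from typing import List, Tuple


def build_gene_set_commands(
    gtf_file: str,
    gene_file: str,
    obo_file: str,
    gaf_file: str,
    gene_set_file: str,
) -> Tuple[List[str], List[str]]:
    """Return the argument lists for both steps, in execution order."""
    # Step 1: extract the protein-coding genes from the Ensembl GTF file.
    step1 = [
        "ensembl_extract_protein_coding_genes.py",
        "-a", gtf_file,
        "-o", gene_file,
    ]
    # Step 2: pass the gene file (-g), the ontology (-t) and the GOA
    # association file (-a), and write the gene sets to the output (-o).
    step2 = [
        "gopca_extract_go_gene_sets.py",
        "-g", gene_file,
        "-t", obo_file,
        "-a", gaf_file,
        "-o", gene_set_file,
    ]
    return step1, step2


# Hypothetical file names, for illustration only.
first, second = build_gene_set_commands(
    "annotations.gtf", "genes.tsv", "go.obo", "goa.gaf", "gene_sets.tsv"
)
```

Each list could then be handed to `subprocess.run(cmd, check=True)`, so that the second step only starts once the first has finished successfully.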
go-pca.py
go-pca.py is the command to run GO-PCA. All parameters can be specified either directly on the command line or in a separate configuration file, using the -c option.
Note
The configuration file is expected to follow the Windows “INI-style” format, with a single “[GO-PCA]” section, followed by “parameter=value” entries. If a configuration file is given, and a parameter is set both in the configuration file and on the command line, the command line setting takes precedence.
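The precedence rule in the note can be illustrated with Python's standard `configparser` module. This is a sketch, not GO-PCA's own code: the key names in the example configuration (`sel_var_genes`, `pval_thresh`) are assumptions chosen to resemble the long option names, and the helper `merge_settings` is hypothetical. The parameter names GO-PCA actually accepts in the file are not listed here.

```python
import configparser
from typing import Dict, Optional

# Hypothetical configuration file contents: one "[GO-PCA]" section,
# followed by "parameter=value" entries. Key names are assumptions.
EXAMPLE_CONFIG = """\
[GO-PCA]
sel_var_genes=8000
pval_thresh=1e-6
"""


def merge_settings(
    config_text: str, cli_values: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Combine config-file values with command-line values.

    A command-line value of None means "not given on the command line".
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    settings = dict(parser["GO-PCA"])
    # Values given on the command line take precedence over the file.
    for name, value in cli_values.items():
        if value is not None:
            settings[name] = value
    return settings


merged = merge_settings(
    EXAMPLE_CONFIG, {"sel_var_genes": "5000", "pval_thresh": None}
)
# sel_var_genes comes from the command line, pval_thresh from the file.
```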
The only required parameters are:
-e (The expression file.)
-s (The gene set file.)
-o (The output file.)
However, if the expression matrix is not pre-filtered to contain only expressed genes, it is also highly advisable to specify the -G option.
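To make the idea behind a variance filter concrete, the following sketch keeps the G genes with the highest variance across samples. It is a generic illustration under stated assumptions, not GO-PCA's implementation: the function name `keep_most_variable_genes` and the dictionary-based matrix layout are hypothetical. It uses the population variance; because every gene has the same number of samples, the sample variance would give the same ranking.

```python
from statistics import pvariance
from typing import Dict, List


def keep_most_variable_genes(
    expression: Dict[str, List[float]], g: int
) -> Dict[str, List[float]]:
    """Keep the g genes with the highest variance across samples.

    `expression` maps a gene name to its expression values (one per
    sample). As with the -G option, g = 0 switches the filter off.
    """
    if g <= 0 or g >= len(expression):
        return dict(expression)
    # Rank genes from most to least variable.
    ranked = sorted(
        expression, key=lambda gene: pvariance(expression[gene]), reverse=True
    )
    keep = set(ranked[:g])
    # Preserve the original gene order in the filtered matrix.
    return {gene: vals for gene, vals in expression.items() if gene in keep}


# Toy data, made up for illustration.
toy = {"A": [1.0, 1.0, 1.0], "B": [0.0, 5.0, 10.0], "C": [2.0, 3.0, 4.0]}
filtered = keep_most_variable_genes(toy, 2)
```

In the toy matrix, gene A is constant across samples and is therefore the one removed.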
usage: go-pca.py [-h] [--version] [-c <file>] -e <file> -s <file> [-t <file>]
-o <file> [-l <file>] [-q] [-v] [-D <int>] [-G <int>]
[-P <float>] [-E <float>] [-R <float>] [-Xf <float>]
[-Xm <int>] [-L <int>] [--escore-pval-thresh <float>]
[--no-local-filter] [--no-global-filter]
[--go-part-of-cc-only] [-ps <int>] [-pp <int>] [-pz <float>]
[-pm <int>]
--version | Output the GO-PCA version and exit.
-c, --config-file | GO-PCA configuration file. Note: The parameter values specified as command line arguments (see below) override the corresponding values in the configuration file.
-e, --expression-file | Tab-separated text file containing the gene expression matrix.
-s, --gene-set-file | Tab-separated text file containing the gene sets.
-t, --gene-ontology-file | OBO file containing the Gene Ontology.
-o, --output-file | Output pickle file (extension “.pickle” is recommended).
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False
-D, --n-components | Number of principal components to test (-1 = determine automatically using a permutation test). Default: -1
-G, --sel-var-genes | Variance filter: Keep G most variable genes (0 = off). Default: 0
-P, --pval-thresh | P-value threshold for GO enrichment test. Default: 1e-06
-E, --escore-thresh | E-score threshold for GO enrichment test. Default: 2.0
-R, --sig-corr-thresh | Correlation threshold used in generating signatures. Default: 0.5
-Xf, --mHG-X-frac | X_frac parameter for GO enrichment test. Default: 0.25
-Xm, --mHG-X-min | X_min parameter for GO enrichment test. Default: 5
-L, --mHG-L | L parameter for GO enrichment test (0 = “off”; -1 = # genes / 8). Default: -1
--escore-pval-thresh | P-value threshold for XL-mHG E-score calculation (“psi”). Default: 0.0001
--no-local-filter | Disable the “local” filter. Default: False
--no-global-filter | Disable the “global” filter (if -t is specified). Default: False
--go-part-of-cc-only | Only propagate “part of” GO relations for the CC domain. Default: False
-ps, --pc-seed | Random number generator seed (-1 = arbitrary value). Default: 0
-pp, --pc-num-permutations | Number of permutations. Default: 15
-pz, --pc-zscore-thresh | Z-score threshold. Default: 2.0
-pm, --pc-max-components | Maximum number of PCs to test (0 = no maximum). Default: 0
gopca_print_info.py
To get a summary of the results contained in a particular GO-PCA result file, use the gopca_print_info.py command. It prints information such as the number of principal components analyzed and the number of signatures generated.
usage: gopca_print_info.py [-h] [--version] -g <file> [-u] [-s] [-l <file>]
[-q] [-v]
--version | Output the GO-PCA version and exit.
-g, --gopca-file | A GO-PCA run or result pickle.
-u, --print-user-config | Print user-provided GO-PCA config data of the run. Default: False
-s, --print-signatures | Print signatures of the GO-PCA result. Default: False
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False
gopca_extract_signature_matrix.py
This command generates a tab-delimited text file containing a matrix with the signature expression values for each signature and each sample. (This is the data visualized by the gopca_plot_signature_matrix.py command.)
usage: gopca_extract_signature_matrix.py [-h] [--version] -g <file> -o <file>
[-l <file>] [-q] [-v]
--version | Output the GO-PCA version and exit.
-g, --gopca-file | The GO-PCA result file.
-o, --output-file | The output file.
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False
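A tab-delimited matrix of this kind can be loaded with Python's standard `csv` module. The sketch below assumes a layout that is not confirmed by this page: a header row whose first cell is a label followed by the sample names, and one row per signature with the signature name in the first column. The function name `read_signature_matrix` and the example values are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_signature_matrix(
    handle: TextIO,
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a tab-delimited matrix into sample names and rows.

    Assumed layout: header row = label cell + sample names; every
    other row = signature name + one value per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = [float(value) for value in row[1:]]
    return samples, matrix


# Made-up example content standing in for a real output file.
example = "Signature\tS1\tS2\nsig_a\t0.5\t-1.25\nsig_b\t2.0\t0.0\n"
samples, matrix = read_signature_matrix(io.StringIO(example))
```

For a real file, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.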
gopca_plot_signature_matrix.py
This command generates an interactive plot (embedded in an HTML file) of the GO-PCA signature matrix, visualized as a heatmap. The HTML file also allows exporting the figure to the PNG format.
gopca_extract_signatures.py
This command generates a tab-delimited text file in which each row corresponds to a signature. The columns contain detailed information for each signature, e.g., the gene set enrichment it was based on, and the list of genes contained in it.
usage: gopca_extract_signatures.py [-h] [--version] -g <file> -o <file>
[-l <file>] [-q] [-v]
--version | Output the GO-PCA version and exit.
-g, --gopca-file | The GO-PCA result file.
-o, --output-file | The output file.
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False
gopca_extract_signatures_excel.py
This command generates a file with the same information as gopca_extract_signatures.py, but in the form of an Excel spreadsheet.
usage: gopca_extract_signatures_excel.py [-h] [--version] -g <file> -o <file>
[-l <file>] [-q] [-v]
--version | Output the GO-PCA version and exit.
-g, --gopca-file | The GO-PCA result file.
-o, --output-file | The output file.
-l, --log-file | Path of log file (if specified, report to stdout AND file).
-q, --quiet | Only output errors and warnings. Default: False
-v, --verbose | Enable verbose output. Ignored if --quiet is specified. Default: False