GOPCASignature class

class gopca.GOPCASignature(pc, gse_result, seed, matrix)

A GO-PCA signature.

The goal of the GO-PCA algorithm is to define gene “signatures” that are likely to represent biologically relevant similarities and differences among samples.

A GO-PCA signature consists of a set of genes and their expression profiles. Genes in a signature are related to each other in two ways:

  1. All signature genes are members of a specific gene set. Gene sets are supplied to GO-PCA by the user and correspond to groups of genes that are known to be related to each other in some way. For example, when GO-PCA is run with gene sets derived from Gene Ontology (GO) annotations, all genes in a gene set are known to be annotated with the same GO term, indicating a functional relationship among them.
  2. The genes have been found to be strongly correlated with each other in the sense that they all contribute strongly to the same principal component (PC) of the expression matrix.
Parameters:
pc

int – The principal component (PC) that the signature was derived from (starting at 1). The sign of the integer indicates how genes were ranked by their PC loadings: if the sign is positive, the signature was derived from an ascending ranking; if the sign is negative, it was derived from a descending ranking.
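
As an illustration of the sign convention, the following minimal sketch reads the description above literally. The helper names describe_pc and rank_genes and the loading values are hypothetical and not part of gopca; the library's own ranking code may differ.

```python
def describe_pc(pc):
    """Split a signed, 1-based PC number into (zero-based index, order)."""
    if pc == 0:
        raise ValueError("pc is 1-based and must be nonzero")
    # Assumption: positive means ascending, negative means descending,
    # exactly as stated in the parameter description.
    order = "ascending" if pc > 0 else "descending"
    return abs(pc) - 1, order


def rank_genes(loadings, pc):
    """Rank gene names by loading, in the order implied by the sign of pc.

    `loadings` is a hypothetical dict mapping gene name -> loading on PC |pc|.
    """
    _, order = describe_pc(pc)
    return sorted(loadings, key=loadings.get, reverse=(order == "descending"))


loadings = {"geneA": 0.8, "geneB": -0.5, "geneC": 0.1}  # made-up values
print(describe_pc(-2))         # (1, 'descending')
print(rank_genes(loadings, 2))   # ['geneB', 'geneC', 'geneA']
print(rank_genes(loadings, -2))  # ['geneA', 'geneC', 'geneB']
```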

gse_result

RankBasedGSEResult – The result of the XL-mHG test that was conducted after ranking the genes based on their principal component loadings.

seed

genometools.expression.ExpProfile – The seed used to determine gene membership during signature generation.

matrix

genometools.expression.ExpMatrix – Gene-by-sample matrix containing the original expression values of all signature genes.

Notes

Objects of this class are hashable, which allows them to be used in pandas Series and DataFrame indices.
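
To show what hashability buys in practice, here is a minimal sketch using a hypothetical stand-in class, ToySignature. It is not the real GOPCASignature, and the fields used to build the hash are an assumption made for illustration only.

```python
class ToySignature(object):
    """Hypothetical stand-in for a signature; NOT the gopca class."""

    def __init__(self, pc, gene_set_id, genes):
        self.pc = pc
        self.gene_set_id = gene_set_id
        self.genes = tuple(genes)  # tuples are hashable, lists are not

    def _key(self):
        # Assumption: these three fields identify a signature.
        return (self.pc, self.gene_set_id, self.genes)

    def __eq__(self, other):
        return isinstance(other, ToySignature) and self._key() == other._key()

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Equal objects must produce equal hashes.
        return hash(self._key())


a = ToySignature(1, "GO:0006954", ["IL6", "TNF"])
b = ToySignature(1, "GO:0006954", ["IL6", "TNF"])

scores = {a: 4.2}        # made-up value, keyed by a signature object
print(scores[b])         # 4.2, because b equals a and hashes identically
print(len({a, b}))       # 1
```

Label-based lookup in a pandas index relies on the same pair of methods, __hash__ and __eq__.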

K

The total number of genes annotated with the GO term.

X

The expression array.

escore

The E-score of the enrichment test.

gene_set

The gene set that the signature is based on.

gene_set_id

The ID of the gene set that the signature is based on.

genes

The genes in the signature.

get_expression(standardize=False, center=True, use_median=True)

Generate an expression profile for the signature.

Parameters:
  • standardize (bool) – Whether to standardize gene expression profiles in the calculation of the expression profile. [False]
  • center (bool) – Whether to center the gene expression profiles in the calculation of the expression profile. [True]
  • use_median (bool) – Whether to use the median to center gene expression profiles. Only relevant if center is set to True. [True]
Returns:

The expression signature.

Return type:

genometools.expression.ExpProfile
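
The interaction of the three flags can be sketched in plain Python. This is an assumption-laden illustration, not the library's implementation: it assumes that each gene's profile is centered (by median or mean), optionally divided by its sample standard deviation, and that the signature profile is then the per-sample average across genes. The function name signature_profile and the input values are hypothetical.

```python
from statistics import mean, median, stdev


def signature_profile(rows, standardize=False, center=True, use_median=True):
    """Collapse a genes-by-samples table into one value per sample.

    `rows` is a list of per-gene expression lists, all of equal length.
    """
    processed = []
    for row in rows:
        values = list(row)
        if center:
            # use_median only matters when centering is enabled
            mid = median(values) if use_median else mean(values)
            values = [v - mid for v in values]
        if standardize:
            # Assumption: sample standard deviation; a constant gene
            # (sd == 0) would raise ZeroDivisionError here.
            sd = stdev(values)
            values = [v / sd for v in values]
        processed.append(values)
    # Assumption: the signature profile is the mean across genes.
    return [mean(col) for col in zip(*processed)]


rows = [
    [1.0, 2.0, 6.0],     # made-up expression values, gene 1
    [10.0, 12.0, 11.0],  # made-up expression values, gene 2
]
print(signature_profile(rows))                    # [-1.0, 0.5, 2.0]
print(signature_profile(rows, use_median=False))  # [-1.5, 0.0, 1.5]
print(signature_profile(rows, center=False))      # [5.5, 7.0, 8.5]
```

Centering removes each gene's baseline level so that highly expressed genes do not dominate the average; median centering is less sensitive to a few outlying samples than mean centering.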

get_figure(sig_matrix=None, heatmap_kw=None, **kwargs)

Generate a plotly figure showing the signature gene matrix as a heatmap.

This is a shortcut for GOPCASignature.get_heatmap(...).get_figure(...).

See ExpHeatmap.get_figure() for keyword arguments.

Parameters:
  • sig_matrix (GOPCASignatureMatrix (optional)) – The GO-PCA signature matrix. If specified, samples will be shown in the same order as in the signature matrix.
  • heatmap_kw (dict (optional)) – If not None, dictionary containing keyword arguments to be passed to the ExpHeatmap constructor.
Returns:

The plotly figure.

Return type:

plotly.graph_objs.Figure

get_heatmap(sig_matrix=None, standardize=False, center=True, use_median=True, include_id=False, include_stats=True, include_pval=True, cluster_genes=True, gene_cluster_metric=u'correlation', cluster_samples=True, sample_cluster_metric=u'euclidean', cluster_method=u'average', colorbar_label=None, **kwargs)

Generate a heatmap of the signature gene matrix.

get_label(max_name_length=0, include_stats=True, include_id=True, include_pval=False, include_coll=True)

Generate a signature label.

k

The number of genes in the signature.

mHG_K

The total number of genes in the gene set.

mHG_N

The total number of genes in the analysis.

mHG_cutoff

The cutoff at which the XL-mHG test statistic was attained.

mHG_k

The number of genes within the gene set above the mHG cutoff.
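
The relationship between mHG_N, mHG_K, mHG_cutoff and mHG_k can be illustrated with a small sketch. The helper names and the gene list are hypothetical. The hypergeometric tail computed here is the per-cutoff quantity that an mHG-type statistic minimizes over cutoffs; it is not the XL-mHG p-value itself, which gopca obtains from a separate library. The sketch also assumes that "above the cutoff" means the top mHG_cutoff genes of the ranked list, and it requires Python 3.8 or later for math.comb.

```python
from math import comb


def hits_above_cutoff(ranked_genes, gene_set, cutoff):
    """Count gene-set members among the top `cutoff` ranked genes."""
    return sum(1 for g in ranked_genes[:cutoff] if g in gene_set)


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N total, K successes, n draws)."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]  # made-up ranking
gene_set = {"g1", "g2", "g4", "g8"}                        # made-up gene set

N = len(ranked)                              # plays the role of mHG_N
K = sum(1 for g in ranked if g in gene_set)  # plays the role of mHG_K
cutoff = 4                                   # plays the role of mHG_cutoff
k = hits_above_cutoff(ranked, gene_set, cutoff)  # plays the role of mHG_k

print(N, K, cutoff, k)                  # 8 4 4 3
print(hypergeom_tail(N, K, cutoff, k))  # 17/70, about 0.243
print(k / (K * cutoff / N))             # fold enrichment at this cutoff: 1.5
```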

n

The number of samples.

pval

The p-value of the enrichment test.

s

The signature expression vector.

samples

The sample labels.