This section describes the expression file format used by GO-PCA, and documents the go-pca.py command that is used to run GO-PCA. For information on how to provide the gene sets to GO-PCA, please see the previous section, Obtaining Gene Sets.
The main input to GO-PCA is an expression matrix, with rows representing genes and columns representing samples. GO-PCA expects this matrix to be stored in a tab-delimited text file. The first row contains the sample names, and the first column contains the gene names (the content of the top-left cell is ignored). A minimal example of a valid expression file, with only five genes and three samples, is shown below (columns are separated by tab characters):
ignored Sample1 Sample2 Sample3
IGBP1 8.64947 8.01958 7.95444
MYC 7.61296 7.38281 7.58559
SMAD1 8.84338 8.41662 8.94365
MDM1 6.17908 6.07470 5.59411
CD44 7.64093 7.56293 7.58277
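Before running GO-PCA, it can be useful to check that a file follows the layout described above. The following is a minimal sketch that uses only the Python standard library; it is not GO-PCA's own reader, and the function name read_expression_matrix is a hypothetical helper introduced here for illustration. It assumes every value is numeric and that every row has the same number of columns as the header.

```python
import csv
import io


def read_expression_matrix(handle):
    """Parse a tab-delimited expression file.

    Hypothetical helper for illustration only (not part of GO-PCA).
    Returns a tuple (samples, genes, values), where values[i][j] is the
    expression of genes[i] in samples[j].
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the top-left cell is ignored
    genes, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                "gene %r: expected %d values, found %d"
                % (row[0], len(samples), len(row) - 1)
            )
        genes.append(row[0])
        values.append([float(x) for x in row[1:]])
    return samples, genes, values


# The first rows of the example file above, with explicit tab characters.
example = (
    "ignored\tSample1\tSample2\tSample3\n"
    "IGBP1\t8.64947\t8.01958\t7.95444\n"
    "MYC\t7.61296\t7.38281\t7.58559\n"
)
samples, genes, values = read_expression_matrix(io.StringIO(example))
print(samples)
print(genes)
```

To check a file on disk, pass an open file object instead of the io.StringIO wrapper. A ValueError here indicates either a row with the wrong number of columns or a cell that could not be parsed as a number.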
For users who are unfamiliar with Python, the most convenient way of running GO-PCA is through the command line. In general, this can be done in a two-step process: