# ProliferativeIndex Vignette

#### 2018-08-14

The ProliferativeIndex R package [1] provides R functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset.

The PI was adapted from Venet et al. [2]:

“The proliferating cell nuclear antigen, PCNA, is a ring-shaped protein that encircles DNA and regulates several processes leading to DNA replication. As suggested by its name, this is one of the most widely used antigen target for immunohistochemical measures of the fraction of proliferating cells in tissues. Ge et al. profiled with microarrays 36 tissues from normal, healthy individuals encompassing 27 organs. We call ‘meta-PCNA’ the signature composed of the 1% genes the most positively correlated with PCNA expression across these 36 tissues. In plain language, meta-PCNA genes are consistently expressed when PCNA is expressed in normal tissues and consistently repressed when PCNA is repressed. We define the meta-PCNA index as the median expression of meta-PCNA genes.”
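The procedure in the quotation can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the expression values are random, the gene names other than PCNA are placeholders, Pearson correlation is assumed (the quotation says only "correlated"), and PCNA itself is left out of the signature.

```r
# Toy expression matrix: rows are genes, columns are tissues (all values simulated).
set.seed(1)
n_genes <- 200
n_tissues <- 36
expr <- matrix(rnorm(n_genes * n_tissues), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("tissue", seq_len(n_tissues))))
rownames(expr)[1] <- "PCNA"

# Make two placeholder genes track PCNA closely so that a signature exists.
expr["gene2", ] <- expr["PCNA", ] + rnorm(n_tissues, sd = 0.1)
expr["gene3", ] <- expr["PCNA", ] + rnorm(n_tissues, sd = 0.1)

# Correlate every other gene with PCNA across tissues (Pearson, an assumption).
others <- expr[rownames(expr) != "PCNA", , drop = FALSE]
pcna_cor <- apply(others, 1, cor, y = expr["PCNA", ])

# Keep the top 1% most positively correlated genes as the signature.
n_top <- ceiling(0.01 * nrow(others))
signature <- names(sort(pcna_cor, decreasing = TRUE))[seq_len(n_top)]

# The index for each tissue is the median expression of the signature genes.
meta_pcna_index <- apply(expr[signature, , drop = FALSE], 2, median)
```

In this toy setup the signature should consist of the two genes that were constructed to follow PCNA, and the index is one value per tissue.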

IMPORTANT: A proliferative index is only interpretable relative to other PIs, for example a higher or lower PI in tumors compared with normal tissues, or in post-mitotic tissues compared with tissues that have high rates of cell turnover. Additionally, PI measures proliferation-associated expression (as described above), not necessarily proliferation itself.

ProliferativeIndex contains the following functions:

• readDataForPI : Read in user data for use with package functions
• calculatePI : Calculate PI for user data
• comparePI : Compare PI across user data set
• compareModeltoPI : Compare PI to the principal components (PCs) of the user's model

## Example Data Set

Included with ProliferativeIndex, specifically for use with this vignette, is data from The Cancer Genome Atlas (TCGA) Adrenocortical Carcinoma (ACC) dataset [3].

library(ProliferativeIndex)

This dataset, vstTCGA_ACCData_sub, can be loaded from the package:

data(vstTCGA_ACCData_sub)

#Examine only the first few columns and rows because the dataset is large (20501 genes x 10 samples):
dim(vstTCGA_ACCData_sub)
## [1] 20501    10
#Note that sample IDs are column names and HGNC gene IDs (http://www.genenames.org) are rownames and that vst data is numeric.
str(vstTCGA_ACCData_sub)
## 'data.frame':    20501 obs. of  10 variables:
##  $ TCGA.OR.A5J1: num  5.87 4.19 5.92 8.43 6.99 ...
##  $ TCGA.OR.A5J2: num  5.49 4.19 5.2 8.74 4.19 ...
##  $ TCGA.OR.A5J3: num  6.04 4.52 5.44 8.04 4.76 ...
##  $ TCGA.OR.A5J5: num  11.4 4.71 5.22 7.08 6.8 ...
##  $ TCGA.OR.A5J6: num  10.07 4.19 5.11 8.8 4.66 ...
##  $ TCGA.OR.A5J7: num  5.57 4.19 4.96 7.52 4.91 ...
##  $ TCGA.OR.A5J8: num  6.86 4.19 4.19 6.91 5.1 ...
##  $ TCGA.OR.A5J9: num  5.4 4.19 6.46 8.94 6.34 ...
##  $ TCGA.OR.A5JA: num  6.8 4.19 5.25 8.77 6.36 ...
##  $ TCGA.OR.A5JB: num  8.53 4.19 4.19 6.84 4.19 ...
knitr::kable(vstTCGA_ACCData_sub[1:5,1:5])

|       | TCGA.OR.A5J1 | TCGA.OR.A5J2 | TCGA.OR.A5J3 | TCGA.OR.A5J5 | TCGA.OR.A5J6 |
|-------|--------------|--------------|--------------|--------------|--------------|
| A1BG  | 5.871339     | 5.490145     | 6.036080     | 11.397348    | 10.065106    |
| A1CF  | 4.190503     | 4.190503     | 4.523434     | 4.713955     | 4.190503     |
| A2BP1 | 5.915039     | 5.196520     | 5.443088     | 5.221104     | 5.112238     |
| A2LD1 | 8.431843     | 8.741279     | 8.043286     | 7.075708     | 8.798831     |
| A2ML1 | 6.986670     | 4.190503     | 4.764641     | 6.798125     | 4.657211     |

Functions in the ProliferativeIndex package come with help pages that can be accessed as usual (for example, ?readDataForPI).

## readDataForPI function

The function readDataForPI reads data in for use with the ProliferativeIndex package.

#Inputs are the user's vst dataframe and a model of interest for examining PI:

#Examine the output, which is a list of two objects:
# exampleTCGAData$vstData is the user's vst dataframe and exampleTCGAData$modelIDs is a character vector of the gene IDs for the user's model of interest
str(exampleTCGAData)
## List of 2
##  $ vstData :'data.frame':    20501 obs. of  10 variables:
##   ..$ TCGA.OR.A5J1: num [1:20501] 5.87 4.19 5.92 8.43 6.99 ...
##   ..$ TCGA.OR.A5J2: num [1:20501] 5.49 4.19 5.2 8.74 4.19 ...
##   ..$ TCGA.OR.A5J3: num [1:20501] 6.04 4.52 5.44 8.04 4.76 ...
##   ..$ TCGA.OR.A5J5: num [1:20501] 11.4 4.71 5.22 7.08 6.8 ...
##   ..$ TCGA.OR.A5J6: num [1:20501] 10.07 4.19 5.11 8.8 4.66 ...
##   ..$ TCGA.OR.A5J7: num [1:20501] 5.57 4.19 4.96 7.52 4.91 ...
##   ..$ TCGA.OR.A5J8: num [1:20501] 6.86 4.19 4.19 6.91 5.1 ...
##   ..$ TCGA.OR.A5J9: num [1:20501] 5.4 4.19 6.46 8.94 6.34 ...
##   ..$ TCGA.OR.A5JA: num [1:20501] 6.8 4.19 5.25 8.77 6.36 ...
##   ..$ TCGA.OR.A5JB: num [1:20501] 8.53 4.19 4.19 6.84 4.19 ...
##  $ modelIDs: chr [1:8] "AIFM3" "ATP9B" "CTRC" "MCL1" ...

*Note: the R package includes a data object, ‘exReadDataObj’, that is the output of the readDataForPI function, for comparison.
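To make the shape of that two-element list concrete, the sketch below builds an object with the same element names from toy data. The helper `make_pi_input` is hypothetical and is not the package's readDataForPI; the input checks it performs are assumptions about what a sensible reader function might verify.

```r
# Hypothetical stand-in for illustration; not the package's readDataForPI.
make_pi_input <- function(vst_data, model_ids) {
  stopifnot(is.data.frame(vst_data),
            all(vapply(vst_data, is.numeric, logical(1))),
            is.character(model_ids))
  list(vstData = vst_data, modelIDs = model_ids)
}

# Toy vst-like data: gene IDs as row names, sample IDs as column names.
toy_vst <- data.frame(sampleA = c(5.9, 4.2, 8.4),
                      sampleB = c(5.5, 4.2, 8.7),
                      row.names = c("A1BG", "A1CF", "MCL1"))
toy_input <- make_pi_input(toy_vst, c("MCL1", "CTRC"))

# List any model genes that are absent from the expression data.
missing_ids <- setdiff(toy_input$modelIDs, rownames(toy_input$vstData))
missing_ids
```

Checking for missing model genes before any downstream step is a cheap way to catch gene ID mismatches, such as outdated gene symbols.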

## calculatePI function

The function calculatePI calculates PI for all samples in the user’s vst dataframe using a list of PCNA-associated genes collected from Venet et al. (including alternative gene names).

*Note: the function prints how many of the genes used to calculate the PI were found in vstData.

proliferativeIndices<-calculatePI(exampleTCGAData)
## [1] "vstData contained 131/131 of the PI-associated genes"
summary(proliferativeIndices)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   7.454   8.480   9.220   9.246  10.016  10.556

*Note: the R package includes a data object, ‘exVSTPI’, that is the output of the calculatePI function, for comparison.
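The calculation can be approximated in a few lines of base R. The sketch assumes PI is the per-sample median over the PI-associated genes found in the data, following the meta-PCNA definition quoted earlier. The helper `sketch_pi`, the toy data, and the gene names other than PCNA are hypothetical, and the package's calculatePI may differ in its details.

```r
# Hypothetical helper; the package's calculatePI may differ in its details.
sketch_pi <- function(vst_data, pi_genes) {
  found <- intersect(pi_genes, rownames(vst_data))
  message("vstData contained ", length(found), "/", length(pi_genes),
          " of the PI-associated genes")
  # Per-sample median over the PI-associated genes that were found.
  apply(vst_data[found, , drop = FALSE], 2, median)
}

# Toy data: four genes (rows) by two samples (columns).
toy_vst <- data.frame(s1 = c(9, 7, 8, 4), s2 = c(6, 5, 7, 4),
                      row.names = c("PCNA", "geneA", "geneB", "geneC"))

# "geneA", "geneB" and "geneZ" are placeholders, not real signature genes.
toy_pi <- sketch_pi(toy_vst, c("PCNA", "geneA", "geneB", "geneZ"))
toy_pi
```

Here three of the four requested genes are present, so each sample's value is the median of three expression values.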

## comparePI function

This function will summarize the PI values within the user’s dataset.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   7.454   8.480   9.220   9.246  10.016  10.556
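Because PIs are only meaningful relative to one another, a summary of this kind is usually read alongside each sample's position in the data set. The following base R sketch uses made-up PI values and hypothetical sample names. It is not the comparePI implementation.

```r
# Simulated PI values for five hypothetical samples.
toy_pi <- c(s1 = 7.5, s2 = 10.6, s3 = 9.2, s4 = 8.5, s5 = 10.0)

# Minimum, quartiles, median, mean and maximum.
summary(toy_pi)

# Samples ordered from lowest to highest PI, relative to each other.
pi_rank <- names(sort(toy_pi))

# Center each PI on the data set median to show its relative position.
pi_relative <- toy_pi - median(toy_pi)
pi_relative
```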

## compareModeltoPI function

The function compareModeltoPI takes as input the user’s data and model identifiers, together with the calculated PIs, and compares the principal components (PCs) of the model to PI:

modelComparison<-compareModeltoPI(exampleTCGAData, proliferativeIndices)

#the output is a table, inspect:
knitr::kable(modelComparison)

|      | SpearmanRho | SpearmanPvalue | PCAPropOfVariance |
|------|-------------|----------------|-------------------|
| PC1  | 0.9878788   | 0.0000000      | 0.51527           |
| PC2  | 0.0181818   | 0.9728412      | 0.11587           |
| PC3  | -0.0909091  | 0.8114170      | 0.07491           |
| PC4  | 0.1151515   | 0.7588331      | 0.06558           |
| PC5  | 0.1757576   | 0.6319674      | 0.05897           |
| PC6  | -0.0424242  | 0.9186333      | 0.05068           |
| PC7  | 0.0424242   | 0.9186333      | 0.05002           |
| PC8  | -0.0909091  | 0.8114170      | 0.03992           |
| PC9  | -0.0424242  | 0.9186333      | 0.02878           |
| PC10 | -0.3696970  | 0.2956041      | 0.00000           |
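A table with these three columns can be produced with base R's `prcomp` and `cor.test`. The sketch below runs on simulated data and rests on several assumptions: which genes enter the PCA, whether the data are centered or scaled, and how the package assembles its output are not stated in this vignette, so the choices here are illustrative only. The gene names are placeholders.

```r
set.seed(42)
n_samples <- 10

# Simulated PI values and a toy model of four placeholder genes;
# geneA is constructed to follow PI closely.
toy_pi <- rnorm(n_samples, mean = 9, sd = 1)
model_expr <- cbind(geneA = 2 * toy_pi + rnorm(n_samples, sd = 0.05),
                    geneB = rnorm(n_samples, sd = 0.3),
                    geneC = rnorm(n_samples, sd = 0.3),
                    geneD = rnorm(n_samples, sd = 0.3))

# PCA on the samples x genes matrix (centered, unscaled: an assumption).
pca <- prcomp(model_expr, center = TRUE, scale. = FALSE)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Spearman correlation of each PC's sample scores with PI.
tests <- apply(pca$x, 2, function(scores) {
  ct <- cor.test(scores, toy_pi, method = "spearman")
  c(SpearmanRho = unname(ct$estimate), SpearmanPvalue = ct$p.value)
})

# One row per PC, mirroring the column names of the table above.
comparison <- data.frame(t(tests), PCAPropOfVariance = round(prop_var, 5))
comparison
```

The sign of a principal component is arbitrary, so the magnitude of SpearmanRho carries the information. A PC that both explains a large share of the variance and correlates strongly with PI suggests that the model is largely tracking proliferation-associated expression.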

1. Ramaker and Lasseigne, et al. bioRxiv, 2016.

2. Venet, et al. PLoS Computational Biology, 2011 and Ge, et al. Genomics, 2005.

3. The TCGA ACC dataset was obtained from the TCGA data portal (tcga-data.nci.nih.gov) in June 2015. Level 3 RNASeqV2 raw count data was variance stabilized with the DESeq2 v1.8.2 ‘varianceStabilizingTransformation’ function.