Pathway activity assessment

Tumors are driven by the aberrant activity of key signaling pathways that, for example, promote tumor growth or inhibit apoptosis. To obtain an overview of the processes altered in a breast tumor under investigation, we consider the activities of a set of 20 core pathways relevant to breast cancer. These activity patterns can in turn be used to characterize tumor subtypes and to inform treatment decisions; for example, tumors with a specifically high activity of PI3K-AKT-mTOR signaling would benefit from treatment with an AKT or mTOR inhibitor, such as the mTOR inhibitor everolimus. The activity of a pathway $i$ is computed from the deregulation scores of the genes in its gene set $\Gamma_i$ (i.e. fold changes or z-scores comparing tumor with healthy tissue), weighted by their relevance $w_i$ for that pathway's activity. To obtain a more complete set of genes involved in the activity of a given pathway, we merge the gene sets of the corresponding pathways and biological categories provided by KEGG, GO, Reactome and WikiPathways.
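Merging the resources amounts to taking the union of the genes that each resource assigns to a pathway. A minimal sketch, assuming the gene sets have already been retrieved and mapped to a common identifier namespace; the helper `merge_gene_sets` and the `GENE_*` identifiers are hypothetical placeholders, not real annotations:

```python
from typing import Dict, Iterable, Set


def merge_gene_sets(sources: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the union of the gene sets that several resources
    (e.g. KEGG, GO, Reactome, WikiPathways) assign to one pathway.

    Assumes all identifiers already share one namespace (e.g. HGNC symbols);
    no identifier mapping is performed here.
    """
    merged: Set[str] = set()
    for genes in sources.values():
        merged.update(genes)
    return merged


# Hypothetical toy input: placeholder identifiers, not real pathway annotations.
toy_sources = {
    "KEGG": ["GENE_A", "GENE_B"],
    "GO": ["GENE_B", "GENE_C"],
    "Reactome": ["GENE_A", "GENE_D"],
    "WikiPathways": [],
}
gamma_i = merge_gene_sets(toy_sources)
print(sorted(gamma_i))  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
```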

We hypothesize that targeted drugs are especially effective when their target pathway is highly active and alternative cancer-driving pathways are not. We exploit this assumed relationship between a pathway's activity and the efficacy of a corresponding drug to compute the weights $w_i$. To this end, we consider all 49 breast cancer cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC1000) database and their sensitivities to a large panel of drugs targeting various pathways. The GDSC authors provide drug sensitivity scores as IC50 values, i.e. the 10$\cdot$log10-transformed concentration of an inhibitor that decreases the biotransformation rate of its target's substrates by 50%. For a given pathway of interest $i$, we select the set $D_i$ of GDSC drugs that target this pathway. For each drug $d_{ij} \in D_i$ and each gene $k \in \Gamma_i$, we compute Pearson's correlation coefficient $\rho_{jk}$ between the drug's IC50 values and the gene's expression across the cell lines, together with a p-value assessing the significance of its deviation from zero. This yields two matrices of dimensions $n_i \times m_i$, where $n_i = |\Gamma_i|$ is the number of genes and $m_i = |D_i|$ the number of drugs. The correlation coefficients are z-transformed per drug and then averaged across drugs, yielding a vector of correlation-based scores $c_i$ with one entry per gene. The p-values are aggregated across drugs using Fisher's method and then $-\log_{10}$-transformed, yielding a vector of scores $p_i$, again with one entry per gene. The larger the score $p_{ik}$ of a gene $k$, the more relevant the gene is as an indicator of the pathway's activity. Because all scores in $p_i$ are non-negative, the direction of a gene's effect, i.e. whether it acts as an activator or a repressor of the pathway, is recovered from the sign of the corresponding correlation-based score $c_{ik}$.
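The computation of $c_i$ and $p_i$ for a single pathway can be sketched as follows. This is an illustration under assumptions, not the tool's implementation: the helper `pathway_gene_scores` is hypothetical, the expression and IC50 matrices are assumed to cover the same cell lines in the same column order, and "z-transformed" is read here as Fisher's z-transformation (arctanh) of each coefficient. If a per-drug standardization across genes is meant instead, only the line computing `z` would change. P-values come from `scipy.stats.pearsonr` and are combined per gene with `scipy.stats.combine_pvalues(..., method="fisher")`.

```python
import numpy as np
from scipy import stats


def pathway_gene_scores(expr, ic50):
    """Per-gene scores c_i and p_i for one pathway i (hypothetical helper).

    expr : (n_i x L) array, expression of the genes in Gamma_i across L cell lines
    ic50 : (m_i x L) array, IC50 values of the drugs in D_i for the same cell lines
    Returns (c, p_scores), each of length n_i.
    """
    expr = np.asarray(expr, dtype=float)
    ic50 = np.asarray(ic50, dtype=float)
    n_genes, n_drugs = expr.shape[0], ic50.shape[0]

    # n_i x m_i matrices: Pearson coefficients rho_jk and their p-values
    rho = np.empty((n_genes, n_drugs))
    pval = np.empty((n_genes, n_drugs))
    for k in range(n_genes):
        for j in range(n_drugs):
            r, pv = stats.pearsonr(ic50[j], expr[k])
            rho[k, j], pval[k, j] = r, pv

    # Assumption: Fisher's z-transformation; clipping avoids +-inf at |r| = 1.
    z = np.arctanh(np.clip(rho, -0.999999, 0.999999))
    c = z.mean(axis=1)  # average across drugs -> one score per gene

    # Fisher's method across drugs for each gene, then -log10.
    # Note: a combined p-value that underflows to 0 would give +inf here.
    fisher_p = np.empty(n_genes)
    for k in range(n_genes):
        _, fisher_p[k] = stats.combine_pvalues(pval[k], method="fisher")
    p_scores = -np.log10(fisher_p)
    return c, p_scores
```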
The final weights are then computed per gene as $w_{ik} = \operatorname{sgn}(c_{ik}) \cdot p_{ik}$. The activity $\phi_i(t)$ of pathway $i$ in a tumor sample $t$ is computed as $\phi_i(t) = \sum_{k \in \Gamma_i} w_{ik}\, t_{ik}$, i.e. the dot product $w_i \cdot t_i$, where $t_i$ contains the sample's deregulation scores for the genes in the gene set $\Gamma_i$. Finally, the activities of each pathway are rescaled to the range between 0 and 1.
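The weighting and scoring steps can be sketched in a few lines of NumPy. This is a hedged illustration, not the tool's code: the helper names are hypothetical, `t` is assumed to list the deregulation scores in the same gene order as the weights, and, since the text does not say how activities are mapped to the range between 0 and 1, min-max scaling of each pathway's activities across a set of samples is assumed.

```python
import numpy as np


def pathway_weights(c, p):
    """w_ik = sgn(c_ik) * p_ik for every gene k of pathway i."""
    return np.sign(np.asarray(c, dtype=float)) * np.asarray(p, dtype=float)


def pathway_activity(w, t):
    """phi_i(t) = sum_k w_ik * t_ik, i.e. the dot product of weights and scores.

    t holds the sample's deregulation scores (e.g. z-scores) for the genes of
    Gamma_i, in the same order as w.
    """
    return float(np.dot(w, t))


def rescale_01(activities):
    """Min-max rescale one pathway's activities across samples to [0, 1].

    Assumption: the document does not specify which rescaling is used.
    """
    a = np.asarray(activities, dtype=float)
    lo, hi = a.min(), a.max()
    if hi == lo:  # no spread across samples: map everything to 0
        return np.zeros_like(a)
    return (a - lo) / (hi - lo)


# Usage with made-up numbers (not real data):
w = pathway_weights(c=[0.8, -1.2, 0.1], p=[4.0, 2.5, 0.3])  # [4.0, -2.5, 0.3]
phi = pathway_activity(w, t=[1.5, -0.5, 2.0])
```

A gene whose correlation-based score is exactly zero gets weight zero under `np.sign`, so it drops out of the activity.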

To assess the significance of the computed pathway activities for a sample under investigation, empirical p-values can be derived by permutation testing. For a user-defined number of permutations, the differential gene expression scores are randomly permuted and the corresponding pathway activities are recomputed. One-sided p-values for the sample's actual pathway activities are then computed relative to the mean of this empirical background distribution: a right-sided p-value is computed if the sample's activity for a given pathway is larger than the mean of the background distribution, and a left-sided p-value if it is smaller. Finally, the p-values are adjusted for multiple hypothesis testing using the Benjamini-Hochberg method [1], [2].
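The permutation test and the Benjamini-Hochberg adjustment can be sketched as below. Several details are assumptions, since the text does not fix them: permuting means shuffling the sample's deregulation scores across genes, activities are compared before the rescaling to [0, 1], and the empirical p-value uses the common add-one form (hits + 1) / (B + 1) to avoid p-values of zero. The function names are hypothetical.

```python
import numpy as np


def permutation_pvalues(scores, pathways, weights, n_perm=1000, seed=None):
    """Empirical one-sided p-values for one sample's pathway activities.

    scores   : 1-D array, deregulation scores of all genes in the sample
    pathways : dict name -> integer index array into `scores` (genes of Gamma_i)
    weights  : dict name -> weight vector w_i aligned with those indices
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    names = list(pathways)
    idx = {n: np.asarray(pathways[n], dtype=int) for n in names}
    w = {n: np.asarray(weights[n], dtype=float) for n in names}

    observed = np.array([w[n] @ scores[idx[n]] for n in names])
    null = np.empty((n_perm, len(names)))
    for b in range(n_perm):
        shuffled = rng.permutation(scores)  # assumption: shuffle scores across genes
        null[b] = [w[n] @ shuffled[idx[n]] for n in names]

    pvals = {}
    for i, n in enumerate(names):
        if observed[i] > null[:, i].mean():  # right-sided test
            hits = np.sum(null[:, i] >= observed[i])
        else:  # left-sided test
            hits = np.sum(null[:, i] <= observed[i])
        pvals[n] = (hits + 1) / (n_perm + 1)  # add-one form avoids p = 0
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # cumulative minimum from the largest p-value down keeps adjusted values monotone
    adj_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)
    return adjusted
```

Adjusted values for all pathways of a sample can then be obtained by passing `list(raw.values())` of the dictionary returned by `permutation_pvalues` to `benjamini_hochberg`, which preserves the dictionary's key order.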

Bibliography

  1. Benjamini, Yoav and Hochberg, Yosef (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289-300.
  2. Hochberg, Yosef and Benjamini, Yoav (1990). More powerful procedures for multiple significance testing. Statistics in Medicine, 9(7), 811-818.