EDTox
Description
EDTox is an R Shiny web application for training classifiers that predict the endocrine disruption
potential of a compound using machine learning on toxicogenomics data. In this approach, starting from
the molecular initiating events (MIEs) of a compound, a random walk with restart (RWR) is performed on a
gene-gene co-expression network built using toxicogenomics data to identify the genes perturbed by the
compound in question. The resulting list of perturbed genes is then used for fast gene-set enrichment
analysis (FGSEA) to retrieve the activated pathways. The pathway activation scores obtained from FGSEA
are used to train a GLM-based classifier to predict the endocrine disruption potential of a compound
(defined as the EDC-class probability score). The pipeline can optionally use a Pareto solution to
select the optimal proportion of edges from the original network to use as input for RWR and
the number of top-ranked genes (by RWR probability) to pass to FGSEA. Validation of the model
can be performed by k-fold cross-validation.
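The random walk with restart step described above can be illustrated with a minimal sketch. This is not the EDTox implementation (which is written in R); the function name, the toy network, the restart probability of 0.7 and the convergence tolerance are all assumptions chosen for illustration. The walker starts from the MIE genes (seeds), moves to neighbouring genes in proportion to edge weight, and jumps back to the seeds with a fixed probability at every step.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> {neighbour: edge weight} (symmetric).
    seeds: iterable of seed nodes (here: the MIE genes of a compound).
    restart: probability of jumping back to the seeds (illustrative value).
    """
    nodes = sorted(adjacency)
    seeds_in_net = [s for s in seeds if s in adjacency]
    if not seeds_in_net:
        raise ValueError("no seed gene is present in the network")

    # Restart distribution: uniform over the seeds found in the network.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds_in_net:
        p0[s] = 1.0 / len(seeds_in_net)

    # Weighted degree of each node, used to normalise outgoing transitions.
    strength = {n: sum(adjacency[n].values()) for n in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: the walking mass stays where it is.
                nxt[u] += (1 - restart) * p[u]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-gene network with unit edge weights.
network = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(network, seeds=["A"])

# Keep the highest-scoring genes as the list passed on to enrichment analysis.
top_genes = sorted(scores, key=scores.get, reverse=True)[:2]
```

In the pipeline, the number of top-ranked genes kept after the walk (2 in this toy example) is one of the two quantities the optional Pareto step tunes, the other being the proportion of network edges retained.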
As a pilot study, we have trained classifiers based on toxicogenomics data from the DrugMatrix, Open
TG-GATEs and LINCS databases. An additional classifier based on a protein-protein interaction network
from the STRING database has also been trained. In each case, only liver genes were included in the initial
network for training the classifiers. A sample set of 197 endocrine disrupting chemicals (EDCs) and 1336
negative controls were used as the training set. The negative controls were selected such that they had
maximum dissimilarity to the EDCs based on their MIEs. The MIEs of the compounds were retrieved from
the Comparative Toxicogenomics Database (CTD). The accuracy of the trained classifiers can be viewed on
the summary tab of this application.
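The idea of choosing negative controls that are maximally dissimilar to the EDCs can be sketched as follows. The README does not state which dissimilarity measure is used, so this example assumes Jaccard similarity between MIE gene sets and ranks each candidate by its highest similarity to any EDC. The function names, compound identifiers and gene sets are hypothetical; the script linked in the example files section is the authoritative version.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of MIE genes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_negative_controls(edc_mies, candidate_mies, n_controls):
    """Pick the candidates whose MIEs overlap least with those of any EDC.

    edc_mies, candidate_mies: dict mapping compound -> set of MIE genes.
    Assumption: a candidate is scored by its maximum Jaccard similarity
    to any EDC, and the lowest-scoring candidates are kept.
    """
    def max_similarity(mies):
        return max(jaccard(mies, edc) for edc in edc_mies.values())

    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(
        candidate_mies,
        key=lambda c: (max_similarity(candidate_mies[c]), c),
    )
    return ranked[:n_controls]


# Hypothetical compounds and MIE gene sets.
edc_mies = {
    "edc1": {"ESR1", "AR"},
    "edc2": {"ESR1", "PPARG"},
}
candidate_mies = {
    "cmpA": {"ESR1", "AR"},    # identical to edc1, so a poor negative control
    "cmpB": {"TP53"},          # shares no MIE with any EDC
    "cmpC": {"AR", "CYP1A1"},  # partial overlap with edc1
}
selected = select_negative_controls(edc_mies, candidate_mies, n_controls=2)
```

Scoring by the maximum similarity rather than the mean is a conservative choice: a candidate is rejected if it closely resembles even one EDC.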
While users can train a new classifier based on their own network and training set compounds, the
application also provides a way to calculate EDC-class probability scores based on the 20 classifiers
trained using the toxicogenomics data mentioned above. The application can also be used to view the
EDC-class probabilities of the 12,278 compounds in CTD predicted using the EDTox pipeline and compare them
with the ToxPi scores from ToxCast.
Application tabs
- Home: The initial landing page. Contains an introduction to the application.
- Summary: Provides an overview of the accuracy of the pre-trained classifiers, the number of pathways included in the pipeline and basic statistics regarding the EDC-class probability scores of the CTD compounds.
- Toxicogenomics Pipeline: Allows the user to train a new classifier starting from a network and lists of training set EDCs and negative controls along with their corresponding MIEs. EDC-class probability scores for compounds can also be predicted here using the newly trained classifier.
- Pathway activation scores: View and compare pathways predicted to be activated based on the different classifiers.
- EDC-class probability: Retrieve the EDC-class probability scores of individual compounds. Also allows comparison of the EDC-class probability scores of up to five compounds.
- Comparison with ToxPi Scores: Plot and compare the compiled EDC-class probability scores against the ED-based ToxPi scores from ToxCast.
Databases
- Toxicogenomics data for network construction
- Other networks
- Pathways
- Selection of EDCs and negative controls
Example files to run the application with de novo toxicogenomics data
As a case study, a publicly available transcriptomic dataset of HepaRG cells treated with 31 compounds for 24 h and 72 h was used to demonstrate the applicability of this application. The input files required to train a new classifier using the Toxicogenomics Pipeline tab can be downloaded from the links below. Here, the gene-gene co-expression network was generated from the expression values of samples treated with carcinogens (20 compounds) for 24 hours. The lists of EDCs and negative controls along with their MIEs were retrieved from DEDuCT and CTD. A script for selecting negative controls for a given set of EDCs is also provided below. The main manuscript of the EDTox application contains further details regarding the case study.
- HepaRG Network
- List of EDCs along with their MIEs
- List of negative controls along with their MIEs
- List of unlabelled compounds
- Script for selection of negative controls
To cite EDTox:
EDTox: an R Shiny application to predict the endocrine disruption potential of compounds
A standalone version of EDTox can be downloaded from: https://github.com/vittoriofortino84/EDC_shiny