EDTox is an R Shiny web application for training classifiers that predict the endocrine disruption potential of a compound using machine learning on toxicogenomics data. Starting from the molecular initiating events (MIEs) of a compound, a random walk with restart (RWR) is performed on a gene-gene co-expression network built from toxicogenomics data to identify the genes perturbed by the compound. The resulting list of perturbed genes is then passed to fast gene-set enrichment analysis (FGSEA) to retrieve the activated pathways. The pathway activation scores obtained from FGSEA are used to train a GLM-based classifier that predicts the endocrine disruption potential of a compound, reported as an EDC-class probability score. The pipeline can optionally use a Pareto solution to select two parameters: the proportion of edges from the original network used as input for RWR, and the number of genes with the highest probability after RWR that are passed to FGSEA. The model can be validated by k-fold cross-validation.
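The RWR step described above can be illustrated with a minimal Python sketch. It is not the EDTox implementation (which is written in R): the function names, the restart probability of 0.7, the toy gene labels and the edge weights are all assumptions made for illustration. The walker starts at the MIE genes, moves to neighbours in proportion to edge weight, and returns to the seeds with a fixed probability at every step. Genes are then ranked by their steady-state probability.

```python
def build_network(edges):
    """Build a symmetric weighted adjacency map from (gene, gene, weight) triples."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-12, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until p stops changing.

    `restart` is a hypothetical default, not a value taken from EDTox.
    """
    nodes = sorted(adjacency)
    seeds_in_network = [s for s in seeds if s in adjacency]
    if not seeds_in_network:
        raise ValueError("none of the seed genes are present in the network")

    # Start vector: probability mass spread evenly over the seed (MIE) genes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds_in_network:
        p0[s] = 1.0 / len(seeds_in_network)

    out_weight = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node with only zero-weight edges keeps its own mass.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / out_weight[u]
            for v, w in adjacency[u].items():
                nxt[v] += share * w
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def top_genes(probabilities, k):
    """Return the k genes with the highest steady-state probability."""
    ranked = sorted(probabilities, key=lambda g: (-probabilities[g], g))
    return ranked[:k]


# Toy network with made-up genes G1..G6; G5 and G6 form a separate component.
toy_edges = [("G1", "G2", 0.9), ("G2", "G3", 0.8), ("G3", "G4", 0.5), ("G5", "G6", 0.7)]
network = build_network(toy_edges)
scores = random_walk_with_restart(network, seeds=["G1"])
perturbed = top_genes(scores, 3)
```

In this toy case the ranked list would feed the enrichment step. Genes in a component that contains no seed receive zero probability, so they never enter the top-ranked list.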
As a pilot study, we trained classifiers on toxicogenomics data from the DrugMatrix, Open TG-GATEs and LINCS databases. An additional classifier based on a protein-protein interaction network from the STRING database was also trained. In each case, only liver genes were included in the initial network used for training. A set of 197 endocrine disrupting chemicals (EDCs) and 1336 negative controls was used as the training set. The negative controls were selected to have maximum dissimilarity with the EDCs based on their MIEs. The MIEs of the compounds were retrieved from the Comparative Toxicogenomics Database (CTD). The accuracy of the trained classifiers can be viewed on the Summary tab of this application.
While users can train a new classifier based on their own network and training set compounds, the application also provides a way to calculate EDC-class probability scores using the 20 classifiers trained on the toxicogenomics data mentioned above. The application can also be used to view the EDC-class probabilities of the 12,278 compounds in CTD predicted using the EDTox pipeline and to compare them with the ToxPi scores from ToxCast.
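One way to picture the selection of negative controls by MIE dissimilarity is the Python sketch below. The text does not state which dissimilarity measure EDTox uses, so the Jaccard distance here is an assumption, as are the function names and the toy compound and MIE labels. This is not the selection script distributed with the application. Each candidate is scored by its distance to the most similar EDC, and the candidates with the largest such distance are kept.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_negative_controls(edc_mies, candidate_mies, n_controls):
    """Keep the n candidates whose MIE sets are least similar to any EDC.

    Assumption: dissimilarity to the EDC set is taken as the distance to the
    nearest EDC, so one close match is enough to push a candidate down.
    """
    scored = []
    for name, mies in candidate_mies.items():
        nearest = min(jaccard_distance(mies, edc) for edc in edc_mies.values())
        scored.append((nearest, name))
    # Largest distance first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:n_controls]]


# Toy data: compound and MIE labels are placeholders, not CTD records.
edcs = {"edc1": {"mieA", "mieB"}, "edc2": {"mieA", "mieC"}}
candidates = {
    "c1": {"mieA", "mieB", "mieD"},  # shares two MIEs with edc1
    "c2": {"mieE"},                  # shares nothing with any EDC
    "c3": {"mieC", "mieE"},          # shares one MIE with edc2
}
controls = select_negative_controls(edcs, candidates, n_controls=2)
```

Scoring against the nearest EDC rather than the average is a conservative choice: a candidate that closely resembles even one EDC is ranked low.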

Application tabs

  • Home: The initial landing page. Contains an introduction to the application.
  • Summary: Provides an overview of the accuracy of the pre-trained classifiers, the number of pathways included in the pipeline and basic statistics regarding the EDC-class probability scores of the CTD compounds.
  • Toxicogenomics Pipeline: Allows the user to train a new classifier starting from a network and lists of training set EDCs and negative controls along with their corresponding MIEs. EDC-class probability scores for new compounds can also be predicted here using the newly trained classifier.
  • Pathway activation scores: View and compare pathways predicted to be activated based on the different classifiers.
  • EDC-class probability: Retrieve the EDC-class probability scores of individual compounds. Also allows comparison of the EDC-class probability scores of up to five compounds.
  • Comparison with ToxPi Scores: Plot and compare the compiled EDC-class probability scores against the ED-based ToxPi scores from ToxCast.


Example files to run the application with de novo toxicogenomics data

As a case study, a publicly available transcriptomic dataset of HepaRG cells treated with 31 compounds for 24 h and 72 h was used to demonstrate the applicability of this application. The input files required to train a new classifier using the Toxicogenomics Pipeline tab can be downloaded from the links below. Here, the gene-gene co-expression network was generated from the expression values of samples treated with carcinogens (20 compounds) for 24 hours. The lists of EDCs and negative controls, along with their MIEs, were retrieved from DEDuCT and CTD. A script for selecting negative controls for a given set of EDCs is also available below. The main manuscript of the EDTox application contains further details regarding the case study.

To cite EDTox:
EDTox: an R Shiny application to predict the endocrine disruption potential of compounds
Standalone version of EDTox can be downloaded from: https://github.com/vittoriofortino84/EDC_shiny

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825762.

Fig: Overview of the software modules provided by the EDTox platform

Distribution of EDC-class probability scores of CTD chemicals

Pathways used in the pipeline

Accuracy of trained classifiers


Selection of optimal parameters for RWR-FGSEA

Training and validation of GLM based classifier



Prediction of new compounds

Molecular activity profiling of EDCs

EDC-class probability scores

Average and harmonic sum of EDC-class probability scores

Plot of compound EDC-class probability scores for selected classifiers

Comparison with ED-based ToxPi Scores

Average EDC-class probabilities vs. ToxPi scores