We sought to infer gene druggability across the whole human exome (19,846 genes) by leveraging historical data from known drug targets and other types of evidence around gene tractability, all integrated within the DrugnomeAI ML framework (Fig. 1a). We obtained lists of known or likely druggable genes from the Pharos24 and Triage2 resources to train the DrugnomeAI ML models (see Methods). We primarily used two training datasets from Pharos: Tclin (610 genes), consisting of genes that are targets of approved drugs with known mechanism of action, and Tchem (1592 genes), consisting of genes that are targets of compounds included in ChEMBL25 or DrugCentral26. In addition, we used three training datasets from the Triage resource: Tier 1 (1411 genes), which comprises genes with approved drugs and clinical-phase drug candidates; Tier 2 (658 genes), consisting of genes with known bioactive drug-like small molecules and genes with high sequence similarity to approved drug targets; and Tier 3A (845 genes), which consists of secreted or extracellular proteins that have distant similarity to approved drug targets and gene families not already included in Tier 1 or Tier 2. We trained DrugnomeAI on each of these training sets and extracted druggability predictions based on the different types of evidence provided by each labelled dataset.

Fig. 1: Overview of DrugnomeAI framework and integrated data. a Illustration of the DrugnomeAI model development workflow. The whole exome (19,846 genes) is split into random balanced subsets of positive (i.e. druggable) and unlabelled (i.e. druggability is unknown) genes. An ensemble of classifiers is generated such that multiple models are trained on each subset with stratified tenfold cross-validation. The process is repeated for L stochastic iterations. The final druggability scores are obtained by averaging the prediction scores from out-of-bag sets across all stochastic iterations from the ensemble models. b Data resources integrated in DrugnomeAI. i Feature integration from 15 data sources. m: number of GWAS-specific terms relevant to a disease; k: number of MGI-specific terms relevant to a disease; n: number of tissues affected by a given disease. ii Data sources of gene druggability labels for disease-agnostic models. iii Resources for gene labels used for the domain-specific models (detailed descriptions for each model are available in Table 4). *Labels are extracted from PHAROS based on the input disease terms.
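The subset construction and score averaging described for panel a can be sketched as follows. This is a minimal illustration under stated assumptions, not the DrugnomeAI implementation: the function names and toy gene identifiers are hypothetical, reusing all positive genes in every subset is an assumption, and the classifier is replaced by placeholder scores.

```python
import random
from collections import defaultdict


def balanced_pu_subsets(positives, unlabelled, rng):
    """Pair the positive genes with equally sized random chunks of unlabelled genes.

    Assumption: every subset reuses the full positive set. The last chunk may
    be smaller when the unlabelled pool does not divide evenly.
    """
    pool = list(unlabelled)
    rng.shuffle(pool)
    size = len(positives)
    return [
        (list(positives), pool[start:start + size])
        for start in range(0, len(pool), size)
    ]


def average_scores(score_records):
    """Average per-gene scores collected over all subsets and iterations."""
    totals, counts = defaultdict(float), defaultdict(int)
    for gene, score in score_records:
        totals[gene] += score
        counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Toy example: 3 positive genes, 7 unlabelled genes, 2 stochastic iterations.
rng = random.Random(0)
positives = ["P1", "P2", "P3"]
unlabelled = [f"U{i}" for i in range(1, 8)]

records = []
for iteration in range(2):
    for pos, unl in balanced_pu_subsets(positives, unlabelled, rng):
        # Placeholder scores standing in for held-out classifier predictions.
        records += [(g, 1.0) for g in pos] + [(g, 0.0) for g in unl]

final_scores = average_scores(records)
```

In a real run the placeholder scores would be replaced by the held-out predictions of each trained model, so that every gene's final score is an average over models that did not see it during training.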

We tested a range of different druggability and gene-level annotation feature sets during DrugnomeAI training (Fig. 1b). Specifically, we explored four different feature sets, in increasing order of number of features:

1. “InterPro”, comprising the feature set extracted exclusively from that resource;
2. “Pharos + InterPro”, referring to the druggability-specific features only from the respective resources;
3. “All (druggability)”, denoting all druggability-specific data sources along with the generic ones inherited from mantis-ml (namely ExAC and Essential Mouse genes data); and
4. “All + Mantis”, which, in addition to the aforementioned datasets, encompasses other sources utilised in mantis-ml, such as gnomAD, Genic Intolerance, GWAS, and MGI essential data (see Methods).

We evaluated and compared the performance of four classifiers (Random Forest, Extra Trees, Support Vector Machine and Gradient Boosting) across different combinations of labelled datasets and feature sets employed for the predictions. We observed that the Gradient Boosting model consistently outperformed the other classifiers across all configurations of label sets (Supplementary Fig. 2b) and feature sets (Supplementary Fig. 1). Gradient Boosting’s hyperparameters were further fine-tuned (see Methods) and it was selected as the default classifier for DrugnomeAI training.

Analysis of significant druggability-associated features with ablation and Boruta

To select an optimal non-redundant feature set, we first performed a basic ablation analysis. Specifically, we trained DrugnomeAI using three feature sets of increasing size, namely “Pharos + InterPro”, “All (druggability)” and “All + Mantis” (described in the previous section). AUC scores achieved with the “Pharos + InterPro” feature set were either identical or comparable to those obtained with the more extended “All (druggability)” and “All + Mantis” feature sets (Supplementary Fig. 1). Thus, we selected “Pharos + InterPro” as the default feature set for DrugnomeAI to eliminate non-informative redundancy from the more extended feature sets. Next, we performed feature importance analysis with the Boruta algorithm27 for the Tclin (Fig. 2c) and Tier 1 labelled datasets (Supplementary Fig. 2a). Boruta is an iterative feature selection method that determines whether a feature has statistically robust predictive power. It compares the predictive power of each feature against randomised versions of the original features (called “shadow” features), using a Random Forest as the base model for classification. Weak features (i.e. features that prove statistically less relevant than the best-performing “shadow” feature) are removed. Once the procedure converges, a “confirmed” set of features (i.e. features that are considered predictive) is identified and ranked by Z-scores representing importance scores (see Methods).
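The shadow-feature comparison at the core of Boruta can be illustrated with a deliberately simplified sketch. Two substitutions are assumptions made to keep the example self-contained: absolute Pearson correlation with the label stands in for Random Forest importance, and a raw hit count stands in for Boruta's statistical test on hits. The function and feature names are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_feature_hits(features, labels, n_iter, rng):
    """Count how often each feature beats the best shuffled ("shadow") feature."""
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        shadow_scores = []
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)  # shuffling breaks any link with the labels
            shadow_scores.append(abs_corr(shadow, labels))
        threshold = max(shadow_scores)
        for name, values in features.items():
            if abs_corr(values, labels) > threshold:
                hits[name] += 1
    return hits


# Toy data: one feature that tracks the label and one that is uncorrelated.
labels = [i % 2 for i in range(40)]
features = {
    "informative": list(labels),
    "uninformative": [(i // 2) % 2 for i in range(40)],
}
hits = shadow_feature_hits(features, labels, n_iter=20, rng=random.Random(0))
```

A feature that beats the best shadow feature in nearly every iteration would be a candidate for the “confirmed” set, while one that rarely does would be rejected.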

Fig. 2: Analysis of DrugnomeAI models’ predictive performance and top contributing features. a DrugnomeAI AUC score distribution across different classifiers and labelling variants utilising the druggability-specific dataset (the statistical significance of Gradient Boosting outperforming the other classifiers has been calculated using DeLong test, with the corresponding p values provided above each barplot). b AUC scores (with Gradient boosting) across different labelling variants utilising the druggability-specific dataset. c List of druggability-associated features extracted by the Boruta feature selection algorithm (as “Confirmed” features) on the Tclin dataset.

For both models, the most important features were related to protein-protein interactions based on the DGIdb28, InWeb29, Reactome30 and STRING31 networks. This is consistent with existing literature that has demonstrated that interaction partners of druggable genes are also more likely to be druggable2. In addition, protein-protein interactions are linked to biological and pathological processes and, recently, protein-protein interactions have gained increasing attention as drug targets due to their potential for selectively modulating specific pathways32,33. Upon performing principal component analysis (PCA; Supplementary Fig. 3) of the Tclin and Tier 1 datasets, we observed that the first principal components capture only ~4.5% of the variance, indicating the presence of highly non-linear relationships between the features (Supplementary Fig. 10).

After selecting the optimal feature set, we investigated its performance across different classifiers for an array of labelling variants. We provide a detailed AUC score breakdown across the various configurations (Supplementary Fig. 2b). The Gradient Boosting classifier consistently and significantly outperformed the other algorithms across all the examined configurations. Specifically, we applied the DeLong test to compare the AUC scores attained by Gradient Boosting against the respective performance of all other classifiers (Random Forest, Extra Trees, Support Vector Classifier and Deep Neural Net) based on the Tclin and Tier1 labelled datasets (Supplementary Fig. 24). We observed that Gradient Boosting significantly outperforms all other classifiers for both the Tclin and Tier1 labelled datasets (Tclin dataset – DeLong test p values of Gradient Boosting vs Random Forest: p = 4.34 × 10−18, Extra Trees: p = 1.01 × 10−18, SVC: p = 2.44 × 10−10, DNN: p = 6.58 × 10−12; Tier1 dataset – DeLong test p values of Gradient Boosting vs Random Forest: p = 5.04 × 10−29, Extra Trees: p = 2.83 × 10−30, SVC: p = 3.32 × 10−15, DNN: p = 1.71 × 10−10). Finally, Gradient Boosting’s AUC scores show the lowest variance across labelled sets, meaning that the choice of labelled set does not noticeably influence the classifier’s performance (Fig. 2a). Among the labelled dataset variants, the highest AUC scores were obtained using Tclin and Tier 1 (AUC = 0.99 and 0.97, respectively; Fig. 2b).
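For readers unfamiliar with the DeLong test, the sketch below shows how two AUCs computed on the same labelled genes can be compared. It is a plain-Python illustration of the standard DeLong variance estimate, written for clarity rather than speed. The function names and toy scores are hypothetical, and it is not the code used to produce the p values above. It requires at least two positive and two negative samples.

```python
import math


def _components(scores, labels):
    """Per-sample structural components (V10 for positives, V01 for negatives)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(p, n):
        return 1.0 if p > n else 0.5 if p == n else 0.0

    v10 = [sum(psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, two-sided p value) for two correlated ROC curves."""
    v10_a, v01_a = _components(scores_a, labels)
    v10_b, v01_b = _components(scores_b, labels)
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(v10_a)
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(v01_a)
    )
    if var <= 0:
        # Degenerate case: no estimated variability in the AUC difference.
        return auc_a, auc_b, 1.0 if auc_a == auc_b else 0.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


# Toy example: classifier A separates the classes perfectly, classifier B does not.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.2, 0.7, 0.3, 0.8, 0.1]
auc_a, auc_b, p_value = delong_test(scores_a, scores_b, labels)
```

With only six samples the difference is far from significant, which illustrates why very small p values such as those reported above require both a consistent AUC gap and a large number of labelled genes.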

Validation and exploration of DrugnomeAI top hits

Since the best performance was achieved using a Gradient Boosting model trained with the Tclin or Tier 1 label sets, we use these predictions as our reference models for further analyses (referenced as DrugnomeAI-Tclin and DrugnomeAI-Tier1, respectively). We obtained the top 5% of genes ranked by DrugnomeAI-Tclin and DrugnomeAI-Tier1, each consisting of 992 genes (Supplementary Data 1). Notably, there is 63% (621 genes) overlap between the two sets (Supplementary Data 2).
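The top-5% selection and overlap can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper name and toy scores are hypothetical, and truncating 5% of 19,846 to an integer is an assumption that happens to match the 992 genes reported above.

```python
def top_fraction(scores, fraction=0.05):
    """Return the set of genes in the top `fraction` of a {gene: score} mapping."""
    k = int(len(scores) * fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


# Arithmetic behind the figures quoted in the text.
n_top = int(19846 * 0.05)           # size of each top-5% list
overlap_pct = 100 * 621 / n_top     # share of genes common to both lists

# Toy example with 40 genes per model, so that the top 5% holds 2 genes.
tclin_scores = {f"G{i}": 1 - i / 40 for i in range(40)}
tier1_scores = {f"G{i}": 1 - abs(i - 1.2) / 40 for i in range(40)}
shared = top_fraction(tclin_scores) & top_fraction(tier1_scores)
```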

Top DrugnomeAI hits with clinical evidence

We conducted a systematic review across all clinical development activities to identify genes that have been implicated as targets in therapeutic drug development (i.e. genes that have been selected for clinical development; see Methods) among the top 5% of genes ranked by DrugnomeAI-Tclin and/or DrugnomeAI-Tier1. We grouped all ranked genes into 20 rank intervals, each containing ~992 genes. We found that genes ranked in the top 5% by DrugnomeAI-Tclin were significantly enriched among genes selected for clinical development (Odds Ratio = 132.78, Fisher’s exact test p < 1 × 10−308; Fig. 3b, Supplementary Data 3, Supplementary Figs. 4, 5). In total, 753 genes (76% of the interval) ranked in the top 5% by DrugnomeAI-Tclin and 268 genes in the 5–10% rank interval are supported by prior clinical development efforts (Fig. 3a). We observe similarly strong enrichment among genes ranked by DrugnomeAI-Tier1 (Fig. 3c, d). Remarkably, based on the cumulative distribution function, we observe that the top 25% of genes ranked by DrugnomeAI account for 95% of genes supported by clinical evidence (Fig. 3e), and 80% of genes supported by clinical evidence are ranked among the top 10% of genes by DrugnomeAI.
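The rank-interval counts and the cumulative distribution described above can be computed as in the following sketch. The function names and toy data are hypothetical, and the binning rule (assigning rank r of n genes to interval r × 20 // n) is an assumption about how near-equal intervals were formed.

```python
def hits_per_interval(ranked_genes, hit_genes, n_intervals=20):
    """Count hit genes in each of `n_intervals` near-equal rank intervals."""
    n = len(ranked_genes)
    counts = [0] * n_intervals
    for rank, gene in enumerate(ranked_genes):
        if gene in hit_genes:
            counts[rank * n_intervals // n] += 1
    return counts


def cumulative_fraction(counts):
    """Cumulative share of all hits captured up to and including each interval."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    running, fractions = 0, []
    for count in counts:
        running += count
        fractions.append(running / total)
    return fractions


# Toy example: 100 ranked genes (5 per interval), 6 of which have clinical evidence.
ranked = [f"G{i}" for i in range(100)]
clinical = {"G0", "G1", "G2", "G3", "G10", "G50"}
counts = hits_per_interval(ranked, clinical)
cdf = cumulative_fraction(counts)
```

Reading the cumulative fractions at the fifth interval of the real ranking would give the share of clinically supported genes found within the top 25% of predictions.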

Fig. 3: Validation of DrugnomeAI ranked genes using clinical evidence. Number of genes (n = 19,846) supported by clinical evidence per rank intervals based on predictions of (a) DrugnomeAI-Tclin and (c) DrugnomeAI-Tier1. 0–5% consists of genes ranked in the top 5% whereas 95–100% contains genes ranked in the bottom 5%. Enrichment of genes supported by clinical evidence in each rank interval based on predictions of (b) DrugnomeAI-Tclin and (d) DrugnomeAI-Tier1. Larger odds ratio values indicate higher enrichment. e Cumulative distribution function (CDF) plot of genes supported by clinical evidence per rank interval.

We conducted further analysis of the top 5% of genes ranked by DrugnomeAI-Tclin and DrugnomeAI-Tier1. 76% and 61% of genes in the Tclin and Tier1-based predictions, respectively, have been selected for clinical development. Furthermore, 627 (63%) and 475 (48%) genes from the Tclin and Tier1-based predictions, respectively, are targeted by small molecules. Of these genes, we found that clinical trials had progressed into phase IV for 501 (51%) and 346 (35%) genes in the Tclin and Tier1-based predictions, respectively (Fig. 4a, b). We also analysed the therapeutic areas in which the top 5% of genes ranked by the DrugnomeAI models have been implicated, and observed that the majority of those genes have been selected for clinical development for genetic diseases, cell proliferation disorders (CPD), nervous system diseases and immune system diseases targeted by small molecules or monoclonal antibodies (Supplementary Fig. 11).

Fig. 4: Clinical and non-clinical evidence for the top 5% of genes ranked by DrugnomeAI. Clinical evidence available for the top 5% genes (n = 992) ranked by (a) DrugnomeAI-Tclin and (b) DrugnomeAI-Tier1. Each bar indicates the number of genes targeted by each molecule type per clinical trial phase. c Number of genes among the top 5% DrugnomeAI Tclin-based and Tier1-based predictions satisfying each distinct type of non-clinical evidence. d Number of genes among the top 5% DrugnomeAI Tclin-based and Tier1-based predictions satisfying multiple types (x = 1,2,..6) of non-clinical evidence. Asterisks (*) denote that the respective gene sets are significantly enriched for each type or set of types of non-clinical evidence compared to 10 random gene sets of equal size (the median p value is eventually used to assess significance in each case).

Genes with no prior evidence in clinical development

In the previous section, we demonstrated that there are 239 and 387 targets among the top 5% predicted hits from the Tclin and Tier1-based DrugnomeAI models, respectively, that do not have any clinical trials data associated with them (Fig. 4a, b). These genes are predicted by DrugnomeAI to be druggable but do not yet have drugs in clinical development. We identified potential associations between these genes and diseases using non-clinical evidence (i.e. associations between genes and diseases that are not supported by clinical trials). We used six types of non-clinical evidence: genetic, animal models, somatic mutations, RNA expression, pathways, and literature (Fig. 4c; Supplementary Fig. 28; see Methods). For the top-ranking genes from Tclin and Tier1 without clinical support, we performed enrichment analyses for one or more types of supporting evidence using Fisher’s exact test against multiple random subsets of genes (null subsets). For each enrichment analysis, we report the median p value achieved across 10 iterations against random (null) gene sets (Fig. 4d, Supplementary Fig. 27). We found all 239 (Tclin-based predictions) and 386 out of 387 (Tier1-based predictions) genes to be associated with diseases and supported by at least two types of non-clinical evidence (Tclin – Fisher’s exact p = 1.5 × 10−8; Tier1 – Fisher’s exact p = 5.2 × 10−10; Fig. 4d, Supplementary Fig. 27). Large proportions of the Tier1 and Tclin top-ranking genes without clinical evidence are further supported by three, four or even five types of evidence, and significantly so compared to random gene sets (Fig. 4d, Supplementary Fig. 27). Finally, there are 24 genes from Tclin-based predictions (Fisher’s exact p = 2.4 × 10−3) and 26 genes from Tier1-based predictions (Fisher’s exact p = 1.3 × 10−2) that are supported by six types of evidence (Fig. 4d, Supplementary Figs. 9, 27).
While all levels of support are statistically significant, we observe that for genes supported by six types of evidence, the Fisher’s exact test p values are relatively higher (i.e. less significant) than in the other analyses. This is expected given the smaller number of genes supported by all six types of non-clinical evidence. Overall, it is notable that the top hits predicted by DrugnomeAI (without prior clinical evidence) are highly and significantly enriched for multiple types of non-clinical evidence, suggesting that they are more likely to be biologically relevant with regard to their druggability potential.
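The comparison against random gene sets can be sketched as below. Several details are assumptions made for illustration: the 2×2 table layout (candidate genes versus one random set, with versus without evidence), sampling null sets from the whole background, and all function and variable names. Fisher’s exact test is implemented directly with integer binomial coefficients so that the example depends only on the standard library.

```python
import math
import random
import statistics


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x)

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    observed = weight(a)
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return tail / denom


def median_null_p(candidates, evidence, background, n_iter, rng):
    """Median Fisher p value of `candidates` against equally sized random gene sets."""
    a = len(candidates & evidence)
    b = len(candidates) - a
    p_values = []
    for _ in range(n_iter):
        null = set(rng.sample(background, len(candidates)))
        c = len(null & evidence)
        p_values.append(fisher_exact_two_sided(a, b, c, len(null) - c))
    return statistics.median(p_values)


# Toy example: 200 genes, 40 with evidence, 30 candidate genes all with evidence.
background = [f"G{i}" for i in range(200)]
evidence = set(background[:40])
candidates = set(background[:30])
median_p = median_null_p(candidates, evidence, background, n_iter=10, rng=random.Random(0))
```

Taking the median over several random sets makes the reported p value less sensitive to any single unusually favourable or unfavourable null sample.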

We then expanded the enrichment analysis for non-clinical evidence across all genes ranked by the DrugnomeAI-Tclin and DrugnomeAI-Tier1 models. Overall, top ranked genes by the two models are significantly enriched among genes supported by genetic evidence (DrugnomeAI-Tier1: Odds Ratio = 5.8, Fisher’s test p value = 9.35 × 10−38 and DrugnomeAI-Tclin: Odds Ratio = 4.6, Fisher’s test p value = 4.64 × 10−32). In addition, there is high enrichment among genes supported by the other five types of non-clinical evidence (Supplementary Data 4, Supplementary Figs. 6, 7).

Next, we explored the features of genes not previously pursued clinically to examine whether there are any identifiable traits that would distinguish them from genes selected for clinical development. We plotted the kernel density estimate of the top 20 features from Tclin and Tier1 and employed the Chi-squared statistical test to compare the distribution of each of these features between genes with and without clinical evidence (Supplementary Figs. 25, 26). Top ranked features include monoclonal count, antibody count, protein sequence length, and DGIdb interaction types (p value < 1 × 10−308), where we observe that genes without clinical evidence have on average smaller values than genes that have been selected for clinical development. Notably, the CTD processes “decreases metabolic processing” and “increases uptake” have non-zero distributions among the genes without clinical evidence and are significantly different from the distributions of genes with clinical support. This may suggest that genes with no prior clinical evidence are involved in metabolic pathways that are highly complex or more challenging to target. Other significant features include associated pathways and interactions from the Comparative Toxicogenomics Database (CTD). For example, we observe that “CTD increased cleavage” is highly present among genes that have been selected for clinical development but is depleted in genes without clinical evidence. This is expected, as cleavage is one of the most established steps involved in drug mechanism of action, such as in antibody-drug conjugates for treating tumours34, and seems to have already been studied and covered extensively among known drug targets (a detailed explanation of all these features is available in Supplementary Data 5).

Enrichment with significant gene hits from large-scale PheWAS studies

We investigated the overlap between the top 5% DrugnomeAI predictions and the highly ranked genes from large-scale phenome-wide association studies (PheWAS) for binary and quantitative traits extracted from 450 K samples from the UKB cohort23 (see Methods). We analysed the enrichment of top 5% genes ranked by DrugnomeAI models and supported by clinical evidence with genes achieving genome-wide significance (p value < 5 × 10−8) from PheWAS in UKB (Fig. 5, Supplementary Fig. 8). We observe significant enrichment of top 5% genes ranked by DrugnomeAI-Tclin among the top PheWAS hits for binary traits (Odds Ratio = 2.9, Fisher’s exact test p value = 1.69 × 10−5) and for quantitative traits (Odds Ratio = 2.5, Fisher’s exact test p value = 1.56 × 10−7). Similarly, there is a significant enrichment of highly ranked DrugnomeAI-Tier1 predictions among top genes from PheWAS binary traits (Odds Ratio = 3.0, Fisher’s exact test p value = 4.63 × 10−5) and PheWAS quantitative traits (Odds Ratio = 3.0, Fisher’s exact test p value = 9.53 × 10−10).

Fig. 5: Enrichment of top 5% genes (n = 992) ranked by DrugnomeAI-Tclin and DrugnomeAI-Tier1 and supported by clinical evidence among the top UKB PheWAS hits for binary and quantitative traits. Genome-wide significant hits (p < 5 × 10−8) have been considered from the PheWAS analysis on 450 K samples from UKB. While the common top hits from DrugnomeAI and PheWAS are sorted by the number of significant hits in PheWAS for visualisation purposes, it is expected that many of the associated phenotypes may be highly correlated.

Enrichment of top DrugnomeAI genes with OMIM disease annotations

We then assessed how genes associated with OMIM diseases are ranked by DrugnomeAI models (Supplementary Fig. 23, Supplementary Data 6). We observe that genes associated with OMIM diseases are also significantly enriched among the top 5% ranked genes by DrugnomeAI-Tclin (Fisher’s exact test p value = 6.05 × 10−110, Odds Ratio = 4.6) and DrugnomeAI-Tier1 (Fisher’s exact test p value = 6.55 × 10−77, Odds Ratio = 3.6). Specifically, 506 (51%) and 452 (45%) genes ranked among the top 5% DrugnomeAI-Tclin and DrugnomeAI-Tier1 hits, respectively, have been associated with OMIM diseases. That suggests that a relatively large proportion of genes predicted to be highly druggable may also have high likelihood to be biologically relevant and carry out a disease-specific therapeutic effect.

Benchmarking against other druggability prediction methods

We sought to explore how DrugnomeAI compares with published methods for druggability prediction, focusing on methods that can perform disease-agnostic exome-wide druggability predictions. We selected three tools for this task, which provide either pre-calculated prediction scores or a code repository for reproducing their models: (1) TargetDB, a recently published tool employing a random forest model for tractability prediction10, (2) a recently published deep learning model by Yu et al.12 for protein druggability prediction, and (3) a decision tree-based meta classifier by Costa et al.14 for genome-wide prediction of morbid and druggable genes.

To assess the enrichment for top predictions from each model, we employed two data sources for validation as independent reference sets: the Open Targets tractability data for small molecules and antibodies and a list of genes with approved drugs from King et al.35. We investigated whether the top 5% genes (top 992 genes per model) from DrugnomeAI, TargetDB and the models by Yu et al.12 and Costa et al.14 preferentially overlap with each of the validation datasets. Remarkably, we observed that DrugnomeAI-Tclin has the highest overlap with the validation datasets. The top-ranked genes from DrugnomeAI-Tclin overlap with the validation datasets by 35%, 29%, and 149% more than the top-ranked hits from TargetDB, Costa et al. and Yu et al., respectively (Fig. 6). We also observe that the DrugnomeAI-Tclin overlap with approved drug targets from King et al. is statistically significant compared to the overlap from TargetDB (Fisher’s exact test p value = 3.9 × 10−15, Odds Ratio = 2.3), Yu et al. (Fisher’s exact test p value = 3.6 × 10−79, Odds Ratio = 17.2), and Costa et al. (Fisher’s exact test p value = 2.3 × 10−10, Odds Ratio = 1.9) with the same validation dataset (Supplementary Data 7).

Fig. 6: Overlap between top 5% genes (n = 992) ranked by DrugnomeAI-Tclin, DrugnomeAI-Tier1, TargetDB, Costa et al. and Yu et al. across three validation datasets. Validation of the five models across three validation datasets: (a) Approved drug targets (King et al., 2019 dataset). b Open Targets druggability dataset for monoclonal antibodies. c Open Targets druggability data for small molecules. DrugnomeAI has a significantly more enriched overlap in the majority of pairwise comparisons (39 out of 42 comparisons). Coloured asterisks (grey, yellow and green) indicate where DrugnomeAI models achieve significantly higher enrichment with each known validation set compared to TargetDB, Costa et al., and Yu et al., respectively (see also Supplementary Data 7).

We also performed a stepwise hypergeometric test to assess the enrichment of top predictions by DrugnomeAI, TargetDB, the Costa et al.14 and Yu et al.12 models among the validation datasets (Supplementary Fig. 12). To further quantify the enrichment, we calculated the area under the hypergeometric curve (AUC) of the enriched region (p value < 0.05) (Supplementary Data 8). In three out of the seven test cases (Supplementary Fig. 12), DrugnomeAI-Tclin demonstrated higher enrichment than DrugnomeAI-Tier1, TargetDB, Costa et al., or Yu et al. For genes targeted by small molecules, top predictions by DrugnomeAI-Tclin were significantly enriched for genes with approved drugs in Bucket 1 from Open Targets (AUC was 23-fold higher than TargetDB and Costa et al.) and genes selected for clinical development in Buckets 1–3 (AUC was 10-fold and 13-fold higher than TargetDB and Costa et al., respectively). In addition, we observed significant enrichment among genes with approved drugs in King et al. dataset (AUC was 3-fold and 2-fold higher than TargetDB and Costa et al., respectively). For genes targeted by monoclonal antibodies, DrugnomeAI-Tier1 top predictions are significantly enriched for genes in Buckets 1–8 (area under curve was 372-fold higher than TargetDB). Top predictions by TargetDB are more enriched among genes in Buckets 1–8 targeted by small molecules (AUC is 3-fold higher than DrugnomeAI-Tclin and Costa et al.). DrugnomeAI models exhibited lower enrichments for genes with approved monoclonal antibodies in Bucket 1 and genes selected for clinical development in Buckets 1–3. This could be due to the small number of genes in these datasets that achieved significant enrichment. In addition, the training sets (Tclin and Tier1) are likely skewed towards genes targeted by small molecules. Finally, the top hits by the Yu et al. model have low enrichment with zero AUC scores in all cases.
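A stepwise hypergeometric test of this kind can be sketched as follows. The exact definition of the area under the hypergeometric curve is given in the Methods and is not reproduced here. Summing −log10(p) over cut-offs with p < 0.05 is an assumption made purely for illustration, as are the function names, step size and toy data.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes from `population` containing `successes` hits."""
    total = math.comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, x) * math.comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / total


def stepwise_enrichment(ranked_genes, reference, step):
    """Enrichment p value of the reference set within each growing top-k list."""
    population = len(ranked_genes)
    successes = len(reference & set(ranked_genes))
    curve, hits = [], 0
    for seen, gene in enumerate(ranked_genes, start=1):
        hits += gene in reference
        if seen % step == 0:
            curve.append((seen, hypergeom_sf(hits, population, successes, seen)))
    return curve


def enriched_area(curve, alpha=0.05):
    """Assumed summary statistic: sum of -log10(p) over cut-offs with p < alpha."""
    return sum(-math.log10(p) for _, p in curve if p < alpha)


# Toy example: 40 ranked genes, with the 5 reference genes ranked at the very top.
ranked = [f"G{i}" for i in range(40)]
reference = {"G0", "G1", "G2", "G3", "G4"}
curve = stepwise_enrichment(ranked, reference, step=5)
area = enriched_area(curve)
```

For exome-scale rankings the exact integer arithmetic remains valid, but extremely small p values may underflow to zero when converted to floating point, so a production version would need to work in log space.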

Therapeutic modality-specific models

Apart from the generic DrugnomeAI models, we developed models specific to three drug modalities: small molecule, monoclonal antibody, and PROTAC, each trained on genes already known to be amenable to the respective modality. We selected these modalities since small molecule and monoclonal antibody inhibitors are two of the main types of targeted therapies, and PROTAC technology is an emerging modality that can overcome some of the drawbacks of small molecule-based therapies36. In addition, these molecules tend to successfully target different types of proteins. For example, small molecules are quite amenable to targeting intracellular proteins while monoclonal antibodies can primarily target extracellular proteins37. Therefore, obtaining granular druggability scores for each therapeutic modality could help prioritise targets that are likely to be druggable by a specific drug modality. Moreover, our framework is generic in nature and can be extended to other therapeutic modalities once a sufficient volume of appropriate training data is available.

We tested four classifiers (gradient boosting, random forest, SVC, and extra trees) per drug modality. Although the four classifiers achieved comparable performance in target predictability (AUC ≥ 0.94), gradient boosting models outperformed all other classifiers, achieving AUC scores of 0.98, 0.99, and 0.97 for the antibody, small molecule, and PROTAC modalities, respectively (Supplementary Fig. 13). We also observed high correlation of gene probability predictions across the four classifiers, reaching Pearson’s r scores of up to 0.93, 0.94, and 0.95 for the small molecule, monoclonal antibody, and PROTAC modalities, respectively (Supplementary Fig. 14).

Exploring the top 50 genes ranked per drug modality reveals several novel genes (i.e. unlabelled genes with high rankings and not among the seed genes in the model training). There were 17 and 16 novel genes among the top 50 genes ranked for the antibody and PROTAC modalities, respectively, while all the top 50 genes by the small molecule model were known genes (Supplementary Fig. 16). We also assessed whether the antibody-specific DrugnomeAI predictions were preferentially under-represented for intracellular proteins, which are difficult or impossible to access with this modality type. Specifically, we found that only 182 out of the 1181 positive DrugnomeAI predictions from the antibody-specific model (probability score > 0.5) are found exclusively in the intracellular space, which is significantly lower than the overall representation of intracellular proteins in the rest of the exome (Fisher’s exact test p = 3.1 × 10−139, Odds Ratio = 0.17), based on another 7544 intracellular proteins found in the remaining 14,601 proteins of the exome with known information about their cellular localisation (as derived from Open Targets8). For reference, the training set for the antibody-specific DrugnomeAI model (as derived from Open Targets) showed a similar under-representation of intracellular proteins (22 out of the total 230 with information about their cellular localisation; Fisher’s exact test p = 3.4 × 10−38, Odds Ratio = 0.11).
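The reported odds ratio for the antibody-specific predictions can be checked directly from the counts quoted above. The sketch assumes that all 1181 positive predictions have localisation information and that the 2×2 table contrasts them with the remaining 14,601 proteins. The function name is hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Counts quoted in the text for the antibody-specific model.
intracellular_pred = 182
other_pred = 1181 - 182        # positive predictions not exclusively intracellular
intracellular_rest = 7544
other_rest = 14601 - 7544      # remaining proteins not exclusively intracellular

ratio = odds_ratio(intracellular_pred, other_pred, intracellular_rest, other_rest)
```

Under these assumptions the ratio rounds to 0.17, in agreement with the value reported for the Fisher’s exact test.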

Schneider et al.38 describe a set of 1067 genes as potential PROTAC targets, not previously described in the literature, that are also distinct from the genes we used for training our PROTAC-based DrugnomeAI model. We explored how these genes are ranked by the DrugnomeAI PROTAC model (Supplementary Fig. 17) and, notably, observed high enrichment, with 287 (27%) of these genes being ranked in the top 5% (Fisher’s exact test p value = 6.7 × 10−138, Odds Ratio = 9.5).

Oncology and non-oncology specific DrugnomeAI models

Considering that targets for oncology diseases have different regulatory requirements for safety and efficacy, we examined whether genes that have been selected for development in the oncology space have distinct properties from genes targeted for other disease areas. To this end, we explored genes previously selected for development for CPD, which include cancerous and pre-cancerous conditions as well as neoplastic diseases and hyperplasia, together with a narrower set consisting of only cancer-related genes. We therefore investigated five scenarios: (1) “CPD-sm” and (2) “CPD-ab”, which are trained on CPD genes targeted by small molecules and antibodies, respectively; (3) “non-CPD-sm” and (4) “non-CPD-ab”, which are trained on genes targeted by small molecules and antibodies, respectively, that have not been selected for development for CPDs; and (5) “cancer-sm”, which is trained on genes in cancer cell lines linked with small molecules.

In all five scenarios, the classifiers that were tested achieved high performance (AUC > 0.93) (Supplementary Fig. 18). Gradient boosting again outperformed all other classifiers, achieving AUC scores of 0.99, 0.98, 0.98, 0.98, and 0.96 for cases (1)–(5), respectively. Overall, there is high correlation of gene probability predictions across the four classifiers, reaching Pearson’s r scores of up to 0.95 for the “cancer-sm” case and 0.93 for the remaining cases (Supplementary Fig. 19). We therefore again selected Gradient Boosting as the default classifier. We then aimed to determine whether there are any novel genes among the top 50 genes ranked by DrugnomeAI in each case (Supplementary Fig. 20), identifying 22 and 30 novel genes for the CPD-ab and non-CPD-ab models, respectively.

Significant features analysis of domain-specific DrugnomeAI models

We sought to explore the most important features for predicting druggable genes for each modality. Analysis of confirmed features by the Boruta algorithm shows that features derived from protein-protein interaction networks (“seed genes overlap hmean score”, “inferred seed genes overlap”, “experimental seed genes overlap”, “Re seed genes overlap”, and “StringDB protein genes overlap”) are high contributors for the three modalities (Supplementary Fig. 15). These features represent the ratio of known druggable genes interacting with a candidate target from InWeb and StringDB (see Methods). In addition, features derived from interaction data, such as DGIdb interaction types (number of gene-drug interactions from DGIdb) and CTD unique interactions (number of unique chemical-gene interactions from CTD) as well as monoclonal count (number of monoclonal antibodies for a target) are also high contributors for all three drug modalities. These features indicate that the druggability problem can better be addressed from a systems biology point of view instead of pursuing each target individually. Moreover, associated pathways from CTD is a top feature for small molecule and antibody modalities, while protein-coding sequence length is a top feature for the small molecule and PROTAC modalities (detailed explanation of each feature is available in Supplementary Data 5).

Similar to the modality-specific models, Boruta analysis showed that features from protein-protein interaction networks, associated pathways, unique interactions, monoclonal count, and sequence length were among the top features for the oncology and non-oncology specific DrugnomeAI models (Supplementary Fig. 21). In addition, we observed that features associated with pathways from CTD (representing the presence or absence of a gene in a given pathway) are top contributors for druggability prediction. Specifically, CTD apoptosis is a top feature for “CPD-sm” and CTD Phagosome is among the top features for “CPD-ab”, while CTD Metabolism is a top feature for both of the oncology-related small molecule models (“CPD-sm” and “cancer-sm”).