NQuiX has been closely involved in the National Compound Collection pilot study – an evaluation of the chemical diversity represented in a sampling of chemistry PhD theses submitted in the UK and its potential for application in pharmaceutical research.
In work with the Royal Society of Chemistry and Bristol University, a team of data extractors created the National Compound Collection (NCC) database of ~75,000 chemical structures from over 700 theses. NQuiX developed computational methods for assessing the chemical diversity of the NCC in terms of fingerprint similarity, Bemis-Murcko frameworks and ring systems, after substructural and property filtering for druglikeness. The approach was encoded in a script to facilitate standardized comparison with a wide range of compound collections, including known drugs, published bioactive compounds, patented chemistry space, "purchasable" compounds and various screening decks. For the screening decks, the script was run by 12 external drug discovery groups from the pharma, biotech, academic and not-for-profit sectors. Their outputs were combined to provide a broad view of the potential value of the novel chemical diversity present.
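To illustrate the fingerprint-similarity part of such a comparison, here is a minimal Python sketch. It is not the script from the study: the function names, the toy fingerprints (sets of "on" bit indices) and the 0.7 similarity cut-off are all assumptions chosen for illustration. It counts a structure as "novel" relative to a reference collection when its nearest neighbour in that collection falls below the cut-off.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], reference: List[Set[int]]) -> float:
    """Similarity of a query to its nearest neighbour in a reference collection."""
    return max((tanimoto(query, ref) for ref in reference), default=0.0)


def novel_fraction(candidates: List[Set[int]],
                   reference: List[Set[int]],
                   threshold: float = 0.7) -> float:
    """Fraction of candidates whose nearest reference neighbour is below threshold.

    The 0.7 default is a hypothetical cut-off, not a value taken from the study.
    """
    if not candidates:
        return 0.0
    novel = sum(1 for fp in candidates
                if max_similarity(fp, reference) < threshold)
    return novel / len(candidates)


# Toy fingerprints standing in for real ones (hypothetical data).
ncc_like = [{1, 2, 3, 4}, {10, 11, 12}, {1, 2, 3, 5}]
reference_deck = [{1, 2, 3, 4, 6}, {20, 21}]

print(f"Novel fraction: {novel_fraction(ncc_like, reference_deck):.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit, and the same nearest-neighbour idea could be repeated for each reference collection to give a comparable novelty figure per collection.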
The results of the pilot study have been published in Chemical Science (DOI: 10.1039/C6SC00264A). Whilst the proportion of structures that pass the various filters and appear to be novel varies quite widely depending on filter stringency and the compound collection used for comparison, a subset of ~13k structures (~18% of the NCC) appears to have good diversity. This is a very encouraging proportion and helps to frame thinking about how to capture additional value from UK academic chemistry research output.
The NCC database and scripts for comparison are available as part of the supplementary material. The NCC database has also been uploaded to ChemSpider to provide additional access and searching options.