Scholia Chemistry: access to chemistry in Wikidata


Egon L. Willighagen (1), Denise Slenter (1), Adriano Rutz (2), Daniel Mietchen (3, 4, 5), Finn Å. Nielsen (6)

  1. Department of Translational Genomics, NUTRIM, Maastricht University, Netherlands
  2. Institute of Molecular Systems Biology, ETH Zürich, Switzerland
  3. FIZ Karlsruhe — Leibniz Institute for Information Infrastructure, Germany
  4. Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Germany
  5. Institute for Globally Distributed Open Research and Education (IGDORE), Germany
  6. Cognitive Systems, DTU Compute, Technical University of Denmark, Denmark

egon.willighagen@maastrichtuniversity.nl

May 11, 2025

Abstract

Sharing knowledge on chemicals in the digital age has been the domain of databases such as those of the Chemical Abstracts Service and PubChem. Wikipedia complements these by providing context on chemicals for a broad audience, but it is not easily read by machines. Wikidata was started as a database to improve the machine readability of the knowledge captured in Wikipedia. Wikidata has an open license, application programming interfaces, and a strong provenance model. Scholia uses these features to provide access to chemical knowledge. This study reviews the chemistry in Wikidata, shows how thousands of new chemicals were added, extends Wikidata with new properties for chemical representation and external links to additional databases, and shows how we extended Scholia to represent the chemistry in Wikidata.

Introduction

Making chemical databases more FAIR (findable, accessible, interoperable, and reusable) and openly licensed benefits both computational chemistry and cheminformatics [1, 2]. The need for FAIR and open data in chemistry has been discussed for some time [3, 4]. The use of the term open science in chemistry dates back to 1999, when Prof. Dan Gezelter set up the openscience.org domain [5]. In the past two decades, large amounts of openly licensed chemistry data have become available from databases and projects such as PubChem [6], ChEMBL [7], ChEBI [8], the Nanosafety Data Interface [9], NORMAN [10], the EPA CompTox Dashboard [11], CAS Common Chemistry from the Chemical Abstracts Service [12], and LOTUS [13]. However, significant gaps in open chemistry data remain, including experimental physicochemical data such as boiling and melting points and acidity constants, structural representations with (full) stereochemistry, and more.

The provenance of chemical knowledge is important: facts need to be supported by primary literature, which yields a detailed, literature-backed history of what is known about a given chemical. Currently, finding out when (and how) the melting point of a common drug was measured is more complicated than finding out who played a character in episode 7 of season 5 of some TV series. Wikipedia is perhaps the most impactful project making chemistry data available at a large scale and in many languages, and it has attracted interest from many researchers [14, 15, 16, 17, 18, 19]. However, extracting data from Wikipedia is nontrivial, which has led to domain-general projects such as DBpedia that extract knowledge from Wikipedia, including chemistry, and make it available as linked data [20]. DBpedia shares Wikipedia's license, whereas our choice of Wikidata is partly motivated by its CC0 license. For comparisons between these resources, we refer to the literature, e.g. Färber et al. [21].

Here, we discuss Wikidata, a more recent sister project of Wikipedia, hosted as part of a shared ecosystem of platforms under the umbrella of the Wikimedia Foundation [22, 23]. Wikidata differs from Wikipedia in three key ways: first, Wikidata uses the very liberal CC0 license; second, its labels are localized, allowing compounds to have names in any language; and third, its data are directly machine-readable, which makes the database far more useful for the interoperability of molecular databases in systems biology [24, 25, 26].

Information can be added to Wikidata in several ways. First, the Wikidata website (wikidata.org) provides an interface for manually adding items and statements about those items [27]. Complementing that, various tools use Wikidata's application programming interfaces (APIs) to add content at scale. These include QuickStatements (wikidata.org/wiki/Help:QuickStatements), WikidataIntegrator (github.com/sulab/WikidataIntegrator), and OpenRefine [28]. Optionally, this process is supported by formal definitions of the data to be entered, e.g. by way of shape expressions in ShEx [29]. These features make Wikidata an interesting database to use and extend for research [24].

Chemistry in Wikidata

Collaboration in Wikidata is self-organizing. Kanke describes one mechanism: “A WikiProject is a group of individuals who want to collaborate to improve Wikidata on a specific topic or a particular task.” [30]. Thanks to the WikiProject Chemistry community on Wikidata [31], a growing amount of information about chemical compounds is available on the platform. This turned out to be a good basis for open science projects that reuse information about chemicals, a prime example being WikiPathways [32, 33, 34]. Chemistry was already present in Wikidata when this effort started [25], with around 300,000 chemical compounds described. However, the WikiPathways project required identifier mappings from metabolites (a class of chemical compounds) to other chemistry databases and to identifiers in metabolomics datasets, and found its solution in Wikidata, which already had identifiers for various databases. Besides PubChem, ChEBI, HMDB [35], ChemSpider [36], and other chemical databases, these included InChI and InChIKey [37], which are useful for matching chemical structures from different resources. Moreover, many structures had SMILES, IUPAC and trivial names, and links to primary literature.

A second example of chemistry in Wikidata concerns chemical substances, such as nanomaterials, that lack a well-defined chemical graph representation. This is particularly relevant for specific nanomaterials that we want to link to their primary literature. For example, the European Commission Joint Research Centre (JRC) released a list of representative industrial nanomaterials for European research to focus on, allowing comparisons of results and methods [38]. To support research on the substances on this list, the NanoCommons project created an interactive list of the primary literature on these JRC materials [39, 40].

Wikidata currently captures information on more than a million chemical structures, and this number has increased substantially over time. For example, on 20 April 2025 it contained 1,363,439 unique InChIKeys, 1,322,933 PubChem CIDs, and 151,512 ChEBI identifiers, compared with 1,331,451, 1,207,860, and 120,276, respectively, on 19 October 2024. While these numbers show an overall increase, the number of CAS registry numbers went down in the same period (from 942,689 to 940,440), and the exact reason for this decrease is difficult to establish. Furthermore, Wikidata provides links to experimental data, including melting points for 31,561 molecules, NMR spectra for 18,392 compounds, mass spectra for 15,119 compounds, crystal structures for 22,405 chemicals, and acidity constants for 232 compounds. The information is continuously curated, as illustrated at scholia.toolforge.org/chemical/#statistics, and Wikidata links chemical compounds to items in other domains through general properties such as has use (wikidata:P366).
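To make the growth between the two snapshots explicit, the short Python sketch below recomputes the absolute and relative changes from the counts quoted above. It only does arithmetic on those figures; the `change` helper is illustrative and not part of Scholia or Wikidata.

```python
# Snapshot counts quoted in the text: (19 October 2024, 20 April 2025).
COUNTS = {
    "InChIKey": (1_331_451, 1_363_439),
    "PubChem CID": (1_207_860, 1_322_933),
    "ChEBI ID": (120_276, 151_512),
    "CAS Registry Number": (942_689, 940_440),
}


def change(before: int, after: int) -> tuple:
    """Return (absolute change, relative change in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


for name, (before, after) in COUNTS.items():
    delta, pct = change(before, after)
    print(f"{name:>20}: {delta:+,} ({pct:+.2f}%)")
```

On these numbers, InChIKeys grew by about 2.4%, PubChem CIDs by about 9.5%, and ChEBI identifiers by about 26%, while CAS registry numbers fell by about 0.24%.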

Finding chemical information

While the APIs provide a powerful way to retrieve information from Wikidata and add information to it, the data are presented around a single item [27]. However, information that a user may want to visualize together is typically distributed over multiple items. For example, the values of physicochemical properties are directly visible, but information about their units is provided by another Wikidata item (although the web interface does show the unit labels). Another example of such distributed data is references, which are often items themselves, with more detailed information such as the DOI that allows the reader to quickly locate the article.
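The following Python sketch illustrates this with a hand-written fragment shaped like Wikidata's entity JSON: a melting point (P2101) statement carries its unit, and its "stated in" (P248) reference, only as item identifiers, so displaying a unit label or a DOI requires looking up further items. The amount and the source item `Q_HYPOTHETICAL_SOURCE` are placeholders, not real Wikidata content.

```python
import json

# Hand-written fragment in the shape of Wikidata's entity JSON (as served by
# Special:EntityData or the wbgetentities API). The amount and the source
# item are placeholders for illustration only.
ENTITY_JSON = """
{
  "claims": {
    "P2101": [
      {
        "mainsnak": {
          "property": "P2101",
          "datavalue": {
            "type": "quantity",
            "value": {"amount": "+11.8",
                      "unit": "http://www.wikidata.org/entity/Q25267"}
          }
        },
        "references": [
          {"snaks": {"P248": [
            {"datavalue": {"value": {"id": "Q_HYPOTHETICAL_SOURCE"}}}
          ]}}
        ]
      }
    ]
  }
}
"""


def quantity_statements(entity: dict, prop: str):
    """Yield (amount, unit item ID, 'stated in' item IDs) for one property.

    Unit and sources come back only as item IDs: turning them into labels
    or DOIs needs further lookups of those other items.
    """
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim["mainsnak"]["datavalue"]["value"]
        unit_id = value["unit"].rsplit("/", 1)[-1]  # "1" for unitless values
        sources = [
            snak["datavalue"]["value"]["id"]
            for ref in claim.get("references", [])
            for snak in ref.get("snaks", {}).get("P248", [])
        ]
        yield float(value["amount"]), unit_id, sources


entity = json.loads(ENTITY_JSON)
for amount, unit, sources in quantity_statements(entity, "P2101"):
    print(amount, unit, sources)  # 11.8 Q25267 ['Q_HYPOTHETICAL_SOURCE']
```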

To overcome the difficulty of browsing information in Wikidata, Scholia was originally developed as a graphical interface to scientific bibliographic information [41]. Scholia uses SPARQL queries to collect data from multiple items and visualizes this knowledge as tables and graphs in a user-guided manner, without users needing to understand the technical details of how this is done. For a selection of types, e.g. work, author, and gene, Scholia has separate pages, called aspects. Each aspect has a number of sections, called panels, and each panel is based on a single SPARQL query that is parametrized with the aspect and one or more Wikidata identifiers of the items to be profiled.

Scholia can be adapted for various use cases, as previously demonstrated for geospatial data [42]. On that basis, we realized [43] that Scholia could likewise be used to explore and curate chemical knowledge by pulling together data from various corners of Wikidata in a user-friendly manner. We describe here our approaches and methods to make this collection possible.

2. Implementation

Cheminformatics: the Chemistry Development Kit

The Chemistry Development Kit (CDK) version 2.9 [44] was used to work with the chemical graph information in Wikidata; the CDK was used to parse the SMILES string, to compute the mass and the chemical formula, and to interact with the InChI library to calculate InChI and InChIKey. The CDK, via CDK Depict, was also used to generate 2D diagrams of the chemical structures. For this, the Docker image of the CDK Depict tool (github.com/cdk/depict) was installed on Wikimedia Toolforge (www.toolforge.org).

User Interface: Scholia

Scholia is a Python/Flask-based web application that creates web pages using a template approach [41]. Templates were defined for concepts around knowledge exchange, such as publications, journals, publishers, and topics. Scholia uses the term 'aspect' for these templates. The aspects run SPARQL queries against the Wikidata Query Service (WDQS, wikidata.org/wiki/Wikidata:SPARQL_query_service) and visualize the data in various panels.

Figure 1: Two screenshots of the canonical SMILES statement for aspirin (wikidata:Q18216): the top one shows the default Wikidata page, while the bottom one shows what it looks like with the cdkdepict_gadget.js.

Three aspects were developed: 1. chemical, for molecules such as paracetamol; 2. chemical-class, for compound classes such as sphingolipids; and 3. chemical-element. The routing for both the landing pages and the aspect pages for specific molecules, classes, and elements was specified in the Scholia scholia/app/views.py file. The landing pages were written to show Wikidata examples for which the respective Scholia aspect provides useful information. Each aspect page that provides information for an item uses several SPARQL templates. Scholia injects the Wikidata identifier of the depicted item using one or more {{ q }} parameters. Templates were written for these aspects and stored in the scholia/app/templates folder.
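As a minimal sketch of this template mechanism, assuming a hypothetical panel query, the Python code below substitutes an item identifier into `{{ q }}` placeholders and builds a Wikidata Query Service URL. Scholia renders its templates with Flask's Jinja2 engine; plain string replacement stands in for that here, and the identifier check is an added safeguard rather than documented Scholia behavior.

```python
from urllib.parse import quote

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical panel template; {{ q }} receives the Wikidata item ID.
PANEL_TEMPLATE = """
SELECT ?inchikey ?smiles WHERE {
  OPTIONAL { wd:{{ q }} wdt:P235 ?inchikey . }
  OPTIONAL { wd:{{ q }} wdt:P233 ?smiles . }
}
"""


def render_panel(template: str, q: str) -> str:
    """Insert a Wikidata item ID into every {{ q }} placeholder."""
    if not (q.startswith("Q") and q[1:].isdigit()):
        raise ValueError(f"not a Wikidata item identifier: {q!r}")
    return template.replace("{{ q }}", q)


def wdqs_url(query: str) -> str:
    """URL that runs the query on the Wikidata Query Service (JSON results)."""
    return f"{WDQS_ENDPOINT}?format=json&query={quote(query)}"


query = render_panel(PANEL_TEMPLATE, "Q18216")  # aspirin
print(query)
print(wdqs_url(query))
```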

User Interface: Wikidata

While Scholia provides an integrated interface to explore Wikidata content, the original Wikidata web interface is still used for data curation. The chemical graphs of molecules and compounds are represented in Wikidata as two flavors of SMILES, one with chirality (isomeric SMILES, wikidata:P2017) and one without (canonical SMILES, wikidata:P233). Via the regular Wikidata mechanism at Wikidata:Property_proposal, we requested an additional property for the ChemAxon Extended SMILES (CXSMILES [45]) of chemical classes, polymers, and functional groups (wikidata:P10718).
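The practical difference between the two SMILES properties is stereochemistry. The sketch below, a rough Python illustration and not the CDK-based processing used in this work, derives a stereo-free string from an isomeric SMILES by deleting chirality marks (`@`, `@@`) and bond-direction symbols (`/` and `\`). It does not canonicalize, so its output should not be expected to match the canonical SMILES stored in wikidata:P233.

```python
import re

_CHIRALITY = re.compile(r"@+")          # @ and @@ inside bracket atoms
_BOND_DIRECTION = re.compile(r"[/\\]")  # / and \ around double bonds


def drop_stereo(smiles: str) -> str:
    """Remove tetrahedral and double-bond stereo marks from a SMILES string.

    This is string surgery, not canonicalisation: the result is typically a
    valid but non-canonical SMILES, and extended chirality classes such as
    @TH1, @AL1, or @SP1 are not handled.
    """
    return _BOND_DIRECTION.sub("", _CHIRALITY.sub("", smiles))


print(drop_stereo("C[C@@H](C(=O)O)N"))  # C[CH](C(=O)O)N
print(drop_stereo("C/C=C/C"))           # CC=CC
```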

Content Generation: Bacting

Finally, we used a combination of Bacting (a continuation of Bioclipse) [46, 47], QuickStatements (quickstatements.toolforge.org), and manual curation to add missing chemical compounds reported in biological pathways from WikiPathways [48, 49]. Where needed, new Wikidata properties were proposed.

3. Results

We first report results on the chemistry content in Wikidata and then continue with the chemistry extensions of Scholia, the main topic of this article. This order follows a dependency: Scholia is primarily an interface to Wikidata and can only present data that are available in Wikidata, through the SPARQL queries developed for it.

Wikidata and Chemistry

Figure 2: Screenshot of the chemical aspect landing page at scholia.toolforge.org/chemical/.

First, we deployed a CDK Depict instance at cdkdepict.toolforge.org. This instance provides 2D depictions on Wikidata, via the custom-developed cdkdepict_gadget.js (see Figure 1), and on Scholia, via the Scholia aspects outlined in the next section. The gadget works for SMILES for specific chemical structures, for SMILES arbitrary target specification (SMARTS) for chemical substructures, and for CXSMILES for compound classes such as lipid classes, polymers, and functional groups. Any Wikidata user can enable the gadget as a personal UI extension by adding an importScript('User:Egon_Willighagen/cdkdepict_gadget.js'); statement to the common.js file of their Wikidata account.
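For reuse outside the gadget, a depiction can be requested directly from the CDK Depict instance. The Python sketch below builds such an image URL; the `/depict/{style}/{format}?smi=` path layout and the style name `bow` (black on white) are assumptions based on the public CDK Depict web service, and the gadget's own JavaScript may build its requests differently.

```python
from urllib.parse import urlencode

# CDK Depict instance named in the text.
CDKDEPICT_BASE = "https://cdkdepict.toolforge.org/depict"


def depict_url(smiles: str, style: str = "bow", fmt: str = "svg") -> str:
    """Image URL for a SMILES, SMARTS, or CXSMILES string.

    Assumes the /depict/{style}/{fmt}?smi=... pattern of the CDK Depict web
    service; check the instance for the styles and formats it supports.
    """
    return f"{CDKDEPICT_BASE}/{style}/{fmt}?{urlencode({'smi': smiles})}"


print(depict_url("CC(=O)OC1=CC=CC=C1C(=O)O"))  # aspirin
```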

Second, we developed a set of Groovy scripts using the Bacting library. The first script is createWDitemsFromSMILES.groovy. In the simplest use case, this script takes a file with a list of SMILES strings, one per molecule. For each molecule, the InChI and InChIKey are calculated, and the InChIKey is used to check whether the molecule is already present in Wikidata. If it is not, the script generates the chemical formula, calculates the molecular mass, looks up the PubChem compound identifier (using the InChIKey), and creates QuickStatements instructions, which the user can copy and paste into the online QuickStatements service to create Wikidata items for the missing molecules. A second set of Bacting scripts was used to add external identifiers from the EPA CompTox Dashboard, the PDB Ligand database, and lipid databases such as LIPID MAPS [50] and SwissLipids [51]. These scripts can be found in the ExtIdentifiers folder in the same GitHub repository. Similarly, experimental data can be added, such as melting points from the Jean-Claude Bradley Open Melting Point dataset [52]. Here, too, the InChIKey is used to match items in Wikidata, and if a match is found, QuickStatements instructions are created. These steps are documented in the Melting-Points/createQuickStatements.groovy script.
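The core of the createWDitemsFromSMILES.groovy workflow can be sketched in Python as follows, assuming the InChI, InChIKey, and formula have already been computed (this work used the CDK via Bacting for that) and that `known_inchikeys` stands in for the Wikidata lookup. The emitted lines follow the tab-separated QuickStatements V1 syntax (CREATE, then LAST statements). The `Molecule` class and `quickstatements` function are hypothetical, the choice of properties is illustrative, the molecular mass is left out because it needs a unit item, and writing formulas with subscript digits is an assumption about the convention for wikidata:P274.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

TYPE_OF_CHEMICAL_ENTITY = "Q113145171"
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


@dataclass
class Molecule:
    smiles: str
    inchi: str
    inchikey: str
    formula: str                       # e.g. "C2H6O", computed upstream
    pubchem_cid: Optional[str] = None


def quickstatements(mols: Iterable[Molecule], known_inchikeys: set) -> Iterator[str]:
    """Yield QuickStatements V1 commands for molecules not yet in Wikidata."""
    for mol in mols:
        if mol.inchikey in known_inchikeys:
            continue  # an item already exists: nothing to create
        yield "CREATE"
        yield f"LAST\tP31\t{TYPE_OF_CHEMICAL_ENTITY}"                # instance of
        yield f'LAST\tP233\t"{mol.smiles}"'                          # canonical SMILES
        yield f'LAST\tP234\t"{mol.inchi}"'                           # InChI
        yield f'LAST\tP235\t"{mol.inchikey}"'                        # InChIKey
        yield f'LAST\tP274\t"{mol.formula.translate(SUBSCRIPTS)}"'   # chemical formula
        if mol.pubchem_cid:
            yield f'LAST\tP662\t"{mol.pubchem_cid}"'                 # PubChem CID


ethanol = Molecule("CCO", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
                   "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "C2H6O", "702")
print("\n".join(quickstatements([ethanol], known_inchikeys=set())))
```

Passing a set that already contains ethanol's InChIKey yields no commands, mirroring the script's skip of existing items.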

Added content

Figure 3: Screenshot of the chemical aspect page for dioxane at /chemical/Q161532.

Because Wikidata is an open wiki system, it is not trivial to summarize exactly how much content was added as part of this work, but in this section we attempt to give some indication. One approach is the Navel gazer tool by User:Bamyers99, which, with some caveats, lists the users who added the most statements with a given property. For example, many canonical SMILES (wikidata:P233) statements have been added by the authors EW and AR, by the NPImporterBot, and by an account unknown to Navel gazer, i.e. where it could not determine the source (see bambots.brucemyers.com/NavelGazer.php?property=P233); we expect the latter to include the team of Andrew Su [26]. Likewise, navel gazing into SwissLipids ID (wikidata:P8691) and LIPID MAPS ID (wikidata:P2063) shows a similar impact of the authors' work.

The same tool offers a second way to look at the content added by the authors. For example, EW added close to half a million chemical compounds, as shown by Navel gazer for the Egon Willighagen account (bambots.brucemyers.com/NavelGazer.php?username=Egon+Willighagen). The counts for mass (wikidata:P2067) and PubChem CID (wikidata:P662) overestimate the number of newly created items, while canonical SMILES (wikidata:P233) underestimates it. Exact statistics are hard to obtain because an item is not formally limited to one SMILES representation or one external identifier. Nevertheless, these counts give a reasonable overview of what can be found in Wikidata.

These statistics also show which external identifiers of chemicals the authors have worked on. For example, AR contributed significantly to external identifiers in Wikidata for NMRShiftDB [53] (wikidata:P9405), KNApSAcK [54] (wikidata:P2064), and the Natural Products Atlas [55] (wikidata:P7746); see bambots.brucemyers.com/NavelGazer.php?username=AdrianoRutz.

Figure 4: Screenshot of searching for rhodium in Ukrainian on the Scholia front page.

We also want to comment on the multilingual aspects of chemistry in Wikidata: Wikidata items have labels in many languages, and they link to Wikipedia articles, which often exist in many languages, too. This feature allows searching for chemicals in multiple languages (see Figure 4), though the coverage of names other than English is limited. Of all the chemicals in Wikidata with an InChIKey, only 28,939 have a linked page in any of the Wikipedias (April 2025 data). Of these, 28% (8,224 compounds) have only one sitelink, e.g. to the English Wikipedia, and 90% have at most 18 sitelinks to various Wikipedias, while 109 compounds have 100 or more sitelinks (edu.nl/3xkhw), with water having the most sitelinks, 272 in total (edu.nl/qxp6b). This leaves the vast majority of chemicals in Wikidata without a matching page in Wikipedia.
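Sitelink statistics of this kind can be approximated with a query and a few summary calculations. In the Python sketch below, the SPARQL string uses `wikibase:sitelinks`, which counts links to all Wikimedia projects and not only Wikipedias, so it is an approximation rather than the query behind the figures above; `summarize` is a hypothetical helper, shown on a toy list.

```python
import math

# Approximate query: wikibase:sitelinks counts links to all Wikimedia
# projects (e.g. also Commons), not only to Wikipedias.
SITELINKS_QUERY = """
SELECT DISTINCT ?item ?sitelinks WHERE {
  ?item wdt:P235 ?inchikey ;
        wikibase:sitelinks ?sitelinks .
  FILTER(?sitelinks > 0)
}
"""


def summarize(sitelinks: list) -> dict:
    """Summary statistics for a list of per-item sitelink counts."""
    if not sitelinks:
        raise ValueError("no sitelink counts given")
    counts = sorted(sitelinks)
    n = len(counts)
    return {
        "items": n,
        "share_with_one": sum(c == 1 for c in counts) / n,
        "p90": counts[math.ceil(0.9 * n) - 1],  # nearest-rank 90th percentile
        "at_least_100": sum(c >= 100 for c in counts),
        "max": counts[-1],
    }


print(summarize([1, 1, 2, 3, 100]))
```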

Many of these chemical entities originate from other databases, with the EPA CompTox Dashboard, SwissLipids, LIPID MAPS, and LOTUS as major sources of chemical structures. Because these are English-language databases, it is no surprise that many chemical structures have only an English name. Indeed, 96.9% of the 1.3 million chemicals in Wikidata have only an English name (edu.nl/gre6m). On the other hand, the name of water is available in Wikidata in 70 more languages than in Wikipedia.

The Scholia aspects

Three aspects were developed to support chemistry: chemical, chemical class, and chemical element. Each aspect shows chemistry-specific information and complements the general topic aspect, which applies to all three categories. Each aspect comes with a landing page; for example, the chemical aspect has scholia.toolforge.org/chemical/ (see Figure 2). These pages provide a starting point to explore how the aspect can be used. The Single items column links to items that can be opened in that aspect, and the Redirecting column shows some Scholia URL patterns. For example, for molecules, the chemical aspect shows patterns for the PubChem compound identifier, CAS registry number, InChIKey, and LIPID MAPS identifier.
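The Redirecting patterns imply that Scholia recognizes an identifier type and resolves it to a Wikidata item. A minimal Python sketch of that idea follows; the regular expressions, the omission of LIPID MAPS identifiers, and both helper functions are assumptions for illustration, not Scholia's routing code (which lives in scholia/app/views.py). The input is validated by a pattern before it is placed in the query string.

```python
import re

# (pattern, Wikidata property) per identifier type; written for this sketch.
IDENTIFIER_TYPES = {
    "InChIKey": (re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]"), "P235"),
    "CAS Registry Number": (re.compile(r"\d{2,7}-\d{2}-\d"), "P231"),
    "PubChem CID": (re.compile(r"\d+"), "P662"),
}


def classify(identifier: str) -> str:
    """Name of the first identifier type whose pattern matches completely."""
    for kind, (pattern, _) in IDENTIFIER_TYPES.items():
        if pattern.fullmatch(identifier):
            return kind
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def lookup_query(identifier: str) -> str:
    """SPARQL resolving the identifier to the Wikidata item(s) carrying it."""
    prop = IDENTIFIER_TYPES[classify(identifier)][1]
    return f'SELECT ?item WHERE {{ ?item wdt:{prop} "{identifier}" . }}'


for example in ["RYHBNJHYFVUHQT-UHFFFAOYSA-N", "7732-18-5", "702"]:
    print(classify(example), "->", lookup_query(example))
```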

The chemical aspect

The chemical aspect shows information about the molecules. For example, the page for dioxane is scholia.toolforge.org/chemical/Q161532. Figure 3 shows the top of this page, with a short description from Wikipedia, the 2D chemical structure generated from the canonical SMILES using the CDK Depict service on Toolforge, related topics in Wikidata, and structural information. The Table of Contents, also visible, shows other information available on this page. All sections on this page show results from a SPARQL query that the user’s web browser runs live. That is, each time a page is opened, it shows the current data in Wikidata using the developed SPARQL templates. The user can use the Wikidata Query Service links below the query results to get the exact SPARQL query.

Several sections of the chemical aspect focus on the core functionality of Wikidata and Scholia. The Identifiers section shows identifiers in other databases. It uses the Wikidata properties that are annotated as Wikidata property to identify substances (wikidata:Q19833835). When new such properties are added to Wikidata, they are automatically shown on Scholia without further development. For dioxane, 23 identifiers were listed at the time of writing.
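The automatic pick-up of new identifier properties follows from querying the property class rather than a fixed list of properties. A sketch of such a query, wrapped in a hypothetical Python helper for parametrization, is shown below; it is written for illustration and may differ from the query Scholia actually uses.

```python
import re


def identifiers_query(qid: str) -> str:
    """External identifiers of an item, via all properties that are an
    instance of 'Wikidata property to identify substances' (Q19833835)."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"""
SELECT ?property ?propertyLabel ?value WHERE {{
  ?property wdt:P31 wd:Q19833835 ;
            wikibase:directClaim ?direct .
  wd:{qid} ?direct ?value .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?propertyLabel
"""


print(identifiers_query("Q161532"))  # dioxane
```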

The Compounds with same connectivity section takes advantage of the structure of the InChIKey. Its query searches Wikidata for other items that have an InChIKey with an identical first block. For example, for dioxane it shows all molecules with an InChIKey that starts with RYHBNJHYFVUHQT. This block consists of the first fourteen characters of the full InChIKey and covers solely the atomic connectivity [37, 56]. The search therefore finds all chemical structures with the same (non-hydrogen) chemical graph, including compounds with different charges and with nonstandard isotopes. For each molecule found, the CAS registry number, the ChemSpider identifier, and the PubChem compound identifier (CID) are shown.
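The first-block search can be sketched as follows. The hypothetical Python helpers extract the connectivity block and build a SPARQL query with a prefix filter plus the three identifiers listed in the panel (P231 CAS, P661 ChemSpider, P662 PubChem CID). The full dioxane InChIKey used in the example is an assumption, since the text quotes only its first block, and a plain FILTER over all InChIKeys may be slower than the query Scholia uses.

```python
import re

INCHIKEY = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")


def connectivity_block(inchikey: str) -> str:
    """First InChIKey block: the hashed connectivity (skeleton) layer."""
    match = INCHIKEY.fullmatch(inchikey)
    if match is None:
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return match.group(1)


def same_connectivity_query(inchikey: str) -> str:
    """Items whose InChIKey shares the first block, with three identifiers."""
    block = connectivity_block(inchikey)
    return f"""
SELECT ?item ?inchikey ?cas ?chemspider ?cid WHERE {{
  ?item wdt:P235 ?inchikey .
  FILTER(STRSTARTS(?inchikey, "{block}-"))
  OPTIONAL {{ ?item wdt:P231 ?cas . }}
  OPTIONAL {{ ?item wdt:P661 ?chemspider . }}
  OPTIONAL {{ ?item wdt:P662 ?cid . }}
}}
"""


print(same_connectivity_query("RYHBNJHYFVUHQT-UHFFFAOYSA-N"))  # dioxane
```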

The Physchem Properties section shows experimental data about the chemical when provided in Wikidata, including melting and boiling points, ionization energy, dipole moments, and more. Taking advantage of Wikidata's rich provenance model, the section can also show where the information came from, with the DOI of the source when available. For the sources, links to other Scholia pages give more information about those sources, while the DOI links take the user directly to the source itself. Note also that those source pages, which use the work aspect, have a Supports the following statement(s) panel listing the statements supported by that source.
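Both the unit lookup and the provenance can be resolved inside a single query through the statement nodes of the Wikidata RDF model (`p:`, `psv:`, `prov:wasDerivedFrom`, `pr:`). The sketch below, a hypothetical Python helper rather than Scholia's own query, retrieves quantity values with their unit labels and, where present, the "stated in" (P248) source and its DOI (P356).

```python
import re


def physchem_query(qid: str, prop: str = "P2101") -> str:
    """Quantity statements of one property (default P2101, melting point)
    with amount, unit label, and, if present, the source and its DOI."""
    if not re.fullmatch(r"Q\d+", qid) or not re.fullmatch(r"P\d+", prop):
        raise ValueError("expected a Q-item and a P-property identifier")
    return f"""
SELECT ?amount ?unitLabel ?source ?doi WHERE {{
  wd:{qid} p:{prop} ?statement .
  ?statement psv:{prop} ?valueNode .
  ?valueNode wikibase:quantityAmount ?amount ;
             wikibase:quantityUnit ?unit .
  OPTIONAL {{
    ?statement prov:wasDerivedFrom/pr:P248 ?source .
    OPTIONAL {{ ?source wdt:P356 ?doi . }}
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


print(physchem_query("Q161532"))  # melting points of dioxane
```

The label service fills `?unitLabel` from `?unit`, which is exactly the extra item lookup that the entity page leaves to the reader.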

The panel Taxa in which the chemical was found is also relevant for chemistry research. Although this panel was created well before the LOTUS project [13], it has been increasingly filled with the data added by LOTUS. While the page also provides some information in the Recent publications and Publications per year panels about, or mentioning, the chemical, the original topic aspect provides richer context.

The chemical-class aspect

The chemical-class aspect is similar to the chemical aspect but starts with the class hierarchy, depicted as a network in which subclasses are colored orange for specific molecules and blue for chemical classes (see Figure 5). When the chemical class has a CXSMILES (documented at docs.chemaxon.com), a 2D depiction of it is shown, which works very well for polymers such as polyester; see scholia.toolforge.org/chemical-class/Q188245.
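A sketch of a query behind such a hierarchy view is given below. It lists direct subclasses (P279) and tags each child as a compound or a class; using the presence of an InChIKey to decide this is an assumption made for this illustration, and Scholia's own query and coloring logic may differ.

```python
import re

COLOURS = {"compound": "orange", "class": "blue"}  # as in Figure 5


def class_children_query(qid: str) -> str:
    """Direct subclasses of a chemical class, tagged as compound or class.

    Heuristic for this sketch: a child with an InChIKey is a specific
    compound; anything else is treated as a (sub)class.
    """
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"""
SELECT DISTINCT ?child ?childLabel ?kind WHERE {{
  ?child wdt:P279 wd:{qid} .
  OPTIONAL {{ ?child wdt:P235 ?inchikey . }}
  BIND(IF(BOUND(?inchikey), "compound", "class") AS ?kind)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


print(class_children_query("Q2832210"))  # primary alcohols
```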

To support browsing the specific chemical compounds in a class, this aspect provides the Example Compounds panel, which lists specific molecules together with their InChIKey, CAS registry number, ChemSpider identifier, and PubChem CID, similar to the Compounds with same connectivity panel of the chemical aspect.

Other panels of the chemical-class aspect resemble those of the chemical aspect but cover any compound in the class, such as the Recent published works, Publications per year, and Taxa in which the chemical class was found panels.

Figure 5: Screenshot of the chemical-class aspect page for primary alcohols (wikidata:Q2832210). The hierarchy shows child classes in blue like primary fatty alcohol and specific compounds in orange like choline. The chemical structure in the top right is based on the CXSMILES.

The chemical-element aspect

Finally, the chemical-element aspect provides information about the chemical elements. For example, scholia.toolforge.org/chemical-element/Q623 shows information about carbon (see Figure 6). This aspect shows various panels, including a list with some key properties such as oxidation states and electron configuration. Two other panels provide information on the isotopes of the element and the allotropes. Carbon shows fifteen isotopes, each with the mass and half-life. Wikidata further lists twenty-two allotropes, including diamond and graphite. If known, the density of the allotrope is listed.

Figure 6: Screenshot of the chemical-element aspect page for carbon.

4. Discussion

Perhaps the first question is whether Scholia needed to be extended to the chemistry domain at all, given that Wikipedia already provides a lot of information on important chemicals. Much of the data in Wikipedia pages can be extracted and made available as machine-readable data [20]. However, the Wikipedia Chemboxes are not trivial to use, and SMILES and InChI information is hidden in some Wikipedia language editions, including English. Moreover, many chemicals do not have Wikipedia pages (see w.wiki/7qvT), even when they are, for example, important natural products (see w.wiki/7qvn). Another difference is that Wikidata can link chemical entities to a longer list of primary literature that makes statements about the chemical, and Scholia can summarize all that literature.

The comparison to other databases, such as PubChem, is relevant too. As this paper shows, the approach is quite similar: both list information about a chemical from multiple perspectives. The added value of Wikidata and Scholia is that they are not limited to a chemical graph representation of the chemical entity; compound classes and polymers are supported too. Moreover, because Wikidata is not limited to one type of entity, compounds can be linked to other concepts, such as synthesis routes or the concepts from which their names were derived, providing flexibility for future data additions.

The approach of SPARQL-query-based panels in dedicated aspects is versatile: users can inspect, reuse, and modify the query behind each panel, or download the data in a reproducible way. Finally, SPARQL makes it easy to use the (ontological) hierarchy in Wikidata. This allows Scholia to present literature about alkanes even if an article is annotated as being about butane. That said, the content captured in Wikidata does not replace PubChem, which is several orders of magnitude larger.
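The alkane example relies on a transitive closure over subclass-of links. In SPARQL this is a property path such as `wdt:P921/wdt:P279*` (main subject, then zero or more subclass-of steps); the Python sketch below performs the same expansion on a toy, made-up hierarchy to show why an article about butane appears under alkanes.

```python
from collections import defaultdict, deque

# Toy data: 'subclass of' (P279) edges, child -> parents, and the
# 'main subject' (P921) of three made-up articles.
SUBCLASS_OF = {
    "butane": ["alkane"],
    "alkane": ["hydrocarbon"],
    "ethanol": ["primary alcohol"],
    "primary alcohol": ["alcohol"],
}
MAIN_SUBJECT = {
    "article A": ["butane"],
    "article B": ["alkane"],
    "article C": ["ethanol"],
}


def descendants(root: str) -> set:
    """The root plus everything reachable through inverse subclass-of edges."""
    children = defaultdict(list)
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children[parent].append(child)
    seen, queue = {root}, deque([root])
    while queue:  # breadth-first traversal down the hierarchy
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def works_about(topic: str) -> list:
    """Articles whose main subject is the topic or any of its subclasses."""
    topics = descendants(topic)
    return sorted(work for work, subjects in MAIN_SUBJECT.items()
                  if topics.intersection(subjects))


print(works_about("alkane"))  # ['article A', 'article B']
```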

Another point to realize is that Wikidata is a wiki: its main strength is that many people contribute to it, and its main weakness is that there are no gatekeepers. There are editors, of course, and the Wikidata:WikiProject Chemistry community is quite active. The authors are part of this community and have contributed to the quality of the project and its validation. Fortunately, these efforts are greatly helped by the machine readability of the data, and quality assurance is performed at various levels. First, there are constraints on the use of certain properties. For example, adding a boiling point without the pressure at which it was measured triggers a flag. Second, SMILES strings have been tested for consistency with other properties, although this check is not done routinely. Third, external identifiers need to be validated, which we did, for example, for CAS registry numbers against CAS Common Chemistry [12].
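Part of identifier validation can be automated. The Python sketch below checks the format and the check digit of a CAS registry number; it catches transcription errors but cannot tell whether a number belongs to the right substance, so it complements rather than replaces a comparison with CAS Common Chemistry. The `is_valid_cas` helper is illustrative.

```python
import re

CAS_FORMAT = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(rn: str) -> bool:
    """Format and check-digit test for a CAS Registry Number.

    The check digit equals the sum of the other digits, each multiplied by
    its position counted from the right (starting at 1), modulo 10.
    """
    match = CAS_FORMAT.fullmatch(rn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


for rn in ["7732-18-5", "64-17-5", "7732-18-4"]:
    print(rn, is_valid_cas(rn))  # water: True, ethanol: True, typo: False
```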

Besides changes in the information about chemicals, the data model changes too, and such changes can require updates to Scholia's SPARQL queries. For example, in 2022, the model for representing chemical compounds changed. Previously, chemicals were instance of (wikidata:P31) chemical compound (wikidata:Q11173), but now they are instance of the more general type of chemical entity (wikidata:Q113145171). This change required many SPARQL queries to be updated. Similarly, the ontological aspects of hierarchical structures in Wikidata continue to evolve, with discussions dating back to 2016 [57, 58, 59].
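One way to keep queries working during such a transition is to accept both classes. The following hypothetical Python helper, an illustration rather than the change actually made in Scholia, builds a query with a VALUES clause over the new and, optionally, the old type.

```python
OLD_TYPE = "Q11173"      # chemical compound (model before 2022)
NEW_TYPE = "Q113145171"  # type of chemical entity (current model)


def chemicals_query(limit: int = 10, include_old: bool = True) -> str:
    """Items typed as chemicals under the new and, optionally, the old model."""
    types = [NEW_TYPE] + ([OLD_TYPE] if include_old else [])
    values = " ".join(f"wd:{t}" for t in types)
    return f"""
SELECT DISTINCT ?chemical WHERE {{
  VALUES ?type {{ {values} }}
  ?chemical wdt:P31 ?type .
}}
LIMIT {int(limit)}
"""


print(chemicals_query(5))
```

Dropping the old type once the migration is complete is then a one-argument change.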

5. Conclusion

We have presented Scholia Chemistry as an extension of Scholia, i.e. as an interactive graphical interface to chemistry data in Wikidata. Scholia Chemistry takes advantage of the machine readability of Wikidata by dynamically running SPARQL queries. To depict chemical structures, it uses CDK Depict running on the Toolforge platform, allowing tight integration. The link to the primary literature on the underlying chemistry is essential and makes Wikidata and Scholia a great place to capture knowledge about notable chemicals. Scholia can also be used to identify knowledge gaps, but this manuscript does not address that use. In summary, this paper shows how the combination of Wikidata and Scholia provides a way to maintain a large collection of chemical compounds, enabling curation by a community.

Acknowledgements

We are grateful to the Wikidata project, all contributors to Wikidata:WikiProject Chemistry, and all the people who contributed software patches or data to the Scholia project. We particularly thank the people behind the following accounts for their curation of Wikidata content: Data4Chem, GZWDer, James_Hare_(NIOSH), KrBot, Marbletan, NPImporterBot, ProteinBoxBot, Reinheitsgebot, SCIdude, Sebotic, SoCalChemBot, and Wostr. We also thank Dr Magnus Manske for QuickStatements work and Dr John Mayfield for the CDK Depict software.

Funding information: The project received funding from the Alfred P. Sloan Foundation (ror:052csg198) under grant number G-2019-11458 (wikidata:Q64711209). It was also supported by the Deutsche Forschungsgemeinschaft (ror:018mejw64) under grant number 521466146 for the Knowledge Graph Initiative (KGI4NFDI) of the BASE4NFDI project (wikidata:Q113544452), and by the European Commission (ror:00k4n6c32) under grant number 101129751 for the AQUANAVI initiative of the OSCARS project (wikidata:Q134238363).

References

  • Simon J Coles et al. “Taking FAIR on the ChIN: The Chemistry Implementation Network”. In: Data Intelligence 2 (2020). [cito:citesAsAuthority], pp. 131–138. ISSN: 2096-7004. DOI: 10.1162/dint_a_00035.
  • Noel M O’Boyle et al. “Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on”. en. In: Journal of Cheminformatics 3.1 (2011). [cito:citesAsAuthority], p. 37. ISSN: 1758-2946. DOI: 10.1186/1758-2946-3-37.
  • Peter Murray-Rust. “Chemistry for everyone”. en. In: Nature 451.7179 (Feb. 7, 2008). [cito:citesAsRecommendedReading], pp. 648–651. ISSN: 1476-4687. DOI: 10.1038/451648A.
  • Kevin Maik Jablonka, Luc Patiny, and Berend Smit. “Making the collective knowledge of chemistry open and machine actionable”. en. In: Nature Chemistry (Apr. 4, 2022). [cito:citesAsAuthority]. ISSN: 1755-4330. DOI: 10.1038/S41557-022-00910-7.
  • Dan Gezelter. Internet Archive: The OpenScience Project. https://web.archive.org/web/19991003200934/http://www.openscience.org/. [cito:citesAsAuthority]. Mar. 10, 1999.
  • Sunghwan Kim et al. “PubChem in 2021: new data content and improved web interfaces”. In: Nucleic Acids Research (Nov. 2020). DOI: 10.1093/NAR/GKAA971.
  • Wendy A. Warr. “ChEMBL. An interview with John Overington, team leader, chemogenomics at the European Bioinformatics Institute Outstation of the European Molecular Biology Laboratory (EMBL-EBI). Interview by Wendy A. Warr”. en. In: Journal of Computer-Aided Molecular Design 23.4 (Apr. 2009), pp. 195–8. ISSN: 0920-654X. DOI: 10.1007/S10822-009-9260-9.
  • Janna Hastings et al. “ChEBI in 2016: Improved services and an expanding collection of metabolites”. In: Nucleic Acids Research 44.D1 (Jan. 2016), pp. D1214–9. DOI: 10.1093/NAR/GKV1031.
  • Nina Jeliazkova et al. “Towards FAIR nanosafety data”. In: Nature Nanotechnology 16.6 (May 2021), pp. 644–654. DOI: 10.1038/S41565-021-00911-6.
  • Hiba Mohammed Taha et al. “The NORMAN Suspect List Exchange (NORMAN-SLE): facilitating European and worldwide collaboration on suspect screening in high resolution mass spectrometry”. en. In: Umweltwissenschaften und Schadstoff-Forschung 34.1 (Oct. 21, 2022). ISSN: 2190-4715. DOI: 10.1186/S12302-022-00680-6.
  • Antony J. Williams et al. “The CompTox Chemistry Dashboard: a community data resource for environmental chemistry”. In: Journal of Cheminformatics 9.1 (Dec. 2017). [cito:citesAsDataSource]. ISSN: 1758-2946. DOI: 10.1186/s13321-017-0247-6.
  • Andrea Jacobs et al. “CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community”. In: Journal of Chemical Information and Modeling (May 2022). [cito:citesAsDataSource]. DOI: 10.1021/acs.jcim.2c00268.
  • Adriano Rutz et al. “The LOTUS initiative for open knowledge management in natural products research”. In: eLife 11 (May 2022). [cito:citesAsDataSource]. DOI: 10.7554/eLife.70780.
  • Martin A. Walker. “Wikipedia as a Resource for Chemistry”. In: ACS Symposium Series (Dec. 14, 2010). [cito:citesAsRecommendedReading], pp. 79–92. ISSN: 0097-6156. DOI: 10.1021/BK-2010-1060.CH005.
  • Martin A. Walker and Ye Li. “Improving Information Literacy Skills through Learning To Use and Edit Wikipedia: A Chemistry Perspective”. In: Journal of Chemical Education 93.3 (Dec. 22, 2015), pp. 509–515. ISSN: 0021-9584. DOI: 10.1021/ACS.JCHEMED.5B00525.
  • Michael D. Mandler. “Glaring Chemical Errors Persist for Years on Wikipedia”. en. In: Journal of Chemical Education 94.3 (Jan. 26, 2017), pp. 271–272. ISSN: 0021-9584. DOI: 10.1021/ACS.JCHEMED.6B00478.
  • Antony John Williams. “Internet-based tools for communication and collaboration in chemistry”. In: Drug Discovery Today 13.11-12 (May 9, 2008), pp. 502–506. ISSN: 1359-6446. DOI: 10.1016/J.DRUDIS.2008.03.015.
  • Gabriel Sinclair et al. “Wikipedia on the CompTox Chemicals Dashboard: Connecting Resources to Enrich Public Chemical Data”. en. In: Journal of Chemical Information and Modeling (Oct. 10, 2022). ISSN: 1549-9596. DOI: 10.1021/ACS.JCIM.2C00886.
  • Peter Ertl et al. “Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia”. In: Journal of Cheminformatics 7.1 (2015). [cito:citesAsRecommendedReading], p. 10. DOI: 10.1186/s13321-015-0061-y.
  • Sören Auer et al. “DBpedia: A Nucleus for a Web of Open Data”. In: The Semantic Web. 2007, pp. 722–735. DOI: 10.1007/978-3-540-76298-0_52.
  • Michael Färber et al. “Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO”. en. In: Semantic Web: Interoperability, Usability, Applicability 9.1 (Nov. 30, 2017). [cito:citesAsRecommendedReading], pp. 77–129. ISSN: 1570-0844. DOI: 10.3233/SW-170275. URL: https://www.semantic-web-journal.net/system/files/swj1366.pdf.
  • Denny Vrandečić. “Wikidata: A New Platform for Collaborative Data Collection”. In: Proceedings of the 21st International Conference on World Wide Web (2012). [cito:extends], pp. 1063–1064. DOI: 10.1145/2187980.2188242.
  • Denny Vrandečić, Lydia Pintscher, and Markus Krötzsch. “Wikidata: The Making Of”. In: WWW ’23 Companion: Companion Proceedings of the ACM Web Conference 2023. [cito:citesForInformation]. May 2023. DOI: 10.1145/3543873.3585579.
  • Daniel Mietchen et al. “Enabling Open Science: Wikidata for Research (Wiki4R)”. In: Research Ideas and Outcomes 1 (Dec. 2015). [cito:citesAsRecommendedReading,cito:updates], e7573. ISSN: 2367-7163. DOI: 10.3897/rio.1.e7573.
  • Tim E. Putman et al. “WikiGenomes: an open web application for community consumption and curation of gene annotation data in Wikidata”. In: Database 2017.1 (Jan. 2017). ISSN: 1758-0463. DOI: 10.1093/database/bax025. URL: http://dx.doi.org/10.1093/database/bax025.
  • Andra Waagmeester et al. “Wikidata as a knowledge graph for the life sciences”. In: eLife 9 (Mar. 2020). [cito:citesAsRecommendedReading]. ISSN: 2050-084X. DOI: 10.7554/ELIFE.52614. URL: https://elifesciences.org/articles/52614.
  • Thomas Shafee et al. “Ten quick tips for editing Wikidata”. In: PLOS Computational Biology 19.7 (July 2023), e1011235. DOI: 10.1371/journal.pcbi.1011235.
  • Antonin Delpeuch. “Running a reconciliation service for Wikidata”. In: Proceedings of the 1st Wikidata Workshop. Oct. 2020.
  • Andra Waagmeester et al. “A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses”. In: BMC Biology 19.1 (Jan. 2021). [cito:citesAsRecommendedReading], p. 12. DOI: 10.1186/s12915-020-00940-y.
  • Timothy Kanke. “Knowledge curation work in Wikidata WikiProject discussions”. en. In: Library Hi Tech 39.1 (Jan. 3, 2020). [cito:includesQuotationFrom], pp. 64–79. ISSN: 0737-8831. DOI: 10.1108/LHT-04-2019-0087. URL: https://www.emerald.com/insight/content/doi/10.1108/LHT-04-2019-0087/full/html.
  • Wikidata:WikiProject Chemistry. https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry. [cito:citesAsAuthority] [cito:extends].
  • Denise N. Slenter et al. “WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research”. In: Nucleic Acids Research 46.D1 (Jan. 2018). [cito:obtainsBackgroundFrom], pp. D661–D667. DOI: 10.1093/nar/gkx1064.
  • Martijn P van Iersel et al. “The BridgeDb framework: standardized access to gene, protein and metabolite identifier mapping services”. en. In: BMC Bioinformatics 11.1 (Jan. 4, 2010). [cito:obtainsBackgroundFrom], p. 5. ISSN: 1471-2105. DOI: 10.1186/1471-2105-11-5. PMID: 20047655.
  • Denise Slenter et al. “Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool”. en. In: Digital Discovery 3.1 (2024), pp. 81–98. ISSN: 2635-098X. DOI: 10.1039/D3DD00069A.
  • David Wishart et al. “HMDB 5.0: the Human Metabolome Database for 2022”. In: Nucleic Acids Research 50.D1 (Jan. 1, 2022). [cito:citesForInformation], pp. D622–D631. ISSN: 0305-1048. DOI: 10.1093/NAR/GKAB1062. PMID: 34986597.
  • Harry E. Pence and Antony Williams. “ChemSpider: An Online Chemical Information Resource”. In: Journal of Chemical Education 87.11 (Nov. 2010). [cito:citesForInformation], pp. 1123–1124. ISSN: 0021-9584. DOI: 10.1021/ED100697W.
  • Stephen Heller et al. “InChI – the worldwide chemical structure identifier standard”. en. In: Journal of Cheminformatics 5.1 (2013). [cito:citesForInformation], p. 7. ISSN: 1758-2946. DOI: 10.1186/1758-2946-5-7.
  • Sara Totaro et al. “The JRC Nanomaterials Repository: A unique facility providing representative test materials for nanoEHS research”. In: Regulatory Toxicology and Pharmacology 81 (Nov. 2016), pp. 334–340.
  • Egon L. Willighagen. Ontology IRIs for the JRC representative industrial nanomaterials. http://nanocommons.github.io/specifications/2023/WD-jrc-20230606/. June 2023.
  • Dieter Maier et al. “Harmonising knowledge for safer materials via the “NanoCommons” Knowledge Base”. In: Frontiers in Physics 11 (Nov. 13, 2023). ISSN: 2296-424X. DOI: 10.3389/FPHY.2023.1271842.
  • Finn Årup Nielsen, Daniel Mietchen, and Egon Willighagen. “Scholia and scientometrics with Wikidata”. en. In: Joint Proceedings of the 1st International Workshop on Scientometrics and 1st International Workshop on Enabling Decentralised Scholarly Communication. 1st International Workshop on Scientometrics (Portorož, Republika Slovenija, May 28, 2017). [cito:extends]. Mar. 13, 2017. DOI: 10.5281/ZENODO.1036595.
  • Finn Årup Nielsen, Daniel Mietchen, and Egon Willighagen. “Geospatial data and Scholia”. In: Proceedings of the 3rd International Workshop on Geospatial Linked Data and the 2nd Workshop on Querying the Web of Data. Mar. 2018. DOI: 10.5281/ZENODO.1202256. URL: http://ceur-ws.org/Vol-2110/paper3.pdf.
  • E. Willighagen. Wikidata and Scholia as hub linking metabolite knowledge. Aug. 2018. DOI: 10.5281/zenodo.3594371.
  • Egon Willighagen et al. “The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching”. en. In: Journal of Cheminformatics 9.1 (June 6, 2017). [cito:usesMethodIn]. ISSN: 1758-2946. DOI: 10.1186/S13321-017-0220-4.
  • ChemAxon. Chemaxon Extended SMILES and SMARTS - CXSMILES and CXSMARTS. [cito:usesMethodIn]. URL: https://docs.chemaxon.com/display/docs/formats_chemaxon-extended-smiles-and-smarts-cxsmiles-and-cxsmarts.md.
  • Ola Spjuth et al. “Bioclipse 2: A scriptable integration platform for the life sciences”. In: BMC Bioinformatics 10.1 (2009). [cito:usesMethodIn], p. 397. ISSN: 1471-2105. DOI: 10.1186/1471-2105-10-397.
  • Egon Willighagen. “Bacting: a next generation, command line version of Bioclipse”. In: Journal of Open Source Software 6.62 (June 23, 2021). [cito:usesMethodIn], p. 2558. DOI: 10.21105/JOSS.02558.
  • Alexander R. Pico et al. “WikiPathways: pathway editing for the people.” In: PLoS Biology 6.7 (July 2008), e184+. ISSN: 1545-7885. DOI: 10.1371/journal.pbio.0060184.
  • Denise N. Slenter et al. “WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research”. In: Nucleic Acids Research 46.D1 (Jan. 2018), pp. D661–D667. ISSN: 0305-1048, 1362-4962. DOI: 10.1093/nar/gkx1064.
  • Matthew J Conroy et al. “LIPID MAPS: update to databases and tools for the lipidomics community”. In: Nucleic Acids Research (Oct. 2023). [cito:citesAsDataSource]. DOI: 10.1093/nar/gkad896.
  • Lucila Aimo et al. “The SwissLipids knowledgebase for lipid biology”. en. In: Bioinformatics 31.17 (May 5, 2015). [cito:citesAsDataSource], pp. 2860–2866. ISSN: 1367-4803. DOI: 10.1093/BIOINFORMATICS/BTV285.
  • Jean-Claude Bradley, Antony John Williams, and Andrew S.I.D. Lang. Jean-Claude Bradley Open Melting Point Dataset. https://figshare.com/articles/Jean_Claude_Bradley_Open_Melting_Point_Datset/1031637/2. [cito:usesDataFrom]. 2014. DOI: 10.6084/m9.figshare.1031637.v2.
  • Stefan Kuhn et al. “Twenty years of nmrshiftdb2: A case study of an open database for analytical chemistry”. en. In: Magnetic Resonance in Chemistry (Dec. 19, 2023). ISSN: 0749-1581. DOI: 10.1002/MRC.5418.
  • Yukiko Nakamura et al. “KNApSAcK Metabolite Activity Database for retrieving the relationships between metabolites and biological activities”. en. In: Plant and Cell Physiology 55.1 (Nov. 27, 2013), e7. ISSN: 0032-0781. DOI: 10.1093/PCP/PCT176.
  • Jeffrey A van Santen et al. “The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery”. en. In: ACS Central Science 5.11 (Nov. 14, 2019), pp. 1824–1833. ISSN: 2374-7943. DOI: 10.1021/ACSCENTSCI.9B00806.
  • Christopher Southan. “InChI in the wild: an assessment of InChIKey searching in Google”. en. In: Journal of Cheminformatics 5.1 (Feb. 11, 2013). [cito:citesForInformation], p. 10. ISSN: 1758-2946. DOI: 10.1186/1758-2946-5-10.
  • Giancarlo Guizzardi et al. “Applying a Multi-Level Modeling Theory to Assess Taxonomic Hierarchies in Wikidata”. en. In: Proceedings of the 25th International Conference Companion on World Wide Web (Apr. 2016), pp. 975–980. DOI: 10.1145/2872518.2891117.
  • Alessandro Piscopo and Elena Simperl. “Who Models the World?: Collaborative Ontology Creation and User Roles in Wikidata”. en. In: Proceedings of the ACM on Human-Computer Interaction 2 (CSCW Nov. 1, 2018), pp. 1–18. DOI: 10.1145/3274410.
  • Valentina Anita Carriero, Paul Groth, and Valentina Presutti. “Empirical ontology design patterns and shapes from Wikidata”. In: Semantic Web: Interoperability, Usability, Applicability (Mar. 20, 2024), pp. 1–25. ISSN: 1570-0844. DOI: 10.3233/SW-243613.