WikiPathways-SPARQL-book

Introduction

WikiPathways is a biological pathway database and describes the interactions between biochemical entities in biological processes [1,2,3,4]. It can be downloaded and used in various formats, one of which is the Resource Description Framework (RDF) [5].

The WikiPathways SPARQL endpoint can be found at http://sparql.wikipathways.org/. SPARQL allows you to query much of the WikiPathways content in a machine-readable way, which has been used, for example, in the Open PHACTS project [6,7].

This book discusses how SPARQL can be used to extract information from WikiPathways, using numerous example queries. The first example retrieves metadata about the data loaded into the SPARQL endpoint.

Metadata queries

The following query provides some information about what is currently loaded in the public SPARQL endpoint at http://sparql.wikipathways.org:

SPARQL sparql/metadata.rq

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX pav:     <http://purl.org/pav/>
select distinct ?dataset (str(?titleLit) as ?title) ?date ?license where {
  ?dataset a void:Dataset ;
    dcterms:title ?titleLit ;
    dcterms:license ?license ;
    pav:createdOn ?date .
}

This gives as output:

dataset                                     title                      date                      license
http://data.wikipathways.org/20191210/rdf/  WikiPathways RDF 20191210  2019-12-09T23:28:23.591Z  http://creativecommons.org/publicdomain/zero/1.0/
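
The metadata query above can also be sent to the endpoint from a script. The following is a minimal sketch using only the Python standard library. It assumes that the endpoint accepts SPARQL Protocol GET requests at the /sparql path and can return SPARQL JSON results; neither is stated in this text. The names ENDPOINT, build_request, rows_from_results and run_query are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the SPARQL Protocol service is exposed at the /sparql path
# of the host named in the text.
ENDPOINT = "http://sparql.wikipathways.org/sparql"

METADATA_QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX void:    <http://rdfs.org/ns/void#>
PREFIX pav:     <http://purl.org/pav/>
SELECT DISTINCT ?dataset (str(?titleLit) AS ?title) ?date ?license WHERE {
  ?dataset a void:Dataset ;
    dcterms:title ?titleLit ;
    dcterms:license ?license ;
    pav:createdOn ?date .
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that carries the query and asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP and return the rows (needs network access)."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_from_results(json.load(response))
```

Calling run_query(METADATA_QUERY) would then return one dict per result row, with the keys dataset, title, date and license.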

Statistics

To give some idea of the content of the SPARQL endpoint, this section gives some overall statistics.

Number of pathways per species

We can list the number of pathways for each species available in WikiPathways with this query:

SPARQL sparql/pathwayCountBySpecies.rq

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism (str(?label) as ?name) (count(?pw) as ?pathwayCount)
WHERE {
    ?pw dc:title ?title ;
      wp:organism ?organism ;
      wp:organismName ?label .
}
# Group by species so that count() aggregates the pathways per organism
GROUP BY ?organism ?label
ORDER BY DESC(?pathwayCount)

It shows us that there is a strong bias towards human pathways:

organism                                        name                        pathwayCount
http://purl.obolibrary.org/obo/NCBITaxon_9606   Homo sapiens                1044
http://purl.obolibrary.org/obo/NCBITaxon_9913   Bos taurus                  274
http://purl.obolibrary.org/obo/NCBITaxon_10090  Mus musculus                194
http://purl.obolibrary.org/obo/NCBITaxon_10116  Rattus norvegicus           155
http://purl.obolibrary.org/obo/NCBITaxon_4932   Saccharomyces cerevisiae    115
http://purl.obolibrary.org/obo/NCBITaxon_7955   Danio rerio                 83
http://purl.obolibrary.org/obo/NCBITaxon_6239   Caenorhabditis elegans      61
http://purl.obolibrary.org/obo/NCBITaxon_9598   Pan troglodytes             46
http://purl.obolibrary.org/obo/NCBITaxon_9615   Canis familiaris            44
http://purl.obolibrary.org/obo/NCBITaxon_9031   Gallus gallus               40
http://purl.obolibrary.org/obo/NCBITaxon_3702   Arabidopsis thaliana        31
http://purl.obolibrary.org/obo/NCBITaxon_7227   Drosophila melanogaster     30
http://purl.obolibrary.org/obo/NCBITaxon_7165   Anopheles gambiae           14
http://purl.obolibrary.org/obo/NCBITaxon_1773   Mycobacterium tuberculosis  12
http://purl.obolibrary.org/obo/NCBITaxon_4530   Oryza sativa                11
http://purl.obolibrary.org/obo/NCBITaxon_562    Escherichia coli            9
http://purl.obolibrary.org/obo/NCBITaxon_3694   Populus trichocarpa         5
http://purl.obolibrary.org/obo/NCBITaxon_9796   Equus caballus              5
http://purl.obolibrary.org/obo/NCBITaxon_4081   Solanum lycopersicum        4
http://purl.obolibrary.org/obo/NCBITaxon_4577   Zea mays                    4
http://purl.obolibrary.org/obo/NCBITaxon_1423   Bacillus subtilis           2
http://purl.obolibrary.org/obo/NCBITaxon_5833   Plasmodium falciparum       1
http://purl.obolibrary.org/obo/NCBITaxon_5518   Gibberella zeae             1
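
To put a number on the bias towards human pathways, the counts from the table above can be turned into a share of the total. This is a small sketch, assuming that each pathway is assigned to exactly one species so that the per-species counts add up to the total number of pathways. The names pathway_counts and share are illustrative.

```python
# Pathway counts copied from the result table above (20191210 data release).
pathway_counts = {
    "Homo sapiens": 1044, "Bos taurus": 274, "Mus musculus": 194,
    "Rattus norvegicus": 155, "Saccharomyces cerevisiae": 115,
    "Danio rerio": 83, "Caenorhabditis elegans": 61, "Pan troglodytes": 46,
    "Canis familiaris": 44, "Gallus gallus": 40, "Arabidopsis thaliana": 31,
    "Drosophila melanogaster": 30, "Anopheles gambiae": 14,
    "Mycobacterium tuberculosis": 12, "Oryza sativa": 11,
    "Escherichia coli": 9, "Populus trichocarpa": 5, "Equus caballus": 5,
    "Solanum lycopersicum": 4, "Zea mays": 4, "Bacillus subtilis": 2,
    "Plasmodium falciparum": 1, "Gibberella zeae": 1,
}


def share(counts, species):
    """Return the fraction of all counted pathways that belong to one species."""
    return counts[species] / sum(counts.values())


total = sum(pathway_counts.values())
human_share = share(pathway_counts, "Homo sapiens")
print(f"{total} pathways in total, {human_share:.1%} of them human")
# prints "2185 pathways in total, 47.8% of them human"
```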

Number of metabolites per species

Counting metabolites is tricky, as metabolites that are biologically the same (e.g. different charge states) can have different identifiers. A further complication is that not all metabolites in WikiPathways have stereochemistry defined, for example because it is biologically obvious, as for amino acids. But we can count the number of distinct Wikidata identifiers to get a reasonable estimate:

SPARQL sparql/metaboliteCountBySpecies.rq

PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
select (count(distinct ?wikidata) as ?count) (str(?label) as ?species) where {
  ?metabolite a wp:Metabolite ;
    wp:bdbWikidata ?wikidata ;
    dcterms:isPartOf ?pw .
  ?pw wp:organismName ?label .
} GROUP BY ?label ORDER BY DESC(?count)

This tells us:

count  species
2893   Homo sapiens
843    Bos taurus
840    Mus musculus
489    Rattus norvegicus
439    Arabidopsis thaliana
338    Saccharomyces cerevisiae
169    Danio rerio
125    Canis familiaris
104    Pan troglodytes
97     Mycobacterium tuberculosis
81     Caenorhabditis elegans
75     Gallus gallus
69     Oryza sativa
65     Escherichia coli
63     Drosophila melanogaster
58     Zea mays
49     Anopheles gambiae
39     Solanum lycopersicum
31     Populus trichocarpa
20     Equus caballus
13     Plasmodium falciparum
11     Gibberella zeae
8      Bacillus subtilis
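
The effect of counting distinct identifiers per species, as the query above does, can be illustrated outside SPARQL. The sketch below uses made-up toy rows: the node names and the identifiers Q-glucose and Q-atp are placeholders, not real WikiPathways or Wikidata values. It shows that a metabolite drawn in several pathways is counted once per species, and that the per-species counts cannot simply be added up, because the same metabolite can occur in more than one species.

```python
from collections import defaultdict

# Hypothetical toy rows: (data node, Wikidata identifier, species).
# All values are placeholders chosen for illustration only.
rows = [
    ("node1", "Q-glucose", "Homo sapiens"),
    ("node2", "Q-glucose", "Homo sapiens"),  # same metabolite in a second pathway
    ("node3", "Q-atp", "Homo sapiens"),
    ("node4", "Q-glucose", "Mus musculus"),
    ("node5", "Q-atp", "Mus musculus"),
    ("node6", "Q-atp", "Mus musculus"),
]


def distinct_ids_per_species(rows):
    """Count distinct identifiers per species, like count(distinct ...) with GROUP BY."""
    ids = defaultdict(set)
    for _node, wikidata_id, species in rows:
        ids[species].add(wikidata_id)
    return {species: len(found) for species, found in ids.items()}


counts = distinct_ids_per_species(rows)
overall = len({wikidata_id for _node, wikidata_id, _species in rows})

print(counts)                # {'Homo sapiens': 2, 'Mus musculus': 2}
print(sum(counts.values()))  # 4, although only 2 distinct metabolites exist
print(overall)               # 2
```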

References

  1. Pico AR, Kelder T, van Iersel MP, Hanspers K, Conklin BR, Evelo CT. WikiPathways: pathway editing for the people. PLoS Biol. 2008 Jul 22;6(7):e184. doi:10.1371/JOURNAL.PBIO.0060184
  2. Kelder T, van Iersel MP, Hanspers K, Summer-Kutmon M, Conklin BR, Evelo CT, et al. WikiPathways: building research communities on biological pathways. Nucleic Acids Res. 2012 Jan;40(Database issue):D1301-7. doi:10.1093/NAR/GKR1074
  3. Summer-Kutmon M, Riutta A, Nunes N, Hanspers K, Willighagen E, Bohler A, et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 2016 Jan 4;44(D1):D488-94. doi:10.1093/NAR/GKV1024
  4. Slenter DN, Kutmon M, Hanspers K, Riutta A, et al. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 2018 Jan 4;46(D1):D661–D667. doi:10.1093/NAR/GKX1064
  5. Waagmeester A, Summer-Kutmon M, Riutta A, Miller R, Willighagen E, Evelo CT, et al. Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources. PLoS Comput Biol. 2016 Jun;12(6):e1004989. doi:10.1371/JOURNAL.PCBI.1004989
  6. Williams AJ, Harland L, Groth P, Pettifer S, Chichester C, Willighagen E, et al. Open PHACTS: semantic interoperability for drug discovery. Drug Discov Today [Internet]. 2012 Nov;17(21–22):1188–98. Available from: http://www.openphacts.org/documents/registered/publications/Williams_Harland_Groth_et%20al_Open%20PHACTS_Semantic%20interoperability%20for%20drug%20discovery_Drug%20Discovery%20Today_06%20June%202012.pdf doi:10.1016/J.DRUDIS.2012.05.016
  7. Miller RA, Woollard P, Willighagen EL, Digles D, Kutmon M, Loizou A, et al. Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform. F1000Research. 2018;7:75. doi:10.12688/F1000RESEARCH.13197.1