
Imagine you are tasked with exploring LINDAS, the Linked Data Service provided by the Swiss Federal Archives. You can read its examples and documentation, if there are any. In any case, Chapter 11, “SPARQL cookbook”, of the book “Learning SPARQL” by Bob DuCharme is a good source of inspiration for exploring a dataset. Let’s go through an example.

A word of caution

Depending on the dataset (or triplestore, in our context) you’re working with, some queries might ask too much of the service, so proceed with caution. When in doubt, add spq_head() to your query pipeline to ask for fewer results at a time, or use spq_count() to get a sense of how many results there are in total.
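
The two safeguards can be combined into one small routine. The sketch below is only an illustration: count_then_peek() is a hypothetical helper, not a glitter function, and it assumes that the result of spq_count() has a column named n, as in the outputs shown in this article.

```r
# Hypothetical helper (not part of glitter): count the matches of a triple
# pattern first, then fetch only the first `n` of them.
count_then_peek <- function(query_basis, pattern, n = 10) {
  # Query 1: how many results are there in total?
  total <- glitter::spq_perform(
    glitter::spq_count(glitter::spq_add(query_basis, pattern))
  )
  message("Total number of matches: ", total$n)

  # Query 2: ask for a small subset only.
  glitter::spq_perform(
    glitter::spq_head(glitter::spq_add(query_basis, pattern), n = n)
  )
}

# Usage (needs network access), with the LINDAS settings used in this article:
# query_basis <- glitter::spq_init(
#   endpoint = "https://ld.admin.ch/query",
#   request_control = glitter::spq_control_request(request_type = "body-form")
# )
# count_then_peek(query_basis, "?class a rdfs:Class", n = 5)
```

If the count itself is too heavy for the service, fall back to spq_head() alone.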

Asking for a subset of all triples

In the code below we’ll ask for 10 triples. Note that we use the endpoint argument of spq_init() to indicate where to send the query, and its request_control argument, via spq_control_request(), to set the request_type.

How can one know whether a service needs request_type = "body-form"?

  • The docs might mention it.
  • Trial and error.
  • In LINDAS’ case, if you run a query via https://lindas.admin.ch/sparql/ and inspect the request with your browser’s web developer tools, you can see that it sends the query in the body.
library("glitter") 
query_basis = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(
    request_type = "body-form"
  )
)
query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
p s o
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8501607 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8501950 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8501611 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8519449 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8505592 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8505581 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8505594 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8505578 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8505584 http://classifications.data.admin.ch/canton/
https://gont.ch/canton https://lod.opentransportdata.swiss/didok/8518155 http://classifications.data.admin.ch/canton/

This first query is helpful in that it shows you can run a query! Its results, however, can be… more or less helpful.
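
When neither the docs nor the browser tell you whether request_type = "body-form" is needed, the trial-and-error route mentioned earlier can be sketched as follows. probe_request_type() is a hypothetical helper, not a glitter function. It treats any error as a failed attempt, which is a simplification: a timeout or a network problem would look the same as an unsupported request type.

```r
# Hypothetical helper (not part of glitter): send a one-row query with the
# default request settings, then with request_type = "body-form", and report
# which attempts succeeded.
probe_request_type <- function(endpoint) {
  run_probe <- function(query_basis) {
    query <- glitter::spq_head(glitter::spq_add(query_basis, "?s ?p ?o"), n = 1)
    # Any error (HTTP error, timeout, parsing problem) counts as a failure.
    tryCatch(
      {
        glitter::spq_perform(query)
        TRUE
      },
      error = function(e) FALSE
    )
  }

  c(
    default = run_probe(glitter::spq_init(endpoint = endpoint)),
    body_form = run_probe(
      glitter::spq_init(
        endpoint = endpoint,
        request_control = glitter::spq_control_request(request_type = "body-form")
      )
    )
  )
}

# Usage (needs network access):
# probe_request_type("https://ld.admin.ch/query")
```

The probe asks for a single triple only, in keeping with the word of caution above.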

Classes

Find which classes are declared

The classes occurring in the database will provide information as to the kind of data you will find there. This can be as varied (across triplestores, or even in a single triplestore) as people, places, buildings, trees, or even things that are more abstract like concepts, philosophical currents, historical periods, etc.

At this point you might think you need to use some prefixes in your query. If these prefixes are present in glitter::usual_prefixes, you don’t need to do anything. If they’re not, use glitter::spq_prefix().
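
As a reminder of what a prefix does, here is a minimal base R sketch that expands a prefixed name into a full IRI. expand_prefixed() and the prefixes vector are hypothetical and only serve the illustration; glitter handles prefixes itself when it builds the query.

```r
# Illustrative prefix table: prefix name -> namespace IRI.
prefixes <- c(
  rdfs   = "http://www.w3.org/2000/01/rdf-schema#",
  schema = "http://schema.org/"
)

# Hypothetical helper: expand a prefixed name such as "rdfs:Class".
expand_prefixed <- function(x, prefixes) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]
  prefix <- parts[1]
  # Re-join the rest in case the local part itself contains a colon.
  local_part <- paste(parts[-1], collapse = ":")
  if (!prefix %in% names(prefixes)) {
    stop("Unknown prefix: ", prefix)
  }
  paste0(prefixes[[prefix]], local_part)
}

expand_prefixed("rdfs:Class", prefixes)
#> [1] "http://www.w3.org/2000/01/rdf-schema#Class"
```

The expanded form is what appears in the result tables, which is why a query on rdfs:Class returns IRIs starting with http://www.w3.org/2000/01/rdf-schema#.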

query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
class
http://schema.org/GovernmentOrganization
http://schema.org/Corporation
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://schema.org/Place
http://schema.org/DataCatalog
http://schema.org/Dataset
http://schema.org/CreativeWork
http://schema.org/PostalAddress
http://schema.org/Organization

How many classes are defined in total? This query might be too big for the service.

nclasses = query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_count() %>%
  spq_perform()

nclasses
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1  1195

There are 1195 classes declared in the triplestore. Not so many that we could not get them all in one query, but definitely too many to show them all here! Let us examine a few of these classes:

query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
class
http://schema.org/GovernmentOrganization
http://schema.org/Corporation
http://www.w3.org/2000/01/rdf-schema#Class
http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://schema.org/Place
http://schema.org/DataCatalog
http://schema.org/Dataset
http://schema.org/CreativeWork
http://schema.org/PostalAddress
http://schema.org/Organization

So far we could still be very much in the dark as to what the service provides.

Which classes have instances?

A class might be declared although very few or even no items fall under it. Getting the classes that do have instances corresponds to another triple pattern, “?item is an instance of ?class”, a.k.a. “?item a ?class”:

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(-instance) %>%
  spq_arrange(class) %>%
  spq_head(n = 10) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
class
http://example.com/HydroMeasuringStation
http://filteredpush.org/ontologies/oa/dwcFP#TaxonConcept
http://filteredpush.org/ontologies/oa/dwcFP#TaxonName
http://plazi.org/vocab/treatment#Treatment
http://publications.europa.eu/ontology/euvoc#Continent
http://publications.europa.eu/ontology/euvoc#Country
http://publications.europa.eu/ontology/euvoc#DataTheme
http://publications.europa.eu/ontology/euvoc#Frequency
http://publications.europa.eu/ontology/euvoc#PlannedAvailability
http://purl.org/dc/terms/LicenseDocument

Which classes have the most instances?

The number of items falling into each class actually gives an even better overview of the contents of a triplestore:

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct")  %>%
  spq_count(class, sort = TRUE) %>%                        # count items falling under class
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
class n
https://cube.link/Observation 9068359
https://www.ica.org/standards/RiC/ontology#DateRange 4827519
https://www.ica.org/standards/RiC/ontology#RecordSet 4566445
http://schema.org/DefinedTerm 2995433
http://schema.org/PropertyValue 2211082
http://schema.org/Place 1042148
http://schema.org/PostalAddress 748182
http://schema.org/Organization 744408
https://schema.ld.admin.ch/ZefixOrganisation 737118
http://www.w3.org/ns/locn#Address 736937
http://filteredpush.org/ontologies/oa/dwcFP#TaxonConcept 680282
http://filteredpush.org/ontologies/oa/dwcFP#TaxonName 668920
http://plazi.org/vocab/treatment#Treatment 662319
http://purl.org/spar/fabio/Figure 322029
https://ld.zh.ch/elodzh-ontology/Man 296206
https://ld.zh.ch/elodzh-ontology/Woman 296206
https://ld.zh.ch/elodzh-ontology/MarriageEntry 296206
https://www.ica.org/standards/RiC/ontology#Record 261074
http://rs.tdwg.org/dwc/terms/MaterialCitation 216566
http://www.w3.org/2006/time#TemporalEntity 197002

In this case the class names are quite self-explanatory, but if they were not we could use spq_label() to retrieve their labels:

query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct")  %>%
  spq_label(class) %>%                                   # label class to get class_label
  spq_count(class, class_label, sort = TRUE) %>%         # group by class and class_label to count
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
class_label class n
https://cube.link/Observation 9068359
Date Range https://www.ica.org/standards/RiC/ontology#DateRange 4827519
Record Set https://www.ica.org/standards/RiC/ontology#RecordSet 4566445
http://schema.org/DefinedTerm 2995433
http://schema.org/PropertyValue 2211082
http://schema.org/Place 1042148
http://schema.org/PostalAddress 748182
http://schema.org/Organization 744408
https://schema.ld.admin.ch/ZefixOrganisation 737118
http://www.w3.org/ns/locn#Address 736937
http://filteredpush.org/ontologies/oa/dwcFP#TaxonConcept 680282
http://filteredpush.org/ontologies/oa/dwcFP#TaxonName 668920
http://plazi.org/vocab/treatment#Treatment 662319
http://purl.org/spar/fabio/Figure 322029
Man https://ld.zh.ch/elodzh-ontology/Man 296206
Woman https://ld.zh.ch/elodzh-ontology/Woman 296206
Marriage entry https://ld.zh.ch/elodzh-ontology/MarriageEntry 296206
Record https://www.ica.org/standards/RiC/ontology#Record 261074
http://rs.tdwg.org/dwc/terms/MaterialCitation 216566
Temporal entity http://www.w3.org/2006/time#TemporalEntity 197002

Properties

Find which properties are declared

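The query below looks for properties declared as owl:DatatypeProperty. You could instead try spq_add("?property a rdfs:Property"), but in this case it returned nothing, which is expected since RDF Schema defines the class as rdf:Property (in the rdf namespace), not rdfs:Property.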

query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
property
http://purl.org/dc/terms/description
http://qudt.org/schema/qudt/dbpediaMatch
http://qudt.org/schema/qudt/conversionMultiplier
http://qudt.org/schema/qudt/ucumCode
http://qudt.org/schema/qudt/iec61360Code
http://qudt.org/schema/qudt/symbol
http://qudt.org/schema/qudt/uneceCommonCode
http://qudt.org/schema/qudt/plainTextDescription
http://qudt.org/schema/qudt/abbreviation
http://qudt.org/schema/qudt/latexSymbol

How many properties are defined in total? This query might be too big for the service.

query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_count() %>%
  spq_perform()
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1   241

What values does a given property have?

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s schema:addressRegion ?value") %>%
  spq_count(value, sort = TRUE) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
value n
ZH 128996
BE 66659
VD 64674
GE 54181
AG 43660
TI 42075
SG 40808
ZG 39994
LU 33185
VS 32468

Which classes use a particular property?

One of the properties is https://gont.ch/longName. Which classes use it?

query_basis %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/")) %>%
  spq_add("?s gont:longName ?o") %>%
  spq_add("?s a ?class") %>%
  spq_select(-o, -s) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
class

The result is empty: here, no subject carrying gont:longName also has a declared class.

What data is stored about a class’s instances?

The items falling into a given class are likely to be the subject (or object) of a common set of properties. One might wish to explore the properties actually associated with a class.

For instance, in LINDAS, which properties are associated with the schema:Organization class?

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
property
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://schema.org/name
http://schema.org/url
http://schema.org/dateModified
http://schema.org/identifier
http://schema.org/telephone
http://schema.org/email
http://schema.org/address
http://schema.org/publicAccess
http://schema.org/memberOf
http://schema.org/knowsAbout
http://schema.org/faxNumber
http://schema.org/validFrom
http://schema.org/description
http://schema.org/inDefinedTermSet
http://schema.org/image
http://schema.org/areaServed
http://schema.org/sameAs
http://schema.org/additionalType
http://schema.org/legalName
http://www.w3.org/ns/locn#address
https://schema.ld.admin.ch/municipality
http://schema.org/replacer
http://schema.org/dissolutionDate
http://www.w3.org/2000/01/rdf-schema#seeAlso
http://schema.org/alternateName
http://schema.org/member
http://www.w3.org/2000/01/rdf-schema#label
https://lod.opentransportdata.swiss/vocab/organizationTyp
https://lod.opentransportdata.swiss/vocab/organizationTypLabel
#SAID
#SBOID
http://www.w3.org/2006/vcard/ns#name
https://lod.opentransportdata.swiss/vocab/printLabel
https://lod.opentransportdata.swiss/vocab/printCustomerCode
https://lod.opentransportdata.swiss/vocab/screenLabel
http://schema.org/startDate
http://schema.org/endDate
http://schema.org/validThrough
https://schema.ld.admin.ch/canton
http://schema.org/parentOrganization
http://schema.org/subOrganization

And which properties are associated with the schema:PostalAddress class?

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:PostalAddress") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
property
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://schema.org/streetAddress
http://schema.org/postalCode
http://schema.org/addressLocality
http://schema.org/addressRegion
http://schema.org/addressCountry
http://schema.org/postOfficeBoxNumber
http://schema.org/name
http://schema.org/sameAs
http://www.w3.org/ns/locn#thoroughfare
http://www.w3.org/ns/locn#locatorDesignator
http://www.w3.org/ns/locn#postCode
http://www.w3.org/ns/locn#postName
http://www.w3.org/ns/locn#adminUnitL2
http://www.w3.org/ns/locn#locatorName
http://schema.org/areaServed
http://www.w3.org/ns/locn#addressArea
http://www.w3.org/ns/locn#poBox

Which data or property name includes a certain substring?

Let us examine whether LINDAS holds some data related to water, by searching for the string “hydro” or “Hydro”:

query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_filter(str_detect(o, "[Hh]ydro")) %>%
  spq_select(-s, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
p o
http://schema.org/name Labor Durchfluss und Hydrometrie
http://schema.org/name Labor Durchfluss und Hydrometrie
http://schema.org/name Labor Durchfluss und Hydrometrie
http://schema.org/name Abteilung Hydrologie
http://schema.org/name Division Hydrologie
http://schema.org/name Etat-major Hydrologie
http://schema.org/name Sektion Hydrometrie
http://schema.org/name Section Hydrométrie
http://schema.org/name Sektion Hydrologische Information
http://schema.org/name Section Informations hydrologiques
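
Since a filter over all triples is expensive for the service, it can help to try the pattern locally first. The base R sketch below does that with grepl() on a few sample strings. Three are copied from the table above; "HYDROLOGIE" is a made-up value added to show the limit of the pattern. SPARQL and R use different regular expression flavours, but a simple character class such as this one should behave the same in both.

```r
# Sample values: the first three come from the result table above,
# the last one is hypothetical.
candidates <- c(
  "Labor Durchfluss und Hydrometrie",
  "Abteilung Hydrologie",
  "Section Informations hydrologiques",
  "HYDROLOGIE"
)

# Same pattern as in the query: "hydro" or "Hydro", but not "HYDRO".
matches <- grepl("[Hh]ydro", candidates)
candidates[matches]

# A case-insensitive match would also catch the all-caps value.
matches_any_case <- grepl("hydro", candidates, ignore.case = TRUE)
candidates[matches_any_case]
```

Whether all-caps values exist in LINDAS is not something this sketch can tell; it only shows what the pattern would and would not match.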

An example query based on what we now know

To wrap up, let us now use the LINDAS triplestore for an actual data query: we could for instance collect organizations whose name contains “swiss” (limited here to the first 10 results):

query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s schema:name ?name") %>%
  spq_filter(str_detect(name, "swiss")) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
s name
https://culture.ld.admin.ch/isil/CH-000765-0/organization Bibliothek Erziehungswissenschaft der Universität Bern
https://culture.ld.admin.ch/isil/CH-000141-6/organization Office fédéral de topographie swisstopo
https://culture.ld.admin.ch/isil/CH-000141-6/organization Ufficio federale di topografia swisstopo
https://culture.ld.admin.ch/isil/CH-000141-6/organization Federal Office of Topography swisstopo
https://culture.ld.admin.ch/isil/CH-000141-6/organization Bundesamt für Landestopografie swisstopo
https://culture.ld.admin.ch/isil/CH-001252-X/organization Geologische Informationsstelle / Bundesamt für Landestopografie swisstopo
https://culture.ld.admin.ch/isil/CH-001064-X/organization Kurzwellendienst (KWD), Schweizer Radio International (SRI) / swissinfo
https://culture.ld.admin.ch/isil/CH-001064-X/organization Kurzwellendienst (KWD), Schweizer Radio International (SRI) / swissinfo
https://culture.ld.admin.ch/isil/CH-001064-X/organization Service des ondes courtes (SOC) / Radio suisse international (RSI) / swissinfo
https://culture.ld.admin.ch/isil/CH-000589-1/organization Universitätsbibliothek Bern / Bibliothek Wirtschafswissenschaften
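
The filter is a plain substring match, so names containing “wissenschaft” preceded by an “s” also qualify. One way to narrow the results is to post-filter them locally in R. This is only a sketch and not something the query above does: the names are copied from the table above, and the word-boundary pattern is an assumption about what should count as a match.

```r
# Names copied from the result table above.
org_names <- c(
  "Bibliothek Erziehungswissenschaft der Universität Bern",
  "Federal Office of Topography swisstopo",
  "Kurzwellendienst (KWD), Schweizer Radio International (SRI) / swissinfo",
  "Universitätsbibliothek Bern / Bibliothek Wirtschafswissenschaften"
)

# Plain substring match, as in the query: every name qualifies.
is_substring <- grepl("swiss", org_names, fixed = TRUE)

# Stricter local filter: "swiss" must start a word.
is_word_start <- grepl("\\bswiss", org_names, perl = TRUE)

org_names[is_word_start]
```

With this stricter pattern only the swisstopo and swissinfo entries remain.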