Imagine you are tasked with exploring LINDAS, the Linked Data Service provided by the Swiss Federal Archives. You might be able to read examples and documentation, if any exist. In any case, you can draw inspiration from Chapter 11, “SPARQL cookbook”, of the “Learning SPARQL” book by Bob DuCharme to explore the dataset. Let’s go through an example.
A word of caution
Depending on the dataset (or triplestore, in our context) you’re working with, some queries might just ask too much of the service, so proceed with caution. When in doubt, add a spq_head() in your query pipeline, to ask less at a time, or use spq_count() to get a sense of how many results there are in total.
Asking for a subset of all triples
In the code below we’ll ask for 10 triples. Note that we use the endpoint argument of spq_init() to indicate where to send the query, as well as the request_type argument of spq_control_request() to indicate how to send it.
How can one know whether a service needs request_type = "body-form"?
- The docs might mention it.
- Trial and error.
- In LINDAS’ case, if you run a request via https://lindas.admin.ch/sparql/ and inspect it in the request tab of your browser’s web developer tools, you can see that the query is sent in the request body.
library("glitter")
query_basis = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(
    request_type = "body-form"
  )
)
query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
This first query is helpful in that it shows you can do a query! Its results however can be… more or less helpful.
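The trial-and-error route mentioned above can be sketched as follows. This is only a rough illustration: request_type_works() is a hypothetical helper, not part of glitter, and "url" is assumed to be the other value accepted by request_type besides "body-form". A FALSE can also simply mean the service was unreachable at that moment.

```r
library("glitter")

# Hypothetical helper (not part of glitter): returns TRUE if a tiny query
# succeeds when sent with the given request type, FALSE on any error.
request_type_works = function(request_type) {
  tryCatch(
    {
      spq_init(
        endpoint = "https://ld.admin.ch/query",
        request_control = spq_control_request(request_type = request_type)
      ) %>%
        spq_add("?s ?p ?o") %>%
        spq_head(n = 1) %>% # ask for a single triple to keep the load minimal
        spq_perform()
      TRUE
    },
    error = function(e) FALSE
  )
}

# "url" is an assumption about the alternative to "body-form".
works = vapply(c("url", "body-form"), request_type_works, logical(1))
works
```

Keeping the probe query to a single triple means the comparison costs the service very little.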
Classes
Find which classes are declared
The classes occurring in the database will provide information as to the kind of data you will find there. This can be as varied (across triplestores, or even in a single triplestore) as people, places, buildings, trees, or even things that are more abstract like concepts, philosophical currents, historical periods, etc.
At this point you might think you need to use some prefixes in your query. If these prefixes are present in glitter::usual_prefixes, you don’t need to do anything. If they’re not, use glitter::spq_prefix().
query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
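To decide whether spq_prefix() is needed, one option is to look for the prefix inside glitter::usual_prefixes first. The sketch below is a minimal illustration: prefix_is_known() is a hypothetical helper, not a glitter function, and it deliberately scans every column rather than assuming particular column names.

```r
library("glitter")

# Hypothetical helper (not part of glitter): look for a prefix name in any
# column of glitter::usual_prefixes, without assuming how columns are named.
prefix_is_known = function(prefix) {
  any(vapply(
    glitter::usual_prefixes,
    function(column) prefix %in% as.character(column),
    logical(1)
  ))
}

prefix_is_known("rdfs")
prefix_is_known("gont")

# When a prefix is not among the usual ones, declare it explicitly.
query_with_prefix = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(request_type = "body-form")
) %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/"))
```

Nothing is sent to the service here; the query object is only being prepared.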
How many classes are defined in total? This query might be too big for the service.
nclasses = query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_count() %>%
  spq_perform()
nclasses
#> # A tibble: 1 × 1
#>       n
#>   <int>
#> 1  1195
There are 1195 classes declared in the triplestore. Not so many that we could not get them all in one query, but definitely too many to show them all here! Let us examine a few of these classes:
query_basis %>%
  spq_add("?class a rdfs:Class") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
So far, we might still be largely in the dark as to what the service provides.
Which classes have instances?
A class might be declared although very few or even no items fall under it. Getting the classes which do have instances corresponds to another triple pattern, “?item is an instance of ?class”, a.k.a. “?item a ?class”:
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(-instance) %>%
  spq_arrange(class) %>%
  spq_head(n = 10) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
Which classes have the most instances?
The number of items falling into each class actually gives an even better overview of the contents of a triplestore:
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_count(class, sort = TRUE) %>% # count items falling under class
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
In this case the class names are quite self-explanatory, but if they were not, we could add their labels with spq_label():
query_basis %>%
  spq_add("?instance a ?class") %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_label(class) %>% # label class to get class_label
  spq_count(class, class_label, sort = TRUE) %>% # group by class and class_label to count
  spq_head(20) %>%
  spq_perform() %>%
  knitr::kable()
Properties
Find which properties are declared
Note that you could instead use spq_add("?property a rdfs:Property") but in this case it returned nothing. (The RDF vocabulary names this class rdf:Property, so that spelling may be worth trying as well.)
query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_head(n = 10) %>%
  spq_perform() %>%
  knitr::kable()
How many properties are defined in total? This query might be too big for the service.
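By analogy with the class count earlier, a count of declared properties could look like the sketch below. It reuses the owl:DatatypeProperty pattern from the previous query and assumes the owl prefix is resolved without an explicit declaration, as in that query. No result is shown because the request may be slow or time out on the service.

```r
library("glitter")

query_basis = spq_init(
  endpoint = "https://ld.admin.ch/query",
  request_control = spq_control_request(
    request_type = "body-form"
  )
)

# Count the declared datatype properties instead of listing them.
nproperties = query_basis %>%
  spq_add("?property a owl:DatatypeProperty") %>%
  spq_count() %>%
  spq_perform()

nproperties
```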
What properties are used?
As with counting instances per class, we wish to get a sense of the properties that are actually used in the triplestore.
query_basis %>%
  spq_add("?s ?property ?o") %>%
  spq_select(-s, -o) %>%
  spq_select(property, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
What values does a given property have?
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s schema:addressRegion ?value") %>%
  spq_count(value, sort = TRUE) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
value | n |
---|---|
ZH | 128996 |
BE | 66659 |
VD | 64674 |
GE | 54181 |
AG | 43660 |
TI | 42075 |
SG | 40808 |
ZG | 39994 |
LU | 33185 |
VS | 32468 |
Which classes use a particular property?
One of the properties is https://gont.ch/longName. Which class uses it?
query_basis %>%
  spq_prefix(prefixes = c("gont" = "https://gont.ch/")) %>%
  spq_add("?s gont:longName ?o") %>%
  spq_add("?s a ?class") %>%
  spq_select(-o, -s) %>%
  spq_select(class, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
class |
---|
What data is stored about a class’s instances?
The items falling into a given class are likely to be the subject (or object) of a common set of properties. One might wish to explore the properties actually associated with a class.
For instance, in LINDAS, which properties are associated with the schema:Organization class?
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
And what about the properties associated with the schema:PostalAddress class?
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:PostalAddress") %>%
  spq_add("?s ?property ?value") %>%
  spq_select(-value, -s, .spq_duplicate = "distinct") %>%
  spq_perform() %>%
  knitr::kable()
Which data or property name includes a certain substring?
Let us examine whether LINDAS holds some data related to water, by searching for the string “hydro” or “Hydro”:
query_basis %>%
  spq_add("?s ?p ?o") %>%
  spq_filter(str_detect(o, "[Hh]ydro")) %>%
  spq_select(-s, .spq_duplicate = "distinct") %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
p | o |
---|---|
http://schema.org/name | Labor Durchfluss und Hydrometrie |
http://schema.org/name | Labor Durchfluss und Hydrometrie |
http://schema.org/name | Labor Durchfluss und Hydrometrie |
http://schema.org/name | Abteilung Hydrologie |
http://schema.org/name | Division Hydrologie |
http://schema.org/name | Etat-major Hydrologie |
http://schema.org/name | Sektion Hydrometrie |
http://schema.org/name | Section Hydrométrie |
http://schema.org/name | Sektion Hydrologische Information |
http://schema.org/name | Section Informations hydrologiques |
An example query based on what we now know
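To wrap up, let us now use the LINDAS triplestore for an actual data query: we could for instance collect all organizations which have “swiss” in their name. Since str_detect() matches the substring anywhere in the name, words such as “Erziehungswissenschaft” also count as matches: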
query_basis %>%
  spq_prefix(prefixes = c("schema" = "http://schema.org/")) %>%
  spq_add("?s a schema:Organization") %>%
  spq_add("?s schema:name ?name") %>%
  spq_filter(str_detect(name, "swiss")) %>%
  spq_head(10) %>%
  spq_perform() %>%
  knitr::kable()
s | name |
---|---|
https://culture.ld.admin.ch/isil/CH-000765-0/organization | Bibliothek Erziehungswissenschaft der Universität Bern |
https://culture.ld.admin.ch/isil/CH-000141-6/organization | Office fédéral de topographie swisstopo |
https://culture.ld.admin.ch/isil/CH-000141-6/organization | Ufficio federale di topografia swisstopo |
https://culture.ld.admin.ch/isil/CH-000141-6/organization | Federal Office of Topography swisstopo |
https://culture.ld.admin.ch/isil/CH-000141-6/organization | Bundesamt für Landestopografie swisstopo |
https://culture.ld.admin.ch/isil/CH-001252-X/organization | Geologische Informationsstelle / Bundesamt für Landestopografie swisstopo |
https://culture.ld.admin.ch/isil/CH-001064-X/organization | Kurzwellendienst (KWD), Schweizer Radio International (SRI) / swissinfo |
https://culture.ld.admin.ch/isil/CH-001064-X/organization | Kurzwellendienst (KWD), Schweizer Radio International (SRI) / swissinfo |
https://culture.ld.admin.ch/isil/CH-001064-X/organization | Service des ondes courtes (SOC) / Radio suisse international (RSI) / swissinfo |
https://culture.ld.admin.ch/isil/CH-000589-1/organization | Universitätsbibliothek Bern / Bibliothek Wirtschafswissenschaften |