Querying Linked Data using SPARQL
SPARQL is a powerful and generic API to query data on the web or your Intranet. In this post, we want to show how you can query CSV, JSON, or XML from any SPARQL endpoint and point you to SPARQL libraries in various programming languages.
Besides result set answers, SPARQL can return RDF subgraphs as well. This is not covered here. If you are looking for more general information on SPARQL, check out the last two sections of this post.
SPARQL web frontends
When you (or someone else) creates a SPARQL query you likely use an HTML-based frontend with a nice editor that provides SPARQL syntax highlighting and previews for what you do. An example of that is the Wikidata Query Service (opens new window) or the LINDAS (opens new window) SPARQL endpoint from the Swiss government. This is the recommended way to work on your queries, at least initially. It helps you find syntax errors and provides nice plugins. In the example below (opens new window) we use a plugin that can represent well-known-text strings for geospatial data directly on the map. In this particular example, we combine data from ElCom (opens new window) with spatial data from Swisstopo (opens new window) to display Swiss energy providers on a map.
Once you finished the query, the easiest way to share it with others is to get a link that encodes the query in an URL. Most frontends provide this functionality, in YASGUI (opens new window) for example, this is the share button:
This returns a long version of the query which is simply using encodeURI (opens new window) to serialize the string in the URL. Some services provide a built-in URL shortener so it's easier to pass around.
This is all great as long as you just want to share the query with other people. For machines, this is not the way you want to access the data.
There are many ways how you can query a SPARQL endpoint with curl. Bob DuCharme, the author of Learning SPARQL, has a more detailed blog with some background here (opens new window).
In the end, either you or curl needs to:
- URL-encode the query
- Send it to the right endpoint
- Request a specific content-type (unless you are happy with whatever the default is)
The right endpoint
In most setups, there is a catch: The HTML frontend is not the SPARQL query endpoint. In other words
curl, a library, or even your SPARQL frontend send the query to the SPARQL endpoint itself. This endpoint is simply a dedicated route on the webserver that speaks the SPARQL 1.1 Protocol. In the example before, this is this part:
So when we want to run the query directly via HTTP, we need to send it to
RDF & SPARQL make heavy use of an HTTP concept called content negotiation (opens new window). This is a very powerful concept but surprisingly few developers heard about it. SPARQL gives you a choice in what way you can get the answer. Depending on your preferences and/or programming environment, one or the other is used.
In SPARQL 1.1, SPARQL-
SELECT queries can be returned as:
- Text, in CSV format (or TSV, see spec below). Content-Type:
- JSON. Content-Type:
- XML. Content-Type:
If you come from a relational DB background, the answers look like a result set containing columns/keys per variables used in the
SELECT-clause. But what do I choose now?
- When I want to do something simple or just look at the results in my console or editor, I use the text format.
- I use XML when the result is another XML consumer. This could be XSLT converting it to a more "XML-like" tree structure or a library/API/bus that expects XML as input.
You only have to worry about content-types in case you don't use a library and I would advise strongly against that. Find a library for your programming language and let it take care of it, so you can focus on the results instead!
We now know ho to encode the query, we know what we want (CSV as content-type) and we figured out where to send it (our endpoint). Let's execute a SPARQL query in our shell using curl! As an example, we take a query that queries Swiss Government and Administration Organisations (opens new window). Copy the query itself in a file with the extension
.rq, which is the most commonly used file extension for SPARQL.
Once you did that, you can run the query via curl:
curl -H "Accept: text/csv" --data-urlencode firstname.lastname@example.org https://ld.admin.ch/query -o swiss-offices.csv
-H "Accept: text/csv" sets the requested content-type to CSV,
--data-urlencode email@example.com uri-encodes the query and sends it to
query on the endpoint, according to the SPARQL 1.1 Protocol. If you don't want to save it to a file, simply omit the
-o swiss-offices.csv option and curl will echo the result on
Libraries in programming languages
In many programming languages, there are libraries available to execute SPARQL queries and iterate on the result. Many languages also provide SPARQL builders but they are outside the scope of this article.
If you would like to recommend other libraries, please let us know.
- sparql-http-client (opens new window), a SPARQL client for easier handling of SPARQL Queries and Graph Store requests.
- d3-sparql (opens new window), a library that creates a simple
d3-csvlike JSON data structure from
- SPARQLWrapper (opens new window) is a simple Python wrapper around a SPARQL service to remotely execute your queries.
- dotnetrdf (opens new window) SPARQL query interface.
- Sparesults (opens new window), a set of parsers and serializers for SPARQL query result formats. There is still some glue code needed for the request itself. The author plans to provide a standalone client soon.
- SPARQL.ex (opens new window), which contains a generic client
- R & SPARQL (opens new window), detailed blog post how SPARQL can be executed within R-Markdown.
Learning SPARQL & Support
If you are completely new to SPARQL and want to do more than just execute other people's queries, I would recommend O'Reillys Learning SPARQL (opens new window) book. There are other good tutorials out there, among others:
I use the W3C specs regularly to look up stuff, they are well written in my opinion. If you get stuck, check out Stackoverflow (opens new window) as well, many very skilled people answer SPARQL related questions there.
There is also a Github Discussion (opens new window) community available where you might post questions.
For this post, the following SPARQL specifications are relevant:
- SPARQL 1.1 Query Language (opens new window): The query language itself (read-only)
- SPARQL 1.1 Query Results: Different types of query results that can be requested
- SPARQL 1.1 Protocol (opens new window): The HTTP operations used to run SPARQL queries
If you want to write data, SPARQL 1.1 Update (opens new window) or SPARQL 1.1 Graph Store HTTP Protocol (opens new window) are the relevant specifications but they are not covered here.