ESWC 2016 report: sampling the preconference tutorials & workshops

by Mike Lauruhn

ESWC 2016 took place from May 29th, 2016 to June 2nd, 2016 in Crete, Greece. The program offered a lot in a variety of formats, including workshops, tutorials, papers across several tracks and specializations, posters and demos, and keynote speakers.

The first two days of the conference offered more than 20 workshops and tutorials. After some deliberation, I sampled parts of two workshops and two tutorials. First up was "From linguistic predicate-arguments to Linked Data and ontologies: Extracting n-ary relations," a hands-on tutorial from the NLP team at Pompeu Fabra University in Barcelona. This ambitious tutorial efficiently covered a lot of ground. In a nutshell, it was designed to introduce NLP tools and resources for identifying predicate-argument structures in text, and to provide an overview of models and methods for mapping those arguments onto the Semantic Web.

The tutorial was organized into several sections, covering methods and tools for deep linguistic text analysis, approaches to representing arguments in RDF/OWL in order to move natural language onto the Semantic Web, and example applications for relation extraction and its evaluation. It gave a solid introductory overview of some of the available resources, including PropBank, VerbNet, and FrameNet. The second half of the tutorial focused on tools and resources. The presenters also took the opportunity to showcase their own relation extraction demo, which lets users enter a sentence and view five different annotations of it through BRAT visualizations.
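As a rough illustration of what representing arguments in RDF can look like, here is a minimal Python sketch of one common approach to n-ary relations: reifying the predicate as an event node that links to each of its arguments, instead of squeezing the relation into a single binary triple. The roles (loosely modeled on PropBank's sell.01 frame), the example.org namespace, and the helper names are assumptions for illustration only; this is not the tutorial's actual mapping or pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


@dataclass
class PredicateArgument:
    """A PropBank-style frame: a predicate sense plus a role -> filler mapping."""
    predicate: str              # e.g. "sell.01"
    arguments: Dict[str, str]   # e.g. {"ARG0": "Alice", ...}


def iri(local_name: str) -> str:
    """Mint an N-Triples IRI in the hypothetical example.org namespace."""
    return f"<{EX}{local_name}>"


def to_nary_triples(frame: PredicateArgument, event_id: str) -> List[str]:
    """Reify the relation as an event node so it can carry any number of arguments."""
    event = iri(event_id)
    # The event node is typed with the predicate sense (the relation itself).
    triples = [f"{event} {RDF_TYPE} {iri(frame.predicate)} ."]
    for role, filler in frame.arguments.items():
        # One binary triple per argument: event --role--> filler.
        triples.append(f"{event} {iri(role)} {iri(filler)} .")
    return triples


# "Alice sold the manuscript to the library", loosely following sell.01
# (ARG0 = seller, ARG1 = thing sold, ARG2 = buyer).
frame = PredicateArgument(
    "sell.01", {"ARG0": "Alice", "ARG1": "manuscript", "ARG2": "library"}
)
for line in to_nary_triples(frame, "event1"):
    print(line)
```

Keeping the event as its own node means further arguments (a time, a place, a price) can be attached later without changing the shape of the triples already written.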

On the afternoon of the first day, I was able to attend the tail end of the 1st Workshop on Humanities in the Semantic Web (WHiSe). This included a pair of excellent papers and an intriguing round table on where to go (literally and figuratively) from here. Alongside specific projects (sorry I missed the paper with the best title of the conference, "Linked Death," on the lifecycle and applications of Linked Open Data World War II death records), the workshop devoted time to the big-picture applications of Semantic Web technologies in humanities processes. It also asked why there is less conversation about research ecosystems (specifically technologies and protocols) in the humanities.

The papers I attended were "On the description of process in Digital Scholarship," presented by David De Roure, and "An ecosystem for Linked Humanities Data" from Rinke Hoekstra. De Roure pointed out that the provenance of historical artifacts and of their digitization can be represented in PROV. He went on to say that when the W3C Provenance Working Group convened, its use cases covered both physical and digital objects, but that little has been done with physical objects since. Hoekstra expanded on the theme by elaborating on what a research ecosystem for Linked Humanities Data should entail. The paper notes that much of the usable data currently available is the result of a top-down approach: prominent datasets from large collections. It then presents a model that lets individuals publish their smaller datasets and link them to existing vocabularies and other datasets. I would be remiss if I didn't mention Hoekstra's awesome use of herring and the fishing industry to illustrate the distribution of data.
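To make De Roure's point about physical objects concrete, here is a small sketch of how a physical artifact, the act of digitizing it, and the resulting digital surrogate might be related with PROV-O terms (prov:Entity, prov:Activity, prov:Agent, prov:used, prov:wasAssociatedWith, prov:wasGeneratedBy, prov:wasDerivedFrom). The resource names and the example.org namespace are hypothetical, and this is my own illustration rather than a model taken from the paper.

```python
from typing import List

# PROV-O namespace (W3C); the example.org names below are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def prov(term: str) -> str:
    return f"<{PROV}{term}>"


def ex(name: str) -> str:
    return f"<{EX}{name}>"


def digitization_provenance(artifact: str, surrogate: str,
                            activity: str, agent: str) -> List[str]:
    """N-Triples linking a physical artifact to its digital surrogate via PROV-O."""
    return [
        # Both the physical object and its digital copy are prov:Entity instances.
        f"{ex(artifact)} {RDF_TYPE} {prov('Entity')} .",
        f"{ex(surrogate)} {RDF_TYPE} {prov('Entity')} .",
        # The scanning/photography step is a prov:Activity carried out by an agent.
        f"{ex(activity)} {RDF_TYPE} {prov('Activity')} .",
        f"{ex(agent)} {RDF_TYPE} {prov('Agent')} .",
        f"{ex(activity)} {prov('used')} {ex(artifact)} .",
        f"{ex(activity)} {prov('wasAssociatedWith')} {ex(agent)} .",
        # The surrogate was produced by the activity and derives from the artifact.
        f"{ex(surrogate)} {prov('wasGeneratedBy')} {ex(activity)} .",
        f"{ex(surrogate)} {prov('wasDerivedFrom')} {ex(artifact)} .",
    ]


for line in digitization_provenance("manuscript", "manuscript_scan",
                                    "digitization", "archive"):
    print(line)
```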

The lively round table concluded the workshop with two prominent themes: what the humanities research ecosystem should be, and how to keep bridging the Semantic Web and humanities communities. In my opinion, the most intriguing question from the first theme was "what is the equivalent of the bioinformaticist in the humanities?" On a more practical note, the question arose of what the appropriate venue would be for another WHiSe workshop.

On the morning of day two, I attended the Second International Workshop on Semantic Web for Scientific Heritage. Christophe Debruyne was the invited speaker and presented the objectives and challenges of the Linked Logainm project ("Publishing and Using an Authoritative Linked Data Dataset of Irish Place Names"). The service was designed to help librarians who want authoritative vocabularies that can be integrated with existing bibliographic systems. Linked Logainm succeeded in that its concepts and structure are fine-grained and account for the nuances of Irish history, geography, and county types. The issue that stuck out most to me was provenance: how to handle the intermingling of authoritative data with data (concepts, facts, relations) that was produced semi-automatically and may be less verified or carry lower confidence. Debruyne described the discussions about how best to capture and flag such data, or even keep it in a separate graph.
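One way to realize the "separate graph" option is to route each statement into a different named graph depending on whether an editor has verified it. The Python sketch below emits N-Quads using two hypothetical graph IRIs; the Statement type, the curated flag, and the graph names are assumptions for illustration, not Linked Logainm's actual design.

```python
from typing import List, NamedTuple

# Hypothetical graph IRIs; a real dataset would choose its own.
AUTHORITATIVE_GRAPH = "<http://example.org/graph/authoritative>"
SEMI_AUTOMATED_GRAPH = "<http://example.org/graph/semi-automated>"


class Statement(NamedTuple):
    subject: str    # terms already in N-Triples syntax
    predicate: str
    obj: str
    curated: bool   # True if an editor has verified the statement


def to_nquad(stmt: Statement) -> str:
    """Place verified statements and machine-generated ones in different named graphs."""
    graph = AUTHORITATIVE_GRAPH if stmt.curated else SEMI_AUTOMATED_GRAPH
    return f"{stmt.subject} {stmt.predicate} {stmt.obj} {graph} ."


statements: List[Statement] = [
    Statement("<http://example.org/place/1>",
              "<http://www.w3.org/2000/01/rdf-schema#label>",
              '"Example place"@en', curated=True),
    Statement("<http://example.org/place/1>",
              "<http://example.org/partOf>",
              "<http://example.org/place/2>", curated=False),
]
for s in statements:
    print(to_nquad(s))
```

Consumers who only want vetted data can then query the authoritative graph alone, while the semi-automated statements stay available and clearly labeled.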

Two additional workshop presentations had tons of interesting overlap: Seth van Hooland discussed topic modelling and a linguistic annotation framework for modelling the Hebrew Bible, while Anja Weingart presented "Lexicon for Old Occitan medico-botanical terminology in Lemon model" ("Lemon is a proposed model for modeling lexicon and machine-readable dictionaries and linked to the Semantic Web and the Linked Data cloud.").

The final stop of the preconference weekend was the LOD Lab tutorial put on by the LOD Laundromat team. The Laundromat is a service that "provides access to all Linked Open Data (LOD) in the world." The laundromat metaphor refers to the way the data is cleaned: syntax errors, duplicates, and blank nodes are removed, and the result is republished as N-Triples. For me, the highlight of the tutorial was the hands-on dive into LOTUS, a service for searching LOD Laundromat statements based on natural-language text.
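A toy version of the "wash cycle" helps show what the metaphor means. This sketch assumes triples have already been parsed into (subject, predicate, object) strings in N-Triples syntax; it drops statements with a malformed subject or predicate, replaces blank nodes with skolem IRIs (following RDF 1.1's /.well-known/genid/ convention), removes duplicates, and emits sorted N-Triples. The function name wash, the skolem base IRI, the crude validity check, and the choice to replace rather than simply delete blank nodes are all assumptions for illustration; the actual LOD Laundromat pipeline is far more thorough.

```python
from typing import Iterable, List, Tuple

# Hypothetical base for skolem IRIs, following RDF 1.1's /.well-known/genid/ convention.
SKOLEM_BASE = "http://example.org/.well-known/genid/"

Triple = Tuple[str, str, str]


def is_iri(term: str) -> bool:
    return term.startswith("<") and term.endswith(">")


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def skolemize(term: str) -> str:
    """Replace a blank node label with a skolem IRI; leave other terms untouched."""
    return f"<{SKOLEM_BASE}{term[2:]}>" if is_bnode(term) else term


def wash(triples: Iterable[Triple]) -> List[str]:
    cleaned = set()  # a set collapses exact duplicates
    for s, p, o in triples:
        # Crude validity check: subject must be an IRI or blank node, predicate an IRI.
        if not (is_iri(s) or is_bnode(s)) or not is_iri(p):
            continue
        cleaned.add((skolemize(s), p, skolemize(o)))
    # Sorted N-Triples lines, one statement per line.
    return [f"{s} {p} {o} ." for s, p, o in sorted(cleaned)]


raw = [
    ("_:b0", "<http://example.org/p>", '"x"'),
    ("_:b0", "<http://example.org/p>", '"x"'),                        # duplicate
    ('"oops"', "<http://example.org/p>", "<http://example.org/o>"),  # literal subject
]
for line in wash(raw):
    print(line)
```

Because blank node labels are only meaningful within a single source document, a real cleaner would need to prefix each label with something document-specific before skolemizing; the sketch skips that for brevity.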