Topics & Outcomes - Datathon 2026

Topics

During the datathon, sessions will be organised to cover topics such as:

Ontologies and Linked Data
The Lexicon Model for Ontologies (OntoLex-Lemon)
Integrating documents, annotations and NLP tools with Linked Data and RDF using Web Annotation and NIF (NLP Interchange Format)
Knowledge Graph embeddings and language resources
Neural approaches for linguistic data

Outcomes

During the datathon, participants will be able to:

Generate their own Linguistic Linked Data from existing data sources, using visual tools like VocBench and community standards like OntoLex-Lemon.
Apply semantic technologies (linked data, knowledge graphs, RDF, SPARQL) to the field of language resources and learn about their benefits and applications for specific use cases, particularly those involving multilingual and/or multimodal aspects.
Explore the potential use of embeddings, machine learning, and deep learning techniques in combination with Linguistic Linked Data.
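
To give a flavour of the first outcome, here is a minimal sketch of what generating Linguistic Linked Data can look like: a single OntoLex-Lemon lexical entry written out as N-Triples using only the Python standard library. The `http://example.org/lexicon/` namespace and the helper names are assumptions made for this illustration, not part of the datathon materials. In the sessions, tools such as VocBench would normally handle this step.

```python
from urllib.parse import quote

# OntoLex-Lemon core namespace and rdf:type (W3C vocabularies).
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base namespace for the example lexicon (assumption).
BASE = "http://example.org/lexicon/"


def iri(value):
    """Wrap an IRI string in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged N-Triples literal, escaping \\ and \"."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def lexical_entry_triples(lemma, lang):
    """Return triples for one lexical entry and its canonical form."""
    entry = iri(BASE + quote(lemma))
    form = iri(BASE + quote(lemma) + "#form")
    return [
        (entry, iri(RDF_TYPE), iri(ONTOLEX + "LexicalEntry")),
        (entry, iri(ONTOLEX + "canonicalForm"), form),
        (form, iri(RDF_TYPE), iri(ONTOLEX + "Form")),
        (form, iri(ONTOLEX + "writtenRep"), literal(lemma, lang)),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(lexical_entry_triples("cat", "en")))
```

The resulting triples can be loaded into any RDF store and queried with SPARQL. A real conversion would also add senses, references to ontology entities, and a lexicon resource grouping the entries.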

The programme of the summer datathon will contain three types of sessions:

Seminars to explain theoretical aspects and discuss selected topics.
Hands-on sessions to introduce the basic foundations of each topic, method, and technique, which participants will apply directly through different practical assignments.
Datathon sessions, where participants will work in groups of 3-5 on miniprojects that apply what they have learned and involve the generation and/or use of Linguistic Linked Data.

Participants are encouraged to propose a “miniproject” related to the topics of the datathon, which may include datasets to be converted into linked data. A selection of these proposals will form the basis for the miniprojects that participants work on during the datathon sessions. Participants who do not propose a miniproject, or whose miniproject is not selected, will be able to join another miniproject. There will be an award for the best miniproject.