Welcome to RDF Graph Summarization: Principles, Techniques and Applications (2021)


Description:

RDF is a popular model for representing Knowledge Graphs and Linked Open Data. The explosion in the amount of RDF data on the Web has created a need to explore, query, and understand such data sources. This task is challenging because of the complex and heterogeneous structure of RDF graphs, which, unlike relational databases, do not come with a schema that dictates their structure. Summarization has therefore been applied to RDF data to support these tasks by extracting concise and meaningful information from RDF knowledge bases while representing their content as faithfully as possible. There is no single notion of an RDF summary, and there are many approaches to building such summaries; the summarization goal and the main computational tools used to summarize graphs are the main factors behind this diversity.

This course presents a structured analysis and comparison of existing work in the area of RDF summarization. It introduces the concepts at the core of these approaches and discusses their main technical aspects and implementations, as well as their use cases and shortcomings.

The learning goals cover four main parts:

  1. Applications of semantic summaries: The first part introduces the preliminaries and covers the main classes of application contexts that have motivated the need for RDF summaries, such as indexing, estimating the size of query results, source selection, graph visualization, and schema discovery.
  2. Structural summarization methods: The second part presents and explains methods and techniques for summarizing semantic graphs based mostly on the graph structure, i.e., the paths and subgraphs available in the RDF graph.
  3. Pattern mining methods: This part covers methods that use mining techniques to identify patterns appearing in the semantic graph. A pattern might be, for example, a set of instances sharing a certain set of properties; such patterns are chosen because they represent the graph exactly or approximately, or because they provide sufficient information about the graph according to some cost function.
  4. Statistical methods: Finally, we discuss techniques that summarize the contents of a graph quantitatively, by counting occurrences, building histograms, measuring frequencies, and computing other statistical measures over the available semantic graph.

The final evaluation will be based on exercises on the various techniques and/or a mini-project implementing one of these summarization techniques.

Organizer: Professor Katja Hose - khose@cs.aau.dk

Lecturers: Haridimos Kondylakis

ECTS: 2.0

Time: June 2021

Place: Aalborg University

Zip code: 9220

City: Aalborg

Number of seats: 20

Deadline: 13 May 2021


Important information concerning PhD courses: For some time, we have experienced problems with no-shows for both project and general courses. It has now reached a point where we are forced to take action. Therefore, the Doctoral School has decided to introduce a no-show fee of DKK 3,000 for each course the student does not attend. Cancellations are accepted no later than 2 weeks before the start of the course. Registered illness is, of course, an acceptable reason for not showing up on those days. Furthermore, all courses open for registration approximately four months before they start. This will hopefully also give new students a chance to register for courses during the year. We look forward to your registrations.