Welcome to Big Data Integration

Organizers: Daniele Dell'Aglio, Katja Hose

Lecturer: Giovanni Simonini, University of Modena and Reggio Emilia (Italy)

ECTS: 2

Date/Time: 6-7 May 2024

Deadline: 15 April 2024

Max no. of participants: 20

Description: The course illustrates recent advancements in the field of big data integration from both practical and methodological perspectives. In particular, it focuses on tools and techniques for large and heterogeneous datasets, such as data lakes and open data. The main topics covered will be: (i) data discovery; (ii) entity resolution, i.e., the task of identifying and integrating records that refer to the same real-world entity in different datasets when no explicit identifier is provided; (iii) data preparation, i.e., the set of preprocessing operations performed to transform the data at the structural and syntactic level.

Prerequisites: Familiarity with a programming language.

Learning objectives: Students will learn core techniques and technologies for the tasks of (i) data discovery; (ii) entity resolution; (iii) data preparation.

Important information concerning PhD courses: For some time, we have experienced problems with no-shows for both project and general courses. It has now reached a point where we are forced to take action. Therefore, the Doctoral School has decided to introduce a no-show fee of DKK 3000 for each course where the student does not show up. Cancellations are accepted no later than 2 weeks before the start of the course. Registered illness is of course an acceptable reason for not showing up on those days. Furthermore, all courses open for registration approximately four months before they start. We hope this also gives new students a chance to register for courses during the year. We look forward to your registrations.