Welcome to Exploratory Data Analysis

Description: Data usually comes in a plethora of formats and dimensions, which makes information extraction and exploration challenging. Being able to perform exploratory analyses that give an immediate glimpse of some of the data's properties is therefore becoming crucial. Exploratory analyses should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, while retaining the flexibility and expressiveness of such languages.

Recently, we have witnessed a rediscovery of so-called example-based methods, in which the user, or analyst, circumvents query languages by using examples as input. An example is a representative of the intended results or, in other words, an item from the result set. Example-based methods exploit inherent characteristics of the data to infer the results that the user has in mind but may not be able to (easily) express. They can be useful when a user is looking for information in an unfamiliar dataset, when they are performing a particularly challenging task such as finding duplicate items, or when they are simply exploring the data.

In this course, we present an overview of the main methods for exploratory analysis, with a particular focus on example-based methods. We show how different data types require different techniques and present algorithms that are specifically designed for relational, textual, and graph data. The course also presents the challenges and new frontiers of machine learning in online settings that have recently attracted the attention of the database community. We conclude with a vision for further research and applications in this area.

Format:
Readings, lectures, and exercises. 

Prerequisites:
A general background in computer science and general familiarity with database management, as can be acquired through an undergraduate database course, are expected. Participants who have taken a graduate database course will benefit from this additional background.

Learning objectives: 
The goal of this course is to enable students to understand ongoing trends in exploratory analysis and example-based methods. In particular, the course will cover techniques designed for relational, textual, and graph data, and will highlight challenges and new frontiers of machine learning in online settings.

Organizer: Professor Katja Hose, khose@cs.aau.dk

Lecturers: Davide Mottin (Aarhus University)

ECTS: 2

Time: 20–21 May

Place: Aalborg

City: 
Aalborg

Number of seats: 30

Deadline: 29 April

Important information concerning PhD courses: For some time, we have experienced problems with no-shows in both project and general courses. It has now reached a point where we are forced to take action. Therefore, the Doctoral School has decided to introduce a no-show fee of DKK 5,000 for each course where the student does not show up. Cancellations are accepted no later than 2 weeks before the start of the course. Registered illness is of course an acceptable reason for not showing up on those days. Furthermore, all courses open for registration approximately three months before they start. This will hopefully also give new students a chance to register for courses during the year. We look forward to your registrations.