Description: Data integration systems offer a uniform interface to a multitude of data sources, whether structured (e.g., databases, XML) or not (e.g., text documents, Web pages). Data integration is one of the key problems faced by major companies, governments, and large science projects. The need for it is especially fuelled by the many challenges of managing data on the Web, which contains hundreds of millions of heterogeneous data sets.

This course considers data integration problems as they occur on the Web, but emphasizes techniques that are relevant in other contexts as well. We begin by explaining why data integration is hard, for systems, logical, and social reasons. The course then covers the fundamentals of data integration, such as languages for resolving heterogeneity, automatic schema mapping techniques, query processing in heterogeneous systems, and novel architectures for data integration. Finally, we examine a recent trend in Web search in which search engines provide concrete answers to user queries, and we discuss the data integration and quality issues that arise in that context.
Format: Readings, lectures, and exercises.
Prerequisites: A general background in computer science and familiarity with database management, as acquired through an undergraduate database course, are expected. Participants who have taken a graduate database course will benefit from this additional background.
Learning objectives:
The goal of this course is to introduce data management and integration and to discuss novel architectures and recent trends. In particular, the course will teach fundamental techniques covering languages for resolving heterogeneity, automatic schema mapping, and query processing.
Lecturer bio: Alon Halevy heads the Structured Data Management Research group at Google. Prior to that, he was a Professor of Computer Science at the University of Washington, Seattle, where he founded the database group.
In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space, and in 2004 he founded Transformic, a company that created search engines for the deep web and was acquired by Google. Dr. Halevy is a Fellow of the Association for Computing Machinery, received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000, and was a Sloan Fellow (1999-2000). He received his Ph.D. in Computer Science from Stanford University in 1993 and his Bachelor's degree from the Hebrew University of Jerusalem. Halevy is also a coffee culturalist and the author of the book "The Infinite Emotions of Coffee," published in 2011. He is a co-author of the book "Principles of Data Integration," published in 2012.

Organizers: Postdoc Katja Hose and Professor Christian S. Jensen, csj@cs.aau.dk


Lecturer: Alon Halevy, Google, halevy@google.com

ECTS: 2

Time: 19-20 November, 2015, 9:00 to 17:00 hours

Place: Aalborg University, Cassiopeia, Selma Lagerlöfs Vej 300

Room: 19 Nov: 02.90 and 20 Nov: 02.13

Zip code: 9220

City: Aalborg

Number of seats:

Deadline: 10 November, 2015