Welcome to Big Data Management on Modern Hardware
Description: The roots of many production database systems today date back thirty years or more. Prominent systems such as IBM's System R or the then-research prototype Ingres were first developed in the 1970s and were designed to address the hardware landscape of the time: disks or even tapes were the only medium to hold reasonable amounts of data; main memory could be considered truly random access; and the major cost factor in database processing was I/O.
Since that time, computer architectures have changed significantly. RAM chips have become cheap enough to make in-memory processing feasible; caches and other architectural details lead to non-uniform memory access cost (an increasingly relevant performance factor); and the omnipresence of multi-core systems adds a whole new class of complexity to the problem.
In this course we look at how architectural changes affect database systems. Rather than suffering from the increasing latency gap for accesses to main memory, for instance, we can use available CPU caches to our advantage. A cache-aware design can improve the performance of a database operation by orders of magnitude. Likewise, modern CPU features (such as vector instructions) or specialized CPUs (like IBM's Cell processor or the nVidia CUDA architecture) can accelerate database tasks if the respective implementation has been designed carefully.
Prerequisites: A general background in computer science and general familiarity with database management, as can be achieved through an undergraduate database course, are expected. Participants who have taken a graduate database course will benefit from this additional background.
Learning objectives: The goal of this course is for students to understand ongoing trends in hardware development and link them to the behaviour of algorithms, in particular data-intensive algorithms, when run on modern hardware. This understanding will help in the design of new algorithms, tailor-made for the characteristics of modern hardware.
Format: Readings, lectures, and exercises
Lecturer bio: Jens Teubner leads the Databases and Information Systems Group at TU Dortmund in Germany. His main research interest is data processing on modern hardware platforms, including FPGAs, multi-core processors, and hardware-accelerated networks. Previously, Jens Teubner was a postdoctoral researcher at ETH Zurich (2008-2013) and IBM Research (2007-2008). He holds a PhD in Computer Science from TU München (Munich, Germany) and an M.S. degree in Physics from the University of Konstanz in Germany.
Organizer: Associate Professor Katja Hose, AAU, email: khose@cs.aau.dk
Lecturers: Jens Teubner, TU Dortmund, Germany, email: jens.teubner@cs.tu-dortmund.de
ECTS: 2
Time: 1 and 2 March 2017, 8:00-16:00
Place: Selma Lagerlöfs Vej 300, room 02.90, Aalborg University
Zip code: DK-9220 Aalborg East
City: Aalborg
Number of seats: 20
Deadline: 8 February 2017
Important information concerning PhD courses: For some time we have experienced problems with no-shows for both project and general courses. It has now reached a point where we are forced to take action. Therefore, the Doctoral School has decided to introduce a no-show fee of DKK 5,000 for each course where the student does not show up. Cancellations are accepted no later than 2 weeks before the start of the course. Registered illness is of course an acceptable reason for not showing up on those days. Furthermore, all courses open for registration approximately three months before the start. This will hopefully also give new students a chance to register for courses during the year. We look forward to your registrations.
- Teacher: Katja Hose