Welcome to Distributed Data Processing with Dataflow Systems (2025)

Computer Science and Engineering (2025)
Introduction:

Description: In today’s world, data is at the heart of decision-making processes across various domains. Dataflow is a programming paradigm and execution model that underpins many modern distributed data processing systems. In this model, developers create programs by defining sequences of functional transformations on input data. The system runtime then manages the execution of these programs across distributed computing infrastructures, abstracting away complexities related to development, distribution, communication, and fault tolerance.
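
As a rough illustration of what "defining sequences of functional transformations on input data" can look like, the sketch below counts words with the Java Streams API. It is only an analogy: Java streams run on a single machine, whereas a dataflow system would distribute a similar pipeline across many nodes. The class name WordCountSketch and the method countWords are hypothetical and are not taken from the course material.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Hypothetical helper: counts word occurrences across a collection of lines.
    // Each step is a functional transformation applied to the output of the previous one.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // split every line into lower-cased words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // drop empty tokens produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // group identical words and count each group (sorted by word)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "to do");
        System.out.println(countWords(lines)); // prints {be=2, do=1, not=1, or=1, to=3}
    }
}
```

The program states only what to compute (split, filter, group, count). How the work is scheduled is left to the runtime, which is the separation of concerns the dataflow model relies on.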

This course delves into the fundamental concepts of dataflow systems, covering both programming models and implementation details. Starting with basic constructs for analyzing static and streaming data, the course progresses to more advanced topics such as iterations, time-based computations, and user-defined functions. We will explore and compare different approaches to implementing these constructs, highlighting their respective advantages and disadvantages.

Throughout the course, students will work with examples from modern dataflow systems and take part in hands-on sessions that complement the theory.

Prerequisites: Familiarity with Java

Learning objectives: 

On successful completion of this course, students will be expected to be able to:

1. Gain a comprehensive understanding of the dataflow paradigm, its significance in distributed data processing systems, and the use cases to which it applies.

2. Design and implement dataflow programs that efficiently process large volumes of data in real time. Master both the basic constructs for static and streaming data analysis and advanced topics such as iterations, time-based computations, and user-defined functions.

3. Evaluate dataflow systems: understand the relevant performance metrics, and design and execute sound experiments.

4. Compare existing dataflow frameworks and understand their relative advantages and disadvantages.


Organizer: 
Daniele Dell'Aglio

Lecturers: 
Alessandro Margara, Politecnico di Milano

ECTS: 
2.0

Time: 
9 - 10 June 2025

Place: 
Aalborg University

Zip code: 
9220

City: 
Aalborg

Maximal number of participants: 
25

Deadline: 
19 May 2025

Important information concerning PhD courses: 

There is a no-show fee of DKK 3,000 for each course where the student does not show up. Cancellations are accepted no later than 2 weeks before the start of the course. Registered illness is, of course, an acceptable reason for not showing up on those days. Furthermore, all courses open for registration approximately four months before the start of the course.

We cannot guarantee any seats before the enrolment deadline. All participants will be informed after the deadline, approximately 3 weeks before the start of the course.

For inquiries regarding registration, cancellation or the waiting list, please contact the PhD administration at phdcourses@adm.aau.dk. When contacting us, please state the course title and course period. Thank you.


Duration: 1 Semester