Welcome to Scientific Computing using Python

Description:

Many research projects involve scientific computing for analyzing [big] data and/or simulating complex systems. This makes it necessary to have a systematic approach to obtaining well-tested and documented code. Further, we see an increased interest in reproducible research, which gives other researchers the opportunity to dig further into others' research results, provides easy access to results, and improves productivity through reuse of code and software.

This is an introductory course in scientific computing using the increasingly popular programming language Python. Python is gaining popularity in science due to a number of advantages, such as a rich set of libraries for computing and data visualization, excellent possibilities for performance optimization, standard tools for simple parallel computing, a fast development cycle and high productivity, to name just a few. Python is open source and as such an asset for any researcher following the reproducible research paradigm.

The course covers three main areas: i) the Python programming language itself; ii) various aspects of scientific computing; and finally iii) high performance computing. The specific content is as follows.


The Python language:
1. Course introduction:
a. Historical overview of scientific computing and high performance computing.
b. Trends in hardware and software.
2. Python development environment.
3. Python from above.
a. Name space.
b. Modules and packages.
c.  …
4. The Python programming language:
a. Data types.
b. Branching.
c. Looping.
d. Functions.
e.  …
f.  …
5. Debugging and testing.
6. Basic scientific computing packages:
a. Numpy (numerical computing – array based).
b. Scipy (various tools for integration, optimization, etc.).
c. Matplotlib (data visualization).
d. H5py (data storage/access via HDF).
e. Pandas (handling of relational/labeled data in an easy and fast way).
7. Documentation using Sphinx.


Scientific computing:
1. Basic issues related to computational science such as
a.  Floating-point representation.
b.  Numerical accuracy and condition numbers.
c.  Algorithmic complexity.
2.  Computational platforms:
a.  Computational intensity of a CPU.
b.  Computer architectures (parallel / multi-core for example).
c.  Memory organization etc.
3.  Scientific software development:
a.  Code version control (via git).
b.  Code documentation.
c.  Cyclomatic complexity measures.
d.  Test procedures (what to test – and how).
e.  Code refactoring.
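
As a small illustration of the floating-point and numerical accuracy topics listed above, the following is a minimal sketch using only the Python standard library. The function names are hypothetical and chosen for this example only; they are not part of the course material.

```python
import math
import sys


def naive_equal(a, b):
    """Exact comparison; fragile for values produced by floating-point arithmetic."""
    return a == b


def tolerant_equal(a, b):
    """Comparison with a relative tolerance (math.isclose default rel_tol=1e-09)."""
    return math.isclose(a, b)


def machine_epsilon():
    """Gap between 1.0 and the next larger representable double."""
    return sys.float_info.epsilon


if __name__ == "__main__":
    # 0.1, 0.2 and 0.3 have no exact binary representation.
    print(naive_equal(0.1 + 0.2, 0.3))
    print(tolerant_equal(0.1 + 0.2, 0.3))

    # Adding half of machine epsilon to 1.0 is lost to rounding.
    print(1.0 + machine_epsilon() / 2 == 1.0)

    # Rounding errors accumulate in a plain sum; math.fsum tracks them.
    values = [0.1] * 10
    print(sum(values), math.fsum(values))
```

The exact comparison fails while the tolerant one succeeds, which is why tests of numerical code usually compare against a tolerance rather than with ==.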


High performance computing:
1. Profiling:
a. Memory profiling.
b. Time profiling (function based / line based).
2. Performance optimization:
a. Numba (just in time compilation).
b. Cython (compiled Python via C-extensions).
c. SWIG (C integration with Python).
d. Fortran wrapping using f2py.
3. Parallel computing:
a. Theoretical aspects (Amdahl's law, Gustafson-Barsis' law etc.).
b. Parallel computing methodologies.
c. Distributed computing and shared memory computing.
4. Parallel computing in Python.
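
To give an impression of the theoretical aspects listed under parallel computing, the two speedup laws can be sketched as below. This is a minimal illustration under the usual textbook assumptions: a fraction p of the work parallelizes perfectly over N workers, and the remaining 1 - p is strictly serial. The function and parameter names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup for a fixed problem size.

    S(N) = 1 / ((1 - p) + p / N)
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction, workers):
    """Gustafson-Barsis' law: scaled speedup when the problem grows with N.

    S(N) = (1 - p) + p * N
    """
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, for illustration only
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(p, n), 2), round(gustafson_speedup(p, n), 2))
```

With p = 0.9, Amdahl's law bounds the speedup by 1 / (1 - p) = 10 no matter how many workers are added, whereas the scaled speedup of Gustafson-Barsis' law keeps growing linearly with N.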


Audience:

The target audience is mainly engineers and others with a similar background who are interested in developing robust, portable and high-quality code for various scientific computing purposes. By this we mean code to solve actual problems where [lots of] floating-point computations are needed. It is not a course in object-oriented programming; we apply a procedural approach to programming in the course.
 
Prerequisites:
Participants must have some basic experience in code development in e.g. MATLAB, C or FORTRAN. Further, some basic skills in general use of a computer are expected. The tools applied work best on Linux or Mac OS X; users of Microsoft Windows may experience challenges when using parallel computing. We have USB memory sticks from which you can boot Ubuntu Linux and run Python directly. These can be borrowed if you like.


Learning objectives:

After completing the course the participants will:

1. have fundamental knowledge of important aspects of scientific computing.
2. be able to map a mathematically formulated algorithm to Python code.
3. know how to document, debug, test and profile the developed code.
4. know when and how to optimize Python code.
5. know when and how to apply parallel computing.


Teaching methods:

A combination of lectures, demonstrations of examples using IPython notebooks, smaller exercises and a mini-project is used to facilitate learning. The course is rich in examples, and active participation is expected: the topics covered demand a “learning by doing” approach.


Criteria for assessment:
Solutions to exercises must be delivered individually. In addition, a mini-project (preferably a smaller computational task relevant to the participant) must be delivered (5-10 pages) together with the developed code. The code must include testing/validation and performance evaluation. Active participation and completion of the assignments are required to pass the course.


Key literature:
We expect to use a combination of the following:

1. A good book on Python; there are several possibilities, and none has been selected yet.
2. References to Python and all relevant packages (freely available via http://python.org).
3. A number of scientific papers relevant for specific parts of the course.


Organizer:

Professor Torben Larsen, Department of Electronic Systems.


Lecturers:

Assistant Professor Thomas Arildsen, Department of Electronic Systems.
Post-doc Tobias Lindstrøm Jensen, Department of Electronic Systems.
Ian Ozsvald, Mor Consulting, UK.
[Professor Torben Larsen, Department of Electronic Systems.]


ECTS for the student
(28 hours of workload per ECTS):
2.5 + 2 ECTS

It is also possible to only sign up for the May 19-21 part here: 
https://phd.moodle.aau.dk/course/view.php?id=368 

or the May 26-27 part here:
https://phd.moodle.aau.dk/course/view.php?id=369

It is expected that the participant has the necessary competences from the May 19-21 part if only the May 26-27 part is followed. Contact the PhD secretariat in case only a part of the course is requested.


Time:

May 19, 20, 21, 26 and 27, 2014.


Place:

Aalborg University, Niels Jernes Vej 12A/6-104


Max. number of participants:

25


Deadline for registration:

April 28, 2014