Introduction

In this training workshop you will explore programming models for scientific computing. Topics include

  • Fundamentals of Scientific Computing
  • Programming languages for HPC
  • Multithreading and multiprocessing
  • GPU programming models
  • Widely used HPC libraries

The training will consist of guided sessions, student presentations, and programming exercises.

Schedule

Each day consists of a morning session from 10am to 12:30 and an afternoon session from 2pm to 4pm.

Monday

  • Morning Session
    • Setting up your software environment
    • Fundamentals of HPC: hardware, programming languages, parallelisation, performance models.
  • Afternoon Session
    • Discussion of expectations for the week
    • Preparation time for Tuesday presentations

Tuesday

  • Morning Session
    • Talks on HPC programming languages and environments
  • Afternoon Session
    • CDT Group session on living safely in London
    • ARC social hour

Wednesday

  • Morning Session
    • Introduction to dense and sparse linear algebra software
  • Afternoon Session
    • Preparation time for Thursday presentations

Thursday

  • Morning Session
    • Presentations on mathematical software libraries
  • Afternoon Session
    • Hackathon on mathematical software

Friday

  • Morning Session
    • Talk on FEniCS, a major open-source computational simulation platform. (Speaker: Dr Chris Richardson, University of Cambridge)
  • Afternoon Session
    • Summary and discussion of potential follow-up projects

Presentations on Programming Models

The morning session on Tuesday consists of talks introducing HPC programming models. You are asked to prepare slides introducing your topic, together with some simple example codes that you can demonstrate live.

Each presentation should be at most 15 minutes long. It should introduce the software and include a small demonstration of it. The order of talks will be decided randomly on the day.

  • Distributed programming with MPI. MPI is a distributed programming model going back to 1991. Even more than 30 years later it is still the dominant model for parallelisation on large-scale HPC clusters. (Maciej, Bryce)
  • Shared memory parallelisation with OpenMP. OpenMP goes back to 1997 and is a leading model for shared memory parallelisation on CPUs. In recent versions it even supports some offloading to GPUs. (Shany, Anees)
  • CUDA for software development on NVIDIA GPUs. CUDA is the dominant programming model for NVIDIA GPUs. It is a large ecosystem consisting of the CUDA compiler, development environments, and scientific libraries. (Ross, Callum, Jose Miguel)
  • OpenCL - Vendor independent GPU programming. OpenCL was originally developed as an open standard to compete with the vendor-dependent CUDA ecosystem. While it is less used directly in scientific computing, it is still very important for development on embedded devices. Even for scientific computing applications it still provides a powerful way to write device-independent GPU kernels. (Lewis, Louis)
  • SYCL - A growing heterogeneous computing standard. SYCL has evolved from OpenCL to provide a modern heterogeneous computing environment in C++. It is the programming standard underlying modern Intel GPU accelerators but also supports AMD and NVIDIA. (Anastasia, Kelan)
  • Metal - Apple's take on GPU computing. Apple Metal is rarely used for scientific computing. But it is a powerful model and the only way to access GPU cores on Apple Silicon. (Divij, Ammar)
  • SIMD - Low-level vector registers. SIMD is not really a programming model. It describes the set of instructions found on most modern CPUs to efficiently execute operations on vector registers. Any CPU code targeted for performance is optimised to make use of SIMD instructions as much as possible. (Benjamin, Lucas)
  • Parallel Programming with JAX. JAX is a library for efficient parallel array computations. It is mostly used for machine learning but can also be used to efficiently implement many other algorithms in scientific computing. (Skye, Oliver, Advaith)
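
Several of the models listed above, MPI and OpenMP in particular, are commonly used for the same basic pattern: split the work into chunks, compute partial results in parallel, and combine them. The following is a minimal sketch of that pattern in Python, using only the standard library's thread pool. It is an illustration of the idea, not MPI or OpenMP code, and the function names are hypothetical. Note also that in standard CPython builds the global interpreter lock typically prevents threads from speeding up pure Python arithmetic, so this shows the structure rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Sum of squares over one worker's share of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Split the data, process each chunk in its own thread, combine results."""
    # Strided split: worker i receives elements i, i + n_workers, ...
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Reduction step: combine the partial results into the final answer.
    return sum(partials)


data = list(range(1_000))
result = parallel_sum_of_squares(data)
print(result)
```

In MPI the chunks would live on different processes and the final step would be a reduction across them. In OpenMP the same structure is usually expressed as a parallel loop with a reduction clause.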

Presentations on mathematical libraries

The morning session on Thursday focuses on mathematical libraries. We will hear from a range of projects spanning core linear algebra up to complex multiphysics environments.

Each presentation should be at most 15 minutes long. It should introduce the software and include a small demonstration of it. The order of talks will be decided randomly on the day.

  • BLAS/LAPACK are the core of almost any modern dense linear algebra package. BLAS is a specification of basic vector and matrix operations, implemented through fast low-level libraries (e.g. OpenBLAS, BLIS, Apple Accelerate). LAPACK implements many standard matrix decompositions (e.g. LU, SVD, QR) based upon efficient BLAS calls. (Maciej, Bryce)
  • PETSc is a widely used package on HPC systems for linear and nonlinear solvers, supported by a huge community of users and developers. (Shany, Anees)
  • SuiteSparse is a widely used package specialising in sparse matrix solvers. The solvers built into MATLAB, Julia, and many other mathematical environments are built around SuiteSparse. (Ross, Callum, Jose Miguel)
  • Gmsh is a powerful package for mesh generation and visualization. (Lewis, Louis)
  • Firedrake is a popular PDE solver package based on just-in-time code generation. (Anastasia, Kelan)
  • VTK is an essential visualization library and also a data-exchange format for grid-based data in scientific computing. ParaView and other visualization tools are built around VTK. (Divij, Ammar)
  • FMM3D is an outstanding library for the fast solution of n-body interactions.
  • Chebfun allows the solution of many ODEs and related problems to almost machine precision accuracy. (Benjamin, Lucas)
  • scikit-learn delivers a range of simple to use algorithms for machine learning. (Skye, Oliver, Advaith)
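
To make the BLAS/LAPACK entry above more concrete, here is a minimal sketch in Python. It assumes NumPy is installed. In typical NumPy builds, dense matrix products are dispatched to a BLAS library, and the solver and decomposition routines in `numpy.linalg` call LAPACK. The matrix size and the diagonal shift are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200

# Random dense matrix; the diagonal shift keeps it comfortably non-singular.
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# Matrix-matrix product (a BLAS level 3 operation in typical builds).
C = A @ A.T

# Linear solve, based on an LU factorisation with partial pivoting.
x = np.linalg.solve(A, b)

# QR decomposition and singular values.
Q, R = np.linalg.qr(A)
s = np.linalg.svd(A, compute_uv=False)

# Relative residual of the linear solve; should be close to machine precision.
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")
```

Calling `np.show_config()` prints which BLAS/LAPACK implementation a given NumPy installation was built against.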

Setting up your environment

A good development environment on your computer is crucial to working effectively as a computational researcher. In this section we discuss basic preliminaries to set up your machine for software development, code editing tools, external services, and other useful resources to get you started.

Configuring your computer

These days it does not really matter whether you use Windows, Linux, or macOS for software development. All three are fine and support almost anything you need. There is a small caveat with Windows on ARM architectures (e.g. recent Microsoft Surface devices): almost everything should work fine, but it is a relatively recent platform and there may still be small gaps in support.

Windows setup

The best way to get started with development on Windows is to use WSL - Windows Subsystem for Linux. This gives you a complete Ubuntu Linux environment under Windows, allowing any Linux software and development libraries to run natively within Windows. It is fully Microsoft supported and the preferred way to do scientific software development under Windows.

To access WSL you can use the Windows Terminal app. For software development within WSL you can use VS Code or a text-based editor such as Neovim or Helix.

You should familiarize yourself with command line tools to install software under Ubuntu, the default Linux distribution in WSL. See also below for more information on this.

Linux setup

If you are already running Linux there is not much more you need to do apart from installing your preferred development tools. We are assuming that people use Ubuntu and typically show commands for Ubuntu. But any Linux setup works fine.

macOS setup

If you are running macOS, it is recommended to install Homebrew. This is a command line package manager that gives you access to a vast ecosystem of software tools and libraries for almost anything.

Coding tools

The most convenient graphical code editor is VS Code. It has a huge range of plugins supporting almost any programming language. If you like working on the command line you might also consider Neovim or Helix. These allow faster editing than VS Code but are much harder to learn. For Neovim, a preconfigured setup such as LazyVim is recommended.

Many of you will do significant work in Python. One of the most frequently used Python distributions is Anaconda. A personal recommendation is to install Miniforge instead, the installer provided by the conda-forge project. It is a minimalist alternative to Anaconda with just the interpreter and the conda-forge repository preconfigured, which has the largest selection of packages.
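
After installing a Python distribution, a short script can confirm which interpreter is active and whether the packages you need are importable. The following is a minimal sketch using only the standard library. The package list is a hypothetical example and should be adjusted to the libraries you plan to use.

```python
import importlib.util
import platform
import sys

# Hypothetical package list; adjust it to the libraries you plan to use.
PACKAGES = ["numpy", "scipy", "matplotlib"]


def check_environment(packages=PACKAGES):
    """Return a dict describing the interpreter and which packages are importable."""
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "system": platform.system(),
        "machine": platform.machine(),
    }
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(f"{key:>12}: {value}")
```

The `executable` entry is useful when several Python installations coexist, since it shows which one your shell actually picked up.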

Working on the command line

Whatever system you have, you will do a lot of work on the command line. Very often, when using remote machines, the command line is the only way to interact with the computer. On Ubuntu and WSL your shell will typically be Bash. On macOS it will be Zsh. Both are fairly similar for most day-to-day use. A good tutorial to get started on these and similar so-called Unix-like shells is available here.

Version Management

Almost everybody uses Git for version management of code. You should use Git together with a GitHub account. GitHub also has a simple tutorial for Git.

Links to the course material