CSC456-2306 High-Performance Scientific Computing

Spring 2026 Bulletin/discussion board for csc456-2306 -- course outline -- MarkUs TBA (for MarkUs, log in with your UTORid)

[Announcements] [Material covered] [Lectures, Tutorials, Assignments (with pass)]


Announcements, Course information for current students:


Material covered or to be covered in the course (with textbook sections in brackets; more to be added as the term progresses)
8-1-2026 (2 hrs)
1   Introduction
1.1 Motivation for high-performance and parallel computing
1.2 Parallel architectures
    [roughly from Ortega 1.1, see also Foster 3.7.2, Zhu 1.2, 2.3-5]
  * Vector versus parallel
  * Parallel versus distributed
  * SIMD versus MIMD
  * Shared versus local memory
    Def: contention and communication time
    Def: communication diameter
    Def: valence (node degree; standard values are tabulated after this list)
  * Interconnection networks
  - Completely connected
  - Bus network, ring network
  - Mesh connection
    + 1-dim (linear array), ring
    + 2-dim, 2D torus
    + 3-dim, 3D torus
  - k-ary tree
  - Hypercube (d-dim cube)
  - Butterfly (switching network), cube-connected cycles network
  - Shuffle-exchange
  - Omega network
  - Other: hybrid schemes, hierarchical schemes, clusters, etc
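
    For quick reference, the communication diameter and (maximum) valence
    of the common networks above take the following standard values on n
    processors (see the cited sections for derivations):

        network                       diameter          valence
        completely connected          1                 n-1
        linear array                  n-1               2
        ring                          floor(n/2)        2
        2D mesh (sqrt n x sqrt n)     2(sqrt n - 1)     4
        hypercube (n = 2^d)           d = log2 n        log2 n
        binary tree (n leaves)        ~ 2 log2 n        3
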
  * Mappings between interconnection networks
  - equivalence between an n x log n butterfly (for normal algorithms),
    an n-leaf binary tree (for normal algorithms),
    a (log n)-dim cube, and an n-processor shuffle-exchange
    [Ullman, pgs 219-221]
  - simulation of a k-dim mesh by a d-dim hypercube
    [Bertsekas, Tsitsiklis, pgs 52-54, Kumar 2.7]
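
    A minimal C sketch of the standard Gray-code construction behind the
    mesh-to-hypercube simulation (see Kumar 2.7); the printout format is
    illustrative:

        #include <stdio.h>

        /* Binary reflected Gray code: consecutive integers map to labels
           differing in exactly one bit, so a 2^d-node ring or linear array
           embeds in a d-dim hypercube with dilation 1.  For a k-dim mesh
           with power-of-2 side lengths, apply the map to each coordinate
           and concatenate the resulting bit fields. */
        unsigned gray(unsigned i) { return i ^ (i >> 1); }

        int main(void) {
            int d = 3;                          /* 3-dim cube, 8 nodes */
            for (unsigned i = 0; i < (1u << d); i++)
                printf("array position %u -> cube node %u\n", i, gray(i));
            return 0;
        }
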
1.3 Some concepts and definitions in parallel computing
    [roughly from Ortega 1.2 (pgs 20-25), see also Zhu 1.3, Foster 3.3,
     Kumar 3.1, 5.1-3]
  * degree of parallelism of a parallel algorithm
  * granularity of a parallel algorithm
  * speedup and efficiency of a parallel algorithm on a parallel machine (formulas after this list)
  * data ready time
  * load balancing
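
    For quick reference, the usual definitions of speedup and efficiency
    (notation here is illustrative and may differ from the notes; T_1 is
    the best sequential time, T_p the parallel time on p processors):

        S_p = T_1 / T_p          (speedup)
        E_p = S_p / p            (efficiency; typically 0 < E_p <= 1)
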
1.4 Simple examples [roughly from Ortega 1.2 and 1.3]
  * adding two vectors of size n
  * summing up n numbers (directed sum-up to a designated processor or global sum-up)
  * broadcast a number
    [see also Foster 2.3.2, 2.4.1, 2.4.2]
  * inner product of two vectors
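
    Although MPI is only introduced in section 1.8 below, a minimal C/MPI
    sketch of the sum-up and inner-product examples may help fix ideas;
    the block size and fill-in values are illustrative assumptions, not
    the course handouts:

        #include <mpi.h>
        #include <stdio.h>

        /* Inner product of two block-distributed vectors: each process
           holds n/p elements, forms a local partial sum, and a global
           sum-up (MPI_Allreduce) combines the partials on all processes. */
        int main(int argc, char *argv[]) {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            enum { NLOC = 4 };             /* local block size n/p (assumed) */
            double x[NLOC], y[NLOC], local = 0.0, global;
            for (int i = 0; i < NLOC; i++) { x[i] = 1.0; y[i] = 2.0; }

            for (int i = 0; i < NLOC; i++) /* local partial inner product */
                local += x[i] * y[i];

            /* global sum-up: every process receives the full inner product */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            if (rank == 0) printf("inner product = %g\n", global);
            MPI_Finalize();
            return 0;
        }
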
12-1-2026 (1 hr)
  * matrix-vector multiplication (by rows and by columns) [pgs 36-38, Kumar Ex 2.5, Ex 3.1]
  * all-to-all broadcast (total exchange) algorithm [Kumar 6.6]
  * global sum-up of n vectors
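
    A sketch tying the row-wise matrix-vector product to the all-to-all
    broadcast: each process owns n/p rows of A and the matching block of
    x, assembles the full x with MPI_Allgather, then computes its rows of
    y locally.  N, the fill-in values, and the equal-block assumption (p
    divides N) are illustrative:

        #include <mpi.h>
        #include <stdio.h>

        #define N 8      /* global size; run with p dividing N (assumed) */

        int main(int argc, char *argv[]) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int nloc = N / size;             /* rows and x entries per process */
            double Aloc[nloc][N], xloc[nloc], x[N], yloc[nloc];
            for (int i = 0; i < nloc; i++) { /* illustrative fill-in */
                xloc[i] = 1.0;
                for (int j = 0; j < N; j++) Aloc[i][j] = 1.0;
            }

            /* all-to-all broadcast: every process assembles the whole x */
            MPI_Allgather(xloc, nloc, MPI_DOUBLE, x, nloc, MPI_DOUBLE,
                          MPI_COMM_WORLD);

            for (int i = 0; i < nloc; i++) { /* local rows of y = A x */
                yloc[i] = 0.0;
                for (int j = 0; j < N; j++) yloc[i] += Aloc[i][j] * x[j];
            }

            if (rank == 0) printf("y[0] = %g (expect %d)\n", yloc[0], N);
            MPI_Finalize();
            return 0;
        }
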
1.8 MPI
    General 
    Example 1 (test0c.c)
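
    The handout test0c.c is not reproduced here; a first MPI program along
    these lines (a sketch, names illustrative) typically looks like:

        #include <mpi.h>
        #include <stdio.h>

        /* Every process learns its rank (id) and the total number of
           processes, then prints a greeting. */
        int main(int argc, char *argv[]) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my id: 0..size-1 */
            MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */
            printf("hello from process %d of %d\n", rank, size);
            MPI_Finalize();
            return 0;
        }
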
15-1-2026 (2 hrs) NOT DONE due to UT snow closure
19-1-2026 (1 hr)
    Send and Receive
    Example 2 (test1c.c)
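
    In the spirit of test1c.c (the actual handout may differ), a minimal
    point-to-point sketch: process 0 sends one double to process 1; run
    with at least 2 processes:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            int rank;
            double val;
            MPI_Status status;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {                  /* sender */
                val = 3.14;
                MPI_Send(&val, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
            } else if (rank == 1) {           /* receiver; tag must match */
                MPI_Recv(&val, 1, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD,
                         &status);
                printf("process 1 received %g\n", val);
            }
            MPI_Finalize();
            return 0;
        }
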
    Collective operations
    Timing in MPI
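
    An illustrative sketch combining the two topics just listed: a
    broadcast and a reduction timed with MPI_Wtime, with a barrier so all
    processes enter the timed phase together:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            int rank;
            double x = 0.0, sum, t0, t1;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) x = 1.0;
            MPI_Barrier(MPI_COMM_WORLD);      /* synchronize before timing */
            t0 = MPI_Wtime();                 /* wall-clock seconds */
            MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
            MPI_Reduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            t1 = MPI_Wtime();

            if (rank == 0)
                printf("sum = %g, elapsed = %g s\n", sum, t1 - t0);
            MPI_Finalize();
            return 0;
        }
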
23-1-2026 (2 hrs)
    Example 3 (test3c.c)

1.5 Performance study
    Modelling performance - computation time, communication time
    [Foster 3.3, 3.7, Kumar 2.5]
    Obtaining experimental data [Foster 3.5]
    Fitting data to models [Foster 3.5]
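
    A common first-cut model (following Foster 3.3 and Kumar 2.5 up to
    notation, which may differ from the notes): the parallel time splits
    into computation and communication, and each message of m words costs
    a startup term plus a per-word term,

        T_p = T_comp + T_comm,        T_msg(m) = t_s + t_w m

    where t_s is the startup (latency) time and t_w the per-word transfer
    time; t_s and t_w are estimated by timing messages of varying length
    and fitting the data (Foster 3.5).
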
1.6 Measuring and studying speedup and efficiency
    [Ortega 1.2, pgs 25-27, Zhu 1.3, Foster 3.4]
  * speedup based on the sequential time, Amdahl's law
  * speedup based on the parallel time, Gustafson's model (formulas after this list)
  * scaled (workload) speedup, scaled memory speedup
  * ways to experimentally measure scaled speedup
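
    In the usual notation (symbols illustrative and possibly different
    from the notes; f is the fraction of the work that is inherently
    sequential):

        Amdahl:     S_p <= 1 / (f + (1-f)/p)  <=  1/f   for all p
        Gustafson:  S_p = f' + (1-f') p = p - (p-1) f'

    where in Gustafson's model f' is the sequential fraction measured on
    the parallel run, so the problem size scales with p and the speedup
    grows (almost) linearly in p.
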

[more to come]

Notes and handouts:
Note on the use of notes: Notes will be available when the course starts. While it may be convenient to study for the course by reading the notes, they are not a substitute for the textbook or other relevant books. Notes are always more condensed and give less overall information than books.
Notes with math notation, etc, are difficult to read online. Some of you may prefer to print the 4-pages-per-sheet version of the notes on paper (preferably double-sided).


Access to the data below requires that you type in your CDF (Teaching Labs) username (same as your UTORid) and the last 5 digits of your student number as the password. This password (for accessing the website) cannot be reset.

Lecture notes

Assignments

Other