[Announcements] [Material covered] [Lectures, Tutorials, Assignments (with pass)]
Announcements, Course information for current students:
To run MATLAB remotely, connect to CDF with
ssh -X user@cdf.toronto.edu
or
ssh -l user -X -f wolf.cdf.toronto.edu xterm
where ``user'' is your CDF username. Then, once on CDF, run
/usr/local/bin/matlab -softwareopengl
or
/usr/local/bin/matlab -nodesktop -softwareopengl
The first time it is slow, as it loads lots of stuff, but from the second time on it should be faster.
Within MATLAB, you may want to change to a certain directory, say ~/matlab; for this you can use the unix shell command
cd ~/matlab
within MATLAB. You may also want to have a startup.m file in that directory, to run some standard commands (e.g. format compact) every time you start MATLAB.
8-1-2026 (2 hrs)
1 Introduction
1.1 Motivation for high-performance and parallel computing
1.2 Parallel architectures
[roughly from Ortega 1.1, see also Foster 3.7.2, Zhu 1.2, 2.3-5]
* Vector versus parallel
* Parallel versus distributed
* SIMD versus MIMD
* Shared versus local memory
Def: contention and communication time
Def: communication diameter
Def: valence
* Interconnection networks
- Completely connected
- Bus network, ring network
- Mesh connection
+ 1-dim (linear array), ring
+ 2-dim, 2D torus
+ 3-dim, 3D torus
- k-ary tree
- Hypercube (d-dim cube)
- Butterfly (switching network), cube connected cycles network
- Shuffle-exchange
- Omega network
- Other: hybrid schemes, hierarchical schemes, clusters, etc
* Mappings between interconnection networks
- equivalence between an n x log n butterfly (for normal algorithms),
an n-leaf binary tree (for normal algorithms),
a (log n)-dim cube and an n-processor shuffle-exchange
[Ullman, pgs 219-221]
- simulation of a k-dim mesh by a d-dim hypercube
[Bertsekas, Tsitsiklis, pgs 52-54, Kumar 2.7]
1.3 Some concepts and definitions in parallel computing
[roughly from Ortega 1.2 (pgs 20-25), see also Zhu 1.3, Foster 3.3,
Kumar 3.1, 5.1-3]
* degree of parallelism of a parallel algorithm
* granularity of a parallel algorithm
* speedup and efficiency of a parallel algorithm on a parallel machine
* data ready time
* load balancing
1.4 Simple examples [roughly from Ortega 1.2 and 1.3]
* adding two vectors of size n
* summing up n numbers (directed sum-up to a single processor or global sum-up)
* broadcast a number
[see also Foster 2.3.2, 2.4.1, 2.4.2]
* inner product of two vectors
12-1-2026 (1 hr)
* matrix-vector multiplication (by rows and by columns) [pg 36-38, Kumar Ex 2.5, Ex 3.1]
* all-to-all broadcast (total exchange) algorithm [Kumar 6.6]
* global sum-up of n vectors
1.8 MPI
General
Example 1 (test0c.c)
15-1-2026 (2 hrs) NOT DONE due to UT snow closure
19-1-2026 (1 hr)
Send and Receive
Example 2 (test1c.c)
Collective operations
Timing in MPI
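A minimal send/receive example with MPI_Wtime timing, in the spirit of the course's test*.c files (this sketch is not the actual course code; compile with mpicc and run with mpirun -np 2):

```c
#include <stdio.h>
#include <mpi.h>

/* Process 0 sends a double to process 1; each process times its own
   work with MPI_Wtime, which returns wall-clock seconds. */
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double t0 = MPI_Wtime();
    if (rank == 0 && size > 1) {
        double x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        double x;
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %g\n", x);
    }
    double t1 = MPI_Wtime();
    printf("process %d: elapsed %g s\n", rank, t1 - t0);

    MPI_Finalize();
    return 0;
}
```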
22-1-2026 (2 hrs)
Example 3 (test3c.c)
1.5 Performance study
Modelling performance - computation time, communication time
[Foster 3.3, 3.7, Kumar 2.5]
Obtaining experimental data [Foster 3.5]
Fitting data to models [Foster 3.5]
1.6 Measuring and studying speedup and efficiency
[Ortega 1.2, pgs 25-27, Zhu 1.3, Foster 3.4]
* speedup based on the sequential time, Amdahl's law
* speedup based on the parallel time, Gustafson's model
* scaled (workload) speedup, scaled memory speedup
* ways to experimentally measure scaled speedup
26-1-2026 (1 hr) NOT DONE due to UT snow closure
29-1-2026 (2 hrs)
1.7 Scalability analysis [Foster 3.4, Kumar 4.4]
Scalability with fixed problem size
Scalability with scaled problem size - isoefficiency function
Efficiency and scalability: an example and some considerations
2 Solution of linear systems - Direct methods
2.0 LU factorisation and the Gaussian elimination algorithm [Ortega 2.2]
* The algorithm and its use for solving linear systems
2.1 Medium and coarse grain parallel LU factorisation algorithms [Ortega 2.2]
* simple model, p = n, row assignment
* simple model, p = n, column assignment
* block storage, row assignment
* wrapped interleaved storage, row assignment
* reflection interleaved storage, row assignment
* Notes
- communication
- column assignment
- shared memory machines
- dynamic load balancing, pool of tasks
- send-ahead technique
- partial pivoting, row or column assignment
2-2-2026 (1 hr)
2.2 Fine grain LU factorisation - Data Flow algorithm [Ortega 2.2]
2.3 Symmetric and symmetric positive definite matrices [Ortega 2.2]
The LDL^T and the Cholesky factorisations
5-2-2026 (2 hrs)
2.4 Triangular systems [Ortega 2.2]
* ways of viewing the sequential algorithm
* column sweep algorithm - row wrapped interleaved storage
* inner product algorithm - column wrapped interleaved storage
* send-ahead and compute-ahead
* symmetric matrices
* shared memory machines
2.5 Multiple right-hand side vectors
2.6 Banded systems, sequential banded LU [Ortega 2.3]
Banded systems, parallel banded LU - pivoting [Ortega 2.3]
2.7 Triangular banded systems [Ortega 2.3]
II. Boundary Value Problems: a one-dimensional example [Ortega 2.3, pg. 120]
start
9-2-2026 (1 hr)
II. Boundary Value Problems: a one-dimensional example [Ortega 2.3, pg. 120]
end
-.- Boundary Value Problems: a two-dimensional example [Ortega 3.1, pg. 134-135]
2.8 Tridiagonal systems - odd-even and cyclic reduction [Ortega 2.3, pg. 125]
9-2-2026 (2 hrs)
discussion on A1
2.8 Tridiagonal systems - odd-even and cyclic reduction [Ortega 2.3, pg. 125]
end
2.9 Narrow banded systems - Partitioning methods [Ortega 2.3, pg. 114-120]
- Partitioning Method I
- Partitioning Method II
23-2-2026 (1 hr)
2.10 Domain decomposition - Schur complement methods [Ortega 2.2, pg. 120-125]
[also Saad 3.1, 3.2, Zhu 2.5.3.3]
* Domain decomposition in 1D - ordering - arrowhead matrix
* General banded matrix - ordering - arrowhead matrix
* Schur complement - capacitance - Gauss transform system
* Solving the arrowhead system
* A parallel domain decomposition - Schur complement method
* Domain decomposition in 2D - ordering - arrowhead matrix
size of reduced system, bandwidth of blocks
26-2-2026 (2 hrs) midterm
2-3-2026 (1 hr)
* Domain decomposition in 2D - ordering - arrowhead matrix
size of reduced system, bandwidth of blocks (end)
III Inner products, vector, matrix and function norms
condition number of matrix
5-3-2026 (2 hrs)
3 Iterative methods for the solution of linear systems
3.1 Introduction - iterative methods - stopping criteria - splittings
[Ortega 3.1, pg. 133-134, 138-139, see also Saad 4.1]
3.2 Basic iterative methods: Jacobi, Gauss-Seidel, SOR, SSOR
[Ortega 3.1, pg. 133-134, 3.2, pg. 156-160, see also Saad 4.1]
3.3 Convergence of iterative linear solvers
[Ortega 3.1, pg. 134, 3.2, pg. 157-158, see also Saad 4.2]
3.4 The Conjugate Gradient method [Ortega 3.3, see also Saad 6.7, Zhu 2.5.1]
3.5 Preconditioning [Ortega 3.4, see also Saad 10.1-3]
* Incomplete Factorisation preconditioning [Ortega pg. 211-214]
* Block diagonal preconditioning
* SSOR preconditioning
9-3-2026 (1 hr)
3.6 The Preconditioned Conjugate Gradient method
[Ortega 3.4, Saad 9.2, Zhu 2.5.3]
3.7 Parallel Jacobi method - application to the 2D BVP
[Ortega 3.1, Saad 11.4-6, Zhu 2.2.1, 2.2.3.1]
3.8 Asynchronous iterative methods [Ortega 3.1, pg 138]
3.9 Block iterative methods - Parallel block Jacobi for the 5-pt-star matrix
[Ortega 3.1, pg. 145-148, see also Saad 12.2]
3.10 Parallel Conjugate Gradient method - application to the 2D BVP
[Ortega 3.3, Zhu 2.5.2]
3.11 The use of CG in solving the Schur complement system
[Ortega 3.3, pg. 194-195, see also Saad 13.4, Zhu 2.5.3.3]
3.12 Parallel Gauss-Seidel and related methods - application to the 2D BVP
[Ortega 3.2]
3.13 The red-black ordering - // GS, SOR and SSOR methods for the 5-pt-star matrix
[Ortega 3.2, Saad 12.4, Zhu 2.2.2, 2.2.3.2]
3.14 Multicolour orderings - // GS, SOR and SSOR methods for the 9-pt-star matrix
[Ortega 3.2, Saad 12.4, Zhu 2.2.3.2]
3.15 The block Gauss-Seidel and related methods for the 5-point-star matrix
[Ortega 3.2]
Notes and handouts:
Note on the use of notes:
Notes will be available when the course starts.
While it may be convenient to study for the course by reading the notes,
it should be made clear that the notes are not there to substitute for
the textbook or any relevant book. Notes are always more condensed
and give less overall information than books.
Notes with math notation, etc, are difficult to read online.
Some of you may prefer to print the 4-page-style notes
on paper (preferably double-sided).
Access to the data below requires that you type in your CDF (teaching labs) username (same as your UTORid) and the last 5 digits of your student number as the password. This password (for accessing the website) cannot be reset.
Lecture notes