CSC456-2306 High-Performance Scientific Computing

Spring 2026 Bulletin/discussion board for csc456-2306 -- course outline -- MarkUs TBA (for MarkUs, log in with your UTORid)

[Announcements] [Material covered] [Lectures, Tutorials, Assignments (with pass)]


Announcements, Course information for current students:


Material covered or to be covered in the course (with textbook sections in brackets; more to be added as the term progresses)
8-1-2026 (2 hrs)
1   Introduction
1.1 Motivation for high-performance and parallel computing
1.2 Parallel architectures
    [roughly from Ortega 1.1, see also Foster 3.7.2, Zhu 1.2, 2.3-5]
  * Vector versus parallel
  * Parallel versus distributed
  * SIMD versus MIMD
  * Shared versus local memory
    Def: contention and communication time
    Def: communication diameter
    Def: valence (node degree; standard values are tabulated after this list)
  * Interconnection networks
  - Completely connected
  - Bus network, ring network
  - Mesh connection
    + 1-dim (linear array), ring
    + 2-dim, 2D torus
    + 3-dim, 3D torus
  - k-ary tree
  - Hypercube (d-dim cube)
  - Butterfly (switching network), cube-connected cycles network
  - Shuffle-exchange
  - Omega network
  - Other: hybrid schemes, hierarchical schemes, clusters, etc
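
    For quick reference, the communication diameter and (maximum) valence
    of the common networks above take the following standard values on n
    processors (see the cited sections for derivations):

        network                       diameter          valence
        completely connected          1                 n-1
        linear array                  n-1               2
        ring                          floor(n/2)        2
        2D mesh (sqrt n x sqrt n)     2(sqrt n - 1)     4
        hypercube (n = 2^d)           d = log2 n        log2 n
        binary tree (n leaves)        ~ 2 log2 n        3
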
  * Mappings between interconnection networks
  - equivalence between an n x log n butterfly (for normal algorithms),
    an n-leaf binary tree (for normal algorithms),
    a (log n)-dim cube, and an n-processor shuffle-exchange
    [Ullman, pgs 219-221]
  - simulation of a k-dim mesh by a d-dim hypercube
    [Bertsekas, Tsitsiklis, pgs 52-54, Kumar 2.7]
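
    A minimal C sketch of the standard Gray-code construction behind the
    mesh-to-hypercube simulation (see Kumar 2.7); the printout format is
    illustrative:

        #include <stdio.h>

        /* Binary reflected Gray code: consecutive integers map to labels
           differing in exactly one bit, so a 2^d-node ring or linear array
           embeds in a d-dim hypercube with dilation 1.  For a k-dim mesh
           with power-of-2 side lengths, apply the map to each coordinate
           and concatenate the resulting bit fields. */
        unsigned gray(unsigned i) { return i ^ (i >> 1); }

        int main(void) {
            int d = 3;                          /* 3-dim cube, 8 nodes */
            for (unsigned i = 0; i < (1u << d); i++)
                printf("array position %u -> cube node %u\n", i, gray(i));
            return 0;
        }
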
1.3 Some concepts and definitions in parallel computing
    [roughly from Ortega 1.2 (pgs 20-25), see also Zhu 1.3, Foster 3.3,
     Kumar 3.1, 5.1-3]
  * degree of parallelism of a parallel algorithm
  * granularity of a parallel algorithm
  * speedup and efficiency of a parallel algorithm on a parallel machine (formulas after this list)
  * data ready time
  * load balancing
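
    For quick reference, the usual definitions of speedup and efficiency
    (notation here is illustrative and may differ from the notes; T_1 is
    the best sequential time, T_p the parallel time on p processors):

        S_p = T_1 / T_p          (speedup)
        E_p = S_p / p            (efficiency; typically 0 < E_p <= 1)
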
1.4 Simple examples [roughly from Ortega 1.2 and 1.3]
  * adding two vectors of size n
  * summing up n numbers (directed sum-up to a designated processor or global sum-up)
  * broadcast a number
    [see also Foster 2.3.2, 2.4.1, 2.4.2]
  * inner product of two vectors
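
    Although MPI is only introduced in section 1.8 below, a minimal C/MPI
    sketch of the sum-up and inner-product examples may help fix ideas;
    the block size and fill-in values are illustrative assumptions, not
    the course handouts:

        #include <mpi.h>
        #include <stdio.h>

        /* Inner product of two block-distributed vectors: each process
           holds n/p elements, forms a local partial sum, and a global
           sum-up (MPI_Allreduce) combines the partials on all processes. */
        int main(int argc, char *argv[]) {
            int rank;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            enum { NLOC = 4 };             /* local block size n/p (assumed) */
            double x[NLOC], y[NLOC], local = 0.0, global;
            for (int i = 0; i < NLOC; i++) { x[i] = 1.0; y[i] = 2.0; }

            for (int i = 0; i < NLOC; i++) /* local partial inner product */
                local += x[i] * y[i];

            /* global sum-up: every process receives the full inner product */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);

            if (rank == 0) printf("inner product = %g\n", global);
            MPI_Finalize();
            return 0;
        }
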
12-1-2026 (1 hr)
  * matrix-vector multiplication (by rows and by columns) [pgs 36-38, Kumar Ex 2.5, Ex 3.1]
  * all-to-all broadcast (total exchange) algorithm [Kumar 6.6]
  * global sum-up of n vectors
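
    A sketch tying the row-wise matrix-vector product to the all-to-all
    broadcast: each process owns n/p rows of A and the matching block of
    x, assembles the full x with MPI_Allgather, then computes its rows of
    y locally.  N, the fill-in values, and the equal-block assumption (p
    divides N) are illustrative:

        #include <mpi.h>
        #include <stdio.h>

        #define N 8      /* global size; run with p dividing N (assumed) */

        int main(int argc, char *argv[]) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int nloc = N / size;             /* rows and x entries per process */
            double Aloc[nloc][N], xloc[nloc], x[N], yloc[nloc];
            for (int i = 0; i < nloc; i++) { /* illustrative fill-in */
                xloc[i] = 1.0;
                for (int j = 0; j < N; j++) Aloc[i][j] = 1.0;
            }

            /* all-to-all broadcast: every process assembles the whole x */
            MPI_Allgather(xloc, nloc, MPI_DOUBLE, x, nloc, MPI_DOUBLE,
                          MPI_COMM_WORLD);

            for (int i = 0; i < nloc; i++) { /* local rows of y = A x */
                yloc[i] = 0.0;
                for (int j = 0; j < N; j++) yloc[i] += Aloc[i][j] * x[j];
            }

            if (rank == 0) printf("y[0] = %g (expect %d)\n", yloc[0], N);
            MPI_Finalize();
            return 0;
        }
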
1.8 MPI
    General 
    Example 1 (test0c.c)
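
    The handout test0c.c is not reproduced here; a first MPI program along
    these lines (a sketch, names illustrative) typically looks like:

        #include <mpi.h>
        #include <stdio.h>

        /* Every process learns its rank (id) and the total number of
           processes, then prints a greeting. */
        int main(int argc, char *argv[]) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my id: 0..size-1 */
            MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */
            printf("hello from process %d of %d\n", rank, size);
            MPI_Finalize();
            return 0;
        }
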
15-1-2026 (2 hrs) NOT DONE due to UT snow closure
19-1-2026 (1 hr)
    Send and Receive
    Example 2 (test1c.c)
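
    In the spirit of test1c.c (the actual handout may differ), a minimal
    point-to-point sketch: process 0 sends one double to process 1; run
    with at least 2 processes:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            int rank;
            double val;
            MPI_Status status;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {                  /* sender */
                val = 3.14;
                MPI_Send(&val, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
            } else if (rank == 1) {           /* receiver; tag must match */
                MPI_Recv(&val, 1, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD,
                         &status);
                printf("process 1 received %g\n", val);
            }
            MPI_Finalize();
            return 0;
        }
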
    Collective operations
    Timing in MPI
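
    An illustrative sketch combining the two topics just listed: a
    broadcast and a reduction timed with MPI_Wtime, with a barrier so all
    processes enter the timed phase together:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char *argv[]) {
            int rank;
            double x = 0.0, sum, t0, t1;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) x = 1.0;
            MPI_Barrier(MPI_COMM_WORLD);      /* synchronize before timing */
            t0 = MPI_Wtime();                 /* wall-clock seconds */
            MPI_Bcast(&x, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
            MPI_Reduce(&x, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            t1 = MPI_Wtime();

            if (rank == 0)
                printf("sum = %g, elapsed = %g s\n", sum, t1 - t0);
            MPI_Finalize();
            return 0;
        }
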
23-1-2026 (2 hrs)
    Example 3 (test3c.c)

1.5 Performance study
    Modelling performance - computation time, communication time
    [Foster 3.3, 3.7, Kumar 2.5]
    Obtaining experimental data [Foster 3.5]
    Fitting data to models [Foster 3.5]
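
    A common first-cut model (following Foster 3.3 and Kumar 2.5 up to
    notation, which may differ from the notes): the parallel time splits
    into computation and communication, and each message of m words costs
    a startup term plus a per-word term,

        T_p = T_comp + T_comm,        T_msg(m) = t_s + t_w m

    where t_s is the startup (latency) time and t_w the per-word transfer
    time; t_s and t_w are estimated by timing messages of varying length
    and fitting the data (Foster 3.5).
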
1.6 Measuring and studying speedup and efficiency
    [Ortega 1.2, pgs 25-27, Zhu 1.3, Foster 3.4]
  * speedup based on the sequential time, Amdahl's law
  * speedup based on the parallel time, Gustafson's model (formulas after this list)
  * scaled (workload) speedup, scaled memory speedup
  * ways to experimentally measure scaled speedup
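
    In the usual notation (symbols illustrative and possibly different
    from the notes; f is the fraction of the work that is inherently
    sequential):

        Amdahl:     S_p <= 1 / (f + (1-f)/p)  <=  1/f   for all p
        Gustafson:  S_p = f' + (1-f') p = p - (p-1) f'

    where in Gustafson's model f' is the sequential fraction measured on
    the parallel run, so the problem size scales with p and the speedup
    grows (almost) linearly in p.
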

[more to come]

Notes and handouts:
Note on the use of notes: Notes will be available when the course starts. While it may be convenient to study for the course by reading the notes, they are not a substitute for the textbook or other relevant books. Notes are always more condensed and give less overall information than books.
Notes with math notation, etc, are difficult to read online. Some of you may prefer to print the 4-pages-per-sheet version of the notes on paper (preferably double-sided).


Access to the data below requires that you type in your CDF (Teaching Labs) username (same as your UTORid) and the last 5 digits of your student number as the password. This password (for accessing the website) cannot be reset.

Lecture notes

Assignments

Other