80-629 -- Introduction to Python for scientific computations and machine learning

Python is a great language for machine learning applications and research, and it is widely used in both academia and industry. Python programs are often slower than equivalent C (or Fortran) programs, but implementing ideas in Python is usually much faster. Furthermore, Python has many libraries for scientific computation, which means that you can quickly implement and test complex ideas.

Today we will explore some of these libraries. We will start with NumPy and SciPy, which provide access to powerful mathematical tools. NumPy provides building blocks for storing data in N-dimensional structures (e.g., vectors and matrices) and performing linear algebra computations on these structures. SciPy provides higher-level tools for various mathematical operations (e.g., statistics, optimization, advanced linear algebra, sparse structures). We will then explore scikit-learn, a library that provides a wide range of machine learning algorithms (models and fitting procedures) and other machine learning tools (e.g., to pre-process data).
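
As a rough illustration of this division of labour, here is a minimal sketch. The matrix, the vector, and the variable names are made up for the example. It stores data in a NumPy array, does a small linear algebra computation, and then passes the same data to a SciPy statistics routine.

```python
import numpy as np
from scipy import stats

# A 2x3 matrix and a length-3 vector stored as NumPy arrays
# (the values are arbitrary and only serve as an illustration).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
v = np.array([1.0, 0.0, -1.0])

# Linear algebra on the arrays: a matrix-vector product.
Av = A @ v
print(Av)

# SciPy works directly on NumPy arrays; describe() returns summary
# statistics (count, min/max, mean, variance, ...) of the flattened data.
summary = stats.describe(A.ravel())
print(summary.mean, summary.variance)
```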

Getting started. We will again use IPython notebooks running on a Google Cloud machine. You can create a new notebook or, if your old notebook still exists, re-use it. Remember to give it a meaningful name. The cluster is accessible at this URL:

http://35.237.167.110:8080/

Today's program. You will go through parts of the three tutorials. Exercises accompany the NumPy tutorial. Note: the exercises do not test every aspect of the tutorial. In other words, even if you successfully complete the exercises of a section, I would still advise that you skim the tutorials for material you may be less familiar with.

  1. NumPy tutorial: All sections except "Less Basic", "Fancy indexing and index tricks", and "Tips and Tricks".
  2. SciPy tutorial: Skim the "Introduction", "Basic functions", and "Statistics" sections. Skip the rest.
  3. Scikit-learn tutorial:

NumPy Exercises

  1. Basic array
    • Generate a random 3x4x5 array of integers.
    • Calculate the average value of all the entries in this array.
    • Calculate the average value by row and by column.
    • Reshape the array into a 60x1 vector.
  2. Solving systems of equations
    • Solve the following system of equations in two different ways:
      1) using the numpy.linalg.solve() function; and
      2) using the numpy.linalg.inv() function.
      3x + 2y + 10z = 52
      2x + 2y + 7z = 12
      x + 2y + 12z = 10
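
One possible way to approach the two exercises above is sketched below. Several choices are assumptions and not part of the exercise statement: the seed, the integer range [0, 10), and the reading of "by row" and "by column" for a 3-D array (here, averaging within each row and within each column of every 4x5 slice). Treat it as a starting point, not as the reference solution.

```python
import numpy as np

# --- Exercise 1: basic array ---
rng = np.random.default_rng(0)            # seed chosen arbitrarily
a = rng.integers(0, 10, size=(3, 4, 5))   # integer range is an assumption

overall_mean = a.mean()                   # average of all 60 entries

# Assumed interpretation: each of the 3 slices is a 4x5 matrix.
row_means = a.mean(axis=2)                # one mean per row, shape (3, 4)
col_means = a.mean(axis=1)                # one mean per column, shape (3, 5)

column_vector = a.reshape(60, 1)          # 60x1 vector

# --- Exercise 2: solving a system of equations ---
# Coefficient matrix and right-hand side copied from the exercise.
A = np.array([[3.0, 2.0, 10.0],
              [2.0, 2.0, 7.0],
              [1.0, 2.0, 12.0]])
b = np.array([52.0, 12.0, 10.0])

# 1) Direct solve.
x_solve = np.linalg.solve(A, b)

# 2) Explicit inverse followed by a matrix-vector product.
x_inv = np.linalg.inv(A) @ b

print(x_solve)
print(x_inv)
print(np.allclose(A @ x_solve, b))        # check that the solution satisfies the system
```

Both approaches should agree up to floating-point error. In practice numpy.linalg.solve() is usually preferred over forming the inverse, since it is cheaper and numerically more stable.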