A bit about file encoding

When we read in a file, we use f = open(filename, encoding="latin1").

Here's what encoding="latin1" means. The file we are reading, like any file, is just a bunch of 0's and 1's:

    1011100110111100011111010000011101101011001100111000100001111001101110101000100100100101000101010100111010110001001001110000101001011110000110011001001011100001000000110111100010100110000001101001011111111111010100100001010111010111100111001110010111110101111001100000111010011101111010111110010010000101110101100111011011001010010011100101100011100010011001111100110101101011010111100011110001011110101011010111110011011011110101111111000001000111010010011011011000000101010001011000101011000010111
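
To see for yourself that a file really is just a sequence of bits, you can open it in binary mode and print its contents as 0's and 1's. Here is a minimal sketch; the filename example.txt is just a placeholder:

    # Open the file in binary mode, so Python gives us the raw bytes
    # instead of decoding them into characters.
    with open("example.txt", "rb") as f:
        raw = f.read()

    # Each byte is an integer from 0 to 255; format it as 8 binary digits.
    bits = "".join(format(byte, "08b") for byte in raw)
    print(bits)   # e.g. 101110011011110001111101...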

When we specify that the file is encoded using the "latin1" encoding, Python reads the file 8 digits (bits) at a time:

    10111001

    10111100

    01111101

    00000111

    01101011
    ...

Each group of 8 bits corresponds to a character (so there are 256 possible characters in total). You can read about which 8-bit sequences correspond to which characters here.
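
To check this, here is a small sketch that takes the first few 8-bit groups listed above, turns them into bytes, and decodes them with the "latin1" encoding, one character per byte:

    # The first few 8-bit groups from the listing above.
    groups = ["10111001", "10111100", "01111101", "00000111", "01101011"]

    # int(g, 2) interprets each group as a binary number between 0 and 255,
    # and bytes(...) packs those numbers into a byte string.
    data = bytes(int(g, 2) for g in groups)

    # Decoding with latin1 maps each byte to exactly one character.
    print(repr(data.decode("latin1")))   # '¹¼}\x07k'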

Obviously, not all of the world's languages can be expressed using only 256 characters. For example, there are many tens of thousands of Chinese characters. In order to encode them, more complex encoding schemes are needed. One of them is Unicode, which is also able to encode the alphabets of languages such as Japanese, Arabic, Hebrew, Russian, Korean, etc.
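
As a small illustration (assuming UTF-8, the most widely used encoding of Unicode), characters outside the 256-character range take more than one byte each:

    # Two Chinese characters encoded with UTF-8.
    text = "漢字"
    encoded = text.encode("utf-8")

    print(len(text))      # 2 characters
    print(len(encoded))   # 6 bytes -- each of these characters needs 3 bytes
    print(encoded)        # b'\xe6\xbc\xa2\xe5\xad\x97'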