Files and While Loops

See the following video for an introduction to files and while loops.

In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo("9-YH_M6_J-g")

To summarize:

In order to open a file, you need to first set Python's working directory to the directory where the file is (if you're reading the file) or will be (if you're writing to the file).

On different systems, directories are accessed slightly differently. Assuming that your username is guerzhoy and you have created the folder c4mW4 on your desktop, here is how you would do things on different systems:

Mac:

import os
os.chdir("/Users/guerzhoy/Desktop/c4mW4")


Windows:

import os
os.chdir("c:/Users/guerzhoy/Desktop/c4mW4")


Linux:

import os
os.chdir("/home/guerzhoy/Desktop/c4mW4")
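
If you are not sure which directory Python is currently working in, os.getcwd() will tell you. The sketch below is only an illustration: it uses a temporary folder created with tempfile.mkdtemp() as a stand-in for c4mW4 so that it runs on any system, and the names original_dir and target_dir are made up for this example.

```python
import os
import tempfile

original_dir = os.getcwd()            # the directory Python is working in right now
target_dir = tempfile.mkdtemp()       # hypothetical stand-in for a folder such as c4mW4

os.chdir(target_dir)                  # change the working directory
print("Now working in:", os.getcwd())

os.chdir(original_dir)                # go back to where we started
print("Back in:", os.getcwd())
```

On your own machine you would pass the path of your folder to os.chdir instead of target_dir.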

Once you're in the right directory, you can open the file, and then write to it:

In [4]:
f = open("rain.txt", "w")                 #"w" means you will be writing to the file. Use "r"
                                          #(or nothing) if you want to read from the file
f.write("There are holes in the sky\n")
f.write("Where the rain gets in\n")
f.write("But they're ever so small\n")
f.write("That's why the rain is thin\n")
f.close()                                 #Close the file once you're done with it.
                                          #Until the file is closed, some of what you
                                          #wrote may not yet be saved to disk

Above, f is a file object. The file object keeps track of where you are in the file. By default, you just keep writing (and reading) where you left off (as opposed to, for example, always writing to the beginning of the file). Note one thing that's different here: unlike with print, we need to explicitly add the newline (\n) characters to the strings, or else everything we write to rain.txt will appear on a single line.
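
To see what happens without the newline characters, here is a small sketch. It writes to a hypothetical file called no_newlines.txt inside a temporary folder (so that nothing in your own folders gets overwritten), and then reads the contents back.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "no_newlines.txt")

f = open(path, "w")
f.write("There are holes in the sky")   # no "\n" at the end
f.write("Where the rain gets in")       # continues right where the last write stopped
f.close()

f = open(path)
text = f.read()
f.close()

print(text)   # both strings end up on a single line
```

The second write starts exactly where the first one ended, so the two strings run together with nothing between them.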

To read the entire file into a string, you can use

In [6]:
f = open("rain.txt", "r")
text = f.read()
f.close()
In [7]:
print(text)
There are holes in the sky
Where the rain gets in
But they're ever so small
That's why the rain is thin

You can use for-loops in order to read in the file line-by-line:

In [8]:
f = open("rain.txt")
for line in f:
    print(line[:-1])
f.close()
There are holes in the sky
Where the rain gets in
But they're ever so small
That's why the rain is thin

line is assigned the first line from the file, then the second line from the file, and so on.

Note that we printed line[:-1] rather than line. That's because, by default, print adds a newline to the string anyway, and the last character in line is already \n. If we just print line, we'll get the following:

In [9]:
f = open("rain.txt")
for line in f:
    print(line)
f.close()
There are holes in the sky

Where the rain gets in

But they're ever so small

That's why the rain is thin
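
One thing to watch out for: slicing with line[:-1] assumes that every line ends with \n. If the last line of a file has no newline at the end, the slice removes a real character instead. The sketch below shows the difference between slicing and rstrip("\n"), which only removes a trailing newline if there is one. It uses io.StringIO, which behaves like an open text file, so that the example does not depend on any file on disk; the list names are made up for the illustration.

```python
import io

# io.StringIO behaves like an open text file; the last line has no trailing "\n"
f = io.StringIO("But they're ever so small\nThat's why the rain is thin")

sliced = []
stripped = []
for line in f:
    sliced.append(line[:-1])            # drops the last character, whatever it is
    stripped.append(line.rstrip("\n"))  # drops only a trailing newline, if any

print(sliced)
print(stripped)
```

Another option is to leave line alone and call print(line, end=""), which tells print not to add a newline of its own.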

You can also read lines from a file using readline:

In [12]:
f = open("rain.txt")
print(f.readline()[:-1])
There are holes in the sky

This got us the first line. This is where the fact that f remembers where we are in the file is useful. If we call f.readline again, we'll get the next line:

In [13]:
print(f.readline()[:-1])
Where the rain gets in
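
Once there are no lines left, readline returns the empty string "". A blank line in the middle of a file is different: it still contains the newline, so readline returns "\n" for it. Here is a small sketch, again using io.StringIO as a stand-in for a real file (the contents are made up):

```python
import io

f = io.StringIO("first line\n\nthird line\n")   # the second line is blank

first = f.readline()     # "first line\n"
blank = f.readline()     # "\n": a blank line still contains the newline
third = f.readline()     # "third line\n"
at_end = f.readline()    # "": nothing left to read

print(repr(first), repr(blank), repr(third), repr(at_end))
```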

While Loops

This will be a brief diversion into another kind of loop: the while-loop. The structure of the while-loop is as follows:

while <cond>:
  <block>

Each time through the loop, we check whether the condition <cond> is true. If it is, we run <block> and then check the condition again. The loop ends as soon as the condition is false.
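
As a minimal sketch of this structure (the variable names n and printed are made up for the illustration), here is a loop that counts down from 3:

```python
n = 3
printed = []
while n > 0:            # <cond>: checked before every pass through the loop
    printed.append(n)   # <block>: runs only if the condition was True
    print(n)
    n -= 1              # without this line the condition would never become False
print("Done")
```

The block runs three times. After the third pass n is 0, the condition n > 0 is false, and Python moves on to the line after the loop.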

This is useful when interacting with the user: the computer keeps waiting for input from the user, and reading in the input, until the user types in something that signifies the end of the interaction.

Here's a silly example:

In [14]:
def dishes():
    '''Repeatedly ask the user for input,
    and return "Yes" or "No" once the user
    inputs "Yes" or "No"'''
    
    answer = ""
    while answer != "Yes" and answer != "No":
        answer = input("Did you do the dishes? ")
        
    return answer

The function will keep running and getting input from the user until the user inputs either "Yes" or "No".
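
To trace what the loop does without typing anything, here is a sketch of a hypothetical variant, dishes_scripted, that takes its "user input" from a list instead of calling input. Neither the function nor its answers parameter is part of the original example; it also returns how many answers it had to look at.

```python
def dishes_scripted(answers):
    '''Like dishes(), but take the answers from the list answers
    instead of asking the user. Return the final answer and the
    number of answers that were read.'''
    i = 0
    answer = ""
    while answer != "Yes" and answer != "No":
        answer = answers[i]   # stands in for input("Did you do the dishes? ")
        i += 1
    return answer, i

result = dishes_scripted(["maybe", "yes", "Yes"])
print(result)
```

Note that "yes" does not end the loop: the comparison is case-sensitive, so only the exact strings "Yes" and "No" count.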

Reading and Processing Large Files

Let's read in the file january06.txt. Note that the first two lines are the "header":

Year    Day    Hour    Temp    RH    WndSp    WndDir    Precip    Solar
    Julian    EST    *C    %    kmh    degrees    mm    W/m2
2005    355    1100    -3.377    78.4    5.128    268    0    113.8
2005    355    1200    -2.559    72.1    6.285    236.8    0    172.7
2005    355    1300    -2.445    66.72    6.721    246.3    0    155.9
2005    355    1400    -1.791    61.91    6.07    258.2    0    184.4
2005    355    1500    -2.795    66.26    6.71    250.8    0    95.7
2005    355    1600    -3.336    69.96    6.74    244.2    0    59.73
2005    355    1700    -3.999    75.6    5.915    250.4    0    11.03
2005    355    1800    -4.782    81.8    6.395    247.7    0    -3.375
2005    355    1900    -4.27    83.3    5.279    240    0    -1.233

The fourth column is the temperature. Let's obtain the temperature from the first line of data automatically:

In [15]:
os.chdir("/home/guerzhoy/Desktop/c4m")
f = open("january06.txt")
header = f.readline()
temp_index = header.split().index('Temp')  #temp_index will be 3, since header.split()[3] is "Temp"
f.readline()                               #Skip the next line, don't store it anywhere since it's not needed
line = f.readline()                        #The first line of the data

#Now, get the fourth item in line:
print("Temperature on first line:", float(line.split()[temp_index]))
Temperature on first line: -3.377

Let's get five more lines:

In [16]:
for i in range(5):
    print(f.readline()[:-1])
    
2005	355	1200	-2.559	72.1	6.285	236.8	0	172.7
2005	355	1300	-2.445	66.72	6.721	246.3	0	155.9
2005	355	1400	-1.791	61.91	6.07	258.2	0	184.4
2005	355	1500	-2.795	66.26	6.71	250.8	0	95.7
2005	355	1600	-3.336	69.96	6.74	244.2	0	59.73

Suppose we now want to find the first line where the temperature is not negative. We can do this using a while-loop: we'll keep reading in lines using readline while the temperatures are negative:

In [18]:
f = open("january06.txt")
header = f.readline()
temp_index = header.split().index('Temp')  #temp_index will be 3, since header.split()[3] is "Temp"
f.readline()                               #Skip the next line, don't store it anywhere since it's not needed
line = f.readline()
cur_temp = float(line.split()[temp_index])
while cur_temp < 0:
    line = f.readline()
    cur_temp = float(line.split()[temp_index])
print(line.split())
['2005', '356', '1600', '0.128', '76.5', '8.43', '232.8', '0', '57.36']
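
The loop above assumes that the file contains at least one temperature that is not negative. If it doesn't, readline eventually returns "" at the end of the file, and line.split()[temp_index] raises an IndexError. One possible way to guard against that is sketched below. It uses io.StringIO with a few all-negative rows in the same layout as january06.txt (with fewer columns, for brevity) rather than the real file, and the names sample and first_warm are made up for this example.

```python
import io

# A few all-negative rows laid out like january06.txt, but with only four columns
sample = ("Year\tDay\tHour\tTemp\n"
          "\tJulian\tEST\t*C\n"
          "2005\t355\t1100\t-3.377\n"
          "2005\t355\t1200\t-2.559\n")

f = io.StringIO(sample)
header = f.readline()
temp_index = header.split().index('Temp')
f.readline()                      # skip the units line

first_warm = None                 # stays None if no non-negative temperature is found
line = f.readline()
while line != "" and first_warm is None:
    if float(line.split()[temp_index]) >= 0:
        first_warm = line
    else:
        line = f.readline()       # "" once we reach the end of the file

print(first_warm)
```

Here the loop stops because line becomes "", and first_warm is still None, which signals that no such line was found.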

What we accomplished is reading in the data up to a certain point, and then stopping. We have already seen a mechanism that can accomplish the same kind of thing: a return statement inside a function ends the function immediately, which also exits whatever loop was running inside it:

In [19]:
def get_first_warm_day(filename):
    f = open(filename)
    header = f.readline()
    temp_index = header.split().index('Temp')
    f.readline()
    for line in f:
        cur_temp = float(line.split()[temp_index])
        if cur_temp >= 0:
            return line

print(get_first_warm_day("january06.txt").split())
['2005', '356', '1600', '0.128', '76.5', '8.43', '232.8', '0', '57.36']

Practice with while loops

A list contains the patient's report about their well-being for each day. Return the number of days it takes for the patient to say "I'm fine".

For example, for

reports = ["My leg hurts",
           "I'm tired",
           "I'm sleepy",
           "Awesome!",
           "Hi Doctor, I'm fine",
           "Bad again"]

It takes 4 days for the patient to say "I'm fine".

Here is how to compute that using a while-loop:

In [22]:
reports = ["My leg hurts",
           "I'm tired",
           "I'm sleepy",
           "Awesome!",
           "Hi Doctor, I'm fine",
           "Bad again"]

n_days = 0
while reports[n_days].find("I'm fine") == -1:
    n_days += 1

print(n_days)               
4
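
The while-loop above relies on some report containing "I'm fine". If none does, n_days eventually goes past the end of the list and reports[n_days] raises an IndexError. A sketch of one way to handle that, using a made-up list in which the patient never recovers:

```python
reports = ["My leg hurts",
           "I'm tired",
           "Bad again"]     # made-up list with no "I'm fine" in it

n_days = 0
# Check that n_days is a valid index before looking at reports[n_days]
while n_days < len(reports) and reports[n_days].find("I'm fine") == -1:
    n_days += 1

if n_days == len(reports):
    print("The patient never said \"I'm fine\"")
else:
    print(n_days)
```

The order of the two conditions matters: because n_days < len(reports) is checked first, reports[n_days] is never evaluated with an index that is out of range.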

Here is how we can write a function with a for-loop in it to accomplish the same thing:

In [23]:
def time_to_recover(reports):
    n_days = 0
    for report in reports:
        #Check the current report before counting the day
        if report.find("I'm fine") != -1:
            return n_days
        n_days += 1
            
            
In [24]:
time_to_recover(reports)
Out[24]:
4