Files and While loops

Introduction

So far all the data in our programs has either been hardcoded into the program itself or typed in by the user at the keyboard. This is pretty limiting, and we will clearly want programs that can read data from files.

In this lesson we'll talk about what we can do with text files. Text files are files that use one of a number of standard encoding schemes so that their contents can be interpreted as printable characters. Later we might learn about binary files, which we can't view as characters, but for most of our purposes we can assume we have text.

Opening a File

First we will need a string to specify the name of our file.

We could store the name in a variable, and there are two ways to get it:

  • Write it as a string literal, if your program is always going to use the same name.
  • Read it from the user using input() and save it in a variable.

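Note: avoid using 'file' as a variable name. It was the name of a built-in type in Python 2. Python 3 no longer defines it, but a more specific name such as f or myfile is still clearer.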

Next, we call the function open with the name of the file.

In [1]:
f = open('story.txt', 'r')
f
Out[1]:
<_io.TextIOWrapper name='story.txt' mode='r' encoding='UTF-8'>

This opens the file named story.txt from the current directory. It is open for reading (that's the r mode) and the type of object is io.TextIOWrapper. Don't stress about the type at all. Just think of it as an open file. The important conceptual idea here is that this object not only knows the contents of the file, but it knows our current position in the file. So once we start reading, it knows how much we've read and is able to keep giving us the next piece.

Reading from a File

There are several ways to read from a file.

(1) Read a single line
In [3]:
myfile = open('story.txt', 'r')
s = myfile.readline()   # Read a line into s.
print(s)
s                       # Notice the \n that you only see when you look
                        # at the contents of the variable.
Mary had a little lamb.

Out[3]:
'Mary had a little lamb.\n'

The \n (backslash n) character is a single character representing a new line.
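A quick way to convince yourself that \n really is one character is to check the length of a short string. This small sketch uses a hardcoded string as a stand-in for a line returned by readline(), so it runs without any file.

```python
# A hardcoded stand-in for a line returned by readline().
s = 'lamb\n'

length = len(s)      # 4 letters plus the single newline character
last = s[-1]         # the final character is the newline itself

print(length)        # 5
print(last == '\n')  # True
print(repr(s))       # repr shows the \n escape instead of breaking the line
```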

In [3]:
s = myfile.readline()   # The next call continues where we left off.
print(s)
s = myfile.readline()   # And so on...
print(s)
myfile.close()
His fleece was white as snow,

and everywhere that Mary went

I can use this to read an entire file, bit by bit, under my control.

(2) Read a certain number of characters
In [4]:
filename = 'story.txt'
myfile = open(filename)
s = myfile.read(10)   # Read 10 characters into s.
print(s)
s = myfile.read(10)   # Read the next 10 characters into s.
print(s)
myfile.close()
Mary had a
 little la

I can also use this to read an entire file, bit by bit, under my control.

(3) Read a line at a time from beginning to end

If I know I want to read line by line through to the end, a for loop makes this easy. This is probably the most common way to read a file. Use this unless you have a reason not to.

In [5]:
f = open('story.txt')
for line in f:
    print(line)     # Or do whatever you wish to line

f.close()          # Good habit: close a file when you are done with it.
Mary had a little lamb.

His fleece was white as snow,

and everywhere that Mary went

the lamb was sure to go.

Question: Why is the output from the for loop double-spaced? Answer: print adds a \n of its own, and each line read from the file already ends with one.

Question: How can you single space the output?

Answer: Strip the newline character from the end of each line before you print.

In [7]:
f = open('story.txt')
for line in f:
    line = line.strip('\n')   # Remove the newline that came from the file
    print(line)
f.close()
Mary had a little lamb.
His fleece was white as snow,
and everywhere that Mary went
the lamb was sure to go.
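Closing a file by hand is easy to forget. Python also has a with block that closes the file for you when the block ends. The sketch below is not part of the original lesson. It first creates a small throwaway file (the name demo_story.txt and the setup code are assumptions made so the example runs on its own), then reads it back inside a with block.

```python
import os
import tempfile

# Setup only: create a small throwaway file so the sketch is self-contained.
# (Writing with mode 'w' is not covered in this lesson.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo_story.txt')   # hypothetical file name
setup = open(path, 'w')
setup.write('Mary had a little lamb.\nHis fleece was white as snow,\n')
setup.close()

lines = []
with open(path) as f:          # f is closed automatically when the block ends
    for line in f:
        lines.append(line.rstrip('\n'))

print(lines)
print(f.closed)                # True: no explicit close() was needed
```

The file is closed even if something goes wrong inside the block, which is the main reason many Python programmers prefer this form.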

(4) Read everything in the file into one string

In [8]:
filename = "story.txt"
myfile = open(filename)
s = myfile.read()  # Read the whole file into string s.
print(s)
myfile.close()
Mary had a little lamb.
His fleece was white as snow,
and everywhere that Mary went
the lamb was sure to go.

In [9]:
s
Out[9]:
'Mary had a little lamb.\nHis fleece was white as snow,\nand everywhere that Mary went\nthe lamb was sure to go.\n'

(5) Use readlines() to read the file into a list of lines.

In [10]:
myfile = open('story.txt')
contents = myfile.readlines() 
type(contents)
contents
Out[10]:
['Mary had a little lamb.\n',
 'His fleece was white as snow,\n',
 'and everywhere that Mary went\n',
 'the lamb was sure to go.\n']

Beginners often use one of these last two approaches because they seem easy.

  • Question: What is the downside of reading it all in at once?
  • Answer: It can take a lot of memory, because the whole file is held in memory at once!

Don't use this technique unless you really need access to the whole file at once.

Usually, we can read a piece, deal with it, and toss it out.
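Here is a minimal sketch of the "read a piece, deal with it, toss it out" idea. It counts lines and counts how many of them mention the lamb, while only ever holding one line at a time. To keep it runnable without story.txt, it uses io.StringIO, which builds a file-like object from a string. That stand-in is an assumption of this sketch; with a real file you would call open('story.txt') instead.

```python
import io

# Stand-in for open('story.txt'): a file-like object built from a string.
f = io.StringIO(
    'Mary had a little lamb.\n'
    'His fleece was white as snow,\n'
    'and everywhere that Mary went\n'
    'the lamb was sure to go.\n'
)

line_count = 0
lamb_lines = 0
for line in f:                 # only one line is in memory at a time
    line_count = line_count + 1
    if 'lamb' in line:
        lamb_lines = lamb_lines + 1
f.close()

print(line_count)              # 4
print(lamb_lines)              # 2
```

Nothing is stored except the two running counts, so the same loop would work on a file far too large to read with read() or readlines().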

Dealing with the end of a file

With the for loop approach, the loop automatically stops when the end of the file is encountered. If the file is empty, the loop body never runs at all!

But what happens if you are at the end of the file when you call read or readline?
You get the empty string. You then know you can stop trying to read more.

Example

In [11]:
# Detecting the end of the file while reading line by line
myfile = open('story.txt')
next_line = myfile.readline()
while next_line != "":
    print(next_line)
    next_line = myfile.readline()
Mary had a little lamb.

His fleece was white as snow,

and everywhere that Mary went

the lamb was sure to go.
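The same end-of-file test works with read. As a sketch, the loop below reads 10 characters at a time until read returns the empty string. It uses io.StringIO as a stand-in for an open file (an assumption made so the example runs without story.txt).

```python
import io

f = io.StringIO('Mary had a little lamb.\n')   # stand-in for an open file

chunks = []
chunk = f.read(10)             # read up to 10 characters
while chunk != "":             # the empty string signals the end of the file
    chunks.append(chunk)
    chunk = f.read(10)
f.close()

print(chunks)                  # ['Mary had a', ' little la', 'mb.\n']
```

Notice that the last chunk is shorter than 10 characters: read gives you whatever is left. Also, a blank line in the middle of a file comes back from readline as '\n', not as the empty string, so this test does not stop early on blank lines.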

While Loops

This example introduces a new kind of loop: a while loop.

Structure of a while loop

    while (condition):
        while-body

What it does

    Check to see if the condition is true.
    If it is, execute the entire body of the loop and go back to the top
    Check again ...

Important Note: If the condition becomes false during the body of the loop, the loop does not stop at that moment. The only time it decides whether to continue or stop is at the top of the loop on each iteration.
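To see this note in action, here is a small sketch (the variable names are made up for illustration). The condition becomes false on the first line of the body, but the rest of the body still runs before the loop checks the condition again.

```python
count = 0
log = []
while count < 3:
    count = count + 3          # count < 3 is now false...
    log.append('still in the body, count is ' + str(count))   # ...but this still runs

print(log)                     # ['still in the body, count is 3']
print(count)                   # 3
```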

An Exercise with While Loops

Write a function yes_or_no that asks a user to enter either 'yes' or 'no' and keeps looping asking again and again until the user enters one of these two options.

If you finish that exercise, change your function so that it accepts any case variation such as 'Yes', 'YES' or even 'nO' and then returns the lowercase version of what the user provided. But if the user says 'nope' or 'maybe', it doesn't return and asks again for 'yes' or 'no'.

In [4]:
def yes_or_no():
    answer = input("Please enter yes or no ")
    while answer.lower() != 'yes' and answer.lower() != 'no':
        answer = input("Please enter yes or no ")
    return answer.lower()
yes_or_no()
Please enter yes or no yesterday
Please enter yes or no nope
Please enter yes or no maybe
Please enter yes or no YeS
Out[4]:
'yes'

Exercise on reading a file

The file january06.txt contains data from the UTM weather station for January 2006. Download it from the C4M website to your local machine and put it in the same directory as where Pyzo is storing your programs. Figuring out where to store the files or how to specify the paths to your file is half the battle!

  1. Open it up in Pyzo to see what it looks like.

  2. Write a Python program to open the file and read only the first line

  3. Read the second line (this is still a header)

  4. Read the third line into a variable line.

  5. What is the type of line?

  6. Call the method split() on line and save the return value. What is the type that is returned by this method?

  7. Look up the method split() in the Python 3 documentation.

In [13]:
f = open('../january06.txt')
f.readline()  # notice that I didn't bother to save the returned string from readline()
f.readline()
line = f.readline()   # this time I saved the string because I want to use it
print(type(line))
print(line)
result = line.split()
print(result)
<class 'str'>
2005	355	1100	-3.377	78.4	5.128	268	0	113.8

['2005', '355', '1100', '-3.377', '78.4', '5.128', '268', '0', '113.8']

Some more questions and steps to do:

  1. What is the type of result?
  2. What is the type of the elements of result?
  3. Which element is the temperature?

  4. Write a program that:

    1. opens the file january06.txt
    2. reads in the two header lines and ignores them
    3. uses a loop to read in all the rest of the lines one by one
    4. prints out only the day and the temperature from each line
  5. Run your program and make sure it works. Once it works, show a TA or instructor.

  6. Now change your program so that it doesn't print any temperatures. Instead, it looks at each temperature and keeps the coldest temperature it has seen so far, along with the corresponding day and time. When the whole file has been read, print the day and time of the coldest reading in the file. Be careful: convert the values to numbers before you compare them (use float, since temperatures such as -3.377 are not whole numbers). As strings, '11' < '2', but as numbers 11 > 2.
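The warning about comparing strings is easy to check for yourself. This sketch is not a solution to the exercise. It only shows how the same values compare as strings and as numbers, using hardcoded values of the kind split() gives you.

```python
# Values as they come out of split(): always strings.
a = '11'
b = '2'

string_compare = a < b              # True: strings compare character by character, and '1' < '2'
number_compare = int(a) < int(b)    # False: 11 is not less than 2

# Temperatures such as '-3.377' contain a decimal point, so float() is the
# conversion that works here; int('-3.377') would raise a ValueError.
temperature = float('-3.377')

print(string_compare, number_compare, temperature)
```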