## Parallel lists

Let's write a function that compares two lists to see whether they have the same contents.

In [1]:
    def lists_equal(L1, L2):
        '''Return True iff L1 and L2 have the same contents

        Arguments:
        L1, L2 -- lists of integers
        '''
        # Need to check whether the two lists are the same length first,
        # since otherwise we'll get an out-of-range error in the for-loop
        if len(L1) != len(L2):
            return False

        for i in range(len(L1)):
            if L1[i] != L2[i]:
                return False

        return True
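
As a cross-check on the index-based loop, here is a minimal sketch of the same comparison written with the built-in `zip`. The name `lists_equal_zip` is hypothetical and not part of the original notebook. The length check is still needed, because `zip` stops silently at the end of the shorter list.

```python
def lists_equal_zip(L1, L2):
    '''Return True iff L1 and L2 have the same contents (zip-based sketch).'''
    # zip() stops at the end of the shorter list, so lists of
    # different lengths must be rejected up front.
    if len(L1) != len(L2):
        return False

    # Walk through both lists in parallel, one pair of elements at a time.
    for a, b in zip(L1, L2):
        if a != b:
            return False

    return True


print(lists_equal_zip([1, 2, 3], [1, 2, 3]))  # True
print(lists_equal_zip([1, 2, 3], [1, 2]))     # False
print(lists_equal_zip([1, 2, 3], [1, 5, 3]))  # False
```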



## P1: parallel lists

We store patient data in three lists of the same length, where index i in each list refers to the same patient. For example:

    sex_data =             ["m", "f", "f", "m", "m", "f"]
    ward_data =            [  1,   3,   2,   2,   1,   2]
    length_of_stay_data =  [ 10,   5,   7,   2,   3,   4]
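
To make the layout concrete, the sketch below reads one patient's record by using the same index in every list. The helper `patient_record` is a hypothetical name introduced only for illustration.

```python
sex_data =            ["m", "f", "f", "m", "m", "f"]
ward_data =           [  1,   3,   2,   2,   1,   2]
length_of_stay_data = [ 10,   5,   7,   2,   3,   4]


def patient_record(i):
    '''Return (sex, ward, length of stay) for patient i.

    Hypothetical helper: it relies on the three lists above being
    parallel, i.e. index i refers to the same patient in each list.
    '''
    return (sex_data[i], ward_data[i], length_of_stay_data[i])


# Print every patient's record, one per line.
for i in range(len(sex_data)):
    print(i, patient_record(i))
```

For instance, patient 0 is male, in ward 1, and stayed 10 days.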

Write a function that computes the average length of stay in the hospital for a given sex, in a given ward. The function signature is

    def avg_stay(sex, ward, sex_data, ward_data, length_of_stay_data):
        '''Return the average length of stay for patients of sex sex in ward ward

        Arguments:
        sex -- the sex to select, "m" or "f"
        ward -- the ward to select, an int
        sex_data -- a list containing N strings that correspond to patients' sex ("m" or "f")
        ward_data -- a list of N ints that correspond to patients' ward
        length_of_stay_data -- a list of N ints that correspond to patients' lengths of
                               stay
        '''

### P1: solution

In [4]:
    def avg_stay(sex, ward, sex_data, ward_data, length_of_stay_data):
        '''Return the average length of stay for patients of sex sex in ward ward

        Arguments:
        sex_data -- a list containing N strings that correspond to patients' sex ("m" or "f")
        ward_data -- a list of N ints that correspond to patients' ward
        length_of_stay_data -- a list of N ints that correspond to patients' lengths of
                               stay
        '''

        s = 0       # running total of the matching lengths of stay
        count = 0   # number of matching patients
        for i in range(len(length_of_stay_data)):
            if sex_data[i] == sex and ward_data[i] == ward:
                s += length_of_stay_data[i]
                count += 1

        # Note: raises ZeroDivisionError if no patient matches sex and ward
        return s/count
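
As a worked check on the sample data: the male patients in ward 1 stayed 10 and 3 days, so the expected average is (10 + 3)/2 = 6.5. The sketch below computes the same quantity with `zip`. The name `avg_stay_zip` is hypothetical, and returning `None` when no patient matches is an assumption made here, not something the exercise specifies. Python 3 is assumed, so `/` is true division.

```python
sex_data =            ["m", "f", "f", "m", "m", "f"]
ward_data =           [  1,   3,   2,   2,   1,   2]
length_of_stay_data = [ 10,   5,   7,   2,   3,   4]


def avg_stay_zip(sex, ward, sex_data, ward_data, length_of_stay_data):
    '''Return the average length of stay for patients of sex sex in ward ward,
    or None if there are no such patients (an assumption of this sketch).'''
    matching = []
    # zip() yields one (sex, ward, stay) triple per patient.
    for s, w, stay in zip(sex_data, ward_data, length_of_stay_data):
        if s == sex and w == ward:
            matching.append(stay)

    if len(matching) == 0:
        return None

    return sum(matching) / len(matching)


print(avg_stay_zip("m", 1, sex_data, ward_data, length_of_stay_data))  # 6.5
print(avg_stay_zip("f", 2, sex_data, ward_data, length_of_stay_data))  # 5.5
print(avg_stay_zip("m", 3, sex_data, ward_data, length_of_stay_data))  # None
```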


## P2: matching strings

Write a function that returns True iff string s1 starts with string s2. For example,

starts_with("abc", "ab") should return True

starts_with("ad", "ab") should return False

### P2: solution

In [5]:
    def starts_with(s1, s2):
        '''Return True iff the string s1 starts with the string s2

        Arguments:
        s1, s2 -- strings
        '''
        # s1 cannot start with s2 if s2 is the longer string
        if len(s2) > len(s1):
            return False

        for i in range(len(s2)):
            if s1[i] != s2[i]:
                return False

        return True
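
For comparison, here is a short sketch of the same test written with slicing and checked against Python's built-in `str.startswith`. The name `starts_with_slice` is hypothetical. No explicit length check is needed in this version, because slicing past the end of a string simply returns a shorter string.

```python
def starts_with_slice(s1, s2):
    '''Return True iff the string s1 starts with the string s2 (slicing sketch).'''
    # s1[:len(s2)] is the first len(s2) characters of s1, or all of s1
    # if s1 is shorter than s2.
    return s1[:len(s2)] == s2


examples = [("abc", "ab"), ("ad", "ab"), ("a", "ab"), ("abc", "")]
for s1, s2 in examples:
    # The two columns should always agree.
    print(repr(s1), repr(s2), starts_with_slice(s1, s2), s1.startswith(s2))
```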


## P3: Standard deviation/looping over the list more than once

Recall that you can estimate the extent to which the data is spread out by computing the standard deviation of the data.

The standard deviation of $x_1, x_2, ..., x_n$ can be estimated as $\hat{\sigma}(x) = \sqrt{\frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1}}$, where $\bar{x}$ is the mean of the data. Write a function to compute the standard deviation of the length of stay in a given ward.

### P3: solution

In [9]:
    def avg_stay_ward(ward, ward_data, length_of_stay_data):
        '''Return the average length of stay in ward ward

        Arguments:
        ward_data -- a list of N ints that correspond to patients' ward
        length_of_stay_data -- a list of N ints that correspond to patients' lengths of
                               stay
        '''
        s = 0
        count = 0
        for i in range(len(length_of_stay_data)):
            if ward_data[i] == ward:
                s += length_of_stay_data[i]
                count += 1

        return s/count

    def sd_stay(ward, ward_data, length_of_stay_data):
        '''Return the estimate of the sd of the length of stay for patients in ward ward

        Arguments:
        ward_data -- a list of N ints that correspond to patients' ward
        length_of_stay_data -- a list of N ints that correspond to patients' lengths of
                               stay
        '''

        s_sq_diff = 0
        count = 0
        # First pass over the data: compute the mean length of stay in the ward
        mean_stay = avg_stay_ward(ward, ward_data, length_of_stay_data)

        # Second pass: sum the squared differences from the mean
        for i in range(len(length_of_stay_data)):
            if ward_data[i] == ward:
                s_sq_diff += (length_of_stay_data[i] - mean_stay)**2
                count += 1

        # Divide by n-1, as in the formula; needs at least two patients in the ward
        return (s_sq_diff/(count - 1))**.5
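
As a sanity check, the sketch below computes the estimate for ward 2 of the sample data from P1 (stays of 7, 2 and 4 days) in two passes and compares it with `statistics.stdev` from the Python standard library, which also divides by n-1. The name `sd_stay_two_pass` is hypothetical, and Python 3 is assumed.

```python
import statistics

ward_data =           [  1,   3,   2,   2,   1,   2]
length_of_stay_data = [ 10,   5,   7,   2,   3,   4]


def sd_stay_two_pass(ward, ward_data, length_of_stay_data):
    '''Return the n-1 estimate of the sd of the length of stay in ward ward.

    Sketch only: assumes the ward has at least two patients.
    '''
    # Collect the lengths of stay that belong to the requested ward.
    stays = []
    for i in range(len(ward_data)):
        if ward_data[i] == ward:
            stays.append(length_of_stay_data[i])

    n = len(stays)

    # First pass: the mean.
    mean = sum(stays) / n

    # Second pass: the sum of squared differences from the mean.
    s_sq_diff = 0
    for x in stays:
        s_sq_diff += (x - mean) ** 2

    return (s_sq_diff / (n - 1)) ** 0.5


print(sd_stay_two_pass(2, ward_data, length_of_stay_data))  # approximately 2.517
print(statistics.stdev([7, 2, 4]))                          # library value, for comparison
```

The mean has to be known before any squared difference can be computed, which is why the data is traversed twice.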


## P4: Matching DNA subsequences with nested loops

Write a function that returns True iff a DNA subsequence occurs somewhere in a DNA sequence. Both the subsequence and the sequence are represented as strings. For example, if

    seq = "CGGGGAATAGCCCCC"
    subseq = "AATA"

then match_subseq(seq, subseq) should return True since you can match subseq to seq, but if

    seq = "CGGGTCGGGCGC"
    subseq = "AAA"


then match_subseq(seq, subseq) should return False.

Hint: think of what a useful helper function would be that's similar to what we already wrote.

### P4: Solution

In []:
    def match_subseq_to_subseq(seq, subseq, start_i):
        '''Return True iff the subsequence of seq that starts at start_i
        and is of the same length as subseq is equal to subseq

        Arguments:
        start_i -- the starting index in seq, an integer
        seq, subseq -- two strings. len(seq) is at least start_i+len(subseq)
        '''
        for i in range(len(subseq)):
            if subseq[i] != seq[start_i + i]:
                return False

        return True

    def match_subseq(seq, subseq):
        '''Return True iff the subsequence subseq matches the sequence seq
        at some index

        Arguments:
        seq, subseq -- sequences of DNA bases, represented as strings consisting
                       of the characters "A", "T", "G", "C"
        '''

        # The last possible starting index is len(seq)-len(subseq), so the
        # range must go up to len(seq)-len(subseq)+1 to include it
        for i in range(len(seq)-len(subseq)+1):
            if match_subseq_to_subseq(seq, subseq, i):
                return True

        return False
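
For comparison, here is a compact sketch of the same search that uses slicing in place of the inner loop, checked against Python's `in` operator for strings. The name `match_subseq_slice` is hypothetical. The third example places the match at the very end of the sequence, which exercises the last valid starting index, `len(seq) - len(subseq)`.

```python
def match_subseq_slice(seq, subseq):
    '''Return True iff subseq occurs somewhere in seq (slicing sketch).'''
    # Try every starting index at which subseq could still fit in seq.
    # If subseq is longer than seq, the range is empty and we return False.
    for i in range(len(seq) - len(subseq) + 1):
        if seq[i:i + len(subseq)] == subseq:
            return True

    return False


examples = [
    ("CGGGGAATAGCCCCC", "AATA"),  # match in the middle
    ("CGGGTCGGGCGC", "AAA"),      # no match
    ("CGGAATA", "AATA"),          # match at the very end
]
for seq, subseq in examples:
    # The two columns should always agree.
    print(match_subseq_slice(seq, subseq), subseq in seq)
```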