Lab 10: Disease Propagation

This lab is part of Assignment 5. You are to complete the exercises here before arriving in lab on Monday/Tuesday. In lab, you and your partner will demonstrate your code to your TA, and get feedback. Your revised code is due on April 10 at noon.

In this assignment, we will be modeling how sexually transmitted infections propagate through populations. In epidemiology, the study of disease distribution, it is often useful to think of sexually active people as a graph: each person is a node, and each sexual contact is an edge. (Wikipedia notes that "In a surprising result, mathematical models predict that the sexual network graph for the human race appears to have a single giant component that indirectly links almost all people who have had more than one sexual partner, and a great many of those who have had only one sexual partner (if their one sexual partner was themselves part of the giant component). Most people who are not part of the giant component are either virgins, or couples who have never had sex with anyone except each other.") In fact, one such model of high school students found that more than a quarter of the school was connected!

As always, your code must comply with the CS 190 Style Specifications, and work on the ECF machines. The marking scheme for Assignment 5 is available here.

Part One: Modelling Recovery (Chlamydia, etc)

  1. Download test_bacterial.c and test_bacterial.txt, the expected output of test_bacterial.c.
  2. Now, let's try adding a new element to the simulation: recovery. Infected people, after some time, will recover and become susceptible again. This is called an SIS (Susceptible-Infected-Susceptible) model of disease.
    S <-> I

    Chlamydia, gonorrhea, and other bacterial STIs can be modelled this way. Let's start by simulating that a person stays in the Infected category for two time steps, and has a 100% chance of transmitting the disease. Note that transmission should happen before changing an I to an S. Create a method, propagate_with_recovery, which simulates the SIS model.


    This simulation will result in an equilibrium where some people are infected and some are susceptible. (The common cold behaves similarly; it never completely goes away from a population, and never totally takes it over.)

    Note that when a person is infected, we will only keep track of how long they have been infected since their initial infection. If they are reinfected, the amount of time they have been infected for should not increase. When a person recovers, they will temporarily be immune for one time step.

    Like propagate_n_times, this should also print the statistics about our population at every time step, along with the time step. See test_bacterial.txt for examples. Remember to make the output conform to the format EXACTLY (use diff to be certain!)
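
    The update rules above can be sketched as a single time step. This is only a sketch under stated assumptions: the population struct, its field names, MAX_PEOPLE, INFECTED_STEPS, and sis_step are all hypothetical (the lab's actual data structures are not shown here), immunity is taken to cover the one step after recovery, and the printing of statistics is left out. Check the exact timing against test_bacterial.txt rather than trusting this interpretation.

```c
#include <assert.h>

#define MAX_PEOPLE 64      /* hypothetical upper bound for this sketch */
#define INFECTED_STEPS 2   /* steps spent in state 'I' before recovery */

/* Hypothetical population layout; the lab's real struct may differ. */
typedef struct {
    int n;                             /* number of people */
    int adj[MAX_PEOPLE][MAX_PEOPLE];   /* adj[i][j] != 0 if i and j are connected */
    char state[MAX_PEOPLE];            /* 'S' (susceptible) or 'I' (infected) */
    int steps_infected[MAX_PEOPLE];    /* steps since the initial infection */
    int immune[MAX_PEOPLE];            /* 1 if recovered in the previous step */
} population;

/* Advances the SIS model by one time step. */
void sis_step(population *p)
{
    int newly_infected[MAX_PEOPLE] = {0};
    int i, j;

    assert(p->n >= 0 && p->n <= MAX_PEOPLE);

    /* 1. Transmission first, using the states at the start of the step.
     *    Only susceptible, non-immune people can be newly infected, so a
     *    reinfection never touches an existing infection timer. */
    for (i = 0; i < p->n; i++) {
        if (p->state[i] != 'I')
            continue;
        for (j = 0; j < p->n; j++) {
            if (p->adj[i][j] && p->state[j] == 'S' && !p->immune[j])
                newly_infected[j] = 1;
        }
    }

    /* 2. Immunity granted by a recovery in the previous step expires. */
    for (i = 0; i < p->n; i++)
        p->immune[i] = 0;

    /* 3. Age existing infections; recover after INFECTED_STEPS steps. */
    for (i = 0; i < p->n; i++) {
        if (p->state[i] != 'I')
            continue;
        p->steps_infected[i]++;
        if (p->steps_infected[i] >= INFECTED_STEPS) {
            p->state[i] = 'S';
            p->steps_infected[i] = 0;
            p->immune[i] = 1;   /* immune during the next step only */
        }
    }

    /* 4. Apply the infections recorded in step 1. */
    for (i = 0; i < p->n; i++) {
        if (newly_infected[i]) {
            p->state[i] = 'I';
            p->steps_infected[i] = 0;
        }
    }
}
```

    On a three-person chain 0-1-2 where only person 0 starts infected, this sketch produces the states I I S, then S I I, then S S I over three calls.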

  3. As usual, you should ensure there are no memory leaks in your code, nor any memory errors. You should also diff your output with the test_bacterial.txt that is provided.


Part Two: Modelling Death (HIV/AIDS)

  1. Like the previous parts: Download test_viral.c and test_viral.txt, the expected output of test_viral.c.

  2. While all the STIs we have modeled so far have been non-fatal, AIDS is currently a fatal disease. While medical advances have extended the lives of infected individuals, we currently lack a cure or vaccine. Create a method, propagate_with_death, for this step.

    Modify our SIS model so that a person stays in the Infected category for two time steps, and then dies. A dead individual should have the state 'D', and all connections to and from them should be deleted. (Note that your graph statistics should only include living individuals!) Like previous propagate methods, the stats should be printed out at each iteration. For this method, you should also print a mortality rate: how many members of our population have died. See test_viral.txt for examples of how your output should look.

    Again, let's start with 100% transmission.

    Fortunately, HIV does not transmit with 100% probability! (As with most STIs, condom use reduces the chance of transmission!) Try the simulation with a 10% chance of transmission.

    Aside: In reality, the chance a person transmits the disease is not uniform. For HIV/AIDS, the chance of transmission is much smaller in populations practicing safe sex than in populations practicing risky sex (or sharing needles). The transmission of HIV/AIDS is linked to viral load, which is high at first infection, and reduces over time -- but spikes when the individual is reinfected. For more information on HIV/AIDS epidemiology, check out Elizabeth Pisani's TED talk, "Sex, Drugs and HIV -- let's get rational". Due to the large variation in how HIV/AIDS is transmitted through disparate populations, HIV/AIDS is a difficult disease to model properly!
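
    The two new ingredients of this part, a transmission chance below 100% and the removal of a dead person's connections, can be sketched as follows. The names transmits, mark_dead, and MAX_PEOPLE, and the adjacency-matrix layout, are hypothetical and not taken from the lab's code. To reproduce test_viral.txt, the seeding and the order of the random calls would presumably have to match what test_viral.c expects, which this sketch does not know.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PEOPLE 8   /* hypothetical bound for this sketch */

/* Returns 1 with probability (approximately) p, and 0 otherwise. */
int transmits(double p)
{
    assert(p >= 0.0 && p <= 1.0);
    /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so p = 0.0 never
     * transmits and p = 1.0 always does. */
    return rand() / (RAND_MAX + 1.0) < p;
}

/* Marks person `who` as dead and deletes every connection to and from them. */
void mark_dead(int n, int adj[MAX_PEOPLE][MAX_PEOPLE], char state[], int who)
{
    int j;

    assert(n >= 0 && n <= MAX_PEOPLE && who >= 0 && who < n);
    state[who] = 'D';
    for (j = 0; j < n; j++) {
        adj[who][j] = 0;
        adj[j][who] = 0;
    }
}
```

    A call such as transmits(0.1) would then stand in for the unconditional transmission of the 100% case.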

  3. As usual, you should ensure you have no memory leaks, and diff your output with the provided output.


Part Three: Modelling Vaccination (HPV)

  1. Like the previous parts: Download test_vaccination.c and test_vaccination.txt (which is the expected output of test_vaccination.c).

  2. Before a vaccine was created for HPV types 6 and 11 (the most common causes of genital warts), these STIs could be modeled with our SIS model. However, now we have people who are immune to infection. For this step, we will be creating a simulation involving vaccination (simulating disease immunity).

    Start by creating a method, vaccinate_person, which vaccinates a single person (given as a parameter). A vaccinated person has the status 'V' and cannot be infected.

    Then, create a method, vaccinate_n_people, which will vaccinate n random susceptible people. (To keep disease propagation simple, we won't be vaccinating infected people.)

    Next, we want to know if our population now has herd immunity -- i.e. it can withstand any person getting infected. For our purposes, we have herd immunity if and only if no two susceptible individuals share a connection. Create a method has_herd_immunity that will return 1 if we have herd immunity, else 0.

    Create a method, vaccinate_randomly, which keeps vaccinating random susceptible people in our graph until we have herd immunity. It should return the number of people we had to immunize to get herd immunity.
    In our example, the smallest fraction of the population we need to vaccinate is 50%. But since we are randomly vaccinating people, we actually need to vaccinate 75% of the population. If, for instance, only Persons #1 and #2 are vaccinated, we do not have herd immunity (#0 and #3 are connected, and they both begin in state S).


    Every possibility in which 3 of the 4 people are vaccinated results in the disease being contained. But not every possibility in which 2 of the 4 people are vaccinated results in the disease being contained! Hence, we need to vaccinate at least 3 of the 4 people.
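
    The herd immunity check and the random vaccination loop described above might look roughly like this. The function names come from the lab, but the signatures, the adjacency-matrix layout, and MAX_PEOPLE are assumptions made for illustration. The sketch also assumes the connections are symmetric.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PEOPLE 8   /* hypothetical bound for this sketch */

/* Returns 1 if no two susceptible ('S') people share a connection, else 0.
 * Assumes adj is symmetric, so each pair is only checked once. */
int has_herd_immunity(int n, int adj[MAX_PEOPLE][MAX_PEOPLE], const char state[])
{
    int i, j;

    assert(n >= 0 && n <= MAX_PEOPLE);
    for (i = 0; i < n; i++) {
        if (state[i] != 'S')
            continue;
        for (j = i + 1; j < n; j++) {
            if (state[j] == 'S' && adj[i][j])
                return 0;
        }
    }
    return 1;
}

/* Vaccinates random susceptible people until herd immunity is reached.
 * Returns the number of people vaccinated by this call. */
int vaccinate_randomly(int n, int adj[MAX_PEOPLE][MAX_PEOPLE], char state[])
{
    int vaccinated = 0;

    /* While the check fails, at least two susceptible people exist,
     * so the random draw below eventually hits one of them. */
    while (!has_herd_immunity(n, adj, state)) {
        int who = rand() % n;
        if (state[who] == 'S') {
            state[who] = 'V';
            vaccinated++;
        }
    }
    return vaccinated;
}
```

    On a hypothetical four-person ring 0-1-2-3-0 (chosen only because it is consistent with the description above, not taken from the lab files), vaccinating #1 and #2 fails the check, vaccinating #0 and #2 passes it, and vaccinate_randomly returns either 2 or 3. Re-running the full check after every vaccination is the simplest approach, not necessarily the fastest.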


  3. As we did in Lab 9, we will also profile our code for this part. Compile test_vaccination with -pg, and look at the gprof data for your executable. Compare it to test_vacc_profile.txt. Save your profiling data as vaccination_profile.txt. Your goal here is to have a total s/call for vaccinate_randomly that is less than or equal to 9.69 s when timed on remote.ecf.utoronto.ca.

    You will get an additional point during automarking if your total s/call for vaccinate_randomly is less than or equal to 4.15 s. (E.g. test_vacc_profile_fast.txt)

Part Four: Minimal Vaccination

  1. Like the previous parts: Download test_minimal.c and test_minimal.txt, the expected output of test_minimal.c.

  2. In Part Three, we vaccinated populations randomly until they reached herd immunity. This tends to require vaccinating more than 90% of the population. But if we vaccinate the right people, we can accomplish this with far fewer vaccinations. For example, in tiny_population all we needed was 50% of the population, if we got the right people!

    Create a method, vaccinate_minimal, which vaccinates the minimal set of people, and returns the minimum number of people needed to gain herd immunity. Like vaccinate_randomly, the method should result in the population being vaccinated -- in this case, using the minimum number of people. (You may find it helpful to write a method, devaccinate, which resets your population to a non-vaccinated state.)

    In graph theory, we refer to this problem as finding a "minimum vertex cover". This is an NP-hard problem (its decision version is NP-complete) -- a brute-force solution is recommended for this part!
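
    One way to picture the brute-force search: when nobody is infected, herd immunity means every connection has at least one vaccinated endpoint, so the vaccinated set is a vertex cover. The sketch below tries every subset of people and keeps the smallest one that covers all connections. It is an illustration under assumptions, not the lab's vaccinate_minimal: min_vertex_cover, count_bits, MAX_PEOPLE, and the bitmask representation (bit j of nbr[i] set when i and j are connected) are all hypothetical.

```c
#include <assert.h>

#define MAX_PEOPLE 20   /* hypothetical bound: 2^20 subsets is still quick */

/* Counts the set bits in mask. */
static int count_bits(unsigned long mask)
{
    int count = 0;

    while (mask) {
        mask &= mask - 1;   /* clears the lowest set bit */
        count++;
    }
    return count;
}

/* Tries every subset of n people and returns the size of the smallest
 * vertex cover. On return, *best_mask has bit i set if person i is in
 * that cover. nbr[i] has bit j set if i and j are connected. */
int min_vertex_cover(int n, const unsigned long nbr[], unsigned long *best_mask)
{
    unsigned long all, mask;
    int best = n;
    int i;

    assert(n >= 0 && n <= MAX_PEOPLE);
    all = (1UL << n) - 1UL;
    *best_mask = all;   /* vaccinating everyone always works */

    for (mask = 0; mask <= all; mask++) {
        int size = count_bits(mask);
        int covers = 1;

        if (size >= best)
            continue;   /* cannot improve on the best cover so far */
        for (i = 0; i < n && covers; i++) {
            /* Someone outside the subset must have no neighbour outside it. */
            if (!((mask >> i) & 1UL) && (nbr[i] & ~mask & all))
                covers = 0;
        }
        if (covers) {
            best = size;
            *best_mask = mask;
        }
    }
    return best;
}
```

    The running time doubles with every extra person, which is why this approach only suits small populations.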

Part Five: Bonus!

  1. Bonus part! Download test_bonus_large.c and test_bonus_huge.c. These run vaccinate_minimal on our large and huge populations, respectively. Bonus marks are available for finding the minimum vertex covers of these populations. Since finding an exact minimum is NP-hard, bonus marks are also available for good approximations. Up to ten bonus marks are available.

    Bonus marks are awarded as follows:

  2. Large Population

    For the large population, if you produce a solution that is 80% accurate you will get one bonus mark (e.g. your code comes up with a vertex cover of 43 or fewer; Elizabeth's best approximation of the minimum vertex cover is 35). If you produce 35 (or better!) you will get a second bonus mark. If your code manages to find a solution that is at least 80% accurate in under one minute, you will get another bonus mark. Note that we will kill your process after 10 minutes; your code must run in less than 10 minutes on remote.ecf.utoronto.ca.

    We will only mark your large_population if you hand in a file called mark_my_large_pop.txt. The file should be empty.
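
    The handout does not define "accurate" precisely. One reading that is consistent with the 43-versus-35 example is the ratio of the best known cover size to your cover size: 35/43 is about 81%, while 35/44 is about 79.5%. Under that assumed definition, the threshold can be checked with integer arithmetic. The function name is_accurate_enough is hypothetical, and the automarker may compute accuracy differently.

```c
#include <assert.h>

/* Returns 1 if a cover of size `found` is at least `percent`% accurate
 * relative to the best known size `best`, assuming accuracy is defined
 * as best / found. This definition is an assumption, not the lab's. */
int is_accurate_enough(int found, int best, int percent)
{
    assert(found > 0 && best > 0 && percent > 0 && percent <= 100);
    /* best / found >= percent / 100, rearranged to avoid floating point */
    return best * 100 >= found * percent;
}
```

    For example, is_accurate_enough(43, 35, 80) is 1 and is_accurate_enough(44, 35, 80) is 0.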

  3. Huge Population

    For the huge population, a total of *seven* more bonus marks are available. For accuracy:
    • 1 bonus mark for 80% accurate
    • 2 bonus marks for 85% accurate
    • 3 bonus marks for 90% accurate
    • 4 bonus marks for 95% accurate
    • 5 bonus marks for 100% accurate

    As for speed, we will kill your process after 20 minutes. If your program runs in less than 10 minutes and is at least 80% accurate, you will get yet another bonus mark. If your program runs in less than 5 minutes and is at least 80% accurate, you will get a final bonus mark. (And if your program runs in less than a minute and is 100% accurate you may be eligible for one million dollars.)

    We will only mark your huge_population if you hand in a file called mark_my_huge_pop.txt. The file should be empty.

  4. Some hints: a brute force solution will not work here; you will have to think about how to prune down the possible cases. Try pruning down a bit at a time, rather than all at once. Remember: you want to get working code first, and then optimize once it's working. Also note that you don't necessarily need to calculate all the possible cases upfront: you can do this as you go. I recommend that you work with large_population first, rather than jumping immediately to huge_population.

    It is also worth noting that your vaccinate_minimal can have multiple cases depending on the size of your input. For example, if n is less than 50, you may want to use your brute force solution from Part Four; for n <= 64 (large population) you may want to use a different approach entirely; for n > 64 (huge population) you may want yet another approach. Note that the input files we will be automarking you with will be similar to large_population.txt and huge_population (n=64 and n=512, respectively), but will be different.
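
    As one possible starting point for the larger inputs, a common heuristic is to repeatedly vaccinate whoever currently has the most uncovered connections. The sketch below is an assumption-laden illustration, not the intended solution: greedy_vertex_cover and the flat n*n matrix layout are hypothetical, the result is a valid cover but is not guaranteed to be minimal, and nothing here says how it would score against the accuracy thresholds above.

```c
#include <assert.h>
#include <stdlib.h>

/* Greedy heuristic for vertex cover. adj is a flat, symmetric n*n 0/1
 * matrix (adj[i * n + j] != 0 if i and j are connected). On return,
 * in_cover[i] is 1 for each chosen person. Returns the cover size, or
 * -1 if memory could not be allocated. Self-connections are ignored. */
int greedy_vertex_cover(int n, const unsigned char *adj, unsigned char *in_cover)
{
    int *degree;   /* degree[i]: uncovered connections touching person i */
    int i, j, size = 0;

    assert(n >= 0);
    degree = calloc(n > 0 ? (size_t)n : 1, sizeof *degree);
    if (degree == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        in_cover[i] = 0;
        for (j = 0; j < n; j++) {
            if (j != i && adj[i * n + j])
                degree[i]++;
        }
    }

    for (;;) {
        int pick = -1;

        /* Choose the person with the most uncovered connections. */
        for (i = 0; i < n; i++) {
            if (degree[i] > 0 && (pick < 0 || degree[i] > degree[pick]))
                pick = i;
        }
        if (pick < 0)
            break;   /* every connection is covered */

        in_cover[pick] = 1;
        size++;
        /* Connections from pick to people outside the cover are now covered. */
        for (j = 0; j < n; j++) {
            if (j != pick && adj[pick * n + j] && !in_cover[j])
                degree[j]--;
        }
        degree[pick] = 0;
    }

    free(degree);
    return size;
}
```

    A result like this could also serve as an initial upper bound when pruning an exact search, since any subset at least as large as a known cover can be skipped.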

Waiting to Talk to Your TA?

Get started on Report 5!