
CSC411/2515 Project 4 bonus: Tic-Tac-Toe with Self-Play

For this project, you will continue training an agent to play Tic-Tac-Toe using Policy Gradient.

Starter code

You should first complete Project 4, and use your code from Project 4 as starter code. You will need to use PyTorch 0.3+ for this project (you can use torch.__version__ to see which version you are using).

Part 1: X and O (up to 0.5 marks toward your final grade)

For Project 4, the problem was set up so that the machine learning agent is always the first player and the random agent is always the second player. Modify the code so that, during training, the machine learning agent moves first in 50% of the games and moves second in the other 50%.
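One way to get the 50/50 split is to flip a fair coin at the start of each episode and let the opponent move first when the coin says so. The sketch below is only an illustration on a hypothetical stand-alone board: `play_episode`, `random_move`, and `winner` are placeholder names that are not part of the Project 4 starter code, and the agent is passed in as a plain function rather than a policy network.

```python
import random

# All eight winning lines on a 3x3 board stored as a flat list of 9 cells.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    for a, b, c in WIN_LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0


def random_move(board, rng):
    """Pick a uniformly random empty cell."""
    return rng.choice([i for i, v in enumerate(board) if v == 0])


def play_episode(agent_move, opponent_move, rng):
    """Play one game with a coin flip deciding who moves first.

    Returns (agent_first, result), where result is 'win', 'lose' or 'tie'
    from the agent's point of view.
    """
    board = [0] * 9
    agent_first = rng.random() < 0.5      # the 50/50 coin flip
    agent_mark = 1 if agent_first else 2  # player 1 always moves first
    turn = 1
    for _ in range(9):
        mover = agent_move if turn == agent_mark else opponent_move
        board[mover(board, rng)] = turn
        w = winner(board)
        if w:
            return agent_first, ('win' if w == agent_mark else 'lose')
        turn = 3 - turn                   # switch between player 1 and 2
    return agent_first, 'tie'
```

Returning `agent_first` alongside the result makes it possible to split statistics by role later. If your state encoding assumes that the agent always plays the first player's mark, you will probably need to adjust it so that the policy can tell its own marks from the opponent's when it moves second.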

Train your agent, starting with random weights. Plot a training curve like the one you had in Project 4, along with the win rate across episodes when the ML agent is the first player and the win rate across episodes when the ML agent is the second player.
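To produce the two per-role win-rate curves, you need to record, for each episode, which role the agent played and how the game ended. A minimal sketch of that bookkeeping follows. It assumes a hypothetical `history` list of `(agent_first, result)` pairs collected during training; the function name and the `window` parameter are illustrative, not part of the starter code.

```python
def win_rates_by_role(history, window=1000):
    """Compute windowed win rates separately for each role.

    history: list of (agent_first, result) pairs, where agent_first is a
             bool and result is 'win', 'lose' or 'tie'.
    Returns a dict mapping 'first' / 'second' to a list of
    (episode_number, win_rate) points, one per window of episodes.
    """
    curves = {'first': [], 'second': []}
    counts = {'first': [0, 0], 'second': [0, 0]}  # [wins, games]
    for i, (agent_first, result) in enumerate(history, start=1):
        role = 'first' if agent_first else 'second'
        counts[role][1] += 1
        if result == 'win':
            counts[role][0] += 1
        if i % window == 0:
            for r, (wins, games) in counts.items():
                if games:  # skip a role that never came up in this window
                    curves[r].append((i, wins / float(games)))
            counts = {'first': [0, 0], 'second': [0, 0]}
    return curves
```

Each returned list can be passed to your plotting code as x and y values. Episodes left over after the last full window are ignored in this sketch.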

Is this an easier or harder problem than Project 4? How well does your agent perform? Include 5 games that your final trained agent plays against a random player.

Part 2: Self-Play (up to 0.5 marks toward your final grade)

Starting with your best set of weights from Part 1, train your tic-tac-toe agent by playing against a version of itself (instead of a random player).
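Which "version of itself" the agent plays against is a design choice. One possible option, assumed in the sketch below, is to play against a frozen copy of the policy that is refreshed every fixed number of episodes. `TinyPolicy`, `self_play_training`, `refresh_every`, and `train_one_episode` are hypothetical names used for illustration; `TinyPolicy` stands in for your policy network so that the example runs without PyTorch.

```python
import copy


class TinyPolicy(object):
    """Stand-in for a policy network; it only holds a list of weights."""

    def __init__(self, weights):
        self.weights = list(weights)


def self_play_training(policy, num_episodes, refresh_every, train_one_episode):
    """Train `policy` against a frozen copy that is refreshed periodically.

    train_one_episode(policy, opponent) is expected to play one game and
    update `policy` only; the opponent's weights stay fixed in between
    refreshes. Returns the last opponent snapshot and the refresh count.
    """
    opponent = copy.deepcopy(policy)  # independent copy of the weights
    refreshes = 0
    for episode in range(1, num_episodes + 1):
        train_one_episode(policy, opponent)
        if episode % refresh_every == 0:
            opponent = copy.deepcopy(policy)
            refreshes += 1
    return opponent, refreshes
```

`copy.deepcopy` also works on a PyTorch module, so the same loop structure should carry over to a real policy. Playing against the live policy, with no frozen copy, is another reasonable reading of the task.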

Plot a training curve like the one you had in Project 4. Note that the training curve here is not going to be as interpretable, since the opponent is improving at the same time as the agent. Plot the win rate against a random player with the ML agent as the first player, and with the ML agent as the second player (two plots).

How well does your agent perform? Include 5 games that your final trained agent plays against a random player, and 5 games that your final trained agent plays against itself.

What to submit

The project should be implemented using Python 2 or 3 and should be runnable on the CS Teaching Labs computers. Your report should be in PDF format. You should use LaTeX to generate the report, and submit the .tex file as well. A sample template is on the course website. You will submit at least the following files: tictactoe_bonus.py, tictactoe_bonus.tex, and tictactoe_bonus.pdf. You may submit more files as well. You may submit ipynb files in place of py files.

Reproducibility counts! We should be able to obtain all the graphs and figures in your report by running your code. Submissions that are not reproducible will not receive full marks. If your graphs/reported numbers cannot be reproduced by running the code, you may be docked up to 20%. (Of course, if the code is simply incomplete, you may lose even more.) Suggestion: if you are using randomness anywhere, use numpy.random.seed().
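As a sketch of the seeding suggestion, the helper below seeds Python's built-in `random` module and, when they are installed, NumPy and PyTorch. The name `seed_everything` is hypothetical, and which generators your code actually draws from depends on your implementation.

```python
import random


def seed_everything(seed):
    """Seed every random number generator that is importable here."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed
```

Calling `seed_everything(0)` once at the top of your script, before building the policy or playing any games, should make repeated runs draw the same random numbers.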

You must use LaTeX to generate the report. LaTeX is the tool used to generate virtually all technical reports and research papers in machine learning, and students report that after they get used to writing reports in LaTeX, they start using LaTeX for all their course reports. In addition, using LaTeX facilitates the production of reproducible results.

Available code

You are free to use any of the code available from the CSC411 course website.

Readability

Readability counts! If your code isn’t readable or your report doesn’t make sense, they are not that useful. In addition, the TA can’t read them. You will lose marks for those things.

Academic integrity

It is perfectly fine to discuss general ideas with other people, if you acknowledge ideas in your report that are not your own. However, you must not look at other people’s code, or show your code to other people, and you must not look at other people’s reports and derivations, or show your report and derivations to other people. All of those things are academic offences.