Using Git Effectively

Hello! If you're on this webpage, you're likely part of the Students Developing Software team or working on another git-based project with me. Because I work with students with a broad range of experience with git and GitHub, including none at all, I've put together this page as a reference of processes and tips for using git effectively for your work. It's not meant to be comprehensive—see the Resources section for more information. Rather, you should use this page when first starting out to make sure you have git set up, and then consult this page periodically when you have questions. I have attempted to provide some tips and solutions to some common problems that arise.

Important warning: all instructions in this guide use the terminal, a text-based way of interacting with your computer, and with git and GitHub in particular. There are graphical ways of using git/GitHub (e.g., PyCharm, GitHub Desktop), but you should NOT use these tools for git as part of SDS. Using the terminal can be a bit tricky at first, but it forces you to learn exactly what happens when you use git/GitHub, rather than hiding actions behind a graphical user interface. Get used to using these commands, and you'll go a long way to mastering git and GitHub!

Getting started

  1. Install git.

  2. Create a GitHub account.

  3. Open a terminal (or Git Bash on Windows), and set the following configuration options (replace the generic name and email with your GitHub info):

    $ git config --global user.name "Test User"
    $ git config --global user.email "email@example.com"
    
  4. Find your assigned SDS project on GitHub. (This can be done with a quick online search.)

  5. Work through the Fork A Repo GitHub guide, replacing their octocat/Spoon-Knife repository with your project's repository. Make sure that after you've done the final step, you have both an "upstream" and "origin" remote.

    $ git remote -v
    > origin    https://github.com/YOUR-USERNAME/YOUR-FORK.git (fetch)
    > origin    https://github.com/YOUR-USERNAME/YOUR-FORK.git (push)
    > upstream  https://github.com/ORIGINAL-OWNER/ORIGINAL-REPOSITORY.git (fetch)
    > upstream  https://github.com/ORIGINAL-OWNER/ORIGINAL-REPOSITORY.git (push)
    
  6. Run the following commands in your project repository, while on your master branch, to set master to track changes from the "upstream" (i.e., main project) repository:

    $ git fetch upstream
    $ git branch -u upstream/master
    

    Throughout your time contributing to your project, your master branch should be an up-to-date copy of the upstream/master branch. You'll periodically "synchronize" this master branch with upstream/master to obtain new changes to the project made by other students.

  7. Finally, send your GitHub username to your project supervisor (usually me) so that you can be added as a developer to the organization!
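As a rough check of steps 5 and 6, the sketch below rebuilds the same layout in a throwaway directory and confirms that master tracks upstream/master. It is only an illustration: the local directories `upstream`, `fork.git`, and `work` are hypothetical stand-ins for the main project, your GitHub fork, and your local clone, and the identity variables exist only so the sandbox can create a commit.

```shell
#!/bin/sh
# Sandbox sketch: local directories stand in for the GitHub repositories.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# "upstream": stands in for the main project repository.
git -c init.defaultBranch=master init -q upstream
git -C upstream commit -q --allow-empty -m "Initial commit"

# "fork.git": stands in for your fork on GitHub.
git clone -q --bare upstream fork.git

# "work": your local copy, cloned from the fork (step 5).
git clone -q fork.git work
cd work
git remote add upstream ../upstream

# Step 6: make master track upstream/master.
git fetch -q upstream
git branch -q -u upstream/master

# Check the result: both remotes are listed, and master tracks upstream.
git remote -v
tracking=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "master tracks: $tracking"
```

In your real repository, only the last three commands are needed to check your setup.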

A deeper look

After you have completed the above steps, there are three different copies of the project code that you have access to.

  1. The GitHub repository which you forked from. This is the version of the repository that is managed by your supervisor, and is the "definitive" version of the code. Anyone can access and read the files in the repository (which is why you were able to fork it in the first place), but only the project supervisor can modify the repository. If you followed the GitHub tutorial correctly, this repository will be named upstream when you type git remote -v.

  2. The GitHub repository which you created through a fork. This is your public version of the repository. Everyone can access it on GitHub, but only you are allowed to modify it. Use this repository to share your work with the outside world, including when you want to submit a change to the "upstream" repository. If you followed the GitHub tutorial correctly, this repository will be named origin when you type git remote -v.

  3. Your local copy of the repository. This is the one which consists of the files stored on your computer, not on a GitHub server. You have total control over this repository (it's on your machine, after all), but it is private to you, so no one else can see any changes you make here. This is what you obtained when you ran the git clone command.

Making a change: basic workflow

After you have a copy of the project on your computer, you can start making changes. Most of your changes will be small and follow the same basic workflow, which we describe here.

Setting up

Note that you should do these steps before making any changes to the code!

  1. Switch to your master branch: git checkout master.
  2. Make sure your master branch is up to date with the definitive repo's master branch: git pull upstream master.

    Note that if you correctly switched your branch to track upstream/master (following the Getting started instructions), you can just run git pull instead.

  3. Create and switch to a new branch to do your work on: git checkout -b <branch-name>.

    If you are working on a fix for a particular issue, name the branch issue-<X>, replacing <X> with the issue number. Otherwise, pick a short descriptive name, with all lowercase words separated-by-hyphens. We call this branch a feature branch.
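The three setup steps can be sketched as a single sequence. This is a sandbox illustration rather than something to run in your project: the `upstream` directory is a hypothetical stand-in for the main repository, the fork is left out to keep the example short, and `issue-42` is a made-up branch name.

```shell
#!/bin/sh
# Sandbox sketch of the "Setting up" steps; names are illustrative.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the main project, cloned with the remote named "upstream".
git -c init.defaultBranch=master init -q upstream
git -C upstream commit -q --allow-empty -m "Initial commit"
git clone -q -o upstream upstream work

# Another student's change lands in the main project after you cloned.
git -C upstream commit -q --allow-empty -m "Change by another student"

cd work
git config pull.rebase false      # merge on pull, as this guide recommends

git checkout -q master            # step 1: switch to master
git pull -q upstream master       # step 2: bring master up to date
git checkout -q -b issue-42       # step 3: create the feature branch

branch=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $branch"
git --no-pager log --oneline
```

The final log shows both commits, so the new branch starts from the latest upstream state.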

Committing your work

At this point (and only at this point) are you ready to make changes to your code. Git uses commits to record a logical change in the code. Git commits are quite flexible, and it will be up to you to decide what exactly constitutes a "logical change". The entirety of the change you make for a task might just be a single commit, or it might be several. Use your judgement when deciding when to commit, and try to err on the side of smaller commits.

With git, all commits you make are local to your computer, and are not sent to your public fork. This means that you don't need to worry too much about messing up a commit; even if you do, it won't be public, or affect anyone else! And it is possible to go back and edit your commit history later, though this is more advanced.

When you want to make a commit:

  1. Make sure your files have been saved (you don't want to commit stale changes, after all).
  2. Run git status. This should show you all of the files in your repo which you have modified, as well as which files are new or have been removed, since the last commit. Pay attention to this output -- if there are any files which you didn't expect to see, that should be investigated. All changes at this point are unstaged, meaning git has noticed a change, but is not prepared to commit it.
  3. Run git diff. This lets you see all of the changes you have made since the last commit. This can be used as a final quality control check (think of the usual English meaning of the word "commit"). I often catch extraneous debugging statements or accidental whitespace changes here, and fix them before the actual commit.
  4. Run git add <file1> [<file2> ...] for any files that have been modified/created that you want to commit, and similarly run git rm <file1> [<file2> ...] for files that you've deleted.

    Tips:

    • If you want to add all files in a folder, you can run git add <folder> rather than listing each file separately.
    • You can use glob syntax to specify patterns of files to commit. For example, git add *.py will add all .py files in the current directory to your commit.

  5. Run git status again. You should now see a list of files that have staged changes, which are changes that have been marked for committing. Once again, this is a good check to make sure you are only committing the changes you want.
  6. Finally, run git commit -m "type your commit message here". Your commit message should be a short but descriptive message describing the purpose of your commit.

    About pre-commit hooks: All SDS projects use pre-commit hooks to run code checks on changes on each commit. The first time you make a commit you'll see a message saying that the pre-commit checks are being installed, which will take a bit of time. Most of the checks will make automatic changes to your files (fixing style errors), but occasionally they will report issues that can't be fixed automatically, and you'll need to fix them manually.

    If the pre-commit checks report any issues, including when all of the issues have been fixed automatically, git will not actually commit your changes. If this happens, first make sure all of the issues have been fixed, and then repeat Steps 2--5 to add these fixes to your staged changes and then commit them.

  7. After making your commit, run a git status to check that your changes no longer appear as either staged or unstaged changes. That's because they're part of the git commit history now! If you now run git log, you should see the commit you just made as the most recent commit. You can make further changes, repeating the above steps every time you want to make a commit.
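The commit steps above can be sketched end to end in a scratch repository. The file names `hello.py` and `notes.txt` are hypothetical, and the sandbox has no pre-commit hooks installed, so the commit goes through on the first attempt, unlike what you may see in an SDS project.

```shell
#!/bin/sh
# Sandbox sketch of the commit steps; file names are illustrative.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q .
echo 'print("hello")' > hello.py
git add hello.py
git commit -q -m "Add hello script"

# Make a change we want to commit, plus a scratch file we do not.
echo 'print("goodbye")' >> hello.py
echo "scratch" > notes.txt

git status --short                 # step 2: hello.py modified, notes.txt untracked
git --no-pager diff                # step 3: review the change to hello.py
git add hello.py                   # step 4: stage only the file we want
git status --short                 # step 5: hello.py is now staged
git commit -q -m "Print a farewell message"   # step 6

remaining=$(git status --short)    # step 7: only the scratch file is left
echo "left after commit: $remaining"
git --no-pager log --oneline
```

Because notes.txt was never passed to git add, it stays out of the history and still shows up as untracked afterwards.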

Adding yourself as a contributor

If you are making your first contribution to the project, please also add yourself to the list of the project's contributors. This helps us keep a record of everyone who's contributed to each project! Each SDS project has a list of contributors in the repository, though you might need to do a bit of searching to find it. Make sure to respect the existing order of contributors when adding your own name.

Note about selecting files to commit

There are some shortcuts you can take to save yourself some typing. You can list files, directories, and glob patterns as arguments to git commit rather than running git add/rm first. Note that this only works for files git is already tracking; newly created files still need a git add. This is probably good enough for most times when you'll want to commit.

You may also have heard about the -a flag to git commit. This causes the commit to automatically include all modified and deleted files in the commit. Use this flag with caution. It is very easy to overlook files that you created but haven't added, or accidentally commit files that you don't actually want to commit.
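To make the first pitfall concrete, here is a small sandbox sketch in which git commit -a picks up a modified tracked file but silently leaves a newly created file behind. The file names are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: "git commit -a" skips files that were never added.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q .
echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "two" >> tracked.txt      # modify a file git already tracks
echo "new" > untracked.txt     # create a file that is never added

git commit -q -a -m "Commit with -a"

# Files recorded in the new commit, and what is still left in the working tree.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
leftover=$(git status --short)
echo "committed: $committed"
echo "left behind: $leftover"
```

Running git status before and after the commit is the easiest way to notice a file that was left behind like this.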

Sharing your changes

Because commits are local to your repository, you need to take some extra steps to share your changes with others. When you are ready to receive feedback on your work, do the following.

  1. Do a git status to check and make sure that you have no more changes left to commit.
  2. Update your branch with the latest version from master: git pull upstream master. It is important to make sure that your work is compatible with other updates to master which might have happened since you started working on your changes.
  3. Push your changes to your branch: git push origin <branch-name>. Note that <branch-name> should be the same as what you named your local branch, and that you should use origin (your fork) rather than upstream.
  4. Visit your repository webpage on GitHub, and click on "New Pull Request." Make sure you have selected the correct fork and branches: the base branch should be the definitive master branch, and compare should be the new branch where you did your work.

    Each SDS project uses a GitHub Pull Request Template to help you write informative pull request descriptions. Follow the instructions in the comments of the template to fill out each section.

    Notes:

    • GitHub uses [ ] to display a checkbox after you create the pull request; we use these to indicate the type of change you're making, and in the "Checklist" at the bottom of the pull request template. You can turn these checkboxes into a "checked" state by replacing the [ ] with [x], or by clicking on the checkbox manually after creating your pull request.
    • If your pull request resolves a GitHub issue (you will know if this is the case based on the task that was assigned to you), use a GitHub closing keyword to link your pull request to the issue. Then when your pull request is merged in, GitHub will close the linked issue automatically.
  5. Before creating the pull request, carefully review your file changes (by scrolling down below the pull request description). This may seem redundant because you'll be seeing all the changes you made, but it serves as a final check before you request that others review your work. If you find some things you want to change, cancel the pull request, and update your feature branch (see below).

  6. Then, create your pull request!
  7. Wait until the continuous integration checks pass (see the status at the bottom of the pull request page). If the checks do not all pass, you'll need to click on "Details" to investigate why—I encourage you to ask other students about this.
  8. After you have performed one last self-review of your code and ensured that all of the checks pass, request a review from me (david-yz-liu). That will send me a notification that your work is ready for review.

After making these changes, you can start working on new tasks, but remember to start back at the very beginning of this guide (right at Setting up).
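The command-line part of sharing your changes (steps 1 to 3) can be sketched in a sandbox. The directories `upstream` and `fork.git` are hypothetical local stand-ins for the main project and your GitHub fork, and `issue-42` is a made-up branch name; the pull request itself is created on the GitHub website and is not shown.

```shell
#!/bin/sh
# Sandbox sketch of steps 1-3 of sharing changes; names are illustrative.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q upstream
git -C upstream commit -q --allow-empty -m "Initial commit"
git clone -q --bare upstream fork.git      # stand-in for your fork
git clone -q fork.git work                 # your local copy
cd work
git remote add upstream ../upstream
git config pull.rebase false               # merge on pull, as this guide recommends

# Some committed work on a feature branch.
git checkout -q -b issue-42
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix issue 42"

git status --short                # step 1: prints nothing when all work is committed
git pull -q upstream master       # step 2: merge in the latest upstream master
git push -q origin issue-42       # step 3: publish the branch to your fork

# The fork now has the same commit as the local branch.
pushed=$(git -C ../fork.git rev-parse issue-42)
local_head=$(git rev-parse HEAD)
echo "fork: $pushed"
echo "local: $local_head"
```

Note that the push goes to origin, never to upstream, matching step 3.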

Modifying your fork

You may want to modify the code you pushed to your fork, either when reviewing your work before making a pull request, or after receiving feedback on a pull request. To do this, simply make new commits, and do another git push origin <branch-name>; the branch will be updated according to the new commits. If you have already made a pull request, it will update automatically with the new changes, so there is no need to make a new one.

Common questions/issues

I accidentally committed my changes to my master branch rather than a feature branch!

  1. First, create a new branch (following the naming guidelines described above):

    $ git checkout -b <branch-name>
    
  2. Then, switch back to your master branch.

    $ git checkout master
    
  3. The above two steps ensure that your commits have been saved in a new feature branch. The last step is to reset your master branch to be an exact copy of the upstream master branch. The --hard flag also resets the files on disk, so make sure you have no uncommitted changes first:

    $ git reset --hard upstream/master
    

To check your work, run git log and verify that your commits are no longer on your master branch. Afterwards, you may want to do the following:

  • Run git pull upstream master on both your master and feature branches to update them with the latest updates.
  • Switch back to your feature branch and make a pull request.
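The recovery can be tried safely in a sandbox first. In this sketch the `upstream` directory and the branch name `issue-7` are hypothetical. The reset is run with --hard so that the files on disk are reset along with the branch; without that flag the branch still moves, but the changed files stay in the working directory.

```shell
#!/bin/sh
# Sandbox sketch: move accidental commits from master to a feature branch.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

sandbox=$(mktemp -d)
cd "$sandbox"
git -c init.defaultBranch=master init -q upstream
git -C upstream commit -q --allow-empty -m "Initial commit"
git clone -q -o upstream upstream work
cd work

# The mistake: a commit made directly on master.
echo "oops" > feature.txt
git add feature.txt
git commit -q -m "Work that belongs on a feature branch"

git checkout -q -b issue-7                  # 1. save the commits on a new branch
git checkout -q master                      # 2. switch back to master
git reset -q --hard upstream/master         # 3. make master match upstream again

master_head=$(git rev-parse master)
upstream_head=$(git rev-parse upstream/master)
feature_count=$(git rev-list --count upstream/master..issue-7)
echo "master matches upstream: $master_head = $upstream_head"
echo "commits kept on issue-7: $feature_count"
```

The commit still exists on issue-7, which is why resetting master loses nothing.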

I was working on two different issues, but accidentally created my second feature branch off of the first one rather than off of my master branch (and made commits)!

When this happens, your second feature branch will contain the commits from both the first and second issue. This can be a bit tricky to resolve, depending on the complexity of the commits you've made, so we've provided a few different options.

  • Option 1: use git cherry-pick (works best when the second feature has a small number of commits).

    To do so, you should first do a git log to determine the hashes of the commit(s) that belong to the second issue you're working on, and record them.

    Then, switch back to master, then create a new branch off of master; this branch will be your corrected "Issue 2" branch. Then use git cherry-pick to "bring over" the commits for just the second issue onto your new branch.

  • Option 2: use git interactive rebase (git rebase -i).

    Git rebase is a powerful tool for rewriting the current branch's commit history, and in particular can be used to remove certain commits from the branch. Be warned, however, that rebase is a more advanced git feature, so if you are trying it out for the first time, I encourage you to first switch to a new branch so that any changes you make to this branch won't affect the original branch for the second feature.

  • Option 3: use git reset master to reset the commit history back to the master branch, but preserve the current state of all files. After doing this option, your repository should be in a state as if you've made changes for both issues you're working on, but have not committed any changes. (Do a git status to check this.)

    Then, you can "undo" the changes you made for the first issue (e.g., using git restore), and then commit just the changes for the second issue.
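As an illustration of Option 1, the sandbox sketch below recreates the mistake and then repairs it with git cherry-pick. The branch names `issue-1`, `issue-2`, and `issue-2-fixed` are hypothetical, and the script captures the commit hash directly instead of reading it from git log as you would in practice.

```shell
#!/bin/sh
# Sandbox sketch of Option 1 (git cherry-pick); branch names are illustrative.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b issue-1
echo "first" > one.txt
git add one.txt
git commit -q -m "Work on issue 1"

# The mistake: issue-2 is created from issue-1 instead of master.
git checkout -q -b issue-2
echo "second" > two.txt
git add two.txt
git commit -q -m "Work on issue 2"

# Hash of the commit that belongs to the second issue.
wanted=$(git rev-parse HEAD)

# Create a corrected branch from master and bring over only that commit.
git checkout -q master
git checkout -q -b issue-2-fixed
git cherry-pick "$wanted" > /dev/null

commits=$(git log --format=%s master..issue-2-fixed)
echo "commits on issue-2-fixed: $commits"
```

The corrected branch contains two.txt but not one.txt, so a pull request from it shows only the work for the second issue.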

Tips

This section includes a few tips to improve your git workflow. While not strictly necessary, I recommend going through this section at the start of the semester.

Using SSH for authentication

You can set up your GitHub account to use ssh keys instead of passwords for authentication. GitHub's documentation on connecting to GitHub with SSH covers how to do this.

Useful git commands

Below are some git commands that students often find useful. I recommend running git status before and after trying one of these commands to help you keep track of what's going on.

  • View the changes you have made on branch issue-1234 (git diff documentation):

    $ git diff --full-index master issue-1234
    
  • Temporarily undo local uncommitted changes, saving them for later (git stash documentation):

    $ git stash
    
  • Reapply previously stashed changes:

    $ git stash pop
    
  • Undo unstaged changes made at a specific path (git restore documentation):

    $ git restore <path1> [<path2> ...]
    
  • Unstage changes made at a specific path:

    $ git restore --staged <path1> [<path2> ...]
    
  • Delete a local branch, typically after your pull request has been merged in (git branch documentation):

    $ git branch -D <branch-name>
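If you would like to see a couple of these commands in action before using them on real work, here is a minimal sandbox sketch of git stash followed by git stash pop. The file name `file.txt` and its contents are hypothetical.

```shell
#!/bin/sh
# Sandbox sketch: stash an uncommitted change, then bring it back.
set -e
export GIT_AUTHOR_NAME="Test User" GIT_AUTHOR_EMAIL="email@example.com"
export GIT_COMMITTER_NAME="Test User" GIT_COMMITTER_EMAIL="email@example.com"

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=master init -q .
echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file"

echo "v2" > file.txt          # an uncommitted change

git stash -q                  # set the change aside
during=$(cat file.txt)        # the file is back to its committed contents

git stash pop -q              # reapply the change
after=$(cat file.txt)

echo "while stashed: $during"
echo "after pop: $after"
```

Between the two commands the working directory is clean, which is handy when you need to switch branches partway through a change.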
    

Useful GitHub features

Recommended git configuration options

I recommend the following git configuration options to help simplify your workflow. If you are an experienced git user you may not wish to use (all of) these settings.

Based on this blog post.

  • pull.rebase false: ensure that you trigger a merge (instead of a "rebase") when running git pull.

    $ git config pull.rebase false
    
  • merge.conflictStyle zdiff3: use a more helpful algorithm for displaying merge conflicts.

    $ git config merge.conflictStyle zdiff3
    
  • push.default current and push.autoSetupRemote true: when running git push, automatically create a branch on your fork with the same name as your current local branch.

    $ git config push.default current
    $ git config push.autoSetupRemote true
    
  • rerere.enabled true: enable git rerere to help with resolving merge conflicts.

    $ git config --global rerere.enabled true
    
  • diff.algorithm histogram: use a more helpful algorithm for displaying git diffs.

    $ git config --global diff.algorithm histogram
    
  • transfer.fsckobjects true: detect malformed data when fetching/receiving data.

    $ git config transfer.fsckobjects true
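To confirm that a setting took effect, you can read it back with git config --get. The sketch below does this in a scratch repository so that your real configuration is left untouched. Commands without --global, as used here, apply only to the repository you run them in.

```shell
#!/bin/sh
# Sandbox sketch: set two of the options above locally and read them back.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Without --global, these settings are stored in this repository only.
git config pull.rebase false
git config merge.conflictStyle zdiff3

# Read individual values back.
git config --get pull.rebase
git config --get merge.conflictStyle

# List everything stored in this repository's own configuration.
git config --local --list
```

Adding --global to a git config command stores the setting for all of your repositories instead.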
    

Resources