Lecture Notes: Version Control

Version 1.0.1

Learning Objectives

After today’s topic, students should be able to:

  • use git on their own computer (locally).
  • connect a local git repository to its counterpart on GitHub, using RStudio.
  • make commits in git using RStudio.
  • make a branch in git using RStudio or GitHub.
  • use collaborative GitHub features such as Issues and pull requests.

After this class, you should be able to start working on your Collaborative Project.

Resources

Today’s class is a digest of the following resources:

Video lectures:

Tutorials:

Some additional resources that you might find useful:

Test Your Understanding

  1. You’ve just successfully pushed your branch to GitHub! True or False: it’s still possible for the branch on GitHub to contain work that’s not on your computer.
  2. You’re ready to push some code you added to code.R to GitHub, but just found out that your teammate completely changed code.R on GitHub! True or False: If you pull the repository from GitHub, you’ll get an error because your code.R conflicts with GitHub’s code.R.
  3. True or False: If you’ve just changed a file on your computer, in order to push it to GitHub, you’ll first need to commit the file.

Instructor demo: Get Acquainted with GitHub

Repositories, Organizations, and Personal Accounts

A repository stores files and the history of the files; the usual convention is to use a single repository to organize a single project.

GitHub is a place where repositories can live online. Being online gives us a way to share and collaborate on projects, and it also serves as a backup for your project.

Example 1: Lucy’s clusterpval R package

Example 2: The STAT 545 webpage

The first repository lives under Lucy’s personal GitHub account. The second repository lives under the UBC-STAT organization. Organizations are useful when many people collaborate on many different projects that share a common theme.

Useful GitHub tips for the course

  • All of your projects in this class will live in the STAT 545 @ UBC Organization.

  • When you watch a GitHub repo, you get notifications when things happen in it. So if you “Watch” the STAT 545 webpage repo, you will get email notifications when I update the site!

  • The Issues page on a GitHub repo is a forum where GitHub users can bring up issues related to the repository. Our communication guidelines suggest that you post an Issue on your homework repo if you have concerns about your grade.


Your turn: working on GitHub

Try making your own GitHub repository and editing it on GitHub! (This exercise is slightly adapted from Data Carpentry.)

  1. Go to your GitHub profile. For example, mine is https://github.com/lucylgao.

  2. Create a new GitHub repository by clicking the + symbol in the top bar and using the dropdown menu to create a new repository.

  3. Name your repository stat-545-demo-YOUR-NAME. In the description, write “STAT 545 Demo”. Check the box to initialize the repository with a README file.

  4. You are now redirected to the repository main page. The repository page tells you that you have 1 commit. Click on it to get to the history page. This shows all the changes that have been tracked for the files in the repository so far.

  5. Go back to your repository main page. Click on README.md, then click “edit this file”. Add the following information into the README.md file:

    • Your name
    • What kind of scientist do you tell people you are at dinner parties?
  6. Commit your changes: click the commit changes button, and briefly summarize the changes you’ve made in the Commit message.

  7. Check out the GitHub commit history again. Has anything changed?


Instructor demo: working locally, and synchronizing with GitHub

We will go through the Data Carpentry tutorial for this together. It demonstrates how to keep and work on the project files in a local repository on your machine, and how to keep that repository in sync with a remote GitHub repository.
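
For anyone curious about what the RStudio buttons do under the hood, here is a minimal sketch in Python of one round of saving and synchronizing work. By default it only builds the list of git commands and prints them. The helper names sync_plan and run_plan, the file name README.md, the remote name origin, and the branch name main are assumptions for illustration; check what your own repository uses before running anything.

```python
import subprocess


def sync_plan(paths, message, remote="origin", branch="main"):
    """Return the git commands for one stage/commit/pull/push round trip.

    The remote and branch defaults are assumptions; adjust them to your repo.
    """
    return [
        ["git", "add", *paths],            # stage the changed files
        ["git", "commit", "-m", message],  # record them in the local history
        ["git", "pull", remote, branch],   # bring in teammates' work first
        ["git", "push", remote, branch],   # then publish your commits
    ]


def run_plan(plan, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Dry run: show the commands without touching any repository.
    for cmd in sync_plan(["README.md"], "Describe the demo in the README"):
        print(" ".join(cmd))
```

Passing the plan to run_plan together with the folder of a local repository would actually execute the commands there. Pulling before pushing is what keeps the push from being rejected when a teammate has pushed in the meantime.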

Instructor demo: merge conflicts

Merge conflicts happen when we’ve created multiple versions of files that can’t be obviously combined into one definitive version.

Here is an example of something that would not cause a merge conflict:

  • At 9am, my TA pulls from the STAT 545 repository, makes a local change to the course dashboard, and commits and pushes her changes.
  • At 10am, I forget to pull from the STAT 545 repository, and start working on the Day 1 notes locally.
  • When I commit and try to push, Git rejects the push, because I wasn’t working off of the “freshest” version of the STAT 545 repository. Once I pull, though, Git merges the two histories automatically: my TA and I changed different parts of the project, so it can fairly seamlessly figure out that the right thing to do is to add my changes to the Day 1 notes on top of the current version of the STAT 545 repository. After that, my push succeeds.

Here is an example of something that WOULD cause a merge conflict:

  • At 9am, my TA and I both pull from the STAT 545 repository.
  • At 10am, my TA changes the front page to say “STAT 555 @ UBC”, and commits/pushes those changes.
  • At 11am, I change the front page (without pulling!!!) to say “STAT 777 @ UBC”.
  • When I commit and push, Git doesn’t know what to do. Should it make the version that says “STAT 555 @ UBC” or “STAT 777 @ UBC” the definitive version? The push will fail, and when I pull, Git will tell me that I need to fix the conflict and then commit the result.
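
The difference between the two scenarios can be sketched with a toy three-way merge in Python. This is a simplified illustration, not how Git is actually implemented: it assumes all three versions have the same number of lines and compares them position by position, whereas Git works on changed regions of a diff. The function name merge_lines and the sample file contents are made up for this sketch.

```python
def merge_lines(base, mine, theirs):
    """Toy three-way merge of three equal-length lists of lines.

    base is the last version both people pulled. Returns (merged, conflicts),
    where conflicts lists the line positions both sides changed differently.
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (changed identically or not at all)
            merged.append(m)
        elif m == b:      # only their side changed this line
            merged.append(t)
        elif t == b:      # only my side changed this line
            merged.append(m)
        else:             # both changed it, in different ways
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


if __name__ == "__main__":
    base = ["STAT 545 @ UBC", "Dashboard: draft", "Day 1 notes: TODO"]

    # Scenario 1: we edit different lines, so the merge is automatic.
    ta = [base[0], "Dashboard: final", base[2]]
    me = [base[0], base[1], "Day 1 notes: done"]
    print(merge_lines(base, me, ta))

    # Scenario 2: we both edit the title line, so a human has to decide.
    ta = ["STAT 555 @ UBC", base[1], base[2]]
    me = ["STAT 777 @ UBC", base[1], base[2]]
    print(merge_lines(base, me, ta))
```

The key ingredient is the common starting version: a change is only ambiguous when both people moved the same line away from it in different directions.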

How do we fix this?

  1. Pull.

  2. Open the file that caused the merge conflict. You should see something like this:

<<<<<<< HEAD
STAT 777 @ UBC
=======
STAT 555 @ UBC
>>>>>>> 526363991d21ed20e7e0c57b5e99d944ac5ce5aa

  3. The stuff below the <<<<<<< and above the ======= is what was in your local version; the stuff below the ======= and above the >>>>>>> is what was in the conflicting remote version. Decide what you want to have on the offending line (e.g. “STAT 555 @ UBC”), and replace the whole block of text above, marker lines included, with that.

  4. Save and commit the file. (An informative message here might be “Fixing a merge conflict.”) You should now be able to successfully push!
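
To make the structure of a conflict block concrete, here is a small Python sketch that does by program what you would normally do by hand in the editor: it keeps one side of every conflict block and drops the marker lines. The function name resolve_conflicts and the sample text are made up for illustration, and the sketch assumes the default marker style shown above. In practice you should read each conflict and decide case by case rather than blindly keeping one side.

```python
def resolve_conflicts(text, keep="ours"):
    """Keep one side of every conflict block in text and drop the markers.

    keep="ours" keeps the lines between <<<<<<< and =======;
    keep="theirs" keeps the lines between ======= and >>>>>>>.
    """
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    out = []
    side = None  # None outside a conflict block, else "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>") and side == "theirs":
            side = None
        elif side is None or side == keep:
            out.append(line)  # ordinary line, or a line from the chosen side
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical file contents after a pull that hit a conflict.
    conflicted = "\n".join([
        "Course website",
        "<<<<<<< HEAD",
        "Welcome! (my local edit)",
        "=======",
        "Welcome! (my teammate's edit)",
        ">>>>>>> 1a2b3c4",
        "See the syllabus for details.",
    ])
    print(resolve_conflicts(conflicted, keep="theirs"))
```

Whichever way the block is resolved, the result still has to be saved and committed, exactly as in the last step above.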


Your turn: Branching, merging, and pull requests

Find a partner. In this exercise, we will learn how to have different team members save their work separately on branches, and how to merge those changes to the main project branch.

  1. Teammate A adds Teammate B to their stat-545-demo-YOUR-NAME repository as a collaborator. (Go to Settings from the main repo page, then Access => Collaborators.) This should send Teammate B an email with a collaboration invitation, which Teammate B should accept.

  2. Teammate B clones Teammate A’s stat-545-demo-YOUR-NAME repository.

  3. Teammate B will make a new branch in Teammate A’s stat-545-demo-YOUR-NAME repository. You can do this either on GitHub, by clicking on “1 branch” on the repo homepage and then the green “New branch” button, or on your own computer in RStudio with the “New Branch” button. If you create the branch on GitHub, pull afterwards so that it shows up on your computer.

  4. On their own computer, Teammate B will create a new file in the newly created branch, then commit and push it.

  5. Teammate B will start a pull request (basically a request to merge content into the main branch) on GitHub, by going to “Pull Requests” -> “New Pull Request” and selecting the branch you intend to merge into the main branch. In this pull request, include a title and a comment indicating (at a high level) what the change is about.

  6. Teammate A will follow the instructions here to merge the pull request.
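
For reference, Teammate B's local steps (3 and 4) correspond roughly to the git commands built by the Python sketch below. The function name branch_plan, the branch name, the file name, and the remote name origin are all hypothetical. The sketch uses git switch -c, which needs Git 2.23 or newer; on older versions, git checkout -b does the same job. The pull request itself is still opened on GitHub.

```python
def branch_plan(branch, new_file, message, remote="origin"):
    """Return the git commands for adding one file on a new branch.

    The remote default is an assumption; adjust it to your repository.
    """
    return [
        ["git", "switch", "-c", branch],        # create the branch and move onto it
        ["git", "add", new_file],               # stage the new file
        ["git", "commit", "-m", message],       # commit it on the new branch
        ["git", "push", "-u", remote, branch],  # publish the branch and track it
    ]


if __name__ == "__main__":
    # Dry run with made-up names: print the commands instead of running them.
    for cmd in branch_plan("teammate-b-notes", "notes.md", "Add notes file"):
        print(" ".join(cmd))
```

Because the commit lands on its own branch, the main branch stays untouched until Teammate A merges the pull request.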

Too easy? Then either switch roles, or try creating a pull request that causes a merge conflict and resolving it!