Git and the Centralized Workflow

Lesson 3 with Ian Carroll


Centralized Workflow

As your research project moves from conception, through data collection, modeling and analysis, to publishing and other forms of dissemination, its components can fracture, lose their development history, and, worst of all, become conflicted or lost.

This lesson explains a high-level strategy for organizing your collaborative workflow and introduces accompanying software and cloud solutions. This strategy for distributed work on a shared codebase, the centralized workflow, is widespread in collaborative research.

A central hub stores project files and their history. Researchers are spokes on the wheel, working on private copies of the project. Project integrity is maintained through rules enforced by the hub for synchronizing between hub and spokes.



Objectives for this lesson

Specific achievements



Git in the Shell

The namesake of GitHub is the command-line utility git. It performs the clone, push, pull, and merge procedures just mentioned, and many more.

The software has no GUI of its own, and works through commands given in the shell that always begin with “git”. The command to turn the “current folder” into a git repo is:

git init

Add files to git’s watchlist with the “add” command, then confirm what is staged with “status”:

git add <path>
git status

“Commit” updates the added files in a newly labeled version of your project’s history.

git commit -m "initial commit"
*** Please tell me who you are.

Run

  git config --global user.email "you@example.com"
  git config --global user.name "Your Name"

to set your account's default identity.
Omit --global to set the identity only in this repository.

fatal: empty ident name (for <(null)>) not allowed

Every commit needs an author. Follow git’s instructions, using a real email address so your commits can be associated with your GitHub account, and try again.

git commit -m "initial commit"
git status
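
The steps so far can be replayed end to end in a scratch directory. This is a minimal sketch, not part of the original lesson: the file name analysis.py and the identity values are placeholders, and the -c init.defaultBranch=master option is an assumption added so the first branch is named “master”, matching the output shown in this lesson, on newer versions of git as well.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Turn the current folder into a repository; the -c option pins the
# first branch name to "master" (an assumption, see the text above).
git -c init.defaultBranch=master init -q

# Identity for this repository only (no --global); placeholder values.
git config user.name "Your Name"
git config user.email "you@example.com"

# Create a file (hypothetical name), add it to the watchlist, commit it.
echo 'print("hello")' > analysis.py
git add analysis.py
git commit -q -m "initial commit"

# A clean working tree prints nothing in the short status format.
git status --short
```

Setting the identity without --global keeps the placeholder name out of your real configuration.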

Checkout the Log

Version control gives you access to the state of the repository at any previous commit. View this history in the log.

git log
commit <sha>
Author: <author>
Date:   <datetime>

    initial commit

Exercise 1

Edit your committed file with some small, breaking change. Create a second commit that includes this change, and make sure it shows up in the log.

Revert

Let’s investigate the most recent commit.

git show
commit <sha>
Author: <author>
Date:   <datetime>

    <message>

<diff>

The <sha>, or however many digits of it are needed, provides a unique label for each commit. Use "revert" to undo the changes introduced in a specified commit.

git revert --no-edit <sha>
[master <sha>] Revert <message>
 1 file changed, 1 insertion(+), 1 deletion(-)
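
A minimal sketch of the whole commit, break, revert cycle, assuming a scratch repository and a hypothetical one-line script named run.sh. The abbreviated label from “git rev-parse --short” stands in for the <sha> you would otherwise copy from the log.

```shell
# Scratch repository with a placeholder identity.
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q
git config user.name "Your Name"
git config user.email "you@example.com"

# First commit: a working one-line script (hypothetical file name).
echo 'echo "all good"' > run.sh
git add run.sh
git commit -q -m "initial commit"

# Second commit: a small breaking change, as in Exercise 1.
echo 'ecoh "all good"' > run.sh
git add run.sh
git commit -q -m "break the script"

# Capture the abbreviated label of the most recent commit.
bad=$(git rev-parse --short HEAD)

# Revert adds a third commit that undoes the changes of the second.
git revert --no-edit "$bad"

# The log now lists three commits, newest first.
git log --oneline
```

Note that revert does not erase the breaking commit; the history keeps both the mistake and its correction.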



A Plug for Reproducible Research

Reproducibility is a core tenet of the scientific method. Experiments are reported in sufficient detail for a skilled practitioner to duplicate the result.

The principle applies equally to modeling, analysis, and perhaps most of all to data synthesis.

Hallmarks of reproducible research:

Reviewable: All details of the method used are easily accessible for peer review and community review.
Auditable: Records exist to document how the methods and conclusions evolved, but may be private.
Replicable: Given sufficient resources, a skilled practitioner could duplicate the research without any guesswork.
Open: The originator grants permissions for reuse and extension of the research products.

Let your workflow help achieve these same goals:

Reviewable: Thoroughly comment scripts and share continuously with collaborators.
Auditable: Maintain project history to correct mistakes when necessary.
Replicable: Provide “one-click” file and data sharing of a streamlined analysis “pipeline”.
Open: Publicly release on GitHub (or similar) with (implied) open licensing.



What’s a GitHub?


Image by Atlassian / CC BY

The origin is the central copy of the project, a repository that lives on GitHub. Every member of the team uses a local copy of the entire project, called a clone.


Image by Atlassian / CC BY

Cloning is the initial pull of the entire project and all its history. In general, a worker pulls the work of other teammates from the origin when ready to incorporate their work, and she pushes updates to the origin when ready to share her own work.

A commit is a unit of work: any collection of changes to one or more files in the repository. A versioned project is like a tree of commits, although the current tree has just one branch. After a worker creates a clone, the local copy is viewing the same commit as the origin.


Image by Atlassian / CC BY

A pull, or initially a clone, applies commits copied from the origin to your local repo, syncing them up.




Image by Atlassian / CC BY

A push copies local commits to the origin and applies them remotely.
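
The hub and spokes can be tried without a GitHub account. The sketch below is an illustration under stated assumptions: a local bare repository (hub.git) stands in for the origin, and the directory names alice and bob, the file notes.txt, and the identities are all hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository stands in for the origin; a real one would live on GitHub.
git -c init.defaultBranch=master init -q --bare hub.git

# The first worker starts the project and pushes it to the hub.
git -c init.defaultBranch=master init -q alice
cd "$workdir/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "start notes"
git remote add origin "$workdir/hub.git"
git push -q -u origin master

# The second worker clones: an initial pull of the project and its history.
git clone -q "$workdir/hub.git" "$workdir/bob"
cd "$workdir/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "second thoughts" >> notes.txt
git commit -q -am "extend notes"
git push -q

# The first worker pulls to sync up with the hub.
cd "$workdir/alice"
git pull -q
git log --oneline
```

After the final pull both clones and the hub sit on the same commit, which is what “syncing up” means here.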




Image by Atlassian / CC BY



Create a GitHub Repository

  1. Sign in or create a GitHub account.

  2. Create a new repository on your GitHub page.

     1. Give the repo a name.
     2. Add a short “tag line” to jog your memory.
     3. Do not check any boxes or add any files, so that the repository starts out empty.

Empty repository

You have created an empty repository. The quick start information provides clues on how to see your first commits.

Configure your clone

To push and pull from your local repo to GitHub, you must configure your local repo with the URL of the remote repo. By convention, we call the central copy the “origin”.

git remote add origin <URL>

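Push your commit up to the origin. On this first push, name the remote and the branch; the -u option records origin/master as the branch to track, so a plain “git push” works from then on.

git push -u origin master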
Username for 'https://github.com': <username>
Password for 'https://<username>@github.com': 
Counting objects: <progress>
Delta compression using up to 4 threads.
Compressing objects: <progress>
Writing objects: <progress>
<stats>
remote: Resolving deltas: <progress>
To 'https://github.com/<username>/<repo>.git'
   <sha>..<sha>  master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
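
To rehearse the remote configuration offline, a local bare repository can play the part of the empty GitHub repository. This is a sketch under that assumption: the path to origin.git replaces the <URL>, no username or password is requested, and the file contents are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the empty GitHub repository (a local bare repo, not a URL).
git -c init.defaultBranch=master init -q --bare origin.git

# A local repository with one commit.
git -c init.defaultBranch=master init -q project
cd "$workdir/project"
git config user.name "Your Name"
git config user.email "you@example.com"
echo "# My Project" > README.md
git add README.md
git commit -q -m "initial commit"

# Register the remote under the conventional name "origin".
git remote add origin "$workdir/origin.git"
git remote -v

# First push: name the remote and branch; -u records the upstream.
git push -q -u origin master

# Later pushes can rely on the recorded upstream.
echo "More to come." >> README.md
git commit -q -am "expand README"
git push -q

# Show which remote branch master tracks.
git rev-parse --abbrev-ref 'master@{upstream}'
```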

Take a look at the repository on GitHub.

GitHub Editor

The online editor is good for quick and easy fixes and for working on documentation. It’s a bad place to modify code, because the code is not tested before reaching the origin. It’s great for creating a project README.

Exercise 2

Create a new file called “README.md” and add the following content on separate lines with a blank line in between.

  1. A title, preceded by # (the markdown “level 1” heading)
  2. An “About” section, preceded by ## (the markdown “level 2” heading)
  3. A “Contributors” section, preceded by ##
  4. Your name, preceded by - (the markdown bulleted list)

As you go, use the Preview tab to see the result of rendering your Markdown to HTML.
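
For comparison with what you typed in the editor, here is one possible shape for the file, written from the shell into a scratch directory. The title and the name are placeholders, and this is only a sketch of the four elements listed in the exercise.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a README with the four elements from the exercise;
# the title and the name are placeholders.
cat > README.md <<'EOF'
# My Project

## About

## Contributors

- Your Name
EOF

# Count the level 2 headings as a quick check.
grep -c '^## ' README.md
```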



Merging

An essential component of the centralized workflow is the ability to merge commit histories that have diverged. Each fork in the log has to be re-integrated, and git does this automatically through merging.

git add <path>
git commit -m 'feel the learn'
[master <sha>] feel the learn
 5 files changed, 955 insertions(+)

Merge commits most commonly arise when a commit shows up on GitHub that isn’t in your local clone, as in the current situation.


Image by Atlassian / CC BY

Even though these changes do not conflict, GitHub won’t allow you to push. Take a moment to read the message; it gives a good explanation of what has happened.

git push
To https://github.com/<username>/<repo>.git
 ! [rejected]        master -> master (fetch first)
 error: failed to push some refs to 'https://github.com/<username>/<repo>.git'
 hint: Updates were rejected because the remote contains work that you do
 hint: not have locally. This is usually caused by another repository pushing
 hint: to the same ref. You may want to first integrate the remote changes
 hint: (e.g., 'git pull ...') before pushing again.
 hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Merge Locally

The origin does not even attempt to reconcile diverging commit histories; it does not matter that the diverging commits affect separate files. In order to preserve the repo, the contributor is always responsible for “overseeing” the merge on a local clone.

Take the Hint!

git pull
remote: Counting objects: <progress>
remote: Compressing objects: <progress>
remote: <stats>
Unpacking objects: <progress>
From https://github.com/<username>/<repo>
   <sha>..<sha>  master     -> origin/master
Auto-merging README.md
Merge made by the 'recursive' strategy.
 README.md | 1 +
 1 file changed, 1 insertion(+)

The message tells you about any changes made by this merge commit, which seamlessly integrates changes to the same file by multiple authors.
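
The rejected push and the local merge can be reproduced with a bare repository standing in for GitHub and a second clone standing in for the online editor. The names hub.git, owner, web and model.py are hypothetical. One further assumption: recent versions of git may refuse a plain “git pull” on diverged histories until told how to reconcile them, so the sketch passes --no-rebase to request a merge, and --no-edit to accept the default merge message.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q --bare hub.git

# The owner's repository with two files, pushed to the hub.
git -c init.defaultBranch=master init -q owner
cd "$workdir/owner"
git config user.name "Owner"
git config user.email "owner@example.com"
echo "# Project" > README.md
echo "x = 1" > model.py
git add README.md model.py
git commit -q -m "initial commit"
git remote add origin "$workdir/hub.git"
git push -q -u origin master

# A second clone plays the part of an edit made on GitHub.
git clone -q "$workdir/hub.git" "$workdir/web"
cd "$workdir/web"
git config user.name "Web Editor"
git config user.email "web@example.com"
echo "## About" >> README.md
git commit -q -am "add About heading"
git push -q

# Meanwhile the owner commits to a different file; the histories diverge.
cd "$workdir/owner"
echo "y = 2" >> model.py
git commit -q -am "extend model"

# The hub rejects the push because it holds a commit the owner lacks.
if ! git push -q 2>/dev/null; then
    echo "push rejected: pull first"
fi

# Pull merges the remote commit locally; then the push goes through.
git pull -q --no-rebase --no-edit
git push -q
git log --oneline --graph
```

The graph in the final log shows the fork and the merge commit that re-integrates it.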



Working with Collaborators

True collaboration goes deeper than commenting on a final report, but integrated work on a project from start to finish raises workflow challenges.

Centralized workflows, managed by git, help solve these challenges.

Project Integrity

Note, version control works really well with text. Non-textual components of your project (e.g. large or binary data) rarely live in a repository. Use cloud storage for more static files and a database for dynamic records.
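
One common way to keep such files out of a repository is a .gitignore file. The lesson itself does not prescribe this; the sketch below simply assumes a hypothetical layout in which scripts sit at the top level and large or binary data sits under data/.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q

# Hypothetical layout: scripts are text, data/ holds large binary files.
mkdir data
echo "summary(1)" > analysis.R
printf 'binary stand-in' > data/survey.sqlite

# Ignore everything under data/ so it never reaches the repository.
echo "data/" > .gitignore

# Adding "everything" now skips the ignored directory.
git add .
git status --short
```

The .gitignore file is itself added and tracked, so collaborators inherit the same rule.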

## Collaborators

- <your name>
- My Neighbor

Our aim is to let your project collaborator replace “My Neighbor” with his or her name.

Commit it with git

Before you can commit changes involving a new file, you have to tell the version control system (that’s git!) what changes to include.

git add README.md
git commit -m 'just me so far!'

Push

Look at the git status and notice that your branch is ahead of origin/master! Push those commit(s) to your GitHub repo.
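
The “ahead” report can be seen in miniature with the short status format. This sketch assumes a local bare repository (hub.git) in place of GitHub and a hypothetical earlier commit containing model.py; the README content is the one shown above.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q --bare hub.git
git -c init.defaultBranch=master init -q project
cd "$workdir/project"
git config user.name "Your Name"
git config user.email "you@example.com"

# An earlier commit that has already been pushed (hypothetical file).
echo "x = 1" > model.py
git add model.py
git commit -q -m "initial commit"
git remote add origin "$workdir/hub.git"
git push -q -u origin master

# Add the README and commit it locally.
printf '## Collaborators\n\n- Your Name\n- My Neighbor\n' > README.md
git add README.md
git commit -q -m 'just me so far!'

# The first line of the short status reports the branch as ahead.
before=$(git status -sb | head -n 1)
echo "$before"

git push -q

# After the push, local and remote are level again.
after=$(git status -sb | head -n 1)
echo "$after"
```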

Collaborate!

The first step to collaborative workflows is granting access to the origin of your project. Introduce yourself to your neighbor, and decide which of you will be the “owner” and which the “collaborator”. The owner will need the collaborator’s GitHub username.

Add your neighbor as a collaborator!

Exercise 3

As the collaborator on your neighbor’s repository, you have permission to edit his or her “README.md”. Make sure you accept the invitation to collaborate in your email!

The text below shows where you’ll see the owner’s name if you’re looking at the right repository (not your own). The collaborator should edit the file in the owner’s repo, replacing “My Neighbor” with his or her own name.

## Collaborators

- <the owner's name>
- <your name>

Write a meaningful commit message while “saving” your work. Note that on the GitHub editor, there’s no distinction between save and commit. The owners should then pull the new commit into their local clone of the project.



Merge Conflicts

Diverging commits that do not affect the same files, or affect different lines within a file, can usually be merged automatically. If git cannot safely merge commits, it guides you through conflict resolution.

A “merge conflict” arises when two contributors change the same line of text, for example if you both add a project description.

The owner adds a description under “## About” in the local clone. Meanwhile the collaborator adds a description under “## About” using the GitHub editor in the owner’s repository.

## About

...

The owner commits his or her change, but receives an error message from git when attempting to pull.

git pull
CONFLICT (content): Merge conflict in <path>
Automatic merge failed; fix conflicts and then commit the result.

Any conflicted region is fenced off in the named files, and must be manually tidied up.

<<<<<<< HEAD:master
 ...
=======
 ...
>>>>>>>

Follow all the instructions in the original message (or ask again with a git status):

git status
You have unmerged paths.
 (fix conflicts and run "git commit")
 
Unmerged paths:
 (use "git add <file>..." to mark resolution)
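
A full conflict and its resolution can be rehearsed locally. In this sketch a bare repository (hub.git) stands in for GitHub and a second clone (collab) stands in for the collaborator’s edit in the online editor; the descriptions are invented, and --no-rebase and --no-edit are added on the assumption that recent versions of git need to be told to merge and that no editor should open.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git -c init.defaultBranch=master init -q --bare hub.git

# The owner starts a README with a placeholder description.
git -c init.defaultBranch=master init -q owner
cd "$workdir/owner"
git config user.name "Owner"
git config user.email "owner@example.com"
printf '## About\n\nTBD\n' > README.md
git add README.md
git commit -q -m "add About section"
git remote add origin "$workdir/hub.git"
git push -q -u origin master

# The collaborator rewrites the placeholder line and pushes first.
git clone -q "$workdir/hub.git" "$workdir/collab"
cd "$workdir/collab"
git config user.name "Collaborator"
git config user.email "collab@example.com"
printf '## About\n\nA study of rivers.\n' > README.md
git commit -q -am "describe project (collaborator)"
git push -q

# The owner changes the same line, commits, and pulls: conflict.
cd "$workdir/owner"
printf '## About\n\nA study of lakes.\n' > README.md
git commit -q -am "describe project (owner)"
if ! git pull -q --no-rebase --no-edit 2>/dev/null; then
    git status --short      # "UU README.md" marks the unmerged path
fi

# Resolve by hand: replace the fenced region with the agreed text.
printf '## About\n\nA study of rivers and lakes.\n' > README.md
git add README.md           # marks the conflict as resolved
git commit -q --no-edit     # concludes the merge
git push -q
```

Overwriting the file here is a shortcut for opening it in an editor, deleting the conflict markers, and keeping the text both contributors agree on.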

Exercise 4

Switch roles with your neighbor and repeat both Exercise 3 and the steps above to introduce and resolve a merge conflict.



Share and Contribute


