Exercises are done in groups of three. Incidentally, Georg Simmel, writing in the early 20th century, argued that everything changes when you go from 1 to 2 people (a dyad), and then again going from 2 to 3 (a triad). He said that because at 3 people it’s the first time you can be outvoted :) Check it out.
Arrange your chairs so that you can see each other’s screens. That means sitting kinda back to back, which is unnatural, but you do want to easily see what the others are seeing.
Access the LearnGitBranching visualizer or the github visualizer for pretty branching diagrams.
Group of three: Maintainer, Contributor A, Contributor B.
Maintainer creates new repository on github.
Contributor A and Contributor B log into github, find the new repository created by Maintainer and fork it.
Before running git clone, everyone needs to ensure they are in a folder that is not inside another repo. To do this run git status and you want to see a fatal error.
fatal: Not a git repository (or any of the parent directories): .git
If you do not see the fatal error, then you need to move up in your folder hierarchy until you are not in a repo. Use cd .. to go up one level.
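The same check can be scripted. This is only a minimal sketch: the helper name inside_git_repo is made up for illustration, and it relies on git rev-parse --is-inside-work-tree, which exits with an error when run outside a repository.

```sh
# Hypothetical helper: succeeds only if the given folder is inside a git work tree.
inside_git_repo() {
  git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

if inside_git_repo "$PWD"; then
  echo "Inside a repo: run 'cd ..' and check again before cloning."
else
  echo "Not inside a repo: safe to run git clone here."
fi
```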
Then clone your repository and change directory into it.
If set up properly, the git remote -v command will show two lines for Maintainer, but four lines for Contributor A and Contributor B:
Maintainer:
origin <maintainers_repo_url> (fetch)
origin <maintainers_repo_url> (push)
Contributor A and Contributor B:
origin <fork_repo_url> (fetch)
origin <fork_repo_url> (push)
upstream <maintainers_repo_url> (fetch)
upstream <maintainers_repo_url> (push)
Note that git automatically creates the origin remote as whatever URL was used with git clone. This has the odd implication that for the Maintainer their origin is the shared repository, while for Contributor A and Contributor B their origin is their individual forks.
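For Contributor A and Contributor B, the upstream lines only appear once the shared repository has been registered as a second remote. Here is a hedged sketch of that step, using local bare repositories as stand-ins for the GitHub URLs (the folder names shared.git, fork.git, and contributor are assumptions for the demo; on GitHub you would use <maintainers_repo_url> and <fork_repo_url>).

```sh
set -e
work=$(mktemp -d)                      # scratch area; nothing here touches real repos
cd "$work"

# Stand-in for the Maintainer's shared repository on GitHub.
git init -q --bare shared.git
# Stand-in for a Contributor's fork (on GitHub this is the Fork button).
git clone -q --bare shared.git fork.git 2>/dev/null

# Contributor: clone the fork (this becomes origin), then add the shared repo as upstream.
git clone -q fork.git contributor 2>/dev/null
cd contributor
git remote add upstream "$work/shared.git"

git remote -v                          # four lines: origin and upstream, fetch and push
```

The Maintainer skips the git remote add step, which is why they only see the two origin lines.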
Each exercise below assumes this setup in Basic forking workflow setup as the starting point.
(If starting here, confirm that your Group of three has the setup in Basic forking workflow setup as the starting point.)
The point of this exercise is to intentionally create conflicts in editing, to demonstrate how they show up in Pull Requests.
animals.txt with this content:
lion
tiger
leopard
turtle
Contributor A and Contributor B each edit the file by adding a different color to the first line in animals.txt (so it reads lion red or lion green). They add, commit, and push to their forks. Note that these edits are incompatible, so they will generate a conflict when we try to merge them below.
Contributor A and Contributor B should then go to github and look at their forks. They should create a pull request by hitting the ‘New Pull Request’ button. Note that the pull request will not say “Able to automatically merge” but create the PR anyway.
With everyone looking at Maintainer’s computer, Maintainer should refresh the github page for the shared repository and will see two pull-requests in the pull-request tab. Accept each one, resolving any conflicts that emerge.
They can choose whichever contribution they want (the original line, the line with the color from Contributor B, or the line with the color from Contributor A); whatever the file contains after removing the <<<<<<<, =======, and >>>>>>> lines is what will be in the shared repository.
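To see what those marker lines look like without involving GitHub, here is a small local sketch. It models each contributor's edit as a local branch (contributor_a and contributor_b are names made up for the demo) and merges them the way the Maintainer would; the GitHub web editor shows the same markers.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q conflict_demo && cd conflict_demo
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is master
git config user.name "Demo User"               # placeholder identity for the demo
git config user.email "demo@example.com"

printf 'lion\ntiger\nleopard\nturtle\n' > animals.txt
git add animals.txt
git commit -q -m "add animals"

# Each contributor's edit, modelled here as a local branch off master.
git checkout -q -b contributor_a
printf 'lion red\ntiger\nleopard\nturtle\n' > animals.txt
git commit -q -am "A: lion red"
git checkout -q -b contributor_b master
printf 'lion green\ntiger\nleopard\nturtle\n' > animals.txt
git commit -q -am "B: lion green"

# Maintainer accepts A (merges cleanly), then B (conflicts on the first line).
git checkout -q master
git merge -q contributor_a
git merge contributor_b || true                # the merge stops and reports a conflict
cat animals.txt                                # shows the three marker lines
markers=$(grep -c '^[<=>]\{7\}' animals.txt)   # count the marker lines

# Resolve by writing the line you want, with all marker lines removed.
printf 'lion green\ntiger\nleopard\nturtle\n' > animals.txt
git add animals.txt
git commit -q -m "merge B, keeping green"
```

Whatever is in animals.txt when the resolving commit is made is what ends up on master, markers included if you forget to delete them.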
git pull (since they have a direct clone of upstream). Contributor A and Contributor B have to first get the changes from upstream, then push them to their forks.
In the exercises above all the work was on the master branch. That was to keep things simple, but in reality new work should always be on a branch, usually called a (short-lived) “feature branch”. Pull requests then come from the feature branch to the master branch on the shared repo. This is true for Maintainer as well as for Contributor A and Contributor B; the only difference is that the Maintainer's pull request comes from a branch inside the shared repo, while those from Contributor A and Contributor B come from a branch in a different repository (their fork).
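Both ideas above (syncing a fork from upstream, then starting new work on a feature branch) can be tried locally. This is a sketch under assumptions: the bare repos shared.git and fork.git stand in for the GitHub repositories, and --ff-only is my addition so the pull refuses to do anything other than a simple fast-forward.

```sh
set -e
work=$(mktemp -d) && cd "$work"

# Local stand-in for the shared repo on GitHub.
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# Maintainer: a direct clone of the shared repo, so plain pull and push are enough.
git clone -q shared.git maintainer 2>/dev/null
cd maintainer
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"; git config user.email "maintainer@example.com"
echo "shared project" > README.md
git add README.md && git commit -q -m "add README"
git push -q origin master
cd "$work"

# Contributor: fork (modelled as a bare clone), clone of the fork, upstream remote.
git clone -q --bare shared.git fork.git
git clone -q fork.git contributor
cd contributor
git remote add upstream "$work/shared.git"

# Meanwhile the shared repo moves ahead (for example, a pull request was merged).
( cd "$work/maintainer" && echo "more" >> README.md \
  && git commit -q -am "update README" && git push -q origin master )

# Contributor: first get the changes from upstream, then push them to the fork.
git checkout -q master
git pull -q --ff-only upstream master
git push -q origin master

# New work then starts on a short-lived feature branch, not on master.
git checkout -q -b for-pull-request
```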
(If starting here, confirm that your Group of three has the setup in Basic forking workflow setup as the starting point.)
for-pull-request)
git push will fail, but the error message will show you a command that will run). Note that this is the fork for Contributor A and Contributor B but the shared repo for Maintainer, though it is called origin for everyone.
fatal: The current branch for-pull-request has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin for-pull-request
Copy the --set-upstream line and execute. Note that the “upstream” here has nothing to do with the “upstream” remote (which is a name for the shared repo). Here it means “which remote branch should be connected to this branch”, and almost always that is going to be an identically named branch in origin.
Total 0 (delta 0), reused 0 (delta 0)
remote:
remote: Create a pull request for 'for-pull-request' on GitHub by visiting:
remote: https://github.com/jameshowison/pp2019_pr_owner/pull/new/for-pull-request
remote:
To https://github.com/jameshowison/pp2019_pr_owner.git
* [new branch] for-pull-request -> for-pull-request
Branch for-pull-request set up to track remote branch for-pull-request from origin.
master
on the shared repo. All should do this (Maintainer, Contributor A, and Contributor B). (Note: if git status says “up-to-date” before you pull, even though your branch is behind master on the shared repo, that is because git status only compares against your local copy of origin/master, which is refreshed when you fetch or pull.)
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), done.
From https://github.com/jameshowison/pp2019_pr_owner
36cd034..6d0673d master -> origin/master
Updating 36cd034..6d0673d
Fast-forward
README.md | 1 +
1 file changed, 1 insertion(+)
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean
8. Delete the branch locally and on github

```sh
$ git branch -d for-pull-request
Deleted branch for-pull-request (was 01541c3).
$ git push origin --delete for-pull-request
To https://github.com/jameshowison/pp2019_pr_owner.git
 - [deleted]         for-pull-request
```
Here the scenario is that you’ve created a pull request but it hasn’t yet been accepted.
If you add additional commits to that branch then they will get added to the pull request. Remember, a pull request says, “Please come to this repo and get everything on this branch.” So it’s not the same as putting some commits in a zip file and mailing them; it’s more like putting things into a particular spot (like a mailbox or dead drop in a spy movie), and telling people to pick them up from there.
So you can always add additional things before the person comes to pick them up. This is useful because if there is a conversation around the pull request then you can easily update things. For example, if someone said “Please fix a typo or pull from upstream before we consider your pull request,” you’d be able to do so without opening another PR (just add, commit, push to your branch on your fork and the PR is updated).
However, it is an issue if you accidentally add new commits to a branch before a pull request is accepted. Now your pull request has two sets of commits: the first set you meant to include and the second set you didn’t. This mistake is particularly easy to make if you are developing on the master branch in your fork (which you shouldn’t do), but it also happens if you have more than one contribution that you are working on, as when you are doing something else while waiting for a PR to be accepted. If you’ve accidentally added too many files to your pull request (something that is easy to do if you use git add * or some variant) you’ll also find yourself needing to remove content from your PR.
First, take a look at this in the LearnGitBranching visualizer; I have created a level called Split Pull Request.
Things would be better if you had created a new branch for the first set of commits, then a second branch for your second set of commits, never adding either set to your master branch and following the “always work on a (short-lived) feature branch” rule. Then each set of commits would be “sent” through a different pull request.
          / Branch for first set.
         /
master -
         \
          \ Branch for second set
The grey sections below are commands, the white is output.
This is output.
Below, I make a new repo, create a README file and make a commit, and edit it once making another commit:
Initialized empty Git repository in /Users/howison/Documents/UTexas/Courses/PeerProduction/practice/.git/
[master (root-commit) d5eb1e8] Adding readme
1 file changed, 1 insertion(+)
create mode 100644 README
* d5eb1e8 (HEAD -> master) Adding readme
So now we have a single commit on a single branch, master. To create a new file we can use touch (which creates an empty file here, but that’s ok).
[master 749454e] add master2
1 file changed, 2 insertions(+)
git log --oneline --abbrev-commit --all --graph --decorate --color
* 749454e (HEAD -> master) add master2
* d5eb1e8 Adding readme
Now we have added a commit on master, so the HEAD -> master shows the ‘tip’ of the master branch.
Now we can create the feature branch (first_set_branch) for the first set and look at the commits we’ve made. Using git checkout with -b creates a new branch:
Switched to a new branch 'first_set_branch'
[first_set_branch d71ade9] first set edit
1 file changed, 1 insertion(+)
create mode 100644 first_set_file
* d71ade9 (HEAD -> first_set_branch) first set edit
* 749454e (master) add master2
* d5eb1e8 Adding readme
Now we can see that we have two branches (master and first_set_branch). HEAD shows us that we are currently on the first_set_branch branch.
Now we’re going to change back to master.
Switched to branch 'master'
* d71ade9 (first_set_branch) first set edit
* 749454e (HEAD -> master) add master2
* d5eb1e8 Adding readme
See how the HEAD changed, showing us that we’re looking at the master branch. Notice also that there is a commit (d71ade9) that isn’t in master.
Now we can create the second branch:
Switched to a new branch 'second_set_branch'
[second_set_branch 9dc0a23] first edit in second set
1 file changed, 1 insertion(+)
create mode 100644 second_set_file
* 9dc0a23 (HEAD -> second_set_branch) first edit in second set
| * d71ade9 (first_set_branch) first set edit
|/
* 749454e (master) add master2
* d5eb1e8 Adding readme
Now the visualization has changed a bit. We can see that there are two branches that “come off” master after 749454e. We can work independently, and could create pull requests for each branch separately. We are still on the second set’s branch, so let’s add a second file to it:
[second_set_branch 04d6967] added to second set
1 file changed, 1 insertion(+)
create mode 100644 second_set_file2
* 04d6967 (HEAD -> second_set_branch) added to second set
* 9dc0a23 first edit in second set
| * d71ade9 (first_set_branch) first set edit
|/
* 749454e (master) add master2
* d5eb1e8 Adding readme
And just for fun swap back to the first branch and add a second file there:
Switched to branch 'first_set_branch'
git log --oneline --abbrev-commit --all --graph --decorate --color
* 04d6967 (second_set_branch) added to second set
* 9dc0a23 first edit in second set
| * d71ade9 (HEAD -> first_set_branch) first set edit
|/
* 749454e (master) add master2
* d5eb1e8 Adding readme
[first_set_branch b74cb32] added second file to first set
1 file changed, 1 insertion(+)
create mode 100644 first_set_file2
* b74cb32 (HEAD -> first_set_branch) added second file to first set
* d71ade9 first set edit
| * 04d6967 (second_set_branch) added to second set
| * 9dc0a23 first edit in second set
|/
* 749454e (master) add master2
* d5eb1e8 Adding readme
Now the branching in this in-terminal visualization is a bit clearer.
Ok, so this is the right way to do it: with each file/commit on the correct branch (and thus separate PRs).
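As a recap, commands along these lines would produce a graph shaped like the one above. It is a sketch rather than a transcript of what I typed: the file contents are placeholders, your hashes will differ, and only the first commit on each branch is shown.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice && cd practice
git symbolic-ref HEAD refs/heads/master        # make sure the first branch is master
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "readme" > README
git add README && git commit -q -m "Adding readme"
echo "master2" >> README
git commit -q -am "add master2"

# First feature branch, started from the tip of master.
git checkout -q -b first_set_branch
touch first_set_file
git add first_set_file && git commit -q -m "first set edit"

# Second feature branch, also started from master (not from the first branch).
git checkout -q master
git checkout -q -b second_set_branch
touch second_set_file
git add second_set_file && git commit -q -m "first edit in second set"

git --no-pager log --oneline --all --graph --decorate
```

The important habit is the git checkout master before creating the second branch; without it, second_set_branch would start from first_set_branch and carry its commits along.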
Pretend we had never made the two additional branches but made those commits all on the master branch. Then if a pull request opened after the first set of files (starting at 51f2622) had not yet been accepted, our second set would have just piled on top and been added to the pull request.
git log --oneline --abbrev-commit --all --graph --decorate --color
* b5d4aff (HEAD -> master) added to second set
* dfa98b0 first edit in second set
* 43bc368 added second file to first set
* 51f2622 first set edit
* f945961 add master2
* 367e25a Adding readme
The problem here is that the top two commits (reading downward) are on master and should be on second_set_branch, and the third and fourth commits are on master and should be on first_set_branch.
So our challenge is to make this linear setup look like our branched setup above. There are a few routes we can take.
First we’re going to use the ability to create a new branch from an earlier point in history. By default git checkout -b newbranch will branch from the current HEAD, but we can tell it to branch earlier. So first we’re going to create our two branches as though we’d done it before we started working, which is when HEAD was at f945961.
Switched to a new branch 'first_set_branch'
* b5d4aff (master) added to second set
* dfa98b0 first edit in second set
* 43bc368 added second file to first set
* 51f2622 first set edit
* f945961 (HEAD -> first_set_branch) add master2
* 367e25a Adding readme
So now we can see (HEAD -> first_set_branch) at f945961 as the branching point for first_set_branch. However, the commits we need on that branch are not on it; we need to bring them over from master.
We can do that using git cherry-pick, pointing to each of the two commits we want to bring over. Below we can see (moving up the left) that we have a first_set_branch which has both of our needed commits. Note that this doesn’t “move” them from the master branch, but creates new commits with the same content.
[first_set_branch ef2a3ee] first set edit
Date: Tue Apr 10 17:42:47 2018 -0500
1 file changed, 1 insertion(+)
create mode 100644 first_set_file
* ef2a3ee (HEAD -> first_set_branch) first set edit
| * b5d4aff (master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 add master2
* 367e25a Adding readme
[first_set_branch 9f2b69a] added second file to first set
Date: Tue Apr 10 17:43:07 2018 -0500
1 file changed, 1 insertion(+)
create mode 100644 first_set_file2
* 9f2b69a (HEAD -> first_set_branch) added second file to first set
* ef2a3ee first set edit
| * b5d4aff (master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 add master2
* 367e25a Adding readme
So that’s good, because now we have the right commits on first_set_branch. (We also have them on master, but that’s not a problem as we’d be making our pull request from first_set_branch, not from master.)
We can do the same thing to create our separate second_set_branch, starting by pointing back at the same branching point:
git checkout -b second_set_branch f945961
Switched to a new branch 'second_set_branch'
* 9f2b69a (first_set_branch) added second file to first set
* ef2a3ee first set edit
| * b5d4aff (master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 (HEAD -> second_set_branch) add master2
* 367e25a Adding readme
[second_set_branch 0d8e90d] first edit in second set
Date: Tue Apr 10 17:43:35 2018 -0500
1 file changed, 1 insertion(+)
create mode 100644 second_set_file
[second_set_branch 0841579] added to second set
Date: Tue Apr 10 17:43:57 2018 -0500
1 file changed, 1 insertion(+)
create mode 100644 second_set_file2
* 0841579 (HEAD -> second_set_branch) added to second set
* 0d8e90d first edit in second set
| * 9f2b69a (first_set_branch) added second file to first set
| * ef2a3ee first set edit
|/
| * b5d4aff (master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 add master2
* 367e25a Adding readme
So now we show three branches coming off at f945961: master, first_set_branch, and second_set_branch. Each of first_set_branch and second_set_branch has just the commits and files that it should:
first_set_branch
master
* second_set_branch
README second_set_file second_set_file2
Switched to branch 'first_set_branch'
README first_set_file first_set_file2
So now if we create pull requests from each of those branches, they’ll contain just the content we wanted in each of them. Hurray!
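The whole split can be replayed in a scratch folder. The sketch below rebuilds a linear master like the one above and then separates it with git checkout -b <branch> <commit> and git cherry-pick. The commit_file helper and the variables holding commit ids are made up for the demo (your hashes will differ from the ones in the walkthrough), and I pass two commits to a single cherry-pick rather than running it twice.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice_split && cd practice_split
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"

# Hypothetical helper: create a one-line file and commit it.
commit_file() { echo "$1" > "$1"; git add "$1"; git commit -q -m "$2"; }

commit_file README "Adding readme"
echo "master2" >> README; git commit -q -am "add master2"
base=$(git rev-parse HEAD)                     # the branching point

# Everything piled onto master by mistake.
commit_file first_set_file   "first set edit";                 first1=$(git rev-parse HEAD)
commit_file first_set_file2  "added second file to first set"; first2=$(git rev-parse HEAD)
commit_file second_set_file  "first edit in second set";       second1=$(git rev-parse HEAD)
commit_file second_set_file2 "added to second set";            second2=$(git rev-parse HEAD)

# Branch from the earlier point, then copy over only the commits that belong.
git checkout -q -b first_set_branch "$base"
git cherry-pick "$first1" "$first2" >/dev/null
git checkout -q -b second_set_branch "$base"
git cherry-pick "$second1" "$second2" >/dev/null

git --no-pager log --oneline --all --graph --decorate
git ls-tree --name-only first_set_branch       # files on the first branch
git ls-tree --name-only second_set_branch      # files on the second branch
```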
Groups of 3. Nominate Maintainer, Contributor A, and Contributor B.
will_need_split
upstream from their will_need_split branch (including all four commits).
split_branch_1 and split_branch_2 send through separate pull requests with only the right commits/files in them.
Now we just have to consider what to do about the additional commits on master. This is going to depend on a few things. Do you have additional commits on master that you haven’t distributed among new branches? Do you want master to exactly reflect the history of master on upstream? How tolerant are you of “messy history”? It’s also going to depend on whether master had been pushed up and whether others might have built on top of it.
There are many options, here are two:
git revert creates “opposite” commits that undo any changes. So if you lay a revert commit on top of a commit then the content is as though the commit was never made.
git reset --hard, create a new branch from your branching point, then rename it to master, orphaning your current master.
First I created a copy of the repo, practice_split_revert/, so I could show both options. Then I switched to the master branch in that copy.
* first_set_branch
master
second_set_branch
Switched to branch 'master'
* 0841579 (second_set_branch) added to second set
* 0d8e90d first edit in second set
| * 9f2b69a (first_set_branch) added second file to first set
| * ef2a3ee first set edit
|/
| * b5d4aff (HEAD -> master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 add master2
* 367e25a Adding readme
Then I used a range expression (the ..) with revert to revert the set of four commits on master. Note that the double dot range syntax used like this is left exclusive (i.e. you have to go back one further than the first you want). I also had to provide four commit messages, one for each of the reverting commits.
git revert f945961..b5d4aff
[master afcee46] Revert "added to second set"
1 file changed, 1 deletion(-)
delete mode 100644 second_set_file2
[master 45731a7] Revert "first edit in second set"
1 file changed, 1 deletion(-)
delete mode 100644 second_set_file
[master 302d9de] Revert "added second file to first set"
1 file changed, 1 deletion(-)
delete mode 100644 first_set_file2
[master 0e022e3] Revert "first set edit"
1 file changed, 1 deletion(-)
delete mode 100644 first_set_file
* 0e022e3 (HEAD -> master) Revert "first set edit"
* 302d9de Revert "added second file to first set"
* 45731a7 Revert "first edit in second set"
* afcee46 Revert "added to second set"
* b5d4aff added to second set
* dfa98b0 first edit in second set
* 43bc368 added second file to first set
* 51f2622 first set edit
| * 0841579 (second_set_branch) added to second set
| * 0d8e90d first edit in second set
|/
| * 9f2b69a (first_set_branch) added second file to first set
| * ef2a3ee first set edit
|/
* f945961 add master2
* 367e25a Adding readme
README
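Here is a smaller, self-contained version of the revert route to experiment with. It is a sketch with two unwanted commits instead of four, and it adds --no-edit (my addition, not what I ran above) so git accepts the default “Revert ...” messages instead of opening an editor for each one.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice_split_revert && cd practice_split_revert
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "readme" > README
git add README && git commit -q -m "Adding readme"
base=$(git rev-parse HEAD)                     # the last commit we want to keep

echo one > first_set_file;  git add first_set_file;  git commit -q -m "first set edit"
echo two > second_set_file; git add second_set_file; git commit -q -m "first edit in second set"

# base..HEAD leaves out base itself, so only the two later commits are reverted.
git revert --no-edit "$base"..HEAD >/dev/null

ls                                             # only README is left in the folder
git --no-pager log --oneline                   # 5 commits: 3 originals plus 2 reverts
```

Nothing is removed from the history here, which is why this route is safe for anyone who already pulled the unwanted commits.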
Again, I create a copy of the repo, practice_split_reset_master:
Switched to branch 'master'
* 0841579 (second_set_branch) added to second set
* 0d8e90d first edit in second set
| * 9f2b69a (first_set_branch) added second file to first set
| * ef2a3ee first set edit
|/
| * b5d4aff (HEAD -> master) added to second set
| * dfa98b0 first edit in second set
| * 43bc368 added second file to first set
| * 51f2622 first set edit
|/
* f945961 add master2
* 367e25a Adding readme
Now we use git reset --hard to force HEAD back to f945961. That disconnects the four unwanted commits from master. They aren’t actually gone from the .git folder yet: git reflog still lists them, and git only garbage-collects unreachable commits once their reflog entries have expired (a matter of weeks by default).
HEAD is now at f945961 add master2
* 0841579 (second_set_branch) added to second set
* 0d8e90d first edit in second set
| * 9f2b69a (first_set_branch) added second file to first set
| * ef2a3ee first set edit
|/
* f945961 (HEAD -> master) add master2
* 367e25a Adding readme
That seems neater, but keep in mind that if you’d pushed master to github while those commits were there, anyone else who had built on your repo would have lots of trouble. The revert approach goes for messier history, but has the advantage that it doesn’t disconnect anyone else.
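The reset route can be tried the same way. This sketch uses a single unwanted commit and keeps its id in a variable (old_tip, a name made up for the demo) so we can check that the commit still exists after the reset.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice_split_reset_master && cd practice_split_reset_master
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "readme" > README
git add README && git commit -q -m "Adding readme"
base=$(git rev-parse HEAD)                     # where master should end up

echo one > first_set_file
git add first_set_file && git commit -q -m "first set edit"
old_tip=$(git rev-parse HEAD)                  # the commit we are about to disconnect

git reset -q --hard "$base"                    # master points at base again

ls                                             # only README is left
git cat-file -t "$old_tip"                     # prints "commit": the object still exists
git --no-pager reflog -n 3                     # the old tip is still listed here
```

If master had already been pushed, the rewritten branch could only be pushed again with a forced push, and that is exactly the step that causes trouble for anyone who built on the old commits.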
When one is working on a problem, others may be working in parallel. And their parallel work may be finished before one’s own work is. Thus the starting point for one person’s work (branching point) can “go stale” making it harder to integrate.
While git merge and resolving syntax level conflicts can resolve some of this, it is often easier to understand and review work if it is presented as changes against an updated starting point.
As a concrete example imagine a project to build a dashboard. Imagine that you fork the repo in January to implement a new type of visualization (let’s say a pie chart). You work on this during January and February, finally nailing it down at the start of March. Meanwhile, though, others in the project have spent February introducing a whole new way of accessing databases. By the time you make a pull request at the start of March things have changed a lot since you branched in January.
If you submit a PR without updating, the maintainers will likely ask you to update your branch to make it work with the new database system.
--January-|------February--------------|--March
             __pie_chart_branch___________
            /                             \
master--|--|-----------------------|----|-------
            \                     /
             \__new_database_____/
First thing to do is to update your local repository with the changes from upstream.
Then you could try two options:
1. Merge master into pie_chart_branch yourself.
2. Replay pie_chart_branch on master (as though pie_chart_branch was created in late February and you did all the work very quickly!).
Option 1 is possible, but merging your work often involves touching parts of the system you don’t know much about, and is better left to the core developers. In addition, merging in this way leaves merge commits in the history, and some projects really don’t like those because they make the history harder to read.
Option 2 is generally preferred, since it focuses on clear communication via PRs that are easier to read and review.
Option 2 is called rebase and is explained usefully at this page from the EdX project. As you rebase, you can also squash some of your commits (treat many commits as one) to make them easier to follow for those reviewing your pull request. See the link above for details.
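A local sketch of the rebase option, using the dashboard example. The file names are invented for the demo, and the local master stands in for the updated upstream; in the forking setup you would fetch from upstream first and rebase onto upstream/master instead.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q dashboard && cd dashboard
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "dashboard" > README
git add README && git commit -q -m "January: starting point"

# Your work, branched off in January.
git checkout -q -b pie_chart_branch
echo "pie chart" > pie_chart.txt
git add pie_chart.txt && git commit -q -m "add pie chart"

# Meanwhile master moves on (the new database work gets merged).
git checkout -q master
echo "new database layer" > database.txt
git add database.txt && git commit -q -m "February: new database access"

# Replay the pie chart commits on top of the updated master.
git checkout -q pie_chart_branch
git rebase -q master

git --no-pager log --oneline --graph --all     # one straight line, pie chart on top
```

Because rebasing creates new commits, a branch that was already pushed to your fork needs a forced push afterwards (git push --force-with-lease is the more careful variant).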
The purpose of git is to retain all of your history, so that you can go back to any point in development and recover (as well as experiment while not breaking the mainline of development). At the same time, when we are working in the open, that means that anyone can view any file that was ever in a repository. With that in mind it is not too surprising that if you accidentally add something to git and then push it to github you can have trouble putting “the genie back in the bottle.”
Let’s say that we create a repo and add a README, then add a SPECIAL_SECRET file with the password “swordfish” in it. Note that I use git add * below, which is a very common way to accidentally add a problematic file; try to get into the habit of adding files one by one.
SOI-A14570-Howison:PeerProduction howison$ cd practice_history_edit/
git init
Initialized empty Git repository in /Users/howison/Documents/UTexas/Courses/PeerProduction/practice_history_edit/.git/
vi README
git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
README
nothing added to commit but untracked files present (use "git add" to track)
git add README
git commit -m "now we have a README"
[master (root-commit) f4878b0] now we have a README
1 file changed, 1 insertion(+)
create mode 100644 README
vi SPECIAL_SECRET
git add *
git commit -m "whoops added secret"
[master 018f6b5] whoops added secret
1 file changed, 1 insertion(+)
create mode 100644 SPECIAL_SECRET
git log --oneline --abbrev-commit --all --graph --decorate --color
* 018f6b5 (HEAD -> master) whoops added secret
* f4878b0 now we have a README
Now I’ll go ahead and make one more edit to README
vi README
git add README
git commit -m "README edit 2"
[master 4d51f91] README edit 2
1 file changed, 1 insertion(+)
git log --oneline --abbrev-commit --all --graph --decorate --color
* 4d51f91 (HEAD -> master) README edit 2
* 018f6b5 whoops added secret
* f4878b0 now we have a README
ls
README SPECIAL_SECRET
cat SPECIAL_SECRET
swordfish
Ok, so we realize that the password file got into git and we swing into action and delete it from git.
git rm SPECIAL_SECRET
rm 'SPECIAL_SECRET'
git commit -m "phew removed it, or did we"
[master ff229ba] phew removed it, or did we
1 file changed, 1 deletion(-)
delete mode 100644 SPECIAL_SECRET
ls
README
So now the file is not there. Or rather, it is not in our working directory. The problem is that it is still inside our .git folder and we can get it out easily.
git checkout HEAD~1
Note: checking out 'HEAD~1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 4d51f91... README edit 2
ls
README SPECIAL_SECRET
cat SPECIAL_SECRET
swordfish
Here I just used git checkout HEAD~1, which goes one commit back in time, to before we deleted the SPECIAL_SECRET file. Even if we were far ahead, or over on other branches etc, I could always get back by asking to see the code just after the commit that added the file: git checkout 018f6b5 (btw, to get out of detached HEAD state just checkout the branch again; we’re working on master so it would be git checkout master).
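You don’t even need to check anything out to get at the file. The sketch below rebuilds a similar history in a scratch folder and then uses git log with a path and git show <commit>:<path>, two standard ways of looking into history; the commit messages mirror the ones above but the hashes will differ.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice_history_edit && cd practice_history_edit
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"; git config user.email "demo@example.com"

echo "readme" > README
git add README && git commit -q -m "now we have a README"
echo "swordfish" > SPECIAL_SECRET
git add SPECIAL_SECRET && git commit -q -m "whoops added secret"
git rm -q SPECIAL_SECRET && git commit -q -m "phew removed it, or did we"

# Which commits ever touched the file, on any branch?
git --no-pager log --all --oneline -- SPECIAL_SECRET

# Read the old contents straight out of history, one commit back.
git show HEAD~1:SPECIAL_SECRET
```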
So using git rm removes a file from the working directory, but it doesn’t remove it from the git history. And that’s a sensible thing; usually you want to be able to go back in time. But sometimes you want to remove something from the history entirely. You can do that using the approaches outlined by Github here: Removing sensitive data from a repository.
The process is a bit complex (as it should be) but simplified with the bfg tool, as described at the link above. First you have to download the tool (which requires Java to run), then follow the instructions step by step.
Keep in mind that if you had pushed this sensitive info to a repo on github and others had then forked or cloned it then that info is not going to be deleted from the clones, so passwords should definitely be changed and you should ask everyone to delete forks/clones and start again.
There is a set of approaches to avoid uploading sensitive data. A good starting point is discipline around using .gitignore, which will prevent adding files that should not be added. Another approach is to become familiar with using environment variables to hold secrets. This is an evolving area, so ask others how they handle secrets (usually access credentials) when using git.
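A minimal sketch of the .gitignore approach, reusing the SPECIAL_SECRET file name from the example above. It uses git add . rather than git add *, since the shell expands * into explicit file names and git then complains about the ignored one.

```sh
set -e
work=$(mktemp -d) && cd "$work"
git init -q practice_ignore && cd practice_ignore

echo "readme" > README
echo "swordfish" > SPECIAL_SECRET
echo "SPECIAL_SECRET" > .gitignore             # one ignore pattern per line

git add .                                      # skips anything matched by .gitignore
git status --short                             # .gitignore and README are staged
git check-ignore -v SPECIAL_SECRET             # shows which rule ignores the file
```

Note that .gitignore only helps for files that have never been committed; it does nothing for a secret that is already in the history.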
Undo is sort of the point of version control, but it can be complicated. Here’s a useful blog entry: How to undo almost anything with git