It can be intimidating and overwhelming to try to contribute to a large open source project. I found this out when I made it my goal to make a significant contribution to a certain library on GitHub. That said, I won’t talk specifically about the project, because this article is about breaking into open source and avoiding some big pitfalls with Git.
At first, contributing didn’t seem very likely because the project was a little bit outside of my realm, albeit in a familiar language. In addition, when I began looking over the code base, all of it appeared complex. But I continued on, looking for any low-hanging fruit that would let me get my foot in the door. I did find an opportunity to contribute, and along with the few things I learned that I’ll share here, I found that every developer is capable of making an impact in open source.
For this project, I had forked the repository and had the code running in Xcode, where I noticed a few warnings in the left side panel. Warnings in Xcode are something I am familiar with, and this was exactly the (very) low-hanging fruit I was looking for. I fixed the warnings with a couple of changes and two Git commits. Then, looking at the upstream repo, I saw nothing had changed yet, so I was ready to make a pull request. I did just that, and eventually all 18 line changes I made got merged into the project. Yay!
I would recommend this approach to anyone trying to break into an open source project that seems overwhelming: Go for the easy tasks now, just as long as you have plans to make larger contributions later. While this was great for my confidence, this was not the level of contributing I wanted to stay at.
Soon after, I found a bigger piece of work I could accomplish. After some back and forth with the repo owners, I did finish my task, submitted my pull request, and had my code merged. This contribution was more significant, with 200+ lines of code, and it helped me hit a point where I felt I was actually making an impact.
It can be hard to contribute to frameworks or apps that are already established in open source. For example, IBM Ready App for Venue is a full app experience with iOS and iPad components and a full back end with server, database, and Bluemix service integrations. It can take time to get to where you are familiar enough with the code that you can identify a need. That said, I did learn a few things about Git and collaborating with people on GitHub.
For example, with most open source projects you have to fork the original repo. Eventually, your fork will fall out of date as work continues on the original repo, so you’ll need to bring in the latest changes. There are different ways to update your forked repo on your local machine (Git has many features), but this is one way that works:
- Add a remote that points to the original repo. Calling it “upstream” is a common convention.
git remote add upstream https://github.com/user/[openSourceProject].git
- Fetch all the branches so you have a local copy.
git fetch upstream
- Ensure you are on the branch you want updated in your local fork, such as master.
git checkout master
- Lastly, rewrite history a bit with a rebase so any local changes you have are just placed on top of the new changes coming from the upstream master. This allows for a clean history in your pull request later.
git rebase upstream/master
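To make the steps above concrete, here is a minimal sketch that rehearses them against throwaway repositories on your own disk instead of GitHub. The directory names (original, fork), file names, and commit messages are made up for illustration, and the local path simply stands in for the upstream URL.

```shell
#!/bin/sh
# Rehearse the fork-update steps in a temporary directory.
# All repo names, file names, and commit messages here are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the original project, with one commit on master.
git init -q "$work/original"
cd "$work/original"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "Initial upstream commit"

# Stand-in for your fork, carrying one local commit.
git clone -q "$work/original" "$work/fork"
cd "$work/fork"
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "My local fix"

# Meanwhile, the original project moves ahead.
cd "$work/original"
echo "v2" >> lib.txt
git commit -q -a -m "Newer upstream commit"

# The four steps from the list above.
cd "$work/fork"
git remote add upstream "$work/original"
git fetch -q upstream
git checkout -q master
git rebase -q upstream/master

# The local fix should now sit on top of the newer upstream commit.
git log --format=%s
```

In the final log, the local commit should appear first with the upstream commits beneath it and no merge commit in between, which is the shape most maintainers hope to see in a pull request.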
I mentioned a clean history, because this is hugely important to many open source projects. Many repos have a contributing markdown file that explains what pull requests should look like, how they are reviewed, etc. One way to have a clean history is by avoiding merge commits. I won’t delve into merge commits, but they tell you very little about a change. Most of the time, they will just bundle changes from a different branch into one commit with a message like “Merge branch ‘master’ into feature_branch,” which is neither pretty nor descriptive.
The best way I know to avoid this ugliness is by rebasing. One way to rebase, if you are just updating from your own remote repo, is to first run
git fetch origin
This doesn’t change any of your local branches; it just downloads the latest state of every branch on the remote repo. I then make sure I’m on the branch I want updated and run
git rebase origin/master
meaning I want my local master to be updated with any changes others have made to the remote master. There could be conflicts, but after you work through them, you have a clean history with any non-pushed local commits placed on top of the new commits you just pulled down.
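If you want to convince yourself that the fetch really leaves your own branch alone, the following sketch may help. It builds two scratch repositories (the names remote and local, and all commit messages, are invented for this example) and records where master points before the fetch, after the fetch, and after the rebase.

```shell
#!/bin/sh
# Show that fetch only updates origin/master, while rebase moves local master.
# Repo names, file names, and commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# A scratch "remote" repo with one shared commit.
git init -q "$work/remote"
cd "$work/remote"
git symbolic-ref HEAD refs/heads/master
echo "a" > notes.txt
git add notes.txt
git commit -q -m "Shared commit"

git clone -q "$work/remote" "$work/local"

# Someone else adds a commit to the remote master.
echo "b" >> notes.txt
git commit -q -a -m "Teammate's commit"

# You add a commit locally that has not been pushed.
cd "$work/local"
echo "c" > draft.txt
git add draft.txt
git commit -q -m "Unpushed local commit"

before=$(git rev-parse master)
git fetch -q origin
after_fetch=$(git rev-parse master)        # local master has not moved
remote_tip=$(git rev-parse origin/master)  # but origin/master has new work

git rebase -q origin/master
after_rebase=$(git rev-parse master)       # now local master sits on top of it

echo "before fetch: $before"
echo "after fetch:  $after_fetch"
echo "after rebase: $after_rebase"
git log --format=%s
```

The first two hashes should match, and only the rebase should produce a new one, since the unpushed commit is rewritten on top of the teammate's work.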
In much of my time using Git, I avoided the rebase command because I didn’t understand it, it seemed destructive, and merge did the job for me. As with many aspects of software development, I figured out what it does and now find it to be very useful. It does rewrite your history, so I’ll typically use it on a branch or forked repo that only I have worked on.
One other concern I had was the dates of my commits, because when you rebase, by default it keeps each commit’s original author date (only the committer date is updated) even though the commit is replayed on top of newer ones. This seemed bad at first, but when you have a couple of related commits, it doesn’t matter when you committed them locally as long as you push them to your master remote branch at the same time. In many ways, I find this cleaner since it puts related commits together instead of spreading them out on your commit timeline.
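The timestamp behavior is easy to check for yourself. This sketch, which assumes a scratch repository and a deliberately old, made-up commit date in January 2001, compares the author and committer timestamps of a commit before and after it is rebased onto newer work.

```shell
#!/bin/sh
# Compare author and committer dates of a commit before and after a rebase.
# The repo, branch names, and the 2001 date are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# A feature commit that pretends to have been made on 2001-01-01 12:00 UTC.
git checkout -q -b feature
echo "work" > feature.txt
git add feature.txt
GIT_AUTHOR_DATE="978350400 +0000" GIT_COMMITTER_DATE="978350400 +0000" \
  git commit -q -m "Old feature commit"
author_before=$(git log -1 --format=%at)
committer_before=$(git log -1 --format=%ct)

# Newer work lands on master in the meantime.
git checkout -q master
echo "newer" > newer.txt
git add newer.txt
git commit -q -m "Newer master commit"

# Replay the old feature commit on top of the newer master commit.
git checkout -q feature
git rebase -q master
author_after=$(git log -1 --format=%at)
committer_after=$(git log -1 --format=%ct)

echo "author date:    $author_before -> $author_after"
echo "committer date: $committer_before -> $committer_after"
git log -2 --format='%s | author: %ad | committer: %cd'
```

The author timestamp should be unchanged while the committer timestamp jumps to the time of the rebase, so the old commit ends up listed above a commit that was authored later.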
The last thing I want to mention is that knowing how to squash commits is an important skill. For example, let’s say you are working on a feature for this open source project over the span of four days. You make a little bit of progress each day and commit your work locally with a less-than-acceptable commit message, like “more changes.” You obviously want to keep all your hard work, but you don’t need four different commits to express what you have done. This is where squashing with an interactive rebase comes in. You start by executing the command to interactively rebase your last four commits:
git rebase -i HEAD~4
Now your configured editor (vim, in my case) will open up. To achieve the desired effect, you will most likely need to replace the word “pick” with “squash” for all but the topmost commit, as seen in this terminal snippet:
    pick e6cdee1 first real master commit
    squash 4c54550 again
    squash be237c4 a combo commit
    squash 87940f3 my latest commit

    # Rebase 04eaab4..87940f3 onto 04eaab4 (4 command(s))
    #
    # Commands:
    # p, pick = use commit
    # r, reword = use commit, but edit the commit message
    # e, edit = use commit, but stop for amending
    # s, squash = use commit, but meld into previous commit
    # f, fixup = like "squash", but discard this commit's log message
    # x, exec = run command (the rest of the line) using shell
    # d, drop = remove commit
    #
    # These lines can be re-ordered; they are executed from top to bottom.
    #
    # If you remove a line here THAT COMMIT WILL BE LOST.
    #
    # However, if you remove everything, the rebase will be aborted.
    #
    # Note that empty commits are commented out
    ~
Afterwards, save and close that file in vim. Git then opens a second file containing the commit messages of the squashed commits; leave one commit message uncommented, and save that change with vim as well. Now, those four commits have been squashed into one clean commit. For more thorough documentation on this process, I would check this out. Additionally, if you are working on a GitHub hosted project, there is a new squash feature right in the pull request interface that repo owners can use. Note, I wouldn’t assume the repo owner wants to use this feature, but it is good to be aware of.
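If you would like to practice the squash before trying it on real work, here is a rough sketch that runs the same interactive rebase without any typing. It assumes a scratch repository with four made-up “more changes” commits, and it swaps the editor for a small helper script (squash-todo.sh, a name invented for this example) that makes the pick-to-squash edits for you. GIT_SEQUENCE_EDITOR and GIT_EDITOR are the environment variables Git consults for the todo list and the commit message, respectively.

```shell
#!/bin/sh
# Squash the last four commits with a scripted "editor" instead of vim.
# The repo, file names, commit messages, and helper script are hypothetical.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

# Four days of work, each with a vague commit message.
for day in 1 2 3 4; do
  echo "day $day" > "day$day.txt"
  git add "day$day.txt"
  git commit -q -m "more changes (day $day)"
done

# Helper that edits the todo list: keep the first "pick",
# turn every later "pick" into "squash".
cat > "$work/squash-todo.sh" <<'EOF'
#!/bin/sh
awk '/^pick / { if (seen) sub(/^pick/, "squash"); seen = 1 } { print }' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF
chmod +x "$work/squash-todo.sh"

# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="$work/squash-todo.sh" GIT_EDITOR=true git rebase -q -i HEAD~4

# Only the base commit and one squashed commit should remain.
git log --oneline
```

When doing this by hand you would normally rewrite the combined message into something descriptive rather than accepting it as-is; the scripted version skips that step only to keep the rehearsal non-interactive.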
If you spend a little bit of time learning about Git, open source collaboration will be much less intimidating and you can spend more time focusing on new content and fixes rather than wrestling with version control. That said, my learning process did entail some wrestling, but I am happy with the results of what I know and the contribution I ended up making to the open source community.