Introduction to Git
Overview
Teaching: 15 min
Exercises: 0 min
Questions
What is Git and why should I use it?
How can you use Git for code management?
Objectives
Introduction to Git
Git is a widely used version control system, run mainly from the command line, that developers worldwide use to share their work with colleagues and keep their code organized. Teams are not the only ones to benefit from version control: lone researchers can benefit immensely too. A record of what was changed, when, and why is extremely useful for any researcher who needs to come back to a project later on (e.g., a year later, when memory has faded).
Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
Git is the version control software itself, while GitHub is a website where Git repositories (folders of code and other files) can be hosted, shared, and edited by collaborators.
This lesson is accompanied by this PowerPoint presentation.
What can Git do for you?
- Archive all your code changes, for safekeeping and posterity
- Share and build code within your group and across the globe
Why Git is valuable
Think about Google Docs or similar… but for code and data!
- Version Control
- Collaboration
- One True Codebase – authoritative copy shared among colleagues
- Documentation of any changes
- Mark and retrieve the exact version you ran from any point in time, even if it’s been “overwritten”
- Resolve conflicts when editors change the same piece of content
- Supporting open science, open code, and open data. A requirement for a lot of publications!
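As a rough illustration of marking and retrieving an exact version, the sketch below tags one commit and later reads a file back as it was at that tag, even after the file has been overwritten. The file name params.txt, the tag name run-v1, and the demo identity are made up for this example, and everything happens in a throwaway temporary folder.

```shell
# Work in a throwaway folder so no real project is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# First version of a hypothetical parameter file.
printf 'threshold = 0.5\n' > params.txt
git add params.txt
git commit -q -m "Add parameters used for the first run"
# Mark the exact version that was run.
git tag run-v1

# "Overwrite" the file with a newer version.
printf 'threshold = 0.9\n' > params.txt
git add params.txt
git commit -q -m "Raise threshold for the second run"

# Retrieve the file exactly as it was at the tagged version.
git show run-v1:params.txt
```

The working copy still holds the newer value; git show only prints the older version without changing any files.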
Basic commands
Turn my code folder into a Git Repository
git init
git add .
adds ALL files to Git’s tracking index.
git commit -m 'add your initial commit message here, describing what this repo will be for'
saves everything that has been “added” to the tracking index.
You will always need to ADD and then COMMIT each new or changed file before Git records it.
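The ADD then COMMIT cycle can be tried safely with a sketch like the one below. It assumes a temporary folder standing in for your code folder; the file names, commit messages, and demo identity are hypothetical.

```shell
# Temporary folder standing in for your code folder (hypothetical contents).
project=$(mktemp -d)
cd "$project"
echo 'print("hello")' > script.py
echo "Project notes" > README.txt

# Turn the folder into a Git repository.
git init -q
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Before adding, both files are listed as untracked (marked ??).
git status --short

# ADD everything to the tracking index, then COMMIT it.
git add .
git commit -q -m "Initial commit: demo project with one script"

# A file created later needs its own ADD and COMMIT.
echo 'print("second script")' > second.py
git add second.py
git commit -q -m "Add second script"

# One line per commit, newest first.
git log --oneline
```

Running git status between steps is a cheap way to see what Git considers untracked, added, or already committed.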
Link your Git Repository to the GitHub website, for storage and collaboration
git remote add origin [url]
tells Git the web location to link with.
git push -u origin master
pushes your work up to the website, onto the “master” branch! If your default branch is named “main”, use that name instead.
To add the latest changes to the web-version while you’re working you will always have to ADD, then COMMIT, then PUSH the changes.
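To practice the ADD, COMMIT, PUSH sequence without a GitHub account, one option is to let a local bare repository play the role of the website. This is only a stand-in: with GitHub you would pass the repository's URL to git remote add instead of a folder path. The folder names, file name, and demo identity below are assumptions for illustration.

```shell
work=$(mktemp -d)
# A bare repository on disk, standing in for the GitHub copy.
git init -q --bare "$work/remote.git"

# Your local repository.
git init -q "$work/project"
cd "$work/project"
# Placeholder identity, set for this demo repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add .
git commit -q -m "Add first draft of notes"
# Make sure the branch is called master, as in the lesson.
git branch -M master

# Link to the remote and push; -u remembers origin/master as the upstream.
git remote add origin "$work/remote.git"
git push -q -u origin master

# Later changes follow the same pattern: ADD, COMMIT, PUSH.
echo "second draft" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes with second draft"
git push -q

# Show the commit the remote now has for master.
git ls-remote origin master
```

Because -u was used on the first push, later pushes can be a plain git push.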
Clone a Git Repository to your computer to work on it
git clone [paste the url]
git pull
to get the newest changes from the web-version at any time!
In summary, you should PULL any new changes to keep your repository synced with the website where other people are working, then ADD/COMMIT/PUSH your changes back to the website for other people to see!
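The full cycle with two people can be sketched as follows, again assuming a local bare repository in place of the website. The collaborator names, folder names, and file contents are invented for the example.

```shell
work=$(mktemp -d)
# Bare repository standing in for the shared copy on the website.
git init -q --bare "$work/shared.git"

# Collaborator A creates the project and pushes it.
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "step 1" > protocol.txt
git add .
git commit -q -m "Add step 1 of protocol"
git branch -M master
git remote add origin "$work/shared.git"
git push -q -u origin master

# Collaborator B clones the shared copy (asking for master explicitly,
# in case this machine uses a different default branch name).
git clone -q --branch master "$work/shared.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"

# B makes a change, then ADD, COMMIT, PUSH.
echo "step 2" >> protocol.txt
git add protocol.txt
git commit -q -m "Add step 2 of protocol"
git push -q

# A pulls to get B's change.
cd "$work/alice"
git pull -q
cat protocol.txt
```

Pulling before you start editing keeps the chance of conflicting changes low.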
As an alternative, you can use an app like TortoiseGit (Windows) or SourceTree (Mac) to stay away from the command line. GitHub also has an app! The operations will be the same (ADD, COMMIT, PUSH, etc.), but you will be able to do them by clicking buttons instead of typing them into a command-line terminal.
Resources
- An excellent introductory lesson is available from the Carpentries
- Oh shit, git is a website that helps you troubleshoot Git with plain-language search terms
- NYU has a curriculum for sharing within labs - available here
- This article explains why data scientists (us!) should be using Git
Key Points