The purpose of this series of blog posts is to shed a little light on some of the more obscure aspects of Git as a technology and to see how we can make better use of it when defining our workflow. We will focus specifically on GitFlow. However, as we will see in the third part of this series, it is far from the only option available to us. With that being said, let’s get started.
Git is not GitHub
We all know and love Git (once we get the hang of those few useful commands anyway), but do we really understand it?
Git, contrary to what some people might think (I know, it happened to me), is not short for GitHub. It’s a fully stand-alone technology, and a variety of hosting services have been built around it. GitHub just happens to be the most popular one.
Just to make the distinction a little clearer, Git’s sole purpose is to be (fancy name coming up) a Distributed Revision Control System. Git’s own documentation puts it more humbly and calls it “the stupid content tracker”. At its core, Git is just a persistent map of keys and values: a file system that stores objects and uses their SHA-1 hashes as keys to identify them uniquely in our Git database.
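To make the “map of keys and values” idea concrete, here is a minimal Python sketch of how a key can be derived from content. It assumes the classic SHA-1 object format, where Git hashes a short header followed by the file's bytes; `git_blob_id`, `store` and `object_store` are names made up for this illustration, not part of Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id that identifies a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# A toy "persistent map": keys are hashes, values are the stored content.
# (Hypothetical stand-in for the real object database under .git/objects.)
object_store = {}

def store(content: bytes) -> str:
    key = git_blob_id(content)
    object_store[key] = content
    return key

key = store(b"hello world\n")
print(key)
print(object_store[key])
```

Storing the same content twice yields the same key, so the map never holds duplicates. The printed key can be compared with what `git hash-object` reports for a file with identical content.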
Git is Just a Graph
Git, this “Stupid Content Tracker”, is just a file system represented as a graph that consists of four kinds of objects: blobs, trees, commits, and tags.
Blobs are just pieces of data - the content of a file - while a Tree is a directory stored in Git that can contain Blobs and other Trees.
Tags are just labels attached to objects (such as a Commit) and, amongst other things, contain the SHA1 hash that uniquely identifies that object in our Git database and a name that we can understand as human users.
As we said before, Git is just a File System, which to us just means that it’s a bunch of objects linked to each other in a graph. Each Commit points to the Tree that captures our project at that moment, that Tree points to the Blobs and other Trees that make up our files, and each Commit also points to the Commit (or Commits) that came before it. Our Branches, in turn, point to these Commits in our Git database.
Branches are simply references to Commits. So, when we say “Branch”, in truth we refer to a single Commit from which all the other Commits that make up the state of our project can be reached. It is because of this that Git is able to maintain multiple states of our project: it follows the links between the Commits that make up any particular state at any particular moment in time.
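The object graph can be sketched with a few Python dataclasses. This is only a rough model under simplifying assumptions: real Git objects refer to each other by hash rather than by direct reference, and the class and function names here are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

@dataclass
class Blob:
    data: bytes  # the content of one file

@dataclass
class Tree:
    # Maps a file or directory name to a Blob or a nested Tree.
    entries: Dict[str, Union["Blob", "Tree"]] = field(default_factory=dict)

@dataclass
class Commit:
    message: str
    tree: Tree                         # snapshot of the whole project
    parent: Optional["Commit"] = None  # the commit that came before

def list_files(tree: Tree, prefix: str = "") -> list:
    """Walk a tree and return the path of every blob it contains."""
    paths = []
    for name, obj in sorted(tree.entries.items()):
        path = prefix + name
        if isinstance(obj, Tree):
            paths.extend(list_files(obj, path + "/"))
        else:
            paths.append(path)
    return paths

readme = Blob(b"# demo\n")
main_py = Blob(b"print('hi')\n")
root = Tree({"README.md": readme, "src": Tree({"main.py": main_py})})
first = Commit("initial commit", root)

print(list_files(first.tree))
```

Starting from a single Commit, following its Tree is enough to reach every file in that snapshot.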
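A small Python sketch can illustrate why a Branch needs nothing more than a pointer. The commit ids (`c1`, `c2`, ...) and the `parents` and `branches` dictionaries are hypothetical stand-ins for real hashes and refs, and the model assumes one parent per commit (no merges).

```python
# Each commit id maps to its parent id (None for the very first commit).
parents = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c2"}

# A branch is only a name pointing at one commit.
branches = {"main": "c3", "feature": "c4"}

def history(branch: str) -> list:
    """Follow parent links from the branch tip back to the first commit."""
    commit = branches[branch]
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit]
    return chain

print(history("main"))
print(history("feature"))

# Committing on a branch just records a new commit and moves the reference.
parents["c5"] = branches["main"]
branches["main"] = "c5"
print(history("main"))
```

Both branches share `c1` and `c2`; the only thing that distinguishes them is which commit each name points to.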
Ultimately, there are three rules that govern our data when tracking our files in Git:
The Branch, or more precisely the Commit it currently points to, gives us access to all the Commits that make up our project at any given moment in time.
Because Branches are just references, when moving to another Branch, your working directory is updated automatically with a new state, since the Commit your current Branch points to has changed.
Whenever you change or remove a reference to a Branch, any objects left unreachable in your database (Repository) will eventually be garbage collected, so they are removed automatically after some time.
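The third rule is essentially a reachability check. The sketch below, in Python, uses made-up commit ids and a simplified single-parent history to show which commits would become candidates for removal once a branch is deleted. Real Git also consults other references before pruning anything, so treat this as an illustration of the idea rather than of Git's actual garbage collector.

```python
def reachable(parents: dict, refs: dict) -> set:
    """Return every commit that can be reached from at least one reference."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit is None or commit in seen:
            continue
        seen.add(commit)
        stack.append(parents[commit])
    return seen

# Hypothetical history: c3 and c4 both branch off c2.
parents = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c2"}
branches = {"main": "c3", "feature": "c4"}

# Deleting the branch removes the only reference that leads to c4.
del branches["feature"]

unreachable = set(parents) - reachable(parents, branches)
print(sorted(unreachable))
```

Commits `c1` and `c2` survive because `main` still leads to them; only `c4` is left without any reference.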
Four Main Areas
Before moving on to using Git, we must first understand how the Git workflow is structured. There are four main areas that make up any Git process: the Working Area, the Index, the Repository and the Stash. One important thing to remember is that, once a project has Commits, the Working Area, the Index and the Repository are never empty. In a clean state they all hold the same content, and they only diverge when there are changes that have not yet been added or committed, or when there are untracked files, because Git only really cares about tracking changes in our information. The Stash is the exception: it stays empty until we put something into it.
Our Working Area refers to the project directory on our local machine. It is our personal playground where we can safely develop new features before committing them to our Repository, where we store the final version of our project.
Whenever we make changes in our Working Area we end up with modified or untracked files that must be “added” to our Index. The Index is mainly used as a “staging” area before we commit any changes to the Repository, and it also helps us see whether all areas are “on the same page”.
Any time we create new files or change the content of existing files in the Working Area, Git will compare these changes with the content in our Index and report them (for example, when we run git status) so that we can “add” them and let Git keep track of them. A similar comparison happens between the Index and the Repository: Git checks whether the content staged in the Index differs from the latest Commit and, if so, reports those changes as ready to be committed. Pulling is a different matter: it brings in Commits from a Remote Repository so that our local Repository and Working Area are kept up to date.
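The comparisons between areas can be sketched in Python by treating each area as a dictionary from file path to content. This is a deliberately simplified model: it ignores deleted files, and the `status` function and its result keys are invented for the example rather than taken from Git.

```python
def status(working: dict, index: dict, head: dict) -> dict:
    """Report paths that differ between neighbouring areas."""
    # Working Area vs Index: edits that have not been added yet.
    not_staged = sorted(p for p in working if p in index and working[p] != index[p])
    # Files the Index has never heard of.
    untracked = sorted(p for p in working if p not in index)
    # Index vs latest Commit: changes that are ready to be committed.
    staged = sorted(p for p in index if index[p] != head.get(p))
    return {"staged": staged, "not_staged": not_staged, "untracked": untracked}

# Clean state: all three areas hold the same content.
head = {"README.md": "v1"}
index = dict(head)
working = dict(head)

# Edit one file and create another in the Working Area.
working["README.md"] = "v2"
working["notes.txt"] = "todo"
after_edit = status(working, index, head)
print(after_edit)

# "Add" the edited file: copy it from the Working Area into the Index.
index["README.md"] = working["README.md"]
after_add = status(working, index, head)
print(after_add)

# "Commit": the Repository's latest snapshot now matches the Index.
head = dict(index)
after_commit = status(working, index, head)
print(after_commit)
```

Each step moves content one area to the right, and the status only reports the places where two neighbouring areas disagree.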
The Stash is, as its name implies, a stash of data. It can be used to store information from the Working Area and the Index as you see fit.
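One way to picture the Stash is as a stack of saved snapshots. The Python sketch below assumes a simplified model in which stashing saves both the Working Area and the Index and resets them to the latest Commit; `stash_push` and `stash_pop` are hypothetical helper names, and the real command has more options and nuances than shown here.

```python
head = {"app.py": "v1"}                         # latest Commit
working = {"app.py": "v2 (work in progress)"}   # unfinished edit
index = dict(head)
stash = []                                      # starts out empty

def stash_push() -> None:
    """Save the current Working Area and Index, then reset both to the latest Commit."""
    stash.append((dict(working), dict(index)))
    working.clear()
    working.update(head)
    index.clear()
    index.update(head)

def stash_pop() -> None:
    """Restore the most recently saved snapshot and drop it from the stash."""
    saved_working, saved_index = stash.pop()
    working.clear()
    working.update(saved_working)
    index.clear()
    index.update(saved_index)

stash_push()
print(working)   # back to the committed content
stash_pop()
print(working)   # the unfinished edit is restored
```

Because it behaves like a stack, the most recently stashed work is the first to come back.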
The most important area is the Repository. This area is essentially the “source of truth” for our entire project and it’s used to store all of our commits as a database of sorts. When using tools like GitHub, we make use of two repositories: a Local Repository on our own machine, and a Remote Repository hosted on the service. This split is a feature of Git itself rather than of GitHub: it lets people work collaboratively by pushing and pulling Commits between repositories, while GitHub simply hosts the remote one.