Version Control

It is difficult to overstate the importance of version control.

I believe that it is as important as the inventions of the chalkboard and of the book for multiplying the power of people to create together.

[...]

Using a version control system properly is a way to think: to structure, remember, and share thoughts, at the level of depth and rigor demanded by the exhausting craft of writing software. Without that understanding, using Git will be, at best, magical incantations, used by rote, and full of unknown dangers. With that understanding, Git can become almost invisible, leaving instead the patterns of working up the intricate spells of symbols that are the magic of software.

Git for Teams

Author: Emma Jane Hogbin Westby

Terminology and Concepts

A Git project is represented by a repository, which contains the complete history of the project from its inception. A repository in turn consists of a set of individual snapshots of project content (collections of files and directories) called commits. A single commit comprises the following:

A project content snapshot, called a tree
A structure of nested files and directories representing a complete state of the project
The author identification
Name, email address, and date/time (or timestamp) indicating who made the changes that resulted in this project state and when
The committer identification
The same information about the person who added this commit to the repository (which may be different from the author)
A commit message
Text used to comment on the changes made by this commit
A list of zero or more parent commits
References to other commits in the same repository, indicating immediately preceding states of the project content
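
The components listed above can be inspected directly. The following is a minimal sketch, assuming Git 2.28 or later (for git init -b) and a throwaway repository; the file name, identity, and commit message are hypothetical.

```shell
# Create a throwaway repository (hypothetical path and identity).
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'hello' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Print the raw commit object: a tree line, an author line, a committer
# line, and then the commit message. A root commit such as this one has
# no "parent" line; later commits list one parent line per parent.
commit_object=$(git cat-file -p HEAD)
echo "$commit_object"
```

The hashes and timestamps in the output will differ on every run, so none are shown here.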

The Commit Graph

The set of all commits in a repository, connected by lines indicating their parent commits, forms a picture called the repository commit graph, shown below:

The repository commit graph

The letters and numbers here represent commits, and arrows point from a commit to its parents. Commit A has no parents and is called a root commit; it was the initial commit in this repository's history. Most commits have a single parent, indicating that they evolved in a straightforward way from a single previous state of the project, usually incorporating a set of related changes made by one person. Some commits, here just the one labeled E, have multiple parents and are called merge commits. This indicates that the commit reconciles the changes made on distinct branches of the commit graph, often combining contributions made separately by different people.

Since it is normally clear from context in which direction the history proceeds—usually, as here, parent commits appear to the left of their children—we will omit the arrow heads in such diagrams from now on.

Branches

The labels on the right side of this picture (master, topic, and release) denote branches. A branch name refers to the latest commit on that branch, called the tip of the branch; here, the tips are commits F, 4, and Z, respectively. The branch itself is defined as the collection of all commits in the graph that are reachable from the tip by following the parent arrows backward along the history. Here, the branches are:

  • release = {A, B, C, X, Y, Z}
  • master = {A, B, C, D, E, F, 1, 2}
  • topic = {A, B, 1, 2, 3, 4}

Note that branches can overlap; here, commits 1 and 2 are on both the master and topic branches, and commits A and B are on all three branches. Usually, you are on a branch, looking at the content corresponding to the tip commit on that branch. When you change some files and add a new commit containing the changes (called committing to the repository), the branch name advances to the new commit, which in turn points to the old commit as its sole parent; this is the way branches move forward. From time to time, you will tell Git to merge several branches (most often two, but there can be more), tying them together as at commit E in the figure above. The same branches can be merged repeatedly over time, showing that they continued to progress separately while you periodically combined their contents.

The first branch in a new repository is named master by default, and it's customary to use that name if there is only one branch in the repository, or for the branch that contains the main line of development (if that makes sense for your project). You are not required to do so, however, and there is nothing special about the name master apart from convention, and its use as a default by some commands.
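
A minimal sketch of how a branch name advances as you commit, assuming Git 2.28 or later and a throwaway repository. The identity, the one-letter commit messages, and the branch name topic are hypothetical, and empty commits stand in for real changes.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

# Two commits on main, labeled like the commits in the graph.
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
tip_before=$(git rev-parse main)

# Start a topic branch at the current tip and commit to it.
git checkout -q -b topic
git commit -q --allow-empty -m "1"

tip_main=$(git rev-parse main)
tip_topic=$(git rev-parse topic)
parent_of_topic=$(git rev-parse topic^)

# main still names commit B; topic has advanced to the new commit,
# whose sole parent is B.
echo "main:  $tip_main"
echo "topic: $tip_topic (parent $parent_of_topic)"

# The branch is every commit reachable from its tip: 1, B, A.
topic_commits=$(git log --format=%s topic)
echo "$topic_commits"
```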

Sharing Work

There are two contexts in which version control is useful: private and public. When working on your own, it's useful to commit early and often, so that you can explore different ideas and make changes freely without worrying about recovering earlier work. Such commits are likely to be somewhat disorganized and have cryptic commit messages, which is fine because they need to be intelligible only to you, and for a short period of time. Once a portion of your work is finished and you're ready to share it with others, though, you may want to reorganize those commits, to make them well-factored with regard to reusability of the changes being made (especially with software), and to give them meaningful, well-written commit messages.

In centralized version control systems, the acts of committing a change and publishing it for others to see are one and the same: the unit of publication is the commit, and committing requires publishing (applying the change to the central repository where others can immediately see it). This makes it difficult to use version control in both private and public contexts.

Git

A distributed version control system

Git is a tool for tracking changes made to a set of files over time, a task traditionally known as version control. It was created by Linus Torvalds in 2005, and has been maintained by Junio Hamano since then. Although it is most often used by programmers to coordinate changes to software source code, and it is especially good at that, you can use Git to track any kind of content at all. Any body of related files evolving over time, which we'll call a project, is a candidate for using Git. With Git, you can:

...and much more.

Introduction to Git

Simply put, Git is a content tracker. Given that notion, Git shares common principles of most version control systems. However, the distinct feature that makes Git unique among the variety of tools available today is that it is a distributed version control system. This distinction means Git is fast and scalable, has a rich collection of command sets that provide access to both high level and low-level operations, and is optimized for local operations.

Git Components (Client and Server)

Git GUI tools act as a frontend for the Git command line, and some tools have extensions that integrate with popular Git hosting platforms. The Git client tools mostly work on the local copy of your repository.

When you are working with Git, a typical setup includes a Git server and Git clients. You can possibly forgo a server, but that would add complexity to how you maintain and manage repositories when sharing revision changes in a collaborative setup and would make consistency more difficult. The Git server and clients work as follows:

Git server
A Git server enables you to collaborate more easily because it ensures the availability of a central and reliable source of truth for the repositories you will be working on. A Git server is also where your remote Git repositories are stored; as common practice goes, the repository has the most up-to-date and stable source of your projects. You have the option to install and configure your own Git server, or you can forgo the overhead and opt to host your Git repositories on reliable third-party hosting sites such as GitHub, GitLab, and Bitbucket.
Git clients
Git clients interact with your local repositories, and you are able to interact with Git clients via the Git command line or the Git GUI tools. When you install and configure a Git client, you will be able to access the remote repositories, work on a local copy of the repository, and push changes back to the Git server. If you are new to Git, we recommend starting out using the Git command line; familiarize yourself with the common subset of git commands required for your day-to-day operations and then progress to a Git GUI tool of your choice.

The reason for this approach is that to some extent, Git GUI tools tend to provide terminologies that represent a desired outcome that may not be part of Git's standard commands.

Git Characteristics

Now that we have given an overview of the Git components, let's learn about the characteristics of Git. Understanding these distinct traits of Git enables you to effortlessly switch from a centralized version control mindset to a distributed version control mentality. We like to refer to this as Thinking in Git:

Git stores revision changes as snapshots
The very first concept to unlearn is the way Git stores multiple revisions of a file that you are working on. Unlike other version control systems, Git does not track revision changes as a series of modifications, commonly known as deltas; instead, it takes a snapshot of changes made to the state of your repository at a specific point in time. In Git terminology this is known as a commit. Think of this as capturing a moment in time, as through a photograph.
Git is enhanced for local development
In Git, you work on a copy of the repository on your local development machine. This is known as a local repository, or a clone of the remote repository on a Git server. Your local repository will have the resources and the snapshots of the revision changes made on those resources all in one location. Git terms these collections of linked snapshots repository commit history, or repo history for short. This allows you to work in a disconnected environment since Git does not need a constant connection to the Git server to version-control your changes. As a natural consequence, you are able to work on large, complex projects across distributed teams without compromising efficiency and performance for version control operations.
Git is definitive
Definitive means the git commands are explicit. Git waits for you to provide instructions on what to do and when to do it. For example, Git does not automatically sync changes from your local repository to the remote repository, nor does it automatically save a snapshot of a revision to your local repo history. Every action requires your explicit command or instruction to tell Git what is required, including adding new commits, fixing existing commits, pushing changes from your local repository to the remote repository, and even retrieving new changes from the remote repository. In short, you need to be intentional with your actions. This also includes letting Git know which files you intend to track, since Git does not automatically add new files to be version controlled.
Git is designed to bolster nonlinear development

Git allows you to ideate and experiment with various implementations of features for viable solutions to your project by enabling you to diverge and work in parallel alongside the main, stable codebase of your project. This methodology, called branching, is a very common practice and ensures the integrity of the main development line, preventing any accidental changes that may break it.

In Git, the concept of branching is considered lightweight and inexpensive because a branch in Git is just a pointer to the latest commit in a series of linked commits. For every branch you create, Git keeps track of the series of commits for that branch. You can switch between branches locally. Git then restores the state of the project to the most recent moment when the snapshot of the specified branch was created. When you decide to merge the changes from any branch into the main development line, Git is able to combine those series of commits by applying techniques that we will be discussing....
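
As a rough illustration of diverging and then merging, the sketch below creates two lines of development and joins them. It assumes Git 2.28 or later; the branch name feature, the file names, and the identity are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'base' > base.txt
git add base.txt
git commit -q -m "Base"

# Diverge: one commit on a feature branch...
git checkout -q -b feature
echo 'experiment' > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# ...and one commit on main.
git checkout -q main
echo 'fix' > fix.txt
git add fix.txt
git commit -q -m "Add fix"

# Merge the feature branch into main. Because both branches moved on
# separately, Git records a merge commit with two parents.
git merge -q --no-edit feature
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "parents of the merge commit: $parent_count"
```

After the merge, both feature.txt and fix.txt are present in the working directory on main.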

The Git Command Line

Historically, Git was provided as a suite of many simple, distinct, standalone commands developed according to the Unix philosophy: build small, interoperable tools. Each command sported a hyphenated name, such as git-commit and git-log. However, modern Git installations no longer support the hyphenated command forms and instead use a single git executable with a subcommand.

The git commands understand both “short” and “long” options. For example, the git commit command treats the following examples equivalently:

$ git commit -m "Fix a typo."
$ git commit --message="Fix a typo."

Quick Introduction to Using Git

To see Git in action, you can create a new repository, add some content, and track a few revisions. You can create a repository in two ways: either create a repository from scratch and populate it with some content, or work with an existing repository by cloning it from a remote Git server.

Preparing to Work with Git

Whether you are creating a new repository or working with an existing repository, there are basic prerequisite configurations that you need to complete after installing Git on your local development machine. This is akin to setting up the correct date, time zone, and language on a new camera before taking your first snapshot.

At a bare minimum, Git requires your name and email address before you make your first commit in your repository. The identity you supply then shows as the commit author, baked in with other snapshot metadata. You can save your identity in a configuration file using the git config command:

$ git config --global user.name "Francisco Fernández-Victorio"
$ git config --global user.email "franciscofvh@hotmail.com"

If you decide not to include your identity in a configuration file, you will have to specify your identity for every git commit subcommand by appending the argument --author at the end of the command:

$ git commit -m "log message" --author="Francisco Fernández-Victorio <franciscofvh@hotmail.com>"

Keep in mind that this is the hard way, and it can quickly become tedious.

You can also specify your identity by supplying your name and email address in the GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL environment variables, respectively. If set, these variables override the corresponding configuration settings. An identity given on the command line (for example, with --author) in turn overrides both the configuration file and the environment variables.
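
The order of precedence can be checked with a small experiment. This is a sketch under stated assumptions: Git 2.28 or later, a throwaway repository, and three made-up identities (Config Author, Env Author, Flag Author) that exist only for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Config Author"
git config user.email "config@example.com"

# 1. Only the configuration is set.
git commit -q --allow-empty -m "Uses the configured identity"
from_config=$(git log -1 --format='%an <%ae>')

# 2. The environment variables override the configuration.
GIT_AUTHOR_NAME="Env Author" GIT_AUTHOR_EMAIL="env@example.com" \
  git commit -q --allow-empty -m "Uses the environment identity"
from_env=$(git log -1 --format='%an <%ae>')

# 3. The --author option overrides both.
GIT_AUTHOR_NAME="Env Author" GIT_AUTHOR_EMAIL="env@example.com" \
  git commit -q --allow-empty -m "Uses the --author identity" \
  --author="Flag Author <flag@example.com>"
from_flag=$(git log -1 --format='%an <%ae>')

echo "$from_config"
echo "$from_env"
echo "$from_flag"
```

Note that --author takes the form "Name <email>", with the address in angle brackets.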

Working with a Local Repository

Now that you have configured your identity, you are ready to start working with a repository. Start by creating a new empty repository on your local development machine. We will start simple and work our way toward techniques for working with a shared repository on a Git server.

Creating an initial repository

We will model a typical situation by creating a repository for your personal website. Let's assume you're starting from scratch and you are going to add content for your project in the local directory ~/my_website, which you place in a Git repository. Type in the following commands to create the directory, and place some basic content in a file called index.xhtml:

$ mkdir ~/my_website
$ cd ~/my_website
$ echo 'My awesome website!' > index.xhtml

To convert ~/my_website into a Git repository, run git init. Here we pass the -b option followed by the name of the initial branch, main:

$ git init -b main
  Initialized empty Git repository in ../my_website/.git/

If you prefer to initialize an empty Git repository first and then add files to it, you can do so by running the following commands:

$ git init -b main ~/my_website
Initialized empty Git repository in ../my_website/.git/
$ cd ~/my_website
$ echo 'My awesome website!' > index.xhtml

You can initialize a completely empty directory or an existing directory full of files. In either case, the process of converting the directory into a Git repository is the same.

The git init command creates a hidden directory called .git at the root level of your project. All revision information, along with supporting metadata and Git extensions, is stored in this top-level, hidden .git folder.

Git considers ~/my_website to be the working directory. This directory contains the current version of files for your website. When you make changes to existing files or add new files to your project, Git records those changes in the hidden .git folder.

For the purpose of learning, we will reference two virtual directories that we call Local History and Index to illustrate the concept of initializing a new Git repository.

Adding a file to your repository

Up to this point, you have only created a new Git repository. In other words, this Git repository is empty. Although the file index.xhtml exists in the directory ~/my_website, to Git, this is the working directory, a representation of a scratch pad or directory where you frequently alter your files.

When you have finalized changes to the files and want to deposit those changes into the Git repository, you need to explicitly do so by using the git add file command:

$ git add index.xhtml

Although you can let Git add all the files in the directory and all subdirectories using the git add . command, this stages everything, and we advise you to be intentional with what you are planning to stage, mainly to prevent sensitive information or unwanted files from being included when commits are made. To avoid including such information, you can use the .gitignore file.

The argument ., the single period or dot in Unix parlance, is shorthand for the current directory. With the git add command, Git understands that you intend to include the final iteration of the modification on index.xhtml as a revision in the repository. However, so far Git has merely staged the file, an interim step before taking a snapshot via a commit.

Git separates the add and commit steps to avoid volatility while providing flexibility and granularity in how you record changes. Imagine how disruptive, confusing, and time-consuming it would be to update the repository each time you add, remove, or change a file. Instead, multiple provisional and related steps, such as an add, can be batched, thereby keeping the repository in a stable, consistent state. This method also allows us to craft a narrative of why we are changing the code.

We recommend that you strive to group logical change batches before making a commit. This is called an atomic commit and will help you in situations where you'll need to do some advanced Git operations.

Running the git status command reveals this in-between state of index.xhtml:

$ git status
  On branch main
  No commits yet
  Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file: index.xhtml

The command reports that the new file index.xhtml will be added to the repository during the next commit.
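
To see that staging is selective, the sketch below stages one of two new files and then inspects the result. It assumes Git 2.28 or later and a throwaway repository; todo.txt is a hypothetical scratch file that is deliberately left unstaged.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

echo 'My awesome website!' > index.xhtml
echo 'scratch ideas' > todo.txt

# Stage only the file that belongs in the next commit.
git add index.xhtml

# Short status: "A " marks a staged new file, "??" an untracked one.
status_output=$(git status --short)
echo "$status_output"

# List exactly what the next commit would contain.
staged_files=$(git diff --cached --name-only)
echo "$staged_files"
```

Only index.xhtml would be included in the next commit; todo.txt stays in the working directory, untracked.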

After staging the file, the next logical step is to commit the file to the repository. Once you commit the file, it becomes part of the repository commit history; for brevity, we will refer to this as the repo history. Every time you make a commit, Git records several other pieces of metadata along with it, most notably the commit log message and the author of the change.

A fully qualified git commit command should supply a terse and meaningful log message using active language to denote the change that is being introduced by the commit. This is very helpful when you need to traverse the repo history to track down a specific change or quickly identify changes of a commit without having to dig deeper into the change details.

Let's commit the staged index.xhtml file for your website:

$ git commit -m "Initial contents of my_website"
  [main (root-commit) c149e12] Initial contents of my_website
  1 file changed, 1 insertion(+)
  create mode 100644 index.xhtml

The details of the author who is making the commit are retrieved from the Git configuration we set up earlier.

In the code example, we supplied the -m argument to be able to provide the log message directly on the command line. If you prefer to provide a detailed log message via an interactive editor session, you can do so as well. You will need to configure Git to launch your favorite editor during a git commit (leave out the -m argument); if it isn't set already, you can set the $GIT_EDITOR environment variable as follows:

# In bash or zsh
$ export GIT_EDITOR=vim
# In tcsh
$ setenv GIT_EDITOR emacs

Git will honor the default text editor configured in the shell environment variables VISUAL and EDITOR. If neither is configured, it falls back to using the vi editor.

After you commit the index.xhtml file into the repository, run git status to get an update on the current state of your repository. In our example, running git status should indicate that there are no outstanding changes to be committed:

$ git status
  On branch main
  nothing to commit, working tree clean

Git also tells you that your working directory is clean, which means the working directory has no new or modified files that differ from what is in the repository.

The difference between git add and git commit is much like you organizing a group of schoolchildren in a preferred order to get the perfect classroom photograph: git add does the organizing, whereas git commit takes the snapshot.

Making another commit

Next, let's make a few modifications to index.xhtml and create a repo history within the repository.

Convert index.xhtml into a proper HTML file, and commit the alteration to it:

$ cd ~/my_website
# edit the index.xhtml file.
$ cat index.xhtml
<html>
  <body>
    My website is awesome!
  </body>
</html>
$ git commit index.xhtml -m 'Convert to HTML'
[main 521edbe] Convert to HTML
1 file changed, 5 insertions(+), 1 deletion(-)

If you are already familiar with Git, you may be wondering why we skipped the git add index.xhtml step before we committed the file. It is because the content to be committed may be specified in more than one way in Git.

Type git commit --help to open the manual page, which explains these options and the various commit methods in detail.

In our example, we decided to commit the index.xhtml file with an additional argument, the -m switch, which supplied a message explaining the changes in the commit: Convert to HTML.

Git Ignore

A .gitignore file tells Git which files to leave out of tracking and committing. It is generally used to ignore IDE-generated or system-generated files and other transient files that aren't supposed to be present in the repository.
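
A minimal sketch of a .gitignore file at work, assuming Git 2.28 or later and a throwaway repository. The two patterns (*.log and .idea/) and the file build.log are hypothetical examples, not a recommended list.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main

# One pattern per line: any file ending in .log, and the .idea directory.
printf '%s\n' '*.log' '.idea/' > .gitignore

echo 'debug output' > build.log
echo 'My awesome website!' > index.xhtml

# build.log matches a pattern, so it is not reported as untracked.
untracked=$(git status --short)
echo "$untracked"

# git check-ignore prints the paths that are ignored.
ignored_by=$(git check-ignore build.log)
echo "$ignored_by"
```

The .gitignore file itself is an ordinary file, so it is usually staged and committed like any other.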

Viewing your commits

Now that you have more commits in the repo history, you can inspect them in a variety of ways. Some git commands show the sequence of individual commits, others show the summary of an individual commit, and still others show the full details of any commit you specify in the repository.

The git log command yields a sequential history of the individual commits within the repository.

For repositories with long commit histories, this standard view can be hard to scan; in such situations you can provide the --oneline switch to list an abbreviated commit ID along with the commit message:

$ git log --oneline

If you want to see more detail about a particular commit, use the git show command with a commit ID.

If you run git show without an explicit commit number, it simply shows the details of the HEAD commit, in our case, the most recent one.
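
The following sketch rebuilds a two-commit history and inspects it with git log and git show. It assumes Git 2.28 or later and a throwaway repository with a hypothetical identity; the -s and --format options are used here only to keep the output short.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'My awesome website!' > index.xhtml
git add index.xhtml
git commit -q -m "Initial contents of my_website"
echo '<html><body>My website is awesome!</body></html>' > index.xhtml
git commit -q -m "Convert to HTML" index.xhtml

# One line per commit, newest first.
log_lines=$(git log --oneline)
echo "$log_lines"

# With no commit given, git show describes HEAD, the most recent commit.
# -s suppresses the diff; --format=%s prints only the subject line.
head_subject=$(git show -s --format=%s)

# An explicit commit can be named by ID or relative to HEAD.
first_subject=$(git show -s --format=%s HEAD~1)

echo "$head_subject"
echo "$first_subject"
```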

The git log command shows the full log entry for each commit in the repo history. If you want concise, one-line summaries for the current development branch without supplying additional filter options to git log --oneline, an alternative is the git show-branch command:

$ git show-branch --more=10
[main] Convert to HTML
[main^] Initial contents of my_website

The option --more=10 reveals up to 10 additional commits, but only two exist so far, so both are shown. (By default, only the most recent commit would be listed.) The name main is the name of the current branch.

Viewing commit differences

With the repo history in place from the addition of commits, you can now see the differences between the two revisions of index.xhtml. You will need to recall the commit ID numbers and run the git diff command:

$ git diff c149e12e89a9c035b9240e057b592ebfc9c88ea4 \
521edbe1dd2ec9c6f959c504d12615a751b5218f

In the output, a minus sign (-) precedes each line removed from the earlier revision, and a plus sign (+) precedes each line added in the later one.

Do not be intimidated by the long hex numbers. Git provides many shorter, easier ways to run similar commands so that you can avoid large, complicated commit IDs. Usually the first seven characters of the hex numbers, as shown in the git log --oneline example earlier, are sufficient.
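
One of the shorter ways to name commits is relative to HEAD. The sketch below, which assumes Git 2.28 or later and a throwaway repository with a hypothetical identity, compares the two most recent commits first by relative name and then by full commit ID.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'My awesome website!' > index.xhtml
git add index.xhtml
git commit -q -m "Initial contents of my_website"
printf '%s\n' '<html>' '<body>' 'My website is awesome!' '</body>' '</html>' > index.xhtml
git commit -q -m "Convert to HTML" index.xhtml

# HEAD is the most recent commit; HEAD~1 is its parent.
diff_by_name=$(git diff HEAD~1 HEAD)

# The same comparison using full commit IDs.
older=$(git rev-parse HEAD~1)
newer=$(git rev-parse HEAD)
diff_by_id=$(git diff "$older" "$newer")

echo "$diff_by_name"
```

Both forms produce the same diff; the relative names simply save you from copying hashes.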

Removing and renaming files in your repository

Now that you have learned how to add files to a Git repository, let's look at how to remove a file from one. Removing a file from a Git repository is analogous to adding a file but uses the git rm command. Suppose you have the file adverts.xhtml in your website content and plan to remove it. You can do so as follows:

$ cd ~/my_website
$ ls
index.xhtml adverts.xhtml
$ git rm adverts.xhtml
rm 'adverts.xhtml'
$ git commit -m "Remove adverts html"
[main 97ff70a] Remove adverts html
1 file changed, 0 insertions(+), 0 deletions(-)
delete mode 100644 adverts.xhtml

Like an addition, a deletion requires two steps: first express your intent to remove the file with git rm, which stages the removal; then realize the change in the repository with git commit.

Just as git add does, git rm changes what is tracked: it stages the deletion of the file (and removes it from the working directory), and the change becomes part of the repo history only when you commit.

You can rename a file indirectly by using a combination of the git rm and git add commands, or you can rename the file more quickly and directly with the command git mv. Here's an example of the former:

$ mv foo.xhtml bar.xhtml
$ git rm foo.xhtml
rm 'foo.xhtml'
$ git add bar.xhtml

In this sequence, you must execute mv foo.xhtml bar.xhtml at the outset to prevent git rm from permanently deleting the foo.xhtml file from the filesystem.

Here's the same operation performed with git mv:

$ git mv foo.xhtml bar.xhtml

In either case, the staged changes must subsequently be committed:

$ git commit -m "Moved foo to bar"
[main d1e37c8] Moved foo to bar
1 file changed, 0 insertions(+), 0 deletions(-)
rename foo.xhtml => bar.xhtml (100%)

Git handles file move operations differently than most similar systems, employing a mechanism based on the similarity of the content between two file versions.
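
A small experiment shows the content-based approach: the rename below is staged as a separate removal and addition, yet Git reports it as a rename because the content is identical. This sketch assumes Git 2.28 or later, a throwaway repository, and a hypothetical identity.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo 'Some page content' > foo.xhtml
git add foo.xhtml
git commit -q -m "Add foo"

# Rename the file by hand, then stage the removal and the addition.
mv foo.xhtml bar.xhtml
git rm -q foo.xhtml
git add bar.xhtml

# Git was never told this is a rename, but it pairs the two paths up
# because their content matches.
rename_status=$(git status --short)
echo "$rename_status"
```

The status line begins with R, the marker for a staged rename, followed by the old and new paths.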

Storing Your Commit or Version in a Remote Repository

First, register your remote repository under a short name (origin in the example below) that points to its URL (https://github.com/proyecto-eden-3-esferas/Cpp in the example):

$ git remote add origin https://github.com/proyecto-eden-3-esferas/Cpp

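Then you run:

$ git push --set-upstream origin main

or just:

$ git push origin main

The --set-upstream option also records origin/main as the upstream of your local main branch, so that later you can run git push with no further arguments.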
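
The same steps can be tried offline by letting a bare repository on the local filesystem play the role of the server. This is a sketch under stated assumptions: Git 2.28 or later, temporary directories, and the hypothetical path remote.git standing in for a hosted URL.

```shell
work=$(mktemp -d)

# A bare repository has no working directory; it only receives pushes.
git init -q --bare -b main "$work/remote.git"

git init -q -b main "$work/my_website"
cd "$work/my_website"
git config user.name "Example Author"
git config user.email "author@example.com"
echo 'My awesome website!' > index.xhtml
git add index.xhtml
git commit -q -m "Initial contents of my_website"

# Name the remote, then publish main and record it as the upstream.
git remote add origin "$work/remote.git"
git push -q --set-upstream origin main

upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
local_tip=$(git rev-parse main)
remote_tip=$(git -C "$work/remote.git" rev-parse main)

echo "upstream of main: $upstream"
echo "local tip:  $local_tip"
echo "remote tip: $remote_tip"
```

After the push, the remote's main branch names the same commit as the local one.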

Working with a Shared Repository

By now you have initialized a new repository and have been making changes to it. All the changes are only available to your local development machine. This is a good example of how you can manage a project that is only available to you. But how can you work collaboratively on a repository that is hosted on a Git server? Let's discuss how you can achieve this.

Making a local copy of the repository

You can create a complete copy, or a clone, of a repository using the git clone command. This is how you collaborate with other people, making changes on the same files and keeping in sync with changes from other versions of the same repository.

For the purposes of this tutorial, let's start simple by creating a copy of your existing repository; then we can contrast the same example as if it were on a remote Git server:

$ cd ~
$ git clone my_website new_website

Although these two Git repositories now contain exactly the same objects, files, and directories, there are some subtle differences. You may want to explore those differences with commands such as the following:

$ ls -lsa my_website new_website
...
$ diff -r my_website new_website
...

On a local filesystem like this, using git clone to make a copy of a repository is quite similar to using cp -a or rsync. In contrast, if you were to clone the same repository from a Git server, the syntax would be as follows:

$ cd ~
$ git clone https://git-hosted-server.com/some-dir/my_website.git new_website
Cloning into 'new_website'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 125 (delta 45), reused 65 (delta 18), pack-reused 0
Receiving objects: 100% (125/125), 1.67 MiB | 4.03 MiB/s, done.
Resolving deltas: 100% (45/45), done.

Once you clone a repository, you can modify the cloned version, make new commits, inspect its logs and history, and so on. It is a complete repository with a full history. Remember that the changes you make to the cloned repository will not be automatically pushed to the original repository.
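
The sketch below clones a local repository, shows that the clone remembers where it came from, and confirms that a commit made in the clone does not reach the original. It assumes Git 2.28 or later, a temporary directory, and a hypothetical identity.

```shell
work=$(mktemp -d)
cd "$work"

git init -q -b main my_website
git -C my_website config user.name "Example Author"
git -C my_website config user.email "author@example.com"
echo 'My awesome website!' > my_website/index.xhtml
git -C my_website add index.xhtml
git -C my_website commit -q -m "Initial contents of my_website"

git clone -q my_website new_website

# The clone records its source under the remote name origin.
origin_url=$(git -C new_website remote get-url origin)
echo "$origin_url"

# Commit in the clone only (an empty commit stands in for real work).
git -C new_website config user.name "Example Author"
git -C new_website config user.email "author@example.com"
git -C new_website commit -q --allow-empty -m "Local work in the clone"

original_tip=$(git -C my_website rev-parse main)
clone_tip=$(git -C new_website rev-parse main)
clone_parent=$(git -C new_website rev-parse main^)
echo "original: $original_tip"
echo "clone:    $clone_tip"
```

The clone is now one commit ahead; the original stays where it was until someone pushes to it or pulls from the clone.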

Try not to be distracted by some of the terms you see in the output. Git supports a richer set of repository sources, including network names, for naming the repository to be cloned.

Configuration Files

Git configuration files are all simple text files in the style of .ini files. The configuration files are used to store preferences and settings used by multiple git commands. Some of the settings represent personal preferences (e.g., should a color.pager be used?), others are important for a repository to function correctly (e.g., core.repositoryformatversion), and still others tweak git command behavior a bit (e.g., gc.auto). Like other tools, Git supports a hierarchy of configuration files.

Hierarchy of configuration files

The Git configuration files hierarchy in decreasing precedence:

.git/config
Repository-specific configuration settings manipulated with the --file option or by default. You can also write to this file with the --local option. These settings have the highest precedence.
~/.gitconfig
User-specific configuration settings manipulated with the --global option.
/etc/gitconfig
System-wide configuration settings manipulated with the --system option if you have proper Unix file write permissions on the gitconfig file. These settings have the lowest precedence. Depending on your installation, the system settings file might be somewhere else (perhaps in /usr/local/etc/gitconfig) or may be absent entirely.

For example, to store an author name and email address that will be used on all the commits you make for all of your repositories, configure values for user.name and user.email in your $HOME/.gitconfig file using git config --global:

$ git config --global user.name "Jon Loeliger"
$ git config --global user.email "jdl@example.com"

If you need to set a repository-specific name and email address that would override a --global setting, simply omit the --global flag or use the --local flag to be explicit:

$ git config user.name "Jon Loeliger"
$ git config user.email "jdl@special-project.example.org"

You can use git config -l (or the long form --list) to list the settings of all the variables collectively found in the complete set of configuration files:

# Make a brand-new, empty repository
$ mkdir /tmp/new
$ cd /tmp/new
$ git init
# Set some config values
$ git config --global user.name "Jon Loeliger"
$ git config --global user.email "jdl@example.com"
$ git config user.email "jdl@special-project.example.org"
$ git config -l
user.name=Jon Loeliger
user.email=jdl@example.com
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
user.email=jdl@special-project.example.org

When specifying the command git config -l, adding the options --show-scope and --show-origin prints the scope and the source file of each setting. Try this out with git config -l --show-scope --show-origin in your terminal.
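
The precedence rules can be checked without touching your real settings. The following is a minimal sketch, assuming a POSIX shell; it points HOME at a throwaway directory so that --global writes land in a scratch .gitconfig. The directory layout and variable names are illustrative only.

```shell
set -e

# Assumption: a scratch HOME keeps this experiment away from your real ~/.gitconfig.
scratch=$(mktemp -d)
export HOME="$scratch"
export GIT_CONFIG_NOSYSTEM=1          # ignore the system-wide file for a predictable result
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

mkdir "$scratch/repo"
cd "$scratch/repo"
git init -q

git config --global user.email "jdl@example.com"
global_email=$(git config user.email)       # only the global value exists so far

git config --local user.email "jdl@special-project.example.org"
effective_email=$(git config user.email)    # the repository value now wins

# Removing the repository value lets the global one show through again.
git config --local --unset user.email
after_unset=$(git config user.email)

echo "global only: $global_email"
echo "with local:  $effective_email"
echo "after unset: $after_unset"
```

Asking for a single key with git config user.email returns the value that is actually in effect, which is often more useful than reading the full -l listing.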

Because the configuration files are simple text files, you can view their contents with cat and edit them with your favorite text editor too.

The content of the configuration text file may be presented with some slight differences according to your operating system type. Many of these differences allow for different filesystem characteristics.

If you need to remove a setting from the configuration files, use the --unset option together with the correct configuration files flag:

$ git config --unset --global user.email

Git provides you with many configuration options and environment variables that frequently exist for the same purpose. For example, you can set a value for the editor to be used when composing a commit log message. Git looks for the editor in the following places, in order, and uses the first one that is set:

  • GIT_EDITOR environment variable
  • core.editor configuration option
  • VISUAL environment variable
  • EDITOR environment variable
  • The vi command
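
The lookup order above can be observed with git var GIT_EDITOR, which reports the editor Git would launch without actually starting it. This is a hedged sketch: the editor names are placeholders that need not be installed, the scratch HOME is an assumption made to isolate the test, and VISUAL is left out to keep the example short.

```shell
set -e

scratch=$(mktemp -d)
export HOME="$scratch" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL GIT_EDITOR VISUAL EDITOR
cd "$scratch"
git init -q repo
cd repo

# Lowest of the three sources tested here: the EDITOR environment variable.
from_editor=$(EDITOR=nano git var GIT_EDITOR)

# The core.editor configuration option outranks EDITOR.
git config core.editor "vim"
from_config=$(EDITOR=nano git var GIT_EDITOR)

# The GIT_EDITOR environment variable outranks everything else.
from_git_editor=$(GIT_EDITOR=emacs EDITOR=nano git var GIT_EDITOR)

echo "EDITOR only:      $from_editor"
echo "plus core.editor: $from_config"
echo "plus GIT_EDITOR:  $from_git_editor"
```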

There are more than a few hundred configuration parameters. We will not bore you with them but will point out important ones as we go along. A more extensive (yet still incomplete) list can be found on the git config manual page.

Configuring an alias

Git aliases allow you to substitute common but complex git commands that you type frequently with simple and easy-to-remember aliases. This also saves you the hassle of remembering or typing out those long commands, and it saves you from the frustration of running into typos:

$ git config --global alias.show-graph \
    'log --graph --abbrev-commit --pretty=oneline'

In this example, we created the show-graph alias and made it available for use in any repository we create. When we use the command git show-graph, it will give us the same output we got when we typed that long git log command with all those options.

Comparisons to Git

There have been many different version control systems developed in the computing world, including SCCS, RCS, CVS, Subversion, BitKeeper, Mercurial, Bazaar, Darcs, and others. Some particular strengths of Git are:

  • Git is a member of the newer generation of distributed version control systems. Older systems such as CVS and Subversion are centralized, meaning that there is a single, central copy of the project content and history to which all users must refer. The central copy is typically accessed over a network, and if it is unavailable for some reason, all users are stuck; they cannot use version control until the central copy is working again. Distributed systems such as Git, on the other hand, have no inherent central copy. Each user has a complete, independent copy of the entire project history, called a repository, and full access to all version control facilities. Network access is only needed occasionally, to share sets of changes among people working on the same project.
  • In some systems, notably CVS and Subversion, branches are slow and difficult to use in practice, which discourages their use. Branches in Git, on the other hand, are very fast and easy to use. Effective branching and merging allows more people to work on a project in parallel, relying on Git to combine their separate contributions.
  • Applying changes to a repository is a two-step process: you add the changes to a staging area called the index, then commit those changes to the repository. The extra step allows you to easily apply just some of the changes in your current working files (including a subset of changes to a single file), rather than being forced to apply them all at once, or undoing some of those changes yourself before committing and then redoing them by hand. This encourages splitting changes up into better organized, more coherent and reusable sets.
  • Git's distributed nature and flexibility allow for many different styles of use, or workflows. Individuals can share work directly between their personal repositories. Groups can coordinate their work through a single central repository. Hybrid schemes permit several people to organize the contributions of others to different areas of a project, and then collaborate among themselves to maintain the overall project state.
  • Git is the technology behind the enormously popular social coding website GitHub, which includes many well-known open source projects. In learning Git, you will open up a whole world of collaboration on small and large scales.
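
The two-step add/commit process described in the list above can be tried in a scratch repository. The sketch below is illustrative only: the file names, the identity, and the commit messages are made up. It changes two files but commits just one of them.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "v1" > notes.txt
echo "v1" > code.txt
git add notes.txt code.txt
git commit -q -m "Initial content"

# Change both files, but stage only one of them.
echo "v2" >> notes.txt
echo "v2" >> code.txt
git add code.txt
git commit -q -m "Update code.txt only"

# code.txt is clean again; notes.txt is still modified in the working tree.
remaining=$(git status --short)
echo "$remaining"
```

To stage only part of the changes within a single file, git add -p walks through the file's changes one hunk at a time.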

A Git Tutorial (www.w3schools.com/git)

This is a section borrowed from www.w3schools.com. It is a self-contained tutorial, whereas other sources do not provide all the instructions.

To check which version of Git is installed, type:

git --version

What does Git do?

  • Manage projects with Repositories
  • Clone a project to work on a local copy
  • Control and track changes with Staging and Committing
  • Branch and Merge to allow for work on different parts and versions of a project
  • Pull the latest version of the project to a local copy
  • Push local updates to the main project

Working with Git

  • Initialize Git on a folder, making it a Repository
  • Git now creates a hidden folder to keep track of changes in that folder
  • When a file is changed, added or deleted, it is considered modified
  • You select the modified files you want to Stage (git add MYFILE.EXT)
  • The Staged files are Committed, which prompts Git to store a permanent snapshot of the files
  • Git allows you to see the full history of every commit.
  • You can revert back to any previous commit.
  • Git does not store a new copy of every file in every commit: a file that has not changed is stored only once, and each commit that contains it refers to that same stored content.

Getting Started with Git

You can download Git for free from the following website: https://www.git-scm.com/

After checking our version of Git (git --version), you let Git know who you are. This is important for version control systems, as each Git commit uses this information:

git config --global user.name "MY-NAME MY-SURNAME"
git config --global user.email "ACCOUNT@PROVIDER.com"

Use --global to set the username and e-mail for every repository on your computer. If you want to set the username/e-mail for just the current repo, omit --global.

Now, let's create a new folder for our project. We'll assume a Linux/Unix command line.

Example:

mkdir myproject
cd myproject

Once you have navigated to the correct folder, you can initialize Git on that folder:

Example

$ git init
Initialized empty Git repository in /Users/user/myproject/.git/

You have just created your first Git Repository!

Git now knows that it should watch the folder you initiated it on. Git creates a hidden folder to keep track of changes.

Adding and Registering New Files

Let's add some files, or create a new file using your favourite text editor. Then save or move it to the folder you just created. Say your folder contains file index.html. We type git status and get:

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    index.html

nothing added to commit but untracked files present (use "git add" to track)

Now Git is aware of the file, but has not added it to our repository!

Files in your Git repository folder can be in one of 2 states:

  • tracked: files that Git knows about and are added to the repository
  • untracked: files that are in your working directory, but not added to the repository

When you first add files to an empty repository, they are all untracked. To get Git to track them, you need to stage them, or add them to the staging environment.

Two of the core concepts of Git are the Staging Environment and the Commit.

As you are working, you may be adding, editing and removing files. But whenever you hit a milestone or finish a part of the work, you should add the files to a Staging Environment.

Staged files are files that are ready to be committed to the repository you are working on. You will learn more about commit shortly.

For now, we are done working with index.html. So we can add it to the Staging Environment:

git add index.html

The file should be Staged. Let's check the status:

> git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file: index.html

You can add all files in the current directory to the Staging Environment:

git add --all

Git Commit

Since we have finished our work, we are ready to move from stage to commit for our repo.

Adding commits keeps track of our progress and changes as we work. Git considers each commit a change point or save point. It is a point in the project you can go back to if you find a bug, or want to make a change.

When we commit, we should always include a message.

By adding clear messages to each commit, it is easy for yourself (and others) to see what has changed and when.

Example:

> git commit -m "First release of Hello World!"

Sometimes, when you make small changes, using the staging environment seems like a waste of time. It is possible to commit changes directly, skipping the staging environment. The -a option will automatically stage every changed, already tracked file.

Example:

git status --short
 M index.html

Short status flags are:

  • ?? - Untracked files
  • A - Files added to stage
  • M - Modified files
  • D - Deleted files

> git commit -a -m "Updated index.html with a new line"
[master 09f4acd] Updated index.html with a new line
 1 file changed, 1 insertion(+)

Skipping the Staging Environment is not generally recommended.

Skipping the stage step can sometimes make you include unwanted changes.

To view the history of commits for a repository, you can use the git log command:

git log
commit 09f4acd3f8836b7f6fc44ad9e012f82faf861803 (HEAD -> master)
Author: w3schools-test
Date:   Fri Mar 26 09:35:54 2021 +0100

    Updated index.html with a new line

commit 221ec6e10aeedbfd02b85264087cd9adc18e4b26
Author: w3schools-test
Date:   Fri Mar 26 09:13:07 2021 +0100

    First release of Hello World!

Storing Your Commit or Version in a Remote Repository

First you define a short name for the remote (origin in the example below) that points to your remote repository (https://github.com/proyecto-eden-3-esferas/Cpp in the example):

git remote add origin https://github.com/proyecto-eden-3-esferas/Cpp

Then you run

git push --set-upstream origin main

or just:

git push origin main

Git Help

If you are having trouble remembering commands or options for commands, you can use Git help.

There are a couple of different ways you can use the help command in command line:

  • git <command> -h: See a short summary of the available options for the specific command (git <command> --help opens the full manual page)
  • git help --all: See all possible commands

Git Branch

In Git, a branch is a new/separate version of the main repository.

Let's say you have a large project, and you need to update the design on it.

How would that work without and with Git:

Without Git:

  • Make copies of all the relevant files to avoid impacting the live version
  • Start working with the design and find that the code depends on code in other files, which also needs to be changed!
  • Make copies of the dependent files as well, making sure that every file dependency references the correct file name
  • EMERGENCY! There is an unrelated error somewhere else in the project that needs to be fixed ASAP!
  • Save all your files, making a note of the names of the copies you were working on
  • Work on the unrelated error and update the code to fix it
  • Go back to the design, and finish the work there
  • Copy the code or rename the files, so the updated design is on the live version
  • (2 weeks later, you realize that the unrelated error was not fixed in the new design version because you copied the files before the fix)

With Git:

  • With a new branch called new-design, edit the code directly without impacting the main branch
  • EMERGENCY! There is an unrelated error somewhere else in the project that needs to be fixed ASAP!
  • Create a new branch from the main project called small-error-fix
  • Fix the unrelated error and merge the small-error-fix branch with the main branch
  • You go back to the new-design branch, and finish the work there
  • Merge the new-design branch with main (getting alerted to the small error fix that you were missing)

Branches allow you to work on different parts of a project without impacting the main branch.

When the work is complete, a branch can be merged with the main project.

You can even switch between branches and work on different projects without them interfering with each other.

Branching in Git is very lightweight and fast!

Scenario

Let us add some new features to ... We are working in our local repository, and we do not want to disturb or possibly wreck the main project. So we create a new branch:

git branch hello-world-images

We have now created a new branch called hello-world-images.

Let's confirm that we have created a new branch:

git branch
  hello-world-images
* master

We can see the new branch with the name hello-world-images, but the * beside master specifies that we are currently on that branch.

To check out a branch use the checkout command, which moves us from the current branch, to the one specified at the end of the command:

git checkout hello-world-images
Switched to branch 'hello-world-images'

Now we have moved our current workspace from the master branch, to the new branch. Now open your favourite editor and make some changes. Also, add an image (img_hello_world.jpg). You are in the working directory (same directory as the main branch).

Now check the status of the current branch:

git status
On branch hello-world-images
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   index.html

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        img_hello_world.jpg

no changes added to commit (use "git add" and/or "git commit -a")

So let's go through what happens here:

  • There are changes to our index.html, but the file is not staged for commit
  • img_hello_world.jpg is not tracked

So we need to add both files to the Staging Environment for this branch. As we want to stage all files we type git add --all.

Next, if we are happy with our changes, we commit them to the branch:

git commit -m "Added image to Hello World"

Now we have a new branch, that is different from the master branch.

Using the -b option on checkout will create a new branch, and move to it, if it does not exist.

We are currently on the branch hello-world-images. We added an image to this branch and edited a file.

Now, let's see what happens when we change branch to master

git checkout master
Switched to branch 'master'

The new image is not a part of this branch. List the files in the current directory again (ls). And the file we edited has reverted to what it was before the alteration.

[...]

Now we have a fix ready for master, and we need to merge the two branches.

Git Branch Merge

We have the emergency fix ready, and so let's merge the master and emergency-fix branches.

First, we need to change to the master branch:

git checkout master

Now we merge the current branch (master) with emergency-fix:

git merge emergency-fix
Updating 09f4acd..dfa79db
Fast-forward
 index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Since the emergency-fix branch came directly from master, and no other changes had been made to master while we were working, Git sees this as a continuation of master. So it can Fast-forward, just pointing both master and emergency-fix to the same commit.

As master and emergency-fix are essentially the same now, we can delete emergency-fix, as it is no longer needed:

git branch -d emergency-fix

Now we can move over to hello-world-images and keep working. Add another image file (img_hello_git.jpg) and change index.html, so it shows it:

git checkout hello-world-images
Switched to branch 'hello-world-images'

Now, we are done with our work here and can stage and commit for this branch:

git add --all
git commit -m "added new image"

We see that index.html has been changed in both branches. Now we are ready to merge hello-world-images into master. But what will happen to the changes we recently made in master?

git checkout master
git merge hello-world-images
Auto-merging index.html
CONFLICT (content): Merge conflict in index.html
Automatic merge failed; fix conflicts and then commit the result.

The merge failed, as there is conflict between the versions for index.html. Let us check the status:

git status
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)

Changes to be committed:
        new file:   img_hello_git.jpg
        new file:   img_hello_world.jpg

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   index.html

This confirms there is a conflict in index.html, but the image files are ready and staged to be committed.

So we edit index.html and then we can stage it and check the status:

git add index.html
git status
On branch master
All conflicts fixed but you are still merging.
  (use "git commit" to conclude merge)

Changes to be committed:
        new file:   img_hello_git.jpg
        new file:   img_hello_world.jpg
        modified:   index.html

The conflict has been fixed, and we can use commit to conclude the merge:

git commit -m "merged with hello-world-images after fixing conflicts"
[master e0b6038] merged with hello-world-images after fixing conflicts

And delete the hello-world-images branch:

git branch -d hello-world-images
Deleted branch hello-world-images (was 1f1584e).

Now you have a better understanding of how branches and merging work. Time to start working with a remote repository!
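
The walkthrough above says "we edit index.html" without showing what a conflicted file looks like. The following sketch reproduces a small conflict in a scratch repository. The branch name feature, the HTML snippets, and the identity are invented for illustration, and the starting branch is detected rather than assumed to be master.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "<p>Hello</p>" > index.html
git add index.html
git commit -q -m "Base version"
base=$(git symbolic-ref --short HEAD)       # master or main, depending on your setup

git checkout -q -b feature
echo "<p>Hello from feature</p>" > index.html
git commit -q -a -m "Edit on feature"

git checkout -q "$base"
echo "<p>Hello from $base</p>" > index.html
git commit -q -a -m "Edit on $base"

# The merge stops with a non-zero status: both sides changed the same line.
git merge feature > /dev/null 2>&1 || echo "merge stopped on a conflict"

# Git wrote both versions into the file, separated by conflict markers.
cat index.html
marker_lines=$(grep -c -E '^(<<<<<<<|=======|>>>>>>>)' index.html)

# Resolve by writing the final text, then stage and commit to conclude the merge.
echo "<p>Hello from both</p>" > index.html
git add index.html
git commit -q -m "Merge feature after fixing conflicts"
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "marker lines: $marker_lines, parents of merge commit: $parent_count"
```

The text between <<<<<<< and ======= is the version from the current branch, and the text between ======= and >>>>>>> is the version from the branch being merged. Resolving a conflict means replacing the whole marked region with the content you want to keep.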

Git Advanced (www.w3schools.com/git)*

Foundational Concepts

Here we shall explore why and how Git differs from other version control systems such as SVN or CVS by examining the key components of its architecture and some important concepts. We will focus on the basics, common terminology, and the relationship between Git objects and how they are utilized, all through the lens of a single repository. The fundamentals you'll learn in this chapter also apply when you're working with multiple interconnected repositories.

Repositories*

A Git repository is simply a key–value pair database containing all the information needed to retain and manage the revisions and history of files in a project. A Git repository retains a complete copy of the entire project throughout its lifetime. However, unlike most other version control systems, the Git repository provides not only a complete working copy of all the files stored in the project but also a copy of the repository (key–value pair database) itself with which to work.

Visualizing the Git Object Store*

Now that we know how Git efficiently stores its objects, let's discuss how Git objects fit and work together to form a complete system:

The blob object is at the bottom of the data structure; it references no other Git objects and is referenced only by tree objects. It can be considered a leaf node in relation to the tree object.

Tree objects point to blobs and possibly to other trees as well. Any given tree object might be pointed at by many different commit objects. Each tree is represented by a triangle. We will learn how a tree object can also point to a commit object, but for now, we will keep it simple.

A commit points to one particular tree that is introduced into the repository by the commit.

Each tag is represented by a parallelogram. Each tag can point to, at most, one commit.

The branch is not a fundamental Git object, yet it plays a crucial role in naming commits. Each branch is pictured as a rectangle...

Git Internals: Concepts at Work*

With some tenets out of the way, let's peek under the hood and see how all these concepts fit together in a Git repository. We will start by creating a new repository and inspecting the internal files and object store in much greater detail. We'll do this by starting at the bottom of Git's data structure and working our way up in the object store.

Before we go any further, it is important to know that Git has a few categories of commands to implement its inner mechanics. To get a detailed, categorized list of all the commands, type git help -a in your terminal. Git commands are categorized as follows:

  • Main porcelain commands (high-level commands for routine Git operations)
  • Ancillary commands (commands that help query Git’s internal data store)
  • Low-level commands (plumbing commands for internal Git operations)
  • External commands (commands that extend the standard Git operations)
  • Commands that act as a bridge with a selected version control tool (interacting with other commands)
  • Command aliases (custom aliases created by users to mask complex Git commands)

Branches*

A branch allows the user to launch a separate line of development within a software project. When you create a branch, you are creating a fork from a specific state of the project's timeline. This allows development to progress in multiple directions simultaneously. Think of it as time travel, where you have the ability to create alternate parallel timelines from a single starting point. A branch also gives you the ability to create different versions of a project. Often, a branch can be reconciled and merged with other branches to combine divergent efforts.

Creating branches in Git is considered a lightweight and inexpensive operation. This is because a branch is just a pointer to a specific commit object in a Git repository. Git allows many branches, and thus many different lines of development within a repository can exist simultaneously at any given moment. Moreover, Git has first-rate support for merges between branches. As a result, most Git users make routine use of branches and are naturally encouraged to do so frequently. In this section, we will take a top-down approach to thinking about how branches function in Git by looking at how developers maintain multiple lines of development within a project. The concept discussed here will complement the key takeaways we learned before. We will show you how to list, view, select, create, and discard branches. We will also provide some best practices so that your branches don't disrupt the fabric of time and the existence of parallel timelines.
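
That a branch is just a pointer to a commit can be seen directly. In this sketch (the branch name topic, the file, and the identity are placeholders), creating a branch adds a second name for the current commit, and committing on it moves only that name.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"
original=$(git symbolic-ref --short HEAD)   # name of the branch we started on

# Creating a branch records one more name for the current commit; no files are copied.
git branch topic
head_commit=$(git rev-parse HEAD)
topic_commit=$(git rev-parse topic)
echo "HEAD:  $head_commit"
echo "topic: $topic_commit"

# A new commit on topic moves only that pointer; the original branch stays put.
git checkout -q topic
echo "second" >> file.txt
git commit -q -a -m "Second commit"
topic_after=$(git rev-parse topic)
original_after=$(git rev-parse "$original")
echo "topic now: $topic_after"
echo "$original still: $original_after"

# Discarding the branch removes the name; -D is required because topic is unmerged.
git checkout -q "$original"
git branch -D topic > /dev/null
```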

Four important areas when working with Git

The four areas are the repository, the working directory, the staging area, and the commit history.

The repository

A repository is how we refer to a project version controlled by Git. In reality, a repository is represented by a hidden directory called .git that exists within a project directory and it contains all the data on the changes that have been made to the files in a project. To turn a project directory into a Git repository we have to initialize the repository. Initializing a repository simply means creating a repository.

When we initialize our repository the .git directory is going to be created inside our project directory. Because the .git directory is a hidden directory we won't be able to see it unless we explicitly make hidden files visible.

You should never touch any files or folders inside your .git directory. Doing this could have undesirable consequences on your repository. You should never delete this directory unless you want to delete your repository.

The Working Directory

The working directory is represented by the contents of our project directory (not including the .git directory). It is sort of like a workbench. It contains all the files and folders of one version of our project. And it is where we add, edit, and delete those files and folders. The working directory is really where we make all the modifications to the content of a project.

Even though we make this distinction between the working directory and the repository, it is important to know that when people refer to a project version controlled with Git they will refer to the project folder as the repository.

Within a repository there are two important areas we want to explore further, the staging area and the commit history.

The Staging Area

The staging area is similar to a rough draft space. It is where we can add and remove files, when we are preparing what we want to include in the next saved version of our project (our next commit), in order to be able to explicitly craft what will be included.

The staging area is represented by a file in the .git directory called index. The index file is only created if we have added at least one file to the staging area in our project. It is a binary file therefore the actual contents are not easily understandable. For our purposes, we only need to understand that it represents the staging area.
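
The claim that the index file appears only after the first file is staged is easy to check. This is a small sketch in a scratch repository; the file name colors.txt is made up. It also uses git ls-files --stage, which prints the index contents in readable form.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Right after initialization there is no index file yet.
if [ -f .git/index ]; then before="present"; else before="absent"; fi

echo "red" > colors.txt
git add colors.txt

# Staging the first file creates .git/index.
if [ -f .git/index ]; then after="present"; else after="absent"; fi
echo "before add: $before, after add: $after"

# Each index entry lists a mode, a blob ID, a stage number, and a path.
staged=$(git ls-files --stage)
echo "$staged"
```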

The Commit History

A commit in Git is basically one version of a project. Every commit has a commit hash. A commit hash is a unique string of 40 hexadecimal characters (the digits 0-9 and the letters a-f) that acts like a name for a commit, or a way to refer to it.

We only really need the first seven characters of a commit hash in order to refer to a commit. For example, if we have a commit hash that is 51dc6ecb327578cca508abba4a56e8c18f3835e1, then to refer to this commit we could also just use 51dc6ec.
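
Abbreviated hashes can be tried out with git rev-parse. The sketch below uses a throwaway repository with a single commit; the file name and identity are placeholders, and the actual hash values will differ on every run.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "red" > colors.txt
git add colors.txt
git commit -q -m "Add colors"

full=$(git rev-parse HEAD)                  # full commit hash
short=$(git rev-parse --short=7 HEAD)       # abbreviated form of the same hash
resolved=$(git rev-parse "$short")          # Git expands the prefix back to the full hash

echo "full:     $full"
echo "short:    $short"
echo "resolved: $resolved"
```

In a repository with a very large number of objects, seven characters may no longer be unique; Git then reports the prefix as ambiguous and a few more characters are needed.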

Every commit represents a snapshot of a project, in other words, a standalone version of a project that contains a reference to all the files that are part of that commit.

The commit history is basically where we can think of our commits existing. We can think of the commit history as being represented by the objects directory in our .git directory.

An untracked file is a file in the working directory that Git is not version controlling. It has never been added to the staging area and it has never been included in a commit therefore it is not part of the repository.

On the other hand, once we add a file to the staging area and Git becomes aware of it, it becomes a tracked file. A tracked file is a file that is version controlled or in other words that Git knows about. Every new file in a project version controlled by Git needs to be explicitly added to the staging area and then included in a commit in order to become a tracked file.

The Two Steps to Making a Commit

  • Add all the files you want to include in the next commit to the staging area: git add <filename>
  • Make a commit with a commit message: git commit -m "<message>"

Listing our Commits

To see a list of commits in our commit history we can use the git log command. The git log command lists the commits in our repository in reverse chronological order. It displays four pieces of information about each commit:

  • Commit hash
  • Author name and email
  • Date
  • Commit message

Sharing Work with Git

By separating committing and publishing, and giving you tools with which to edit and reorganize existing commits, Git encourages better use of version control overall.

With Git, sharing work between repositories happens via operations called push and pull: you pull changes from a remote repository and push changes to it. To work on a project, you clone it from an existing repository, possibly over a network via protocols such as HTTP and SSH. Your clone is a full copy of the original, including all project history, completely functional on its own. In particular, you do not need to contact the first repository again in order to examine the history of your clone or commit to it. However, your new repository does retain a reference to the original one, called a remote. This reference includes the state of the branches in the remote as of the last time you pulled from it; these are called remote-tracking branches. If the original repository contains two branches named master and topic, their remote-tracking branches in your clone appear qualified with the name of the remote (by default called origin): origin/master and origin/topic.

Most often, the master branch will be automatically checked out for you when you first clone the repository; Git initially checks out whatever the current branch is in the remote repository. If you later ask to check out the topic branch, Git sees that there isn't yet a local branch with that name—but since there is a remote-tracking branch named origin/topic, it automatically creates a branch named topic and sets origin/topic as its upstream branch. This relationship causes the push/pull mechanism to keep the changes made to these branches in sync as they evolve in both your repository and in the remote.

When you pull, Git updates the remote-tracking branches with the current state of the origin repository; conversely, when you push, it updates the remote with any changes you've made to corresponding local branches. If these changes conflict, Git prompts you to merge the changes before accepting or sending them, so that neither side loses any history in the process.
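
Remote-tracking branches and upstream relationships can be explored without a network by letting a local directory stand in for the remote. In this sketch the directory names upstream and work, the branch topic, and the identity are all illustrative assumptions.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"

# Stand-in for a remote: an ordinary local repository with two branches.
git init -q upstream
cd upstream
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
echo "base" > README
git add README
git commit -q -m "Initial commit"
git branch topic
cd ..

# Cloning copies the history and records the source as the remote named origin.
git clone -q upstream work
cd work
remote_branches=$(git branch -r)
echo "$remote_branches"

# No local topic branch exists yet; checkout creates it from origin/topic
# and records origin/topic as its upstream.
git checkout -q topic
topic_upstream=$(git rev-parse --abbrev-ref 'topic@{upstream}')
echo "topic tracks: $topic_upstream"
```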

If you're familiar with CVS or Subversion, a useful conceptual shift is to consider that a commit in those systems is analogous to a Git push. You still commit in Git, of course, but that affects only your repository and is not visible to anyone else until you push those commits—and you are free to edit, reorganize, or delete your commits until you do so.

The Object Store

Now, we shall discuss the ideas just introduced in more detail, starting with the heart of a Git repository: its object store. This is a database that holds just four kinds of items: blobs, trees, commits, and tags.

The blob object is at the bottom of the data structure; it references no other Git objects and is referenced only by tree objects. It can be considered a leaf node in relation to the tree object. [In the figures that follow,] each blob is represented by a rectangle.

Tree objects point to blobs and possibly to other trees as well. Any given tree object might be pointed at by many different commit objects. Each tree is represented by a triangle. (A tree object can also point to a commit object, but for now, we will keep it simple.)

A circle represents a commit. A commit points to one particular tree that is introduced into the repository by the commit.

Each tag is represented by a parallelogram. Each tag can point to, at most, one commit.

Blob

A blob is an opaque chunk of data, a string of bytes with no further internal structure as far as Git is concerned. The content of a file under version control is represented as a blob. This does not mean the implementation of blobs is naive; Git uses sophisticated compression and transmission techniques to handle blobs efficiently.

Every version of a file in Git is represented as a whole, with its own blob containing the file's complete contents. This stands in contrast to some other systems, in which file versions are represented as a series of differences from one revision to the next, starting with a base version. Various trade-offs stem from this design point. One is that Git may use more storage space; on the other hand, it does not have to reconstruct files to retrieve them by applying layers of differences, so it can be faster. This design increases reliability by increasing redundancy: corruption of one blob affects only that file version, whereas corruption of a difference affects all versions coming after that one.
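
A short experiment makes the blob model concrete. The sketch below (file names are placeholders) shows that a blob ID depends only on file content, that the object store returns the complete contents, and that editing a file produces a different blob.

```shell
set -e

scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Two files with identical bytes...
printf 'hello\n' > a.txt
printf 'hello\n' > copy-of-a.txt
git add a.txt copy-of-a.txt

# ...share one blob: the ID depends on the content only, not on the file name.
id_a=$(git hash-object a.txt)
id_copy=$(git hash-object copy-of-a.txt)
echo "a.txt:         $id_a"
echo "copy-of-a.txt: $id_copy"

# The object store can report the object's type and return its complete contents.
blob_type=$(git cat-file -t "$id_a")
blob_body=$(git cat-file -p "$id_a")
echo "type: $blob_type, contents: $blob_body"

# Changing the file yields a different blob ID for the new content.
printf 'hello again\n' > a.txt
id_changed=$(git hash-object a.txt)
echo "after edit:    $id_changed"
```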

Tree

A Git tree, by itself, is actually what one might usually think of as one level of a tree: it represents a single level of directory structure in the repository content. It contains a list of items, each of which has:

  • A filename and associated information that Git tracks, such as its Unix permissions (mode bits) and file type; Git can handle Unix symbolic links as well as regular files.
  • A pointer to another object. If that object is a blob, then this item represents a file; if it's another tree, a directory.

There is an ambiguity here: when we say tree, do we mean a single object as just described, or the collection of all such objects reachable from it by following the pointers recursively until we reach the terminal blobs—that is, a tree in the more usual sense? It is the latter notion of tree that this data structure is used to represent, of course, and fortunately, it is seldom necessary in practice to make the distinction. When we say tree, we will normally mean the entire hierarchy of tree and blob objects; when necessary, we will use the phrase tree object to refer to the specific, individual data structure component.

A Git tree, then, represents a portion of the repository content at one point in time: a snapshot of a particular directory's content, including that of all directories beneath it.
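
The two senses of "tree" can be made concrete with a small model. This is only a sketch in Python, not Git's actual storage format: the class names Blob, TreeEntry, and Tree and the function walk are invented for illustration, and the mode strings are typical values for a regular file and a directory.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Blob:
    data: bytes                      # opaque file contents


@dataclass
class TreeEntry:
    mode: str                        # e.g. "100644" (file) or "40000" (directory)
    name: str                        # name within this one directory level
    target: Union["Blob", "Tree"]    # a blob means a file, a tree means a directory


@dataclass
class Tree:
    entries: list = field(default_factory=list)   # one level of directory structure


def walk(tree, prefix=""):
    """Yield every file path in the full hierarchy below one tree object."""
    for entry in tree.entries:
        path = prefix + entry.name
        if isinstance(entry.target, Tree):
            yield from walk(entry.target, path + "/")   # descend into a directory
        else:
            yield path


root = Tree([
    TreeEntry("100644", "README", Blob(b"hello\n")),
    TreeEntry("40000", "src", Tree([
        TreeEntry("100644", "main.c", Blob(b"int main(void) { return 0; }\n")),
    ])),
])
print(list(walk(root)))
```

Here root by itself is a single tree object with two entries; the tree in the usual sense is what walk visits by following the pointers down to the blobs.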

Originally, Git saved and restored the full permissions on files (all the mode bits). Later, however, this was deemed to cause more trouble than it was worth, so the interpretation of the mode bits in the index was changed. Now, the only valid values for the low 12 bits of the mode as stored in Git are octal 755 and 644, and these simply indicate that the file should be executable or not. Git sets the execute bits on a file on checkout according to this, but the actual mode value may be different depending on your umask setting; for example, if your umask is 0077, then a file stored with Git mode 755 will end up with mode 700.
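
The umask arithmetic in that example can be checked by hand or with a few lines of Python. This sketch assumes the usual Unix behavior: a new file is requested with mode 777 if it should be executable and 666 otherwise, and the bits set in the umask are then cleared. The function name checkout_mode is made up for illustration.

```python
def checkout_mode(git_mode: int, umask: int) -> int:
    """Permission bits a checked-out file ends up with (illustrative model)."""
    executable = bool(git_mode & 0o100)          # Git records only "executable or not"
    requested = 0o777 if executable else 0o666   # assumed mode requested at creation
    return requested & ~umask                    # bits in the umask are cleared


print(oct(checkout_mode(0o755, 0o077)))  # 0o700
print(oct(checkout_mode(0o644, 0o022)))  # 0o644
```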

Commit

A version control system manages content changes, and the commit is the fundamental unit of change in Git. A commit is a snapshot of the entire repository content, together with identifying information, and the relationship of this historical repository state to other recorded states as the content has evolved over time. Specifically, a commit consists of:

  • A pointer to a tree containing the complete state of the repository content at one point in time.
  • Ancillary information about this change: who was responsible for the content (the author); who introduced the change into the repository (the committer); and the time and date for both those things. The act of adding a commit object to the repository is called making a commit, or committing (to the repository).
  • A list of zero or more other commit objects, called the parents of this commit. The parent relationship has no intrinsic meaning; however, the normal ways of making a commit are meant to indicate that the commit's repository state was derived by the author from those of its parents in some meaningful way (e.g., by adding a feature or fixing a bug). A chain of commits, each having a single parent, indicates a simple evolution of repository state by discrete steps (and as we'll see, this constitutes a branch). When a commit has more than one parent, this indicates a merge, in which the committer has incorporated the changes from multiple lines of development into a single commit. We'll define branches and merges more precisely in a moment.

Of course, at least one commit in the repository must have zero parents, or else the repository would either be infinitely large or have loops in the commit graph, which is not allowed (see the description of a DAG next). This is called a root commit, and most often, there is only one root commit in a repository—the initial one created when the repository was started. However, you can introduce multiple root commits if you want; the command git checkout --orphan does this. This incorporates multiple independent histories into a repository, perhaps in order to collect the contents of previously separate projects (see Importing Disconnected History).
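
As a rough picture of the parts just listed, a commit can be modeled as a small record. This is a sketch only: the Commit class and its kind method are hypothetical, and the strings standing in for object IDs and identities are placeholders rather than real values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    tree: str                        # ID of the top-level tree (the snapshot)
    author: str                      # who wrote the change, and when
    committer: str                   # who added it to the repository, and when
    message: str
    parents: Tuple[str, ...] = ()    # IDs of zero or more parent commits

    def kind(self) -> str:
        if not self.parents:
            return "root"            # no parents: start of a history
        return "merge" if len(self.parents) > 1 else "ordinary"


first = Commit("tree-1", "alice", "alice", "initial import")
second = Commit("tree-2", "alice", "bob", "fix typo", parents=("commit-1",))
merged = Commit("tree-3", "bob", "bob", "merge topic", parents=("commit-2", "commit-9"))
print(first.kind(), second.kind(), merged.kind())
```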

Author versus Committer

The separate author and committer information (name, email address, and timestamp) reflects the creation of the commit content and its addition to the repository, respectively. These are initially the same, but may later become distinct with the use of certain Git commands. For example, git cherry-pick replicates an existing commit by reapplying the changes introduced by that commit in another context. Cherry-picking carries forward the author information from the original commit, while adding new committer information. This preserves the identification and origin date of the changes, while indicating that they were applied at another point in the repository at a later date, possibly by a different person. A bugfix cherry-picked from one repository to another might look like this:

$ git log --format=fuller
commit d404534d
Author:     Eustace Maushaven <eustace@qoxp.net>
AuthorDate: Thu Nov 29 01:58:13 2012 -0500
Commit:     Richard E. Silverman <res@mlitg.com>
CommitDate: Tue Feb 26 17:01:33 2013 -0500

  Fix spin-loop bug in k5_sendto_kdc

  In the second part of the first pass over the
  server list, we passed the wrong list pointer to
  service_fds, causing it to see only a subset of
  the server entries corresponding to sel_state.
  This could cause service_fds to spin if an event
  is reported on an fd not in the subset.

  ---
  cherry-picked from upstream by res
  upstream commit 2b06a22f7fd8ec01fb27a7335125290b8...

Other operations that do this are git rebase and git filter-branch; like git cherry-pick, they too create new commits based on existing ones.

Cryptographic Signature

A commit may also be signed using GnuPG, with:

$ git commit --gpg-sign[=keyid]

A cryptographic signature binds the commit to a particular real-world personal identity attached to the key used for signing; it verifies that the commit's contents are the same now as they were when that person signed it. The meaning of the signature, though, is a matter of interpretation. If I sign a commit, it might mean that I glanced at the diff; verified that the software builds; ran a test suite; prayed to Cthulhu for a bug-free release; or did none of these. Aside from being a convention among the users of the repository, I can also put the intention of my signature in the commit message; presumably, I will not sign a commit without at least reading its message.

Tag

A tag serves to distinguish a particular commit by giving it a human-readable name in a namespace reserved for this purpose. Otherwise, commits are in a sense anonymous, normally referred to only by their position along some branch, which changes with time as the branch evolves (and may even disappear if the branch is later deleted). The tag content consists of the name of the person making the tag, a timestamp, a reference to the commit being tagged, and free-form text similar to a commit message.

A tag can have any meaning you like; often, it identifies a particular software release, with a name like coolutil-1.0-rc2 and a suitable message. You can cryptographically sign a tag just as you can a commit, in order to verify the tag's authenticity.

There are actually two kinds of tags in Git: lightweight and annotated. This section refers to annotated tags, which are represented as a separate kind of object in the repository database. A lightweight tag is entirely different; it is simply a name pointing directly to a commit (see the upcoming section on refs to understand how such names work generally).

Object IDs and SHA-1

A fundamental design element of Git is that the object store uses content-based addressing. Some other systems assign identifiers to their equivalent of commits that are relative to one another in some way, and reflect the order in which commits were made. For example, file revisions in CVS are dotted strings of numbers such as 2.17.1.3, in which (usually) the numbers are simply counters: they increment as you make changes or add branches. This means that there is no intrinsic relationship between a revision and its identifier; revision 2.17.1.3 in someone else's CVS repository, if it exists, will almost certainly be different from yours.

Git, on the other hand, assigns object identifiers based on an object's contents, rather than on its relationship to other objects, using a mathematical technique called a hash function. A hash function takes an arbitrary block of data and produces a sort of fingerprint for it. The particular hash function Git uses, called SHA-1, produces a 160-bit fixed-length value for any data object you feed it, no matter how large.
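
To see that an ID really is a function of content alone, here is a minimal sketch in Python that computes a blob's ID: the SHA-1 of a short header (the word blob, the size in bytes, and a zero byte) followed by the file's bytes. The function name blob_id is ours, not a Git API.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the 40-hex-digit (160-bit) ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Running git hash-object on a file with the same contents should print the same ID, on any machine and in any repository, since neither the filename nor the repository enters into the calculation.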

The usefulness of hash-based object identifiers in Git depends on treating the SHA-1 hash of an object as unique; we assume that if two objects have the same SHA-1 fingerprint, then they are in fact the same object. From this property flow a number of key points:

Single-instance store
Git never stores more than one copy of a file. It can't—if you add a second copy of the file, it will hash the file contents to find its SHA-1 object ID, look in the database, and find that it's already there. This is also a consequence of the separation of a file's contents from its name. Trees map filenames onto blobs in a separate step, to determine the contents of a particular filename at any given commit, but Git does not consider the name or other properties of a file when storing it, only its contents.
Efficient comparisons
As part of managing change, Git is constantly comparing things: files against other files, changed files against existing commits, as well as one commit against another. It compares whole repository states, which might encompass hundreds or thousands of files, but it does so with great efficiency because of hashing. When comparing two trees, for example, if it finds that two subtrees have the same ID, it can immediately stop comparing those portions of the trees, no matter how many layers of directories and files might remain. Why? We said earlier that a tree object contains pointers to its child objects, either blobs or other trees. Well, those pointers are the objects' SHA-1 IDs. If two trees have the same ID, then they have the same contents, which means they must contain the same child object IDs, which means that in turn those objects must also be the same! Inductively, we see immediately that in fact, the entire contents of the two trees must be identical, if the uniqueness property assumed previously holds.
Database sharing
Git repositories can share their object databases at any level with impunity because there can be no aliasing; the binding between an ID and the content to which it refers is immutable. One repository cannot mess up another's object store by changing the data out from under it; in that sense, an object store can only be expanded, not changed. We do still have to worry about removing objects that another database is using, but that's a much easier problem to solve.
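
The first two points can be sketched with a toy content-addressed store. Everything here is a simplification for illustration: the dictionaries blobs and trees stand in for the object database, the tree encoding is a plain-text stand-in for Git's real binary format, and the function names are invented.

```python
import hashlib

blobs, trees = {}, {}   # ID -> content; together a stand-in for the object store


def _oid(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def put_blob(data: bytes) -> str:
    oid = _oid("blob", data)
    blobs.setdefault(oid, data)            # identical content is stored only once
    return oid


def put_tree(entries: dict) -> str:
    # Simplified encoding of name -> child ID; real Git also records mode bits.
    body = "".join(f"{n} {i}\n" for n, i in sorted(entries.items())).encode()
    oid = _oid("tree", body)
    trees.setdefault(oid, dict(entries))
    return oid


def changed_paths(a: str, b: str, prefix: str = ""):
    """Yield paths that differ between two trees, skipping identical subtrees."""
    if a == b:
        return                             # same ID, same content: nothing to visit
    ea, eb = trees[a], trees[b]
    for name in sorted(set(ea) | set(eb)):
        x, y = ea.get(name), eb.get(name)
        if x == y:
            continue                       # unchanged entry, however large it is
        if x in trees and y in trees:
            yield from changed_paths(x, y, prefix + name + "/")
        else:
            yield prefix + name
```

For example, two snapshots that share an untouched lib directory and differ only in README are compared without ever looking inside lib:

```python
lib = put_tree({"a.c": put_blob(b"A"), "b.c": put_blob(b"B")})
old = put_tree({"lib": lib, "README": put_blob(b"v1")})
new = put_tree({"lib": lib, "README": put_blob(b"v2")})
print(list(changed_paths(old, new)))
```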

Much of the power of Git stems from content-based addressing, but if you think for a moment, it's based on a lie! We are claiming that the SHA-1 hash of a data object is unique, but that's mathematically impossible: because the hash function output has a fixed length of 160 bits, there are exactly 2^160 IDs, but infinitely many potential data objects to hash. There have to be duplications, called hash collisions. The whole system appears fatally flawed.

The solution to this problem lies in what constitutes a good hash function, and the odd-sounding notion that while SHA-1 cannot be mathematically collision-free, it is what we might call effectively so. For the practical purposes of Git, I'm not necessarily concerned if there are in fact other files that might have the same ID as one of mine; what really matters is whether any of those files are at all likely to ever appear in my project, or in anyone else's. Maybe all the other files are over 10 trillion bytes long, or will never match any program or text in any programming, object, or natural language ever invented by humanity. This is exactly a property (among others) that researchers endeavor to build into hash functions: the relationship between changes in the input and output is extremely sensitive and wildly unpredictable. Changing a single bit in a file causes its SHA-1 hash to change radically, and flipping a different bit in that file, or the same bit in a different file, will scramble the hash in a way that has no recognizable relationship to the other changes. Thus, it is not that SHA-1 hash collisions cannot happen—it is just that we believe them to be so astronomically unlikely in practice that we simply don't care.
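
The sensitivity just described is easy to observe. The following sketch, using Python's standard hashlib, flips a single bit of a sample input and counts how many of the 160 output bits change; the sample text and variable names are arbitrary choices for the demonstration.

```python
import hashlib


def sha1_bits(data: bytes) -> int:
    """The 160-bit SHA-1 digest of data, as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


original = b"The quick brown fox"
flipped = bytes([original[0] ^ 0x01]) + original[1:]   # flip one bit of the input

# XOR leaves a 1 wherever the two digests disagree.
differing = bin(sha1_bits(original) ^ sha1_bits(flipped)).count("1")
print(f"{differing} of 160 output bits changed")
```

For a well-behaved hash function one typically sees roughly half the output bits change, with no evident pattern relating which bit was flipped to which bits differ.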

Of course, discussing precise mathematical topics in general terms is fraught with hazard; this description is intended to communicate the essence of why we rely upon SHA-1 to do its job, not to prove anything rigorously or even to give justification for these claims.

Security

SHA-1 stands for Secure Hash Algorithm 1, and its name reflects the fact that it was designed for use in cryptography. Hashing is a basic technique in computer science, with applications to many areas besides security, including signal processing, searching and sorting algorithms, and networking hardware. A cryptographically secure hash function like SHA-1 has related but distinct properties to those already mentioned with respect to Git; it is not just extraordinarily unlikely that two distinct trees arising in practice will produce the same commit ID, but it should also be effectively impossible for someone to deliberately find two such trees, or to find a second tree with the same ID as a given one. These features make a hash function useful in security as well as for more general purposes, since with them it can defend against deliberate tampering as well as ordinary or accidental changes to data.

Because SHA-1 is a cryptographic hash function, Git inherits certain security properties from its use of SHA-1 as well as operational ones. If I tag a particular commit of security-sensitive software, it is not feasible for an attacker to substitute a commit with the same ID in which he has embedded a backdoor; as long as I record the commit ID securely and compare it correctly, the repository is tamper proof in this regard. As explained earlier, the chained use of SHA-1 causes the tag's ID to cover the entire content of the tagged commit's tree. The addition of GnuPG digital signatures allows individuals to vouch for the contents of entire repository states and history, in a way that is impractical to forge.

Cryptographic research is always ongoing, though, and computing power increases every year; other hash functions such as MD5 that were once considered secure have been deprecated due to such advances. We have developed more secure versions of SHA itself, in fact, and as of this writing in early 2013, serious weaknesses in SHA-1 have recently been discovered. The criteria used to appraise hash functions for cryptographic use are very conservative, so these weaknesses are more theoretical than practical at the moment, but they are meaningful nonetheless. The good news is that further cryptographic breaks of SHA-1 will not affect the usefulness of Git as a version control system per se; that is, they will not make it more likely in practice that Git will treat distinct commits as identical (that would be disastrous). They will affect the security properties Git enjoys as a result of using SHA-1, but those, while important, are critical to a smaller number of people (and those security goals can mostly be met in other ways if need be). In any case, it will be possible to switch Git to using a different hash function when it becomes necessary, and given the current state of research, it would probably be wise to do that sooner rather than later.

Where Objects Live

In a Git repository, objects are stored under .git/objects. They may be stored individually as loose objects, one per file with pathnames built from their object IDs:

$ find .git/objects -type f
.git/objects/08/5cf6be546e0b950e0cf7c530bdc78a6d5a78db
.git/objects/0d/55bed3a35cf47eefff69beadce1213b1f64c39
.git/objects/19/38cbe70ea103d7185a3831fd1f12db8c3ae2d3
.git/objects/1a/473cac853e6fc917724dfc6cbdf5a7479c1728
.git/objects/20/5f6b799e7d5c2524468ca006a0131aa57ecce7
...

They may also be collected into more compact data structures called packs, which appear as paired .idx and .pack files:

$ ls .git/objects/pack/
pack-a18ec63201e3a5ac58704460b0dc7b30e4c05418.idx
pack-a18ec63201e3a5ac58704460b0dc7b30e4c05418.pack

Git automatically rearranges the object store over time to improve performance; for example, when it sees that there are many loose objects, it automatically coalesces them into packs (though you can do this by hand; see git-repack(1)). Don't assume that objects will be represented in any particular way; always use Git commands to access the object database, rather than digging around in .git yourself.
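
Purely to illustrate the loose-object naming seen in the listing above (and not as a recommended way to reach objects), the mapping from ID to pathname can be sketched as follows. The function name loose_object_path is invented; the rule it encodes, two hex digits for the directory and the remaining 38 for the file, is read off the example paths.

```python
import posixpath


def loose_object_path(object_id: str, git_dir: str = ".git") -> str:
    """Pathname a loose object with this ID would have, if it is stored loose."""
    # The first two hex digits name the directory, the rest name the file.
    return posixpath.join(git_dir, "objects", object_id[:2], object_id[2:])


print(loose_object_path("085cf6be546e0b950e0cf7c530bdc78a6d5a78db"))
```

A given object may just as well live in a pack, which is why a command such as git cat-file remains the reliable way to read it.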

The Commit Graph

The collection of all commits in a repository forms what in mathematics is called a graph: visually, a set of objects with lines drawn between some pairs of them.

In Git, the lines represent the commit parent relationship previously explained, and this structure is called the commit graph of the repository.

Because of the way Git works, there is some extra structure to this graph: the lines can be drawn with arrows pointing in one direction because a commit refers to its parent, but not the other way around (we'll see later the necessity and significance of this). Again using a mathematical term, this makes the graph directed.

The commit graph might be a simple linear history, as shown in the following figure:

A linear commit graph

Or a complex picture involving many branches and merges:

A more complex commit graph

Git, by design, will not ever produce a graph that contains a loop; that is, a way to follow the arrows from one commit to another so that you arrive at the same commit twice (think what that could possibly mean in terms of a history of changes!). This is called being acyclic: not having a cycle, or loop. Thus the commit graph is technically a directed acyclic graph, or DAG for short.
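
A commit graph is small enough to model as a mapping from each commit to its list of parents. The sketch below is our own illustration, not Git code: it finds the root commits and checks the acyclic property by a depth-first walk along the parent arrows. The single-letter commit names are placeholders.

```python
def roots(parents: dict) -> list:
    """Commits with no parents."""
    return sorted(c for c, ps in parents.items() if not ps)


def is_acyclic(parents: dict) -> bool:
    """True if following parent arrows never leads back to a commit on the same path."""
    state = {}   # commit -> "visiting" (on the current path) or "done"

    def visit(c) -> bool:
        if state.get(c) == "done":
            return True
        if state.get(c) == "visiting":
            return False                      # we came back around: a loop
        state[c] = "visiting"
        ok = all(visit(p) for p in parents.get(c, ()))
        state[c] = "done"
        return ok

    return all(visit(c) for c in parents)


linear = {"A": [], "B": ["A"], "C": ["B"]}
looped = {"A": ["C"], "B": ["A"], "C": ["B"]}    # impossible in a real repository
print(is_acyclic(linear), roots(linear))
print(is_acyclic(looped), roots(looped))
```

Note that the looped graph has no root at all, which is the situation the earlier discussion of root commits ruled out.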

Refs

Git defines two kinds of references, or named pointers, which it calls refs:

  • A simple ref, which points directly to an object ID (usually a commit or tag)
  • A symbolic ref (or symref), which points to another ref (either simple or symbolic)

These are analogous to hard links and symbolic links in a Unix filesystem.

Git uses refs to name things, including commits, branches, and tags. Refs inhabit a hierarchical namespace separated by slashes (as with Unix filenames), starting at refs/. A new repository has at least refs/tags/ and refs/heads/, to hold the names of tags and local branches, respectively.

There is also refs/remotes/, holding names referring to other repositories; these contain beneath them the ref namespaces of those repositories, and are used in push and pull operations. For example, when you clone a repository, Git creates a remote named origin referring to the source repository.

There are various defaults, which means that you don't often have to refer to a ref by its full name; for example, in branch operations, Git implicitly looks in refs/heads/ for the name you give.
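
The two kinds of refs, and the defaulting of short branch names, can be modeled with a dictionary. This is a sketch under assumptions: the table below is a stand-in for however Git actually stores refs, the "ref: " prefix is used here simply to mark a symbolic ref, the object ID is a sample value, and resolve and branch_ref are hypothetical helper names.

```python
refs = {
    "HEAD": "ref: refs/heads/master",                                  # symbolic ref
    "refs/heads/master": "1c7ed724236402d7426606b03ee38f34c662be27",   # simple ref
}


def branch_ref(short_name: str) -> str:
    """Full ref name for a branch given by its short name."""
    return "refs/heads/" + short_name


def resolve(name: str, refs: dict, max_depth: int = 5) -> str:
    """Follow symbolic refs until a simple ref yields an object ID."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value                      # a simple ref: this is the object ID
        name = value[len("ref: "):]           # a symbolic ref: follow it
    raise ValueError("too many levels of symbolic refs")


print(resolve("HEAD", refs))
print(resolve(branch_ref("master"), refs))
```

Both calls arrive at the same ID, much as a symbolic link and the file it names lead to the same data.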

Related Commands

These are low-level commands that directly display, change, or delete refs. You don't ordinarily need these, as Git usually handles refs automatically as part of dealing with the objects they represent, such as branches and tags. If you change refs directly, be sure you know what you're doing!

git show-ref
Display refs and the objects to which they refer
git symbolic-ref
Deals with symbolic refs specifically
git update-ref
Change the value of a ref
git for-each-ref
Apply an action to a set of refs

Refs often live in corresponding files and directories under .git/refs; however, don't get in the habit of looking for or changing them directly there, since there are cases in which they are stored elsewhere (in packs, in fact, as with objects), and changing one might involve other operations you don't know about. Always use Git commands to manipulate refs.

Branches

A Git branch is the simplest thing possible: a pointer to a commit, as a ref. Or rather, that is its implementation; the branch itself is defined as all points reachable in the commit graph from the named commit (the tip of the branch).

The special ref HEAD determines what branch you are on; if HEAD is a symbolic ref for an existing branch, then you are on that branch. If, on the other hand, HEAD is a simple ref directly naming a commit by its SHA-1 ID, then you are not on any branch, but rather in detached HEAD mode, which happens when you check out some earlier commit to examine. Let's see:

# HEAD points to the master branch
$ git symbolic-ref HEAD
refs/heads/master

# Git agrees; I'm on the master branch.
$ git branch
* master

# Check out a tagged commit, not at a branch tip.
$ git checkout mytag
Note: checking out 'mytag'.
You are in 'detached HEAD' state...

# Confirmed: HEAD is no longer a symbolic ref.
$ git symbolic-ref HEAD
fatal: ref HEAD is not a symbolic ref

# What is it? A commit ID...
$ git rev-parse HEAD
1c7ed724236402d7426606b03ee38f34c662be27

# ... which matches the commit referred to by the
# tag.
$ git rev-parse mytag^{commit}
1c7ed724236402d7426606b03ee38f34c662be27

# Git agrees; we're not on any branch.
$ git branch
* (no branch)
  master

The HEAD commit is also often referred to as the current commit. If you are on a branch, it may also be called the last or tip commit of the branch.

A branch evolves over time; thus, if you are on the branch master and make a commit, Git does the following:

  • Creates a new commit with your changes to the repository content
  • Makes the commit at the current tip of the master branch the parent of the new commit
  • Adds the new commit to the object store
  • Changes the master branch (specifically, the ref refs/heads/master) to point to the new commit

In other words, Git adds the new commit to the end of the branch using the commit's parent pointer, and advances the branch ref to the new commit.
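
The four steps above can be sketched in a few lines. This is a toy model with invented names: objects and refs are plain dictionaries, make_commit is hypothetical, and the ID is computed from a simplified record rather than Git's real commit encoding.

```python
import hashlib

objects = {}                          # object store: commit ID -> commit record
refs = {"refs/heads/master": None}    # branch ref; None until the first commit


def make_commit(branch: str, tree: str, message: str) -> str:
    ref = "refs/heads/" + branch
    tip = refs[ref]
    parents = [tip] if tip else []                        # current tip becomes the parent
    record = {"tree": tree, "parents": parents, "message": message}
    commit_id = hashlib.sha1(repr(record).encode()).hexdigest()   # stand-in encoding
    objects[commit_id] = record                           # add to the object store
    refs[ref] = commit_id                                 # advance the branch ref
    return commit_id


first = make_commit("master", "tree-1", "initial commit")
second = make_commit("master", "tree-2", "add feature")
print(objects[second]["parents"] == [first], refs["refs/heads/master"] == second)
```

Nothing in either commit record mentions master; only the ref does, which is the source of the consequences discussed next.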

Note a few consequences of this model:

  • Considered individually, a commit is not intrinsically a part of any branch. There is nothing in the commit itself to tell you by name which branches it is or may once have been on; branch membership is a consequence of the commit graph and the current branch pointers.
  • Deleting a branch means simply deleting the corresponding ref; it has no immediate effect on the object store. In particular, deleting a branch does not delete any commits. What it may do, however, is make certain commits uninteresting, in that they are no longer on any branch (that is, no longer reachable in the commit graph from any branch tip or tag). If this state persists, Git will eventually remove such commits from the object store as part of garbage collection. Until that happens, though, if you have an abandoned commit's ID you can still directly access it perfectly well by its SHA-1 name; the Git reflog (git log -g) is useful in this regard.
  • By this definition, a branch can include more than just commits made while on that branch; it also contains commits from branches that flow into this one via an earlier merge. For example, suppose the branch topic was merged into master at commit C, and both branches then continued to evolve separately, as shown in the figure below.

A simple merge

At this point, git log on the master branch shows not only commits A through D as you would expect, but also commits 1 and 2, since they are also reachable from D via C. This may be surprising, but it's just a different way of defining the idea of a branch: as the set of all commits that contributed content to the latest commit. You can generally get the effect of looking only at the history of this branch (even though that's not really well defined) with git log --first-parent.
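
The difference between "everything reachable" and "first parents only" can be sketched on a graph like the one described. The parent lists below are our assumption about the figure: master runs A, B, C, D; topic runs 1, 2, 3 and forks from A; and the merge commit C lists B first and 2 second. The function names are invented.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B", "2"], "D": ["C"],   # master; C is the merge
    "1": ["A"], "2": ["1"], "3": ["2"],                 # topic
}


def reachable(tip: str, parents: dict) -> set:
    """All commits reachable from tip by following every parent pointer."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def first_parent_chain(tip: str, parents: dict) -> list:
    """Commits reached from tip by following only the first parent each time."""
    chain = [tip]
    while parents[chain[-1]]:
        chain.append(parents[chain[-1]][0])
    return chain


print(sorted(reachable("D", parents)))
print(first_parent_chain("D", parents))
```

The first result includes 1 and 2 but not 3, which was committed on topic after the merge; the second is the A-through-D line one might naively call "the master branch".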

The Index

The Git index often seems a bit mysterious to people: some invisible, ineffable place where changes are staged until they're committed. The talk about staging changes in the index also suggests that it holds only changes, as if it were a collection of diffs waiting to be applied. The truth is different and quite simple, and critical to grasp in order to understand Git well. The index is an independent data structure, separate from both your working tree and from any commit. It is simply a list of file pathnames together with associated attributes, usually including the ID of a blob in the object database holding the data for a version of that file. You can see the current contents of the index with git ls-files:

$ git ls-files --abbrev --stage
100644 2830ea0b 0       TODO
100644 a4d2acee 0       VERSION
100644 ce30ff91 0       acinclude.m4
100644 236d5f93 0       configure.ac
...

The --stage option means to show just the index; git ls-files can show various combinations and subsets of the index and your working tree, generally. If you were to delete or change any of the listed files in your working tree, this would not affect the output of this command at all; it's not looking at them.

Key facts about the index:

  • The index is the implicit source of the content for a normal commit. When you use git commit (without supplying specific pathnames), you might think that it creates the new commit based on your working files. It does not; instead, it simply realizes the current index as a new tree object, and makes the new commit from that. This is why you need to stage a changed file in the index with git add in order for it to be part of the next commit.
  • The index does not just contain changes to be made on the next commit; it is the next commit, a complete catalog of the files that will be included in the tree of the next commit (recall that each commit refers to a tree object that is a complete snapshot of the repository content). When you check out a branch, Git resets the index to match the tip commit of that branch; you then modify the index with commands such as git add/mv/rm to indicate changes to be part of the next commit.
  • git add does not just note in the index that a file has changed; it actually adds the current file content to the object database as a new blob, and updates the index entry for that file to refer to that blob. This is why git commit is always fast, even if you're making lots of changes: all the actual data has already been stored by preceding git add commands.

    An implication of this behavior that occasionally confuses people is that if you change a file, git add it, then change it again, it is the version you last added to the index, not the one in your working tree, that is part of the next commit. git status shows this explicitly, by listing the same file as having both changes to be committed and changes not staged for commit.

  • Similar to git commit, git diff without arguments also has the index as an implicit operand; it shows the differences between your working tree and the index, rather than the current commit. Initially these are the same, as the index matches the last commit after a clean checkout or commit. As you make changes to your working files, these show up in the output of git diff, then disappear as you add the corresponding files. The idea is that git diff shows changes not yet staged for commit, so you can see what you have yet to deal with (or have deliberately not included) as you prepare the next commit. git diff --staged shows the opposite: the differences between the index and the current commit (that is, the changes that are about to be committed).
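
These facts fit together in a very small model. The sketch below is an illustration only: three dictionaries stand in for the working tree, the index, and the object database, and git_add and git_commit are hypothetical functions that mimic just the behavior described above.

```python
import hashlib

objects = {}     # object database: blob ID -> content
index = {}       # the index: path -> blob ID
worktree = {}    # working files: path -> content


def blob_id(data: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def git_add(path: str) -> None:
    data = worktree[path]
    oid = blob_id(data)
    objects[oid] = data      # the content is stored at add time...
    index[path] = oid        # ...and the index entry now points at it


def git_commit() -> dict:
    return dict(index)       # the new tree is the index exactly as it stands


worktree["TODO"] = b"first draft\n"
git_add("TODO")
worktree["TODO"] = b"second draft\n"      # edited again, but not re-added
snapshot = git_commit()
print(objects[snapshot["TODO"]])          # the staged version, not the working file
```

At the end, the working file and the index entry disagree; that disagreement is what git diff without arguments would report as not yet staged.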

Merging

Merging is the complement of branching in version control: a branch allows you to work simultaneously with others on a particular set of files, whereas a merge allows you to later combine separate work on two or more branches that diverged earlier from a common ancestor commit. Here are two common merge scenarios:

  • You are working by yourself on a software project. You decide to explore refactoring your code in a certain way, so you make a branch named refactor off of the master branch. You can make any changes you like on the refactor branch without disturbing the main line of development.

    After a while, you're happy with the refactoring you've done and want to keep it, so you switch to the master branch and run git merge refactor. Git applies the changes you've made on both branches since they diverged, asking for your help in resolving any conflicts, then commits the result. You delete the refactor branch, and move on.

  • You have been working on the master branch of a cloned repository and have made several commits over a day or two. You then run git pull to update your clone with the latest work committed to the origin repository. It happens that others have also committed to the origin master branch in the meantime, so Git performs an automatic merge of master and origin/master and commits this to your master branch. You can then continue with your work or push to the origin repository now that you have incorporated its latest changes with your own. See Push and Pull.

There are two aspects to merging in Git: content and history.

Merging Content

What it means to successfully merge two or more sets of changes to the same file depends on the nature of the contents. Git will try to merge automatically, and often call it a success if the two changesets altered non-overlapping portions of the file. Whether you will call that a success, however, is a different question. If the file is chapter three of your next novel, then perhaps such a merge would be fine if you were making minor grammar and style corrections. If you were reworking the plot line, on the other hand, the results could be less useful—perhaps you added a paragraph on one branch that depends on details contained in a later paragraph that was deleted on another branch. Even if the contents are programming source code, such a merge is not guaranteed to be useful. You could change two separate subroutines in a way that causes them to fail when actually used; they might now make incompatible assumptions about some shared data structure, for example. Git doesn't even check to see that your code still compiles; that's up to you.

Within these limitations, though, Git has very sophisticated mechanisms for presenting merge conflicts and helping you to resolve them. It is optimized for the most common use case: line-oriented textual data, often in computer programming languages. It has different strategies and options for determining matching portions of files, which you can use when the defaults don't produce adequate results. You can interactively choose sets of changes to apply, skip, or further edit. To handle complex merges, Git works smoothly with external merge tools such as araxis, emerge, and kdiff3, or with custom merge tools you write yourself.

Merging History

When Git has done what it can automatically, and you have resolved any remaining conflicts, it's time to commit the result. If we just make a commit to the current branch as usual, though, we've lost critical information: the fact that a merge occurred at all, and which branches were involved. You might remember to include this information in the commit message, but it's best not to depend on that; more importantly, Git needs to know about the merge in order to do a good job of merging in the future. Otherwise, the next time you merge the same branches (say, to periodically update one with continuing changes on the other), Git won't know which changes have already been merged and which are new. It may end up flagging as conflicts changes you have already considered and handled, or automatically applying changes you previously decided to discard.

The way Git records the fact of a merge is very simple. Recall from the section on The Object Store that a commit has a list of zero or more parent commits. The initial commit in a repository has no parents, and a simple commit to a branch has just one. When you commit as part of a merge, Git lists the tip commits of all branches involved in the merge as the parents of the new commit. This is in fact the definition of a merge commit: a commit having more than one parent. This information, recorded as part of the commit graph, allows visualization tools to detect and display merges in a helpful and unambiguous way. It also lets Git find an appropriate base version for comparison in later merging of the same or related branches when they have diverged again, avoiding the duplication mentioned earlier; this is called the merge base.
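
A small experiment can make this concrete. The sketch below, using invented file and branch names in a scratch repository, merges two diverged branches and then inspects the parent list of the resulting commit, along with the merge base before and after the merge.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo base > a.txt
git add a.txt
git commit -q -m 'A: root commit'

git checkout -q -b topic
echo topic > b.txt
git add b.txt
git commit -q -m 'B: work on topic'

git checkout -q master
echo master > c.txt
git add c.txt
git commit -q -m 'C: work on master'

root=$(git rev-list --max-parents=0 HEAD)    # the commit with no parents
master_tip=$(git rev-parse master)
topic_tip=$(git rev-parse topic)
base_before=$(git merge-base master topic)   # where the branches diverged

# The branches have diverged, so this creates a merge commit
git merge -q -m 'Merge topic into master' topic

# Everything after the first field is a parent of the new commit
parents=$(git rev-list --parents -n 1 HEAD | cut -d' ' -f2-)

# Git now knows that topic is fully merged: the merge base has moved up
base_after=$(git merge-base master topic)

echo "parents:     $parents"
echo "base before: $base_before"
echo "base after:  $base_after"
```

The merge base moving from the root commit to the tip of topic is what lets a later merge of the same branches consider only the changes made since.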

Push and Pull

You use the commands git pull and git push to update the state of one repository from that of another. Usually, one of these repositories was cloned from the other; in this context, git pull updates my clone with recent work added to the original repository, whereas git push contributes my work in the other direction.

There is sometimes confusion over the relationship between a repository and the one from which it was cloned. We're told that all repositories are equal, yet there seems to be an asymmetry in the original/clone relationship. Pulling automatically updates this repository from the original, so how interconnected are they? Will the clone still be usable if the original goes away? Are there branches in my repository that are somehow pointers to content in another repository? If so, that doesn't sound as if they're truly independent.

Fortunately, as with most things in Git, the situation is actually very simple; we just need to precisely define the terms at hand. The central thing to remember is that with regard to content, a repository consists of two things: an object store and a set of refs (that is, a commit graph and a set of branch names and tags that call out those commits that are of interest). When you clone a repository, such as with git clone server:dir/repo, here's what Git does:

  • Creates a new repository.
  • Adds a remote named origin to refer to the repository being cloned in .git/config:

    [remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = server:dir/repo

    The fetch value here, called a refspec, specifies a correspondence between sets of refs in the two repositories: the pattern on the left side of the colon names refs in the remote, and the spec indicates with the pattern on the right side where the corresponding refs should appear in the local repository. In this case, it means: Keep copies of the branch refs of the remote origin in its local namespace in this repository, refs/remotes/origin/.

  • Runs git fetch origin, which updates our local refs for the remote's branches (creating them in this case), and asks the remote to send any objects we need to complete the history for those refs (in the case of this new repository, all of them).
  • Finally, Git checks out the remote's current branch (its HEAD ref), leaving you with a working tree to look at. You can select a different initial branch to check out with --branch, or suppress the checkout entirely with -n.
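
The steps above can be approximated by hand. In this sketch a local directory named upstream stands in for server:dir/repo, and the names upstream and copy are invented for the demonstration. It also simplifies the last step: a real clone consults the remote's HEAD to choose the branch, whereas here master is named explicitly.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in for the repository being cloned, with branches master and beta
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo hello > upstream/README
git -C upstream add README
git -C upstream commit -q -m 'initial commit'
git -C upstream branch beta

# 1. Create a new repository
git init -q copy
cd copy

# 2. Add a remote named origin; this writes the url and the default
#    fetch refspec into .git/config
git remote add origin "$work/upstream"
refspec=$(git config --get remote.origin.fetch)

# 3. Fetch: creates refs/remotes/origin/* and copies the needed objects
git fetch -q origin

# 4. Check out a branch (chosen by hand here), creating a local branch
#    that tracks the remote one
git checkout -q --track origin/master

echo "refspec: $refspec"
git for-each-ref --format='%(refname)' refs/remotes/origin
```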

Suppose we know the other repository has two branches, master and beta. Having cloned it, we see:

$ git branch
* master

Very well, we're on the master branch, but where's the beta branch? It appears to be missing until we use the --all switch:

$ git branch --all
* master
remotes/origin/HEAD -> origin/master
remotes/origin/master
remotes/origin/beta

Aha! There it is. This makes some sense: we have copies of the refs for both branches in the origin repository, just where the origin refspec says they should be, and there is also the HEAD ref from the origin, which told Git the default branch to check out. The curious thing now is: what is this duplicate master branch, outside of origin, that is the one we're actually on? And why did we have to give an extra option to see all these in the first place?

The answer lies in the purpose of the origin refs: they're called remote-tracking refs, and they are markers showing us the current state of those branches on the remote (as of the last time we checked in with the remote via fetch or pull). In adding to the master branch, you don't want to update your tracking branch directly with a commit of your own; then it would no longer reflect the remote repository state (and on your next pull, it would just discard your additions by resetting the tracking branch to match the remote). So, Git created a new branch with the same name in your local namespace, starting at the same commit as the remote branch:

$ git show-ref --abbrev master
d2e46a81 refs/heads/master
d2e46a81 refs/remotes/origin/master

The abbreviated SHA-1 values on the left are the commit IDs; note that they are the same, and recall that refs/heads/ is the implicit namespace for local branches. Now, as you add to your master branch, it will diverge from the remote master, which reflects the actual state of affairs.
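
To watch the divergence happen, one could run something like the following in a scratch directory. The repository names upstream and copy are placeholders. After a single local commit, the local branch has moved while the remote-tracking ref stays where the last fetch left it.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo hello > upstream/README
git -C upstream add README
git -C upstream commit -q -m 'initial commit'

git clone -q upstream copy
cd copy

# Right after cloning, both refs name the same commit
local_before=$(git rev-parse refs/heads/master)
remote_before=$(git rev-parse refs/remotes/origin/master)

echo more >> README
git commit -q -am 'local work'

# Only the local branch moves; the remote-tracking ref is untouched
local_after=$(git rev-parse refs/heads/master)
remote_after=$(git rev-parse refs/remotes/origin/master)

# Commits on one side that the other side lacks
ahead=$(git rev-list --count origin/master..master)
behind=$(git rev-list --count master..origin/master)

echo "ahead $ahead, behind $behind"
```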

The final piece here is the behavior of your local master branch in regard to the remote. Your intention is presumably to share your work with others as an update to their master branches; also, you'd like to keep abreast of changes made to this branch in the remote while you're working. To that end, Git has added some configuration for this branch in .git/config:

[branch "master"]
remote = origin
merge = refs/heads/master

This means that when you use git pull while on this branch, Git will automatically attempt to merge in any changes made to the corresponding remote branch since the last pull. This configuration affects the behavior of other commands as well, including fetch, push, and rebase.
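
You don't have to read .git/config to see this setting. As a sketch, again with placeholder repository names, the following queries the two values with git config and asks Git to resolve the branch's upstream with the @{upstream} notation.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo hello > upstream/README
git -C upstream add README
git -C upstream commit -q -m 'initial commit'

git clone -q upstream copy
cd copy

# The two values that clone recorded for the local master branch
branch_remote=$(git config --get branch.master.remote)
branch_merge=$(git config --get branch.master.merge)

# The same information, resolved to a remote-tracking branch name
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')

echo "remote=$branch_remote merge=$branch_merge upstream=$upstream"
```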

Finally, Git has a special convenience for git checkout if you try to check out a branch that doesn't exist, but a corresponding branch does exist as part of a remote. It will automatically set up a local branch by the same name with the upstream configuration just demonstrated. For example:

$ git checkout beta
Branch beta set up to track remote branch beta from origin.
Switched to a new branch 'beta'
$ git branch --all
* beta
master
remotes/origin/HEAD -> origin/master
remotes/origin/beta
remotes/origin/master
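
The convenience amounts to creating the branch with an explicit starting point and upstream. A minimal sketch of the long form, in a scratch clone whose names are invented for the example:

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo hello > upstream/README
git -C upstream add README
git -C upstream commit -q -m 'initial commit'
git -C upstream branch beta

git clone -q upstream copy
cd copy

# Spelled out: create local beta at origin/beta and record origin/beta
# as its upstream, then switch to it
git checkout -q -b beta --track origin/beta

current=$(git rev-parse --abbrev-ref HEAD)
beta_upstream=$(git rev-parse --abbrev-ref 'beta@{upstream}')

echo "on $current, tracking $beta_upstream"
```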

Having explained remote-tracking branches, we can now say succinctly what the push and pull operations do:

git pull

Runs git fetch on the remote for the current branch, updating the remote's local tracking refs and obtaining any new objects needed to complete the history of those refs: that is, all commits, tags, trees, and blobs reachable from the new branch tips. Then it tries to update the current local branch to match the corresponding branch in the remote. If only one side has added content to the branch, then this will succeed, and is called a fast-forward update since one ref is simply moved forward along the branch to catch up with the other.

If both sides have committed to the branch, however, then Git has to do something to incorporate both versions of the branch history into one shared version. By default, this is a merge: Git merges the remote branch into the local one, producing a new commit that refers to both sides of the history via its parent pointers. Another possibility is to rebase instead, which attempts to rewrite your divergent commits as new ones at the tip of the updated remote branch (see Pull with Rebase).

git push

Attempts to update the corresponding branch in the remote with your local state, sending any objects the remote needs to complete the new history. This will fail if the update would be non–fast-forward as described earlier (i.e., would cause the remote to discard history), and Git will suggest that you first pull in order to resolve the discrepancies and produce an acceptable update.
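
Both operations can be tried out safely with a bare repository standing in for the shared remote. In the sketch below, hub.git, seed, alice, and bob are invented names. The pull is written out as its two halves, fetch and then merge, so that each step is visible; git pull would normally do both for you.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A bare repository plays the shared remote; seed it with one commit
git init -q --bare hub.git
git --git-dir=hub.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo hello > README
git add README
git commit -q -m 'initial commit'
git push -q "$work/hub.git" master
cd "$work"

git clone -q hub.git alice
git clone -q hub.git bob

# Alice pushes first: a fast-forward on the remote, so it is accepted
cd "$work/alice"
echo alice > alice.txt
git add alice.txt
git commit -q -m 'alice work'
git push -q origin master

# Bob has committed too; his push would discard Alice's commit
cd "$work/bob"
echo bob > bob.txt
git add bob.txt
git commit -q -m 'bob work'
if git push -q origin master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi

# The two halves of a pull: fetch, then merge the remote-tracking branch
git fetch -q origin
git merge -q -m 'Merge origin/master' origin/master
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

# Bob's history now contains the remote's, so the push is a fast-forward
if git push -q origin master; then
    second_push=accepted
else
    second_push=rejected
fi
hub_tip=$(git --git-dir="$work/hub.git" rev-parse master)

# Alice has nothing new of her own, so her update is a fast-forward
cd "$work/alice"
git fetch -q origin
git merge -q --ff-only origin/master
alice_tip=$(git rev-parse master)

echo "first push $first_push, second push $second_push"
```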