[Rd] Community Feedback: Git Repository for R-Devel

Juan Telleria jtelleriar at gmail.com
Sat Jan 6 03:40:45 CET 2018


I attach a basic state-of-the-art review:

##########################################################################################################################################
# State-of-the-Art Analysis of Git vs SVN
##########################################################################################################################################

Scopus Keywords: GIT AND SVN

##########################################################################################################################################
# 1. How Do Centralized (SVN) and Distributed Version Control (GIT) Systems Impact Software Changes? (22 Citations; Published: 2014)
##########################################################################################################################################

1.1 Paper Conclusions

We found that the use of CVCS and DVCS have observable effects on
developers, teams and processes. The most surprising findings are that (i)
the size of commits in DVCS was smaller than in CVCS, (ii) developers split
commits (group changes by intent) more often in DVCS, and (iii) DVCS
commits are more likely to reference issue tracking labels. These show that
DVCS contain higher quality commits compared to CVCS due to their smaller
size, cohesive changes and the presence of issue tracking labels. The
survey provided valuable information on why developers prefer one paradigm
versus the other. DVCS are preferred because of killer features, such as
the ability of committing locally. In contrast CVCS are preferred for their
ease of use and faster learning curve.

1.2 Full Paper

http://dig.cs.illinois.edu/papers/ICSE14_Caius.pdf

##########################################################################################################################################
# 2. Version Control with Git (Book: J. Loeliger, M. McCullough, 2012)
##########################################################################################################################################

2.1 Book Introduction

***The Birth of Git***

Often, when there is discord between a tool and a project, the developers
simply create a new tool. Indeed, in the world of software, the temptation
to create new tools can be deceptively easy and inviting. In the face of
many existing version control systems, the decision to create another
shouldn’t be made casually. However, given a critical need, a bit of
insight, and a healthy dose of motivation, forging a new tool can be
exactly the right course.

Git, affectionately termed “the information manager from hell” by its
creator, Linus Torvalds, is such a tool. Although the precise circumstances
and timing of its genesis are shrouded in political wrangling within the
Linux Kernel community, there is no doubt that what came from that fire is
a well-engineered version control system capable of supporting worldwide
development of software on a large scale.

Prior to Git, the Linux Kernel was developed using the commercial BitKeeper
VCS, which provided sophisticated operations not available in then-current,
free software version control systems such as RCS and CVS. However, when
the company that owned BitKeeper placed additional restrictions on its
“free as in beer” version in the spring of 2005, the Linux community
realized that BitKeeper was no longer a viable solution.

Linus looked for alternatives. Eschewing commercial solutions, he studied
the free software packages but found the same limitations and flaws that
led him to reject them previously. What was wrong with the existing VCS
systems? What were the elusive missing features or characteristics that
Linus wanted and couldn’t find?

***Facilitate distributed development***

There are many facets to “distributed development,” and Linus wanted a new
VCS that would cover most of them. It had to allow parallel as well as
independent and simultaneous development in private repositories without
the need for constant synchronization with a central repository, which
could form a development bottleneck. It had to allow multiple developers in
multiple locations even if some of them were offline temporarily.

***Scale to handle thousands of developers***

It isn’t enough just to have a distributed development model. Linus knew
that thousands of developers contribute to each Linux release, so any new
VCS had to handle a very large number of developers, whether they were
working on the same or on different parts of a common project. And the new
VCS had to be able to integrate all of their work reliably.

***Perform quickly and efficiently***

Linus was determined to ensure that a new VCS was fast and efficient. In
order to support the sheer volume of update operations that would be made
on the Linux Kernel alone, he knew that both individual update operations
and network transfer operations would have to be very fast. To save space
and thus transfer time, compression and “delta” techniques would be needed.
Using a distributed model instead of a centralized model also ensured that
network latency would not hinder daily development.
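
(Illustration, not from the book.) A minimal sketch of the compression
half of that idea, in Python. Git stores objects zlib-compressed; the file
content below is made up, and the delta techniques used in pack files are
not shown here.

```python
import zlib

# Hypothetical file content; repetitive source text compresses well.
content = b"int main(void) { return 0; }\n" * 200

compressed = zlib.compress(content)
restored = zlib.decompress(compressed)

print(len(content), "bytes raw")
print(len(compressed), "bytes compressed")
print("round trip intact:", restored == content)
```

Fewer bytes on disk also means fewer bytes on the wire when repositories
exchange history.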

***Maintain integrity and trust***

Because Git is a distributed revision control system, it is vital to obtain
absolute assurance that data integrity is maintained and is not somehow
being altered. How do you know the data hasn’t been altered in transition
from one developer to the next, or from one repository to the next? For
that matter, how do you know that the data in a Git repository is even what
it purports to be?

Git uses a common cryptographic hash function, called Secure Hash Algorithm 1
(SHA-1), to name and identify objects within its database. Although perhaps
not absolute, in practice it has proven to be solid enough to ensure
integrity and trust for all of Git’s distributed repositories.
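
(Illustration, not from the book.) A minimal sketch of how a file's
content becomes its name. For a blob, Git hashes a short header ("blob",
the byte length, a NUL byte) followed by the content. The function name
git_blob_id is my own choice, not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Header format: b"blob <size>\0", then the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))
```

The result should match what "git hash-object" reports for a file with
the same bytes. Any change to the content, even one byte, gives a
different ID, which is what makes silent alteration detectable.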

***Enforce accountability***

One of the key aspects of a version control system is knowing who changed
files, and if at all possible, why. Git enforces a change log on every
commit that changes a file. The information stored in that change log is
left up to the developer, project requirements, management, convention,
etc. Git ensures that changes will not happen mysteriously to files under
version control because there is an accountability trail for all changes.

***Immutability***

Git’s repository database contains data objects that are immutable. That
is, once they have been created and placed in the database, they cannot be
modified. They can be recreated differently, of course, but the original
data cannot be altered without consequences. The design of the Git database
means that the entire history stored within the version control database is
also immutable. Using immutable objects has several advantages, including
very quick comparison for equality.
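
(Illustration, not from the book.) A toy content-addressed store in
Python, assuming nothing about Git's actual implementation. The class
ObjectStore and its methods are hypothetical. It shows why an "edit"
creates a new object instead of changing an old one, and why equality
checks reduce to comparing IDs.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store (hypothetical, not Git's implementation)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        object_id = hashlib.sha1(content).hexdigest()
        # The ID is derived from the bytes, so storing the same content
        # twice is a no-op and existing objects are never overwritten.
        self._objects.setdefault(object_id, content)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


store = ObjectStore()
v1 = store.put(b"x <- 1\n")
v2 = store.put(b"x <- 2\n")        # an "edit" is simply a new object
v1_again = store.put(b"x <- 1\n")

print(v1 == v1_again)   # same content, same ID
print(v1 == v2)         # different content, different ID
print(store.get(v1))    # the original object is still intact
```

Comparing two IDs is a fixed-size string comparison, however large the
underlying content is.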

***Atomic transactions***

With atomic transactions, a number of different but related changes are
performed either all together or not at all. This property ensures that the
version control database is not left in a partially changed (and hence
possibly corrupted) state while an update or commit is happening. Git
implements atomic transactions by recording complete, discrete repository
states that cannot be broken down into individual or smaller state changes.
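
(Illustration, not from the book.) A rough sketch of the all-or-nothing
idea, assuming a toy in-memory repository. ToyRepo is hypothetical and
much simpler than Git: each commit builds a complete new snapshot off to
the side, and the current state moves only in one final step.

```python
class ToyRepo:
    """Toy repository that records complete snapshots (hypothetical)."""

    def __init__(self):
        self.snapshots = []   # complete states, append-only
        self.head = None      # index of the current snapshot

    def commit(self, changes: dict) -> int:
        # Work on a private copy; nothing is visible until the end.
        base = dict(self.snapshots[self.head]) if self.head is not None else {}
        for path, content in changes.items():
            if not isinstance(content, str):
                raise TypeError(f"bad content for {path}")
            base[path] = content
        self.snapshots.append(base)
        self.head = len(self.snapshots) - 1   # the single visible update
        return self.head


repo = ToyRepo()
repo.commit({"DESCRIPTION": "Version: 1.0", "NEWS": "first release"})

try:
    # The second change is invalid, so the whole commit must be rejected.
    repo.commit({"DESCRIPTION": "Version: 1.1", "NEWS": None})
except TypeError:
    pass

print(repo.head)
print(repo.snapshots[repo.head]["DESCRIPTION"])
```

After the failed commit the repository still shows version 1.0 for both
files; no half-applied state is ever recorded.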

***Support and encourage branched development***

Almost all VCSs can name different genealogies of development within a
single project. For instance, one sequence of code changes could be called
“development” while another is referred to as “test.” Each version control
system can also split a single line of development into multiple lines and
then unify, or merge, the disparate threads. As with most VCSs, Git calls a
line of development a branch and assigns each branch a name.

Along with branching comes merging. Just as Linus wanted easy branching to
foster alternate lines of development, he also wanted to facilitate easy
merging of those branches. Because branch merging has often been a painful
and difficult operation in version control systems, it would be essential
to support clean, fast, easy merging.
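
(Illustration, not from the book.) A small sketch of branches as names
that point at commits, and of finding the point where two lines of
development diverged. The commit IDs, the history and the helper names
are made up, and picking the deepest common ancestor is a simplification
that is enough for this toy history, not Git's actual merge-base
algorithm.

```python
# Hypothetical history: commit id -> list of parent ids.
commits = {
    "A": [],
    "B": ["A"],
    "C": ["B"],   # continues "development"
    "D": ["B"],   # starts "test"
    "E": ["D"],
}

# A branch is only a name for the commit at the tip of a line of work.
branches = {"development": "C", "test": "E"}


def ancestors(commit_id):
    """Return the commit itself plus everything reachable through parents."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(commits[current])
    return seen


def merge_base(left, right):
    """Pick the common ancestor with the longest history behind it."""
    common = ancestors(left) & ancestors(right)
    return max(common, key=lambda c: len(ancestors(c)))


base = merge_base(branches["development"], branches["test"])
print(base)
```

A merge then only needs to combine what each branch changed since that
common commit, which is part of what keeps merging cheap.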

***Complete repositories***

So that individual developers needn’t query a centralized repository server
for historical revision information, it was essential that each repository
have a complete copy of all historical revisions of every file.

***A clean internal design***

Even though end users might not be concerned about a clean internal design,
it was important to Linus and ultimately to other Git developers as well.
Git’s object model has simple structures that capture fundamental concepts
for raw data, directory structure, recording changes, etc. Coupling the
object model with a globally unique identifier technique allowed a very
clean data model that could be managed in a distributed development
environment.

***Be free, as in freedom***

’Nuff said.

Given a clean slate to create a new VCS, many talented software engineers
collaborated and Git was born. Necessity was the mother of invention again!

2.2 Book Link

https://books.google.es/books?hl=en&lr=&id=aM7-Oxo3qdQC&oi=fnd&pg=PR3&dq=GIT+SVN&ots=39uhIKPlpc&sig=PmxABWMem-h4Fp1-JR-4C2HTwUY&redir_esc=y#v=onepage&q=GIT%20SVN&f=false

Chapter 18: “Using Git with Subversion Repositories”, is of special
interest.

You can find the full book accessible with a basic search in Google:

“Version Control with Git” filetype:pdf

Juan
