POV-Ray: Newsgroups: povray.off-topic: Git tutorial

From: Darren New
Subject: Git tutorial
Date: 19 Apr 2011 14:25:38
Message: <4dadd3a2$1@news.povray.org>

http://www.eecs.harvard.edu/~cduan/technical/git/

Having spoken with half a dozen people who said "I hate git, it's so 
confusing" and then after showing them this they go "wow, that's really 
easy", I figured it might be worthwhile to show people this. :-)

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Le Forgeron
Subject: Re: Git tutorial
Date: 19 Apr 2011 15:45:48
Message: <4dade66c$1@news.povray.org>

On 19/04/2011 20:25, Darren New wrote:
> http://www.eecs.harvard.edu/~cduan/technical/git/
> 
> Having spoken with half a dozen people who said "I hate git, it's so
> confusing" and then after showing them this they go "wow, that's really
> easy", I figured it might be worthwhile to show people this. :-)
> 

Let's start a holy troll war. best OS or browser is so 20th century.

I prefer mercurial. ;-)

Oh, and the last page wrongly asserts that rebasing is unique to git.

And drawing graphs in ASCII art... on the web, ouch!
(Really, Graphviz is not that hard to learn.)

From: Orchid XP v8
Subject: Re: Git tutorial
Date: 19 Apr 2011 16:41:36
Message: <4dadf380$1@news.povray.org>

On 19/04/2011 08:45 PM, Le_Forgeron wrote:

> I prefer mercurial. ;-)

I prefer Darcs. Then again, I haven't tried any other [distributed] VC 
system, so...

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*

From: Darren New
Subject: Re: Git tutorial
Date: 19 Apr 2011 17:05:59
Message: <4dadf937@news.povray.org>

On 4/19/2011 12:45, Le_Forgeron wrote:
> Let's start a holy troll war. best OS or browser is so 20th century.

I honestly wasn't recommending git.

However, that said, and with the understanding that I haven't looked at the 
other DVCSs, I think git has a very interesting layered approach to the problem.

First you have the virtual file system that is the repository.

Then on top you have actual commits and such, almost completely independent 
of the file system model.

Then on top of that you have stuff like packs and transfer protocols and 
such, again almost completely independent of the file system and commit model.

Kind of cool, in much the same way that Second Life has a kind of cool 
business model regardless of how well they actually implemented the code.

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 05:44:23
Message: <4daeaaf7$1@news.povray.org>

On 19/04/2011 22:05, Darren New wrote:

> I honestly wasn't recommending git.
>
> However, that said, and with the understanding that I haven't looked at
> the other DVCSs, I think git has a very interesting layered approach to
> the problem.

OK, I just sat down and read it.

Yeah, it's fairly straightforward. But I can see how if you don't 
understand the way it works, there's huge potential to *seriously* 
confuse yourself!

I had assumed that all DVCSs were the same, but I now see that at least 
Git and Darcs use fundamentally different models.

Fundamentally, any version control system tracks changes to files. What 
Git appears to be doing is something similar to RCS or CVS, where each 
file goes through a strictly sequential series of "versions", and one 
version comes "before" or "after" another. It seems that each commit 
stores the complete state of the entire repository. (Presumably as a 
diff relative to the previous commit, but still logically it's a 
snapshot of everything.) Git then uses "heads" to point to the most 
recent commit in each branch, or to other interesting points in the history.

Darcs works completely differently. Darcs doesn't track sequential file 
versions, it tracks change-sets. It defines a "change-set algebra" where 
unrelated changes to the same file are independent, and can be applied 
or reverted independently. Changes to the same part of a file are not 
independent, and can only be applied in sequence. But unrelated changes 
are... unrelated.

As an example, if I add some comments to file X and commit that, and 
then I add a new function to file X and commit that too, I get two 
change-sets. I can revert adding the comments but still keep the new 
function, even though the latter change happened *before* the former.
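
A toy sketch of that idea in Python, assuming a patch is nothing more than a set of single-line replacements. The `Patch` class and the `commutes` helper are made up for illustration; the real Darcs patch theory is considerably richer than "the edits touch different lines".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical patch: a named group of single-line replacements."""
    name: str
    edits: tuple  # tuple of (line_index, old_text, new_text)

    def apply(self, lines):
        out = list(lines)
        for idx, old, new in self.edits:
            assert out[idx] == old, f"{self.name}: line {idx} does not match"
            out[idx] = new
        return out

    def invert(self):
        # The inverse patch swaps old and new text on every edited line.
        return Patch(f"undo {self.name}",
                     tuple((i, new, old) for i, old, new in self.edits))


def commutes(a, b):
    """In this toy model, patches are independent if they touch disjoint lines."""
    return not ({i for i, _, _ in a.edits} & {i for i, _, _ in b.edits})


base = ["int f() {", "  return 1;", "}"]
add_comment = Patch("comments", ((0, "int f() {", "int f() {  // returns a number"),))
change_body = Patch("new code", ((1, "  return 1;", "  return 2;"),))

# Apply both, in the order they were recorded.
both = change_body.apply(add_comment.apply(base))

# The patches touch different lines, so the earlier one can be undone
# without disturbing the later one.
if commutes(add_comment, change_body):
    only_body = add_comment.invert().apply(both)
```

Here `only_body` keeps the later code change while the earlier comment change is gone, which is the behaviour described above.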

As far as I can tell, Git would require me to create a branch where I 
add the comments, and another branch where I add the new code, and then 
merge them back into the main branch, hoping that I don't get any 
conflicts. To me, this seems like a lot more work and a lot more 
conceptual overhead.

It also seems that when you ask Git to perform a commit, you have to 
tell it which files to record changes for. With Darcs, you tell it which 
files to monitor, and when you ask to commit it detects what's changed 
and interactively asks you which differences to include and which ones 
not to include. E.g., I could add comments to file X, add a new 
function, do a commit and interactively split the modifications into two 
separate commits. (It doesn't /always/ work, of course, but mostly it does.)

I wonder how well the illusion of one single sequence of file versions 
works when you have multiple people editing the file in parallel. I 
would imagine the Darcs model works better, because it doesn't try to 
pretend that file X looked like this, and then this, and then this. It 
just records what edits happened, without recording their relative 
ordering [except where they affect the same lines of code]. Git, on the 
other hand, appears to be trying to track what every file in the entire 
repository looked like in every individual commit object.

With Darcs, you make a branch by copying the repository. That's it. 
(Although there is an option to build a bunch of symlinks for you 
instead of just copying, to save a bit of disk space. Presumably only on 
POSIX platforms...) You can email individual change-sets around, and 
this works. Getting somebody else's changes just copies all change-sets 
from their repository into yours. You can then resolve any conflicts.

Still, I've used Darcs quite a bit, and I've never once used Git, so I'm 
not really qualified to say how well it works in practice.

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 06:02:31
Message: <4daeaf37@news.povray.org>

On 20/04/2011 10:44, Invisible wrote:

> I had assumed that all DVCSs were the same, but I now see that at least
> Git and Darcs use fundamentally different models.
>
> Fundamentally, any version control system tracks changes to files. What
> Git appears to be doing is something similar to RCS or CVS, where each
> file goes through a strictly sequential series of "versions", and one
> version comes "before" or "after" another. It seems that each commit
> stores the complete state of the entire repository. (Presumably as a
> diff relative to the previous commit, but still logically it's a
> snapshot of everything.) Git then uses "heads" to point to the most
> recent commit in each branch, or to other interesting points in the
> history.
>
> Darcs works completely differently. Darcs doesn't track sequential file
> versions, it tracks change-sets. It defines a "change-set algebra" where
> unrelated changes to the same file are independent, and can be applied
> or reverted independently. Changes to the same part of a file are not
> independent, and can only be applied in sequence. But unrelated changes
> are... unrelated.

It appears Mercurial works the same way as Git:

http://mercurial.selenic.com/wiki/UnderstandingMercurial

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 06:08:15
Message: <4daeb08f$1@news.povray.org>

One thing which apparently can't be done with Darcs, Git or Mercurial is 
managing multiple repositories at once.

For example, consider Glasgow Haskell Compiler, which is written in 
Haskell itself. The compiler, interpreter and run-time system are one 
repo, and the various standard libraries it requires are independent 
projects with their own repos, but GHC mirrors a copy of each, lagging 
behind the upstream slightly. The main GHC repo contains a special Bash 
script to do things like update all the sub-repos automatically. The 
existence of this script tells you that there's functionality that Darcs 
itself is failing to provide. (And Git and Mercurial, apparently.)

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 06:24:06
Message: <4daeb446$1@news.povray.org>

On 19/04/2011 19:25, Darren New wrote:

> Having spoken with half a dozen people who said "I hate git, it's so
> confusing" and then after showing them this they go "wow, that's really
> easy", I figured it might be worthwhile to show people this. :-)

As an aside, I do envy you to some extent for actually knowing people 
IRL who know how a computer works...

From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 11:32:32
Message: <4daefc90$1@news.povray.org>

On 4/20/2011 2:44, Invisible wrote:
> I had assumed that all DVCSs were the same, but I now see that at least Git
> and Darcs use fundamentally different models.

git stores files, and deduces changesets from those files.

Mercurial stores changesets, and deduces files from those changesets.

> Fundamentally, any version control system tracks changes to files.

No, git actually tracks entire files. Every time you check something in, any 
file you add to staging (i.e., any file with any change) is stored in its 
entirety in the repository. And, technically, *every* file in the entire 
repository is stored, but if you haven't changed it, it has the same sha-1 
name, so it doesn't physically get copied.
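
A minimal Python sketch of why an unchanged file costs nothing to "store again". The `blob_id` function follows git's blob naming (SHA-1 over a `blob <size>\0` header plus the content); the `ObjectStore` class is a made-up stand-in for the real object database, which also compresses objects and writes them to disk.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 name git gives a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects.setdefault(oid, content)  # no-op if already present
        return oid


store = ObjectStore()
first = store.put(b"unchanged file\n")
second = store.put(b"unchanged file\n")   # same content in a later commit
third = store.put(b"edited file\n")
print(first == second, len(store.objects))  # prints: True 2
```

Because the name is derived from the content alone, storing the same file in a hundred commits still leaves one object in the store.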

Only later do you compress the repository into diffs. But a file can be 
stored as a diff from a different file it was never related to, one that was 
created from scratch *after* you checked in the file now stored as a diff.

> What Git
> appears to be doing is something similar to RCS or CVS, where each file goes
> through a strictly sequential series of "versions", and one version comes
> "before" or "after" another.

Nope. Not even close. git is storing the entire repository on every commit. 
That's the completely boggling idea there.

> It seems that each commit stores the complete
> state of the entire repository.

Yes. So there's no "before" or "after" for files, technically. There's 
before or after for repositories.

> (Presumably as a diff relative to the previous commit,

Nope. It stores the entire repository. Now, if you don't change a file, it 
hashes to the same value, and hence doesn't need to get stored again. But 
the entire file is put into the repository.

That's why git doesn't record renames ("git mv" is just a remove plus an 
add). Recording them would imply you're storing something other than files 
in the repository. git looks at the same 
contents disappearing from one part of the directory structure and showing 
up in another and says "Gee, that must have been a rename."  If there are 
minor changes between what disappeared on this commit and what showed up 
somewhere else on that commit, git says "there's a 97% probability this was 
a renamed file."
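
Roughly, rename detection can be sketched like this in Python. This is an illustration only: `detect_renames` is a made-up helper, it uses difflib's similarity ratio rather than git's own scoring, and the threshold value is arbitrary.

```python
from difflib import SequenceMatcher


def detect_renames(old, new, threshold=0.5):
    """Guess renames between two snapshots (dicts of path -> text).

    Returns a list of (old_path, new_path, similarity) tuples.
    """
    removed = {p: t for p, t in old.items() if p not in new}
    added = {p: t for p, t in new.items() if p not in old}
    renames = []
    for old_path, old_text in removed.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del added[best_path]  # each new file matches at most one old file
    return renames


before = {"src/util.c": "int add(int a, int b) { return a + b; }\n"}
after = {"lib/util.c": "int add(int a, int b) { return a + b; }\n"}
renames = detect_renames(before, after)
```

Nothing about the rename was recorded in either snapshot; it is inferred afterwards from a path disappearing and similar content appearing elsewhere.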

When you do a "git gc" to collect garbage, it then looks for a good way to 
generate diffs between files, and it looks through all the files (not just 
ancestors) to find good versions to diff from, and it stores the diffs. But 
that's just a storage optimization, just like the zlib compression it 
applies to the files in the repository.

If I checked in a configuration file with my name in it, and 2 days later 
deleted it, and then you checked in a similar configuration file with your 
name in it, and then we did a "git gc", it's entirely possible (likely, 
even) that my file would be stored as a diff from your file.
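
A small Python sketch of that situation, assuming a delta is just a list of "copy this range from the base" and "insert this literal text" instructions. The function names are invented, and the base is chosen with difflib's similarity ratio, which is only a stand-in for whatever heuristics the real packer uses.

```python
from difflib import SequenceMatcher


def make_delta(base: str, target: str):
    """Encode target as copy/insert instructions relative to base."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))           # reuse base[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))  # new literal text
        # "delete" needs no instruction: that part of base is simply skipped
    return delta


def apply_delta(base: str, delta) -> str:
    """Rebuild the target from the base and the delta instructions."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(base[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


def best_base(target: str, candidates: dict) -> str:
    """Pick the most similar stored file, related by history or not."""
    return max(candidates,
               key=lambda name: SequenceMatcher(None, candidates[name], target).ratio())


yours = "[user]\nname = Alice\neditor = vim\n"
mine = "[user]\nname = Bob\neditor = vim\n"
stored = {"yours.cfg": yours, "README": "Build with make.\n"}

base_name = best_base(mine, stored)
delta = make_delta(stored[base_name], mine)
restored = apply_delta(stored[base_name], delta)
```

The two config files have no shared history here, yet one is the natural base for the other simply because the contents are similar.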

In contrast, it seems Mercurial actually stores a tree of changes.

The only reason git keeps a pointer to earlier commits is so that when you 
merge things, you don't re-apply changes you already applied in an earlier 
merge.

For example, when you merge branch A into branch B, git finds the common 
ancestor (i.e., the point at which you separated B from A in the first 
place), then *generates* the diff between what's now A and where they 
branched, then *applies* that diff to B, and leaves it ready for a commit. 
There's no diff stored before you type "git merge" or after it returns from 
the command line.
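
The steps above can be sketched in Python as follows. This is a simplification under stated assumptions: commits are plain strings, a snapshot is a dict of path to content, the merge works on whole files rather than on lines, and `merge_base` and `merge_trees` are made-up names.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_base(a, b, parents):
    """A common ancestor of a and b that no other common ancestor descends from."""
    common = ancestors(a, parents) & ancestors(b, parents)
    for c in common:
        if not any(c != other and c in ancestors(other, parents) for other in common):
            return c
    return None


def merge_trees(base, ours, theirs):
    """File-level three-way merge of snapshots (dicts of path -> content)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree
        elif o == b:
            result = t            # only their side changed it
        elif t == b:
            result = o            # only our side changed it
        else:
            conflicts.append(path)
            result = o            # keep ours; a real tool would merge lines
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Branch A is at a2, branch B is at b3; they separated at c1.
parents = {"c1": [], "a2": ["c1"], "b2": ["c1"], "b3": ["b2"]}
snapshots = {
    "c1": {"main.c": "v1", "README": "hello"},
    "a2": {"main.c": "v2", "README": "hello"},
    "b3": {"main.c": "v1", "README": "hello world"},
}
base = merge_base("a2", "b3", parents)
merged, conflicts = merge_trees(snapshots[base], snapshots["b3"], snapshots["a2"])
```

Note that only snapshots and parent pointers are stored; the differences are worked out on the spot when the merge runs.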

Reverting a commit involves either just throwing away the new version, or 
generating a reverse diff and applying it.

That's the thing that throws people about git, apparently. They're thinking 
in terms of diffs, when it's easiest to understand if you just think about 
it in terms of file contents.

> Git then uses "heads" to point to the most recent commit in each branch, or to
> other interesting points in the history.

Right.

> Darcs works completely differently. Darcs doesn't track sequential file
> versions, it tracks change-sets. It defines a "change-set algebra" where
> unrelated changes to the same file are independent, and can be applied or
> reverted independently. Changes to the same part of a file are not
> independent, and can only be applied in sequence. But unrelated changes
> are... unrelated.

Yeah, from the little I read about it, Darcs is another one of those 
"interesting" ideas.  An actual mathematical system for defining a 
repository, like relational algebra did for databases.

> As an example, if I add some comments to file X and commit that, and then I
> add a new function to file X and commit that too, I get two change-sets. I
> can revert adding the comments but still keep the new function, even though
> the latter change happened *before* the former.
>
> As far as I can tell, Git would require me to create a branch where I add
> the comments, and another branch where I add the new code, and then merge
> them back into the main branch, hoping that I don't get any conflicts. To
> me, this seems like a lot more work and a lot more conceptual overhead.

Nah. That's only if you want to have both at once working in parallel. That 
is, if you want one version with the comments but no function, and another 
version with the function but no comments, that's trivial in git. If you 
then want to combine them into a third version that has both comments and 
function, then you merge, which is also trivial unless you changed the same 
lines in both places. (I.e., it's as trivial as any other diff-patch based 
merge.)

> It also seems that when you ask Git to perform a commit, you have to tell it
> which files to record changes for.

Sure. But you can say "add all changes" trivially. Or you can use 
interactive tools to commit just bits and pieces of this and that.

Basically, there's a "staging" area where you build a new copy of the 
repository by including files and directories that are different from what's 
out there already. Then you add those new files and directories to the 
repository and point a commit to the new top-level directory.

> With Darcs, you tell it which files to
> monitor, and when you ask to commit it detects what's changed and
> interactively asks you which differences to include and which ones not to
> include. E.g., I could add comments to file X, add a new function, do a
> commit and interactively split the modifications into two separate commits.
> (It doesn't /always/ work, of course, but mostly it does.)

This is trivial with git. I do it all the time. I'll be adding a new 
function, and while testing, realize there's a bug in some other function. 
So when everything works again, I'll stage just the relevant hunks (in the 
diff sense of the word) and make two commits, one for the bugfix and one 
for the new function.

> I wonder how well the illusion of one single sequence of file versions works
> when you have multiple people editing the file in parallel.

There's no single sequence of file versions. Every file is a new version.

Given that it's the repository format used by Linux developers, I think it's 
safe to say it works adequately for multiple people editing the file in 
parallel.

> I would imagine
> the Darcs model works better, because it doesn't try to pretend that file X
> looked like this, and then this, and then this.

Neither does git.

> It just records what edits
> happened, without recording their relative ordering [except where they
> affect the same lines of code]. Git, on the other hand, appears to be trying
> to track what every file in the entire repository looked like in every
> individual commit object.

Yes, but since you have them all, you can recreate the diffs between any two 
versions whenever you want.

> With Darcs, you make a branch by copying the repository. That's it.

In git you make a branch by saying "make a new branch."  That's what boggled 
me about mercurial. Really, I need multiple repositories to let me have a 
stable version and a development version?

> (Although there is an option to build a bunch of symlinks for you instead of
> just copying, to save a bit of disk space. Presumably only on POSIX
> platforms...) You can email individual change-sets around, and this works.
> Getting somebody else's changes just copies all change-sets from their
> repository into yours. You can then resolve any conflicts.

git is exactly the same, except it copies files instead of changes. When you 
say "pull from that repository", it finds each "head" (i.e., branch tip) and 
then recurses through the data structures pulling anything reachable from 
them. Since they're all named after the hashes, if you already have a file 
of the same name, you don't need to copy it. Then it stores the heads for 
the remote repository in a different place of the namespace tree than your 
own heads.
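
A rough Python sketch of that pull step, assuming each repository is a dict with an "objects" table and a "refs" table. The ids such as "commit2" are placeholders for real hashes, and the `fetch` function is hypothetical, not the actual transfer protocol.

```python
def reachable(heads, objects):
    """Every object id reachable from the given head ids."""
    seen, stack = set(), list(heads)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects[oid]["links"])
    return seen


def fetch(local, remote, remote_name="origin"):
    """Copy missing objects from `remote` and record its heads separately."""
    heads = {name[len("heads/"):]: oid
             for name, oid in remote["refs"].items()
             if name.startswith("heads/")}
    copied = 0
    for oid in reachable(heads.values(), remote["objects"]):
        if oid not in local["objects"]:      # same name means same content
            local["objects"][oid] = remote["objects"][oid]
            copied += 1
    for branch, oid in heads.items():
        # Remote heads live under their own prefix, apart from local heads.
        local["refs"][f"remotes/{remote_name}/{branch}"] = oid
    return copied


remote = {
    "objects": {
        "blob1": {"links": []},
        "tree1": {"links": ["blob1"]},
        "commit1": {"links": ["tree1"]},
        "blob2": {"links": []},
        "tree2": {"links": ["blob1", "blob2"]},
        "commit2": {"links": ["tree2", "commit1"]},
    },
    "refs": {"heads/master": "commit2"},
}
local = {
    "objects": {k: remote["objects"][k] for k in ("blob1", "tree1", "commit1")},
    "refs": {"heads/master": "commit1"},
}
copied = fetch(local, remote)
```

Only the three objects the local side lacks get copied, and the local "heads/master" is left where it was.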

But, really, the only things in the git repository are files (blobs), 
directories (trees), commits (a log message pointing to a tree and maybe 
other commits), and tags (a log message pointing to a commit, possibly 
pgp-signed), and then a bunch of names for specific commits or tags or trees 
(i.e., branch names). There are no diffs. There is no history. There are no 
users.

If you want the log history, you follow pointers from one of the heads thru 
the different commit objects. If you want to see what changed, you run diff 
on the two files you want to know what changed between. If you want a new 
branch, you just modify some files, store them, and point a different name 
at the new commit.
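
To make that concrete, here is a toy model in Python. It is a sketch under loud assumptions: the `Repo` class is invented, objects are hashed as JSON rather than in git's real format, trees are flat (no subdirectories), and tags are left out.

```python
import hashlib
import json


class Repo:
    """Toy repository: a dict of objects plus a dict of branch names."""

    def __init__(self):
        self.objects = {}   # object id -> (kind, payload)
        self.heads = {}     # branch name -> commit id

    def _store(self, kind, payload):
        data = json.dumps([kind, payload], sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = (kind, payload)
        return oid

    def commit(self, branch, files, message):
        """Snapshot `files` (path -> text) and move `branch` to the new commit."""
        tree = {path: self._store("blob", text) for path, text in files.items()}
        tree_id = self._store("tree", tree)
        parent = self.heads.get(branch)
        commit_id = self._store("commit", {
            "tree": tree_id,
            "parents": [parent] if parent else [],
            "message": message,
        })
        self.heads[branch] = commit_id
        return commit_id

    def branch(self, new_name, from_branch):
        self.heads[new_name] = self.heads[from_branch]  # just another name

    def log(self, branch):
        """Follow first-parent pointers from the branch head; yield messages."""
        oid = self.heads.get(branch)
        while oid:
            _, commit = self.objects[oid]
            yield commit["message"]
            oid = commit["parents"][0] if commit["parents"] else None


repo = Repo()
repo.commit("master", {"a.txt": "one\n"}, "first")
repo.commit("master", {"a.txt": "one\n", "b.txt": "two\n"}, "second")
repo.branch("dev", "master")
repo.commit("dev", {"a.txt": "one\n", "b.txt": "three\n"}, "experiment")
history = list(repo.log("dev"))
```

The "history" is never stored as such; `log` reconstructs it by walking parent pointers, and a branch is nothing more than an entry in `heads`.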

If you want to merge someone's repository into yours, you simply copy from 
them any files or names that they have that you don't, and you're done. 
You're merged.  Now if you want to incorporate their changes into your work, 
you generate a diff between their latest version and the common ancestor, 
and apply that diff to your latest version, and you're merged.

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 11:38:52
Message: <4daefe0c@news.povray.org>

On 4/20/2011 3:08, Invisible wrote:
> One thing which apparently can't be done with Darcs, Git or Mercurial is
> managing multiple repositories at once.

git can do this in a sorta half-assed way. Mercurial apparently can too.


> you that there's functionality that Darcs itself is failing to provide. (And
> Git and Mercurial, apparently.)


-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."
