POV-Ray: Newsgroups: povray.off-topic: Git tutorial


From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 11:40:58
Message: <4daefe8a$1@news.povray.org>

On 4/20/2011 8:38, Darren New wrote:
> On 4/20/2011 3:08, Invisible wrote:
>> One thing which apparently can't be done with Darcs, Git or Mercurial is
>> managing multiple repositories at once.
>
> git can do this in a sorta half-assed way. Mercurial apparently can too.

Wrong button. :-)

http://ssteiner.wordpress.com/2008/12/30/git-subprojects/

The fundamental problem is that all the DVCS systems tend to use crypto 
hashes to identify things. So in git, for example, when you create a 
sub-project, the parent project records where the repository is and what 
commit to check out. If you change the sub-project, the old commit is still 
there; you have just added to it. So the parent project is still going to 
get the old version of the subproject until you tell the parent project 
"hey, go update your pointer to the sub project to be commit 01A73F9E."

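To make the pointer idea concrete, here is a minimal Python sketch. It is not how git stores sub-projects on disk; the `Repo` class, the `parent_pin` dict and the `libs/sub` path are all made up for illustration.

```python
import hashlib

class Repo:
    """Toy repository: an append-only list of commit ids (hypothetical model)."""
    def __init__(self):
        self.commits = []

    def commit(self, content: str) -> str:
        parent = self.commits[-1] if self.commits else ""
        cid = hashlib.sha1((parent + content).encode()).hexdigest()
        self.commits.append(cid)
        return cid

sub = Repo()
first = sub.commit("library v1")

# The parent project records *which commit* of the sub-project to check out.
parent_pin = {"libs/sub": first}

second = sub.commit("library v2")          # the sub-project moves on...

stale = parent_pin["libs/sub"] == first    # ...but the parent still points at v1
parent_pin["libs/sub"] = second            # the explicit "update your pointer" step
```

The old commit never goes away when the sub-project changes, so nothing moves the parent's pin automatically; somebody has to rewrite it.
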
-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 11:49:57
Message: <4daf00a5$1@news.povray.org>

On 20/04/2011 16:40, Darren New wrote:

> The fundamental problem is that all the DVCS systems tend to use crypto
> hashes to identify things. So in git, for example, when you create a
> sub-project, the parent project records where the repository is and what
> commit to check out. If you change the sub-project, the old commit is
> still there; you have just added to it. So the parent project is still
> going to get the old version of the subproject until you tell the parent
> project "hey, go update your pointer to the sub project to be commit
> 01A73F9E."

The problem Git seems to have is that it uses heads to keep track of 
things. Delete the head and the corresponding commit drops off the face 
of the Earth.

Darcs manages a set [as in set theory] of changes. You don't need to 
keep updating a "pointer" to point to the latest one or anything. I'd be 
surprised if no other VCS has thought of this.

From: Invisible
Subject: Re: Git tutorial
Date: 20 Apr 2011 12:03:49
Message: <4daf03e5$1@news.povray.org>

>> I had assumed that all DVCSs were the same, but I now see that at
>> least Git
>> and Darcs use fundamentally different models.
>
> GIT stores files, and deduces change sets from those files.
>
> Mercurial stores change sets, and deduces files from those changesets.

Doesn't appear to me that that's what happens, from what little 
Mercurial documentation I've read.

>> Fundamentally, any version control system tracks changes to files.
>
> No, git actually tracks entire files.

Fundamentally, VCS are about tracking changes. Git might *implement* 
that by storing the entire file, but *logically* what you're trying to 
do is keep track of what you changed.

> And, technically, *every* file in the entire repository is stored.

Yes, I gradually came to that realisation. Git is managing the entire
repo as a strictly linear sequence of unnumbered versions. (Until you
explicitly create branches, anyway.)

>> (Presumably as a diff relative to the previous commit,
>
> Nope. It stores the entire repository. Now, if you don't change a file,
> it hashes to the same value, and hence doesn't need to get stored again.
> But the entire file is put into the repository.

How odd... Still, if you're not worried about the internal 
implementation, logically Git is versioning the whole repo as one unit, 
and that's all you need to know.

> That's why git doesn't have a "rename" command. git looks at the
> same contents disappearing from one part of the directory structure and
> showing up in another and says "Gee, that must have been a rename." If
> there are minor changes between what disappeared on this commit and what
> showed up somewhere else on that commit, git says "there's a 97%
> probability this was a renamed file."

o_O

OK, wow. I thought having to tell Darcs when I rename stuff was 
inconvenient, but this just sounds insane...

> The only reason git uses a pointer to earlier commits is when you merge
> things, you don't want to apply changes you already applied in an
> earlier merge.

And here I was thinking it was so you can revert to earlier versions if 
you want. You know - the entire purpose for a VCS to exist in the first 
place? ;-)

> Yeah, from the little I read about it, Darcs is another one of those
> "interesting" ideas. An actual mathematical system for defining a
> repository, like relational algebra did for databases.

It sounds simple enough. If this change affects line X and that change 
affects line Y, they are independent.

Ah, but wait. What if some change adds or removes lines? If change X 
adds a new line between lines 50 and 51 then change Y no longer affects 
line 150, it now affects line 151. But X and Y are still independent.

There's more to it than meets the eye. Of course, if you just want to 
*use* Darcs, you just edit stuff and it "just works".
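
The line-number shuffle can be sketched in a few lines of Python. This is only a toy, not Darcs' actual patch theory; `InsertLine`, `ReplaceLine` and `shift_past` are names invented here, and only one kind of adjustment is handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InsertLine:
    after: int      # insert a new line after this 1-based line number
    text: str

@dataclass(frozen=True)
class ReplaceLine:
    line: int       # 1-based line number to overwrite
    text: str

def shift_past(insert: InsertLine, replace: ReplaceLine) -> ReplaceLine:
    """Re-express `replace` (written against the file *before* `insert`)
    so it targets the same line in the file *after* `insert` is applied."""
    if replace.line > insert.after:
        return ReplaceLine(replace.line + 1, replace.text)
    return replace

def apply_insert(lines, p):
    return lines[:p.after] + [p.text] + lines[p.after:]

def apply_replace(lines, p):
    out = list(lines)
    out[p.line - 1] = p.text
    return out

original = [f"line {n}" for n in range(1, 201)]
x = InsertLine(after=50, text="new line")
y = ReplaceLine(line=150, text="edited")

# Order 1: Y then X.  Order 2: X then the shifted Y.
y_then_x = apply_insert(apply_replace(original, y), x)
x_then_y = apply_replace(apply_insert(original, x), shift_past(x, y))
```

Both orders give the same file, which is what "independent" means here: Y just has to be renumbered from line 150 to line 151 when it is applied after X.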

>> As far as I can tell, Git would require me to create a branch where I add
>> the comments, and another branch where I add the new code, and then merge
>> them back into the main branch, hoping that I don't get any conflicts. To
>> me, this seems like a lot more work and a lot more conceptual overhead.
>
> Nah. That's only if you want to have both at once working in parallel.

Isn't "working on both at once" kind of the entire point of distributed 
version control?

> That is, if you want one version with the comments but no function, and
> another version with the function but no comments, that's trivial in
> git. If you then want to combine them into a third version that has both
> comments and function, then you merge, which is also trivial unless you
> changed the same lines in both places. (I.e., it's as trivial as any
> other diff-patch based merge.)

And if you merge the comments branch into the main branch, and then 
somebody adds more stuff to the comments branch, then what?

>> It also seems that when you ask Git to perform a commit, you have to
>> tell it which files to record changes for.
>
> Sure. But you can say "add all changes" trivially. Or you can use
> interactive tools to commit just bits and pieces of this and that.

With Darcs, I tell it what files to watch, and then when I've finished 
editing stuff, I say "record this" and it shows me every modified line 
of every file and asks which modifications to keep. Git doesn't support 
recording half a file modification, and doesn't even figure out which 
files changed.

> This is trivial with GIT. I do it all the time. I'll be adding a new
> function, and while testing, realize there's a bug in some other
> function. So when everything works again, I'll do two commits, staging
> just particular hunks (in the diff sense of the word) and do two
> commits, one for the bugfix and one for the new change.

Given that Git can only record the new file or the old one, how is that 
possible?

>> I wonder how well the illusion of one single sequence of file versions
>> works when you have multiple people editing the file in parallel.
>
> There's no single sequence of file versions. Every file is a new version.
>
> Given that it's the repository format used by Linux developers, I think
> it's safe to say it works adequately for multiple people editing the
> file in parallel.

This boggles my mind. Apparently I /don't/ understand how Git works at 
all, because the way it seems to work precludes two people touching the 
same file at the same time...

>> It just records what edits
>> happened, without recording their relative ordering [except where they
>> affect the same lines of code]. Git, on the other hand, appears to be
>> trying
>> to track what every file in the entire repository looked like in every
>> individual commit object.
>
> Yes, but since you have them all, you can recreate the diffs between any
> two versions whenever you want.

That's my point. If multiple people are editing the same files, you do 
*not* have all the changes.

>> You can email individual change-sets around, and this works.
>> Getting somebody else's changes just copies all change-sets from their
>> repository into yours. You can then resolve any conflicts.
>
> git is exactly the same, except it copies files instead of changes.

And the "minor detail" that if 200 people edit the same file, that's 200 
separate branches which have to be manually merged back together again.

> If you want to merge someone's repository into yours, you simply copy
> from them any files or names that they have that you don't, and you're
> done. You're merged.

It would be nice if Darcs worked that way.

> Now if you want to incorporate their changes into
> your work, you generate a diff between their latest version and some
> earlier version, and apply that diff to your latest version, and you're
> merged.

What a backwards way to look at it.

From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 12:24:02
Message: <4daf08a2@news.povray.org>

On 4/20/2011 8:49, Invisible wrote:
> The problem Git seems to have is that it uses heads to keep track of things.
> Delete the head and the corresponding commit drops off the face of the Earth.

Yes. That's why you shouldn't do that.

First, git refuses to delete a branch whose commits aren't merged anywhere
else unless you explicitly force it (git branch -D). Second, the *files* are
still there. You just might not know what they're called. They're probably
still around at least a couple of weeks before git cleans them up. I.e.,
there are well-documented ways to recover from this if you do it accidentally.

On the other hand, if you work on something and decide it wasn't a good 
idea, you can delete the branch and no harm done. Darcs apparently
requires you to copy the entire repository before you even *start* making 
changes if you want to recover.

> Darcs manages a set [as in set theory] of changes. You don't need to keep
> updating a "pointer" to point to the latest one or anything. I'd be
> surprised if no other VCS has thought of this.

But that's exactly why you need to start a new repository if you want a new 
branch. If you clone a repository in Darcs, make a bunch of changes, then 
accidentally delete the repository, you're in even worse shape than if you 
delete a branch in git.

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 12:55:24
Message: <4daf0ffc@news.povray.org>

On 4/20/2011 9:03, Invisible wrote:
> Doesn't appear to me that that's what happens, from what little Mercurial
> documentation I've read.

I don't know. All the mercurial documentation I've read talks about change sets.

> Fundamentally, VCS are about tracking changes.

Fundamentally, they're about controlling versions. :-)

> Git might *implement* that by
> storing the entire file, but *logically* what you're trying to do is keep
> track of what you changed.

I think it depends. If I want version 1.0 that was released, I don't really 
care what changed to get there. I want that version.  There are uses for
changes, and uses for storing what you stored.

The advantage of storing what's actually there is you can write all kinds of 
better tools to tell you the differences. For example, you can diff any two 
files to get a compressed copy of the file. If I change a file to include an 
additional 500 lines, then change it to delete 498 of them, the third 
version is going to get stored as a 2-line diff from the first version, not 
a 498-line diff from the second version.

Basically, you're storing absolutes and deducing differences, rather than 
storing differences and deducing absolutes. That means when you want to know 
what changed between release candidate 3.5RC2 and the version 4.2 that Fred 
compiled over on *his* machine, you can just compare the two. You don't have 
to reconstruct anything first.
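
A rough Python sketch of "store absolutes, deduce differences", under loud assumptions: `objects`, `store`, `snapshot` and `changed_lines` are invented names, the hash is a plain SHA-1 of the text (real git hashes a header plus the content and also compresses and packs objects), and the diff comes from Python's difflib rather than anything git uses.

```python
import difflib
import hashlib

objects = {}          # content hash -> file content (the "absolutes")

def store(content: str) -> str:
    """Store content under its SHA-1; identical content is stored only once."""
    key = hashlib.sha1(content.encode()).hexdigest()
    objects[key] = content
    return key

def snapshot(files: dict) -> dict:
    """A snapshot maps each path to the hash of its full content."""
    return {path: store(text) for path, text in files.items()}

def changed_lines(old_key: str, new_key: str) -> list:
    """Deduce a diff on demand from two stored versions."""
    old = objects[old_key].splitlines()
    new = objects[new_key].splitlines()
    return [l for l in difflib.ndiff(old, new) if l.startswith(("+ ", "- "))]

base = "\n".join(f"line {n}" for n in range(10))
v1 = snapshot({"a.txt": base, "b.txt": "unchanged"})
v2 = snapshot({"a.txt": base + "\n" + "\n".join(f"extra {n}" for n in range(500)),
               "b.txt": "unchanged"})
v3 = snapshot({"a.txt": base + "\nextra 0\nextra 1", "b.txt": "unchanged"})

net = changed_lines(v1["a.txt"], v3["a.txt"])   # compares v1 and v3 directly
```

Here `net` is the two added lines, with no trace of the 498 lines that came and went in between, and `b.txt` is stored exactly once because all three snapshots hash it to the same key.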

> Yes, I gradually came to that realisation. Git is managing the entire repo
> as a strictly linear sequence of unnumbered versions. (Until you explicitly
> create branches, anyway.)

Or until you clone it, yes.

>>> (Presumably as a diff relative to the previous commit,
>>
>> Nope. It stores the entire repository. Now, if you don't change a file,
>> it hashes to the same value, and hence doesn't need to get stored again.
>> But the entire file is put into the repository.
>
> How odd... Still, if you're not worried about the internal implementation,
> logically Git is versioning the whole repo as one unit, and that's all you
> need to know.

Yes, basically. That's why you can sign just the tag blob and be sure you've 
signed every file that that tag refers to.

> OK, wow. I thought having to tell Darcs when I rename stuff was
> inconvenient, but this just sounds insane...

Why? You don't have to tell git you renamed something.
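
The guessing is less magical than it sounds. A hedged sketch in Python: `detect_renames` is a made-up function, and the similarity score is difflib's line-matching ratio, used here as a stand-in for whatever git computes internally. Only the idea (pair what disappeared with the most similar thing that appeared, above some threshold) is meant to carry over.

```python
import difflib

def detect_renames(old: dict, new: dict, threshold: float = 0.5):
    """Pair each path that disappeared with the most similar path that appeared.

    `old` and `new` map path -> content. Returns (old_path, new_path, score).
    """
    gone = {p: c for p, c in old.items() if p not in new}
    appeared = {p: c for p, c in new.items() if p not in old}
    renames = []
    for old_path, old_text in gone.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in appeared.items():
            score = difflib.SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines(), autojunk=False
            ).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            renames.append((old_path, best_path, best_score))
            del appeared[best_path]  # each new file explains at most one rename
    return renames

before = {"src/util.py": "a\nb\nc\nd", "README": "hello"}
after = {"lib/util.py": "a\nb\nc\nD", "README": "hello", "notes.txt": "x\ny"}
found = detect_renames(before, after)
```

Nothing about the rename is recorded at commit time; it is worked out afterwards from the two snapshots, so a better heuristic later would apply to old history too.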

>> The only reason git uses a pointer to earlier commits is when you merge
>> things, you don't want to apply changes you already applied in an
>> earlier merge.
>
> And here I was thinking it was so you can revert to earlier versions if you
> want. You know - the entire purpose for a VCS to exist in the first place? ;-)

Well, yes, it gives you a way to find those commits. But in theory you could 
look up any commit by hash code and say "give me that version" whether it's 
earlier or later or completely unrelated to what you have now. Indeed, 
that's exactly how branches work. When you start a new branch, all you're 
doing is storing the commit sha-1 into a new file named after the branch.

> It sounds simple enough. If this change affects line X and that change
> affects line Y, they are independent.

Yeah, until you get binary objects in there. :-)

>>> As far as I can tell, Git would require me to create a branch where I add
>>> the comments, and another branch where I add the new code, and then merge
>>> them back into the main branch, hoping that I don't get any conflicts. To
>>> me, this seems like a lot more work and a lot more conceptual overhead.
>>
>> Nah. That's only if you want to have both at once working in parallel.
>
> Isn't "working on both at once" kind of the entire point of distributed
> version control?

Only if you want to work on both at once in the same repository.

If you're working in a different repository, you don't need to start a new 
branch.  Branches are nothing but names for commits. You technically never 
need to use any branch at all if you want to type in a sha-1 every time.
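
As a toy model in Python (the `commits` and `refs` dicts and both helper functions are invented for this sketch, and the hashes are not computed the way git computes them):

```python
import hashlib

commits = {}   # commit id -> (parent id or None, message)
refs = {}      # branch name -> commit id; this is all a branch is here

def commit(branch: str, message: str) -> str:
    parent = refs.get(branch)
    cid = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()
    commits[cid] = (parent, message)
    refs[branch] = cid          # advancing a branch = rewriting one pointer
    return cid

def create_branch(name: str, at: str) -> None:
    refs[name] = at             # no files copied, just a new name

a = commit("master", "A")
b = commit("master", "B")
create_branch("comments", b)
q = commit("comments", "Q")

del refs["comments"]            # "delete the head"
still_there = q in commits      # the commit object itself is untouched
```

Deleting the name leaves the commit in the store; you have only lost the convenient way of finding it, which is why typing the sha-1 still works.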

>> That is, if you want one version with the comments but no function, and
>> another version with the function but no comments, that's trivial in
>> git. If you then want to combine them into a third version that has both
>> comments and function, then you merge, which is also trivial unless you
>> changed the same lines in both places. (I.e., it's as trivial as any
>> other diff-patch based merge.)
>
> And if you merge the comments branch into the main branch, and then somebody
> adds more stuff to the comments branch, then what?

Then you get a merge and then more changes on the comments branch. And if 
you merge the comments branch *again*, *that* is when git uses the parent 
pointers in the commit objects to figure out which files to diff in order to 
get the patches to the parent.

A--B--C--D--E--F--G--H--I
    \     |      /
     Q--R/--S--T/

So you started with A, changed to B, branched B and made a change to create 
Q, then R. In the mean time, I changed B to be C.  Now I merge your R back 
to my C. This looks back, sees B is the common ancestor, so diffs R against 
B and applies it to C, then creates D with C and R as parent commits. (Each 
letter is a commit, which includes the state of the entire repository.)

Now you keep working on R without incorporating my B->C change, creating S 
and T. I change D to include E and F. Now I merge your work again.

Git looks at F, follows it back to D, to C and R, and sees that R is a 
common ancestor of both F and T. So it diffs T against R, applies those 
diffs to F, and creates G.  You can then delete the branch that points to T 
safely without losing anything.
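
The "follow the parent pointers back" step, sketched in Python for exactly the graph above. `parents`, `ancestors` and `merge_base` are names chosen here; real git has to cope with histories that have more than one best common ancestor, which this toy ignores.

```python
parents = {
    "A": [], "B": ["A"], "C": ["B"], "Q": ["B"], "R": ["Q"],
    "D": ["C", "R"],            # first merge: parents are C and R
    "E": ["D"], "F": ["E"], "S": ["R"], "T": ["S"],
}

def ancestors(commit: str) -> set:
    """Every commit reachable through parent pointers, including `commit`."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(ours: str, theirs: str) -> str:
    """A common ancestor that is not a strict ancestor of another common one."""
    common = ancestors(ours) & ancestors(theirs)
    best = [c for c in common
            if not any(c != d and c in ancestors(d) for d in common)]
    return sorted(best)[0]      # the toy graph has a single best candidate

first = merge_base("C", "R")    # base for the merge that creates D
second = merge_base("F", "T")   # base for the merge that creates G
```

`first` comes out as B and `second` as R. The second answer is only R because D recorded R as a parent; without that pointer the second merge would go all the way back to B and re-apply Q and R.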

It's *super* straightforward to understand what merges do in git.

And if someone comes up with a better diff algorithm, no problem. The 
algorithm to do the diff during a merge isn't built into the repository.

> With Darcs, I tell it what files to watch, and then when I've finished
> editing stuff, I say "record this" and it shows me every modified line of
> every file and asks which modifications to keep. Git doesn't support
> recording half a file modification,

Yes it does.  Indeed, you can even go back and retroactively say "oh, those 
two commits? The second one should have come first, and the first one should 
be broken up into these three commits."

As I said, I do this all the time.

> and doesn't even figure out which files changed.

Yes it does. It compares the working directory against the staging directory 
and the head to say "these files are changed and unstaged, those are changed 
but already staged, and those are unchanged."

It's just a two-step process. You can build up the thing you want to commit, 
and then finally commit it.  It sounds like Darcs needs you to do that all 
in one step.

Git does it the other way around. First it asks you what modified lines you 
want to put in the commit (and puts them in the staging area), then it 
creates the commit (based on the staging area).

>> This is trivial with GIT. I do it all the time. I'll be adding a new
>> function, and while testing, realize there's a bug in some other
>> function. So when everything works again, I'll do two commits, staging
>> just particular hunks (in the diff sense of the word) and do two
>> commits, one for the bugfix and one for the new change.
>
> Given that Git can only record the new file or the old one, how is that
> possible?

The staging area lies between the repository and the working directory. So I 
check out some branch, and that copies it to the WD and maybe clears the 
staging area. The staging area is basically a commit that's not yet in the 
repository.

Now I make changes to the WD.

Then I use something like "git add" to add all the changes from the WD to 
the staging directory. Or I use "git add -i" (or, more likely, the GUI) to 
diff the WD against the staging area (or the repository), pick (say) three 
of the five diff hunks, and then create a new temp file that holds the 
repository with those three diff hunks applied, which I then put in the 
staging area.  When I have everything the way I like, I commit the change, 
which copies the staging area into the repository and then adds a commit 
object pointing to it.
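
Picking hunks can be modelled in Python roughly like this. It is a sketch under assumptions: `hunks` and `apply_selected` are hypothetical helpers, files are lists of lines, and difflib's opcodes stand in for git's diff hunks.

```python
import difflib

def hunks(base, work):
    """The non-equal opcodes that turn `base` into `work`."""
    sm = difflib.SequenceMatcher(a=base, b=work, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]

def apply_selected(base, work, selected):
    """Staged content: `base` with only the selected hunks applied."""
    out, pos = [], 0
    for tag, i1, i2, j1, j2 in sorted(selected, key=lambda op: op[1]):
        out.extend(base[pos:i1])    # untouched lines before this hunk
        out.extend(work[j1:j2])     # the hunk's new lines
        pos = i2                    # skip the lines the hunk replaced
    out.extend(base[pos:])
    return out

committed = ["def f():", "    return 1", "", "def g():", "    return 2"]
working = ["def f():", "    return 10", "", "def g():", "    return 2",
           "", "def h():", "    return 3"]

all_hunks = hunks(committed, working)          # bugfix in f, plus new function h
staged_bugfix = apply_selected(committed, working, [all_hunks[0]])
staged_both = apply_selected(committed, working, all_hunks)
```

`staged_bugfix` is a complete file that never existed in the working directory: the old version plus just the fix. Committing it and then staging the rest gives the two separate commits.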

>>> I wonder how well the illusion of one single sequence of file versions
>>> works when you have multiple people editing the file in parallel.
>>
>> There's no single sequence of file versions. Every file is a new version.
>>
>> Given that it's the repository format used by Linux developers, I think
>> it's safe to say it works adequately for multiple people editing the
>> file in parallel.
>
> This boggles my mind. Apparently I /don't/ understand how Git works at all,
> because the way it seems to work precludes two people touching the same file
> at the same time...

Sure. But you're thinking git tracks diffs. That's exactly the point. If I 
change the file, and you change the file, then now there's three files. The 
original, the new one I have, and the new one you have.  When we go to merge 
it, we create number four, which is your new one with the differences 
between my version and the original applied.

It works because if there's no merge conflicts, then my diff applied to your 
file and your diff applied to my file creates the same file.
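
A stripped-down Python illustration of that symmetry. The big simplifying assumption is that edits only change lines in place, so all three versions have the same number of lines; `merge3` is a name made up here and is nowhere near a full merge algorithm.

```python
def merge3(base, mine, yours):
    """Line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of
    lines, i.e. edits only change lines in place.
    """
    if not (len(base) == len(mine) == len(yours)):
        raise ValueError("this sketch only handles in-place line edits")
    merged = []
    for b, m, y in zip(base, mine, yours):
        if m == y:
            merged.append(m)        # both agree (or neither changed)
        elif m == b:
            merged.append(y)        # only you changed this line
        elif y == b:
            merged.append(m)        # only I changed this line
        else:
            raise ValueError(f"conflict: {m!r} vs {y!r}")
    return merged

original = ["a", "b", "c"]
my_file = ["A", "b", "c"]
your_file = ["a", "b", "C"]

into_yours = merge3(original, my_file, your_file)
into_mine = merge3(original, your_file, my_file)
```

Both calls produce the same fourth file, and the only case that needs a human is the last branch, where we both changed the same line differently.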

>> Yes, but since you have them all, you can recreate the diffs between any
>> two versions whenever you want.
>
> That's my point. If multiple people are editing the same files, you do *not*
> have all the changes.

Well, no, obviously. Welcome to DVCS.  If you don't give me your files, I 
can't see them. This is true of changes you don't push in Darcs and 
mercurial too. Maybe I'm misunderstanding what you're trying to say.

Darcs *is* distributed, right? If you change a file and check it into your 
local repository, and I change it and check it into my local repository, I 
can't see your changes and you can't see mine until we connect the 
repositories again, right?

> And the "minor detail" that if 200 people edit the same file, that's 200
> separate branches which have to be manually merged back together again.

And this differs from any other VCS how?

Note that if you're trying to *push* changes to a remote repository, you 
have to do it to a branch where nobody else has branched off since you did. 
In other words, if I say "update my repository to the DEV branch on the 
company's central repository", and I make changes, and someone else changes
the DEV branch to point to a later version, I can no longer push my changes 
into the DEV branch. Instead, I have to fetch down the new DEV branch, merge 
my changes, then push the newly merged commit back up. Look up "fast-forward 
merge" in the git docs if you care.

But basically what it's saying is if you're *pushing* changes to a 
repository (i.e., there's no human there checking the merges) then you can't 
do a two-parent merge commit. You have to create the two-parent merge commit 
on your own machine, *then* push it up to the server. Sorta.
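
The rule the server applies can be sketched in Python as an ancestry check. `is_ancestor`, `try_push` and the commit names are all invented for this example; it models the default behaviour described above and nothing more.

```python
def is_ancestor(parents: dict, old: str, new: str) -> bool:
    """True if `old` is reachable from `new` by following parent pointers."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return False

def try_push(parents, remote_refs, branch, new_tip):
    """Move the remote branch only if that is a fast-forward."""
    current = remote_refs.get(branch)
    if current is not None and not is_ancestor(parents, current, new_tip):
        return False                # rejected: fetch, merge locally, push again
    remote_refs[branch] = new_tip
    return True

# I started from A; meanwhile someone else moved DEV from A to B.
parents = {"A": [], "B": ["A"], "mine": ["A"], "merged": ["mine", "B"]}
remote = {"DEV": "B"}

rejected = try_push(parents, remote, "DEV", "mine")      # B is not behind "mine"
accepted = try_push(parents, remote, "DEV", "merged")    # B is a parent of "merged"
```

The server never has to merge anything itself: it only ever moves the branch forward along parent pointers, and the two-parent commit is made on my machine.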

>> If you want to merge someone's repository into yours, you simply copy
>> from them any files or names that they have that you don't, and you're
>> done. You're merged.
>
> It would be nice if Darcs worked that way.

Right. In Darcs, you have to merge all the changes. In git, you have to 
merge all the changes.

>> Now if you want to incorporate their changes into
>> your work, you generate a diff between their latest version and some
>> earlier version, and apply that diff to your latest version, and you're
>> merged.
>
> What a backwards way to look at it.

Only if you're used to looking at source control as a series of diffs to 
start with. But that's (A) exactly what makes git hard to understand and (B) 
exactly what makes git brilliant. :-)

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Darren New
Subject: Re: Git tutorial
Date: 20 Apr 2011 13:11:44
Message: <4daf13d0$1@news.povray.org>

On 4/20/2011 9:03, Invisible wrote:
> Fundamentally, VCS are about tracking changes. Git might *implement* that by
> storing the entire file, but *logically* what you're trying to do is keep
> track of what you changed.

Or, as an alternate example, say you've been working and every day you 
commit before lunch and you commit before you go home, even if it's not 
working, just so it gets backed up. And you implement two functions, and you 
write code on that, and then realize you should have put that first function 
elsewhere, and you don't need the second function at all, and the other code 
should be in separate objects, and etc etc etc.

And at the end of the week, you have 50 messy changes committed.

With git, you can say "OK, go diff the current version against where I 
branched, and give me exactly one commit with all the changes I need." It's 
trivial to do that in git and then say "now commit *that* change for 
everyone else to see, and abandon all the intermediate changes."
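
Why this is cheap when you store snapshots, as a Python toy: the `commits` dict and the `commit` and `squash` helpers are hypothetical, and a "tree" here is just a dict of path to content. In real git you would reach for something like `git merge --squash` or an interactive rebase rather than anything shaped like this.

```python
import hashlib

commits = {}   # id -> {"parent": id or None, "tree": {path: content}, "msg": str}

def commit(parent, tree, msg):
    cid = hashlib.sha1(repr((parent, sorted(tree.items()), msg)).encode()).hexdigest()
    commits[cid] = {"parent": parent, "tree": dict(tree), "msg": msg}
    return cid

def squash(branch_point, tip, msg):
    """One new commit: same final snapshot as `tip`, parent is `branch_point`."""
    return commit(branch_point, commits[tip]["tree"], msg)

base = commit(None, {"f.py": "v0"}, "base")
tip = base
for n in range(1, 51):                       # fifty messy work-in-progress commits
    tip = commit(tip, {"f.py": f"v{n}"}, f"wip {n}")

clean = squash(base, tip, "implement feature")
```

No diffs get replayed or combined. The clean commit simply points at the final state with the branch point as its parent, and the fifty intermediate commits are left unreferenced.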

I don't know how you'd do something like that in mercurial or darcs that 
store *changes* in the repository.

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."

From: Invisible
Subject: Re: Git tutorial
Date: 21 Apr 2011 04:06:44
Message: <4dafe594$1@news.povray.org>

On 20/04/2011 18:11, Darren New wrote:

> Or, as an alternate example, say you've been working and every day you
> commit before lunch and you commit before you go home, even if it's not
> working, just so it gets backed up. And you implement two functions, and
> you write code on that, and then realize you should have put that first
> function elsewhere, and you don't need the second function at all, and
> the other code should be in separate objects, and etc etc etc.
>
> And at the end of the week, you have 50 messy changes committed.
>
> With git, you can say "OK, go diff the current version against where I
> branched, and give me exactly one commit with all the changes I need."
> It's trivial to do that in git and then say "now commit *that* change
> for everyone else to see, and abandon all the intermediate changes."
>
> I don't know how you'd do something like that in mercurial or darcs that
> store *changes* in the repository.

Assuming that your working copy matches everything Darcs has in its 
history, you'd do this:

1. You "unrecord" the 50 messy commits. That doesn't do anything to your 
working copy, just the history Darcs keeps.

2. You "record" a single commit. When you do this, Darcs diffs the whole 
working copy against what it has in its history, and records that.

(In other words, if you add 500 lines, commit, delete 450 of those lines 
commit, and then you unrecord the two commits and record a new commit, 
the 450 lines that you added then deleted don't show up any more.)

Needless to say, you do *not* want to be unrecording any history which 
other people have copies of. But if it's only your local repo, it's fine.

From: Invisible
Subject: Re: Git tutorial
Date: 21 Apr 2011 04:17:13
Message: <4dafe809$1@news.povray.org>

>> The problem Git seems to have is that it uses heads to keep track of
>> things.
>> Delete the head and the corresponding commit drops off the face of the
>> Earth.
>
> Yes. That's why you shouldn't do that.

It's also why having sub-repos might be tricky. Much simpler if you 
don't need to keep updating pointers.

> On the other hand, if you work on something and decide it wasn't a good
> idea, you can delete the branch and no harm done. Darcs apparently
> requires you to copy the entire repository before you even *start*
> making changes if you want to recover.

What craziness are you speaking? If you want to go back to an older 
version, you just say "take me back to an older version please". If you 
don't want changes you've made, you either record commits reverting 
them, or you just delete them from the history outright. That's kind of 
the whole point of version control, distributed or not.

>> Darcs manages a set [as in set theory] of changes. You don't need to keep
>> updating a "pointer" to point to the latest one or anything. I'd be
>> surprised if no other VCS has thought of this.
>
> But that's exactly why you need to start a new repository if you want a
> new branch. If you clone a repository in Darcs, make a bunch of changes,
> then accidentally delete the repository, you're in even worse shape than
> if you delete a branch in git.

Well, yes, if you delete all your work, you have a problem. This isn't 
unique to Darcs. I'm not seeing what your point is...

From: Le Forgeron
Subject: Re: Git tutorial
Date: 21 Apr 2011 05:13:05
Message: <4daff521@news.povray.org>

Le 20/04/2011 19:11, Darren New a écrit :
> On 4/20/2011 9:03, Invisible wrote:
>> Fundamentally, VCS are about tracking changes. Git might *implement*
>> that by
>> storing the entire file, but *logically* what you're trying to do is keep
>> track of what you changed.
> 
> Or, as an alternate example, say you've been working and every day you
> commit before lunch and you commit before you go home, even if it's not
> working, just so it gets backed up. And you implement two functions, and
> you write code on that, and then realize you should have put that first
> function elsewhere, and you don't need the second function at all, and
> the other code should be in separate objects, and etc etc etc.
> 
> And at the end of the week, you have 50 messy changes committed.

Yes, but that is in your messy repository only.
"Commit often, Push when working" is a good approach with DVCS.

> 
> With git, you can say "OK, go diff the current version against where I
> branched, and give me exactly one commit with all the changes I need."
> It's trivial to do that in git and then say "now commit *that* change
> for everyone else to see, and abandon all the intermediate changes."
> 
> I don't know how you'd do something like that in mercurial or darcs that
> store *changes* in the repository.
> 

For mercurial, there is an extension which aggregates a line of changes
(or even a cloud of them) into a single one: collapse.

As long as the set of commits was not published in another repository,
it's ok (you just lose the finer steps).

From: Invisible
Subject: Re: Git tutorial
Date: 21 Apr 2011 05:26:12
Message: <4daff834$1@news.povray.org>

On 20/04/2011 17:55, Darren New wrote:
> On 4/20/2011 9:03, Invisible wrote:
>> Doesn't appear to me that that's what happens, from what little Mercurial
>> documentation I've read.
>
> I don't know. All the mercurial documentation I've read talks about
> change sets.

The documentation I saw talks about a linear series of file versions, 
just like Git and RCS and CVS and...

>> Fundamentally, VCS are about tracking changes.
>
> Fundamentally, they're about controlling versions. :-)

Well, that's a valid way to look at it I guess.

> I think it depends. If I want version 1.0 that was released, I don't
> really care what changed to get there. I want that version.

Yes, clearly.

On the other hand, if somebody sends you some stuff and says "add this 
to your repo, it fixes bug #23482", you probably want to know what changed.

So yes, it does depend.

> The advantage of storing what's actually there is you can write all
> kinds of better tools to tell you the differences.

You can still apply whatever tools you want to your files, no matter 
which way you store them. Although I will admit, having a diff algorithm 
built right into the version control software is quite nice. (Although 
sometimes I wish Darcs did this better.)

> Basically, you're storing absolutes and deducing differences, rather
> than storing differences and deducing absolutes. That means when you
> want to know what changed between release candidate 3.5RC2 and the
> version 4.2 that Fred compiled over on *his* machine, you can just
> compare the two. You don't have to reconstruct anything first.

If you just write "darcs diff", you can see the changes between any two 
versions of your repo. The fact that Darcs has to do lots of work behind 
the scenes to do this is of little consequence to me. Darcs has to apply 
an algorithm that generates the two versions and then diffs them. Git 
would have to apply an algorithm that unpacks the two commits and diffs 
them. I don't really care, so long as I get my answers.

>> OK, wow. I thought having to tell Darcs when I rename stuff was
>> inconvenient, but this just sounds insane...
>
> Why? You don't have to tell git you renamed something.

Which means that it tries to guess when you rename something, so it is 
100% guaranteed to guess wrong sometimes.

Still, I suppose if it's sufficiently rare, it doesn't matter too much...

>> It sounds simple enough. If this change affects line X and that change
>> affects line Y, they are independent.
>
> Yeah, until you get binary objects in there. :-)

Yeah, it's unclear how you can hope to version control a binary file, 
other than just keeping a linear sequence of versions (which is what 
Darcs apparently does). Personally I've never needed to try, but I guess 
someday I might.

>> And if you merge the comments branch into the main branch, and then
>> somebody
>> adds more stuff to the comments branch, then what?
>
> Then you get a merge and then more changes on the comments branch. And
> if you merge the comments branch *again*, *that* is when git uses the
> parent pointers in the commit objects to figure out which files to diff
> in order to get the patches to the parent.
>
> A--B--C--D--E--F--G--H--I
>     \    /        /
>      Q--R--S-----T
>
> So you started with A, changed to B, branched B and made a change to
> create Q, then R. In the meantime, I changed B to be C. Now I merge
> your R back to my C. This looks back, sees B is the common ancestor, so
> diffs R against B and applies it to C, then creates D with C and R as
> parent commits. (Each letter is a commit, which includes the state of
> the entire repository.)
>
> Now you keep working on R without incorporating my B->C change, creating
> S and T. I change D to include E and F. Now I merge your work again.
>
> Git looks at F, follows it back to D, to C and R, and sees that R is a
> common ancestor of both F and T. So it diffs T against R, applies those
> diffs to F, and creates G. You can then delete the branch that points to
> T safely without losing anything.
>
> It's *super* straightforward to understand what merges do in git.

I don't know, man, that all looks very, very complicated to me.
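
For what it's worth, the "look back and find the common ancestor" step can be written down in a few lines. This Python sketch uses the commit letters from the diagram above; the dictionary layout and function names are my own, and real Git does this far more efficiently.

```python
# Parent pointers for the commits in the diagram (each letter is a commit).
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C", "R"], "E": ["D"], "F": ["E"],
    "G": ["F", "T"], "Q": ["B"], "R": ["Q"], "S": ["R"], "T": ["S"],
}

def ancestors(commit):
    """Return the commit plus everything reachable through parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(ours, theirs):
    """Common ancestors that no other common ancestor descends from."""
    common = ancestors(ours) & ancestors(theirs)
    best = [c for c in common
            if not any(c != d and c in ancestors(d) for d in common)]
    return sorted(best)

print(merge_base("C", "R"))  # first merge
print(merge_base("F", "T"))  # second merge
```

The first call gives B and the second gives R, matching the walkthrough quoted above. The result is a list because an awkward enough history can have more than one such ancestor.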

If I want to fix a bug in (say) GHC [which uses Darcs], I find the files 
in question, edit them, record the changes, and email the file to the 
GHC developers. I don't need to care about branches or whether the 
development tree has changed since I got my copy of it. They don't need 
to care whether my repo is in sync with theirs. They just apply the 
change, and it's done. Simple.

> And if someone comes up with a better diff algorithm, no problem. The
> algorithm to do the diff during a merge isn't built into the repository.

That's only an issue for Darcs itself. I don't have to care how Darcs stores 
my stuff; I can apply any diff algorithm I want to my files.

>> Git doesn't support recording half a file modification,
>
> Yes it does. Indeed, you can even go back and retroactively say "oh,
> those two commits? The second one should have come first, and the first
> one should be broken up into these three commits."
>
> As I said, I do this all the time.

I don't see how that's possible.

>> and doesn't even figure out which files changed.
>
> Yes it does.

Then why do you have to manually tell it which files to commit?

> It's just a two-step process. You can build up the thing you want to
> commit, and then finally commit it. It sounds like Darcs needs you to do
> that all in one step.
>
> Git does it the other way around. First it asks you what modified lines
> you want to put in the commit (and puts them in the staging area), then
> it creates the commit (based on the staging area).

I'm not sure I'm understanding what Git does. What Darcs does is show 
you each change and say "do you want to put this into the commit?" If 
you say yes, it records that change. If you say no, the change stays as 
"new". My usual workflow when I edit stuff is to periodically run Darcs, 
gather up all the changes related to one thing into a commit, run Darcs 
again, gather up all the changes related to another thing into another 
commit, and so on. I'm not sure what you mean by "Darcs needs you to do 
that all in one step".

>>> This is trivial with GIT. I do it all the time. I'll be adding a new
>>> function, and while testing, realize there's a bug in some other
>>> function. So when everything works again, I'll do two commits, staging
>>> just particular hunks (in the diff sense of the word) and do two
>>> commits, one for the bugfix and one for the new change.
>>
>> Given that Git can only record the new file or the old one, how is that
>> possible?
>
> The staging area lies between the repository and the working directory.

So, wait, there's a third file storage area?

> So I check out some branch, and that copies it to the WD and maybe
> clears the staging area. The staging area is basically a commit that's
> not yet in the repository.
>
> Now I make changes to the WD.
>
> Then I use something like "git add" to add all the changes from the WD
> to the staging directory. Or I use "git add -i" (or, more likely, the
> GUI) to diff the WD against the staging area (or the repository), pick
> (say) three of the five diff hunks, and then create a new temp file that
> holds the repository with those three diff hunks applied, which I then
> put in the staging area. When I have everything the way I like, I commit
> the change, which copies the staging area into the repository and then
> adds a commit object pointing to it.

Damn that sounds complicated.
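
As far as I can follow it, the hunk-picking step amounts to: diff the committed version against the working copy, then build a staged version that has only the chosen hunks applied. A minimal Python sketch under that assumption (the function names and file contents are invented, and this is not Git's code):

```python
from difflib import SequenceMatcher

def hunks(old, new):
    """List the changed regions between two versions as (tag, i1, i2, j1, j2)."""
    ops = SequenceMatcher(None, old, new).get_opcodes()
    return [op for op in ops if op[0] != "equal"]

def stage(old, new, chosen):
    """Build the staged version: `old` with only the chosen hunks applied."""
    staged, pos = [], 0
    for index, (tag, i1, i2, j1, j2) in enumerate(hunks(old, new)):
        staged.extend(old[pos:i1])  # unchanged lines before this hunk
        staged.extend(new[j1:j2] if index in chosen else old[i1:i2])
        pos = i2
    staged.extend(old[pos:])  # unchanged tail
    return staged

committed = ["def f():", "    return 1", "", "def g():", "    return 2"]
working = ["def f():", "    return 10", "", "def g():", "    return 2",
           "", "def h():", "    return 3"]

# First commit: only the bugfix in f() (hunk 0). The new function h() stays unstaged.
bugfix = stage(committed, working, {0})
# Second commit: whatever still differs from the first commit, i.e. h().
feature = stage(bugfix, working, {0})
```

Here `bugfix` contains the changed `f()` but not `h()`, and `feature` is identical to the working copy, so one editing session ends up as two separate commits.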

>> This boggles my mind. Apparently I /don't/ understand how Git works at
>> all, because the way it seems to work precludes two people touching the
>> same file at the same time...
>
> Sure. But you're thinking git tracks diffs. That's exactly the point.

I know Git doesn't track diffs - I just can't comprehend how that can 
actually work properly.

> If
> I change the file, and you change the file, then now there's three
> files. The original, the new one I have, and the new one you have. When
> we go to merge it, we create number four, which is your new one with the
> differences between my version and the original applied.

This just seems a very strange way to look at things. Generally you 
don't care about versions, you care about alterations. "Does this draft 
have the corrections to chapter 4 in it or not?"
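
The "three files in, fourth file out" merge described in the quote can be sketched as follows. This is a deliberately simplified three-way merge in Python: it applies both sides' changes to the original and gives up whenever their changed regions overlap, which is stricter than real tools. All names are hypothetical.

```python
from difflib import SequenceMatcher

def changes(base, side):
    """Changed regions of `side` relative to `base`: (start, end, replacement)."""
    ops = SequenceMatcher(None, base, side).get_opcodes()
    return [(i1, i2, side[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def merge3(base, mine, yours):
    """Apply both sides' changes to the base; refuse if their regions overlap."""
    edits = sorted(changes(base, mine) + changes(base, yours),
                   key=lambda e: (e[0], e[1]))
    merged, pos = [], 0
    for start, end, replacement in edits:
        if start < pos:
            raise ValueError("conflict: both sides changed the same lines")
        merged.extend(base[pos:start])  # untouched lines of the original
        merged.extend(replacement)
        pos = end
    merged.extend(base[pos:])
    return merged

original = ["a", "b", "c", "d"]
mine = ["a", "B", "c", "d"]        # I changed line 2
yours = ["a", "b", "c", "d", "e"]  # you appended a line
fourth = merge3(original, mine, yours)
print(fourth)
```

The result carries both alterations, so "does this draft have the corrections in it?" can still be answered, just by comparing versions rather than by listing patches.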

It seems to me that with the Git model, any time anybody edits any file, 
you create a new version of the entire repo that then has to be 
laboriously merged back into everybody else's repos. (Assuming no other 
edits have happened in the meantime.) What a clunky way to work.
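
One detail that may soften the "new version of the entire repo" worry: if files are stored under a hash of their contents, a new snapshot only adds the files that actually changed. The Python sketch below assumes that model. The blob header follows Git's documented blob hashing, but the flat path-to-id mapping is a simplification of my own, not Git's tree format.

```python
import hashlib

store = {}  # object id -> bytes; a stand-in for an object database

def put_blob(data: bytes) -> str:
    """Store content under the SHA-1 of a git-style blob header plus the data."""
    oid = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    store[oid] = data
    return oid

def snapshot(files: dict) -> dict:
    """A snapshot here is just a mapping from path to blob id."""
    return {path: put_blob(data) for path, data in files.items()}

v1 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma\n"})
v2 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta, edited\n", "c.txt": b"gamma\n"})

print(len(store))  # objects stored for two three-file snapshots
```

Two snapshots of three files end up as four stored objects, because the unchanged files hash to the same id and are shared.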

>> And the "minor detail" that if 200 people edit the same file, that's 200
>> separate branches which have to be manually merged back together again.
>
> And this differs from any other VCS how?

With a centralised system, usually it's a check-in / check-out model, so 
only one person can edit a file at once.

With something like Darcs, there are now 200 change-sets, each of which 
is only in some repos. Copy the change-sets around and everything is in 
sync again. No need for complex "merge" operations or tangled file 
histories.

> Note that if you're trying to *push* changes to a remote repository, you
> have to do it to a branch where nobody else has branched off since you
> did.

And what the hell are the chances of that ever happening? If every time 
anybody touches any file it generates a new branch, then there's no 
chance of ever being able to push changes back.

> In other words, if I say "update my repository to the DEV branch on
> the company's central repository", and I make changes, and someone else
> changes the DEV branch to point to a later version, I can no longer push
> my changes into the DEV branch. Instead, I have to fetch down the new
> DEV branch, merge my changes, then push the newly merged commit back up.

And hope that the DEV branch doesn't change while you're busy trying to 
catch up. Still, I suppose if you repeat this cycle enough times, 
eventually you might get lucky and be able to perform the push.
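
The rule being described seems to boil down to: a push is accepted only if the remote branch's current commit is an ancestor of the commit being pushed. A minimal Python sketch of that check, with made-up commit names and a hypothetical `try_push` helper:

```python
def is_ancestor(parents, older, newer):
    """True if `older` is reachable from `newer` by following parent pointers."""
    seen, stack = set(), [newer]
    while stack:
        c = stack.pop()
        if c == older:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def try_push(parents, remote_tip, local_tip):
    """Return the new branch tip if the push only moves it forward, else None."""
    return local_tip if is_ancestor(parents, remote_tip, local_tip) else None

# Hypothetical history: DEV was at "d1" when I started; a colleague then pushed "d2".
parents = {"d1": [], "mine": ["d1"], "d2": ["d1"], "merged": ["mine", "d2"]}

rejected = try_push(parents, "d2", "mine")    # DEV moved on without me
accepted = try_push(parents, "d2", "merged")  # after fetching and merging d2
```

The first push is refused because "d2" is not in the history of "mine"; the merged commit has both as parents, so the second one goes through unless DEV has moved again in the meantime.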

>>> If you want to merge someone's repository into yours, you simply copy
>>> from them any files or names that they have that you don't, and you're
>>> done. You're merged.
>>
>> It would be nice if Darcs worked that way.
>
> Right. In Darcs, you have to merge all the changes. In git, you have to
> merge all the changes.

No, I meant it would be nice if the Darcs repo format allowed you to 
update a repo just by copying some files. Unfortunately there's 
cross-references and stuff which also have to be updated, so it's not 
that simple. You actually have to run Darcs to import a new patch.

Darcs also doesn't explicitly support the "bare" format that Git does, 
despite it being obviously useful.

>>> Now if you want to incorporate their changes into
>>> your work, you generate a diff between their latest version and some
>>> earlier version, and apply that diff to your latest version, and you're
>>> merged.
>>
>> What a backwards way to look at it.
>
> Only if you're used to looking at source control as a series of diffs to
> start with. But that's (A) exactly what makes git hard to understand and
> (B) exactly what makes git brilliant. :-)

So doing things the hard way is brilliant?
