POV-Ray : Newsgroups : povray.off-topic : Git tutorial
Re: Git tutorial
From: Invisible
Date: 21 Apr 2011 05:26:12
Message: <4daff834$1@news.povray.org>
On 20/04/2011 17:55, Darren New wrote:
> On 4/20/2011 9:03, Invisible wrote:
>> Doesn't appear to me that that's what happens, from what little Mercurial
>> documentation I've read.
>
> I don't know. All the mercurial documentation I've read talks about
> change sets.

The documentation I saw talks about a linear series of file versions, 
just like Git and RCS and CVS and...

>> Fundamentally, VCS are about tracking changes.
>
> Fundamentally, they're about controlling versions. :-)

Well, that's a valid way to look at it I guess.

> I think it depends. If I want version 1.0 that was released, I don't
> really care what changed to get there. I want that version.

Yes, clearly.

On the other hand, if somebody sends you some stuff and says "add this 
to your repo, it fixes bug #23482", you probably want to know what changed.

So yes, it does depend.

> The advantage of storing what's actually there is you can write all
> kinds of better tools to tell you the differences.

You can still apply whatever tools you want to your files, no matter 
which way you store them. I will admit, though, that having a diff 
algorithm built right into the version control software is quite nice. 
(Sometimes I wish Darcs did this better.)

> Basically, you're storing absolutes and deducing differences, rather
> than storing differences and deducing absolutes. That means when you
> want to know what changed between release candidate 3.5RC2 and the
> version 4.2 that Fred compiled over on *his* machine, you can just
> compare the two. You don't have to reconstruct anything first.

If you just run "darcs diff", you can see the changes between any two 
versions of your repo. The fact that Darcs has to do lots of work behind 
the scenes to do this is of little consequence to me. Darcs has to apply 
an algorithm that generates the two versions and then diffs them. Git 
would have to apply an algorithm that unpacks the two commits and diffs 
them. I don't really care, so long as I get my answers.
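The storing-absolutes versus storing-differences trade-off, and the "same answer either way" point, can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Git or Darcs actually store data: files are lists of lines, `SnapshotStore` and `PatchStore` are hypothetical names, and `difflib` stands in for whatever diff algorithm a real tool uses.

```python
import difflib

class SnapshotStore:
    """Keeps every version in full (roughly the Git idea); diffs are computed on demand."""
    def __init__(self):
        self.versions = []

    def commit(self, lines):
        self.versions.append(list(lines))

    def get(self, n):
        return self.versions[n]

class PatchStore:
    """Keeps the first version plus forward patches (roughly the change-set idea)."""
    def __init__(self):
        self.base = None
        self.patches = []      # each patch: list of (tag, i1, i2, replacement_lines)
        self._latest = None

    def commit(self, lines):
        lines = list(lines)
        if self.base is None:
            self.base = lines
        else:
            sm = difflib.SequenceMatcher(a=self._latest, b=lines, autojunk=False)
            self.patches.append(
                [(tag, i1, i2, lines[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes()]
            )
        self._latest = lines

    def get(self, n):
        # Rebuild version n by replaying the first n patches onto the base.
        current = list(self.base)
        for ops in self.patches[:n]:
            rebuilt = []
            for tag, i1, i2, new_lines in ops:
                rebuilt.extend(current[i1:i2] if tag == "equal" else new_lines)
            current = rebuilt
        return current

def diff(store, a, b):
    """Unified diff between versions a and b, whichever way the store keeps them."""
    return list(difflib.unified_diff(store.get(a), store.get(b), lineterm=""))
```

Both stores hand back identical diffs; the patch store just has to replay patches first, which is the behind-the-scenes work mentioned above.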

>> OK, wow. I thought having to tell Darcs when I rename stuff was
>> inconvenient, but this just sounds insane...
>
> Why? You don't have to tell git you renamed something.

Which means that it tries to guess when you rename something, so it is 
100% guaranteed to guess wrong sometimes.

Still, I suppose if it's sufficiently rare, it doesn't matter too much...
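Git really does not record renames; it infers them after the fact by pairing deleted files with added files of similar content. A rough sketch of that guessing step, assuming `difflib`'s similarity ratio as the score (Git's own similarity measure is different) and a 50% cut-off modelled on the default threshold of `git diff -M`. The function name is hypothetical.

```python
import difflib

def detect_renames(old_tree, new_tree, threshold=0.5):
    """Pair files that vanished with files that appeared, by content similarity.

    old_tree / new_tree map path -> file content (str).
    Returns {old_path: new_path}; a greedy match, good enough for a sketch.
    """
    deleted = [p for p in old_tree if p not in new_tree]
    added = [p for p in new_tree if p not in old_tree]
    renames = {}
    for old in deleted:
        best, best_score = None, threshold
        for new in added:
            if new in renames.values():
                continue  # already claimed by another deleted file
            score = difflib.SequenceMatcher(a=old_tree[old], b=new_tree[new]).ratio()
            if score >= best_score:
                best, best_score = new, score
        if best is not None:
            renames[old] = best
    return renames
```

A renamed file that was also heavily edited can drop below the threshold and show up as a delete plus an add, which is the kind of wrong guess being worried about here.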

>> It sounds simple enough. If this change affects line X and that change
>> affects line Y, they are independent.
>
> Yeah, until you get binary objects in there. :-)

Yeah, it's unclear how you can hope to version control a binary file, 
other than just keeping a linear sequence of versions (which is what 
Darcs apparently does). Personally I've never needed to try, but I guess 
somebody might.
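The "line X versus line Y" rule can be sketched as a check on which lines of the original file each side touched. Assumptions: files are lists of text lines, `difflib` finds the changed ranges, and, as a conservative choice in this sketch, edits that merely abut each other count as overlapping. The names are hypothetical. A binary file has no meaningful lines, so a check like this has nothing to work with there.

```python
import difflib

def changed_ranges(base, new):
    """Half-open [start, end) ranges of base lines touched by the edit base -> new."""
    sm = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _j1, _j2 in sm.get_opcodes() if tag != "equal"]

def independent(base, mine, yours):
    """True if no edit in `mine` touches or abuts an edit in `yours`.

    Using <= rather than < treats adjacent edits as conflicting and also
    catches two insertions at the same spot (which have empty ranges).
    """
    for a1, a2 in changed_ranges(base, mine):
        for b1, b2 in changed_ranges(base, yours):
            if a1 <= b2 and b1 <= a2:
                return False
    return True
```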

>> And if you merge the comments branch into the main branch, and then
>> somebody adds more stuff to the comments branch, then what?
>
> Then you get a merge and then more changes on the comments branch. And
> if you merge the comments branch *again*, *that* is when git uses the
> parent pointers in the commit objects to figure out which files to diff
> in order to get the patches to the parent.
>
> A--B--C--D--E--F--G--H--I
>     \    /       /
>      Q--R---S---T
>
> So you started with A, changed to B, branched B and made a change to
> create Q, then R. In the meantime, I changed B to be C. Now I merge
> your R back to my C. This looks back, sees B is the common ancestor, so
> diffs R against B and applies it to C, then creates D with C and R as
> parent commits. (Each letter is a commit, which includes the state of
> the entire repository.)
>
> Now you keep working on R without incorporating my B->C change, creating
> S and T. I change D to include E and F. Now I merge your work again.
>
> Git looks at F, follows it back to D, to C and R, and sees that R is a
> common ancestor of both F and T. So it diffs T against R, applies those
> diffs to F, and creates G. You can then delete the branch that points to
> T safely without losing anything.
>
> It's *super* straightforward to understand what merges do in git.
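The merge walkthrough boils down to finding a common ancestor in the commit graph. Here is a small Python sketch, using the commits from the walkthrough (D has parents C and R; G has parents F and T), of picking the best common ancestor, roughly the job `git merge-base` does. The helper names and the dictionary representation are assumptions for illustration.

```python
# Commit graph from the walkthrough: each commit maps to its parent commits.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C", "R"], "E": ["D"],
    "F": ["E"], "G": ["F", "T"], "H": ["G"], "I": ["H"],
    "Q": ["B"], "R": ["Q"], "S": ["R"], "T": ["S"],
}

def ancestors(commit):
    """Every commit reachable from `commit` through parent links, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def merge_base(x, y):
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}
```

Once the base is known, the merge diffs the base against one tip and applies that diff to the other tip: B for the first merge, R for the second.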

I don't know, man, that all looks very, very complicated to me.

If I want to fix a bug in (say) GHC [which uses Darcs], I find the files 
in question, edit them, record the changes, and email the file to the 
GHC developers. I don't need to care about branches or whether the 
development tree has changed since I got my copy of it. They don't need 
to care whether my repo is in sync with theirs. They just apply the 
change, and it's done. Simple.

> And if someone comes up with a better diff algorithm, no problem. The
> algorithm to do the diff during a merge isn't built into the repository.

This is only an issue for Darcs. I don't have to care how Darcs stores 
my stuff. I can apply any diff algorithm I want to my files.

>> Git doesn't support recording half a file modification,
>
> Yes it does. Indeed, you can even go back and retroactively say "oh,
> those two commits? The second one should have come first, and the first
> one should be broken up into these three commits."
>
> As I said, I do this all the time.

I don't see how that's possible.
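Reordering commits is possible precisely because each commit is a full snapshot with a parent: a tool can deduce each commit's change by comparing it with its parent, then replay those changes in a different order, creating brand-new commits. A minimal sketch, assuming changes are tracked per whole file (a snapshot is a dict of path to content); real history rewriting such as `git rebase -i` works with line-level diffs and can stop on conflicts. All names are hypothetical.

```python
def change_of(parent, child):
    """Deduce a commit's change from two snapshots: changed/added paths and removed paths."""
    changed = {p: c for p, c in child.items() if parent.get(p) != c}
    removed = {p for p in parent if p not in child}
    return changed, removed

def apply_change(snapshot, change):
    """Produce a new snapshot with one deduced change applied."""
    changed, removed = change
    result = {p: c for p, c in snapshot.items() if p not in removed}
    result.update(changed)
    return result

def replay(base, snapshots, order):
    """Re-create history: deduce each commit's change, then apply them in `order`."""
    changes, parent = [], base
    for snap in snapshots:
        changes.append(change_of(parent, snap))
        parent = snap
    new_history, current = [], base
    for i in order:
        current = apply_change(current, changes[i])
        new_history.append(current)
    return new_history

base = {"a.txt": "1", "b.txt": "1"}
c1 = {"a.txt": "2", "b.txt": "1"}   # first commit touches a.txt
c2 = {"a.txt": "2", "b.txt": "2"}   # second commit touches b.txt
swapped = replay(base, [c1, c2], [1, 0])   # "the second one should have come first"
```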

>> and doesn't even figure out which files changed.
>
> Yes it does.

Then why do you have to manually tell it which files to commit?

> It's just a two-step process. You can build up the thing you want to
> commit, and then finally commit it. It sounds like Darcs needs you to do
> that all in one step.
>
> Git does it the other way around. First it asks you what modified lines
> you want to put in the commit (and puts them in the staging area), then
> it creates the commit (based on the staging area).
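On the question of which files changed: a status check can find modified, new, and deleted files by comparing the working directory with what was last staged. A sketch, assuming plain SHA-1 of the file bytes as the identifier and dicts standing in for the index and working tree (real Git also uses cached file metadata to avoid rehashing everything). The function names are hypothetical. Finding the changes is automatic; `git add` is only the step where you choose which of them go into the next commit.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Stand-in identifier for a file's content (plain SHA-1 of the bytes)."""
    return hashlib.sha1(data).hexdigest()

def status(index, workdir):
    """Compare the working directory against the staging area.

    index maps path -> content id recorded at the last `add`;
    workdir maps path -> current file bytes.
    """
    modified = sorted(p for p in workdir
                      if p in index and content_id(workdir[p]) != index[p])
    untracked = sorted(p for p in workdir if p not in index)
    deleted = sorted(p for p in index if p not in workdir)
    return {"modified": modified, "untracked": untracked, "deleted": deleted}
```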

I'm not sure I'm understanding what Git does. What Darcs does is show 
you each change and say "do you want to put this into the commit?" If 
you say yes, it records that change. If you say no, the change stays as 
"new". My usual workflow when I edit stuff is to periodically run Darcs, 
gather up all the changes related to one thing into a commit, run Darcs 
again, gather up all the changes related to another thing into another 
commit, and so on. I'm not sure what you mean by "Darcs needs you to do 
that all in one step".

>>> This is trivial with GIT. I do it all the time. I'll be adding a new
>>> function, and while testing, realize there's a bug in some other
>>> function. So when everything works again, I'll do two commits, staging
>>> just particular hunks (in the diff sense of the word): one for the
>>> bugfix and one for the new change.
>>
>> Given that Git can only record the new file or the old one, how is that
>> possible?
>
> The staging area lies between the repository and the working directory.

So, wait, there's a third file storage area?

> So I check out some branch, and that copies it to the WD and maybe
> clears the staging area. The staging area is basically a commit that's
> not yet in the repository.
>
> Now I make changes to the WD.
>
> Then I use something like "git add" to add all the changes from the WD
> to the staging area. Or I use "git add -i" (or, more likely, the
> GUI) to diff the WD against the staging area (or the repository), pick
> (say) three of the five diff hunks, and then create a new temp file that
> holds the repository with those three diff hunks applied, which I then
> put in the staging area. When I have everything the way I like, I commit
> the change, which copies the staging area into the repository and then
> adds a commit object pointing to it.
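The staging-area description, including picking some of the diff hunks, can be modelled with three dicts: the last commit, the staging area, and the working directory. A sketch assuming files are lists of lines and `difflib` opcodes stand in for diff hunks; `hunks` and `apply_selected` are hypothetical helpers, not Git's API. The first commit gets only the bug-fix hunk, the second gets the rest.

```python
import difflib

def hunks(old, new):
    """The non-equal difflib opcodes between two versions: one entry per diff hunk."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]

def apply_selected(old, new, selected):
    """Return `old` with only the hunks whose indices are in `selected` applied."""
    result, pos = [], 0
    for idx, (_tag, i1, i2, j1, j2) in enumerate(hunks(old, new)):
        result.extend(old[pos:i1])                                  # unchanged lines before the hunk
        result.extend(new[j1:j2] if idx in selected else old[i1:i2])
        pos = i2
    result.extend(old[pos:])                                       # unchanged tail
    return result

# Three places a file can live: the last commit, the staging area, the working directory.
head = {"lib.py": ["def f():", "    return 1", "def g():", "    return 2"]}
workdir = {"lib.py": ["def f():", "    return 10", "def g():", "    return 20"]}
index = dict(head)                      # staging area starts out equal to the last commit

# Stage only the first hunk (say, the bug fix), then commit the staging area.
index["lib.py"] = apply_selected(head["lib.py"], workdir["lib.py"], {0})
commit1 = dict(index)

# Stage the remaining hunk and commit again.
index["lib.py"] = workdir["lib.py"]
commit2 = dict(index)
```

Answering yes or no per hunk is essentially the same interaction as the Darcs record prompt; the staging area is just where the answers accumulate before the commit is made.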

Damn that sounds complicated.

>> This boggles my mind. Apparently I /don't/ understand how Git works at
>> all, because the way it seems to work precludes two people touching the
>> same file at the same time...
>
> Sure. But you're thinking git tracks diffs. That's exactly the point.

I know Git doesn't track diffs - I just can't comprehend how that can 
actually work properly.

> If
> I change the file, and you change the file, then there are now three
> files. The original, the new one I have, and the new one you have. When
> we go to merge it, we create number four, which is your new one with the
> differences between my version and the original applied.
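The "number four" file is the output of a three-way merge. A minimal sketch that assumes all three versions have the same number of lines, i.e. lines were only edited in place; real merge tools first align the versions with a diff so that insertions and deletions work too. The function name is hypothetical.

```python
def merge3_aligned(base, mine, yours):
    """Line-by-line three-way merge of three versions with the same number of lines.

    Returns (merged_lines, conflict_line_numbers).
    """
    if not (len(base) == len(mine) == len(yours)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for n, (b, m, y) in enumerate(zip(base, mine, yours)):
        if m == y:          # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:        # only yours changed this line
            merged.append(y)
        elif y == b:        # only mine changed this line
            merged.append(m)
        else:               # both changed it differently: a genuine conflict
            merged.append(m)
            conflicts.append(n)
    return merged, conflicts
```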

This just seems a very strange way to look at things. Generally you 
don't care about versions, you care about alterations. "Does this draft 
have the corrections to chapter 4 in it or not?"

It seems to me that with the Git model, any time anybody edits any file, 
you create a new version of the entire repo that then has to be 
laboriously merged back into everybody else's repos. (Assuming no other 
edits have happened in the meantime.) What a clunky way to work.

>> And the "minor detail" that if 200 people edit the same file, that's 200
>> separate branches which have to be manually merged back together again.
>
> And this differs from any other VCS how?

With a centralised system, usually it's a check-in / check-out model, so 
only one person can edit a file at once.

With something like Darcs, there are now 200 change-sets, each of which 
is only in some repos. Copy the change-sets around and everything is in 
sync again. No need for complex "merge" operations or tangled file 
histories.

> Note that if you're trying to *push* changes to a remote repository, you
> have to do it to a branch where nobody else has branched off since you
> did.

And what the hell are the chances of that ever happening? If every time 
anybody touches any file it generates a new branch, then there's no 
chance of ever being able to push changes back.

> In other words, if I say "update my repository to the DEV branch on
> the company's central repository", and I make changes, and someone else
> changes the DEV branch to point to a later version, I can no longer push
> my changes into the DEV branch. Instead, I have to fetch down the new
> DEV branch, merge my changes, then push the newly merged commit back up.

And hope that the DEV branch doesn't change while you're busy trying to 
catch up. Still, I suppose if you repeat this cycle enough times, 
eventually you might get lucky and be able to perform the push.
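The push rule is a fast-forward check: the push goes through only if the commit the remote branch currently points at is already part of your history. A sketch with a hypothetical parent-pointer dict and helper names.

```python
def is_ancestor(parents, a, b):
    """True if commit `a` is reachable from commit `b` by following parent links."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return False

def can_push(parents, remote_tip, local_tip):
    """A plain push only fast-forwards: the remote tip must already be in local history."""
    return is_ancestor(parents, remote_tip, local_tip)

# Hypothetical history: I committed "mine", someone else pushed "theirs",
# then I fetched and made "merged" with both as parents.
PARENTS = {"base": [], "mine": ["base"], "theirs": ["base"], "merged": ["mine", "theirs"]}
```

After fetching and merging, the merge commit has the remote tip as a parent, so the check passes; another round is needed only if someone else pushed in between.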

>>> If you want to merge someone's repository into yours, you simply copy
>>> from them any files or names that they have that you don't, and you're
>>> done. You're merged.
>>
>> It would be nice if Darcs worked that way.
>
> Right. In Darcs, you have to merge all the changes. In git, you have to
> merge all the changes.

No, I meant it would be nice if the Darcs repo format allowed you to 
update a repo just by copying some files. Unfortunately there are 
cross-references and stuff which also have to be updated, so it's not 
that simple. You actually have to run Darcs to import a new patch.

Darcs also doesn't explicitly support the "bare" format that Git does, 
despite it being obviously useful.
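Copying objects is safe because Git names each object by a hash of its content, so two repositories can never hold different objects under the same name. A sketch of fetching as "copy whatever I don't have", assuming Git's blob naming scheme (SHA-1 over a `blob <size>` header, a NUL byte, and the content) and a simplified store where branch names point straight at blobs rather than at commits. `ObjectStore` is a hypothetical class.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name an object the way Git names blobs: SHA-1 of header, NUL byte, then content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

class ObjectStore:
    def __init__(self):
        self.objects = {}   # object id -> bytes
        self.refs = {}      # branch name -> object id (simplified)

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        self.objects[oid] = content
        return oid

    def fetch_from(self, other):
        """'Merge' another store by copying any objects and names we don't have yet."""
        copied = [oid for oid in other.objects if oid not in self.objects]
        for oid in copied:
            self.objects[oid] = other.objects[oid]
        for name, oid in other.refs.items():
            self.refs.setdefault(name, oid)   # only names we don't already have
        return copied
```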

>>> Now if you want to incorporate their changes into
>>> your work, you generate a diff between their latest version and some
>>> earlier version, and apply that diff to your latest version, and you're
>>> merged.
>>
>> What a backwards way to look at it.
>
> Only if you're used to looking at source control as a series of diffs to
> start with. But that's (A) exactly what makes git hard to understand and
> (B) exactly what makes git brilliant. :-)

So doing things the hard way is brilliant?



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.