POV-Ray : Newsgroups : povray.off-topic : Git tutorial : Re: Git tutorial
  Re: Git tutorial  
From: Darren New
Date: 21 Apr 2011 12:13:05
Message: <4db05791$1@news.povray.org>
On 4/21/2011 2:26, Invisible wrote:
> You can still apply whatever tools you want to your files, no matter which
> way you store them. Although I will admit, having a diff algorithm built
> right into the version control software is quite nice. (Although sometimes I
> wish Darcs did this better.)

That's exactly what I mean. In order to have Darcs do this better, you have 
to actually fix your repo to use the better algorithm. In git, the algorithm 
isn't part of the repo. It's only part of the tools. Your very statement 
"it's built in, but I wish Darcs did it better" is exactly my point.
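A minimal Python sketch of that separation (a toy model, not git's code): the "repository" below holds nothing but two full snapshots of a file, and a diff is produced only when someone asks to see it. The standard library's `difflib` stands in for whatever diff algorithm or viewer you prefer; `show_diff` and the file name are made up for illustration.

```python
import difflib

# Two stored snapshots of the same file. Only these are kept;
# no diff is recorded anywhere.
old_snapshot = ["line one\n", "line two\n", "line three\n"]
new_snapshot = ["line one\n", "line 2\n", "line three\n"]


def show_diff(old, new, algorithm=difflib.unified_diff):
    """Render a diff at view time (hypothetical helper). Passing a different
    `algorithm` changes the output without touching the stored snapshots."""
    return "".join(algorithm(old, new, fromfile="a/file.txt", tofile="b/file.txt"))


print(show_diff(old_snapshot, new_snapshot))
# The same two snapshots, presented as a context diff instead:
print(show_diff(old_snapshot, new_snapshot, algorithm=difflib.context_diff))
```

Swapping the algorithm never changes the stored data. Real git behaves analogously: `git diff --patience` picks a different diff algorithm, and `git difftool` hands the two versions to an external viewer such as kdiff3, all without modifying the repository.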

> If you just write "darcs diff", you can see the changes between any two
> versions of your repo.

What if I want to use kdiff3 or gvimdiff or some other diff visualization tool?

>>> OK, wow. I thought having to tell Darcs when I rename stuff was
>>> inconvenient, but this just sounds insane...
>>
>> Why? You don't have to tell git you renamed something.
>
> Which means that it tries to guess when you rename something, so it is 100%
> guaranteed to guess wrong sometimes.

If a file disappears from one place and at the same time reappears somewhere
else with exactly the same contents, down to the byte, does it really matter
whether you renamed it, cut and pasted it, or typed it back in?

Remember, git is storing snapshots of the repository, not changes. The fact 
that something got renamed is sort of irrelevant.
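A rough Python sketch of how a rename can be recovered purely from two snapshots: look for a path that vanished and a new path whose bytes are identical. This is a simplified illustration that matches exact content only and hashes the raw bytes with plain SHA-1; git's own detection (for example `git diff -M` or `git log --follow`) also pairs up files that are merely similar, and git's object ids hash a small header as well. `detect_renames` and the file names are hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Identical bytes always give an identical id.
    return hashlib.sha1(content).hexdigest()


def detect_renames(before: dict[str, bytes], after: dict[str, bytes]) -> list[tuple[str, str]]:
    """Pair each path that disappeared with a new path holding byte-identical
    content. Exact matches only (a simplification of real rename detection)."""
    removed = {blob_id(before[p]): p for p in before if p not in after}
    renames = []
    for path in sorted(after):
        if path in before:
            continue
        old_path = removed.pop(blob_id(after[path]), None)
        if old_path is not None:
            renames.append((old_path, path))
    return renames


before = {"src/a.c": b"int main(void) { return 0; }\n", "README": b"hi\n"}
after = {"src/main.c": b"int main(void) { return 0; }\n", "README": b"hi\n"}
print(detect_renames(before, after))  # [('src/a.c', 'src/main.c')]
```

Nothing about "rename" was ever recorded; it falls out of comparing the two snapshots, which is why it doesn't matter how the file got there.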

Darcs is also storing snapshots, except snapshots of changes. If I rename
file A to B, and then from B to C, and *then* I commit, Darcs isn't going to
record the intermediate rename either. If I delete five lines, then another
ten, then commit, Darcs is going to lose the fact that those were actually
two changes.

> Yeah, it's unclear how you can hope to version control a binary file, other
> than just keeping a linear sequence of versions (which is what Darcs
> apparently does). Personally I've never needed to try, but I guess somebody
> might.

Word documents. Images. Audio. Video game resources. People want version 
control for all of that.

Keeping all the old versions is the way to do it. The problem comes when you
have a distributed repo, and you have to store locally every old version of
every binary file, most of which you are almost never going to want.

Imagine if you were a Linux developer and people stored installation CD ISO 
images in the repository. Do you really want to check out every copy of 
every install CD just so you can fix bugs in one file system?

> I don't know, man, that all looks very, very complicated to me.

You asked what happens in that case. That's what happens.

> If I want to fix a bug in (say) GHC [which uses Darcs], I find the files in
> question, edit them, record the changes, and email the file to the GHC
> developers. I don't need to care about branches or whether the development
> tree has changed since I got my copy of it. They don't need to care whether
> my repo is in sync with theirs. They just apply the change, and it's done.
> Simple.

That's how it works with git also. Indeed, there's a git command
(git format-patch) that says "generate an email with the patch in it that I
need to update someone else's repository."

You are making a branch. Your whole repository is a branch of the other 
guy's repository. If you look at the up-pointing lines as "you mail me a 
patch", then you get the same answer.

You asked what happens when someone keeps working on a branch that someone 
else already incorporated. I showed you how git decides which diffs to apply 
and which not to apply. Darcs does the same thing when building the working 
directory. It's going to apply the new patches, but how does it know what 
the new patches are? Right, it goes back until it finds the patches it 
already applied, then applies the newer ones.
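The "go back until it finds the patches it already applied" step can be sketched as a set difference on the commit graph. In this toy Python model (commit ids are single letters and each commit lists its parents, all invented for the example), the commits still to be brought over are the ones reachable from their head but not from mine, which is what git's `git log mine..theirs` range notation selects.

```python
# Each commit id maps to the list of its parent ids (a merge has two).
Graph = dict[str, list[str]]


def ancestors(graph: Graph, head: str) -> set[str]:
    """Every commit reachable from `head` by following parents, including `head`."""
    seen, stack = set(), [head]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def missing_commits(graph: Graph, mine: str, theirs: str) -> set[str]:
    """Commits in their history that mine doesn't already contain."""
    return ancestors(graph, theirs) - ancestors(graph, mine)


history = {
    "A": [], "B": ["A"], "C": ["B"],  # shared history
    "D": ["C"], "E": ["D"],           # their new work
    "X": ["C"],                       # my local work
}
print(sorted(missing_commits(history, mine="X", theirs="E")))  # ['D', 'E']
```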

>>> Git doesn't support recording half a file modification,
>>
>> Yes it does. Indeed, you can even go back and retroactively say "oh,
>> those two commits? The second one should have come first, and the first
>> one should be broken up into these three commits."
>>
>> As I said, I do this all the time.
>
> I don't see how that's possible.

Here's how to do it without the GUI:

http://book.git-scm.com/4_interactive_adding.html

With a GUI, you look at the patch list, right-click a hunk, and say "stage 
this to be committed".

If you want to change old commits, you do this:

http://book.git-scm.com/4_interactive_rebasing.html

Again, that's the text-based way of doing it without a gui.

>>> and doesn't even figure out which files changed.
>>
>> Yes it does.
>
> Then why do you have to manually tell it which files to commit?

Because maybe you don't want to commit all your changes in one step.

> I'm not sure I'm understanding what Git does. What Darcs does is show you
> each change and say "do you want to put this into the commit?" If you say
> yes, it records that change. If you say no, the change stays as "new".

In git, say you start with a working directory that matches the latest thing
in the repository. You change files AA, BB, and CCC, and you add file DD.
The changes to the first two fix a bug, and the other two add a
configuration option.

git add AA BB
git commit -m "fix bug"
git add CCC DD
git commit -m "add configuration option"

Let's say you then change 173 files, converting all single quotes to double 
quotes. You can then say

git add -A
git commit -m "change quote style"

> Not sure what you mean by "Darcs needs you to do that all in one step".

I mean that gathering up the changes and committing them sounds like a 
single step in Darcs. In git, I can say

wings3d my_model.wings
git add my_model.wings
gimp my_image.jpg
git add my_image.jpg
vi configuration.ini
git add configuration.ini
git commit -m "add a textured model with some configuration"

>> The staging area lies between the repository and the working directory.
> So, wait, there's a third file storage area?

Yes. That's where you build up the next commit. It's called the staging 
area, or the index.
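A toy Python model of the three storage areas may help, assuming each area is just a dictionary from path to contents and starting from an empty repository for brevity. `ToyRepo` is hypothetical, but it shows the key property: `add` copies the working version into the index, and `commit` freezes a full snapshot of the index rather than a list of changes. It replays the AA/BB/CCC/DD example from above.

```python
class ToyRepo:
    """Hypothetical model: working directory, index and commits are plain
    dicts mapping a path to that file's contents."""

    def __init__(self):
        self.working = {}  # what's on disk right now
        self.index = {}    # the staging area: the next commit being built
        self.commits = []  # (message, full snapshot) pairs, oldest first

    def add(self, *paths):
        # Copy the current working-directory version of each file into the index.
        for path in paths:
            self.index[path] = self.working[path]

    def commit(self, message):
        # A commit freezes a complete copy of whatever the index holds.
        self.commits.append((message, dict(self.index)))


repo = ToyRepo()
repo.working.update({"AA": "bug fixed", "BB": "bug fixed",
                     "CCC": "option added", "DD": "new config file"})
repo.add("AA", "BB")
repo.commit("fix bug")
repo.add("CCC", "DD")
repo.commit("add configuration option")
for message, snapshot in repo.commits:
    print(message, sorted(snapshot))
# fix bug ['AA', 'BB']
# add configuration option ['AA', 'BB', 'CCC', 'DD']
```

The second commit still contains AA and BB: every commit is a complete snapshot, and the index simply carries them forward.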

>> Then I use something like "git add" to add all the changes from the WD
>> to the staging directory. Or I use "git add -i" (or, more likely, the
>> GUI) to diff the WD against the staging area (or the repository), pick
>> (say) three of the five diff hunks, and then create a new temp file that
>> holds the repository with those three diff hunks applied, which I then
>> put in the staging area. When I have everything the way I like, I commit
>> the change, which copies the staging area into the repository and then
>> adds a commit object pointing to it.
>
> Damn that sounds complicated.

It's very simple with the GUI. You start up the GUI, and it shows you a
top-left pane of files that have changed that you haven't decided to commit
yet. The bottom-left pane lists the files that'll be in the commit. The right
side is the listing of the diffs for whatever file you've highlighted.

If you want to put half the changes from configuration.ini in your commit,
you click on that file, go over to the list of diffs, click on each one you
want in the commit, then click the commit button.

Pretty trivial.
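Picking some hunks and not others can be sketched in Python with `difflib.SequenceMatcher`, under the simplifying assumption that every separate changed region counts as one hunk (git's `git add -p` groups nearby changes together with context lines). `stage_hunks` is a hypothetical helper: it builds the new staged version of a file by taking the chosen hunks from the working copy and keeping the staged lines everywhere else.

```python
import difflib


def stage_hunks(staged, working, chosen):
    """Return the next staged version of a file (a list of lines), taking the
    hunks numbered in `chosen` from `working` and everything else from `staged`."""
    matcher = difflib.SequenceMatcher(None, staged, working)
    result, hunk = [], 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(staged[i1:i2])
            continue
        # Any non-equal opcode (replace, insert, delete) is treated as one hunk.
        result.extend(working[j1:j2] if hunk in chosen else staged[i1:i2])
        hunk += 1
    return result


staged = ["debug = false", "port = 80", "host = localhost"]
working = ["debug = true", "port = 80", "host = example.org"]
print(stage_hunks(staged, working, chosen={0}))
# ['debug = true', 'port = 80', 'host = localhost']
```

The working file is left untouched; only the version that will go into the commit changes, which is exactly the "half the changes from configuration.ini" case.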

>>> This boggles my mind. Apparently I /don't/ understand how Git works at
>>> all, because the way it seems to work precludes two people touching the
>>> same file at the same time...
>>
>> Sure. But you're thinking git tracks diffs. That's exactly the point.
>
> I know Git doesn't track diffs - I just can't comprehend how that can
> actually work properly.

Because files are actually named by the SHA-1 hash of their contents: an
edited file becomes a new object with a new name, so nobody ever modifies a
stored copy of a file in place, let alone two people at the same time.
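Git's object format makes this concrete: a file's contents are stored as a "blob" whose name is the SHA-1 of a short header (the word `blob`, the size in bytes, a NUL byte) followed by the bytes themselves. The Python sketch below computes that id; the values should agree with what `git hash-object` prints for the same bytes. `git_blob_id` is just an illustrative name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 object id git gives a file's contents: the hash of
    b'blob <size>\\0' followed by the raw bytes."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# An edit yields a new id; the object stored under the old id never changes.
original = git_blob_id(b"chapter 4: draft\n")
edited = git_blob_id(b"chapter 4: corrected\n")
print(original != edited)             # True
```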

>> If
>> I change the file, and you change the file, then now there's three
>> files. The original, the new one I have, and the new one you have. When
>> we go to merge it, we create number four, which is your new one with the
>> differences between my version and the original applied.
>
> This just seems a very strange way to look at things. Generally you don't
> care about versions, you care about alterations. "Does this draft have the
> corrections to chapter 4 in it or not?"

And you can trivially tell that in git, not by looking at the files, but by 
looking at the commits.

> It seems to me that with the Git model, any time anybody edits any file, you
> create a new version of the entire repo that then has to be laboriously
> merged back into everybody else's repos. (Assuming no other edits have
> happened in the meantime.) What a clunky way to work.

It's not laborious to merge it in, any more than it's laborious to merge 
changes in Darcs into your repository and working directory.

If I clone a repository from you, your repository's URL is stored in my
repository under the name "origin" (by default). If I want to fetch all your
changes, I say "git fetch origin", which connects to your repository, gets
the list of objects you have that I don't, and pulls them down. It also
updates my copies of your branch names, so if you have a branch called
"bugfix", I'll have a branch called "origin/bugfix". ("git pull origin" does
the same and then also merges your changes into my current branch.) If Sally
also cloned your repository and created a branch called bugfix, and I added
her repository as a remote named "sally" and fetched from it too, I'll have
a branch called "origin/bugfix" and one called "sally/bugfix".
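A toy Python model of that fetch, assuming a repository is just a dictionary of objects plus branch names (the ids and contents below are invented): copy the objects you lack, then record the other side's branch heads under a `<remote>/<branch>` prefix without touching your own branches. `fetch` here is a hypothetical function; real git also negotiates which objects to send over the network, but the bookkeeping is similar.

```python
def fetch(local, remote_name, remote):
    """Copy the objects `remote` has that `local` lacks, then record the remote's
    branch heads as '<remote_name>/<branch>'. Local branches are left alone."""
    missing = {oid: obj for oid, obj in remote["objects"].items()
               if oid not in local["objects"]}
    local["objects"].update(missing)
    for branch, head in remote["branches"].items():
        local["remote_branches"][f"{remote_name}/{branch}"] = head
    return sorted(missing)


yours = {"objects": {"c1": "initial", "c2": "feature", "c3": "fix"},
         "branches": {"master": "c2", "bugfix": "c3"}}
sallys = {"objects": {"c1": "initial", "s1": "sally's fix"},
          "branches": {"bugfix": "s1"}}
mine = {"objects": {"c1": "initial"}, "branches": {"master": "c1"},
        "remote_branches": {}}

print(fetch(mine, "origin", yours))  # ['c2', 'c3']
print(fetch(mine, "sally", sallys))  # ['s1']
print(mine["remote_branches"])
# {'origin/master': 'c2', 'origin/bugfix': 'c3', 'sally/bugfix': 's1'}
print(mine["branches"])              # {'master': 'c1'}
```

Nothing is diffed or patched during the fetch; it is pure copying of objects and names. Combining the work is a separate, later step.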

>>> And the "minor detail" that if 200 people edit the same file, that's 200
>>> separate branches which have to be manually merged back together again.
>>
>> And this differs from any other VCS how?
>
> With a centralised system, usually it's a check-in / check-out model, so
> only one person can edit a file at once.

Um, no, not for the last 15 years or so. Not even CVS did things that way, 
let alone SVN.  Some systems work like that, yes, but they work really, 
really poorly when you have 200 people working on the same files, which is 
why people moved to CVS in the first place.

> With something like Darcs, there are now 200 change-sets, each of which is
> only in some repos. Copy the change-sets around and everything is in sync
> again. No need for complex "merge" operations or tangled file histories.

Of course you need to merge them, and of course you'll have tangled file 
histories. If all 200 people change the same part of the file, you'll have 
200 merge conflicts. If everyone is passing around partial change sets and 
making more changes that are dependent on those changes, you'll have a 
tangled file history.
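The "three files become four" merge, and where conflicts come from, can be sketched as a line-by-line three-way merge in Python. This is a deliberately simplified illustration: it assumes the three versions have the same number of lines (lines were edited, never inserted or deleted), whereas real merge tools such as `git merge-file` first align the versions with a diff. `merge3` and the sample text are hypothetical.

```python
def merge3(base, mine, theirs):
    """Three-way merge of line lists. Simplification: all three versions must
    have the same number of lines, so line i corresponds across versions."""
    if not len(base) == len(mine) == len(theirs):
        raise ValueError("this sketch only handles edited lines, not insertions or deletions")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:    # both agree, or neither changed the line
            merged.append(m)
        elif m == b:  # only they changed it
            merged.append(t)
        elif t == b:  # only I changed it
            merged.append(m)
        else:         # both changed it differently: a conflict
            merged += ["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"]
            conflicts.append(i)
    return merged, conflicts


base = ["Title", "Chapter 4: draft", "Chapter 5: draft"]
mine = ["Title", "Chapter 4: corrected", "Chapter 5: draft"]
theirs = ["Title", "Chapter 4: draft", "Chapter 5: rewritten"]
print(merge3(base, mine, theirs))
# (['Title', 'Chapter 4: corrected', 'Chapter 5: rewritten'], [])

other = ["Title", "Chapter 4: cut", "Chapter 5: draft"]
print(merge3(base, mine, other)[1])  # [1]: both edited chapter 4
```

Edits to different parts of the file merge cleanly; only edits that touch the same part conflict, and that is true whether the changes travel as patches or as snapshots.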


>> Note that if you're trying to *push* changes to a remote repository, you
>> have to do it to a branch where nobody else has branched off since you
>> did.
>
> And what the hell are the chances of that ever happening? If every time
> anybody touches any file it generates a new branch, then there's no chance
> of ever being able to push changes back.

http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html

Basically, you say "go look at the changes I made since I branched off the 
upstream repository, then apply those same changes to the new head of the 
upstream repository, and submit *that* as the new commit."
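A simplified Python sketch of what a rebase does, assuming each commit records its parent and the whole new contents of the files it changed (a toy representation; `chain`, `snapshot`, `rebase`, the ids and file names are all made up). My commits since the branch point are replayed, oldest first, on top of the new upstream head and get new ids. As a stand-in for merge conflicts, it refuses whenever upstream touched a file I also changed; real git compares individual hunks and only stops on genuinely overlapping edits.

```python
# Each commit: id -> (parent id or None, {path: new contents}).
def chain(commits, head, stop):
    """Commit ids from `head` back to (but not including) `stop`, oldest first."""
    ids = []
    while head != stop:
        ids.append(head)
        head = commits[head][0]
    return ids[::-1]


def snapshot(commits, head):
    """Full tree at `head`: replay every commit's changes from the root."""
    tree = {}
    for cid in chain(commits, head, None):
        tree.update(commits[cid][1])
    return tree


def rebase(commits, my_head, base, upstream_head):
    """Copy my commits since `base` onto `upstream_head`; return the new head."""
    upstream_files = set()
    for cid in chain(commits, upstream_head, base):
        upstream_files.update(commits[cid][1])
    parent = upstream_head
    for cid in chain(commits, my_head, base):
        changes = commits[cid][1]
        if upstream_files.intersection(changes):
            raise ValueError(f"conflict while replaying {cid}")
        new_id = cid + "'"  # the copy gets a new identity
        commits[new_id] = (parent, changes)
        parent = new_id
    return parent


commits = {
    "A": (None, {"README": "v1", "fs.c": "buggy"}),  # where I branched off
    "B": ("A", {"README": "v2"}),                    # upstream moved on
    "X": ("A", {"fs.c": "fixed"}),                   # my bug fix
}
new_head = rebase(commits, my_head="X", base="A", upstream_head="B")
print(new_head, commits[new_head][0])  # X' B
print(snapshot(commits, new_head))     # {'README': 'v2', 'fs.c': 'fixed'}
```

The original commit X is still there; the rebased copy X' simply sits on top of upstream's newest commit, so submitting it no longer requires anyone else to merge.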

> And hope that the DEV branch doesn't change while you're busy trying to
> catch up. Still, I suppose if you repeat this cycle enough times, eventually
> you might get lucky and be able to perform the push.

If the DEV branch is changing that quickly, it means *someone* is going 
through this cycle successfully.   You're complaining that nobody goes to 
that restaurant any more because it's always too crowded.

The rebase is a single step, unless there are merge conflicts, so it's 
basically bound by network and CPU.

>>>> If you want to merge someone's repository into yours, you simply copy
>>>> from them any files or names that they have that you don't, and you're
>>>> done. You're merged.
>>>
>>> It would be nice if Darcs worked that way.
>>
>> Right. In Darcs, you have to merge all the changes. In git, you have to
>> merge all the changes.
>
> No, I meant it would be nice if the Darcs repo format allowed you to update
> a repo just by copying some files.

Well, git *does* store stuff in files, so technically you could copy the 
files. But by "copy files" I mean "use git to copy the new files."  As in, 
"you don't have to run any diffs or patches or anything".

> Darcs also doesn't explicitly support the "bare" format that Git does,
> despite it being obviously useful.

Which is why you don't have trouble with the rebasing. You only need to
rebase stuff when you're pushing to a bare repository without human
intervention. Basically, if you're sending changes to a bare repository
without human intervention, you have to prove you've already resolved any
merge conflicts your changes might otherwise impose on someone who later
updates from that bare repository.
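The rule for pushing to a bare repository can be sketched as an ancestry check, in the same toy style (letters for commit ids, each mapped to its parents; `is_ancestor` and `accept_push` are invented for illustration): the push is accepted only if the repository's current head is already part of the history being pushed, i.e. the update is a fast-forward. This is the check git applies by default when you push without `--force`; `git merge-base --is-ancestor` performs the same test on a real repository.

```python
def is_ancestor(parents, maybe_ancestor, commit):
    """True if `maybe_ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def accept_push(parents, remote_head, pushed_head):
    """A bare repository, with nobody there to resolve conflicts, only accepts a
    push whose history already contains the repository's current head."""
    return is_ancestor(parents, remote_head, pushed_head)


parents = {"A": [], "B": ["A"], "X": ["A"], "X'": ["B"]}
print(accept_push(parents, remote_head="B", pushed_head="X"))   # False: X ignores B
print(accept_push(parents, remote_head="B", pushed_head="X'"))  # True: rebased onto B
```

Rebasing X onto B to get X' is precisely what turns a rejected push into an accepted one.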

>>>> Now if you want to incorporate their changes into
>>>> your work, you generate a diff between their latest version and some
>>>> earlier version, and apply that diff to your latest version, and you're
>>>> merged.
>>>
>>> What a backwards way to look at it.
>>
>> Only if you're used to looking at source control as a series of diffs to
>> start with. But that's (A) exactly what makes git hard to understand and
>> (B) exactly what makes git brilliant. :-)
>
> So doing things the hard way is brilliant?

Building an entire type theory to discuss isomorphic idempotent change sets 
is the easy way?

-- 
Darren New, San Diego CA, USA (PST)
   "Coding without comments is like
    driving without turn signals."


