On 4/20/2011 9:03, Invisible wrote:
> Doesn't appear to me that that's what happens, from what little Mercurial
> documentation I've read.
I don't know. All the mercurial documentation I've read talks about change sets.
> Fundamentally, VCS are about tracking changes.
Fundamentally, they're about controlling versions. :-)
> Git might *implement* that by
> storing the entire file, but *logically* what you're trying to do is keep
> track of what you changed.
I think it depends. If I want version 1.0 that was released, I don't really
care what changed to get there. I want that version. There are uses for
changes, and uses for storing what you stored.
The advantage of storing what's actually there is you can write all kinds of
better tools to tell you the differences. For example, the storage layer can
diff any two versions of a file to get a compressed copy of one of them. If I
change a file to include an additional 500 lines, then change it to delete 498
of them, the third version is going to get stored as a 2-line diff from the
first version, not a 498-line diff from the second version.
Basically, you're storing absolutes and deducing differences, rather than
storing differences and deducing absolutes. That means when you want to know
what changed between release candidate 3.5RC2 and the version 4.2 that Fred
compiled over on *his* machine, you can just compare the two. You don't have
to reconstruct anything first.
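A rough sketch of "store absolutes, deduce differences" in Python. This is not git's actual object format (git hashes a header plus the content and compresses the result); the `store`, `put` and `diff` names are made up for illustration.

```python
import difflib
import hashlib

store = {}  # hypothetical object store: sha-1 hex digest -> full file contents

def put(text: str) -> str:
    """Store a full snapshot and return its content hash."""
    key = hashlib.sha1(text.encode("utf-8")).hexdigest()
    store[key] = text  # identical content lands on the same key
    return key

def diff(key_a: str, key_b: str) -> list[str]:
    """Deduce the difference between any two stored snapshots on demand."""
    a = store[key_a].splitlines()
    b = store[key_b].splitlines()
    return list(difflib.unified_diff(a, b, "a", "b", lineterm=""))

rc2 = put("alpha\nbeta\n")           # stands in for 3.5RC2
fred = put("alpha\nbeta\ngamma\n")   # stands in for Fred's 4.2 build
same = put("alpha\nbeta\n")          # unchanged content is not stored twice

print("\n".join(diff(rc2, fred)))
```

The two snapshots don't need to be neighbours in any history; `diff` only needs the two hashes.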
> Yes, I gradually came to that realisation. Git is managing the entire repo
> as a strictly linear sequence of unnumbered versions. (Until you explicitly
> create branches, anyway.)
Or until you clone it, yes.
>>> (Presumably as a diff relative to the previous commit,
>>
>> Nope. It stores the entire repository. Now, if you don't change a file,
>> it hashes to the same value, and hence doesn't need to get stored again.
>> But the entire file is put into the repository.
>
> How odd... Still, if you're not worried about the internal implementation,
> logically Git is versioning the whole repo as one unit, and that's all you
> need to know.
Yes, basically. That's why you can sign just the tag object and be sure you've
signed every file that that tag refers to.
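Why signing one small object covers everything, as a simplified sketch: the commit hash is computed over a listing of file hashes, so changing any byte of any file changes the hash you signed. The string layouts below are invented for illustration and are not git's real tree or commit encoding.

```python
import hashlib

def h(data: str) -> str:
    return hashlib.sha1(data.encode("utf-8")).hexdigest()

def tree_hash(files: dict[str, str]) -> str:
    """Hash a listing of (name, content hash) pairs, sorted by name."""
    listing = "\n".join(f"{name} {h(body)}" for name, body in sorted(files.items()))
    return h(listing)

def commit_hash(files: dict[str, str], parents: list[str], message: str) -> str:
    """Hash the tree hash together with the parent hashes and the message."""
    return h(f"tree {tree_hash(files)}\nparents {' '.join(parents)}\n\n{message}")

release = {"main.c": "int main(){}", "README": "hello"}
tampered = {"main.c": "int main(){}", "README": "hell0"}

v1 = commit_hash(release, [], "release 1.0")
v1_again = commit_hash(dict(release), [], "release 1.0")
v1_bad = commit_hash(tampered, [], "release 1.0")
```

A signature over `v1` would no longer match `v1_bad`, even though only one character in one file differs.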
> OK, wow. I thought having to tell Darcs when I rename stuff was
> inconvenient, but this just sounds insane...
Why? You don't have to tell git you renamed something.
>> The only reason git uses a pointer to earlier commits is when you merge
>> things, you don't want to apply changes you already applied in an
>> earlier merge.
>
> And here I was thinking it was so you can revert to earlier versions if you
> want. You know - the entire purpose for a VCS to exist in the first place? ;-)
Well, yes, it gives you a way to find those commits. But in theory you could
look up any commit by hash code and say "give me that version" whether it's
earlier or later or completely unrelated to what you have now. Indeed,
that's exactly how branches work. When you start a new branch, all you're
doing is storing the commit sha-1 into a new file named after the branch.
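To make the "a branch is a file holding a sha-1" idea concrete, here is a toy version in Python. The temporary directory stands in for `.git/refs/heads`, the sha-1 is made up, and real git can also keep refs in a packed file, so treat this as a sketch of the idea only.

```python
import tempfile
from pathlib import Path

# Stand-in for .git/refs/heads, created in a throwaway temp directory.
refs = Path(tempfile.mkdtemp()) / "refs" / "heads"
refs.mkdir(parents=True)

def create_branch(name: str, commit_sha: str) -> None:
    """Creating a branch just writes the commit hash into a file named after it."""
    (refs / name).write_text(commit_sha + "\n")

def resolve(name: str) -> str:
    """Looking up a branch just reads that file back."""
    return (refs / name).read_text().strip()

tip = "1f7a7a472abf3dd9643fd615f6da379c4acb3e3a"  # made-up sha-1 for illustration
create_branch("master", tip)
create_branch("comments", resolve("master"))  # new branch at the same commit

print(sorted(p.name for p in refs.iterdir()))
```

No file contents are copied when the second branch is created; both names point at the same commit until one of them moves.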
> It sounds simple enough. If this change affects line X and that change
> affects line Y, they are independent.
Yeah, until you get binary objects in there. :-)
>>> As far as I can tell, Git would require me to create a branch where I add
>>> the comments, and another branch where I add the new code, and then merge
>>> them back into the main branch, hoping that I don't get any conflicts. To
>>> me, this seems like a lot more work and a lot more conceptual overhead.
>>
>> Nah. That's only if you want to have both at once working in parallel.
>
> Isn't "working on both at once" kind of the entire point of distributed
> version control?
Only if you want to work on both at once in the same repository.
If you're working in a different repository, you don't need to start a new
branch. Branches are nothing but names for commits. You technically never
need to use any branch at all if you want to type in a sha-1 every time.
>> That is, if you want one version with the comments but no function, and
>> another version with the function but no comments, that's trivial in
>> git. If you then want to combine them into a third version that has both
>> comments and function, then you merge, which is also trivial unless you
>> changed the same lines in both places. (I.e., it's as trivial as any
>> other diff-patch based merge.)
>
> And if you merge the comments branch into the main branch, and then somebody
> adds more stuff to the comments branch, then what?
Then you get a merge and then more changes on the comments branch. And if
you merge the comments branch *again*, *that* is when git uses the parent
pointers in the commit objects to figure out which files to diff in order to
get the patches to the parent.
A--B--C-----D--E--F-----G--H--I
    \      /           /
     Q----R-----S-----T
So you started with A, changed to B, branched B and made a change to create
Q, then R. In the meantime, I changed B to be C. Now I merge your R back
to my C. This looks back, sees B is the common ancestor, so diffs R against
B and applies it to C, then creates D with C and R as parent commits. (Each
letter is a commit, which includes the state of the entire repository.)
Now you keep working on R without incorporating my B->C change, creating S
and T. I change D to include E and F. Now I merge your work again.
Git looks at F, follows it back to D, to C and R, and sees that R is a
common ancestor of both F and T. So it diffs T against R, applies those
diffs to F, and creates G. You can then delete the branch that points to T
safely without losing anything.
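The ancestor search can be sketched in a few lines of Python. The `parents` table encodes the history described above (up to G); `ancestors` and `merge_base` are illustrative names, and real git's merge-base search is more optimised than this brute-force version.

```python
# Parent pointers for the history described above (commit -> list of parents).
parents = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C", "R"], "E": ["D"], "F": ["E"],
    "G": ["F", "T"], "Q": ["B"], "R": ["Q"], "S": ["R"], "T": ["S"],
}

def ancestors(commit: str) -> set[str]:
    """Return the commit itself plus everything reachable via parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(x: str, y: str) -> set[str]:
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

first_merge = merge_base("C", "R")   # merging R into C
second_merge = merge_base("F", "T")  # merging T into F
print(first_merge, second_merge)
```

Because D records R as a parent, the second search stops at R rather than going all the way back to B, which is why the Q and R changes aren't applied twice.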
It's *super* straightforward to understand what merges do in git.
And if someone comes up with a better diff algorithm, no problem. The
algorithm to do the diff during a merge isn't built into the repository.
> With Darcs, I tell it what files to watch, and then when I've finished
> editing stuff, I say "record this" and it shows me every modified line of
> every file and asks which modifications to keep. Git doesn't support
> recording half a file modification,
Yes it does. Indeed, you can even go back and retroactively say "oh, those
two commits? The second one should have come first, and the first one should
be broken up into these three commits."
As I said, I do this all the time.
> and doesn't even figure out which files changed.
Yes it does. It compares the working directory against the staging area
and the head to say "these files are changed and unstaged, those are changed
but already staged, and those are unchanged."
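That comparison is simple enough to sketch. Below, each of the three states is a dict from path to contents, and `status` is a made-up helper; a real file can have both staged and further unstaged changes at once, which this simplified version reports only as unstaged.

```python
def status(head: dict[str, str], index: dict[str, str],
           workdir: dict[str, str]) -> dict[str, str]:
    """Classify each path by comparing working dir vs staging area vs head."""
    result = {}
    for path in sorted(set(head) | set(index) | set(workdir)):
        if workdir.get(path) != index.get(path):
            result[path] = "changed, unstaged"
        elif index.get(path) != head.get(path):
            result[path] = "changed, staged"
        else:
            result[path] = "unchanged"
    return result

head = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
index = {"a.txt": "1", "b.txt": "2", "c.txt": "1"}    # b.txt was staged
workdir = {"a.txt": "1", "b.txt": "2", "c.txt": "3"}  # c.txt edited, not staged

report = status(head, index, workdir)
for path, state in report.items():
    print(path, state)
```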
It's just a two-step process. You can build up the thing you want to commit,
and then finally commit it. It sounds like Darcs needs you to do that all
in one step.
Git does it the other way around. First it asks you what modified lines you
want to put in the commit (and puts them in the staging area), then it
creates the commit (based on the staging area).
>> This is trivial with GIT. I do it all the time. I'll be adding a new
>> function, and while testing, realize there's a bug in some other
>> function. So when everything works again, I'll do two commits, staging
>> just particular hunks (in the diff sense of the word) and do two
>> commits, one for the bugfix and one for the new change.
>
> Given that Git can only record the new file or the old one, how is that
> possible?
The staging area lies between the repository and the working directory. So I
check out some branch, and that copies it to the WD and maybe clears the
staging area. The staging area is basically a commit that's not yet in the
repository.
Now I make changes to the WD.
Then I use something like "git add" to add all the changes from the WD to
the staging area. Or I use "git add -i" (or, more likely, the GUI) to
diff the WD against the staging area (or the repository), pick (say) three
of the five diff hunks, and then create a new temp file that holds the
repository with those three diff hunks applied, which I then put in the
staging area. When I have everything the way I like, I commit the change,
which copies the staging area into the repository and then adds a commit
object pointing to it.
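Here is roughly what "pick some of the hunks" amounts to, sketched with Python's difflib. The `stage_hunks` helper and the sample file are invented for illustration, and difflib's hunk boundaries won't necessarily match the ones git's own diff would show.

```python
import difflib

def stage_hunks(head: list[str], workdir: list[str], chosen: set[int]) -> list[str]:
    """Build the staged file: head plus only the chosen hunks from the working dir."""
    staged, hunk_no = [], 0
    ops = difflib.SequenceMatcher(None, head, workdir).get_opcodes()
    for tag, i1, i2, j1, j2 in ops:
        if tag == "equal":
            staged.extend(head[i1:i2])
            continue
        # A non-equal opcode is one hunk: take the new lines if chosen, else the old.
        staged.extend(workdir[j1:j2] if hunk_no in chosen else head[i1:i2])
        hunk_no += 1
    return staged

head = ["def f():", "    return 1", "", "def g():", "    return 2"]
workdir = ["def f():", "    return 10", "", "def g():", "    return 2",
           "", "def h():", "    return 3"]

bugfix_only = stage_hunks(head, workdir, {0})   # first commit: just the fix in f
everything = stage_hunks(head, workdir, {0, 1}) # second commit: the new function too
```

Committing `bugfix_only` and then `everything` gives two commits from one editing session, and neither intermediate file ever has to exist in the working directory.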
>>> I wonder how well the illusion of one single sequence of file versions
>>> works when you have multiple people editing the file in parallel.
>>
>> There's no single sequence of file versions. Every file is a new version.
>>
>> Given that it's the repository format used by Linux developers, I think
>> it's safe to say it works adequately for multiple people editing the
>> file in parallel.
>
> This boggles my mind. Apparently I /don't/ understand how Git works at all,
> because the way it seems to work precludes two people touching the same file
> at the same time...
Sure. But you're thinking git tracks diffs. That's exactly the point. If I
change the file, and you change the file, then now there's three files. The
original, the new one I have, and the new one you have. When we go to merge
it, we create number four, which is your new one with the differences
between my version and the original applied.
It works because if there's no merge conflicts, then my diff applied to your
file and your diff applied to my file creates the same file.
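A deliberately tiny three-way merge shows the symmetry. It assumes both sides only edited lines in place (no insertions or deletions), which is a big simplification compared with a real merge; `merge3` is a made-up name.

```python
def merge3(base: list[str], mine: list[str], yours: list[str]) -> list[str]:
    """Line-by-line three-way merge; only handles in-place line edits."""
    if not (len(base) == len(mine) == len(yours)):
        raise ValueError("sketch only handles in-place line edits")
    merged = []
    for b, m, y in zip(base, mine, yours):
        if m == y or y == b:
            merged.append(m)      # same on both sides, or only I changed it
        elif m == b:
            merged.append(y)      # only you changed it
        else:
            raise ValueError(f"conflict: {m!r} vs {y!r}")
    return merged

base = ["a", "b", "c"]
mine = ["a", "B", "c"]    # I changed line 2
yours = ["a", "b", "C"]   # you changed line 3

fourth = merge3(base, mine, yours)
other_way = merge3(base, yours, mine)
print(fourth)
```

With no conflicts, it doesn't matter whose version you start from: both calls produce the same fourth file.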
>> Yes, but since you have them all, you can recreate the diffs between any
>> two versions whenever you want.
>
> That's my point. If multiple people are editing the same files, you do *not*
> have all the changes.
Well, no, obviously. Welcome to DVCS. If you don't give me your files, I
can't see them. This is true of changes you don't push in Darcs and
mercurial too. Maybe I'm misunderstanding what you're trying to say.
Darcs *is* distributed, right? If you change a file and check it into your
local repository, and I change it and check it into my local repository, I
can't see your changes and you can't see mine until we connect the
repositories again, right?
> And the "minor detail" that if 200 people edit the same file, that's 200
> separate branches which have to be manually merged back together again.
And this differs from any other VCS how?
Note that if you're trying to *push* changes to a remote repository, you
have to do it to a branch where nobody else has branched off since you did.
In other words, if I say "update my repository to the DEV branch on the
company's central repository", and I make changes, and someone else changes
the DEV branch to point to a later version, I can no longer push my changes
into the DEV branch. Instead, I have to fetch down the new DEV branch, merge
my changes, then push the newly merged commit back up. Look up "fast-forward
merge" in the git docs if you care.
But basically what it's saying is if you're *pushing* changes to a
repository (i.e., there's no human there checking the merges) then you can't
do a two-parent merge commit. You have to create the two-parent merge commit
on your own machine, *then* push it up to the server. Sorta.
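The push rule boils down to an ancestry check, sketched below. The commit names, the `parents` table and `try_push` are all hypothetical; this only models the default "fast-forward only" behaviour, not forced pushes or server-side hooks.

```python
# Toy history: B is the old DEV tip, C is a colleague's commit,
# X is my commit made on top of B, M is my local merge of X and C.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"], "M": ["X", "C"]}

def is_ancestor(old: str, new: str) -> bool:
    """True if 'old' is reachable from 'new' by following parent pointers."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def try_push(remote_refs: dict[str, str], branch: str, new_tip: str) -> bool:
    """Accept the push only if it fast-forwards the remote branch."""
    if not is_ancestor(remote_refs[branch], new_tip):
        return False              # rejected: fetch and merge locally first
    remote_refs[branch] = new_tip
    return True

remote = {"DEV": "B"}
colleague_ok = try_push(remote, "DEV", "C")   # colleague gets there first
mine_rejected = try_push(remote, "DEV", "X")  # C is not an ancestor of X
merged_ok = try_push(remote, "DEV", "M")      # M has both X and C as parents
```

The server never has to merge anything; it only moves the branch name forward along history it has been handed.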
>> If you want to merge someone's repository into yours, you simply copy
>> from them any files or names that they have that you don't, and you're
>> done. You're merged.
>
> It would be nice if Darcs worked that way.
Right. In Darcs, you have to merge all the changes. In git, you have to
merge all the changes.
>> Now if you want to incorporate their changes into
>> your work, you generate a diff between their latest version and some
>> earlier version, and apply that diff to your latest version, and you're
>> merged.
>
> What a backwards way to look at it.
Only if you're used to looking at source control as a series of diffs to
start with. But that's (A) exactly what makes git hard to understand and (B)
exactly what makes git brilliant. :-)
--
Darren New, San Diego CA, USA (PST)
"Coding without comments is like
driving without turn signals."