  Re: Linking  
From: Le Forgeron
Date: 28 May 2016 08:03:38
Message: <5749891a$1@news.povray.org>
On 28/05/2016 09:47, Orchid Win7 v1 wrote:
> So here's a question: how does linking actually work?
>
> I mean, I understand what it *does*. For some reason†, the C language
> allows you to write functions that call functions that don't exist. This
> leaves you with an object file containing unresolved references. The
> linker then resolves these references.
>
> [ † I'm guessing the "some reason" boils down to "the machine we designed
> this for only has 2KB of memory implemented as a mercury delay line" or
> some such stupidity... ]
>
> But how does that actually *work*? The C compiler transforms C source
> code into executable object code. Basically, the object file contains
> (among other things) raw machine code, which the processor knows how to
> execute. Calling a subroutine is implemented as an unconditional jump
> op-code. If you know the jump target, then by all means, fill in the
> target address. But if the target hasn't been resolved yet... how do you
> fit the entire symbol name into 32 bits?
>
> (32 bits is only 4 characters. And C++ in particular seems determined to
> transform even the most trivial function call into an 8-mile symbol
> name!)
>
> Presumably the object file contains some metadata too. Stuff that tells
> you it *is* an object file, what the target processor is, what symbols it
> exposes publicly, etc. But I'm not sure how unresolved function calls are
> implemented.
>
> For that matter, how does the linker sort all this out? Does it actually
> load the entire final binary into memory while it untangles it? Or does
> it somehow manage to incrementally build the file on disk? [I guess it's
> perhaps implementation-defined...] My Linux box is sitting here with 16GB
> of RAM, and can probably handle holding the whole 0.2MB program in RAM at
> once. The original 2KB system that C was designed for? Not so much.
>
> For that matter, is it possible to store *data* in an object file? (As
> opposed to executable code.)

At the beginning was the sequence of instructions, as read by the processor.
It was painful for humans to write, so assembly was invented: mnemonics were
used instead of numbers, and the era of symbols and labels began.
And it was fine... but each processor came with its own assembly. It was not
portable, and humans had to start over whenever a new processor appeared.

Then came C. The usual C compiler is in charge of generating suitable
assembly (that's why a .c can still be transformed into a .s before becoming
a .o).
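
For example, with gcc (a sketch, assuming a typical Unix toolchain):

  $ gcc -S hello.c   # compile:  hello.c -> hello.s (assembly text)
  $ gcc -c hello.s   # assemble: hello.s -> hello.o (object code)
  $ gcc -c hello.c   # or do both steps in one go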

C came with its own way to transform its symbols into the symbols supported
by the assembler: mangling.

C++, because it allows far richer symbols, has its own mangling too: more
complex, and not compatible with the C one. That's why you can encounter

extern "C" {
/* C code here */
}

blocks: they tell the compiler to use the C mangling instead of the C++ one.
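
For instance (a minimal sketch; the exact mangled names are
implementation-defined, these are what g++ and clang produce on Linux):

// mangle.cpp
extern "C" int c_func(int x);  // exported/imported as plain "c_func"
int cpp_func(int x);           // exported/imported as "_Z8cpp_funci"

int use(int n)
{
    return c_func(n) + cpp_func(n);  // the .o will need both symbols
}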

In classical assembly, symbols are mostly labels, and labels are positions
("addresses" would be a dangerous shortcut) in the file. They could be
limited to 6 uppercase characters (very old systems) or allow more length
(32 was another limit).
What is found at a label can be data or code. And because humans are lazy,
there is even a special kind of data: data declared by length but without
values (0 is assumed, but not always).
So there are 3 kinds of label: initialized data, uninitialized data and
code. (These groupings are called "sections" in linker jargon.)

The linker is in charge of organizing the code and initialized data,
grouping the various sections and replacing each label with an actual
position (which might be absolute or relative) for the processor.
The "initialization" of bss (uninitialized data) is left to the program's
start code: if you transfer a program with a huge bss section, the transfer
is short, because only the length is stored in the file, not the actual
bytes. ("I need a segment of 64k bytes" is much smaller than providing the
actual 64k bytes.)
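
A sketch of the effect (assuming gcc on Linux):

/* big.c : 64 MB of bss costs almost nothing on disk */
char huge_buffer[64 * 1024 * 1024];  /* no initializer -> bss */

int main(void)
{
    huge_buffer[0] = 1;  /* the start code zeroes it before main() runs */
    return 0;
}

/* The resulting executable stays a few kilobytes: the file records
   "reserve 64 MB of zeros", not 64 MB of actual zero bytes. */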

So far so good... except that with current linkers there are nine or more
sections instead of just 3.

A .o file contains the transcription of a .s: data, code, and a bit of extra
metadata about the exported and imported labels. (XREF and XDEF are
assembler directives for that purpose, in some dialects.)
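
You can inspect those tables with nm (a sketch; in nm's output, U marks a
needed/imported symbol, T and D mark exported code and data):

/* caller.c */
extern int helper(int);  /* imported: declared, but no body here */
int counter = 1;         /* exported, initialized data           */

int run(int n)           /* exported code                        */
{
    counter++;
    return helper(n);    /* leaves an unresolved reference       */
}

  $ gcc -c caller.c && nm caller.o
  0000000000000000 D counter
                   U helper
  0000000000000000 T run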

A linker, when building a C or C++ program, starts with an implicit need for
a main() symbol (very well hidden, in an implicit library or .o of the link
chain).
For every .o on the linker's command line, the linker accumulates the needed
and provided symbols.
For every .a on the command line (via -lxxx, libxxx.a is explored): open the
library, extract every still-needed symbol that is found there (which might
add more needed symbols), close the library and forget about it forever,
then move to the next library in command-line order.
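
That "forget about it forever" is why library order matters. A sketch,
assuming main.c calls a foo() that is defined in foo.c:

  $ gcc -c main.c foo.c
  $ ar rcs libfoo.a foo.o   # pack foo.o into a static library

  $ gcc main.o -L. -lfoo    # works: foo is still needed when
                            # libfoo.a is examined
  $ gcc -L. -lfoo main.o    # a classic linker fails here with
                            # "undefined reference to foo": the
                            # library was closed and forgotten
                            # before main.o asked for foo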

If all symbols (including main()) have been found, it's a success, and every
symbol can be replaced with a position in the file/memory.
Otherwise, the linker complains about unresolved symbols.

For a shared library (.so) instead of a static library (.a), the processing
is the same, but the code is not extracted from the library: only the name
of the library is stored. The start code of the program will load the shared
library and try to perform the relocation (the replacement of labels with
positions).
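
A sketch of the difference (assuming gcc on Linux; paths depend on where the
loader can find the library):

  $ gcc -fPIC -shared -o libfoo.so foo.c  # build the shared library
  $ gcc main.o -L. -lfoo -o prog          # stores the *name* libfoo.so,
                                          # not a copy of its code
  $ ldd prog                              # lists what the start code
      libfoo.so => ./libfoo.so            # must load and relocate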

A function call is just: stack the parameters according to the ABI, jump to
the label of the function's mangled name, and on return extract the return
value and dispose of the previously stacked parameters. That's why C++
absolutely wants a declaration of any function it calls: to stack the
parameters correctly in the generated assembly.
When the function is unresolved, all that remains in the .o is an XREF of
the mangled function name, for the linker to solve.
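
A sketch of why the declaration is mandatory (the mangled name follows the
Itanium ABI used by g++ and clang):

// calls.cpp
double average(double a, double b);  // remove this declaration and the
                                     // call below will not compile

double twice_avg(double a, double b)
{
    // From the declaration the compiler knows to pass two doubles per
    // the ABI, and it emits a call to _Z7averagedd; since no body is
    // here, the .o records it as an unresolved reference.
    return 2.0 * average(a, b);
}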

