POV-Ray: Newsgroups: povray.off-topic: Linking

From: Orchid Win7 v1
Subject: Linking
Date: 28 May 2016 03:47:02
Message: <57494cf6$1@news.povray.org>

So here's a question: how does linking actually work?

I mean, I understand what it *does*. For some reason†, the C language
allows you to write functions that call functions that don't exist. This
leaves you with an object file containing unresolved references. The
linker then resolves these references.

[ † I'm guessing the "some reason" boils down to "the machine we
designed this for only has 2KB of memory implemented as a mercury delay
line" or some such stupidity... ]

But how does that actually *work*? The C compiler transforms C source 
code into executable object code. Basically, the object file contains 
(among other things) raw machine code, which the processor knows how to 
execute. Calling a subroutine is implemented as an unconditional jump 
op-code. If you know the jump target, then by all means, fill in the 
target address. But if the target hasn't been resolved yet... how do you 
fit the entire symbol name into 32 bits?

(32 bits is only 4 characters. And C++ in particular seems determined to 
transform even the most trivial function call into an 8-mile symbol name!)

Presumably the object file contains some metadata too. Stuff that tells 
you it *is* an object file, what the target processor is, what symbols 
it exposes publicly, etc. But I'm not sure how unresolved function calls 
are implemented.

For that matter, how does the linker sort all this out? Does it actually 
load the entire final binary into memory while it untangles it? Or does 
it somehow manage to incrementally build the file on disk? [I guess it's 
perhaps implementation-defined...] My Linux box is sitting here with 
16GB of RAM, and can probably handle holding the whole 0.2MB program in 
RAM at once. The original 2KB system that C was designed for? Not so much.

For that matter, is it possible to store *data* in an object file? (As 
opposed to executable code.)

From: jr
Subject: Re: Linking
Date: 28 May 2016 07:23:51
Message: <57497fc7$1@news.povray.org>

hi,

On 28/05/2016 08:47, Orchid Win7 v1 wrote:
> So here's a question: how does linking actually work?

can't answer all of your questions, but here goes:

you're right about the jump-table.  google / search docs for "GOT"
(global offset table).

the .o files aren't "raw machine code".  afaik, GCC uses an intermediary
language ("gimple") which allows creating programs/libs where different
TUs may be written in different languages (C, Ada, etc).  see the docs
on the '-flto' (link time optimisation) compiler and '-Wl,-flto' linker
options.

jr.

From: Le Forgeron
Subject: Re: Linking
Date: 28 May 2016 08:03:38
Message: <5749891a$1@news.povray.org>

Le 28/05/2016 09:47, Orchid Win7 v1 a écrit :
> So here's a question: how does linking actually work?
> 
> I mean, I understand what it *does*. For some reason†, the C language
> allows you to write functions that call functions that don't exist. This
> leaves you with an object file containing unresolved references. The
> linker then resolves these references.
> 
> [ † I'm guessing the "some reason" boils down to "the machine we
> designed this for only has 2KB of memory implemented as a mercury delay
> line" or some such stupidity... ]
> 
> But how does that actually *work*? The C compiler transforms C source
> code into executable object code. Basically, the object file contains
> (among other things) raw machine code, which the processor knows how to
> execute. Calling a subroutine is implemented as an unconditional jump
> op-code. If you know the jump target, then by all means, fill in the
> target address. But if the target hasn't been resolved yet... how do you
> fit the entire symbol name into 32 bits?
> 
> (32 bits is only 4 characters. And C++ in particular seems determined to
> transform even the most trivial function call into an 8-mile symbol name!)
> 
> Presumably the object file contains some metadata too. Stuff that tells
> you it *is* an object file, what the target processor is, what symbols
> it exposes publicly, etc. But I'm not sure how unresolved function calls
> are implemented.
> 
> For that matter, how does the linker sort all this out? Does it actually
> load the entire final binary into memory while it untangles it? Or does
> it somehow manage to incrementally build the file on disk? [I guess it's
> perhaps implementation-defined...] My Linux box is sitting here with
> 16GB of RAM, and can probably handle holding the whole 0.2MB program in
> RAM at once. The original 2KB system that C was designed for? Not so much.
> 
> For that matter, is it possible to store *data* in an object file? (As
> opposed to executable code.)

In the beginning there was the sequence of instructions, as read by the processor.
It was just painful for humans to write, so assembly was invented: a mnemonic was used
instead of a number, and so began the era of symbols and labels.
And it was fine... but each processor came with its own assembly. It was not portable,
and humans had to do it all over again whenever a new processor was made.

Then came C. The usual C compiler was in charge of generating the suitable assembly
(that's why a .c file can still be turned into a .s file before becoming a .o).

C came with its own way to transform its symbols into the symbols supported by
assembly: mangling.

C++, because it allows even more characters in its symbols, has its own mangling too,
more complex, and not always compatible with the C mangling.
That's why you can encounter

extern "C" {
/* C code here */
}

statements: to inform the mangler to use the C version instead of the C++ one.

In classical assembly, symbols are mostly labels, and labels are positions (addresses? a
dangerous shortcut) in the file. They can be limited to 6 uppercase characters (very old)
or allow more length (32 was another limit).
What is found at a label can be data or code. And because humans are lazy, there is
even a special kind of data: data declared by length but without a value (0 is assumed,
but not always).
So there are 3 kinds of label: initialized data, uninitialized data and code. (Each kind
is called a "section" in linker jargon; conventionally .data, .bss and .text.)

The linker is in charge of organizing the code and initialized data, grouping the
various sections and replacing the labels with actual positions (which might be absolute
or relative) for the processor.
The "initialization" of bss (uninitialized data) is left to the program start code: if
you transfer a program with a huge bss section, the transfer is short because only the
length is in the file, not the actual bytes. ("Need a segment of 64k bytes" is smaller
than providing the actual 64k bytes.)

So far so good... except that with current linkers, there are about 9 or more sections
instead of just 3.

A .o file contains the translation of a .s file: data, code and a bit of extra data
about the exported labels and the imported labels. (XREF and XDEF are directives for
that purpose in some assemblers.)

A linker, when making a C or C++ program, starts with an implicit need for a main()
symbol (very well hidden, in an implicit library or .o of the link chain).
For every .o on the linker's command line, the linker accumulates the needed and
provided symbols.
For every .a on the linker's command line (with -lxxx, libxxx.a is searched): open
the library, extract every member that provides a still-needed symbol (which might
add more needed symbols), close the library and forget about it forever, then move
on to the next library in command-line order.

If all symbols (including main()) have been found, it's a success, and all
symbols can be replaced with positions in the file/memory.
Otherwise, the linker complains about unresolved symbols.

For a shared library (.so) instead of a static library (.a), the processing is the same,
but the code is not extracted from the library: only the name of the library is
stored. The start code of the program will load the shared library and try to perform
the relocation (replacement of labels with positions).

A function call is just: push the parameters according to the ABI, jump to the label
of the mangled function name, and on return pick up the return value (in a register or
on the stack, depending on the ABI) and dispose of the previously pushed parameters.
That's why C++ absolutely wants a declaration of the function it calls: to pass the
parameters correctly in the generated assembly.
When the function is unresolved, all that remains after that is an XREF of the mangled
function name, for the linker to resolve.

From: clipka
Subject: Re: Linking
Date: 28 May 2016 23:27:15
Message: <574a6193@news.povray.org>

Am 28.05.2016 um 14:03 schrieb Le_Forgeron:

> Then came C. The usual C compiler was in charge of generating the suitable
> assembly ( that's why .c can still be transformed in .s before becoming a .o )
> 
> C came with its own way to transform its symbols into the symbols supported
> by assembly : mangling.

Uh... no, generally C does not have any name mangling. The symbols are
taken as-is. (Microsoft Windows programs being an exception.)

Which is why you can't overload functions in C.

> C++, because it allows even more characters in its symbols, has its own
> mangling too, more complex, and not always compatible with the C mangle.
> That's why you can encounter
> 
> extern "C" {
> /* C code here */
> }
> 
> statements: to inform the mangler to use the C version instead of the C++ one.

Equally importantly, it also informs the compiler about the call
conventions to use (what registers to use for parameters, whether the
caller or the callee is responsible for stack cleanup, and the like).

From: clipka
Subject: Re: Linking
Date: 28 May 2016 23:39:56
Message: <574a648c$1@news.povray.org>

Am 28.05.2016 um 09:47 schrieb Orchid Win7 v1:

> But how does that actually *work*? The C compiler transforms C source
> code into executable object code. Basically, the object file contains
> (among other things) raw machine code, which the processor knows how to
> execute. Calling a subroutine is implemented as an unconditional jump
> op-code. If you know the jump target, then by all means, fill in the
> target address. But if the target hasn't been resolved yet... how do you
> fit the entire symbol name into 32 bits?

Simple: You don't.

Instead, you add a table to the library listing all the memory locations
that should hold the address of a given unresolved symbol but for
obvious reasons currently don't.

The linker will later use that table to update those memory locations.


> (32 bits is only 4 characters. And C++ in particular seems determined to
> transform even the most trivial function call into an 8-mile symbol name!)
> 
> Presumably the object file contains some metadata too. Stuff that tells
> you it *is* an object file, what the target processor is, what symbols
> it exposes publicly, etc. But I'm not sure how unresolved function calls
> are implemented.

You /could/ look it up... you know, they have this fancy new thing
called the Internet, and search engines and things ;)


> For that matter, how does the linker sort all this out? Does it actually
> load the entire final binary into memory while it untangles it? Or does
> it somehow manage to incrementally build the file on disk? [I guess it's
> perhaps implementation-defined...] My Linux box is sitting here with
> 16GB of RAM, and can probably handle holding the whole 0.2MB program in
> RAM at once. The original 2KB system that C was designed for? Not so much.

[Your guess is correct.]


> For that matter, is it possible to store *data* in an object file? (As
> opposed to executable code.)

Absolutely. See the `source/base/font/*.cpp` files in the POV-Ray source
code for examples.

From: clipka
Subject: Re: Linking
Date: 28 May 2016 23:59:14
Message: <574a6912$1@news.povray.org>

Am 28.05.2016 um 09:47 schrieb Orchid Win7 v1:

> I mean, I understand what it *does*. For some reason†, the C language
> allows you to write functions that call functions that don't exist. This
> leaves you with an object file containing unresolved references. The
> linker then resolves these references.
> 
> [ † I'm guessing the "some reason" boils down to "the machine we
> designed this for only has 2KB of memory implemented as a mercury delay
> line" or some such stupidity... ]

No. The "some reason" is that you don't want to have to recompile your
entire project just because some minor modification in a single obscure
C source file has caused all your memory addresses to shift.

Therefore, each and every (!) C source file is first translated
("compiled") into an address-independent (*) object file, and in a later
step all the object files in your project are combined ("linked") into a
single executable with fixed addresses.

(* To achieve address independence, a similar approach is used as for
external symbols: A table is included in the object file listing each
and every memory location that will have to hold an absolute address in
the executable, but in the object file only holds an offset relative to
the object file's "payload".)

From: Warp
Subject: Re: Linking
Date: 29 May 2016 01:37:52
Message: <574a8030@news.povray.org>

Orchid Win7 v1 <voi### [at] devnull> wrote:
> Presumably the object file contains some metadata too. Stuff that tells 
> you it *is* an object file, what the target processor is, what symbols 
> it exposes publicly, etc. But I'm not sure how unresolved function calls 
> are implemented.

Even the final executable file isn't a fixed blob of machine code.
Linking also happens when running such an executable file (look up
"dynamic linker").

Executable files contain references to dynamically loadable libraries,
and when you execute such a file, the OS will insert function calls
into said executable to point to whichever dynamically loadable library
it needs (which the OS also loads or, most usually, is already loaded
into memory because most of everything else needs it too.)

The idea with dynamically loadable libraries is, of course, to save
memory and increase efficiency. Since 99.9% of all executables use
the same system functions, it's more efficient to have them all share
the one and same library in memory than to statically link all those
megabytes of system library code into every single executable.

When object files refer to other object files, or to statically linked
libraries, a similar process happens, but at linking time, rather than
at runtime.

-- 
                                                          - Warp

From: Orchid Win7 v1
Subject: Re: Linking
Date: 29 May 2016 03:26:37
Message: <574a99ad$1@news.povray.org>

On 29/05/2016 04:39 AM, clipka wrote:
> Am 28.05.2016 um 09:47 schrieb Orchid Win7 v1:
>
>> But how does that actually *work*? The C compiler transforms C source
>> code into executable object code. Basically, the object file contains
>> (among other things) raw machine code, which the processor knows how to
>> execute. Calling a subroutine is implemented as an unconditional jump
>> op-code. If you know the jump target, then by all means, fill in the
>> target address. But if the target hasn't been resolved yet... how do you
>> fit the entire symbol name into 32 bits?
>
> Simple: You don't.
>
> Instead, you add a table to the library listing all the memory locations
> that should hold the address of a given unresolved symbol but for
> obvious reasons currently don't.
>
> The linker will later use that table to update those memory locations.

Oh, I see. So, what, the jump op-code says to jump to address zero, and 
the object metadata tells the linker which bytes to change?

> You /could/ look it up... you know, they have this fancy new thing
> called the Internet, and search engines and things ;)

Don't you start. I've already spent a day looking at the output of nm, 
objdump and readelf. :-P The manpages tell you what all the switches do, 
but I still have no idea what .text is supposed to mean.

>> For that matter, is it possible to store *data* in an object file? (As
>> opposed to executable code.)
>
> Absolutely. See the `source/base/font/*.cpp` files in the POV-Ray source
> code for examples.

Hmm. Interesting.

   extern const unsigned char font_timrom[36936]={...

I didn't think you could make an array const. (Wouldn't that just mean 
the array pointer itself is constant? Not the array it points to.) And I 
thought extern means "this is declared somewhere else"?

[Also... Heh, do you know, when you said it, I was thinking I'd be able 
to browse the source online somewhere. Silly me...]

From: Orchid Win7 v1
Subject: Re: Linking
Date: 29 May 2016 03:48:43
Message: <574a9edb$1@news.povray.org>

On 29/05/2016 06:37 AM, Warp wrote:

> Executable files contain references to dynamically loadable libraries,
> and when you execute such a file, the OS will insert function calls
> into said executable to point to whichever dynamically loadable library
> it needs (which the OS also loads or, most usually, is already loaded
> into memory because most of everything else needs it too.)

I don't have any direct experience with that.

I was *about* to say that I've only done it with AmigaOS - but that's 
not quite right. What I was *actually* looking at is calling *the OS*, 
which is a bit different.

The way AmigaOS does it, memory address 0x00000004 holds a pointer to 
the function table for exec.library. Every function in the library has a 
known index in that table, so by adding your index to the address 
pointed to, you get a function pointer to the actual library function 
that you want. Now since exec.library is the one that contains the 
functions to load *other* libraries, from here you can open any other 
library you want. This similarly returns a function table base pointer, 
which you can use in the same way.

Of course, exec.library is in ROM, as are most of the low-level system 
libraries [including the entire GUI]. But you don't need to care about 
that. Just call OpenLibrary(), and it'll load from disk if required, and 
ultimately give you back a base pointer.

Presumably any self-respecting protected-mode OS does it differently. In 
particular, calling the kernel presumably implies a transition to 
ring-0, and I don't remember how x86 does that exactly. (From what I 
dimly recall, you purposely trigger a kind of software interrupt, but 
I'm not sure how you designate what function you're trying to call.)

From what I can tell, C code does not call the Linux kernel. C code 
calls glibc, which then calls the kernel on your behalf. (As evidenced 
by several manpages that describe a glibc function and a kernel function 
of identical name but subtly different behaviour...)

> The idea with dynamically loadable libraries is, of course, to save
> memory and increase efficiency. Since 99.9% of all executables use
> the same system functions, it's more efficient to have them all share
> the one and same library in memory than to statically link all those
> megabytes of system library code into every single executable.

This gets entertaining when you have one program that uses a dozen 
libraries that nobody else is using. Or when every single program on the 
system uses a different version of the same library, so they all supply 
their own version of it. Then again, if old code doesn't work with a 
newer version of some dynamic library, whose fault is that?

From: Le Forgeron
Subject: Re: Linking
Date: 29 May 2016 04:54:22
Message: <574aae3e@news.povray.org>

Le 29/05/2016 09:48, Orchid Win7 v1 a écrit :
> Then again, if old code doesn't
> work with a newer version of some dynamic library, whose fault is that?

That's why the version number of a dynamic library is usually part of its file name,
even if a link from the unversioned name is also available.

Of course, it took some time for folks to understand why that is better than a single
msvcrt.dll (same name for different versions).
