Depending on which operating system you're using, the OS provides
various ways to load code into memory and execute it. My question is
this: How hard would it be to write a subroutine that can read raw
machine code from a flat file and make it execute?
On 5/19/2011 2:50, Invisible wrote:
> Depending on which operating system you're using, the OS provides various
> ways to load code into memory and execute it. My question is this: How hard
> would it be to write a subroutine that can read raw machine code from a flat
> file and make it execute?
In what language?
--
Darren New, San Diego CA, USA (PST)
"Coding without comments is like
driving without turn signals."
On 19/05/2011 16:15, Darren New wrote:
> On 5/19/2011 2:50, Invisible wrote:
>> Depending on which operating system you're using, the OS provides various
>> ways to load code into memory and execute it. My question is this: How hard
>> would it be to write a subroutine that can read raw machine code from a flat
>> file and make it execute?
>
> In what language?
Most programming languages provide a way to suck the contents of a file
into memory without modifying it. So I suppose the question is, once
it's there, is there some way to execute it?
Presumably this must be very hard to do, otherwise people wouldn't jump
through all the hoops required to get dynamic linking via the OS to work.
For example:
- According to Wikipedia, every Windows DLL in the entire system must
have a unique base address. (I forget what happens if this isn't the
case; I believe it amounts to poor performance.)
- According to the LFS book, every Linux dynamic library actually has
the absolute path to the system linker program hard-coded into it. (This
almost defies belief!)
- Obviously in both cases the actual machine code must also be
surrounded by many miles of complex metadata too.
Presumably nobody would put up with such crippling limitations if doing
it yourself wasn't insanely hard.
One thing about loading data from file is that you usually don't get to
decide where it gets loaded. Is that a problem for x86? Is it hard to
write relocatable machine code or something?
On 19/05/2011 16:55, Invisible wrote:
> One thing about loading data from file is that you usually don't get to
> decide where it gets loaded.
Thinking about it, the page where the file gets loaded probably has code
execution disabled too. So I guess you have to ask the OS to enable it...
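
To make the "ask the OS to enable it" step concrete, here is a minimal sketch in Python. It assumes x86-64 Linux, and the names `RETURN_42`, `SUPPORTED` and `run_machine_code` are made up for illustration. It requests a page that is readable, writable and executable in one go, which some hardened systems refuse; a more careful version would write first and then switch the page to read+execute.

```python
import ctypes
import mmap
import platform
import sys

# x86-64 machine code for: mov eax, 42 ; ret
RETURN_42 = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])

# Assumption: the bytes above only make sense on an x86-64 CPU, and the
# mmap 'prot' argument used below is Unix-only.
SUPPORTED = sys.platform.startswith("linux") and platform.machine() in ("x86_64", "AMD64")


def run_machine_code(code: bytes) -> int:
    """Copy raw machine code into an executable page and call it as int f(void)."""
    if not SUPPORTED:
        raise RuntimeError("this sketch assumes x86-64 Linux")
    # Ask the OS for an anonymous mapping with execute permission.
    # A hardened system may refuse this and raise OSError.
    page = mmap.mmap(-1, max(len(code), mmap.PAGESIZE),
                     prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    page.write(code)
    # Find where the mapping lives and wrap that address as a C function.
    address = ctypes.addressof(ctypes.c_char.from_buffer(page))
    entry = ctypes.CFUNCTYPE(ctypes.c_int)(address)
    return entry()  # 'page' stays alive until this call has returned
```

On a supported machine, `run_machine_code(RETURN_42)` is expected to return 42. Reading the bytes from a flat file instead is just a matter of passing in `open(path, "rb").read()`, provided the code in the file does not care where it ends up in memory.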
On 5/19/2011 8:55, Invisible wrote:
> is there some way to execute it?
Yes. An assembly language JSR will do the trick.
A COM file is raw machine code without any headers, at least on CP/M. A COM
file was executed by reading it into 0x100 and branching to 0x100.
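
As a rough illustration of how little a loader for such a file has to do, here is a sketch that simulates it in Python against a 64 KiB block of memory. The names `load_com_image`, `COM_LOAD_OFFSET` and `SEGMENT_SIZE` are invented for the example, and nothing is actually executed.

```python
COM_LOAD_OFFSET = 0x100   # the image is copied in starting at 0x100
SEGMENT_SIZE = 0x10000    # assume a single 64 KiB address space


def load_com_image(image: bytes):
    """Return (memory, entry_offset) for a header-less COM-style image."""
    if len(image) > SEGMENT_SIZE - COM_LOAD_OFFSET:
        raise ValueError("image does not fit above offset 0x100")
    memory = bytearray(SEGMENT_SIZE)
    # No header to parse and no fixups to apply: the file is copied verbatim.
    memory[COM_LOAD_OFFSET:COM_LOAD_OFFSET + len(image)] = image
    # Execution would begin at the first byte of the image.
    return memory, COM_LOAD_OFFSET
```

For example, `load_com_image(b"\xC3")` gives back a 65536-byte memory image with the single byte at offset 0x100 and an entry point of 0x100.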
Again, what language are you talking about?
> Presumably this must be very hard to do, otherwise people wouldn't jump
> through all the hoops required to get dynamic linking via the OS to work.
No, the point of *that* is to (a) allocate the memory, (b) load the DLL, (c)
link the DLL entry points into your own code so that (d) you don't have to
recompile your code when the DLL gets replaced by a newer version. Also (e)
multiple people can use the same code at the same time and only load it once.
> - According to Wikipedia, every Windows DLL in the entire system must have a
> unique base address. (I forget what happens if this isn't the case; I
> believe it amounts to poor performance.)
It means the DLL has to get relocated when it gets loaded in.
> - Obviously in both cases the actual machine code must also be surrounded by
> many miles of complex metadata too.
No more than an EXE does.
> Presumably nobody would put up with such crippling limitations if doing it
> yourself wasn't insanely hard.
It's not insanely hard. It's just inflexible.
> One thing about loading data from file is that you usually don't get to
> decide where it gets loaded. Is that a problem for x86? Is it hard to write
> relocatable machine code or something?
The expression you're looking for is PIC, "Position Independent Code".
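
One ingredient of position-independent code on x86 is that `CALL rel32` (opcode E8) stores the distance to the target, measured from the end of the instruction, rather than the target's address. The sketch below does that arithmetic in Python; the function names are invented for illustration and nothing is executed.

```python
import struct


def encode_call_rel32(call_offset: int, target_offset: int) -> bytes:
    """Encode x86 'CALL rel32' between two offsets inside the same image."""
    next_instruction = call_offset + 5          # 1 opcode byte + 4 displacement bytes
    displacement = target_offset - next_instruction
    return b"\xE8" + struct.pack("<i", displacement)


def call_target(load_address: int, call_offset: int, encoded: bytes) -> int:
    """Address the CPU would reach if the image were loaded at load_address."""
    (displacement,) = struct.unpack_from("<i", encoded, 1)
    return load_address + call_offset + 5 + displacement
```

A call at image offset 0x10 to a routine at offset 0x40 encodes as `E8 2B 00 00 00` wherever the image is loaded. With a load address of 0x400000 it reaches 0x400040, and with 0x7F000000 it reaches 0x7F000040, without a single byte of the code changing.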
--
Darren New, San Diego CA, USA (PST)
"Coding without comments is like
driving without turn signals."
Am 19.05.2011 17:55, schrieb Invisible:
> One thing about loading data from file is that you usually don't get to
> decide where it gets loaded. Is that a problem for x86? Is it hard to
> write relocatable machine code or something?
Not really. For jumping around or calling subroutines in your own code
there are CALL and JMP instructions available that take a relative
address as operand; and with rather simple code you can auto-detect the
load address from within your code, so you can also compute absolute
addresses e.g. to access constants, pass callback function pointers to
some API, or use some portion of the code as a global variables space.
For the sake of run-time performance it is faster though to hard-code
such absolute addresses, and have the respective code locations adjusted
accordingly upon loading of the code. AFAIK that's the basic principle
by which most executable file formats work.
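
The "adjust the code locations upon loading" idea can be sketched as follows. This is a simplified model, not any real file format: it assumes the image stores image-relative values in its 32-bit little-endian address slots and ships a separate list of slot offsets (here called `fixups`). The function name `relocate` is likewise made up.

```python
import struct
from typing import Iterable


def relocate(image: bytes, fixups: Iterable[int], load_address: int) -> bytes:
    """Add load_address to every 32-bit slot listed in fixups."""
    patched = bytearray(image)
    for offset in fixups:
        # Each slot holds an address relative to the start of the image.
        (value,) = struct.unpack_from("<I", patched, offset)
        # Turn it into an absolute address for this particular load.
        struct.pack_into("<I", patched, offset, (value + load_address) & 0xFFFFFFFF)
    return bytes(patched)


# 32-bit x86: A1 <addr32> is 'mov eax, [addr32]', C3 is 'ret'.
# The address slot at offset 1 points at image offset 0x10.
IMAGE = b"\xA1" + struct.pack("<I", 0x10) + b"\xC3"
```

Calling `relocate(IMAGE, [1], 0x00400000)` yields `A1 10 00 40 00 C3`, i.e. the instruction now reads from 0x00400010. The code itself runs at full speed afterwards; the cost is paid once, at load time.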
>> is there some way to execute it?
>
> Yes. An assembly language JSR will do the trick.
That's more or less what I thought.
> A COM file is raw machine code without any headers, at least on CP/M. A
> COM file was executed by reading it into 0x100 and branching to 0x100.
That appears to be the case for MS-DOS as well. (I know gasm has a "COM
output" option which appears to match this description.)
> Again, what language are you talking about?
Presumably any language that lets you insert assembly or link to C ought
to be able to handle this then.
>> Presumably this must be very hard to do, otherwise people wouldn't jump
>> through all the hoops required to get dynamic linking via the OS to work.
>
> No, the point of *that* is to (a) allocate the memory, (b) load the DLL,
> (c) link the DLL entry points into your own code so that (d) you don't
> have to recompile your code when the DLL gets replaced by a newer
> version. Also (e) multiple people can use the same code at the same time
> and only load it once.
I suppose (e) is the big one.
I imagine if you wanted to load a file containing *multiple* subroutines
instead of just one, things would become more complicated. Still, it
doesn't look *that* hard to get it right...
>> - According to Wikipedia, every Windows DLL in the entire system must have a
>> unique base address. (I forget what happens if this isn't the case; I
>> believe it amounts to poor performance.)
>
> It means the DLL has to get relocated when it gets loaded in.
I have a vague feeling it also disables sharing it among processes.
>> - Obviously in both cases the actual machine code must also be surrounded by
>> many miles of complex metadata too.
>
> No more than an EXE does.
Indeed, I gather that a Windows DLL is only different from a Windows EXE
by a few bit flags. (No idea what Linux does. Then again, last I checked
Linux has more than one format for executable programs, never mind
libraries...)
> The expression you're looking for is PIC, "Position Independent Code".
Yeah, every system seems to have a different name for this concept.
On 5/19/2011 2:50 AM, Invisible wrote:
> Depending on which operating system you're using, the OS provides
> various ways to load code into memory and execute it. My question is
> this: How hard would it be to write a subroutine that can read raw
> machine code from a flat file and make it execute?
Think I will give a slightly different answer here than most, which is,
"In most modern operating systems, short of using a buffer overflow, or
other method, you can't." Now, for something clear back to the days of
say DOS (or even Win3.11, probably), you didn't have any protections, so
nothing stopped you from loading whatever you wanted into some bit of
memory, then jumping to it. On even older things, like Apple IIs, this
was actually how such loading of parts of applications took place, more
or less. You set the "write to" bank to bank 1, while the "execute" was
set to bank 0, then read and parsed the file from disk, wrote it into
bank 1 at your location, then set things up so that when the execute
flag changed to bank 1, the machine would simply start executing code at
the location you loaded the binary data into.
The closest you could get on a modern machine would be something like an
emulator, which would allocate a known amount of memory, then let you
play the same games, as though the machine you were dealing with was
one that allowed such things, and didn't have an OS installed that
protected from this.
In principle, a modern OS will only allow you to execute code it
"recognizes" as valid executables, and only under its rules, and
disallows certain methods of modification, which would allow you to play
those sorts of games. However... At least in principle, if you could
dump the address of a data array to the stack for a program, and then
somehow trick the CPU and OS into looking there for the next place to
run... But, in general, you are not allowed to mess with the stack that
directly in most languages.
Am 20.05.2011 10:12, schrieb Invisible:
>> A COM file is raw machine code without any headers, at least on CP/M. A
>> COM file was executed by reading it into 0x100 and branching to 0x100.
>
> That appears to be the case for MS-DOS as well. (I know gasm has a "COM
> output" option which appears to match this description.)
I'm not exactly sure, but AFAIR the MS-DOS COM files had a small header
before the actual code. (That one was simply copied into memory as well
though.)
MS-DOS also didn't place COM files at address 0x100, but rather at a
fixed offset in an arbitrary segment.
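
To show what "a fixed offset in an arbitrary segment" means in terms of addresses, here is a small sketch of real-mode 8086 address arithmetic in Python. The function name is invented, the two segment values are arbitrary examples, and the 20-bit wrap models the original 8086.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


COM_ENTRY_OFFSET = 0x100

# The same program loaded into two different (arbitrarily chosen) segments.
first = physical_address(0x1234, COM_ENTRY_OFFSET)
second = physical_address(0x2000, COM_ENTRY_OFFSET)
```

Here `first` is 0x12440 and `second` is 0x20100. The physical location differs, but the code only ever sees offset 0x100 within its own segment, so it needs no relocation.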
On 20/05/2011 14:00, clipka wrote:
> Am 20.05.2011 10:12, schrieb Invisible:
>
>>> A COM file is raw machine code without any headers, at least on CP/M. A
>>> COM file was executed by reading it into 0x100 and branching to 0x100.
>>
>> That appears to be the case for MS-DOS as well. (I know gasm has a "COM
>> output" option which appears to match this description.)
>
> I'm not exactly sure, but AFAIR the MS-DOS COM files had a small header
> before the actual code. (That one was simply copied into memory as well
> though.)
http://en.wikipedia.org/wiki/COM_file
Claims there's no header for CP/M or MS-DOS.
> MS-DOS also didn't place COM files at address 0x100, but rather at a
> fixed offset in an arbitrary segment.
You might be right about that. I was reading about it because I wanted
to write a boot loader - which really *is* loaded at a fixed physical
address...