Warp wrote:
> Is that really so? Can you refactor an existing solution (eg. in a library)
> so that it will be more efficient (eg. memorywise) without breaking any
> existing programs?
I don't know about Haskell, but IBM had a language called NIL that was
very high-level. Everything was processes and SQL-style tables and
unbounded integers and dynamic code instantiation and such. They wrote
the routing software for SNA in it.
When a customer wanted to have two SNA routers sharing the load and
picking up in the event of a fall-over, the guys on the NIL team
realized they didn't need to rewrite any NIL code. They simply(*) added
a flag to the compiler to tell it to generate parallel code with soft
fail-over. Nothing but the compiler got changed.
(*) For some meaning of "simply" obviously. ;-)
--
Darren New / San Diego, CA, USA (PST)
Remember the good old days, when we
used to complain about cryptography
being export-restricted?
Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> The point being, ByteString is implemented as an array, yet still *looks
>> like* a normal linked-list. So you get all the nice fancy theory *and*
>> the efficient runtime behaviour, all at once. So I'm not sure it is
>> "flawed"...
>
> Then why do you need any other type of string?
You probably don't.
It's just that the language standard ("Haskell 98") defines "String" as
an alias for "[Char]" (a singly linked list of characters). ByteString
is a 3rd-party library that somebody independently wrote much more
recently. Adding a new library is quite easy; actually *replacing*
something existing is much more work.
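Just to make the contrast concrete, here is a minimal sketch (assuming the standard bytestring package; the names greetList and greetPacked are mine):

  import Data.Char (toUpper)
  import qualified Data.ByteString.Char8 as B

  -- Haskell 98's String really is a singly linked list of characters:
  --   type String = [Char]
  greetList :: String
  greetList = "hello"              -- 'h' : 'e' : 'l' : 'l' : 'o' : []

  -- ByteString packs the same text into a flat byte array, yet still
  -- offers the familiar list-style functions (map, filter, take, ...).
  greetPacked :: B.ByteString
  greetPacked = B.pack "hello"

  main :: IO ()
  main = do
    putStrLn   (map   toUpper greetList)    -- linked-list version
    B.putStrLn (B.map toUpper greetPacked)  -- array-backed version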
My hope is that eventually the work that has been done on ByteString
will be extended to make *all* list processing trippy-fast. There's just
not much evidence of it happening right now. You know what volunteer
projects are like...
Warp wrote:
> Is that really so? Can you refactor an existing solution (eg. in a library)
> so that it will be more efficient (eg. memorywise) without breaking any
> existing programs?
I guess the key to a question like this is to think about what kinds of
things would possibly *prevent* such a refactor.
I have never written a nontrivial program in C, but in Pascal I
frequently found that the type system was one such obstacle. In Pascal
(and AFAIK in plain C) you *must* expose all implementation details of a
type in order for anybody else to be able to touch it. This is one of
the Big Problems that OOP is supposed to fix. (And, IMHO, does fix very
well indeed.)
Haskell does quite well too in this regard. I can write a module in
Haskell that exports a type, without telling you a damn thing about what
the type is. It might just be an alias for some existing type, or it
could be something really complex. You don't know. You don't care. And I
can *change* what that type actually is, and you *still* don't need to
care. All you see is that there exists a type with a specific name, and
some functions that can do things with it.
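As a tiny illustrative sketch (the module and all the names in it are invented for the purpose), the export list is the whole trick:

  module Counter (Counter, new, bump, value) where

  -- The constructor MkCounter is deliberately NOT exported, so clients
  -- can only go through the functions below. The representation could
  -- later become a record, an unboxed Int, whatever, and nobody outside
  -- this module would have to care.
  newtype Counter = MkCounter Int

  new :: Counter
  new = MkCounter 0

  bump :: Counter -> Counter
  bump (MkCounter n) = MkCounter (n + 1)

  value :: Counter -> Int
  value (MkCounter n) = n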
We could summarise all that by simply saying that hiding unnecessary
detail is a Good Thing. So perhaps we should look at where details can
leak out in Haskell...
One place is I/O. Functions that do I/O have to be "marked" as such.
This is by design; normal functions can be run in any order without
consequence, whereas I/O operations *usually* need to be run in a very
precise order. This information is obviously very important, and so is
exposed at the type level.
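In other words (the function names and the file name here are just examples):

  import Data.Char (toUpper)

  -- A pure function: no I/O anywhere in the type, so it can be
  -- evaluated whenever (or never), in any order.
  shout :: String -> String
  shout s = map toUpper s ++ "!"

  -- An I/O action: the IO in the type marks it, and the type checker
  -- will not let you treat its result as a plain String.
  readGreeting :: IO String
  readGreeting = readFile "greeting.txt"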
What this means is that you can't flip a normal function into an
I/O-performing function without altering at least the function's type.
But then, you're turning an "I can run this at any time" function into a
"it might well matter when I run this" function. Arguably this demands
that the client code change anyway, in which case the enforced interface
change is arguably good. Either way, this kind of code change seems
quite rare in practice. (Certainly rare for performance tuning.)
What other information is leaked? Well, a function's type tells you
exactly what inputs it depends on. (Unless it does I/O anyway...) So if
you change the algorithm to take more information into account, the type
signature must change.
Let us take an example: Suppose you have a function that decodes
different character sets. You give it a byte, and it translates it to a
Unicode code-point using some kind of rule.
Now suppose you want to add UTF-8 decoding. Oops! That's a
variable-length encoding. To decode a byte, you might need to know what
the previous byte was. But a pure Haskell function has no way to remember that between calls.
Now, in C you could just store some status information in a global
variable somewhere. So in C you can add UTF-8 without having to change
your public interface. Yay!
Now suppose some client decides to call the function from multiple
threads processing several input streams. OOPS!! Chaos ensues... The
*only* way you can fix this is if you alter your public interface. There
are several possible fixes, but all involve client-visible change.
On the other hand, Haskell would force you to change the public
interface in the first place. (And again you have several options here.)
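For instance, one of those options (a very rough sketch; the names Utf8State and decodeByte are invented, and a real decoder would also reject malformed input) is to thread the decoder state explicitly, which makes the dependence on earlier bytes visible in the type:

  import Data.Bits ((.&.), (.|.), shiftL)
  import Data.Char (chr)
  import Data.Word (Word8)

  -- Either between characters, or partway through a multi-byte
  -- sequence (continuation bytes still expected, code point so far).
  data Utf8State = Clean | Partial Int Int

  decodeByte :: Utf8State -> Word8 -> (Utf8State, Maybe Char)
  decodeByte Clean b
    | b < 0x80  = (Clean, Just (chr (fromIntegral b)))            -- plain ASCII
    | b >= 0xF0 = (Partial 3 (fromIntegral b .&. 0x07), Nothing)  -- 4-byte lead
    | b >= 0xE0 = (Partial 2 (fromIntegral b .&. 0x0F), Nothing)  -- 3-byte lead
    | otherwise = (Partial 1 (fromIntegral b .&. 0x1F), Nothing)  -- 2-byte lead
  decodeByte (Partial n acc) b
    | n == 1    = (Clean, Just (chr acc'))
    | otherwise = (Partial (n - 1) acc', Nothing)
    where acc' = (acc `shiftL` 6) .|. (fromIntegral b .&. 0x3F)

Client code now has to pass the state along from call to call, which is exactly the kind of interface change being discussed, but at least the compiler points out every call site that needs attention.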
To summarise: In both C and Haskell you end up changing the interface if
you want the thing to work properly. Haskell just forces you to do it
right at the start, whereas C potentially does strange things until you
realise what's up. And in both cases, the key to success is to design
the public interface properly in the first place! (I.e., write a
function that converts whole strings, not individual octets.)
I said above that leaking unnecessary detail is bad. The arguments after
that point could essentially be read as saying "this is leaking detail,
but it is *necessary* detail, so that's OK". So it really comes down to
whether you consider these things "necessary" or not.
Another potential problem area is laziness. Being lazy can speed
programs up no end. It can also slow them down tremendously.
Deliberately turning off laziness is a fairly common way to improve the
efficiency of Haskell code. However, this is often an observable
change. Code that works with the lazy version of a library might well
lock up given a stricter version.
So here is an implementation change that is totally invisible at the
interface level, yet might cause client code to stop working properly.
(It's actually fairly common to provide two versions of a library, one
lazy and one strict, so the client can choose, precisely because
suddenly changing a library's implementation can cause problems like this.)
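A tiny illustration of that kind of breakage (the function names are invented; the "strictness" here is faked with seq):

  -- The "lazy version": only looks at as much of the list as it needs.
  firstThreeLazy :: [a] -> [a]
  firstThreeLazy xs = take 3 xs

  -- A "stricter version": forces the whole spine first, say to avoid
  -- building up thunks.
  firstThreeStrict :: [a] -> [a]
  firstThreeStrict xs = length xs `seq` take 3 xs

  main :: IO ()
  main = do
    print (firstThreeLazy [1 ..])        -- fine: [1,2,3] from an infinite list
    print (firstThreeStrict [1 .. 10])   -- fine on a finite list
    -- print (firstThreeStrict [1 ..])   -- would never return: it forces
                                         -- an infinite list before taking 3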
Warp wrote on 2007/11/01 14:40:
> Alain <ele### [at] netscapenet> wrote:
>> It's about 12 BITS per characters, on average, not BYTES! That's useing UTF8
>> encoding. About 16 BITS per characters if using UTF16 encoding.
>> UTF8 is only 7 BITS per characters if you stick to only standard ASCII
>> characters set, but it gets bigger if you also use extended ASCII or characters
>> from foreign alphabets.
>
> Except that if each single character is indeed garbage-collected, that
> requires quite a lot of memory per character (compared to the size of the
> character).
>
Garbage collection is BAAAAD!!!
Any implementation that permits you not to use it is GOOD.
I got hit real bad by garbage collection, ONCE! The program was running
really well, no bug at all, then it just stopped there, nothing at all was
happening, the system was totally unresponsive, utterly frozen, then it
resumed after around 15 minutes. It was garbage collecting.
I changed the program logic to prevent any future garbage collection.
--
Alain
-------------------------------------------------
You know you've been raytracing too long when you're starting to find these
quotes more unsettling than funny.
-- Alex McLeod a.k.a. Giant Robot Messiah
Alain wrote:
> Garbage collection is BAAAAD!!!
> Any implementation that permit you not to use that is GOOD.
>
> I got hit real bad by garbage collection, ONCE! The programm was running
> realy good, no bug at all, then is just stopped there, nothing at all
> was hapening, the systemwas totaly unresponsive, uterly frozen, then it
> resumed after around 15 minutes. It was garbage collecting.
> I changed the programm logic to prevent any future garbage collection.
>
http://www.virtualdub.org/blog/pivot/entry.php?id=176
Orchid XP v7 wrote on 2007/11/01 15:03:
> Warp wrote:
>> Alain <ele### [at] netscapenet> wrote:
>>> It's about 12 BITS per characters, on average, not BYTES! That's
>>> useing UTF8 encoding. About 16 BITS per characters if using UTF16
>>> encoding.
>>> UTF8 is only 7 BITS per characters if you stick to only standard
>>> ASCII characters set, but it gets bigger if you also use extended
>>> ASCII or characters from foreign alphabets.
>>
>> Except that if each single character is indeed garbage-collected, that
>> requires quite a lot of memory per character (compared to the size of the
>> character).
>
> I can't actually find documentation to hand to clarify whether it's 12
> bits or 12 bytes per character. (My strong suspecion is that it *is* 12
> *bytes* - since, after all, a single Unicode code point is 24 bits
> already.)
>
> The situation is actually worse than it looks. All this lazy evaluation
> magic is implemented by storing program state around the place, so an
> "unevaluated" string probably takes up more space still...
Using UTF-8 encoding, a single character can be 1, 2, 3 or 4 bytes long.
Standard ASCII characters are 1 byte; extended ASCII and quite a few more use
2 bytes. The lead byte signals the length: a top bit of 0 means a single byte,
the bit pattern 110 marks a 2-byte code, 1110 marks a 3-byte code, and 11110
marks a 4-byte code.
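As a rough sketch of that lead-byte rule in code (the function name is mine; continuation bytes and invalid lead bytes are not handled):

  import Data.Word (Word8)

  -- How many bytes a UTF-8 sequence occupies, judged from its lead byte.
  utf8Length :: Word8 -> Int
  utf8Length b
    | b < 0x80  = 1   -- 0xxxxxxx : plain ASCII
    | b < 0xE0  = 2   -- 110xxxxx : 2-byte sequence
    | b < 0xF0  = 3   -- 1110xxxx : 3-byte sequence
    | otherwise = 4   -- 11110xxx : 4-byte sequence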
Using UTF-16 encoding, any character is 2 bytes long, for a grand total of
65536 possible characters, not all of them being printable.
--
Alain
-------------------------------------------------
Error in operator: add beer
On 2007/11/01 22:15, Orchid XP v7 wrote:
>>> UTF-16 another... and raw storage the worst idea ever!
>>
>> Why would raw storage be the worst idea? There are several advantages.
>> The disadvantage is, of course, an increased memory requirement.
>
> You win some, you loose some. Programming is all about these kinds of
> compromises. :-)
Raw storage is the worst, because they are still adding glyphs to
Unicode...
When it started, 16 bits would have been enough... now it isn't
anymore.
It's not about programming, but about "selling" the concept. Raw storage
means no encoding, no signalling of the actual number of bits used...
On one side, you have billions of people quite happy with only 7
bits. Maybe even 6 bits would be enough (you do not need
all those letters twice, do you? Ten digits (the Romans did it with
6 or 7 glyphs) and a few punctuation signs.)
And other billions who could probably be fine with a fixed 16
bits... or maybe not, since they are still adding!
And people in the middle (the Japanese recommended set is 1945... and all the
other countries...)
And there are the few who want to be able to mix everything at the
same time in the same document.
UTF-8 is interesting for the 6-bit people, as they do not lose too
much storage and it's simple
(far simpler than all the old tricks of code pages and such).
But UTF-8 nowadays can already extend to 4 x 8 bits (or worse?).
What would have happened if the basic byte had been 9 bits long instead
of 8? Would they go by 4, or would 3 have been enough?
Obviously all the Latin-European countries would have fit on the same
page of 512, and maybe we would still be playing with code pages?
(No thanks!)
--
The superior man understands what is right;
the inferior man understands what will sell.
-- Confucius
Orchid XP v7 wrote on 2007/11/01 17:15:
> Warp wrote:
>
>> UTF-8 encoding "wastes" some bits (in order to use less bits for the
>> most used western characters) and requires at most 4 bytes per character
>> (even though the characters requiring more than 3 bytes are very rarely
>> used).
>
> And thus, like any decent variable-length encoding scheme, it tries to
> assign short codes to common symbols. (Although UTF-8 probably fails
> horribly for, say, Japanese text. I don't actually know...)
For Japanese and Chinese, it averages around 3 bytes per character. It's not so
bad after all, as each character in those languages represents a whole word,
and some even represent a whole phrase or some complex concept.
A 1000-glyph text in Chinese would be, roughly, a 1000 to 5000 word text in
English!
>
>>> UTF-16 another... and raw storage the worst idea ever!
>>
>> Why would raw storage be the worst idea? There are several advantages.
>> The disadvantage is, of course, an increased memory requirement.
>
> You win some, you loose some. Programming is all about these kinds of
> compromises. :-)
All possible characters for all European languages fit in 1 or 2 bytes, and I
think that also includes Arabic and Cyrillic. The Asian glyphs use the bulk
of the 3- and 4-byte codes.
--
Alain
-------------------------------------------------
If you're ever about to be mugged by a couple of clowns, don't hesitate - go for
the juggler.
Alain wrote:
> Garbage collection is BAAAAD!!!
> Any implementation that permit you not to use that is GOOD.
Gee, that's not much of a wild generalisation or anything like that...
> I got hit real bad by garbage collection, ONCE! The programm was running
> realy good, no bug at all, then is just stopped there, nothing at all
> was hapening, the systemwas totaly unresponsive, uterly frozen, then it
> resumed after around 15 minutes. It was garbage collecting.
Yeah, old GC algorithms used to do this. Research has been done,
solutions have been found, etc.
Alain <ele### [at] netscapenet> wrote:
> Using UTF16 encoding, any character is 2 BYTES long, for a grand total is 65536
> possible characters, not all of them been printable.
Wrong. UTF-16 encoding results in either 2-byte or 4-byte characters,
depending on the Unicode value.
Perhaps you are confusing it with UCS-2?
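If it helps, the rule is easy to state in code (the function name is mine):

  import Data.Char (ord)

  -- Number of 16-bit code units UTF-16 needs for a given character.
  -- Anything above U+FFFF is encoded as a surrogate pair (4 bytes).
  utf16Units :: Char -> Int
  utf16Units c
    | ord c > 0xFFFF = 2   -- e.g. U+1D11E (musical G clef)
    | otherwise      = 1   -- the whole Basic Multilingual Plane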
--
- Warp