Darren New wrote:
> Warp wrote:
>> Btw, what does it do if you jump to the beginning of the string after
>> it has been garbage-collected?
>
> It won't get GCed if it's possible to jump to the beginning.
What Darren said. ;-)
Notice that a string is a *single-linked* list. (I.e., there are "next"
pointers but no "prev" pointers.)
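
A minimal sketch of that point, assuming the standard definition `type String = [Char]`. The `Str` type and the helper names `fromString`, `toString` and `dropStr` are made up for illustration; they only mirror the shape of the built-in list, with a "next" link and no "prev" link. As long as something still refers to the first cell, the whole chain stays reachable and cannot be collected.

```haskell
-- Hypothetical mirror of the built-in list: each cell holds one character
-- and a link to the next cell. There is no link back to the previous cell.
data Str = End | Cell Char Str

-- Convert from and to the built-in String (which is just [Char]).
fromString :: String -> Str
fromString = foldr Cell End

toString :: Str -> String
toString End         = []
toString (Cell c cs) = c : toString cs

-- Skip the first n cells. The result shares its cells with the input;
-- nothing is copied.
dropStr :: Int -> Str -> Str
dropStr n s | n <= 0  = s
dropStr _ End         = End
dropStr n (Cell _ cs) = dropStr (n - 1) cs

main :: IO ()
main = do
  let whole = fromString "hello world"
      rest  = dropStr 6 whole
  putStrLn (toString rest)   -- walk only the tail
  putStrLn (toString whole)  -- 'whole' is still referenced, so the start is still there
```

Once nothing refers to `whole` any more, the cells in front of `rest` become unreachable, and only then can they be reclaimed.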
Warp wrote:
> Someone else said that strings are actually linked lists of characters
> where each character is garbage-collected. This must require humongous
> amounts of memory, especially considering that one character would only
> require 1 byte...
No - it's Unicode. 24 bits per character. ;-)
But yes, the standard Haskell string type is geared to flexibility, not
performance. See my ByteString comments...
Warp wrote:
> Alain <ele### [at] netscapenet> wrote:
>> It's about 12 BITS per character, on average, not BYTES! That's using UTF-8
>> encoding. About 16 BITS per character if using UTF-16 encoding.
>> UTF-8 is only 7 BITS per character if you stick to the standard ASCII
>> character set, but it gets bigger if you also use extended ASCII or characters
>> from foreign alphabets.
>
> Except that if each single character is indeed garbage-collected, that
> requires quite a lot of memory per character (compared to the size of the
> character).
I can't actually find documentation to hand to clarify whether it's 12
bits or 12 bytes per character. (My strong suspicion is that it *is* 12
*bytes* - since, after all, a single Unicode code point is 24 bits already.)
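
A back-of-the-envelope sketch of where a figure like 12 bytes could come from. The word counts are assumptions about how GHC typically lays out heap objects (a list cell as a header plus two pointers, a boxed `Char` as a header plus a payload); they are not taken from any documentation quoted in this thread, and other compilers or GHC versions may differ. The function name `bytesPerChar` is illustrative.

```haskell
-- ASSUMPTION: typical GHC heap layout, counted in machine words.
consCellWords :: Int
consCellWords = 3   -- header + pointer to the Char + pointer to the next cell

boxedCharWords :: Int
boxedCharWords = 2  -- header + the code point itself

-- Bytes per character for a fully evaluated String.
-- 'sharedChars' models the case where the Char boxes are shared between
-- cells (assumed here for common characters) and so add nothing per cell.
bytesPerChar :: Int -> Bool -> Int
bytesPerChar wordBytes sharedChars =
  wordBytes * (consCellWords + (if sharedChars then 0 else boxedCharWords))

main :: IO ()
main = do
  print (bytesPerChar 4 True)   -- 32-bit words, shared Char boxes
  print (bytesPerChar 4 False)  -- 32-bit words, one box per character
  print (bytesPerChar 8 False)  -- 64-bit words, one box per character
```

Under these assumptions the 32-bit case with shared boxes comes out at 12 bytes per character, which at least matches the bytes reading rather than the bits one.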
The situation is actually worse than it looks. All this lazy evaluation
magic is implemented by storing program state around the place, so an
"unevaluated" string probably takes up more space still...
Warp wrote:
> But hey, that's the fad in modern programming: Memory usage and speed
> are irrelevant. Processors are getting faster and memory amounts are
> doubling every couple of years, so who cares if something like a string
> takes 12 times as much memory as it could?
Well, no. There are people working in the Haskell community to fix this
and related problems. (Can't comment on other programming languages...)
The nice thing about Haskell is that it *allows* you to come up with a
better implementation and slip it in there later.
Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> Simply let lazy evaluation read the file as required, and the GC can
>> delete the data you've already processed transparently behind you.
>
> That's not possible if the file is indeed read into one single string.
Nope, linked list. Each link in the chain can be read separately when
accessed.
(As a matter of fact, I very much doubt it reads a single character at
once. It probably reads much more than that, for efficiency's sake...)
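
A small sketch of that behaviour, assuming the Prelude's `readFile`, which reads lazily. The names `endless` and `countLinesInFile` are illustrative. The pure part uses an endless string to show that only the links actually demanded are ever built.

```haskell
-- An endless string: cells are only created as something demands them.
endless :: String
endless = cycle "abc\n"

-- Count the lines in a file. 'readFile' hands back the contents lazily, so
-- the data is pulled in while 'lines' and 'length' walk the list, and cells
-- that have already been walked past are no longer referenced.
countLinesInFile :: FilePath -> IO Int
countLinesInFile path = do
  contents <- readFile path
  return $! length (lines contents)  -- force the count while the file is open

main :: IO ()
main = do
  -- Terminates although 'endless' has no end: only 8 cells are demanded.
  print (take 8 endless)
```

`countLinesInFile` is not called in `main` so that the sketch runs without needing a file on disk.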
On 01.11.2007 20:00, Orchid XP v7 wrote:
> No - it's Unicode. 24 bits per character. ;-)
Unicode is just a bunch of tables of glyphs (lots of tables, lots of
glyphs).
It is past 24 bits since a few... (even past 32 bits!!!)
The real issue is how you encode all of these.
UTF-8 is one way (the popular one these days),
UTF-16 is another... and raw storage is the worst idea ever!
UTF-8 takes 8 bits per character in the ASCII range, usually goes up to
24 bits (3 x 8) for classical Japanese, and takes 16 for most accented
French letters...
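
A short sketch of those sizes, using the byte-length ranges of the UTF-8 encoding. The function names `utf8Bytes` and `utf8Length` and the sample characters are my own choice for illustration. The last line prints the largest code point a Haskell `Char` can hold, which is U+10FFFF and fits in 21 bits.

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 uses for one character, chosen by code point range.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Total encoded size of a whole string, in bytes.
utf8Length :: String -> Int
utf8Length = sum . map utf8Bytes

main :: IO ()
main = do
  print (utf8Bytes 'a')           -- ASCII letter
  print (utf8Bytes '\xE9')        -- e with acute accent
  print (utf8Bytes '\x3042')      -- hiragana letter a
  print (ord (maxBound :: Char))  -- largest code point a Char can hold
```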
>
> But yes, the standard Haskell string type is geared to flexibility, not
> performance. See my ByteString comments...
Fixed-size Unicode... if only they would stop adding more tables!
--
The superior man understands what is right;
the inferior man understands what will sell.
-- Confucius
Orchid XP v7 <voi### [at] devnull> wrote:
> This is why the "ByteString" library was developed.
It just shows that the original article which spawned this thread is
naive. *In theory* you can have all kinds of fancy high-level uber-abstract
constructs which abstract away all the dirty internal details. *In practice*,
however, you still need those dirty details if you want any efficiency.
The fancy theories may be good in a big bunch of programs, but like so many
abstractions before, it's not the final silver bullet of programming.
--
- Warp
Orchid XP v7 <voi### [at] devnull> wrote:
> Notice that a string is a *single-linked* list. (I.e., there are "next"
> pointers but no "prev" pointers.)
You can't random-access a string in Haskell? You can just go through it
once and that's it? That would be quite limiting...
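
For what it's worth, a sketch assuming the standard `String = [Char]`: indexing with `!!` from the Prelude does work, but it steps through the list from the front every time, and a list can be traversed as often as you like for as long as you hold on to it. The name `sample` is just for illustration.

```haskell
sample :: String
sample = "hello world"

main :: IO ()
main = do
  print (sample !! 6)        -- indexed access, reached by stepping through cells 0..6
  print (length sample)      -- one full traversal
  putStrLn (reverse sample)  -- another traversal of the same list
```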
--
- Warp
Orchid XP v7 <voi### [at] devnull> wrote:
> The nice thing about Haskell is that it *allows* you to come up with a
> better implementation and slip it in there later.
Is that really so? Can you refactor an existing solution (e.g. in a library)
so that it will be more efficient (e.g. memory-wise) without breaking any
existing programs?
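
One way this could look, as a hedged sketch rather than a description of any actual library: if callers only ever go through a few functions, the representation behind them can change. The `Name` type and its functions are hypothetical. `Data.ByteString.Char8` comes from the `bytestring` package; it keeps only the low 8 bits of each character, so this sketch assumes ASCII-only input.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical abstract type. In a real library only mkName, nameLength and
-- nameText would be exported, so callers never see the representation.
-- An earlier version could have been:  newtype Name = Name String
newtype Name = Name B.ByteString

-- Build a Name from an ordinary String (ASSUMPTION: ASCII-only input).
mkName :: String -> Name
mkName = Name . B.pack

-- Length in characters, which for the packed form is the length in bytes.
nameLength :: Name -> Int
nameLength (Name b) = B.length b

-- Convert back to an ordinary String for callers that want one.
nameText :: Name -> String
nameText (Name b) = B.unpack b

main :: IO ()
main = do
  let n = mkName "someone@example.com"
  print (nameLength n)
  putStrLn (nameText n)
```

Code written against `mkName`, `nameLength` and `nameText` compiles unchanged with either representation; code that pattern-matched on the old representation directly would break.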
--
- Warp
"Darren New" <dne### [at] sanrrcom> wrote
> Warp wrote:
> Depends on the size of the string, really. If my 100-byte string holding
> an email address turns into 1200 bytes, I'm not really going to worry
> about it much. :-)
Ah yes. Next thing you know, somebody else (or you, in a couple of weeks) is
re-using that piece of code to manage an address book and, later, a mailing
list. It's never a bad idea for your code to be scalable. OOP in general,
and bottom-up thinking, have been a minor disaster in this respect, since many
programmers no longer think about optimized data structures for the overall
picture, and things quickly get out of hand memory- and
performance-wise.