|
|
|
|
|
|
| |
| |
|
|
|
|
| |
| |
|
|
...OK, actually I can only think of 2 things to use a Min Heap for:
1. A heap sort.
2. Building Huffman trees.
Can anybody else think of a good use for a Min Heap? (Or even a Max Heap?)
I can think of plenty of algorithms that require some data in sorted
order, but any sorting algorithm will do for that. (And on current
hardware, in-place algorithms are usually faster than a heap sort.) Can
anybody think of an algorithm that benefits from an actual heap?
[For anybody not already familiar with such things...
A Min Heap is a kind of binary tree that stores data in such a way that
retrieving the "minimum" element is an O(1) operation. Insertion and
deletion are both O(log n), and I believe joining two heaps should be
O(log n) as well.
A "heap sort" involves turning something into a heap (with O(n log n)
complexity) and then repeatedly removing the minimum element (with O(n)
complexity) to yield a fully sorted list.
To build a "Huffman tree", you start with a set of trivial 1-node trees.
Each tree has a probability assigned to it. The algorithm is to simply
select the two lowest-probability tries, remove them from the set,
combine them into a larger tree (with a correspondingly higher
probability) and insert this back into the set. Eventually you will have
1 tree with Pr=1.
If this "set" is in fact a Min Heap sorted by tree porbability, then you
end up with a nice fast algorithm.
Here ends the lesson on algorithms and data structures.]
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
Dude, you so should be doing something else than fixing computers for a job.
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
scott wrote:
> Dude, you so should be doing something else than fixing computers for a
> job.
No, really??
Heh. Now tell me something I don't already know. ;-) [Such as how to
*find* a better job... that'd be useful!]
While I'm not desputing the truth of your statement, I am curios to know
what lead you to this conclusion. The fact that I know what a min heap
is? [It's clearly described on Wikipedia. Doesn't take long to
comprehend.] That I know about Huffman trees? [Again, not hard to look
up.] What?
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> No, really??
>
> Heh. Now tell me something I don't already know. ;-) [Such as how to
> *find* a better job... that'd be useful!]
Try hammering some random search terms into monster.co.uk, even "haskell"
brings up a couple of results! I can't believe there is no job on there for
you.
> While I'm not desputing the truth of your statement, I am curios to know
> what lead you to this conclusion. The fact that I know what a min heap is?
> [It's clearly described on Wikipedia. Doesn't take long to comprehend.]
> That I know about Huffman trees? [Again, not hard to look up.] What?
The fact that you always go on about quite complicated stuff that not many
other people would understand, and I mean not many within the IT industry,
not the entire population.
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
scott wrote:
>> No, really??
>>
>> Heh. Now tell me something I don't already know. ;-) [Such as how to
>> *find* a better job... that'd be useful!]
>
> Try hammering some random search terms into monster.co.uk, even
> "haskell" brings up a couple of results!
Yep. Already done that. And they have my CV, so every now and then they
email me a job advert or 12. [Usually in Canada... WTF??] I even went to
an interview a few weeks back...
They've sent me a couple of messages telling me I'm "ideal" to work for
HMGCC. ("HMG" as in "Her Majesty's Government". Scary shit, right there!)
> I can't believe there is no job on there for you.
I'm fairly confident somebody somewhere wants to employ somebody like
me. The part I'm worried about is *finding* this company...
>> While I'm not desputing the truth of your statement, I am curios to
>> know what lead you to this conclusion.
>
> The fact that you always go on about quite complicated stuff that not
> many other people would understand, and I mean not many within the IT
> industry, not the entire population.
Pfft. It's all stuff that's readily available on Wikipedia. And it's not
exactly "complicated".
Assuming you already comprehend what a binary tree data structure is, a
heap is just a special case: Every node holds data, and the data in each
node must be less than or equal to the data in both its child nodes.
[Notice there's no ordering constraint on siblings - so the tree need
not be fully ordered, only partially.]
Finding the minimum element is O(1) because, by definition, the minimum
element is always in the root node.
Inserting an element means comparing it to what's in the root node,
putting the smaller of the two back into the root, and propogating the
larger of the two down the tree recursively. [Notice I didn't say which
child node. There are actually several possible algorithms here, each
with different characteristics.]
Deleting an element likewise involves throwing away what's in the root
and moving the smaller of the two child elements up into the root. This
"moving" thereby deletes an element from one of the child nodes, so you
must recurse.
Personally I think Haskell source code says it better, but that's just me.
A Huffman code is a similarly straight-forward idea. When you save a
normal ASCII text file, each character is given an 8-bit code. [But if
you want to be technical, ASCII is a 7-bit code, so the extra bit is
always a 0.] The idea behind Huffman codes is that instead of every
character having an 8-bit code, different characters can have codes of
different lengths.
If you give the most common characters shorter codes [and therefore, by
necessity, give the rarer characters longer codes], then simply
re-encoding your text from ASCII to this new coding system will make
your file smaller! :-D
But how to construct this new coding scheme? And if codes can be
different lengths, how do you figure out where one code ends and the
next begins?
The answer to the second question is that the code must be a "prefix
code". That is, no full code can ever be the prefix of another full
code. (It may be a suffic, but never a prefix.) So if 011 is a valid
code, all on its own, then no other valid code can begin with 011
(although it can END with 011). This means that if the decoder sees 011,
it knows it just read a whole code, and the next bit is a seperate code.
An easy way to build a prefix code is to make each code be a set of
directions from the root of a tree to the leaves. Assuming each leaf
stores a character, each leaf's address will be a prefix code.
So, "all" we have to do is build a tree where the rare characters are
deep in the tree, and the common characters are shallow. We then read
off the tree address for each character and we will have our prefix code.
One obvious way is to take our set of characters and split it into two
halves such that the total probability for each half is roughly 50%.
Then all the characters in one set go on one side of our tree, and all
the characters in the other half go on the other side of our tree.
Recurse until you're done, and you have a nice tree.
This is called a Shannon tree, and it doesn't work very well. It's
actually fairly hard to split a set into two equal-probability sets. (I
have a vague recollection it might be NP-complete. Darren will know.)
The *other* thing you can do is work backwards. The rarest characters
should end up at the bottom of the tree, so let's put those into trees
*first*. And the common characters need to end up near the top, so let's
add those *last*. In other words, pick the two lowest-probability trees,
join them into a single tree, repeat until, like Hilander, there is Only
One.
*This* is called a Huffman tree. And it works much better. So, to summarise:
- Assign a "probability" to each character. [Guess them, estimate them,
count them up, whatever.]
- Build a set of trivial trees each consisting of a leaf containing a
character and its probability.
- Select the two lowest-probability trees, remove them from the set,
join them into a larger tree, insert this back into the set. Repeat
until only 1 tree remains.
- For each ASCII code in your input, replace it with the tree address of
that character in your tree.
- Assuming the probabilities you assigned were correct, the output file
is *guaranteed* to be smaller than or, in the worst case (every
character has equal probability), exactly the same size as the original.
(But note that you probably need to write down what the tree was so the
decoder can reconstruct it! This will make the file fractionally larger.)
Note that *any* prefix code will work. There's nothing "magical" about
Huffman codes. The Huffman algorithm is just an efficient way to build
such a code. If you know that, for example, low numbers are common in
your input and high numbers are rare, you can use a "Fibonacci code",
which is a prefix code with a predefined set of probability assumptions,
so it takes no effort to compute. [It's based on Fibonacci numbers,
hence the name.]
For something like a sound signal, where most samples are near to zero,
this could be a win. (Also, look at Predictive Coding, where you make a
"prediction" for each sample based on some property of the last few
samples, and record the "error" between the prediction and the actual
sample. Assuming your predictions are good, the errors will always be
"small" numbers.)
Huffman codes will work on anything, and they work quite well. There
*is*, however, a thing called "arithmetic coding", which does the same
thing but better. You see, it assigns a *fractional* number of bits to
each character.
That probably sounds completely incomprehensible. Put simply, the
compressed file contains an integral number of bits. (Duh!) But if you
encode 100 identical characters, the output bit length need not be
divisible by 100. You could say the number of bits gets rounded up to
the nearest whole number when you complete the file.
In practice, what this means is that the number of bits per character
can be closer to the binary logarithm of the probability for that
character - the "perfect" bit size. So you get a few percent more
compression this way. OTOH, the algorithm is harder. And patented, I
believe...?
Of course, comming up with better probability estimates will have a much
more dramatic effect on compression. Running a Huffman coder on a normal
text file gives typically 50% compression. But take into account letter
*combination* frequencies and you can go much higher than that.
Theoretically, PPM + arithmetic coding should give the best compression
that money can buy!
You know what? I think I might have just disproven the point I was
trying to make...? <:-,
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> They've sent me a couple of messages telling me I'm "ideal" to work for
> HMGCC. ("HMG" as in "Her Majesty's Government". Scary shit, right there!)
I would have thought so too, I had a couple of friends that seemed to posses
the same skills and interests that you do, and they got jobs there. The pay
isn't that great, but it would be good experience.
> Pfft. It's all stuff that's readily available on Wikipedia. And it's not
> exactly "complicated".
Not to you maybe, but it would be to lots of "IT" people I know. Just
believe me, it's not normal for someone working in IT (and I don't mean
software development) to know about half the stuff you post about (and seem
to understand).
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
Invisible <voi### [at] devnull> wrote:
> Heh. Now tell me something I don't already know. ;-) [Such as how to
> *find* a better job... that'd be useful!]
>
Actually, with respect to finding a better job, the problem has less to do with
*knowing* what to do (which isn't all that complicated) than with resolving to
actually do it. The work-a-day world can can easily leave one feeling drained
and thoroughly disinclined to take on as demanding a project as conducting a
proper job search. Job hunting not only uses up lots of time and effort in the
physical sense, it is also emotionally taxing in that it requires one to face
the likelihood of multiple rejections before achieving any success. It's
rather unlikely that the following list contains anything that you couldn't
have thought of yourself, but it might be useful to have these ideas in front
of you for consideration, so here goes:
1. Don't wait until you NEED a new job to look for a new job. That's how
people end up having to take yet another job that they don't get anything from
but a paycheck.
2. Make a short list NOW of places where you would like to be working in two
years and establish communications with both the human resources department AND
the department(s) that you would like to work for. Be very explicit that your
interest is specific, serious, focused and long-term and that you are NOT
merely inquiring in regard to current posted openings.
3. Create and *maintain* a library of job search support documents such as
resumes in various formats, references, recommendations. college transcripts
and examples of your work so that you can take advantage of any opportunity
INSTANTLY using top quality documents.
4. Always send important communications in BOTH electronic and paper format.
In any given company, it might be either one that will be retained and used to
contact you should an opportunity arise. Always send your resume and cover
letter UNFOLDED and in a presentation cover in a flat 9' x 12' envelope. Making
the effort to introduce yourself with professional looking documents really DOES
matter.
5. Keep records of whom you have contacted and when. Never allow more than
ninety days to go by without renewing a contact. Only make contact more
frequently than this after you have established a relationship with a specific
individual who has expressed a positive interest.
Again, none of this is any great revelation. The real question is whether or not
one is actually DOING it.
Best Regards and Good Hunting,
Mike C.
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
>> They've sent me a couple of messages telling me I'm "ideal" to work
>> for HMGCC. ("HMG" as in "Her Majesty's Government". Scary shit, right
>> there!)
>
> I would have thought so too, I had a couple of friends that seemed to
> posses the same skills and interests that you do, and they got jobs
> there. The pay isn't that great, but it would be good experience.
Heh. When I went to that job interview a few weeks back, the guy said "I
don't think you belong in a commercial enterprise. I think you would be
better as a research associate at a university somewhere." Yeah, cos
research is what I do best...? Wait, WTF? o_O
I'm not too sure about HMGCC. The whole "you will be cleared to a high
level of security clearance" think kinda freaks me out in that whole Big
Brother type way. o_O
Plus, all their software development stuff clearly says all over it
"prior experience of developing large-scale applications is an absolute
requirement". So I guess I fail, right there.
>> Pfft. It's all stuff that's readily available on Wikipedia. And it's
>> not exactly "complicated".
>
> Not to you maybe, but it would be to lots of "IT" people I know. Just
> believe me, it's not normal for someone working in IT (and I don't mean
> software development) to know about half the stuff you post about (and
> seem to understand).
Hmm. Tell me something - all that stuff I just posted? Does it make any
semblance of comprehensible sense?
To *me* it makes sense, but then *I* wrote it. And I'm aware that often
an expert will try to explain something and completely forget some vital
item of information that the audience doesn't have but is so "obvious"
to the expert that they forget to mention it...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
> Actually, with respect to finding a better job, the problem has less to do
> with
> *knowing* what to do (which isn't all that complicated) than with
> resolving to
> actually do it. The work-a-day world can can easily leave one feeling
> drained
> and thoroughly disinclined to take on as demanding a project as conducting
> a
> proper job search.
"Take time to sharpen your saw"
I forgot who told me that, it was probably quoted from some management fluff
book, but it's true. It's better to take a break from your routine and try
to optimise it, it will be beneficial in the long run, even if it means you
don't get as much work done that day. And by optimise, it could just be
writing a program in Haskell to speed up some repetitive task, reorganising
your entire workload, or even looking for a new job.
I'm always quoting that to our Japanese colleagues because they seem to work
such long hours so inefficiently, and often give the "we don't have time"
excuse for not doing something.
> 2. Make a short list NOW of places where you would like to be working in
> two
> years and establish communications with both the human resources
> department AND
> the department(s) that you would like to work for.
This link may help, if you can't think of enough places:
http://www.timesonline.co.uk/tol/life_and_style/career_and_jobs/best_100_companies/best_100_tables/
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
| |
|
|
>> Heh. Now tell me something I don't already know. ;-) [Such as how to
>> *find* a better job... that'd be useful!]
>
> Actually, with respect to finding a better job, the problem has less to do with
> *knowing* what to do (which isn't all that complicated) than with resolving to
> actually do it.
Probably not complicated if you know what you're doing. I haven't got a
clue. Sure, I've tried the obvious things, but those haven't worked. So
now I'm kinda not sure what's next.
> The work-a-day world can can easily leave one feeling drained
> and thoroughly disinclined to take on as demanding a project as conducting a
> proper job search.
Yeah, that's true enough. Sometimes I struggle to find the motivation to
do stuff I *like* doing, damnit! (It can be very hard to try to motivate
yourself to do stuff when you know nobody is actually going to care one
way or the other.)
Actually, I've been thinking about maybe taking a week off work so I can
spend some time doing this "for real". Unfortunately every place of
business shuts 20 minutes before I get home. [Quite maddening really!]
If I had a week, I'd be able to go places and speak to people and stuff...
> 1. Don't wait until you NEED a new job to look for a new job.
Heh. I'm tempted to leave this job right NOW, even though I have nowhere
else to go! But clueless as this company is, they've been around for 20
years, and I don't suppose they're going anywhere real soon. They'll
just continue to blunder on, making spectacular financial losses, as
they always have.
> 2. Make a short list NOW of places where you would like to be working in two
> years and establish communications with both the human resources department AND
> the department(s) that you would like to work for. Be very explicit that your
> interest is specific, serious, focused and long-term and that you are NOT
> merely inquiring in regard to current posted openings.
OK, I fail here. I cannot think of *any* company I'd actually want to
work for.
I mean, theoretically working for somebody like Yamaha or Korg or even
Nokia "sounds" realy neat. But actually? I have no clue who would really
be worth working with. My dad works in a laboratory, which "sounds"
really cool, but actually... isn't.
I did look into a list of big-name technology companies to see where
they have bases of employment. IIRC, Nokia, HP and Erricson have big UK
bases, but they're nowhere near where I live. Yamaha has a big place in
my city, but I think that's manufacturing not R&D. Korg has a base here
too, but I couldn't find any data without speaking to a human. That's
about as far as I got.
> 3. Create and *maintain* a library of job search support documents such as
> resumes in various formats, references, recommendations. college transcripts
> and examples of your work so that you can take advantage of any opportunity
> INSTANTLY using top quality documents.
I have an up-to-date CV. Spent lots of money on consultants to help me
sharpen it. It's currently available online at Monster.co.uk. Every now
and then some random agency contacts me, asks me a few questions, and
then is never heard from again.
It says on my CV "references available on request", but it strikes me
taht I don't actually *have* anybody I could point them to. Nobody knows
me! All the management in my current place of work recently changed, so
nobody in management here really knows anything about me. My university
shut down when I graduated. I don't have any "friends" or anything...
As for examples of my work... it strikes me that I've almost never
produced a "finished" program in my life. That's not a good thing. And
obviously, nobody is going to employ me to program in Haskell; they'll
want C, C++, Java or VB (or maybe some scripting language). I haven't
touched any of those in years now.
And then there's the minor detail that although I know a lot of stuff
about stuff... how many people actually need to know what a Huffman tree
is? None. Nobody needs to know this. It serves no useful purpose. And
Warp is probably right: I'm not a very good programmer.
> 4. Always send important communications in BOTH electronic and paper format.
You can't do that without a postal address. (And stamps. Do you have any
idea how hard it is to purchase stamps?!)
> 5. Keep records of whom you have contacted and when. Never allow more than
> ninety days to go by without renewing a contact. Only make contact more
> frequently than this after you have established a relationship with a specific
> individual who has expressed a positive interest.
Fair enuf...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
|
|