On 21 Jun 2000 19:33:33 -0400, Ron Parker wrote:
>On Thu, 22 Jun 2000 01:47:19 +0300, Peter Popov wrote:
>>On 21 Jun 2000 09:11:59 -0400, ron### [at] povray org (Ron Parker)
>>wrote:
>>
>>>>How about the ability to do a search on the newsgroups that are held on the
>>>>server.
>>>
>>>That's one of my on-again, off-again projects, but I haven't yet found an
>>>indexing engine that can handle that much data efficiently.
>>
>>You know the saying, if you can't find one... :)
>
>Easier said than done. I found one, actually, that seems like it can do
>the job, but when it's trying to index the more active groups like this
>one it dies when the indexing process reaches 64M of memory. I have 64M
>of physical memory and 128M of swap, but it never seems to touch swap.
>I haven't figured that out yet.
(following up to myself)

I finally got it figured out. I now have word indexes of all of the
newsgroups on this server, including articles on off-topic that are long
gone. The collection is 18 megs, zipped (about 50 megs, unzipped) and
current as of 2 AM EST yesterday morning. I still have to figure out how
to do incremental updates, but that doesn't look too hard.

Now... what do I do with them? I don't know of any free web hosts that
allow custom CGI and will let me host 50 megs of indexes plus the swish-e
executable. Any suggestions? I have DSL, but it's only 128 kbit/s upstream
and my ISP doesn't let me run a server, so I'm stuck with external
solutions. For the best results, I'd also have to have a database of
names, dates, subjects, and messageIDs so I could look up info on the
results without having to hit the news server. That's easy enough to
build, but it'd take up lots of space too.
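
For what it's worth, the lookup database described above could be sketched roughly like this. It is only an illustration under assumptions: SQLite (via Python's standard sqlite3 module) is a stand-in and not something the post says was used, and the table name, column names, and helper functions are all hypothetical. The idea is just that the search engine returns message IDs, and the names, dates, and subjects come from a local table instead of the news server.

```python
import sqlite3


def open_overview_db(path=":memory:"):
    """Open (or create) the hypothetical article-metadata database."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # lets rows be converted to dicts
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               message_id TEXT PRIMARY KEY,  -- e.g. "<abc@example.invalid>"
               newsgroup  TEXT NOT NULL,
               author     TEXT NOT NULL,
               posted     TEXT NOT NULL,     -- date string as stored by the indexer
               subject    TEXT NOT NULL
           )"""
    )
    return conn


def add_article(conn, message_id, newsgroup, author, posted, subject):
    """Store one article's metadata, replacing any earlier row for that ID."""
    conn.execute(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?, ?, ?)",
        (message_id, newsgroup, author, posted, subject),
    )
    conn.commit()


def describe_hits(conn, message_ids):
    """Return metadata for search hits, in hit order, skipping unknown IDs."""
    results = []
    for message_id in message_ids:
        row = conn.execute(
            "SELECT * FROM articles WHERE message_id = ?", (message_id,)
        ).fetchone()
        if row is not None:
            results.append(dict(row))
    return results
```

A search front end would call describe_hits() with the message IDs the index returns and print author, date, and subject for each one. Keying on the message ID keeps the table independent of article numbers, which matters for articles that are already gone from the server.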
--
Ron Parker http://www2.fwi.com/~parkerr/traces.html
My opinions. Mine. Not anyone else's.