Nicolas Alvarez wrote:
> I would do it with PHP (outside a webserver), because I've written many
> scraping scripts that way. It's easy to parse HTML with PHP's DOM and
> loadHTML, which handles all the bad syntax for you.
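(For reference, the DOM/loadHTML approach he describes looks roughly like
this; the URL and the XPath query are placeholders I made up:)

<?php
// Rough sketch of scraping with PHP's DOM; URL and query are invented.
$html = file_get_contents('http://example.com/page.html');

$doc = new DOMDocument();
libxml_use_internal_errors(true);  // don't spew warnings on bad markup
$doc->loadHTML($html);             // loadHTML recovers from broken HTML
libxml_clear_errors();

// Pull out whatever you're after; here, every link on the page.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a[@href]') as $link) {
    echo $link->getAttribute('href'), "\n";
}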
As long as you start a new process for each page, you'll be OK. From
what I can tell, PHP never, ever deallocates memory. Try walking through
and processing a 600-megaline database table in CLI PHP, and you'll
regret it.
You could write a script that gathers the URLs (or runs wget), then
iterates over the resulting files, running one PHP process per file.
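A rough sketch of that pattern (urls.txt, pages/, and parse_page.php are
invented names):

<?php
// Fetch everything first, then hand each file to a fresh PHP process,
// so whatever memory one page chews up is released when that process
// exits.
system('wget -i urls.txt -P pages/');
foreach (glob('pages/*') as $file) {
    system('php parse_page.php ' . escapeshellarg($file));
}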
Or use Tcl, which is what I did.
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.