gathering infos from web pages

From: Phil Cook
Subject: Re: gathering infos from web pages
Date: 21 Nov 2007 11:44:05
Message: <op.t15hvnhqc3xi7v@news.povray.org>
And lo on Wed, 21 Nov 2007 13:51:37 -0000, Fa3ien  
<fab### [at] yourshoesskynetbe> did spake, saying:

<snip>
> I thought 'well, just do some javascript, put the content of the url in
> an iframe, read it, and act accordingly'.  Done that. It doesn't work.
> Why ?  The XMLHTTPRequest function, which is used to put the content of
> the iframe in a string, is prohibited (in every browser in existence)
> from working with content from another domain. Ouch !

If it's any help, I know IE6 didn't have this security restriction, but
that 'hole' may have been plugged by now.

> I found some GreaseMonkey script which claimed to allow bypassing this
> "cross-domain policy", but it didn't work.
>
> So I'm still at the start of this seemingly simple project.  I'm  
> currently thinking of getting the pages with WGET, but can I pilot WGET  
> from Javascript ? Or should I try another language ?  Or a completely  
> different path ?

Depends on what you've got to work with and how it's going to be applied. If
you've got a PHP server then, as Gilles said, that's your best bet; otherwise
you're running a 'script' directly.

-- 
Phil Cook

--
I once tried to be apathetic, but I just couldn't be bothered
http://flipc.blogspot.com


From: Darren New
Subject: Re: gathering infos from web pages
Date: 21 Nov 2007 11:48:44
Message: <4744616c$1@news.povray.org>
Nicolas Alvarez wrote:
> I would do it with PHP (outside a webserver), because I've done many
> scraping scripts that way. It's easy to parse HTML with PHP's DOM and
> loadHTML, which handles all the bad syntax for you.

As long as you start a new process for each page, you'll be OK. From 
what I can tell, PHP never, ever deallocates memory.  Try walking thru 
and processing a 600-megaline database table in CLI PHP, and you'll 
regret it.

You could write one script that sucks up URLs (or runs wget), then iterate
over the resulting files with one PHP process each, or something.
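
A rough, untested sketch of the per-file half of that (the file names and
the link-dumping part are just placeholders):

<?php
// parse-one-page.php -- run once per downloaded file, e.g. from a shell loop:
//   for f in pages/*.html; do php parse-one-page.php "$f"; done
// Each run is a fresh process, so whatever memory PHP hangs onto dies with it.

$file = $argv[1];

$doc = new DOMDocument();
// loadHTML() copes with the usual broken markup; @ hides the warnings it emits
@$doc->loadHTML(file_get_contents($file));

// placeholder processing: dump every link and its text
foreach ($doc->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href'), "\t", trim($a->textContent), "\n";
}
?>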

Or use Tcl, which is what I did.

-- 
   Darren New / San Diego, CA, USA (PST)
     It's not feature creep if you put it
     at the end and adjust the release date.


From: Nicolas Alvarez
Subject: Re: gathering infos from web pages
Date: 21 Nov 2007 12:17:25
Message: <47446825@news.povray.org>

> As long as you start a new process for each page, you'll be OK. From 
> what I can tell, PHP never, ever deallocates memory.  Try walking thru 
> and processing a 600-megaline database table in CLI PHP, and you'll 
> regret it.

I never had any problem with that; I left my spam-delete.php running for
more than 24 hours and its memory usage didn't grow.

Are you calling mysql_free_result? :)

"mysql_free_result() only needs to be called if you are concerned about 
how much memory is being used for queries that return large result sets. 
All associated result memory is automatically freed *at the end of the 
script's execution*." (emphasis mine)
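
So if a long-lived CLI script is the worry, freeing each chunk yourself
should keep it flat. A rough, untested sketch (table, columns and
credentials all invented):

<?php
// walk-table.php -- sketch only; table, columns and credentials are invented
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('mydb', $db);

$offset = 0;
$chunk  = 10000;
while (true) {
    $res = mysql_query("SELECT id, body FROM messages LIMIT $offset, $chunk", $db);
    if (mysql_num_rows($res) == 0) {
        mysql_free_result($res);
        break;
    }
    while ($row = mysql_fetch_assoc($res)) {
        // ... process $row ...
    }
    mysql_free_result($res);   // free each chunk instead of waiting for script exit
    $offset += $chunk;
    echo "offset $offset, memory " . memory_get_usage() . "\n";
}
?>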


From: Tor Olav Kristensen
Subject: Re: gathering infos from web pages
Date: 21 Nov 2007 19:18:18
Message: <4744caca@news.povray.org>
Fa3ien wrote:
...
> So I'm still at the start of this seemingly simple project.  I'm
> currently thinking
> of getting the pages with WGET, but can I pilot WGET from Javascript ?

You may also want to look at cURL:

http://en.wikipedia.org/wiki/CURL
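
libcurl is also usable from inside PHP via the curl extension, if that's
where the rest of the script ends up. A minimal, untested fetch with a
made-up URL:

<?php
// fetch.php -- minimal sketch using PHP's curl extension; the URL is just an example
$ch = curl_init('http://www.example.com/page.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // hand the body back instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
$html = curl_exec($ch);
if ($html === false) {
    die("fetch failed: " . curl_error($ch) . "\n");
}
curl_close($ch);
file_put_contents('page.html', $html);
?>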


Other useful tools:

sed, awk, grep and others

http://tldp.org/LDP/abs/html/textproc.html


> Or should
> I try another language ?  Or a completely different path ?

bash, perl, tcl/tk ?

http://en.wikipedia.org/wiki/Bash
http://en.wikipedia.org/wiki/Perl
http://en.wikipedia.org/wiki/Tcl

-- 
Tor Olav
http://subcube.com


From: Jim Henderson
Subject: Re: gathering infos from web pages
Date: 22 Nov 2007 00:45:47
Message: <4745178b$1@news.povray.org>
Use a bash script to pull the pages down to a local webserver, manipulate 
and serve up from there instead?
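
Roughly the same idea in PHP, if you'd rather not do it in the shell; the
URL list and the docroot path below are made up:

<?php
// mirror.php -- sketch of "pull the pages down locally, then work on the copies";
// the URL list and the docroot path are invented
$urls = array(
    'http://www.example.com/stuff/page1.html',
    'http://www.example.com/stuff/page2.html',
);
foreach ($urls as $url) {
    $html = file_get_contents($url);   // needs allow_url_fopen, which is on by default
    if ($html === false) {
        echo "skipped $url\n";
        continue;
    }
    file_put_contents('/var/www/mirror/' . basename($url), $html);
}
?>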

Jim


From: Warp
Subject: Re: gathering infos from web pages
Date: 22 Nov 2007 08:45:37
Message: <47458801@news.povray.org>
Jim Henderson <nos### [at] nospamcom> wrote:
> Use bash

  Btw, I always wonder why it seems that 100% of people out there who
use a shell use bash. I know it's because it's (for whatever reason)
the default shell in all linux distros, but I just can't understand
what's so great about bash.

  Personally I use zsh and there are so many handy features which I can't
find in bash that I really don't understand the popularity of the latter.
Bash isn't even 100% sh-compatible.

-- 
                                                          - Warp


From: Invisible
Subject: Re: gathering infos from web pages
Date: 22 Nov 2007 09:12:03
Message: <47458e33$1@news.povray.org>
Warp wrote:

>   Btw, I always wonder why it seems that 100% of people out there who
> use a shell use bash. I know it's because it's (for whatever reason)
> the default shell in all linux distros, but I just can't understand
> what's so great about bash.
> 
>   Personally I use zsh and there are so many handy features which I can't
> find in bash that I really don't understand the popularity of the latter.
> Bash isn't even 100% sh-compatible.

I don't understand why 60% of the packages on the Haskell package site 
use bash scripts as part of their installation process. (As you can 
probably anticipate, this means that you can't install those on 
Windoze.) It's actually rather annoying... We develop a powerful 
cross-platform high-level programming language, and people insist on 
using bash scripts. Gah! >_<


From: Jim Henderson
Subject: Re: gathering infos from web pages
Date: 22 Nov 2007 15:46:50
Message: <4745eaba$1@news.povray.org>
On Thu, 22 Nov 2007 08:45:37 -0500, Warp wrote:

>   Btw, I always wonder why it seems that 100% of people out there who
> use a shell use bash. I know it's because it's (for whatever reason) the
> default shell in all linux distros, but I just can't understand what's
> so great about bash.

Personally, I use tcsh, but I've often wondered this as well.

Jim


From: Darren New
Subject: Re: gathering infos from web pages
Date: 23 Nov 2007 11:42:10
Message: <474702e2$1@news.povray.org>
Nicolas Alvarez wrote:
> Are you calling mysql_free_result? :)

Actually, I think the problem was that I was reading a CSV file and
outputting SQL text that I'd later feed in with "mysql < xyz.sql", that
sort of thing.

> "mysql_free_result() only needs to be called if you are concerned about 
> how much memory is being used for queries that return large result sets. 
> All associated result memory is automatically freed *at the end of the 
> script's execution*." (emphasis mine)

I'm pretty sure the problem was that local variables weren't getting
freed; nothing to do with actual mysql accesses. If I still have the
original code, I'll check what it was doing that might have been leaking.
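
A stripped-down sketch of that kind of converter (column names invented),
streaming the CSV and printing each INSERT as it goes, so in theory nothing
should pile up:

<?php
// csv2sql.php -- stripped-down sketch; column names are invented
// usage: php csv2sql.php data.csv > data.sql && mysql mydb < data.sql
$fh = fopen($argv[1], 'r');
while (($row = fgetcsv($fh)) !== false) {
    list($name, $email, $score) = $row;
    printf("INSERT INTO people (name, email, score) VALUES ('%s', '%s', %d);\n",
           addslashes($name), addslashes($email), (int)$score);
    // each row is printed straight away; nothing accumulates between iterations
}
fclose($fh);
?>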

-- 
   Darren New / San Diego, CA, USA (PST)
     It's not feature creep if you put it
     at the end and adjust the release date.


From: Ross
Subject: Re: gathering infos from web pages
Date: 26 Nov 2007 15:17:29
Message: <474b29d9$1@news.povray.org>
"Invisible" <voi### [at] devnull> wrote in message
news:47444787$1@news.povray.org...
> Fa3ien wrote:
>
> >> Obviously I recommend Haskell for this task - and, obviously, you're
> >> going to say no. ;-)
> >
> > How did you guess ?
>
> Mmm, because everybody hates Haskell? ;-)
>
> >> That being the case, I'm pretty confident that Perl / Python / Ruby /
> >> Tcl / any of those hackish scripting languages will have a library
> >> that makes this reasonably easy.
> >
> > I'm tempted to get a hand on Ruby, for various reasons. Maybe I can
> > do it in Lisp... At first, I rejected the idea because it would
> > need AutoCAD, but, no, there might be some free LISP intepreter,
> > I should check.
>
> I'm pretty sure I looked into this myself, and found that there are
> indeed free Common Lisp interpreters out there.

Most are compilers these days, running in an interactive environment. Anyway,
try out SBCL, CLisp, or OpenMCL depending on your platform. See
http://gigamonkeys.com/book/lispbox/ for a nice package of several
distributions.

>
> (And there's always emacs... bahahaha!)

emacs has elisp, a viable lisp dialect. Not sure where the humor is; emacs
is one of the most robust pieces of end-user software. Guess what it's
written in...

It's so hard to ignore the troll.

