And lo on Wed, 21 Nov 2007 13:51:37 -0000, Fa3ien
<fab### [at] yourshoesskynetbe> did spake, saying:
<snip>
> I thought 'well, just do some javascript, put the content of the url in
> an iframe, read it, and act accordingly'. Done that. It doesn't work.
> Why? The XMLHTTPRequest function, which is used to put the content of
> the iframe into a string, is prohibited (in any browser in existence)
> from working with content from another domain. Ouch!
If it's any help, I know IE6 didn't have this security restriction, but
that 'hole' may have been plugged by now.
> I found some GreaseMonkey script which claimed to allow bypassing this
> "cross-domain policy", but it didn't work.
>
> So I'm still at the start of this seemingly simple project. I'm
> currently thinking of getting the pages with WGET, but can I pilot WGET
> from Javascript? Or should I try another language? Or a completely
> different path?
It depends what you've got to work with and how it's going to be applied. If
you've got a PHP server then, as Gilles said, that's your best bet; otherwise
you're running a 'script' directly.
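Presumably the PHP route means having your own server fetch the remote
pages, so the browser's XMLHTTPRequest only ever talks to your domain. A
minimal sketch of that kind of fetcher (the script name, the allowed site,
and allow_url_fopen being enabled are all assumptions on my part, not
anything from Gilles' post):

<?php
// fetch.php (hypothetical) - same-domain fetcher so the page's Javascript
// never has to make a cross-domain XMLHTTPRequest itself.
// Assumes allow_url_fopen is enabled in php.ini.
$url = $_GET['url'];

// Only pass through the one site you actually want to scrape.
if (strpos($url, 'http://www.example.com/') !== 0) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}

header('Content-Type: text/html; charset=utf-8');
echo file_get_contents($url);
?>

The Javascript then requests fetch.php?url=... on your own server and
reads the result as usual.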
--
Phil Cook
--
I once tried to be apathetic, but I just couldn't be bothered
http://flipc.blogspot.com
Nicolas Alvarez wrote:
> I would do it with PHP (outside a webserver), because I've done many
> scraping scripts that way. It's easy to parse HTML with PHP's DOM and
> loadHTML; it handles all the bad syntax for you.
As long as you start a new process for each page, you'll be OK. From
what I can tell, PHP never, ever deallocates memory. Try walking thru
and processing a 600-megaline database table in CLI PHP, and you'll
regret it.
You could write one script that sucks up the URLs (or runs wget), then
iterates over the resulting files with one PHP process each, or something.
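A rough sketch of the one-PHP-process-per-file idea (the file name and the
"dump all the links" bit are just placeholders):

<?php
// process_one.php (hypothetical) - run once per downloaded file, e.g. from
// a shell loop, so every page gets a fresh PHP process and its memory goes
// back to the OS when the script exits.
$file = $argv[1];

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely well-formed
$doc->loadHTMLFile($file);          // copes with tag soup

foreach ($doc->getElementsByTagName('a') as $a) {
    echo $a->getAttribute('href'), "\t", trim($a->textContent), "\n";
}
?>

Grab the pages first with wget, then loop over the resulting *.html files,
calling the script once per file.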
Or use Tcl, which is what I did.
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.
> As long as you start a new process for each page, you'll be OK. From
> what I can tell, PHP never, ever deallocates memory. Try walking thru
> and processing a 600-megaline database table in CLI PHP, and you'll
> regret it.
I never had any problem with that; I left my spam-delete.php running for
more than 24 hours and its memory usage didn't grow.
Are you calling mysql_free_result? :)
"mysql_free_result() only needs to be called if you are concerned about
how much memory is being used for queries that return large result sets.
All associated result memory is automatically freed *at the end of the
script's execution*." (emphasis mine)
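For a long-running CLI script chewing through a big table, the pattern would
be roughly this (connection details, table and column names are invented for
the example):

<?php
// Walk a large table in chunks, freeing each result set explicitly
// instead of waiting for the automatic cleanup at end of script.
$link = mysql_connect('localhost', 'user', 'password');
mysql_select_db('mydb', $link);

for ($offset = 0; ; $offset += 1000) {
    $res = mysql_query("SELECT id, body FROM pages LIMIT $offset, 1000", $link);
    if (!$res || mysql_num_rows($res) == 0) {
        break;
    }
    while ($row = mysql_fetch_assoc($res)) {
        // ... process $row ...
    }
    mysql_free_result($res);   // hand the memory back now, not at exit
}
mysql_close($link);
?>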
From: Tor Olav Kristensen
Subject: Re: gathering infos from web pages
Date: 21 Nov 2007 19:18:18
Message: <4744caca@news.povray.org>
Fa3ien wrote:
...
> So I'm still at the start of this seemingly simple project. I'm
> currently thinking of getting the pages with WGET, but can I pilot WGET
> from Javascript?
You may also want to look at cURL:
http://en.wikipedia.org/wiki/CURL
Other useful tools:
sed, awk, grep and others
http://tldp.org/LDP/abs/html/textproc.html
> Or should
> I try another language? Or a completely different path?
bash, perl, tcl/tk?
http://en.wikipedia.org/wiki/Bash
http://en.wikipedia.org/wiki/Perl
http://en.wikipedia.org/wiki/Tcl
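And if you end up driving things from PHP, as suggested earlier in the
thread, the same library behind the curl command is available as PHP's curl
extension; a minimal fetch looks roughly like this (the URL is just a
placeholder):

<?php
// Fetch one page with the curl extension and keep the body in a string.
$ch = curl_init('http://www.example.com/page.html');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
$html = curl_exec($ch);
if ($html === false) {
    fwrite(STDERR, curl_error($ch) . "\n");
}
curl_close($ch);
?>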
--
Tor Olav
http://subcube.com
Use a bash script to pull the pages down to a local webserver, then
manipulate and serve them up from there instead?
Jim
Jim Henderson <nos### [at] nospamcom> wrote:
> Use bash
Btw, I always wonder why it seems that 100% of people out there who
use a shell use bash. I know it's because it's (for whatever reason)
the default shell in all linux distros, but I just can't understand
what's so great about bash.
Personally I use zsh and there are so many handy features which I can't
find in bash that I really don't understand the popularity of the latter.
Bash isn't even 100% sh-compatible.
--
- Warp
Warp wrote:
> Btw, I always wonder why it seems that 100% of people out there who
> use a shell use bash. I know it's because it's (for whatever reason)
> the default shell in all linux distros, but I just can't understand
> what's so great about bash.
>
> Personally I use zsh and there are so many handy features which I can't
> find in bash that I really don't understand the popularity of the latter.
> Bash isn't even 100% sh-compatible.
I don't understand why 60% of the packages on the Haskell package site
use bash scripts as part of their installation process. (As you can
probably anticipate, this means that you can't install those on
Windoze.) It's actually rather annoying... We develop a powerful
cross-platform high-level programming language, and people insist on
using bash scripts. Gah! >_<
On Thu, 22 Nov 2007 08:45:37 -0500, Warp wrote:
> Btw, I always wonder why it seems that 100% of people out there who
> use a shell use bash. I know it's because it's (for whatever reason) the
> default shell in all linux distros, but I just can't understand what's
> so great about bash.
Personally, I use tcsh, but I've often wondered this as well.
Jim
Nicolas Alvarez wrote:
> Are you calling mysql_free_result? :)
Actually, I think the problem was that I was reading a CSV file and
outputting SQL text that I'd later load with "mysql < xyz.sql", that sort
of thing.
> "mysql_free_result() only needs to be called if you are concerned about
> how much memory is being used for queries that return large result sets.
> All associated result memory is automatically freed *at the end of the
> script's execution*." (emphasis mine)
I'm pretty sure the problem was that local variables weren't getting
freed, nothing to do with actual mysql accesses. I'll see if I still
have the original code and check what it was doing that might have been leaking.
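The shape of the thing was presumably something like this (the file name,
table name and escaping are invented for the example); each pass through the
loop ought to release its variables, so it would be interesting to see where
it actually grew:

<?php
// Hypothetical CSV-to-SQL converter of the kind described above: read rows
// with fgetcsv() and write INSERT statements for a later "mysql < xyz.sql".
$in  = fopen('data.csv', 'r');
$out = fopen('xyz.sql', 'w');

while (($row = fgetcsv($in)) !== false) {
    $values = array_map('addslashes', $row);
    fwrite($out, "INSERT INTO mytable VALUES ('" . implode("','", $values) . "');\n");
    unset($row, $values);   // in theory, each iteration's variables are freed here
}

fclose($in);
fclose($out);
?>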
--
Darren New / San Diego, CA, USA (PST)
It's not feature creep if you put it
at the end and adjust the release date.
"Invisible" <voi### [at] devnull> wrote in message
news:47444787$1@news.povray.org...
> Fa3ien wrote:
>
> >> Obviously I recommend Haskell for this task - and, obviously, you're
> >> going to say no. ;-)
> >
> > How did you guess?
>
> Mmm, because everybody hates Haskell? ;-)
>
> >> That being the case, I'm pretty confident that Perl / Python / Ruby /
> >> Tcl / any of those hackish scripting languages will have a library
> >> that makes this reasonably easy.
> >
> > I'm tempted to get a hand on Ruby, for various reasons. Maybe I can
> > do it in Lisp... At first, I rejected the idea because it would
> > need AutoCAD, but, no, there might be some free LISP intepreter,
> > I should check.
>
> I'm pretty sure I looked into this myself, and found that there are
> indeed free Common Lisp interpreters out there.
Most are compilers these days, wrapped in an interactive environment. Anyway,
try out SBCL, CLisp, or OpenMCL depending on your platform. See
http://gigamonkeys.com/book/lispbox/ for a nice package of several
distributions.
>
> (And there's always emacs... bahahaha!)
Emacs has elisp, a viable Lisp dialect. Not sure where the humor is; emacs
is one of the most robust pieces of end-user software. Guess what it's
written in...
It's so hard to ignore the troll.