Invisible wrote:
> More to the point, there's masses of tricky parsing to wade through all
> the presentational HTML to extract the actual raw data I'm after.
That's because you're getting the results in HTML instead of "via an API".
That's what people are talking about when they talk about web APIs:
presenting data without making you parse it.
"REST" is a pattern for doing this in a way that *also* lets you
reverse-engineer the protocol by poking at it with a web browser, and which
is theoretically kind to intermediate proxies.
"SOAP" is a pattern for doing this in a way that lets you publish the
specification of the interface in a form from which a tool can generate code
to decode the data into whatever native data structures your language has.
What you're doing is called "screen scraping", and yes, it breaks when the
format of the web page changes, which happens often when there are so many
people doing screen scraping that it starts to impact the actual customers
of the web site.
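
To make the difference concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the two HTML pages, the JSON body, and the function names scrape_price and api_price are all hypothetical, and a real site or API would look different. The only point is that the scraper is tied to the presentational markup, while the API consumer only depends on the field name.

```python
import json
import re

# Hypothetical page, before and after a purely cosmetic redesign.
OLD_PAGE = '<table><tr><td class="price"><b>$12.50</b></td></tr></table>'
NEW_PAGE = '<div class="cost"><span>$12.50</span></div>'

# Hypothetical API response carrying the same data.
API_BODY = '{"item": "widget", "price": 12.5}'


def scrape_price(html):
    """Screen scraping: pattern-match against presentational markup."""
    match = re.search(r'<td class="price"><b>\$([0-9.]+)</b></td>', html)
    return float(match.group(1)) if match else None


def api_price(body):
    """Web API: the data arrives already structured, so no markup to wade through."""
    return json.loads(body)["price"]


print(scrape_price(OLD_PAGE))  # scraper finds the price in the old layout
print(scrape_price(NEW_PAGE))  # same data, new layout: the pattern no longer matches
print(api_price(API_BODY))     # unaffected by how the site chooses to look
```

The price is identical on both pages, but the scraper comes back empty-handed on the redesigned one, because it was matching the table cell and bold tag rather than the data itself.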
--
Darren New, San Diego CA, USA (PST)
There's no CD like OCD, there's no CD I knoooow!