POV-Ray: Newsgroups: povray.off-topic: The trouble with XSLT



From: Le Forgeron
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 10:11:01
Message: <4f465705$1@news.povray.org>

On 23/02/2012 14:54, Warp wrote:

>   Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?
> 

Tsss... no cake for you.

0xFEFF is the UTF-16 encoding of the BOM (byte order mark). It is used to
signal endianness with UTF-16 (because the byte-swapped value 0xFFFE is
not valid UTF-16; indeed U+FFFE will never be an assigned character).

Encoding U+FEFF in utf-8:
 * has no purpose, there is no endianness to detect for utf-8 encoding
(but it is legit to have a BOM in utf-8)
 * would be done as 3 bytes: 0xEF 0xBB 0xBF
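
A quick way to check these byte sequences is to encode U+FEFF directly. This is only a sketch, assuming Python 3.8 or later (for `bytes.hex` with a separator); the variable names are made up for illustration.

```python
# Encode the single code point U+FEFF in three Unicode encodings.
BOM = "\ufeff"

utf16_be = BOM.encode("utf-16-be")  # explicit byte order, so no extra BOM is added
utf16_le = BOM.encode("utf-16-le")
utf8 = BOM.encode("utf-8")

print(utf16_be.hex(" "))  # fe ff
print(utf16_le.hex(" "))  # ff fe
print(utf8.hex(" "))      # ef bb bf
```

The UTF-8 form is the same on every machine; only the UTF-16 forms differ with byte order.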

BOM can also be useful when using UTF-32 (and other esoteric encodings
of Unicode, such as utf-7, or utf-ebcdic, utf-1 (misnamed, IMHO), ... )

Notice that U+FEFF is deprecated as zero-width non-breaking space.
You should use U+2060 (word joiner, zero-width, non-breaking),
and/or U+200B (zero-width space, but breaking). At least as of Unicode 6.1.

>   Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?
> 
That's the reason utf-16/32 need a BOM for automatic detection.
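
Automatic detection of this kind could look roughly like the sketch below. It assumes Python; `sniff_bom` and `decode_with_bom` are hypothetical helper names, and falling back to UTF-8 when no BOM is found is an assumption of this example, not something the standard mandates.

```python
import codecs

# Longer BOMs are tested first: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def decode_with_bom(data, default="utf-8"):
    """Strip a leading BOM, if any, and decode with the encoding it signals."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)

# Reading UTF-16 with the wrong byte order gives garbage, as Warp suspects:
garbled = "A".encode("utf-16-le").decode("utf-16-be")  # U+4100, not "A"
```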


From: clipka
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 11:49:39
Message: <4f466e23$1@news.povray.org>

On 23.02.2012 14:54, Warp wrote:
> clipka<ano### [at] anonymousorg>  wrote:
>> (2) The editor does not expect a leading BOM in UTF-8; in that case, it
>> /must/ treat it according to the Unicode standard, which explicitly
>> states that the BOM is actually a perfectly valid normal character,
>> which just happens to be one of the many space characters, non-breaking
>> in this case, with zero width; so you're perfectly safe here as well,
>> unless you accidentally strip it from the very beginning of the file.
>
>    Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?

If you're talking about codepoint, then obviously yes.

If you're talking about encoded byte sequence, then no; in UTF-16, it 
would be encoded as xFE xFF or xFF xFE respectively, while in UTF-8 it 
would always be encoded as xEF xBB xBF.

>    Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?

I would be surprised to find an editor supporting UTF-16 but not the 
BOM. As for UTF-8, the byte order for multi-byte characters (i.e. 
codepoints x0100 and above) is unambiguously defined by the standard 
(using a big-endian-ish encoding); as UTF-8 requires bit shifting 
anyway, a byte-reversed encoding would provide no benefit for 
little-endian machines and therefore doesn't exist.
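
To make the "big-endian-ish" bit shifting concrete, here is a hand-written UTF-8 encoder. It is a sketch in Python for illustration only: `utf8_encode` is a made-up name, and the function does not reject surrogates or out-of-range values as a real encoder would.

```python
def utf8_encode(cp):
    """Encode one code point as UTF-8, most significant bits first."""
    if cp < 0x80:                       # 1 byte: plain ASCII
        return bytes([cp])
    if cp < 0x800:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 4 bytes: 11110xxx then three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

print(utf8_encode(0xFEFF).hex(" "))  # ef bb bf
```

The lead byte always carries the highest bits, so the byte sequence is fixed regardless of the machine's native byte order.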


From: clipka
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 11:55:51
Message: <4f466f97$1@news.povray.org>

On 23.02.2012 17:49, clipka wrote:

> BOM. As for UTF-8, the byte order for multi-byte characters (i.e.
> codepoints x0100 and above) is unambiguously defined by the standard

Strike "x0100", replace with "x0080".


From: Warp
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 12:35:03
Message: <4f4678c5@news.povray.org>

clipka <ano### [at] anonymousorg> wrote:
> As an alternative, forget UTF-8 and go for UTF-16.

  UTF-16 is more compact if the text consists mostly of non-ASCII
characters, especially if it contains e.g. kanji symbols, hiragana, etc.
(The vast majority of Japanese kanji can be expressed with 2 bytes using
UTF-16 but require 3 bytes with UTF-8.)

  However, if the text consists mostly of ascii characters, such as
English usually does, then UTF-8 is more compact than UTF-16 (which will
basically double the size of the file).
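
The size difference is easy to measure. The sketch below assumes Python; the two sample strings and the helper name `encoded_sizes` are invented for this example.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, not counting any BOM."""
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

english = "The quick brown fox jumps over the lazy dog"
japanese = "日本語のテキストは漢字とひらがなで書かれます"

for label, text in (("English", english), ("Japanese", japanese)):
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: {len(text)} chars, "
          f"UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

For the ASCII-only sentence UTF-16 takes exactly twice as many bytes as UTF-8; for the Japanese sentence UTF-8 takes three bytes per character against two for UTF-16.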

  Support for UTF-16 is still relatively poor (although getting better).
Most modern browsers should handle it OK, but it requires the
server to send the proper HTTP header to tell the browser the encoding,
and configuring the server to do this might not be trivial. (Without that
header, an HTML file encoded in UTF-16 will look like garbage.)

-- 
                                                          - Warp


From: nemesis
Subject: Re: The trouble with XML
Date: 23 Feb 2012 13:04:28
Message: <4f467fac@news.povray.org>

The trouble with the whole XML thing is that it's just another
enterprisey BS to grind CPUs' idle time.  You need an XML document, an
XML document describing the structure of the previous document, yet
another XML document to describe how to style the original document's
items, perhaps an XML document describing how to transform your XML
document into another XML document.  It's an insanely verbose and
homogeneous pile of crap, barely readable by human or machine.

People resented it and thus insist on saner formats, such as CSS, JSON
and real programming languages, rather than a shitload of XML abstraction
layers, tools and Java frameworks.


From: clipka
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 13:19:50
Message: <4f468346$1@news.povray.org>

On 23.02.2012 18:35, Warp wrote:
> clipka<ano### [at] anonymousorg>  wrote:
>> As an alternative, forget UTF-8 and go for UTF-16.
>
>    UTF-16 is more compact if the text consists mostly of non-ASCII
> characters, especially if it contains e.g. kanji symbols, hiragana, etc.
> (The vast majority of Japanese kanji can be expressed with 2 bytes using
> UTF-16 but require 3 bytes with UTF-8.)
>
>    However, if the text consists mostly of ascii characters, such as
> English usually does, then UTF-8 is more compact than UTF-16 (which will
> basically double the size of the file).

I guess a factor of 2 in text stream size is not a serious problem with
today's internet bandwidths.

>    Support for UTF-16 is still relatively poor (although getting better).
> Most modern browsers should handle it OK, but it requires the
> server to send the proper HTTP header to tell the browser the encoding,
> and configuring the server to do this might not be trivial. (Without that
> header, an HTML file encoded in UTF-16 will look like garbage.)

It didn't sound like Andy would want to retrieve the XML file from a web 
server, but rather directly from the local file system. Otherwise he 
could simply go for server-side XSLT processing.


From: clipka
Subject: Re: The trouble with XML
Date: 23 Feb 2012 13:37:20
Message: <4f468760$1@news.povray.org>

On 23.02.2012 19:04, nemesis wrote:
> The trouble with the whole XML thing is that it's just another
> enterprisey BS to grind CPUs' idle time. You need an XML document, an
> XML document describing the structure of the previous document, yet
> another XML document to describe how to style the original document's
> items, perhaps an XML document describing how to transform your XML
> document into another XML document. It's an insanely verbose and
> homogeneous pile of crap, barely readable by human or machine.
>
> People resented it and thus insist on saner formats, such as CSS, JSON
> and real programming languages rather than a shitload of xml abstraction
> layers, tools and java frameworks.

Businesses do use it quite a lot for data exchange.

But yes, XML as a mere replacement for HTML is a rather silly thing
(except in its incarnation as XHTML); its legitimate ecological niche on
the web is on the server side (if anywhere), and its native habitat is
actually somewhere else entirely.

In some sense, XML is today's CSV: A generic file or data stream format
a human /can/ create, read and/or modify with an ASCII text editor, but
that still follows certain clear-cut rules so that it can also be evaluated
by software; and actually just a meta-format, in the sense that the
content of the individual data fields needs to be agreed upon separately.


From: Warp
Subject: Re: The trouble with XML
Date: 23 Feb 2012 14:06:27
Message: <4f468e32@news.povray.org>

nemesis <nam### [at] gmailcom> wrote:
> The trouble with the whole XML thing is that it's just another
> enterprisey BS to grind CPUs' idle time.  You need an XML document, an
> XML document describing the structure of the previous document, yet
> another XML document to describe how to style the original document's
> items, perhaps an XML document describing how to transform your XML
> document into another XML document.  It's an insanely verbose and
> homogeneous pile of crap, barely readable by human or machine.

  It's verbose, but it has one advantage over most other formats: It's
standardized and pretty well supported.

  It has many advantages over many other formats. One example is character
encoding. With all types of character encodings out there, and support
for them in different file formats and programs being what they are, a
*standardized* form for representing special characters can be really
useful. Also, any program that reads XML ought to support it regardless
of which character encoding it uses (at least if the program uses a
generic XML parser internally).

  Compare this to, for example, just a simple raw .txt file. Which encoding
does it use? ISO-Latin-1? ISO-Latin-9? UTF-8? Shift JIS? EUC-JP? ISO-2022-JP?
Something else completely? Impossible to say. With an XML file you don't have
to care. (As said, if your program uses a generic XML parser, the character
encoding used in the input XML file becomes a non-issue.)
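
A small sketch of that claim, assuming Python and its standard `xml.etree.ElementTree` parser; the `<word>` element and the sample text are invented for the example. The same document is serialized in three encodings, each announced in its XML declaration, and the parser returns the same string every time.

```python
import xml.etree.ElementTree as ET

text = "café"

def make_document(declared, codec):
    """Build the same tiny XML document, encoded with the given codec."""
    xml = f'<?xml version="1.0" encoding="{declared}"?><word>{text}</word>'
    return xml.encode(codec)

documents = [
    make_document("ISO-8859-1", "iso-8859-1"),
    make_document("UTF-8", "utf-8"),
    make_document("UTF-16", "utf-16"),  # Python's "utf-16" codec prepends a BOM
]

# The byte streams differ, but the parsed content does not.
parsed = [ET.fromstring(raw).text for raw in documents]
print(parsed)
```

The calling code never names an encoding; the parser works it out from the BOM and the declaration.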

  Not that this exact same thing wouldn't be possible with a less verbose
format, but as said, XML is widely supported so it has this implicit
advantage over many other formats.

-- 
                                                          - Warp


From: Le Forgeron
Subject: Re: The trouble with XML
Date: 23 Feb 2012 15:08:11
Message: <4f469cab@news.povray.org>

On 23/02/2012 20:06, Warp made us read:
>   It's verbose, but it has one advantage over most other formats: It's
> standardized and pretty well supported.
> 

Well, XML is a container. The problem is lack of intelligent design for
the inside. It is too often the Excel sheet of today: A bunch of
entries, without consistency.

Indeed, with a bit of base64 encapsulation, you could put a BINARY
Excel sheet file into an XML document. And advertise that you output XML.
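
How little that takes can be sketched in a few lines of Python. Everything here is hypothetical: the `<document>` and `<payload>` element names are made up, and `binary_blob` merely stands in for the bytes of a real spreadsheet file.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for the contents of a binary spreadsheet file.
binary_blob = bytes(range(256))

# Wrap the opaque bytes in XML: well-formed, yet meaningless to any XML tool.
root = ET.Element("document")
payload = ET.SubElement(root, "payload", encoding="base64")
payload.text = base64.b64encode(binary_blob).decode("ascii")
xml_bytes = ET.tostring(root, encoding="utf-8")

# The receiver can parse the XML, but still has to know about base64
# and the binary format inside it.
recovered = base64.b64decode(ET.fromstring(xml_bytes).find("payload").text)
```

The result is valid XML, yet no schema or stylesheet can say anything about what is actually inside the payload.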

And to make matters more interesting, some find it enterprisey to have
XML inside XML... and others put old CSV into XML too (without
reinterpreting the data, so it's just a formatting. A dumb formatting).



>   It has many advantages over many other formats. One example is character
> encoding. With all types of character encodings out there, and support
> for them in different file formats and programs being what they are, a
> *standardized* form for representing special characters can be really
> useful. Also, any program that reads XML ought to support it regardless
> of which character encoding it uses (at least if the program uses a
> generic XML parser internally).

Read it, yes. Understand it, that's a whole other story!
Same as: I can read Latin, or Japanese in katakana, with few errors in
pronunciation. That does not mean I get the meaning. At least I can edit
it like a monkey.

>   Not that this exact same thing wouldn't be possible with a less verbose
> format, but as said, XML is widely supported so it has this implicit
> advantage over many other formats.

XML is interesting when exchanging documents/data, once the big bosses
and their technical staffs have agreed on an XSD. But whenever you add a
third company, you need to negotiate another XSD (with a totally
different approach to the data, not even compatible with the first one).


From: Orchid Win7 v1
Subject: Re: The trouble with XSLT
Date: 23 Feb 2012 16:14:21
Message: <4f46ac2d@news.povray.org>

On 23/02/2012 17:35, Warp wrote:
>    Support for UTF-16 is still relatively poor (although getting better).
> Most modern browsers should handle it OK, but it requires the
> server to send the proper HTTP header to tell the browser the encoding,
> and configuring the server to do this might not be trivial. (Without that
> header, an HTML file encoded in UTF-16 will look like garbage.)

Isn't that what the HTML encoding tag is for? Or the XML encoding 
declaration?

