POV-Ray: Newsgroups: povray.beta-test: POV-Ray v3.7 charset behaviour: Re: POV-Ray v3.7 charset behaviour

From: Kenneth
Date: 25 Jun 2018 05:45:00
Message: <web.5b30b8c84a827126a47873e10@news.povray.org>
Living in the U.S., I've never paid much attention to text encodings other than
ASCII ("US-ASCII" I suppose)-- although I've seen "UTF-8" etc. show up from time
to time in others' scene files or include files.

When writing my own include files manually (for my own use, and saved as 'plain
text'), I've always used either Windows' NOTEPAD or WORDPAD-- only because they
are simple and available. But I'm having a problem saving even a *simple* UTF-8
file.

WORDPAD (in my Win7 installation) can encode text in different ways:
plain text (.txt)-- is that the same as ANSI?
Rich Text Format (.rtf)-- one of Microsoft's own file-types
Unicode (.txt ?)
Unicode big endian (.txt ?)
UTF-8 (.txt ?)
.... plus a few others like XML and such.

[I assume that the various Unicode files are output as .txt file types.]

The thing is, WORDPAD's 'plain text file' is the ONLY one of its own encodings
that can be successfully read by POV-Ray as a text include file; all the others
produce various error messages. Most of those errors are expected-- but even a
UTF-8 file doesn't work. This is... odd. Perhaps I don't understand how to use
Unicode files. OR, WORDPAD isn't writing the file correctly??

Code example:
#version 3.71;  // using 3.7.1 beta 9
global_settings {assumed_gamma 1.0 charset utf8}
#include "text file as UTF8.txt" // Saved as UTF-8. No strings in
           // the contents, just a single line--  #local R = 45;

Fatal error result:
"illegal character in input file, value is ef"
This happens whether global_settings has charset utf8 or no charset at all.
(BTW, I can't 'see' the 'ef' value when I open the file.) So it appears that
WORDPAD is prepending a small header-- a BOM?-- which may not conform to UTF-8
specs(?)
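
One way to check the BOM suspicion is to look at the first few bytes of the file outside of POV-Ray. The following is only a rough sketch in Python (not POV-Ray SDL); the function name detect_bom is made up for illustration, and the sample bytes simulate what an editor that prepends a UTF-8 BOM would write, rather than reading the actual include file:

```python
import codecs

def detect_bom(data: bytes) -> str:
    """Return a label for the byte-order mark at the start of data, or 'none'."""
    # UTF-32 is checked before UTF-16 because the UTF-32 LE mark
    # begins with the same two bytes as the UTF-16 LE mark.
    known = [
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF32_LE, "UTF-32 LE"),
        (codecs.BOM_UTF32_BE, "UTF-32 BE"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    ]
    for bom, label in known:
        if data.startswith(bom):
            return label
    return "none"

# Simulated file contents (an assumption, not the real file):
# a UTF-8 BOM followed by the single line from the include file.
sample = codecs.BOM_UTF8 + b"#local R = 45;\r\n"

print(detect_bom(sample))  # UTF-8
print(hex(sample[0]))      # 0xef
```

The first byte of the UTF-8 BOM is 0xEF, which would match the "value is ef" in the error message. To test a real file, the bytes could be read with open(path, "rb").read(4) and passed to the same function.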

So I tried it a different way, just to be complete:
#version 3.71;
global_settings {assumed_gamma 1.0} // NO charset here
#include "text file as UTF8.txt" // Saved as UTF-8 again, but this one
// has  global_settings{charset utf8}, plus   #local R = 45;

.... which has the same fatal result.
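
If a leading BOM is indeed what the parser is rejecting, one possible workaround is to strip those three bytes from the file before including it. This is a minimal sketch in Python, under that assumption; the helper names strip_utf8_bom and rewrite_without_bom are hypothetical, and the demo works on a temporary copy rather than on a real scene directory:

```python
import codecs
import os
import tempfile

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def rewrite_without_bom(path: str) -> bool:
    """Rewrite the file in place without a UTF-8 BOM. Return True if it changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True

# Demo on a throwaway file that imitates an editor's UTF-8-with-BOM output.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text file as UTF8.txt")
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"#local R = 45;\n")
    changed = rewrite_without_bom(path)
    with open(path, "rb") as f:
        result = f.read()

print(changed, result)  # True b'#local R = 45;\n'
```

Working on raw bytes keeps the rest of the file untouched, including its line endings.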

I looked at various Wikipedia pages ( "US-ASCII", "Windows WordPad app", "UTF-8",
"Comparison of Unicode encodings" ), but I *still* don't have a full grasp of
the facts:

"WordPad for Windows XP [and later?] added full Unicode support, enabling
WordPad to support multiple languages..."

"[Windows] files saved as Unicode text are encoded as UTF-16 LE.  [not UTF-8,
unless that is specified] ...Such [Unicode] files normally begin with Byte Order
Mark (BOM), which communicates the endianness of the file content. Although
UTF-8 does not suffer from endianness problems, many Windows programs --i.e.
Notepad-- prepend the contents of UTF-8-encoded files with BOM, to differentiate
UTF-8 encoding from other 8-bit encodings." [Other 8-bit encodings meaning
"plain ASCII"?]
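
On the "other 8-bit encodings" question: as far as I can tell, the phrase covers single-byte code pages such as Windows-1252 (what Windows tools usually call "ANSI"), not only plain ASCII. A small Python sketch of the difference, with the sample strings and the encoded_hex helper chosen purely for illustration:

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Encode text and show the resulting bytes as space-separated hex."""
    return text.encode(encoding).hex(" ")

ascii_only = "R = 45"
accented = "\u00e9"  # the letter e with an acute accent

# Pure ASCII text is byte-for-byte identical in all three encodings,
# so nothing in the bytes themselves says which one was used.
same = (ascii_only.encode("ascii")
        == ascii_only.encode("cp1252")
        == ascii_only.encode("utf-8"))
print(same)  # True

# A non-ASCII character is where the encodings diverge.
print(encoded_hex(accented, "cp1252"))  # e9
print(encoded_hex(accented, "utf-8"))   # c3 a9

# Python's "utf-8-sig" codec prepends the BOM, like the Windows editors do.
print(encoded_hex(ascii_only, "utf-8-sig"))  # ef bb bf 52 20 3d 20 34 35
```

Because an ASCII-only file looks the same either way, a BOM is the one marker a program can use to tell UTF-8 apart from a code page without guessing.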

[However...]
"The Unicode Standard neither requires nor recommends the use of the BOM for
UTF-8, but warns that it may be encountered at the start of a file as a
transcoding artifact. The presence of the UTF-8 BOM may cause problems with
existing software that can [otherwise] handle UTF-8..."

"UTF-16 [is] incompatible with ASCII files, and thus requires Unicode-aware
programs to display, print and manipulate them, even if the file is known to
[...]"

"UTF-16 does not have endianness defined, [but] this may be achieved by using a
byte-order mark [BOM] at the start of the text, or assuming big-endian... [but]
UTF-8 is standardised on a single byte order and does not have this problem."
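
The byte-order point in that quote can be made concrete with a single character. This is only an illustrative Python sketch; the one-letter sample text is arbitrary:

```python
import codecs

text = "R"

# The same character in UTF-16, in both byte orders, each with its BOM.
le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
print(le.hex(" "))  # ff fe 52 00
print(be.hex(" "))  # fe ff 00 52

# Python's generic "utf-16" decoder reads the BOM to pick the byte order,
# so both byte sequences decode to the same text.
print(le.decode("utf-16") == be.decode("utf-16") == text)  # True

# UTF-8 has only one possible byte sequence for the character.
print(text.encode("utf-8").hex(" "))  # 52
```

In UTF-16 the two bytes of each code unit can be stored in either order, so a reader needs the BOM (or a convention) to know which. UTF-8 has a single fixed byte sequence per character, so a BOM there acts only as a signature.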

"If any stored data is in UTF-8 (such as file contents or names), it is very
difficult to write a system that uses UTF-16 or UTF-32 as an API. [However,] it
is trivial to translate invalid UTF-16 to a unique (though technically invalid)
UTF-8 string, so a UTF-8 API can control both UTF-8 and UTF-16 files and names,
making UTF-8 preferred in any such mixed environment."