POV-Ray: Newsgroups: povray.pov4.discussion.general: On povr's accidental text{cmap} capability.

POV-Ray : Newsgroups : povray.pov4.discussion.general : On povr's accidental text{cmap} capability.		Server Time 27 Dec 2024 04:20:06 EST (-0500)

From: William F Pokorny
Subject: On povr's accidental text{cmap} capability.
Date: 1 Apr 2023 04:08:15
Message: <6427e66f$1@news.povray.org>

Elsewhere while working on the keyword_status() idea - which became 
initially two keywords in word_is() and word_get() - I discovered the 
povr fork picked up Christoph's text object cmap keyword and associated 
code changes due simply to when I created the povr fork.

I admit to not paying attention to 'cmap' when it was being worked on in 
early 2019. I wrongly thought it was freetype fork specific and I tuned 
it out while busy with other code play.

Well, I have it. I took a look over the past couple days. The 'cmap' 
functionality is going to stay in povr. It's impossible to pass on what 
it already offers - even if bugs later turn up.

See attached image. Using v3.8 beta 2 rendering some utf8 text in a 
text{} object at top. On the bottom rendering the same utf8 text with 
povr - identical results with or without the cmap{} block.

I worked first with a cmap{} block and it took the ttfdump utility for 
me to guess probably something like 'cmap { 0,3 charset 0 }' was what I 
needed while using the .../dejavu/DejaVuSans.ttf font file coming with 
Ubuntu 22.04.

I then wondered what would happen if I removed the cmap{} block 
altogether. I was surprised to find it worked just as well. This comes 
to the first character map table in the font being used as the default 
and it being 0,3. Plus the 'cmap{}' related changes when the font is 
read correctly read the right cmap information in the font file.

So! On unix / linux what use is the cmap{} block you ask. Well it might 
or might not be of much use. I don't have enough experience to know.

Do the font files on linux usually have already usable defaults as 
encoded? I'd make a small bet those fonts shipping with linux 
distributions do. Font files coming from, say, a windows environment - 
maybe not. Perhaps the cmap{} functionality will be needed to pick the 
correct internal cmap with those. I'm guessing.

---

Related. The DejaVuSans.ttf font I picked also encodes two indows 
specific character maps too. The following cmap examples also work to 
some degree or other:

cmap { 3,1 charset 0 } // Works as well as linux (Apple) {0,3 charset 0}
cmap { 3,1 charset 1252 } // Works, but less well in general (a).

Anyhow. Going to keep the functionality in povr and play with it. We'll 
see what other issues pop up.

---
Aside: For a short time saw a very strange parsing error more text{} 
related than cmap{}. I started with some example cmap{} code off the 
newsgroup and it initially failed. It seemed somehow related to a 
semicolon following the string declare used within the text{} object! 
The SDL declare looked like:

#declare MyText = "..."; // Semicolon not needed.

However, as I played the parsing error went away and, try as I might, 
I've been unable to reproduce that parsing fail. Maybe something like 
dos line terminations vs unix/linux ones - or some utf8 character 
elsewhere in the scene mangling things. I don't know. Still, for the 
record, I did see what looked to me to be a bogus parsing fail while 
working on a scene with cmap{}.

Bill P.

(a) - With charset 1252 and similar it might be for the restricted set 
of windows characters you are better off for 'reasons.' When it or other 
charsets used, it seems to be dropping utf8 characters outside the first 
256 or something like that. On linux I suspect the charset won't be 
useful unless trying to match some specific windows character set behavior.

Post a reply to this message

Attachments:
Download 'cmapstory.png' (10 KB)

Preview of image 'cmapstory.png'

From: Bald Eagle
Subject: Re: On povr's accidental text{cmap} capability.
Date: 1 Apr 2023 10:45:00
Message: <web.6428431f1ce613a61f9dae3025979125@news.povray.org>

William F Pokorny <ano### [at] anonymousorg> wrote:
> I admit to not paying attention to 'cmap' when it was being worked on in
> early 2019. I wrongly thought it was freetype fork specific and I tuned
> it out while busy with other code play.

I was going to point you to that thread, but I had figured you knew about it.
(oops)

> ---
> Aside: For a short time saw a very strange parsing error more text{}
> related than cmap{}. I started with some example cmap{} code off the
> newsgroup and it initially failed. It seemed somehow related to a
> semicolon following the string declare used within the text{} object!
> The SDL declare looked like:
>
> #declare MyText = "..."; // Semicolon not needed.
>
> However, as I played the parsing error went away and, try as I might,
> I've been unable to reproduce that parsing fail. Maybe something like
> dos line terminations vs unix/linux ones - or some utf8 character
> elsewhere in the scene mangling things. I don't know. Still, for the
> record, I did see what looked to me to be a bogus parsing fail while
> working on a scene with cmap{}.

Have you tried sorting out that diacritical mark issue we were having a while
back?   That seemed really strange.  Had to solve it by using a totally
different font file.

Also, I have been getting some weird parse errors, for reasons that don't make
sense.

Perhaps they are due to my structuring of the code, but I feel that they are
just too weird - and - from a user perspective - unexpected behaviour.

Over the years, I also notice instances of transient and unreproducible
behaviour in scenes that DO parse and render successfully.
There are definitely ghosts lurking in the machine's code.  I suppose that I was
always uneasy about trying to point them out as possibly being real, since I'm
not a professional programmer, and POV-Ray has always had it's unique issues.  I
guess I just felt like it would get trivialized and dismissed, and I would have
a hard time supplying a scene that demonstrated the error or strange behaviour.
(just look at your recent dictionary / core dump experiments)

I will try to note and document such things more diligently in the future.

At present, I have 2 issues in the same scene/inc pair, that I sent to jr for
inspection.  I have also spent the last week chasing my tail around in circles
trying to sort out the vertices and edges of these triangles.  In my development
scene, it appears that I have everything working the way I want to and expect.
When I switch over to rendering all of the patterns, two that are not radially
symmetric are rotated - even though my preliminary debugging suggests that it
doesn't make any logical sense.

Given that you are presently in the thick of unraveling all of the keywords that
have different scopes and meaning, and we know that we have 2 different parsers
and different solvers, and ....   I would encourage others to post their
experiences in debugging their own scenes, as this would not only help spot
potential real issues, but also serve to show where SDL and/or its documentation
perhaps isn't clear enough, and revising things in those areas would be time
well spent.

- BW

Post a reply to this message

From: William F Pokorny
Subject: Re: On povr's accidental text{cmap} capability.
Date: 1 Apr 2023 14:05:50
Message: <6428727e$1@news.povray.org>

On 4/1/23 10:43, Bald Eagle wrote:
> Have you tried sorting out that diacritical mark issue we were having a while
> back?

I did see the thread, but no.

---
Where the ring is incomplete at the top of the A in some fonts, I 
strongly suspect overlapping polygon point lists - which are technically 
legal - but the cleanest fonts eliminate such overlaps.

Those fonts represent the same covered areas with non-overlapping 
polygons point lists - which you can always do.

POV-Ray cannot handle the overlaps a renders the even overlapped areas 
as empty space. There isn't any fixing this beyond adopting some polygon 
processing package(a) to create non-overlapping representations on the 
fly...

---
Some of the rest I suspect is a font not implementing a character as a 
single glyph, but as an assembly of glyphs - to save space. A guess is 
that POV-Ray tried and failed to position all the parts of some assembly 
correctly in some of those funky cases. (Parts / point loops ending up 
outside the per prism bounding within the overall union of objects - in 
addition to - or as part of being placed incorrectly)

In the end POV-Ray is not a word processor or type setting application - 
and should never be in my opinion. There are significant packages for 
such work like freetype and IBM's open source ICT 
(https://icu.unicode.org/) underneath much of the font handling we "see" 
in applications. POV-Ray will likely use such libraries to make font 
handling better at some point, but I'd bet big there will always be 
things that don't quite work.

Aside: I've thought about what POV-Ray might look like with single 
character, ready made polygon representations in include libraries with 
no inbuilt text object at all. Rather SDL code to position characters in 
a string. Suppose that too is no small amount of work... It's attractive 
as an end goal in that it would get the POV-Ray core code out of the 
font handling game.

Bill P.

(a) - For a long time I've kept an eye on 'clipper'. See: 
http://www.angusj.com/clipper2/Docs/Overview.htm

Post a reply to this message