POV-Ray: Newsgroups: povray.off-topic: Very long post


From: scott
Subject: Re: Very long post
Date: 24 Sep 2008 02:27:08
Message: <48d9ddbc@news.povray.org>

> 1. A "tokeniser" takes the input string and splits it into "tokens", 
> possibly decorating them slightly. So now you have a flat stream of tokens 
> instead of just characters.

Do you need to have some sort of grammar rules though before you can 
tokenise?  I mean what if you have the string "-3*(4-3e-2)+5*-3", the 
tokeniser (if I understand right) needs to correctly decide how to interpret 
those minus signs, needing some rules about what is surrounding them etc.  What 
would be the best way to convert an ASCII string like that into a list of 
tokens?

From: Invisible
Subject: Re: Very long post
Date: 24 Sep 2008 04:34:06
Message: <48d9fb7e$1@news.povray.org>

scott wrote:

> Do you need to have some sort of grammar rules though before you can 
> tokenise?

Yes, definitely.

> I mean what if you have the string "-3*(4-3e-2)+5*-3", the 
> tokeniser (if I understand right) needs to correctly decide how to 
> interpret those minus signs, needing some rules about what is surrounding 
> it etc.  What would be the best way to convert an ASCII string like that 
> into a list of tokens?

Ah yes, the old "is it unary minus or binary minus?" question.

In this case, you'd probably let minus be a token by itself, and let the 
parser decide whether it's unary or binary based on the context when it 
builds the parse tree.
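
To illustrate the idea of deciding from context, here is a minimal sketch in Python (not code from this thread). It assumes the tokens arrive as plain strings, and the names `classify_minus`, "neg" and "sub" are made up for the example. A full parser would make the same decision implicitly through its grammar rather than in a separate pass.

```python
def classify_minus(tokens):
    """Label each "-" as unary ("neg") or binary ("sub") from the token before it."""
    labelled = []
    previous = None
    for tok in tokens:
        if tok == "-":
            # At the very start, after "(" or after another operator there is
            # no left-hand operand, so this minus can only be unary.
            unary = previous is None or previous in ("(", "+", "-", "*", "/")
            labelled.append("neg" if unary else "sub")
        else:
            labelled.append(tok)
        previous = tok
    return labelled


tokens = ["-", "3", "*", "(", "4", "-", "3e-2", ")", "+", "5", "*", "-", "3"]
print(classify_minus(tokens))
# ['neg', '3', '*', '(', '4', 'sub', '3e-2', ')', '+', '5', '*', 'neg', '3']
```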

From: Phil Cook
Subject: Re: Very long post
Date: 24 Sep 2008 06:28:50
Message: <op.uhzd2h1ic3xi7v@news.povray.org>

And lo on Tue, 23 Sep 2008 12:32:15 +0100, Invisible <voi### [at] devnull> did  
spake, saying:

> # Preface #
>
> OK, so the muse has taken me. I want to write something.

[smack] Blog!

-- 
Phil Cook

--
I once tried to be apathetic, but I just couldn't be bothered
http://flipc.blogspot.com

From: Invisible
Subject: Re: Very long post
Date: 24 Sep 2008 06:36:18
Message: <48da1822$1@news.povray.org>

Phil Cook wrote:

> [smack] Blog!

Meh. Like anybody will read it! :-P

From: Invisible
Subject: Re: Very long post
Date: 24 Sep 2008 08:42:26
Message: <48da35b2$1@news.povray.org>

> scott wrote:
> 
>> Do you need to have some sort of grammar rules though before you can 
>> tokenise?
> 
> Yes, definitely.

To see this, consider that in many programming languages, "foo_bar" is a 
single identifier. However, in TeX source code, "_" is [usually] a 
command name and hence should be parsed as a separate token.

So what constitutes a "token" completely depends on exactly what you're 
trying to tokenise/parse.
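
A small Python sketch of that point, using two made-up rule sets. Neither is a real C or TeX lexer: the patterns and the name `split_tokens` are assumptions for illustration only. The same input splits differently depending on which rules you pick.

```python
import re

# Hypothetical rule set 1: underscores belong to identifiers (as in C or Python).
IDENT_WITH_UNDERSCORE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")

# Hypothetical rule set 2: only letters form a word, so "_" stands alone
# (loosely TeX-like; real TeX tokenisation is more involved).
IDENT_LETTERS_ONLY = re.compile(r"[A-Za-z]+|\S")


def split_tokens(pattern, text):
    """Return every non-overlapping match of the pattern, left to right."""
    return pattern.findall(text)


print(split_tokens(IDENT_WITH_UNDERSCORE, "foo_bar"))  # ['foo_bar']
print(split_tokens(IDENT_LETTERS_ONLY, "foo_bar"))     # ['foo', '_', 'bar']
```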

From: Darren New
Subject: Re: Very long post
Date: 24 Sep 2008 11:42:32
Message: <48da5fe8$1@news.povray.org>

scott wrote:
> What would be the best way to convert an ASCII string like that 
> into a list of tokens?

Typically, it's done with regular expressions. And typically, "-37" is 
two tokens in programming language compilers, at least.
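
As a rough sketch of the regular-expression approach in Python (an illustration, not any particular compiler's lexer): the pattern, the group names `number` and `op`, and the function `tokenise` are all assumptions made for this example. The minus sign is never folded into a number, except inside an exponent such as "3e-2".

```python
import re

# A number is digits, an optional fraction and an optional exponent.
# Every operator or bracket is a one-character token of its own.
TOKEN = re.compile(r"(?P<number>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)|(?P<op>[-+*/()])")


def tokenise(text):
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


print(tokenise("-37"))
# [('op', '-'), ('number', '37')]
print([value for _, value in tokenise("-3*(4-3e-2)+5*-3")])
# ['-', '3', '*', '(', '4', '-', '3e-2', ')', '+', '5', '*', '-', '3']
```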

-- 
Darren New / San Diego, CA, USA (PST)

From: scott
Subject: Re: Very long post
Date: 25 Sep 2008 03:37:05
Message: <48db3fa1$1@news.povray.org>

> Typically, it's done with regular expressions. And typically, "-37" is two 
> tokens in programming language compilers, at least.

OK, and then is there only one token for "minus", or are there two for unary 
and binary minus?  ie does the parser decide or the tokeniser?

From: Invisible
Subject: Re: Very long post
Date: 25 Sep 2008 04:26:39
Message: <48db4b3f@news.povray.org>

scott wrote:

> OK, and then is there only one token for "minus", or are there two for 
> unary and binary minus?  ie does the parser decide or the tokeniser?

Varies depending on the rules of whatever you're trying to process, but 
typically it's the parser.

From: scott
Subject: Re: Very long post
Date: 25 Sep 2008 06:09:17
Message: <48db634d@news.povray.org>

>> OK, and then is there only one token for "minus", or are there two for 
>> unary and binary minus?  ie does the parser decide or the tokeniser?
>
> Varies depending on the rules of whatever you're trying to process, but 
> typically it's the parser.

I'm just curious, because I made a parser like this in C++ once (it was very 
hacky and basically just stepped along the string trying to identify what 
each byte was).  Anyway, it worked ok for things like "-5*(4+2)" etc, but 
crashed with "-(4+2)".  I guess the minus operator should be encoded as its 
own token and then let the parser sort out what it should do.  Maybe I'll 
try a rewrite one day.

From: Invisible
Subject: Re: Very long post
Date: 25 Sep 2008 06:12:25
Message: <48db6409@news.povray.org>

scott wrote:

> I'm just curious, because I made a parser like this in C++ once (it was 
> very hacky and basically just stepped along the string trying to 
> identify what each byte was).  Anyway, it worked ok for things like 
> "-5*(4+2)" etc, but crashed with "-(4+2)".  I guess the minus operator 
> should be encoded as its own token and then let the parser sort out what 
> it should do.  Maybe I'll try a rewrite one day.

Yeah, this is one of the tricky edge-cases of expression parsing. 
Operator precedence and unary/binary operators can get pretty ugly. 
(Gets even harder if you want to report meaningful error messages if 
there's an actual syntax error...)
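
For what it's worth, here is a minimal recursive-descent sketch in Python of one common way to handle this (not the C++ code discussed above). The grammar, the class `Parser` and the functions `tokenise` and `evaluate` are assumptions for the example. Precedence comes from having one method per level, and unary minus is just one more alternative in the lowest level, so "-(4+2)" needs no special case. Error reporting is kept deliberately crude.

```python
import re


def tokenise(text):
    # Numbers (optional fraction and exponent) or one-character operators.
    # Simplification: findall silently skips characters that match neither.
    return re.findall(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # expression := term (("+" | "-") term)*    here "-" is binary
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # term := factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # factor := "-" factor | "(" expression ")" | number    here "-" is unary
        tok = self.next()
        if tok == "-":
            return -self.factor()
        if tok == "(":
            value = self.expression()
            if self.next() != ")":
                raise SyntaxError(f"expected ')' at token {self.pos - 1}")
            return value
        if tok is None or tok in ("+", "*", "/", ")"):
            raise SyntaxError(f"expected a number, '-' or '(' at token {self.pos - 1}, got {tok!r}")
        return float(tok)


def evaluate(text):
    parser = Parser(tokenise(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r} at token {parser.pos}")
    return value


print(evaluate("-5*(4+2)"))  # -30.0
print(evaluate("-(4+2)"))    # -6.0
```

The same "-" token ends up binary when `expression` consumes it and unary when `factor` does, so the tokeniser never has to choose.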
