POV-Ray: Newsgroups: povray.off-topic: Mini-languages

From: Invisible
Subject: Re: Mini-languages
Date: 10 Nov 2010 04:08:51
Message: <4cda6123$1@news.povray.org>

On 09/11/2010 10:11 PM, Warp wrote:

>    *Using* a parser library for something which can be easily expressed with
> a regexp string would be complete overkill.

Yeah, that's true. Writing

   string "foo"
   many char
   string "bar"

is drastically harder than just saying "foo*bar".

Oh, wait...

From: Invisible
Subject: Re: Mini-languages
Date: 10 Nov 2010 04:15:37
Message: <4cda62b9@news.povray.org>

On 08/11/2010 05:31 PM, Darren New wrote:
> Invisible wrote:
>> I'd much prefer to see a much bigger separation between what's a
>> literal character and what's a command.
>
> Technically, they're all commands. The letter "s" means "match against
> the letter s." :-)

Well, if you wanted to split hairs, it's the implicit command "match 
against a specific character" plus the argument "s". :-P

>> you could probably use it to match against streams of other data, not
>> just characters.
>
> You can.

Not in any regex product I've ever seen. (Although I won't claim to be 
an authority on the subject.) Most people seem to equate "regex" with 
"shorthand for writing text parsers".

>> A quick inspection of Wikipedia suggests that POSIX ERE involves at
>> least .[]^$()\*{}?+|:, which is 16, not 10. (Still, it's not the
>> thousands it seemed like last time I tried to learn this stuff.)
>
> : and {} and ^ $ aren't original regular expression characters.
> Technically not + either, so I think that's where the 10 come from. The
> rest are short-cuts for what you can already otherwise specify (: + {
> }), or are useful for programming but outside the theory (^$).

So much for theory. Most "regular expressions" out there aren't even 
regular. I don't know much about the theory; what I know about is the 
actual regex tools that you can actually use.

>> I recall reading somewhere that Perl's "regular expressions" aren't
>> actually regular, and so require exponential time for matching. Truly
>> regular expressions apparently require only linear time.
>
> Correct. And not only exponential time, but memory as well. A regular
> expression is regular because it requires a fixed amount of memory to
> match or reject.

Is that the definition of "regular" then?

>> The other thing I dislike is that people seem to have a tendency to
>> use regexs where they should be using a real parser.
>
> Yes, well, that's because people are stupid, not regexps.

> Only stupid people. Learn regexps, and learn the theory behind them, so
> when the boss asks you to write a parser, you know which one to use.

My boss will never, ever ask me to write a parser. (Mostly because he 
doesn't know what one is, or that such a process is actually necessary.)

Regardless, if you're trying to do complicated parsing, you should use 
real parser tools, not a regex.

Now if you just want to do a quick wildcard search, then why not? It can 
be useful to be able to say, for example, DELETE *.PNG or something. But 
by the time you get to the point where your search term is practically a 
predicate calculus, you really shouldn't be trying to encode the entire 
thing as a flat character string. You should use a real language instead.

From: Warp
Subject: Re: Mini-languages
Date: 10 Nov 2010 10:28:52
Message: <4cdaba34@news.povray.org>

Invisible <voi### [at] devnull> wrote:
> On 09/11/2010 10:11 PM, Warp wrote:

> >    *Using* a parser library for something which can be easily expressed with
> > a regexp string would be complete overkill.

> Yeah, that's true. Writing

>    string "foo"
>    many char
>    string "bar"

> is drastically harder than just saying "foo*bar".

> Oh, wait...

  Nice straw man. (And "foo*bar" still doesn't mean what you think it means
as a regex.)

-- 
                                                          - Warp

From: Invisible
Subject: Re: Mini-languages
Date: 10 Nov 2010 10:45:26
Message: <4cdabe16$1@news.povray.org>

>    Nice straw man.

Wasn't his problem that he didn't have a brain?

> (And "foo*bar" still doesn't mean what you think it means
> as a regex.)

So, what, it means

   string "fo"
   many (char 'o')
   string "bar"

Or am I reading this wrong?
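
For reference, a minimal sketch using Python's `re` module (the choice of Python and the sample strings are illustrative assumptions, not something from the thread). In regex syntax `*` repeats only the item just before it, so `foo*bar` reads as "fo", zero or more "o", then "bar". The wildcard-style reading ("foo", anything, "bar") would be spelled `foo.*bar`.

```python
import re

# In a regex, * repeats only the item just before it (here, the second "o").
literal_star = re.compile(r"foo*bar")
# The wildcard-style reading ("foo", anything, "bar") needs ".*" instead.
wildcard_like = re.compile(r"foo.*bar")

samples = ["fobar", "foobar", "foooobar", "foo-anything-bar"]
# fullmatch requires the whole string to fit the pattern, not just a prefix.
star_hits = [s for s in samples if literal_star.fullmatch(s)]
wild_hits = [s for s in samples if wildcard_like.fullmatch(s)]
```

Here `star_hits` keeps the three strings made of "f", one or more "o", and "bar", while `wild_hits` drops "fobar" (it lacks the literal "foo") and keeps "foo-anything-bar".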

From: Darren New
Subject: Re: Mini-languages
Date: 10 Nov 2010 10:54:36
Message: <4cdac03c$1@news.povray.org>

Invisible wrote:
> Not in any regex product I've ever seen. 

You don't even use text regular expressions, so that's not very 
authoritative. ;-)

Seriously, any DFA is the equivalent of a regex.

> Most people seem to equate "regex" with "shorthand for writing text parsers".

Sure. But exactly the same theory works for any stream of tokens. You can 
use most modern libraries to match, for example, Unicode, which includes 
Chinese, so right there you're outside the "text" area, let alone if you 
write your own parser.

> So much for theory. Most "regular expressions" out there aren't even 
> regular. 

Sure they are. Unless you use a back-escape (i.e., substitute in something 
that you earlier matched) then it's all regular. Stuff like {} and + are 
just trivial macros to reduce typing.  I think I use a backmatch maybe once 
every two or three years interactively, and I don't think I've ever used one 
programmatically.
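
A small sketch of what such a backmatch (usually called a backreference) looks like, assuming Python's `re` module; the pattern and sample strings are made up for illustration. The `\1` must repeat exactly what the first group matched, so the pattern accepts n copies of "a", a "b", then n copies of "a" again. Recognising that requires remembering n, which a finite automaton cannot do for unbounded n, so the language is not regular.

```python
import re

# \1 must repeat exactly what group 1 matched, so this accepts a^n b a^n.
mirrored = re.compile(r"(a+)b\1")

candidates = ["aba", "aabaa", "aabaaa", "aaabaa"]
# Only strings with the same number of "a" on both sides of the "b" survive.
accepted = [s for s in candidates if mirrored.fullmatch(s)]
```

Without the `\1`, everything else in the pattern (`a+`, `b`) stays within what a DFA can handle.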

>>> I recall reading somewhere that Perl's "regular expressions" aren't
>>> actually regular, and so require exponential time for matching. Truly
>>> regular expressions apparently require only linear time.
>>
>> Correct. And not only exponential time, but memory as well. A regular
>> expression is regular because it requires a fixed amount of memory to
>> match or reject.
> 
> Is that the definition of "regular" then?

Wikipedia is your friend. But yes, that's part of the definition. A language 
is regular if it can be matched by a DFA.
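
To make the DFA point concrete, here is a hand-written sketch in Python of an automaton equivalent to the regex foo*bar. The integer state numbers, the table layout, and the function name `dfa_accepts` are all illustrative assumptions. The only memory the matcher keeps is the current state, and because it runs over any iterable of symbols, the same approach applies to token streams that are not characters.

```python
# Transition table for a DFA equivalent to the regex foo*bar.
# State numbers and table layout are illustrative choices.
TRANSITIONS = {
    (0, "f"): 1,
    (1, "o"): 2,
    (2, "o"): 2,  # loop: any number of additional "o"
    (2, "b"): 3,
    (3, "a"): 4,
    (4, "r"): 5,
}
ACCEPTING = {5}


def dfa_accepts(symbols):
    """Run the DFA over any iterable of symbols, keeping one state variable."""
    state = 0
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING
```

For example, `dfa_accepts("foooobar")` and `dfa_accepts(["f", "o", "b", "a", "r"])` both succeed, while `dfa_accepts("fbar")` is rejected because state 1 has no transition on "b".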

> Regardless, if you're trying to do complicated parsing, you should use 
> real parser tools, not a regex.

Again, it depends what you're trying to parse. Are you trying to parse a 
file full of lines like

structure_size = 37
structure_drift = 92.7E13

etc?  A regexp will do just fine.
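
A minimal sketch of that kind of line-oriented parsing, assuming Python's `re` module. The function name `parse_settings`, the exact number syntax, and the decision to silently skip lines that do not fit are assumptions made for the example.

```python
import re

# One "name = number" line; the number may have a fraction and an E exponent.
LINE = re.compile(
    r"\s*(\w+)\s*=\s*([+-]?[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?)\s*"
)


def parse_settings(text):
    """Return {name: float} for every line that fits; skip the rest."""
    settings = {}
    for line in text.splitlines():
        match = LINE.fullmatch(line)
        if match:
            settings[match.group(1)] = float(match.group(2))
    return settings


settings = parse_settings("structure_size = 37\nstructure_drift = 92.7E13\n")
```

Each line is independent and has no nesting, which is what keeps a single regex adequate here.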

-- 
Darren New, San Diego CA, USA (PST)
   Serving Suggestion:
     "Don't serve this any more. It's awful."

From: Darren New
Subject: Re: Mini-languages
Date: 10 Nov 2010 10:57:54
Message: <4cdac102$1@news.povray.org>

Invisible wrote:
> On 09/11/2010 10:11 PM, Warp wrote:
> 
>>    *Using* a parser library for something which can be easily 
>> expressed with
>> a regexp string would be complete overkill.
> 
> Yeah, that's true. Writing
> 
>   string "foo"
>   many char
>   string "bar"
> 
> is drastically harder than just saying "foo*bar".
> 
> Oh, wait...

OK, so how do you do

(\+|-)?[0-9]+(\.[0-9]+)?(E(\+|-)?[0-9]{1,3})?

in your parser language?

(That is, optional sign, one or more digits, optional decimal point followed 
by one or more digits, optional E followed by optional sign followed by one 
to three digits.)
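
A quick way to try the pattern, following the prose description (sign optional), assuming Python's `re` module; the helper name `is_number` is made up for the example. `fullmatch` is used so that the whole string has to fit, not just a prefix.

```python
import re

# Optional sign, digits, optional ".digits", optional exponent of 1-3 digits.
NUMBER = re.compile(r"(\+|-)?[0-9]+(\.[0-9]+)?(E(\+|-)?[0-9]{1,3})?")


def is_number(text):
    """True when the whole string (not just a prefix) fits the pattern."""
    return NUMBER.fullmatch(text) is not None
```

Strings such as "37", "-92.7E13" and "+1.5E-300" are accepted; "1.", ".5" and "1E1234" are rejected because the fraction needs digits on both sides of the point and the exponent is capped at three digits.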

-- 
Darren New, San Diego CA, USA (PST)
   Serving Suggestion:
     "Don't serve this any more. It's awful."

From: Warp
Subject: Re: Mini-languages
Date: 10 Nov 2010 11:28:45
Message: <4cdac83c@news.povray.org>

Darren New <dne### [at] sanrrcom> wrote:
> (\+|-)?[0-9]+(\.[0-9]+)?(E(\+|-)?[0-9]{1,3})?

  Btw, it's a surprisingly little-known fact that, at least in C and C++,
things like 10. and 10.e5 are valid floating point literals (besides the
more usual .1 form).

  If you wanted to take that into account in a regexp like the one above,
it actually becomes a bit verbose (so that a lone . wouldn't be considered
a valid floating point literal). Basically you need to write the above twice
(with slight differences, and the two parts separated with a |.)
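
One possible shape for that alternation, sketched with Python's `re` module. The pattern is an assumption for illustration: it covers only the decimal forms mentioned here (10, 10., 10.5, .1, with an optional exponent), keeps an optional leading sign, and ignores type suffixes and hexadecimal floats. In this sketch only the mantissa is written as two alternatives; the exponent is shared.

```python
import re

# Mantissa alternatives: digits with an optional ".digits*" (10, 10., 10.5),
# or a leading dot that must be followed by digits (.1).
# A lone "." fits neither alternative.
MANTISSA = r"(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
EXPONENT = r"(?:[eE][+-]?[0-9]+)?"
C_STYLE_NUMBER = re.compile(r"[+-]?" + MANTISSA + EXPONENT)


def is_c_style_number(text):
    """True when the whole string fits the sketched C-style pattern."""
    return C_STYLE_NUMBER.fullmatch(text) is not None
```

With this, "10.", "10.e5" and ".1" are accepted, while "." and ".e5" are rejected.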

-- 
                                                          - Warp

From: Invisible
Subject: Re: Mini-languages
Date: 10 Nov 2010 11:53:35
Message: <4cdace0f$1@news.povray.org>

> OK, so how do you do
>
> (\+|-)?[0-9]+(\.[0-9]+)?(E(\+|-)?[0-9]{1,3})?
>
> in your parser language?

OK, that's one big ol' complex regex, right there.

> (That is, optional sign, one or more digits, optional decimal point
> followed by one or more digits, optional E followed by optional sign
> followed by one to three digits.)

If I've understood the spec correctly, it's

   do
     option (char '+' <|> char '-')
     many1 digit
     option (do char '.'; many1 digit)
     option (do char 'E'; option (char '+' <|> char '-'); many1 digit)

Enforcing that the exponent has at most 3 digits would be slightly more 
wordy. The obvious way is

   xs <- many1 digit
   if length xs > 3 then fail "exponent too long" else return ()

Notice that since this is written in a /real/ programming language and 
not a text string, we can do

   sign = char '+' <|> char '-'

   number = do
     option sign
     many1 digit
     option (do char '.'; many1 digit)
     option (do char 'E'; option sign; many1 digit)

and save a little typing. You can also factor the task into smaller pieces:

   sign = char '+' <|> char '-'

   exponent = do
     char 'E'
     option sign
     xs <- many1 digit
     if length xs > 3 then fail "exponent too long" else return ()

   number = do
     option sign
     many1 digit
     option (do char '.'; many1 digit)
     option exponent

You can also do things like write a function that builds a special kind 
of parser given a simpler spec.

With a regex, on the other hand, you cannot even statically guarantee 
that a given string is even a syntactically valid regex. And that's 
before you try to programmatically construct new ones. :-P
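
As a point of comparison, a short sketch of both issues using Python's `re` module; the helper name `compiles` and the constants are illustrative assumptions. A malformed pattern is only detected when it is compiled at run time, and patterns can be assembled from named pieces, though only as plain strings with no structure checked before compilation.

```python
import re


def compiles(pattern):
    """Report whether a pattern string is a valid regex (known only at run time)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Named pieces joined with ordinary string formatting.
SIGN = r"[+-]"
DIGITS = r"[0-9]+"
EXPONENT = rf"E{SIGN}?[0-9]{{1,3}}"  # doubled braces yield a literal {1,3}
NUMBER = rf"{SIGN}?{DIGITS}(?:\.{DIGITS})?(?:{EXPONENT})?"
```

Here `compiles(NUMBER)` succeeds and `compiles("(unclosed")` fails, but nothing flags the second case until that line actually executes.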

From: scott
Subject: Re: Mini-languages
Date: 10 Nov 2010 12:55:46
Message: <4cdadca2@news.povray.org>

>> (\+|-)?[0-9]+(\.[0-9]+)?(E(\+|-)?[0-9]{1,3})?
>
>   do
>     option (char '+' <|> char '-')
>     many1 digit
>     option (do char '.'; many1 digit)
>     option (do char 'E'; option (char '+' <|> char '-'); many1 digit)

Hmm, now which one is more readable?

From: Darren New
Subject: Re: Mini-languages
Date: 10 Nov 2010 14:25:47
Message: <4cdaf1bb$1@news.povray.org>

Warp wrote:
> Basically you need to write the above twice

I would think at the worst you'd need to write the part before the "E" twice.

-- 
Darren New, San Diego CA, USA (PST)
   Serving Suggestion:
     "Don't serve this any more. It's awful."
