POV-Ray : Newsgroups : povray.off-topic : An interesting problem : Re: An interesting problem Server Time
28 Jul 2024 18:27:17 EDT (-0400)
  Re: An interesting problem  
From: Orchid Win7 v1
Date: 8 May 2013 16:26:01
Message: <518ab4d9@news.povray.org>
On 24/04/2013 06:31 PM, Warp wrote:
> IIRC a regular expression can be (and is usually) converted into a state
> machine that traverses the input data linearly once. When the state machine
> ends up in one of specific nodes, you know that a match was found (and the
> node in question uniquely identifies which pattern was matching.)

I've spent the last few days playing with this. I managed to convert my 
regular expression list into a non-deterministic finite state automaton. 
This was a rather interesting process.

Initially I tried to build a simple decision tree, where every node has 
a dictionary for every possible byte stating what to do if that byte is 
read next. Trouble is, what this is essentially doing is constructing 
every possible matching byte sequence. For expressions with lots of 
alternatives, you get a combinatorial explosion of matching byte sequences.

This became especially evident when I tried to generate code for the 
decision tree. I ended up producing an 8MB Haskell source file, which 
the Haskell compiler died trying to compile. Clearly this approach is a 
dead end.

I then did some reading about how the professionals do this stuff. It 
seems the key is that where a subexpression has alternatives, you 
generate a series of states, but those diverging branches *merge* again 
at the end of the subexpression. What I was doing was effectively 
duplicating the rest of the expression - which is obviously more 
space-hungry.

The slightly tricky thing is that some transitions happen only for one 
specific byte, and others happen for *any* byte. And yet other 
transitions happen without consuming any input at all. This is how we 
end up with a *non*-deterministic state automaton, after all.

Anyway, having implemented all this, the generated Haskell file is now a 
mere 600KB. It still takes a while to compile it, but nowhere near as 
long as before.

I then began reading about how to transform a non-deterministic 
automaton into a deterministic one. It appears you can get rid of the 
"zero byte" moves by generating a new state for every collection of 
states reachable by zero-byte moves. I have yet to implement this yet, 
but it looks like it should significantly reduce the size of the 
generated code.

I didn't actually get as far as the part where you remove the 
non-determinism completely...


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.