Warp wrote:
>> Good luck writing a parser that can untangle all of that... :-(
> 
>   I really can't see the problem. When the input contains a sequence of
> valid characters (i.e. which can form a real or a name), if this sequence
> has the form:
> 
>     ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
> 
> then it's a real, else it's a name.
Yeah. As I said, I'm currently changing my design from one that attempts 
to recognise and delimit numbers to one that just chops the text into 
chunks, and *then* decides what kind of thing each chunk is.
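For what it's worth, here is a rough Python sketch of that "chop first, classify later" idea. The delimiter and whitespace sets are my assumptions and should be checked against the manual, the function names are made up for illustration, and it deliberately ignores strings and comments, which need their own handling.

```python
import re

# Assumed character classes; verify these against the reference manual.
DELIMITERS = set("()<>[]{}/%")
WHITESPACE = set(" \t\r\n\f\0")

INTEGER = re.compile(r"[+-]?[0-9]+")

def chop(text):
    """Split text into chunks without deciding what any chunk means."""
    chunks, current = [], []
    for ch in text:
        if ch in WHITESPACE or ch in DELIMITERS:
            if current:                      # close the chunk in progress
                chunks.append("".join(current))
                current = []
            if ch in DELIMITERS:             # a delimiter is its own chunk
                chunks.append(ch)
        else:
            current.append(ch)
    if current:
        chunks.append("".join(current))
    return chunks

def classify(chunk):
    """Decide what kind of thing a chunk is, after the chopping is done."""
    if chunk in DELIMITERS:
        return "delimiter"
    if INTEGER.fullmatch(chunk):
        return "integer"
    return "other"                           # reals and names are not separated here

print([(c, classify(c)) for c in chop("12 abc[3.5]")])
```

The point of the split is that `chop` never needs to know what a number looks like; all of that fiddly logic lives in `classify`.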
>   If we translate that regexp to plain English, it means:
> 
> - There's nothing before this pattern (which is what the ^ at the beginning
>   means), and nothing after it (which is what the $ at the end means).
> - The sequence optionally starts with a + or a -.
> - After that two possible patterns must appear (the expression in
>   parentheses, where the two patterns are separated with the | symbol):
>   - A sequence of one or more digits ([0-9]+), optionally followed by
>     the dot character (a plain "." has a special meaning in regexps, so the
>     dot character has to be escaped, and thus written as "\.")
>   - A sequence of zero or more digits ([0-9]*) followed by a dot character
>     followed by a sequence of one or more digits.
> - Optionally the character "e" can follow, and if that's the case, a real
>   (not containing an "e") must follow as well (the whole last part in
>   parentheses, with the "?" at the end to indicate optionality).
...which would be incorrect then, for at least the following reasons:
- There can be *zero* or more digits before the decimal point. (But 
notice that there must be more than zero digits *in total* before 
and after the decimal point. There can be zero in either 
place, but not both.)
- The "e" can also be "E".
- The exponent is an integer, not a real.
See? Not as easy as it looks, is it? Gotta pay careful attention to 
*exactly* what the manual says is and isn't permissible.
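As a sketch only, here is how the pattern might look in Python with those points applied: either case of "e", and an integer exponent. I'm assuming the exponent may carry a sign; that needs checking against the manual. The mantissa part is kept as in the quoted regexp. `is_real` is just a name I made up.

```python
import re

# Mantissa: digits with an optional trailing dot, or optional digits,
# a dot, and at least one digit. A lone "." matches neither alternative.
# Exponent: "e" or "E", an optional sign (assumed), then digits only.
REAL = re.compile(r"[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)([eE][+-]?[0-9]+)?")

def is_real(chunk):
    """Return True if the whole chunk has the form of a real number."""
    return REAL.fullmatch(chunk) is not None

for sample in ["1.", ".5", ".", "1E6", "1e1.5", "-3.62", "abc"]:
    print(sample, is_real(sample))
```

Note that this also accepts a plain integer such as "42", so a classifier built on it would want to test for integers first.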
>   In an actual BNF-style parser the rule probably becomes simpler because
> the repetition can be removed.
It would be *really nice* if the reference manual included a BNF syntax 
diagram... :-S
Between the rules for splitting up tokens, the tricky rules for escaping 
things in strings, and characters that have multiple meanings depending 
on context, it's really quite hard!
E.g., "<" is the start of a string, "<~" is the start of another string, 
and "<<" is an ordinary name object. Go figure. Similarly, "[" is 
classified as a "delimiter character", yet it's also the *name* of an 
operator (and a name is a sequence of "regular characters" - that is, 
can't contain delimiters).
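The "<" case at least can be settled by peeking at the next character. A minimal sketch in Python, with a made-up function name and made-up labels for the three outcomes:

```python
def classify_angle(text, pos):
    """Classify the token that starts at text[pos], which must be '<'."""
    if text[pos] != "<":
        raise ValueError("expected '<' at position %d" % pos)
    nxt = text[pos + 1 : pos + 2]   # empty string if '<' is the last character
    if nxt == "<":
        return "name"               # "<<" is an ordinary name object
    if nxt == "~":
        return "string-tilde"       # "<~" starts the other kind of string
    return "string"                 # a plain "<" starts a string

print(classify_angle("<<", 0), classify_angle("<~abc~>", 0), classify_angle("<48>", 0))
```

The "[" case is different: there is nothing to look ahead at, so the chopper just has to emit it as a one-character chunk and let the classifier treat that chunk as a name.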
It all gets confusing very fast...