Warp wrote:
>> Good luck writing a parser that can untangle all of that... :-(
>
> I really can't see the problem. When the input contains a sequence of
> valid characters (i.e. ones which can form a real or a name), if this sequence
> has the form:
>
> ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
>
> then it's a real, else it's a name.
Yeah. As I said, I'm currently changing my design from one that attempts
to recognise and delimit numbers to one that just chops the text into
chunks, and *then* decides what kind of thing each chunk is.
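
Roughly, the two-phase design looks like the Python sketch below. The names `split_chunks`, `classify_placeholder` and `tokenize` are made up for illustration, the chunker assumes whitespace is the only separator (real delimiter handling is left out), and the classifier is a deliberately dumb stand-in rule, not the real one.

```python
from typing import Callable, Iterator, List, Tuple


def split_chunks(text: str) -> Iterator[str]:
    # Phase 1: chop the text into chunks.
    # Assumption: whitespace is the only separator; delimiter
    # characters would need extra handling here.
    yield from text.split()


def classify_placeholder(chunk: str) -> str:
    # Phase 2 (hypothetical stand-in rule): a chunk made only of
    # ASCII digits is a "number", anything else is a "name".
    return "number" if all(c in "0123456789" for c in chunk) else "name"


def tokenize(
    text: str, classify: Callable[[str], str] = classify_placeholder
) -> List[Tuple[str, str]]:
    # Chop first, then decide what each chunk is.
    return [(chunk, classify(chunk)) for chunk in split_chunks(text)]


if __name__ == "__main__":
    print(tokenize("12 add 3x"))
```

The point of splitting it this way is that the chunker never needs to know what a number looks like; only the classifier does, and it can be swapped out.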
> If we translate that regexp to plain English, it means:
>
> - There's nothing before this pattern (which is what the ^ at the beginning
>   means), and nothing after it (which is what the $ at the end means).
> - The sequence optionally starts with a + or a -.
> - After that two possible patterns must appear (the expression in
>   parentheses, where the two patterns are separated with the | symbol):
>   - A sequence of one or more digits ([0-9]+), optionally followed by
>     the dot character (a plain "." has a special meaning in regexps, so the
>     dot character has to be escaped, and thus written as "\.")
>   - A sequence of zero or more digits ([0-9]*) followed by a dot character
>     followed by a sequence of one or more digits.
> - Optionally the character "e" can follow, and if that's the case, a real
>   (not containing an "e") must follow as well (the whole last part in
>   parentheses, with the "?" at the end to indicate optionality).
...which would be incorrect then, for at least the following reasons:

- There can be *zero* or more digits before the decimal point. (But
  notice that there must be more than zero digits *in total* before
  and after the decimal point. It's just that there can be zero in either
  place, but not both.)
- The "e" can also be "E".
- The exponent is an integer, not a real.

See? Not as easy as it looks, is it? Gotta pay careful attention to
*exactly* what the manual says is and isn't permissible.
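
For what it's worth, here is a Python sketch of a pattern with those three corrections folded in. It is my reading of the rules listed above, not something copied from the manual: `REAL_RE` and `classify` are made-up names, the optional sign on the exponent is an assumption, and (like the quoted regexp) it also accepts plain digit runs with no decimal point.

```python
import re

# Assumed grammar, pieced together from the three corrections above:
# optional sign, digits with an optional decimal point (at least one
# digit in total), then an optional e/E followed by an integer.
# Allowing a sign on the exponent is an assumption.
REAL_RE = re.compile(
    r"[+-]?"                         # optional leading sign
    r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)"  # zero digits allowed on either side, not both
    r"(?:[eE][+-]?[0-9]+)?"          # optional exponent: e or E, then an integer
)


def classify(chunk: str) -> str:
    """Return "real" if the whole chunk matches REAL_RE, else "name"."""
    return "real" if REAL_RE.fullmatch(chunk) else "name"


if __name__ == "__main__":
    for chunk in ["1.", ".5", "-3.62", "1E6", ".", "1e1.5", "e5"]:
        print(chunk, classify(chunk))
```

Using `fullmatch` does the job of the `^` and `$` anchors, so the pattern itself doesn't need them.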
> In an actual BNF-style parser the rule probably becomes simpler because
> the repetition can be removed.
It would be *really nice* if the reference manual included a BNF syntax
diagram... :-S

Between the rules for splitting up tokens, the tricky rules for escaping
things in strings, and characters that have multiple meanings depending
on context, it's really quite hard!

E.g., "<" is the start of a string, "<~" is the start of another string,
and "<<" is an ordinary name object. Go figure. Similarly, "[" is
classified as a "delimiter character", yet it's also the *name* of an
operator (and a name is a sequence of "regular characters" - that is,
can't contain delimiters).

It all gets confusing very fast...
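
One way to cope with the "<" case is to test the two-character forms before the one-character form. A minimal Python sketch, where `classify_angle` and the kind labels are hypothetical names of mine, and only the three cases mentioned above are covered:

```python
from typing import Tuple


def classify_angle(text: str, pos: int) -> Tuple[str, int]:
    """Return (kind, opener length) for the "<" found at text[pos].

    The kind labels are made up for this sketch.
    """
    # Longest match first: check both two-character forms before "<".
    if text.startswith("<<", pos):
        return ("name", 2)
    if text.startswith("<~", pos):
        return ("string-tilde", 2)
    if text.startswith("<", pos):
        return ("string", 1)
    raise ValueError("no '<' at position %d" % pos)


if __name__ == "__main__":
    for sample in ["<<", "<~abc", "<48656c"]:
        print(sample, classify_angle(sample, 0))
```

It doesn't make the rules any less odd, but at least the order of the checks keeps "<<" from being mistaken for the start of a string.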