Invisible <voi### [at] devnull> wrote:
> Anything that isn't parsable as a number is a name. Therefore,
> "0." -> real
> ".0" -> real
> "." -> name
> "1.1" -> real
> "1.1.1" -> name
> "1e1" -> real
> "1x1" -> name
> "s1" -> name
> "1s" -> name
> Will the insanity never end?? >_<
> Good luck writing a parser that can untangle all of that... :-(
I really can't see the problem. When the input contains a sequence of
valid characters (i.e. which can form a real or a name), if this sequence
has the form:
^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
then it's a real, else it's a name.
If we translate that regexp to plain English, it means:
- There's nothing before this pattern (which is what the ^ at the beginning
means), and nothing after it (which is what the $ at the end means).
- The sequence optionally starts with a + or a -.
- After that, one of two possible patterns must appear (the expression in
parentheses, where the two patterns are separated by the | symbol):
  - A sequence of one or more digits ([0-9]+), optionally followed by
    the dot character (a plain "." has a special meaning in regexps, so the
    dot character has to be escaped, and thus written as "\.").
  - A sequence of zero or more digits ([0-9]*) followed by a dot character
    followed by a sequence of one or more digits.
- Optionally the character "e" can follow, and if that's the case, a real
(not containing an "e") must follow as well (the whole last part in
parentheses, with the "?" at the end to indicate optionality).
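As a rough illustration of the breakdown above, here is the same pattern written as a commented Python regex. This is only a sketch: the names MANTISSA, REAL and classify are made up for the example, and Python is my choice of language, not something from the PostScript manual.

```python
import re

# The two alternatives for a real without an exponent.
MANTISSA = r"""
    (?: [0-9]+ \.?          # one or more digits, optionally followed by a dot
      | [0-9]* \. [0-9]+    # zero or more digits, a dot, one or more digits
    )
"""

# The whole pattern, spelled out with re.VERBOSE so each part is labelled.
REAL = re.compile(
    rf"""
    ^                       # nothing before the pattern
    [+-]?                   # optional sign
    {MANTISSA}
    (?: e [+-]? {MANTISSA} )?   # optional "e" followed by another signed real
    $                       # nothing after it (Python's $ also tolerates one
                            # trailing newline, which tokens never contain)
    """,
    re.VERBOSE,
)


def classify(token):
    """Return "real" if the token has the form above, else "name"."""
    return "real" if REAL.match(token) else "name"


if __name__ == "__main__":
    for tok in ["0.", ".0", ".", "1.1", "1.1.1", "1e1", "1x1", "s1", "1s"]:
        print(repr(tok), "->", classify(tok))
```

Running it on the nine tokens from the original post should reproduce the real/name split listed there.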
In an actual BNF-style parser the rule probably becomes simpler because
the repetition can be removed.
--
- Warp
Warp wrote:
>> Good luck writing a parser that can untangle all of that... :-(
>
> I really can't see the problem. When the input contains a sequence of
> valid characters (i.e. which can form a real or a name), if this sequence
> has the form:
>
> ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
>
> then it's a real, else it's a name.
Yeah. As I said, I'm currently changing my design from one that attempts
to recognise and delimit numbers to one that just chops the text into
chunks, and *then* decides what kind of thing each chunk is.
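A minimal sketch of that two-stage design, written in Python purely for illustration. The function names are hypothetical, and stage 1 makes a big simplifying assumption: chunks are separated by whitespace only, so delimiter characters, strings and comments are not handled at all.

```python
import re

# The pattern quoted above, unchanged.
REAL = re.compile(
    r"^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$"
)


def chunks(text):
    """Stage 1: cut the text into chunks without interpreting them.

    Simplifying assumption: whitespace is the only separator.
    """
    return text.split()


def classify(chunk):
    """Stage 2: decide what kind of thing a single chunk is."""
    return "real" if REAL.match(chunk) else "name"


def tokenize(text):
    """Run both stages and pair every chunk with its kind."""
    return [(c, classify(c)) for c in chunks(text)]


if __name__ == "__main__":
    for chunk, kind in tokenize("1.1 moveto 1.1.1 -3."):
        print(repr(chunk), "->", kind)
```

The point of the split is that stage 1 never needs to know what a number looks like; only classify() does.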
> If we translate that regexp to plain English, it means:
>
> - There's nothing before this pattern (which is what the ^ at the beginning
> means), and nothing after it (which is what the $ at the end means).
> - The sequence optionally starts with a + or a -.
> - After that, one of two possible patterns must appear (the expression in
> parentheses, where the two patterns are separated by the | symbol):
>   - A sequence of one or more digits ([0-9]+), optionally followed by
>     the dot character (a plain "." has a special meaning in regexps, so the
>     dot character has to be escaped, and thus written as "\.").
>   - A sequence of zero or more digits ([0-9]*) followed by a dot character
>     followed by a sequence of one or more digits.
> - Optionally the character "e" can follow, and if that's the case, a real
> (not containing an "e") must follow as well (the whole last part in
> parentheses, with the "?" at the end to indicate optionality).
...which would be incorrect then, for at least the following reasons:
- There can be *zero* or more characters before the decimal point. (But
notice that there must be more than zero characters *in total* before
and after the decimal point. It's just that there can be zero in either
place, but not both.)
- The "e" can also be "E".
- The exponent is an integer, not a real.
See? Not as easy as it looks, is it? Gotta pay careful attention to
*exactly* what the manual says is and isn't permissible.
> In an actual BNF-style parser the rule probably becomes simpler because
> the repetition can be removed.
It would be *really nice* if the reference manual included a BNF syntax
diagram... :-S
Between the rules for splitting up tokens, the tricky rules for escaping
things in strings, and characters that have multiple meanings depending
on context, it's really quite hard!
E.g., "<" is the start of a string, "<~" is the start of another string,
and "<<" is an ordinary name object. Go figure. Similarly, "[" is
classified as a "delimiter character", yet it's also the *name* of an
operator (and a name is a sequence of "regular characters" - that is,
can't contain delimiters).
It all gets confusing very fast...
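For the "<" case at least, one character of lookahead seems to be enough. A small sketch in Python, assuming the three meanings listed above; the function name and the labels it returns are made up for the example ("hex string" and "base-85 string" are my names for the "<" and "<~" forms).

```python
def classify_angle(text, pos):
    """Classify the token that starts at text[pos], which must be "<".

    Looks at most one character ahead.
    """
    if text[pos:pos + 1] != "<":
        raise ValueError("expected '<' at position %d" % pos)
    following = text[pos + 1:pos + 2]   # empty string at end of input
    if following == "<":
        return "name"                   # "<<" is an ordinary name object
    if following == "~":
        return "base-85 string"         # "<~" starts the other string form
    return "hex string"                 # a lone "<" starts a string


if __name__ == "__main__":
    for sample in ["<48656c6c6f>", "<~87cURD~>", "<< /a 1 >>"]:
        print(repr(sample), "->", classify_angle(sample, 0))
```

The "[" problem is different in kind, since it is about which character class a token belongs to rather than about lookahead, so this trick does not help there.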
Orchid XP v8 wrote:
>
> Why...why...WHY...why would they do this? o_O
>
Why not? After all, it's possible.
-Aero
Orchid XP v8 wrote:
> Why...why...WHY...why would they do this? o_O
I wonder if anybody will figure out where this is quoted from...
Invisible wrote:
> - There can be *zero* or more characters before the decimal point. (But
> notice that there must be more than zero characters *in total* before
> and after the decimal point. It's just that there can be zero in either
> place, but not both.)
A quickie C# example ...
Code:
using System;
using System.Text.RegularExpressions;

class Program
{
    static string[] tokens = new string[] {"0.", ".0", ".", "1.1",
        "1.1.1", "1e1", "1x1", "s1", "1s"};

    static void Main(string[] args)
    {
        // Optional sign, then either "digits(optional) . digits" or
        // "digits .(optional)", then an optional exponent.
        Regex rx = new
            Regex(@"^[+-]?(([0-9]*\.[0-9]+)|([0-9]+\.?))(e[0-9]+)?$");
        foreach (string t in tokens)
        {
            if (rx.IsMatch(t))
            {
                Console.WriteLine("\"{0}\" --> Real", t);
            }
            else
            {
                Console.WriteLine("\"{0}\" --> Identifier", t);
            }
        }
        // Keep the console window open until a key is pressed.
        Console.ReadKey();
    }
}
Output:
"0." --> Real
".0" --> Real
"." --> Identifier
"1.1" --> Real
"1.1.1" --> Identifier
"1e1" --> Real
"1x1" --> Identifier
"s1" --> Identifier
"1s" --> Identifier
The relevant portion is this regex:
^[+-]?(([0-9]*\.[0-9]+)|([0-9]+\.?))(e[0-9]+)?$
It does what Warp describes, and also takes care of the cases with .0
and 0.
> - The "e" can also be "E".
You can easily replace the e for the exponent with [eE], which allows either
lower or upper case. :)
--
~Mike
Invisible <voi### [at] devnull> wrote:
> > ^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)(e[+-]?([0-9]+\.?|[0-9]*\.[0-9]+))?$
> ...which would be incorrect then, for at least the following reasons:
> - There can be *zero* or more characters before the decimal point.
Which is exactly what "[0-9]*\." means: Zero or more digits, followed
by the dot character.
> - The "e" can also be "E".
That's easy to fix: Replace "e" in the regexp with "[eE]".
> - The exponent is an integer, not a real.
Then it becomes simpler:
^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)([eE][+-]?[0-9]+)?$
In fact, that pattern actually matches the floating point number
representation in C and C++ (except perhaps for the possibility of a
leading +).
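A quick way to check the revised pattern against the three objections is to run it over a few tokens. This is just a sketch in Python; is_real is a made-up helper name, and the pattern is copied verbatim from above.

```python
import re

# The revised pattern: [eE] for the exponent marker, integer exponent only.
REAL = re.compile(r"^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)([eE][+-]?[0-9]+)?$")


def is_real(token):
    """True if the token matches the revised pattern."""
    return REAL.match(token) is not None


if __name__ == "__main__":
    samples = ["0.", ".0", ".", "1E5", "+.5E-3", "1e1.5", "1e", "1.1.1"]
    for tok in samples:
        print(repr(tok), "->", "real" if is_real(tok) else "name")
```

With this version "1E5" is accepted and "1e1.5" is rejected, which is the opposite of what the first pattern did for both.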
> See? Not as easy as it looks, is it?
It was not a question of difficulty. It was a question of me not knowing
the exact format of floating point numbers in PostScript and making
assumptions.
--
- Warp
Invisible wrote:
> There is also a whole bunch of "interesting" rules about how token
> parsing works.
Sounds like a state machine to me. AKA a regular expression.
Sounds like your parser is too complex.
--
Darren New, San Diego CA, USA (PST)
The NFL should go international. I'd pay to
see the Detroit Lions vs the Roman Catholics.
Darren New <dne### [at] sanrrcom> wrote:
> Sounds like a state machine to me. AKA a regular expression.
Any regular expression can be converted into a state machine, but can
every state machine be converted into a regular expression? Is there a
one-to-one relation?
--
- Warp
Warp wrote:
> Any regular expression can be converted into a state machine, but can
> every state machine be converted into a regular expression? Is there a
> one-to-one relation?
In the original definitions of those two terms, yes. Obviously, by the time
you get to Perl regular expressions, they're no longer "regular" in the same
sense. If you stick to regular expression algebra as invented by Kleene,
they're equivalent specifications of the same languages. (One can wind up
exponentially larger than the other, mind.)
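To make the equivalence concrete, here is a hand-written state machine for the real-number pattern posted earlier in the thread, next to the regex itself. It is a sketch in Python; the state names, the transition table and the helper names are all made up for the example.

```python
import re

# The regex form, as posted earlier in the thread.
REGEX = re.compile(r"^[+-]?([0-9]+\.?|[0-9]*\.[0-9]+)([eE][+-]?[0-9]+)?$")


def char_class(ch):
    """Map one character to the input symbol the state machine cares about."""
    if ch in "0123456789":
        return "digit"
    if ch in "+-":
        return "sign"
    if ch == ".":
        return "dot"
    if ch in "eE":
        return "exp"
    return "other"


# (state, input symbol) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "sign"): "sign",
    ("start", "digit"): "int",
    ("start", "dot"): "dot",
    ("sign", "digit"): "int",
    ("sign", "dot"): "dot",
    ("int", "digit"): "int",
    ("int", "dot"): "frac",        # "1." is already a complete real
    ("int", "exp"): "exp",
    ("dot", "digit"): "frac",      # ".5" needs at least one digit
    ("frac", "digit"): "frac",
    ("frac", "exp"): "exp",
    ("exp", "sign"): "expsign",
    ("exp", "digit"): "expint",
    ("expsign", "digit"): "expint",
    ("expint", "digit"): "expint",
}
ACCEPTING = {"int", "frac", "expint"}


def dfa_accepts(token):
    """Run the state machine over the token, one character at a time."""
    state = "start"
    for ch in token:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


if __name__ == "__main__":
    for tok in ["0.", ".0", ".", "1.1", "1.1.1", "1e1", "1E-5", "1e", "s1"]:
        print(repr(tok), dfa_accepts(tok), REGEX.match(tok) is not None)
```

Both columns should agree for every token, since the table is just the regex unrolled by hand.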
--
Darren New, San Diego CA, USA (PST)
The NFL should go international. I'd pay to
see the Detroit Lions vs the Roman Catholics.
Orchid XP v8 wrote:
> Eero Ahonen wrote:
>
>> I assume you are already aware of this, but I'll provide a link anyways:
>>
>> http://circle.ch/blog/p558.html
>
> Why...why...WHY...why would they do this? o_O
Search povray.general for that guy who wanted to write a web server in
POV SDL.
(I don't think that's possible without adding, for example, other I/O
facilities.)