I don't like regular expressions. Or rather, I don't like the mangled
ASCII pea-soup typically used to /represent/ regexes. I have nothing
against the formal /concept/ of a regular expression. I just dislike the
lack of separation between commands and arguments. (And the fact that it
looks like pea-soup!)
Apparently other people dislike them far more:
http://tinyurl.com/ydb4j9j
My best answer on Stack Overflow got about 12 votes. This one got FOUR
AND A HALF THOUSAND! o_O
On 16/07/2012 12:28, Invisible wrote:
> I don't like regular expressions. Or rather, I don't like the mangled
> ASCII pea-soup typically used to /represent/ regexes
Issue number 1: there are too many syntaxes for regexes.
SQL has its own. Unix ed/sed/vi... have another. Windows tries to have some
but not all, and so on (including Unix shells)...
(SQL's LIKE uses % and _ where classic Unix regexes use .* and . )
Regexes are useful for small items.
Trying to tell one book from another with a regex is silly.
(I want a regex that catches all transcriptions, in any language and any
encoding, of the works of William Shakespeare, but it should reject all
other books, including the ones that talk about the works of William
Shakespeare... go to hell!)
Trying to implement a BNF grammar with a regex is asking for trouble as
soon as recursion or reordering is allowed. And interesting BNF grammars
always have recursion somewhere (if only to allow more than one item),
and most are also relaxed about the required order of appearance of
sub-items or properties.
Regex for a syntactically correct email address: yes
Regex for a valid IP address: yes
Regex for validating your income tax form: No way!
> I have nothing
> against the formal /concept/ of a regular expression. I just dislike the
> lack of separation between commands and arguments. (And the fact that it
> looks like pea-soup!)
Commands?
There are no commands in regexes. Commands belong to a programming
language. They often cause more problems with regexes, too.
Invisible <voi### [at] devnull> wrote:
> I don't like regular expressions. Or rather, I don't like the mangled
> ASCII pea-soup typically used to /represent/ regexes.
That complaint is mostly irrelevant. Regexes are excellent for their
most common use, which is to match extremely simple patterns.
"hello" is a regex. It might not look like it, but it is. And that's the
beauty of it. If you use that for example as a search pattern, you will
find all occurrences of those five consecutive characters.
The absolute beauty of it is that there's *no* extraneous syntax *at all*
to perform such a simple match. If you were to separate the syntax into
"commands and arguments", you would only be making the pattern needlessly
complicated and more laborious to write.
But how does that differ from a trivial matching that simply searches for
those consecutive characters and that's it? Well, you can refine the pattern
by adding a few additional key characters to it, to make it perform a more
elaborate search, and in the vast majority of cases it does not become
needlessly complicated or long.
For example, suppose you wanted to search for either "hello" or "Hello".
You would write the pattern as "[Hh]ello".
You would have to be really pedantic to argue that this is complicated
and difficult to understand. Now imagine separating it into "commands
and arguments", and how much more complicated and lengthy the pattern
would become.
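As a minimal sketch of the two patterns just mentioned, here they are in Python's `re` module, used only as a stand-in for whichever tool is doing the searching. The sample text and variable names are made up for illustration.

```python
import re

# Made-up sample text.
text = "Hello there. I said hello, and then hello again."

# A plain literal is already a valid regex: it matches those five characters.
literal = re.findall(r"hello", text)

# One character class widens the match to either capitalisation.
either_case = re.findall(r"[Hh]ello", text)

print(literal)      # ['hello', 'hello']
print(either_case)  # ['Hello', 'hello', 'hello']
```

The only extra syntax needed for the second search is the pair of square brackets.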
You can argue against regexes by giving examples of really long and
complicated patterns, but then you would be complaining about extreme
fringe cases, not about *the most common* usage for them, for which they
are just superb.
--
- Warp
On 16/07/2012 12:43 PM, Le_Forgeron wrote:
> Issue number 1: there are too many syntaxes for regexes.
Well, that's true enough.
> Trying to implement a BNF grammar with a regex is asking for trouble as
> soon as recursion or reordering is allowed. And interesting BNF grammars
> always have recursion somewhere (if only to allow more than one item),
> and most are also relaxed about the required order of appearance of
> sub-items or properties.
I believe the correct phrase is "regular expressions can only recognise
regular languages", and "most languages described by BNF are not
regular". Followed shortly by "most regex implementations are not
actually regular".
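To illustrate the last point, here is a small sketch using Python's `re` module as one example of such an implementation. Backreferences let a pattern recognise "some string followed by an exact copy of itself", which is a textbook example of a language that is not regular. The helper name `is_doubled` is made up for this example.

```python
import re

# \1 must repeat exactly what group 1 captured. No finite automaton can
# check that for arbitrarily long input, so this goes beyond regular languages.
doubled = re.compile(r"(.+)\1")

def is_doubled(s: str) -> bool:
    """Return True if s is some non-empty string written twice in a row."""
    return doubled.fullmatch(s) is not None

print(is_doubled("abcabc"))  # True
print(is_doubled("abcabd"))  # False
```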
On 16/07/2012 12:48 PM, Warp wrote:
> You can argue against regexes by giving examples of really long and
> complicated patterns, but then you would be complaining about extreme
> fringe cases, not about *the most common* usage for them, for which they
> are just superb.
In my limited experience, people don't use regexes for simple pattern
matches. They only use them for insanely complex cases - cases where it
would be far, far better to spell out exactly what you're trying to do,
rather than encode it into a tangle of punctuation.
I can see how if you're just trying to quickly search some document for
a piece of information, being able to throw together a short text string
and get nearly the right results might be useful. And hey, if you're
only doing this once, who /cares/ that it's completely non-maintainable.
It's a one-off task; you don't /need/ to maintain anything.
But for building large, complex applications, regexes seem like a
stupendously bad idea.
Also, the usual formulation of regexes as text strings means that you
can only match against text strings. Admittedly that's the most common
case for wanting to do complicated matching. But if, say, you wanted to
match against a binary file... sorry, you can't do that. As far as I can
tell, there's no reason why a formal regular expression can't be matched
against binary data; it's just that most real-world implementations
don't allow this.
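Support for binary input does vary from tool to tool. As one data point, Python's `re` module accepts `bytes` patterns and matches them against `bytes` data. The sketch below checks for the 8-byte PNG signature followed by an IHDR chunk; the sample data is made up and is not a complete file.

```python
import re

# A bytes pattern (note the b prefix) is matched against bytes, not text.
# Pattern: PNG signature, any 4 bytes (chunk length), then the type "IHDR".
png_header = re.compile(rb"\x89PNG\r\n\x1a\n.{4}IHDR", re.DOTALL)

# Made-up sample data standing in for the start of two files.
data = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR" + b"\x00" * 13
not_png = b"GIF89a" + b"\x00" * 20

print(png_header.match(data) is not None)     # True
print(png_header.match(not_png) is not None)  # False
```

`re.DOTALL` is set so that `.` matches every byte value, including the newline byte.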
He doesn't dislike regexes at all. He dislikes using the wrong tool for
the job, as he wisely puts it:
"HTML is not a regular language and hence cannot be parsed by regular
expressions."
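A small sketch of why nesting is the sticking point, assuming Python and its standard `html.parser` module. The HTML fragment and the `DepthTracker` class are made up for illustration. A non-greedy regex stops at the first closing tag it sees, while a parser that keeps track of depth handles the nesting.

```python
import re
from html.parser import HTMLParser

# Made-up fragment with one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# The regex has no notion of depth: it pairs the first <div> with the
# first </div>, cutting the outer element short.
regex_guess = re.search(r"<div>(.*?)</div>", html).group(1)

class DepthTracker(HTMLParser):
    """Records the deepest level of tag nesting seen."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        self.depth -= 1

tracker = DepthTracker()
tracker.feed(html)
tracker.close()

print(regex_guess)        # outer <div>inner
print(tracker.max_depth)  # 2
```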
Invisible <voi### [at] devnull> wrote:
> In my limited experience, people don't use regexes for simple pattern
> matches.
Yes, because you have decades of extensive experience of how, e.g., Unix
users typically use regexes.
--
- Warp
On 16/07/2012 05:22 PM, nemesis wrote:
> He doesn't dislike regexes at all. He dislikes using the wrong tool for
> the job, as he wisely puts it:
>
> "HTML is not a regular language and hence cannot be parsed by regular
> expressions."
And yet, this seems to be how people almost always try to use regexes...
Invisible <voi### [at] devnull> wrote:
> And yet, this seems to be how people almost always try to use regexes...
No, it isn't.
--
- Warp
On 17/07/2012 07:28 AM, Warp wrote:
> Invisible <voi### [at] devnull> wrote:
>> In my limited experience, people don't use regexes for simple pattern
>> matches.
>
> Yes, because you have decades of extensive experience of how, e.g., Unix
> users typically use regexes.
Perhaps you mean like
grep -e "ntpd\[[[:digit:]]\+\]" /var/log/messages.4
which obviously searches for... wait, what does it search for exactly?
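Read piece by piece, the pattern is the literal text "ntpd[", then one or more digits, then "]". In other words, it finds ntpd log entries that carry a process ID. A minimal sketch of the same match in Python follows. The log lines are made up, and `[0-9]+` stands in for grep's `[[:digit:]]\+`.

```python
import re

# Python spelling of the same pattern: "ntpd[", one or more digits, then "]".
ntpd_pid = re.compile(r"ntpd\[[0-9]+\]")

# Made-up log lines for illustration.
lines = [
    "Jul 17 07:00:01 host ntpd[1234]: synchronized to 10.0.0.1",
    "Jul 17 07:00:02 host sshd[991]: Accepted publickey for user",
    "Jul 17 07:00:03 host ntpd: no PID in this line",
]

# Keep only the lines the pattern matches, as grep would.
matching = [line for line in lines if ntpd_pid.search(line)]
print(matching)
```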
So how about this?
egrep '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
Yeah, that's pretty clear. If by "clear" you mean "it's going to take me
five minutes to figure out exactly what this is supposed to do".
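For reference, the pattern is meant to match a dotted-quad IP address. Each parenthesised group describes one octet (250-255, 200-249, or anything up to 199), and the first group is repeated three times. One way to keep it readable is to name the octet once and build the full pattern from that. The sketch below does this in Python; the names `OCTET`, `as_posted` and `dotted_quad` are made up here. It also suggests that, in the pattern as written, the `\.` belongs only to the last alternative, so some dot-free digit runs are accepted.

```python
import re

# One octet, using the same three alternatives as the pattern above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# As posted: the "\." sits inside the alternation, so it is required only
# when the last alternative is the one that matches.
as_posted = re.compile(
    r"\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}"
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
)

# Spelled out: three "octet then dot" groups, followed by a final octet.
dotted_quad = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")

print(bool(as_posted.search("192.168.1.1")))     # True
print(bool(as_posted.search("250250250250")))    # True, although there are no dots
print(bool(dotted_quad.search("192.168.1.1")))   # True
print(bool(dotted_quad.search("250250250250")))  # False
```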
Or how about
dmesg | egrep '(s|h)d[a-z]'
At least that one only takes a minute or two to figure out.
And then we come to horrifying things such as
while(<STDIN>)
{
    my($line) = $_;
    chomp($line);
    if($line !~ /<DIR>/)
    {
        if ($line =~ /.{28}(\d\d)-(\d\d)-(\d\d).{8}(.+)$/)
        {
            my($filename) = $4;
            my($yymmdd) = "$3$1$2";
            if($yymmdd lt "971222")
            {
                print "copy $filename \\oldie\n";
            }
        }
    }
}
I don't even want to contemplate what the hell that does...
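As far as can be told from the code alone, it reads a directory listing on standard input, skips the `<DIR>` entries, pulls an mm-dd-yy date and a file name out of fixed columns, and prints a `copy` command for every file dated before 22 December 1997. Below is a sketch of the same logic in Python with the pieces named. The function name `copy_commands`, the group names and the sample listing lines are assumptions made for illustration, not part of the original script.

```python
import re

# Any 28 characters, a date as mm-dd-yy, any 8 characters, then the file name.
LISTING_LINE = re.compile(
    r".{28}(?P<mm>\d\d)-(?P<dd>\d\d)-(?P<yy>\d\d).{8}(?P<name>.+)$"
)

def copy_commands(lines, cutoff="971222"):
    """Yield a copy command for each non-directory entry dated before cutoff."""
    for line in lines:
        line = line.rstrip("\n")
        if "<DIR>" in line:
            continue  # directory entries are skipped
        m = LISTING_LINE.search(line)
        if m is None:
            continue  # line is not in the expected layout
        # Reorder mm-dd-yy into yymmdd so that plain string comparison sorts
        # by date (within a single century, as in the original).
        yymmdd = m["yy"] + m["mm"] + m["dd"]
        if yymmdd < cutoff:
            yield f"copy {m['name']} \\oldie"

# Made-up listing lines: 28 columns of padding, date, 8 columns, file name.
sample = [
    "README   TXT".ljust(28) + "12-01-97" + "  9:15a " + "readme.txt",
    "NOTES    TXT".ljust(28) + "01-15-98" + "  4:02p " + "notes.txt",
    "OLDSTUFF     <DIR>".ljust(28) + "06-30-97" + "  1:00p " + "oldstuff",
]

for command in copy_commands(sample):
    print(command)  # copy readme.txt \oldie
```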
Still, I suppose the fact that you can do bad things with regexes
doesn't automatically mean that regexes are bad.