Gonzalo Diethelm (2023-05-09 08:55:32)
Great article, thanks! Tiny suggestion: change "every non-empty sequence of lower-case digits defines an integer" into "every non-empty sequence of digits defines an integer".

Laurence Tratt (2023-05-09 13:49:44)
@Gonzalo Oops, that's an amusing typo -- fixed!

Michael Norrish (2023-08-25 05:50:27)
Regular languages are closed under intersection and complement (unlike the CFLs), so if you want to exclude "pi", it seems as if you'd actually want to keep a separate scanner, and just extend lex's syntax to allow you to write

ident = [A-Za-z]+ - "pi" - "e" - ...

where the + is the one-or-more (Kleene plus) operator and the - subtracts out possibilities. Of course, the point from your great interview with Eelco about wanting to compose grammars is valid too, so it feels as if we're being pulled in two directions at once...
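
To make the "-" idea concrete, here is a minimal Python sketch of its set-difference semantics. It is only an illustration: the helper name `difference` is hypothetical, and this is neither lex syntax nor how lex would implement it. A real scanner generator would more likely build a single automaton for the difference, which closure under intersection and complement makes possible. The sketch simply checks membership in the base language and non-membership in each excluded one.

```python
import re

def difference(base, *excluded):
    """Return a predicate for the language of `base` minus each of `excluded`.

    Hypothetical helper: all arguments are Python regex patterns, and
    matching is always against the whole string.
    """
    base_re = re.compile(base)
    excluded_res = [re.compile(pattern) for pattern in excluded]

    def matches(text):
        # Accept only if the whole string is in the base language...
        if base_re.fullmatch(text) is None:
            return False
        # ...and is not in any of the subtracted languages.
        return not any(r.fullmatch(text) for r in excluded_res)

    return matches

# Roughly: ident = [A-Za-z]+ - "pi" - "e"
is_ident = difference(r"[A-Za-z]+", r"pi", r"e")
```

With this definition, `is_ident("pie")` is true because only the exact strings "pi" and "e" are subtracted, while `is_ident("pi")` and `is_ident("e")` are false.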

Laurence Tratt (2023-08-25 06:56:01)
@Michael Your "-" operator suggestion is an intriguing one and would have made the "pi" example easier to express, once I'd understood where the overlap was. My initial reaction is that "-" should work in lex, because lex restricts itself to "true" regular languages, and therefore in my imaginary lecc too.

That said, you're very right that once you start composing grammars (scannerless or otherwise), bigger challenges start to make themselves known. Parsing is full of fun trade-offs!