I think I've finally got a proper solution to EOL ambiguity resolution
(slapped wrist for not testing my previous solution more thoroughly).
As usual, doing the right thing (TM) was surprisingly easy, although I'm sure
there must be simpler approaches to ^ and $ handling without all this
post processing. I'm now happily in a position to tackle meatier issues, such as a more flexible
generator interface and another ponder on lookahead. I think I can solve at least half of the
lookahead problem with some regex syntax tree shenanigans, but I need both a*/a and
a/a* to work correctly to really solve it. Thinking about it, I may even be able to use
some more post processing. Maybe that's the easiest way.
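To make the two lookahead cases concrete, here is a brute-force sketch of what r1/r2 is expected to return. It is not how lexertl does (or will do) it: match_with_lookahead is a hypothetical helper built on std::regex, and it simply tries every split point from longest to shortest, which a real DFA-based lexer would not do.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not part of lexertl. Returns the length of the longest
// prefix of `input` that matches `head` in full and is immediately followed
// by a match for `tail`, or std::string::npos if no split point works.
std::size_t match_with_lookahead(const std::string &head,
    const std::string &tail, const std::string &input)
{
    const std::regex head_rx(head);
    const std::regex tail_rx(tail);

    // Try the longest candidate token first, then shrink it.
    for (std::size_t len = input.size() + 1; len-- > 0;)
    {
        const auto split = input.begin() + static_cast<std::ptrdiff_t>(len);

        if (std::regex_match(input.begin(), split, head_rx) &&
            std::regex_search(split, input.end(), tail_rx,
                std::regex_constants::match_continuous))
        {
            return len;
        }
    }

    return std::string::npos;
}
```

For the input "aaa", a*/a should give a token of length 2 (a greedy a* would swallow all three characters and leave nothing for the lookahead), while a/a* should give a token of length 1.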
Spoke to Hartmut and agreed to ditch the boost.lexer.zip and instead only update
that code base in Boost's SVN repository. Any Boost review is still ages away,
and it is a moot point seeing as Spirit has been using the library for years
already!
I finally fixed the ambiguity with $ and \n in the
lexertl code base. The boost version will be fixed soon.
I will also start work on some kind of abstract state machine interface with Hartmut so that
generator::build() can produce a char_state_machine directly soon.
As I haven't got around to making changes in preparation for the re2c style
code generator, I have switched lexertl.zip to the latest version which includes
the changes mentioned below. I'm thinking that it is probably better to build
a char_state_machine directly in the generator class, and that
it probably doesn't really need iterators. There will be an option as to whether to
group transitions by state or by char cluster. The latter is needed for a
re2c style code generator.
As I have recently started a revamp of lexertl I have decided to start a blog to keep everybody up to date. As this version is not feature complete yet, I have added a separate zip file which you can find here.
So far I have implemented the following improvements:
- wchar_t based state machines (overridable).
- lexertl::skip token constant.
- ^) link a singleton (as it can only occur at the beginning of a token).
- debug::dump() now compresses ranges.
This dramatically reduces the list of (easier) features I wanted to add and just leaves the following for the immediate future:
- file_iterator (this will also replace the one in Boost.Spirit)
- size_t into a templated type for state machine creation.
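As an illustration of the range compression mentioned for debug::dump(), here is a minimal sketch of the general idea. It is an assumption about what "compresses ranges" means, not the actual lexertl code: compress_ranges is a hypothetical helper that takes a sorted string of distinct characters and collapses any run of three or more consecutive characters into first-last.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of lexertl. Expects `sorted_chars` to be
// sorted and free of duplicates. Runs of three or more consecutive
// characters are written as "first-last"; shorter runs are copied as-is.
std::string compress_ranges(const std::string &sorted_chars)
{
    std::string out;
    std::size_t i = 0;

    while (i < sorted_chars.size())
    {
        std::size_t j = i;

        // Extend j while the next character directly follows the current one.
        while (j + 1 < sorted_chars.size() &&
            static_cast<unsigned char>(sorted_chars[j + 1]) ==
            static_cast<unsigned char>(sorted_chars[j]) + 1)
        {
            ++j;
        }

        if (j - i >= 2)
        {
            out += sorted_chars[i];
            out += '-';
            out += sorted_chars[j];
        }
        else
        {
            out.append(sorted_chars, i, j - i + 1);
        }

        i = j + 1;
    }

    return out;
}
```

With this sketch, "abcdxyz" becomes "a-dx-z", while "abd" is left unchanged because no run is long enough to be worth compressing.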