Tuesday 16 October 2007

Wiki parsers (this post is a bit technical)

I've been changing the wiki engine the Intelligent Book uses -- I had been using a slightly altered version of JSPWiki -- and the question of what makes a good wiki engine has come up. For the moment, I've been putting something together using Radeox, since that seems to be the most commonly used library. But I really don't like it.

Part of my prejudice is that the documentation has fallen off the web: the project lead (Stephan Schmidt) has left his employer (Fraunhofer) and is still negotiating the rights to the code. But more fundamentally, I'm not a fan of the design: each filter or macro recognises a piece of syntax with a regular expression, replaces it in the text with some output, and passes the text on to the next filter or macro in the chain. That sounds nice and simple, but annoying things can happen -- for example, a macro that outputs HTML containing Javascript is very likely to get trodden on by later filters: characters like "[" and "]" in the Javascript can be misinterpreted as link markup by the later wiki filters.
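
To make the problem concrete, here's a toy sketch in Java of the chain-of-regex-replacements style. The class names, filter names and markup are invented for illustration -- this isn't Radeox's actual API -- but the shape of the problem is the same: a macro expands into HTML containing Javascript, and a later link filter blindly rewrites every [...] it finds, including the ones inside the script.

    import java.util.List;
    import java.util.regex.Pattern;

    public class FilterChainDemo {

        interface Filter {
            String apply(String text);
        }

        // Invented macro: expands {clock} into HTML that includes some Javascript.
        static final Filter clockMacro = text ->
                text.replace("{clock}",
                        "<script>var parts = [\"h\", \"m\", \"s\"]; show(parts[0]);</script>");

        // Invented link filter: turns [PageName] into an anchor tag, with no idea
        // that some of the brackets were put there by an earlier macro.
        static final Filter linkFilter = text ->
                Pattern.compile("\\[([^\\]]+)\\]")
                       .matcher(text)
                       .replaceAll("<a href=\"/wiki/$1\">$1</a>");

        public static void main(String[] args) {
            String wikiText = "Here is a clock: {clock} and a link to [HomePage].";

            String result = wikiText;
            for (Filter f : List.of(clockMacro, linkFilter)) {
                result = f.apply(result);
            }

            // The brackets inside the generated Javascript have been rewritten
            // into <a> tags as well, so the emitted script is now broken.
            System.out.println(result);
        }
    }

Run that and [HomePage] comes out as a perfectly good link, but the brackets inside the script have been turned into anchor tags too -- which is exactly the kind of trampling I keep running into.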

Adam Megacz at Berkeley has produced a formal scannerless parser that can be configured to parse Wiki markup (most parser generators cannot) -- but it's a little cobbled-together, as PhD software tends to be, and in practice it seems to be very slow at parsing its grammar file. Also, requiring wiki installations to define formal grammars for their markup just doesn't sound right. Formal grammars are the exclusive domain of computer scientists; wikis are supposed to be more accessible than that.

I'm getting tempted to go down a different route (I have a cunning plan for a much better approach!), but I don't have the time at the moment, so I guess I'll struggle on with Radeox and keep grumbling for now.