Tuesday, 16 October 2007

Wiki parsers (this post is a bit technical)

I've been changing the wiki engine the Intelligent Book uses -- I had been using a slightly altered version of JSPWiki -- and the question of what makes a good wiki engine has come up. For the moment, I've been putting something together using Radeox, since that seems to be the most commonly used library. But I really don't like it.

Part of my prejudice is that the documentation has fallen off the web because the project lead (Stephan Schmidt) has left his employer (Fraunhofer) and is still negotiating the rights to the code. But more essentially, I'm not a fan of its design: each piece of syntax recognises a kind of regular expression, replaces it in the text with some output, and then passes it on to the next filter or macro in the chain. This sounds nice and simple, but some annoying things can happen -- for example, a macro that outputs HTML including Javascript is very likely to get trodden on by later filters: characters like "[" and "]" in the Javascript can be misinterpreted as link tags by the later Wiki filters.
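
To make that failure concrete, here is a minimal sketch in Python. It is not Radeox's actual API: script_macro, link_filter and render are hypothetical names, and the markup rules are invented for illustration. It only mimics the "each stage rewrites the text and hands it on" design described above.

```python
import re

def script_macro(text):
    # Hypothetical macro: expands {hello} into HTML that contains JavaScript.
    return text.replace("{hello}", "<script>var a = items[0];</script>")

def link_filter(text):
    # Hypothetical link filter: turns [Page] into an HTML link.
    return re.sub(r"\[([^\]]+)\]", r'<a href="/wiki/\1">\1</a>', text)

def render(text, filters):
    # Each stage sees the *output* of the previous one, not the original markup.
    for f in filters:
        text = f(text)
    return text

output = render("See [HomePage]. {hello}", [script_macro, link_filter])
print(output)
```

The wiki link is converted as intended, but the array index inside the script is converted too, because the link filter has no way of knowing that the brackets came from a macro's output rather than from the author's markup.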

Adam Megacz at Berkeley has produced a formal scannerless parser that can be configured to parse Wiki markup (most parser generators cannot) -- but it's a little bit like cobbly PhD software, and in practice it seems to be very slow parsing its grammar file. Also, requiring wiki installations to define formal grammars for their markup just doesn't sound right. Formal grammars are the exclusive domain of computer scientists; wikis are supposed to be more accessible than that.

I'm getting tempted to go down a different route (I have a cunning plan for a much better approach!) but for the moment I don't have time, so I guess I'll struggle on with Radeox and just keep grumbling for now.

Friday, 21 September 2007

Elections, elections, bureaucracies

So it seems as though both Australia and (perhaps) the UK are heading into elections. I've always found it curiously comical that although the UK and Australia have usually had the opposite party in power since the late 70s, the outcomes have all been pretty similar -- interest rates have tracked largely the same in both countries, as have employment figures, and the political agenda (crises about petrol prices and illegal immigration being on the agenda in both countries at similar times...).

It reminds me of Michael Portillo's comment on This Week a year or so ago. He was asked his opinion about why Blair ended up being at odds with Europe when before 1997 he'd campaigned so vigorously that Britain should co-operate more with the EU. Portillo said that he wasn't surprised because he [Portillo] has always been "a great believer in bureaucracies" -- in other words, the issue is always bigger than the politician. It doesn't matter if Tony was pro-Europe in his heart, when he was in power he was there to argue for Britain's interests and they were different from France's interests and so, sure enough, he ended up having rows with Chirac all the time.

But this leaves me feeling vaguely powerless at the polls. Not only won't my vote make much difference (because I live in a "safe seat"), but even if I did change the government, the outcome would be pretty similar. On reflection, maybe I'll go back to pretending that whether Darling or Osborne (and Costello or Swan) is chancellor is going to determine whether I'll be wealthy or destitute in five years' time...

Tuesday, 18 September 2007

Blog comments

I noticed a curious thing with Blogger's comment system -- although there are "anonymous" and "leave a name, but I don't have a log-in" options, it defaults to asking you to log in. I wonder how many people click "comment", see the log-in box, and run away, never noticing they don't have to log in after all.

Naturally I wonder this mostly for other blogs -- the five or so people who read this one wouldn't make much of a sample! :)

ALT-C, My presentation

My presentation slot was at 9am on the final morning of the conference. Happily I had an interested audience (if a little small), and made some contacts to follow up on after the talk.

I'd been slightly worried that with the conference dinner going late into the evening before, and having to clear out of our accommodation by 10am, maybe there'd only be me and the other presenter in the room as everyone would either have stayed in bed, been packing their bags, or already have hit the road homewards. Thankfully, learning technology conference attendees are more dedicated than that though!

Monday, 17 September 2007

ALT-C, Peter Norvig's keynote

A couple of weeks ago, Peter Norvig, Google's director of research, gave the closing keynote at the ALT-C conference. For his background reading, he spoke to Hal Abelson, who happens also to have been involved in the Intelligent Book project my PhD came out of. So, of course I was ever so interested because there was lots of common ground with my research and I had a personal link to the background. And I thought it was an entertaining talk. However, talking to two other attendees (with backgrounds in education) in the cab on the way to the train station, they surprised me by saying they didn't find it very relevant. Well, of course you can't please everyone, but I had to ponder what it was about the talk that left them unenthused and me entertained.

I wondered if perhaps by keeping the talk non-technical, Norvig maybe ended up focusing on material they already knew. Like many AI researchers when they look at teaching technology, Norvig took Bloom's "Two Sigma Problem" as his cue. This is the "problem" that tutoring students one-to-one is much more effective than teaching them in a traditional classroom (specifically, a 1980s US high school classroom) but is also much more expensive. Rather than focus too much on technical matters, he unpacked the outcomes of Bloom's research and what it means for teaching pedagogy, and where technology might fit. This is all interesting stuff for technologists looking at educational technology, even if I'd already come to similar conclusions from reading Bloom's Two Sigma Problem paper myself. But this made me wonder -- the teachers and education researchers in the audience are probably already very familiar with Bloom and his research: he's one of the biggest names in the field. And perhaps, seeing that the keynote was by the director of research at Google, they expected to hear more about what new kinds of technology might be around the corner, how it could benefit teaching, and how to prevent technology from chewing up all their time in learning how to use and administer it?

Or maybe the two people I spoke to were just having a bad day.

Tuesday, 4 September 2007

"Digital native" -- the most overused term?

After the first day of the ALT-C (learning technology) conference, one common theme is that almost every talk I have been to has at some point mentioned "digital natives and digital immigrants". And many of the speakers have talked about their teenage children understanding the Web 2.0 world so much better... Maybe it's the reactionary in me, but I think the "digital native/immigrant" term might be becoming a hindrance -- it seems to be prompting people sometimes to think about today's undergraduates as an alien culture they cannot easily relate to or understand. And I don't think that's true. Most of the "digital native" phenomena seem perfectly rational and understandable from an HCI perspective, whether it be using Facebook in preference to email for personal communication (no spam, less formal, no need to write down email addresses, etc etc...) or the texting culture.

In fact, if the world wasn't locked into email by network effects ("everyone else uses email and expects me to have email" or "work expects me to have email") I wonder if any of us would really choose email as our method of communication any more, given it is 90% spam now.

Friday, 31 August 2007


Next week I'm heading to ALT-C, a conference on learning technology. It's in Nottingham, just up the motorway, so not quite such an exotic location as Takamatsu, Mumbai, or Madrid. It's the last paper from my PhD, so it's got a slight "end of an era" feel about it.

Monday, 16 July 2007

A Penny Saved is a ...

Well, if you're one of the economists David Cameron is listening to, it's not "a penny earned" but "a loss to the economy"!

One of Cameron's sillier policies is to extend copyright from 50 years to 70 years. He claims that this would "add £3.3 billion to the economy". Unfortunately, this is one of those nasty things: a penalty dressed up as a benefit. Why? Because actually the size of the economy is a measure of penalty to people, not benefit to people.

In simple terms, it's obvious: the size of the economy is the sum total of cash leaving people's wallets!  If I keep an extra £1 in my wallet, the economy is £1 smaller.  Strangely, though, I feel £1 better off, not worse off because the economy is that little bit smaller...

So why is "growing the economy" normally seen as a good thing? Well, usually it is an indirect measure of benefit:

Economists can't actually measure the benefit someone gets from something -- eg, a cup of tea -- directly.   (Nobody has yet invented the "aaah"-ometer.)  So instead economists have to assume the price you were willing to pay for that cup of tea is a measure of its value to you.  If you were willing to pay 20p, then you've presumably received at least twenty pence worth of benefit from it.  Market forces also help to make sure things balance up nicely.  And so the amount of money being spent in the economy is an approximate measure of the benefit people are receiving from the economy.  Usually.

But say the government issues a licence to Oxygen, Inc that allows them to charge £1 per person per day for the air we breathe.  Instantly the economy has jumped by £1 × 365 × 60 million = £21.9bn. But has anyone achieved any actual benefit? Well, no, the air is still exactly the same, it's just the public now has £21.9bn less to spend on other things.

So this is the problem: the economists assume "cost = benefit" and measure the economy accordingly. But they can't measure the benefit of things you get for free. So if the government makes something that would be free (eg: air, or 51 year old music) suddenly have a cost, it looks like the economy has grown (hurrah!) but all that has really happened is people are more out of pocket for no extra benefit (boo!).

Irritatingly, Mr Cameron seems to be going for the "look it'll grow the economy" boondoggle, hoping nobody will notice extending copyright is just taking extra cash from our pockets for no extra benefit.

Thursday, 12 July 2007

Conditional approval...

Today I received the official letter from BoGS (that fabulous acronym that stands for Board of Graduate Studies) saying I've been conditionally approved to be awarded the PhD. "Conditional" is subject to giving them a hardbound copy of my thesis for the library, so it's not a very onerous condition...

Friday, 6 July 2007

Google Web Toolkit (GWT)'s cunning plan?

Languages like Java and Python compile source code to an intermediate semi-digested form called "p-code" or "bytecode". Google Web Toolkit (GWT) takes Java source and compiles it into cross-browser JavaScript. Effectively, GWT uses JavaScript as its p-code.

But how good is JavaScript at being a p-code? Probably not great, because it wasn't designed as one -- it was designed as a general purpose programming language. Programming languages are designed to cater for the fact that programs are written by people. So, they have lots of syntactic sugar and usability features. P-code, meanwhile, is never written by hand and so needs no usability features. P-code is something designed to be quick and easy for a virtual machine to interpret and optimise. (See any course on compilers for the sorts of changes compilers make when converting source code to an intermediate representation.)

If JavaScript was an efficient p-code language, and if the "virtual machines" for it (the browsers) were efficient, then GWT applications would perform as well as Java or .NET applications. And yet, GWT actually feels a little sluggish.

So how could Google get around this issue? Well, it's in a uniquely good position to introduce a new p-code into browsers: a proper intermediate representation of JavaScript. The new p-code would be put into Firefox first presumably. GWT would then include some code in the sites it generates to see if the browser supports this new p-code. If it does, it would ask the server for blazing fast new p-code; if it doesn't, it would ask for sluggish old JavaScript. And if GWT sites (like most of Google) were blazing fast on Firefox and sluggish on IE or Safari, it wouldn't take the other browser manufacturers long to say "we'd like some of this new p-code goodness too please".

And then Google would have moved some way towards their goal of the Web being their "universal application platform", rather than the slightly hacky, messy, not quite fit-for-purpose application platform that browsers have always been so far.

Of course all this is purely speculation...

Wednesday, 4 July 2007

21 July

So I'm arranging to graduate (subject of course to the official result of my viva), and it turns out the next university Congregation is on 21st July, the day the final Harry Potter book comes out. Which just makes it slightly odd reading lines on the graduation forms about gowns and hoods and how academical dress can be hired from any robemaker in Cambridge...

Wednesday, 27 June 2007

Harry Potter prediction (couldn't resist)

Oh well, I got that one wrong! (Prediction removed).

Saturday, 23 June 2007

Viva survivor

I had my viva (thesis defence) for my PhD on Thursday. I came out smiling. (Though officially I won't get told the result for a few days.)

Thursday, 14 June 2007

No more 50p bus fares

For about a year, Cambridge has run a trial scheme where university members could catch buses around the city for 50p. But the bus company has just announced this is ending on June 30th. This is a big shame, because for a brief while it meant that it actually made financial sense to catch the bus. The usual bus fare for a return ticket is £2.70, using the discounted "day rider". For the journeys my wife and I make, that's more than twice the cost of driving (Cambridge is a small city). And that's assuming only one person in the car. Put two people in the car, and catching the bus would be utterly profligate... (not to mention slow and unreliable).

On the up-side, I tend to cycle to work in town.  And with bus fares at £2.70 per day, it's now that much harder for me to justify being lazy and catching the bus.

One Week to Viva...

Next Thursday, the 21st, I have my viva voce examination for my PhD -- where two examiners who've read my dissertation come and ask me all sorts of tricky questions about it.  (Or as one person put it "where two lucky people who already have their PhDs tell you whether or not they'll let you join the club").

Thursday, 31 May 2007

Google Gears -- useful, but a couple of early limitations?

I just got back from the Google Developer Day in London, where one of the "big announcements" was Google Gears -- a way for AJAX applications to work locally on your computer even when offline. Gears keeps a local copy of the web data an application would use in an SQLite database on your PC; the local data is synced with the remote web data whenever you are online.

There seem to be a couple of yet-to-be solved issues that some apps will strike:
  1. Merging conflicts
  2. Knowing what data you need ahead of time

Merging conflicts
Imagine if Wikipedia was an AJAX site, working offline. Algernon and Berty could both edit the GoogleGears entry, and then when they both go online, whose edit will be uploaded to the site? Algernon's? Berty's? Or will an error happen?

The Gears team are pretty up-front that they haven't solved this yet. Unfortunately, they're not so clear on what will actually happen at the moment -- I asked one of the developers whether there was at least any way for an application to find out which tables or rows are in conflict, but the answer was a pretty blank "we haven't added anything for that, there's just whatever SQL provides". So I guess that means at the moment the last upload wins (and nothing will even know there was ever a conflict). Maybe there's some way to check a lastUpdate timestamp on the server though...?
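
As a rough sketch of that last idea (and only a sketch: this is plain Python with the standard sqlite3 module, not the Gears API, and the pages table, its last_update column and the upload function are all names I've made up), the server could refuse any upload whose author last synced against an older version of the row. I've used a simple version counter where a real lastUpdate timestamp might go.

```python
import sqlite3

def make_server():
    # Stand-in for the remote site's database (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pages (title TEXT PRIMARY KEY, body TEXT, last_update INTEGER)"
    )
    db.execute("INSERT INTO pages VALUES ('GoogleGears', 'original text', 1)")
    return db

def upload(server, title, new_body, seen_update):
    """Apply an offline edit only if the row is unchanged since the client synced.

    seen_update is the last_update value the client saw when it went offline.
    Returns True if the edit was applied, False if somebody else got there first.
    """
    cur = server.execute(
        "UPDATE pages SET body = ?, last_update = last_update + 1 "
        "WHERE title = ? AND last_update = ?",
        (new_body, title, seen_update),
    )
    return cur.rowcount == 1

server = make_server()
# Both editors went offline having seen version 1 of the page.
algernon_ok = upload(server, "GoogleGears", "Algernon's edit", seen_update=1)
berty_ok = upload(server, "GoogleGears", "Berty's edit", seen_update=1)
final_body = server.execute(
    "SELECT body FROM pages WHERE title = 'GoogleGears'"
).fetchone()[0]
```

Here the second upload is rejected rather than silently overwriting the first, so the application at least finds out there was a conflict. What it then does about it (merge, ask the user, keep both) is still the hard part.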

Knowing how much data you'll need
Storing the data locally will only work for applications that know what data they want to store -- for example your email or calendar. That sounds pretty obvious and unavoidable. But for one recent craze (that Google's keen on) this could be problematic: "mashups of mashups" -- letting users combine information from multiple sites and functionality from multiple mashups.

For example, let's start with a TV guide. Let's call its data t.

Now let's use a mashup that links the TV guide to some reviews from rottentomatoes. Now we need data
t, r(t)

Let's also use a mashup that uses the IMDB or the BBC's program data to find out what other TV shows the actors have been in. "Open All Hours": Granville is played by David Jason who you'd know from "A Touch of Frost" and "Dangermouse".
t, p(t)

But hang on, I don't want it telling me "Open All Hours": Customer Number 3 is played by Joe Bloggs who appeared in "RubbishProgrammeX". I only want to hear about actors who were in quality shows. So let's use the review site again, and combine the mashups so I only hear about the good shows the actors were in.
t, r(t), p(t), r(p(t))

That last one looks a little big in off-line mode. A user might click on any show in the guide today, and the app needs to check the reviews of all the other shows each actor has been in. That's probably ok to do online for one show that the user has just clicked on. It means checking about a thousand reviews (say 20 actors, each having 50 other roles). But for the mashup to work off-line, well the user might click on any show on any of 40 channels today. Let's say there are 1,000 shows on TV today. We need to pre-fetch around a thousand reviews for each of those shows. Suddenly we're pre-fetching a million reviews! (Ok, minus a significant number of overlaps).

To work offline, a mashup-of-mashups could have to do a number of joins across multiple sites, and pre-cache the (quite large) result.
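
The sums above can be sketched in a few lines of Python. The figures are the same guesses as in the text (20 actors per show, 50 other roles each, 1,000 shows a day), and the function and variable names are just for illustration. The second half shows, on a toy guide, what computing r(t) plus r(p(t)) ahead of time amounts to: a join across the guide, the cast lists and the actors' other roles.

```python
ACTORS_PER_SHOW = 20        # guess from the text
OTHER_ROLES_PER_ACTOR = 50  # guess from the text
SHOWS_TODAY = 1000          # guess from the text

# Online: only the one show the user clicked on.
reviews_online = ACTORS_PER_SHOW * OTHER_ROLES_PER_ACTOR
# Offline: every show the user *might* click on (upper bound, ignoring overlaps).
reviews_offline = SHOWS_TODAY * reviews_online

def shows_needing_reviews(guide, cast, roles):
    """Return the set of shows whose reviews must be pre-fetched: r(t) and r(p(t))."""
    needed = set(guide)                          # r(t): the shows in the guide
    for show in guide:
        for actor in cast.get(show, []):         # p(t): who is in each show
            needed.update(roles.get(actor, []))  # r(p(t)): their other shows
    return needed

# Toy data using the example from the text.
guide = ["Open All Hours"]
cast = {"Open All Hours": ["David Jason"]}
roles = {"David Jason": ["A Touch of Frost", "Dangermouse"]}
needed = shows_needing_reviews(guide, cast, roles)

print(reviews_online, reviews_offline, sorted(needed))
```

Using a set is what accounts for the "minus overlaps" caveat: a show that several actors have appeared in only needs its review fetched once.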

Friday, 25 May 2007

First post

I submitted my PhD recently, and with a fresh stage of life comes a fresh blog! More to the point, I wanted to change the address from the rather cryptic 'whb21' to the more comprehensible 'wbillingsley'. I'll leave the old one there for now, though.