On the downside, it's chewing up an increasingly large amount of my free time, and taking away valuable energy from other projects (*cough* Epoch *cough*). But I sort of needed the distraction for a while, and now it's just too much fun to quit without a decent project to show for all of it.
Of course no IRC bot is complete without the ability to converse, so I started writing up a simple order-2 Markov chain sentence generator. These are pretty bog standard in the language generation world for low-fidelity conversation.
The big trick with Markov chains is training data. An order-2 chain requires a pretty substantial amount of English to inspect before it starts sounding even vaguely sensible. Most of the output is just direct quotes from what it was fed, since it doesn't have enough contextual information to "understand" how to rearrange phrases and sentences yet.
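To make the order-2 idea concrete, here's a minimal sketch of the technique (not the bot's actual code — function names and structure are my own): training maps each pair of adjacent words to the list of words seen after that pair, and generation repeatedly samples a successor of the last two words emitted.

```javascript
// Order-2 Markov chain sketch: the chain is a plain object mapping
// a two-word context ("w1 w2") to the list of words observed next.

function train(chain, sentence) {
  const words = sentence.split(/\s+/);
  for (let i = 0; i < words.length - 2; i++) {
    const key = words[i] + " " + words[i + 1];   // order-2 context
    (chain[key] = chain[key] || []).push(words[i + 2]);
  }
  return chain;
}

function generate(chain, w1, w2, maxWords = 20) {
  const out = [w1, w2];
  for (let i = 0; i < maxWords; i++) {
    const key = out[out.length - 2] + " " + out[out.length - 1];
    const successors = chain[key];
    if (!successors) break;                      // dead end: stop early
    out.push(successors[Math.floor(Math.random() * successors.length)]);
  }
  return out.join(" ");
}
```

With only a little training data every context has exactly one successor, so the output is a verbatim quote of the input — which is exactly the "not enough contextual information yet" problem described above. The chain object also serializes trivially with `JSON.stringify`, which is one plausible way to end up with a pile of JSON as internal state.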
I'm currently at something like 75KB of JSON representing the internal state of the Markov chain system, and it still sounds pretty idiotic. I'm confident based on past experiments that it can be improved quite a bit - the only limiting factor is how much data I can feed it.
I'm secretly having it listen to IRC channels and selecting the longer and more lucid-looking sentences to add to the training repository. This should be a fun way to add some flavor to the bot.
If I decide to continue hacking on this, I'll probably start by cleaning up the rest of the CIRC code I... erm... borrowed. Once that's done, I'm considering posting a copy of the extension code on my scribblings site for other intrepid Chrome users to futz with.
As a stretch goal, I kind of want to write a Bayesian classifier to try and get a bit of a stimulus/response thing going, but that's a ways down the road in all probability.
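For the curious, the stimulus/response idea could be as simple as a naive Bayes text classifier that buckets incoming lines into response categories. A toy sketch under that assumption (everything here is hypothetical, including the per-class add-one smoothing shortcut):

```javascript
// Toy naive Bayes classifier: count word occurrences per label, then
// score a new text by log prior + smoothed log likelihoods.

function trainNB(model, label, text) {
  model.docs = model.docs || {};
  model.words = model.words || {};
  model.docs[label] = (model.docs[label] || 0) + 1;
  model.words[label] = model.words[label] || {};
  for (const w of text.toLowerCase().split(/\s+/)) {
    model.words[label][w] = (model.words[label][w] || 0) + 1;
  }
  return model;
}

function classifyNB(model, text) {
  const totalDocs = Object.values(model.docs).reduce((a, b) => a + b, 0);
  let best = null, bestScore = -Infinity;
  for (const label of Object.keys(model.docs)) {
    const counts = model.words[label];
    const n = Object.values(counts).reduce((a, b) => a + b, 0);
    const vocab = Object.keys(counts).length;     // per-class vocab (a simplification)
    let score = Math.log(model.docs[label] / totalDocs);
    for (const w of text.toLowerCase().split(/\s+/)) {
      // add-one smoothing so unseen words don't zero out the score
      score += Math.log(((counts[w] || 0) + 1) / (n + vocab));
    }
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}
```

Classify a line as "greeting" vs. "question", say, and pick a canned response from the winning bucket - crude, but it would get a stimulus/response loop going.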
Sounds fun. Diversion is good and will probably be better for Epoch in the long run.