Work on the mRVM classifier is coming along steadily. This past weekend I played with the following new tools:
Chef and Vagrant
Chef is a tool for scripted server provisioning, and Vagrant manages the virtualized environments that Chef provisions. Vagrant relies on Oracle VirtualBox to create virtual server instances. In other words, on demand I can say “I’ll have one Linux server with the following dependencies, please. Thanks.” And if you mess it up? Go have a cup of coffee while Vagrant rebuilds it for you. It’s good to test on Linux since I’ve been developing on Mac OS X. And apparently I must have been doing something right, because I didn’t hit any portability issues when I tested on Linux.
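A Vagrantfile is a small Ruby file that describes the machine you want; a minimal sketch might look like this (the box name and recipe are placeholders, and the exact syntax depends on your Vagrant version):

```ruby
# Vagrantfile -- "I'll have one Linux server with these dependencies, please."
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"        # which base Linux image to boot
  config.vm.provision :chef_solo do |chef| # let Chef install the dependencies
    chef.add_recipe "build-essential"      # e.g. compilers for a C project
  end
end
```

Then `vagrant up` builds the machine and `vagrant destroy` throws it away so the next `vagrant up` starts from a clean slate.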
Valgrind
Valgrind is a tool for memory debugging, memory-leak detection, and profiling. It actually bundles a few different tools, but this past weekend I used it to find memory leaks. I knew I had ’em; I had even marked a few sections of code with // TODO(jrm) possible leak. But I didn’t yet know how to track the leaks down and ensure they were all gone. Before I did, my virtual server would crash, and when I ran the same code on my main machine, I’d fire up OS X’s Activity Monitor.app and watch the memory get gobbled up.
I’ll mention that Valgrind works extremely well on Linux, but the OS X port still seems to be a work in progress: using it apparently requires suppression files to squelch irrelevant output.
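For the curious, a suppression entry is just a named pattern matching an error kind and a call chain; a sketch (the entry name and frames here are placeholders, not a real OS X suppression) looks like:

```
{
   ignore-osx-framework-leak
   Memcheck:Leak
   fun:malloc
   fun:__CFInitialize
}
```

You collect entries like this in a file and pass it with `valgrind --suppressions=osx.supp ./prog`, so real leaks in your own code aren’t drowned out.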
Anyway, the mRVM implementation is working quite well. I even wrote a nifty cross-validation script, entirely in shell! It uses basic Unix tools like grep to divvy the training set up into 10 splits and then recombine them into all ten (9 train, 1 test) fold pairs. Using some Iris data that Theo gave me, cross-validation gives high accuracy results (90 to 100 percent).
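The splitting idea can be sketched with nothing but awk and cat (the file names are stand-ins, and here the training file is faked with 20 numbered lines):

```shell
# Fake a training file: one sample per line.
seq 1 20 > train.data

# Deal the lines round-robin into ten split files: split.0 .. split.9.
awk '{ print > ("split." (NR % 10)) }' train.data

# For each fold i, split.i is the test set and the other nine splits,
# concatenated, form the training set.
for i in 0 1 2 3 4 5 6 7 8 9; do
  cp "split.$i" "fold$i.test"
  : > "fold$i.train"
  for j in 0 1 2 3 4 5 6 7 8 9; do
    [ "$j" -ne "$i" ] && cat "split.$j" >> "fold$i.train"
  done
done
```

Each fold then feeds one train/test run, and averaging the ten accuracies gives the cross-validated score.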
Below is a “screenshot” of the current parameters that mRVM takes. The next stage of development will add an implementation of the second mRVM algorithm, which should greatly improve performance and allow efficient processing of extremely large datasets.
$ mRVM --help
mRVM, 0.0.1
multi-class multi-kernel Relevance Vector Machines (mRVM)

mRVM [options]
  -h, --help           print this help and exit
  -V, --version        print version and exit
  -r, --train FILE     set input file (required)
  -l, --labels FILE    set input file (required)
  -t, --test FILE      set input file (required)
  -a, --answers FILE   set input file (required)
  -k, --kernel         specify the kernel:
                         LINEAR (default)
                         POLYNOMIAL
                         GAUSSIAN
  -v, --verbose        set verbosity level:
                         0 = No output
                         1 = Normal (default)
                         2 = Verbose
                         3 = Debug
  -p, --param n        set param for poly or gauss kernel to n.
  -T, --tau n          set tau parameter
  -u, --upsilon n      set upsilon parameter

Based upon work by Psorakis, Damoulas, Girolami.
Implementation by Marcell, firstname.lastname@example.org
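Putting the flags together, a typical run looks something like this (the data file names are placeholders):

```
$ mRVM --train tr.data --labels tr.labels --test te.data --answers te.labels \
       --kernel GAUSSIAN --param 0.5 --verbose 2
```

The four FILE arguments are all required; the kernel, its parameter, and the verbosity fall back to their defaults if omitted.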