mRVM Progress

Work on the mRVM classifier is coming along steadily. This past weekend I played with the following new tools:

Chef and Vagrant

Chef is a tool for scripted server provisioning, and Vagrant is a tool for running that kind of provisioning against virtual machines. Vagrant relies on Oracle VirtualBox to create the virtual server instances. In other words, I can say on demand, “I’ll have one Linux server with the following dependencies, please. Thanks.” And if you mess it up? Go have a cup of coffee while Vagrant rebuilds it for you. It’s good to test on Linux since I’ve been developing on Mac OS X. Apparently I’ve been doing something right, because I didn’t hit any portability issues when I tested on Linux.

Valgrind

Valgrind is a tool suite for memory debugging, memory leak detection, and profiling. It includes several different tools, but this past weekend I was using it to find memory leaks. I knew I had ’em. I had even marked a few sections of code // TODO(jrm) possible leak. But I didn’t yet know how to track down the leaks and make sure they were all gone. Before I fixed them, my virtual server would crash, and when I ran the same code on my main machine, I could fire up OS X’s Activity Monitor and watch the memory get gobbled up.

I’ll mention that Valgrind works extremely well on Linux, but the OS X port seems like it’s still a work in progress, as its usage apparently requires some suppression files to squelch irrelevant output.
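For anyone who wants to try the same kind of leak hunt, a minimal sketch of a wrapper around Valgrind's Memcheck tool might look like the following. The function name leak_check is made up for illustration, and the program it runs is whatever you pass in; this is not the exact command used for mRVM.

```shell
#!/bin/sh
# Hypothetical helper: run a command under Valgrind's Memcheck leak checker.
leak_check() {
    # Bail out early if valgrind is not installed.
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not found on PATH" >&2
        return 127
    fi
    # --leak-check=full prints each leaked block with its allocation stack.
    # --error-exitcode=1 makes the run exit non-zero if Memcheck reports errors.
    valgrind --leak-check=full --error-exitcode=1 "$@"
}
```

A call would then look something like leak_check ./mRVM -r train.dat -l train.lbl -t test.dat -a test.lbl, where the data file names are placeholders. On OS X, a suppression file can be passed with Valgrind's --suppressions=FILE option to quiet reports that come from system libraries.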

mRVM

Anyway, the mRVM implementation is working quite well. I even wrote a nifty cross-validation script, entirely in shell script! It uses basic Unix tools like split, awk, wc, grep, etc. to divvy the training set up into 10 splits and then recombine them into all (1, 9) train-test pairs. Using some Iris bird data that Theo gave me, the cross-validation gives high accuracy (90 to 100 percent).
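The splitting step could be sketched roughly like this. It is not the original script. It assumes one sample per line in the data file, with the matching label on the same line of a separate labels file, and it deals lines out round-robin with awk rather than using split. The names make_folds and split_one and the output file names are invented for illustration.

```shell
#!/bin/sh
# Sketch of k-fold splitting with standard Unix tools (assumed file layout).

# split_one SRC K FOLD PREFIX
# Lines whose 0-based index mod K equals FOLD go to PREFIX.test,
# every other line goes to PREFIX.train.
split_one() {
    : > "$4.test"
    : > "$4.train"
    awk -v k="$2" -v fold="$3" -v test_file="$4.test" -v train_file="$4.train" '
        (NR - 1) % k == fold { print > test_file; next }
                             { print > train_file }
    ' "$1"
}

# make_folds DATA LABELS K OUTDIR
# Writes OUTDIR/data.N.{train,test} and OUTDIR/labels.N.{train,test}
# for N = 0 .. K-1, keeping data and labels aligned line for line.
make_folds() {
    data=$1
    labels=$2
    k=$3
    out=$4
    mkdir -p "$out" || return 1

    # The two files must have the same number of lines.
    n_data=$(wc -l < "$data" | tr -d ' ')
    n_labels=$(wc -l < "$labels" | tr -d ' ')
    if [ "$n_data" -ne "$n_labels" ]; then
        echo "line counts differ: $n_data samples, $n_labels labels" >&2
        return 1
    fi

    fold=0
    while [ "$fold" -lt "$k" ]; do
        split_one "$data"   "$k" "$fold" "$out/data.$fold"
        split_one "$labels" "$k" "$fold" "$out/labels.$fold"
        fold=$((fold + 1))
    done
}
```

After something like make_folds iris.dat iris.lbl 10 folds (file names hypothetical), each fold could presumably be handed to the classifier through the -r, -l, -t, and -a options, with the per-fold accuracies compared at the end.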

Below is a “screenshot” of the current parameters that mRVM takes. The next evolution of development will add an implementation of the second mRVM algorithm that will greatly improve performance and allow for efficient processing of extremely large datasets.

$ mRVM --help
mRVM, 0.0.1 multi-class multi-kernel Relevance Vector Machines (mRVM)
mRVM [options]

  -h, --help         print this help and exit
  -V, --version      print version and exit

  -r, --train   FILE set input file (required)
  -l, --labels  FILE set input file (required)
  -t, --test    FILE set input file (required)
  -a, --answers FILE set input file (required)
  -k, --kernel       specify the kernel:
                       LINEAR (default)
                       POLYNOMIAL
                       GAUSSIAN
  -v, --verbose      set verbosity level:
                       0 = No output
                       1 = Normal (default)
                       2 = Verbose
                       3 = Debug
  -p, --param n      set param for poly or gauss
                     kernel to n.
  -T, --tau n        set tau parameter
  -u, --upsilon n    set upsilon parameter

Based upon work by Psorakis, Damoulas, Girolami.
Implementation by Marcell, jasonmarcell@gmail.com
