
Some Results

We have run simulations with our model on the actual data. The data is fairly noisy, since it was collected manually by researchers on the ground, but the initial results are encouraging. We now have some idea of the herders' preferences, as we have recovered the underlying reward function. The next step is to sit down with the economists to interpret those results. With these results we can now reason about why some herders would choose one water point over another. We are currently cross-validating our results and devising performance measures suited to the problem. In the future we also wish to try some more exotic models for the reward function and to incorporate more features into our model.

Model Description

We have considered two kinds of models for the MDP. In the first model we represent the state space as a grid world. Each cell of the grid corresponds to a geographical area of 0.1 degrees of latitude by 0.1 degrees of longitude. The action space is based on actions taken each day and consists of 9 actions: move to any of the 8 adjacent cells, or stay in the same cell. The model also incorporates a time-spent component that steadily eats away the reward earned by staying in any particular state. The second model represents a state as a pair of water points, with each state representing a transition between them; states with the same start and end water points are special states that signify staying at a particular water point. We decided to approach the problem with the first model, since it is lower-dimensional and has fewer actions per state.
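
To make the first model concrete, here is a minimal sketch of the grid-world state and action space (the names, grid dimensions, and boundary handling are our illustration, not the project's actual code):

```python
# Sketch of the grid-world MDP from the first model (illustrative only).
# Each state is a 0.1-degree x 0.1-degree cell; each day the herd either
# stays put or moves to one of the 8 adjacent cells.

CELL_SIZE = 0.1  # degrees of latitude/longitude per grid cell

# 9 actions: stay, plus the 8 compass moves, as (d_row, d_col) in cell units
ACTIONS = [(0, 0), (-1, -1), (-1, 0), (-1, 1),
           (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def step(state, action, n_rows, n_cols):
    """Deterministic transition: move to the chosen adjacent cell, clipped
    at the grid boundary so the herd cannot walk off the map."""
    row, col = state
    d_row, d_col = action
    new_row = min(max(row + d_row, 0), n_rows - 1)
    new_col = min(max(col + d_col, 0), n_cols - 1)
    return (new_row, new_col)
```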

We also decided to go with a reward function that is linear in features, where the features describe a state and may themselves be non-linear, yielding a non-linear reward function.
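
A linear-in-features reward of this kind can be sketched as follows; the feature map below is purely hypothetical (the project's real features, such as distance to water or forage quality, are not shown here):

```python
import numpy as np

# Reward linear in (possibly non-linear) state features: R(s) = w . phi(s).

def phi(state):
    """Map a state to a feature vector; the features themselves may be
    non-linear in the raw state variables (illustrative choices only)."""
    x, y = state
    return np.array([x, y, x * y, np.exp(-x)])

def reward(state, w):
    """The reward is a linear combination of the features."""
    return float(np.dot(w, phi(state)))
```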

We have successfully tested our model on a toy problem, where we were able to recover a pre-defined reward function (both linear and non-linear). Below is a plot showing the parameters of the recovered (linear) reward function plotted against the parameters of the actual reward function.

Results from toy problem showing actual vs recovered weights for the linear reward

Approach to the Problem (IRL)

The technical challenge is that herd movement is a large and complex dynamic discrete choice problem: pastoralists must decide which particular water points to move to, as a function of a large and complex state space. Applying typical econometric modeling techniques would seem to run into two immediate challenges: (i) the curse of modeling (a particular, explicit behavioral model describing the mechanics of pastoralist decision-making might be difficult to specify correctly), and (ii) the curse of dimensionality (even if a behavioral model can be specified, it can be difficult to estimate the parameters of a sufficiently complex model without virtually unlimited data).

Hence we were attracted to approaches from the reinforcement learning literature, and in particular the emerging literature on inverse reinforcement learning (IRL), also called apprenticeship learning. IRL is well suited to our purposes because it focuses on learning the reward function that rationalizes data generated by an “expert”. According to Ng and Russell (2000), IRL is the problem of extracting a reward function given observed, optimal behavior. A key motivation for this literature is that the reward function, as opposed to the policy itself, “can provide the most succinct, robust and transferable definition of a task” (Ng and Russell, 2000). In many cases it may not be possible to learn the policy directly, especially in parts of the state and action spaces that are not often visited.

A typical structural modeling exercise in econometrics is concerned with a similar problem: backing out the parameters of the agent's preferences (the utility function) that best match the observed data. However, the typical approach involves imposing a particular behavioral model on the agent. While this can be attractive because the estimated parameters are interpretable, it places definite limits on the complexity of the state and, in particular, action space that such methods can handle, and it remains fraught with the curse of modeling. IRL, on the other hand, is more of a “model-free” approach: it derives a reward function as a function of states of the world and actions. The results can then be used to describe how agents map states into actions (and payoffs), and to carry out policy simulations that vary elements of the state space and predict how agents would respond.

Reinforcement Learning

Diagram explaining the Reinforcement Learning model

One way in which animals acquire complex behaviours is by learning to obtain rewards and to avoid punishments. Reinforcement learning theory is a formal computational model of this type of learning.

The main idea of reinforcement learning is that an agent interacts with an environment, changing the environment and obtaining a reward for its action. This positive or negative reward gives the agent information about how to act the next time it is in a similar situation. The goal is to learn a policy that maximizes the long-term reward. The agent's interaction with the environment is modelled as a Markov Decision Process (MDP).

Reinforcement Learning Algorithm Description

The goal of the RL algorithm is to pick actions over time so as to maximize the expected long-term reward, E[R(s0) + R(s1) + … + R(sT)]. The solution of the RL algorithm is a policy π that specifies an action for each possible state and achieves this goal.
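
This forward problem can be solved by dynamic programming; below is a minimal value-iteration sketch on a made-up three-state MDP (all transition probabilities and rewards are illustrative, not from the pastoralist model):

```python
import numpy as np

# Tiny MDP: states 0..2, actions 0..1.
# P[a][s, s'] = transition probability; R[s] = reward in state s.
P = [np.array([[0.9, 0.1, 0.0],
               [0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0]]),   # action 0: mostly stay
     np.array([[0.1, 0.9, 0.0],
               [0.1, 0.0, 0.9],
               [0.0, 0.1, 0.9]])]   # action 1: mostly advance
R = np.array([0.0, 0.0, 1.0])       # only state 2 is rewarding
GAMMA = 0.9

def value_iteration(P, R, gamma, iters=200):
    """Compute V*(s) and a greedy policy by repeated Bellman backups."""
    V = np.zeros(len(R))
    for _ in range(iters):
        Q = np.array([R + gamma * P[a] @ V for a in range(len(P))])
        V = Q.max(axis=0)
    return V, Q.argmax(axis=0)

V, policy = value_iteration(P, R, GAMMA)
```

The resulting policy advances toward the rewarding state and stays once there, as expected.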

Inverse Reinforcement Learning

Inverse RL algorithm description and relations

In IRL, as the name suggests, we do the opposite: we start with a policy that maximizes the long-term reward and recover a reward function, defined over states, that makes this policy optimal. The optimal policy here is the expert's policy, observed from the data. The IRL algorithm finally gives a reward function that explains those expert trajectories.
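
A building block shared by many IRL and apprenticeship-learning algorithms (e.g. Abbeel and Ng, 2004) is the expert's empirical discounted feature expectations, computed from the observed trajectories; with a linear reward R(s) = w·φ(s), the expert's value is w·μ_E, so matching μ_E is central to these methods. A small sketch, assuming a user-supplied feature map phi:

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma):
    """Empirical discounted feature expectations of the expert:
    mu_E = (1/m) * sum_i sum_t gamma^t * phi(s_it),
    where trajectories is a list of observed state sequences."""
    mu = None
    for traj in trajectories:
        disc = sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        mu = disc if mu is None else mu + disc
    return mu / len(trajectories)
```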

Pastoralist Problem and Collaboration with Cornell’s Applied Economics department

Background on Pastoralist Problem

Our applied problem is centered on explaining the movement over time and space of animal herds in the arid and semi-arid lands of Northern Kenya. In this region crop agriculture is difficult, and hence pastoralists (animal herders) have adopted the livelihood strategy of managing large livestock herds (camels, cattle, sheep, and goats), which are the primary asset/wealth base and the primary source of income (through livestock transactions, milk, blood, meat, and skins). The movement of herders is due to highly variable rainfall: in between winters and summers there are dry seasons, with virtually no precipitation. At such times the males in the household migrate with the herds to remote water points, dozens if not hundreds of kilometers away from the main town location. Every few years these dry seasons can be more severe than usual and turn into droughts. At such times the pastoralists can suffer greatly, losing large portions of their herds. Even during relatively mild dry seasons there are a number of reasons to be interested in the herders' spatiotemporal movement problem: we might ask whether their herd allocation choices are optimal given the state of the environment; we might worry about the environmental degradation caused by herd grazing pressures and about the inter-tribal violence and raids that occur during migration seasons; and we might wonder about policies to correct these issues (more police, more control of grazing, drilling more water points), along with the effects of environmental changes (e.g., more variable or lower mean rainfall) on the tribes. Here's a short article that appeared on IRIN on the current drought in East Africa, affecting the pastoral regions we are studying.

Collaboration with Cornell’s Applied Economics Department

A survey was collected by the USAID Global Livestock Collaborative Research Support Program (GL CRSP) “Improving Pastoral Risk Management on East African Rangelands” (PARIMA). The PARIMA project pursued the goal of improving pastoral risk management through asset and income diversification, and of making its data accessible for external use. The project spanned six locations in Northern Kenya and five kebeles in Southern Ethiopia, in one intact semi-arid and arid livestock production and marketing region. The survey was structured to shed light on the following key issues and their interaction with the risks and uncertainties faced by pastoralists (animal herders) in the survey areas: 1) pastoralist risk exposure and management behavior, 2) livestock marketing, 3) rural financial institutions, and 4) public service delivery systems. In addition, the survey contains a variety of standard household and individual survey questions concerning household composition, labor allocation, income, expenditures, etc. The data thus collected provide insight into the cross-sectional (across individuals, households, communities, and subregions) and intertemporal (seasonal and interannual) variation of the risk and uncertainty faced by pastoralists. Related questions of management, utilization of risk-mitigating information, the nature of social insurance networks, etc., can be investigated with these data. The trajectory of development through the past and expectations for the future, the changing distribution of land type and land-use patterns, and the behavioral responses to climate shocks and to expectations of coming weather patterns can also be studied from the data.

I will be looking at a subproblem: explaining the movements and trajectories of the herders. My main aim will be to recover the utility function of the pastoralists.

Introduction to Pastoral Systems

My project is to develop models for understanding and predicting decisions taken by pastoral (animal-herder) communities in response to changes in nature and their environment. The subjects of the study are East African pastoralists, who face biannual dry seasons and must migrate as a result. A survey collected by the USAID Global Livestock Collaborative Research Support Program (GL CRSP) will be used as the data to model this system.

The objective is to use data on herd movement choices to estimate the parameters of a spatially explicit dynamic model. This model captures household herd allocation over space and time when herds split or migrate, whether driven by natural phenomena or by the search for water. We need a model that derives the preferences of real pastoral populations.

There have been previous attempts to study the system using econometric techniques. Given the complex nature of the problem, however, we plan to approach it using machine learning techniques such as reinforcement learning and inverse reinforcement learning, and I aim to apply these to develop a decision-making model. The data embodies a policy and is indicative of the choices made by pastoral communities. This policy will be used to generate a reward function (the analogue of a utility function in economics). A policy coupled with a reward function would be sufficient to describe the preferences of the pastoralists.

BOOM Award for ICS Social Network Discovery Project

This past semester, Karan Kurani and Jason Marcell were presented with the 2011 EMC Big Data Award at Cornell’s BOOM 2011.

On March 9, 2011, Cornell’s Faculty of Computing and Information Science presented BOOM, an annual showcase of Cornell student research and creativity in digital technology and applications. ICS student researchers Karan Kurani and Jason Marcell won the event’s EMC Big Data Award for their project, Social Network Discovery of Computational Sustainability Community, which also involved research from Kiyan Ahmadizadeh, Bistra Dilkina, and Theo Damoulas. The project focused on the automatic discovery of the Computational Sustainability Community by mining hundreds of thousands of academic research publications. The team used a generative topic model to isolate those publications which pertain to computational sustainability and augment a seed set of publications initially provided by ICS Director Carla Gomes, who served as the project’s advisor.

via ICS News

More information about the ICS SND project can be found here.

Jason and Karan with the BOOM poster

The BOOM 2011 Big Data Award

Update on Multi-view Ensemble Learning

Hello World. It's been a long time since the last blog post. I (Karan Kurani) have been working with Dr. Theodoros Damoulas and Aurelie Harou over the last 3 months to build upon the work mentioned in the last two posts. I was sidetracked from this project for a little while by working on the Social Network Discovery project in the fall semester.

In the previous post we looked at methods for obtaining multiple (orthogonal) views of the given data. We want to exploit these characteristics to capture more information from the data and thereby make better predictions. Our approach was to train a separate classifier on each cluster from each view. We used an mRVM classifier and tested on the same datasets as those mentioned in the paper.

Dr. Theodoros Damoulas succinctly described the reasoning behind the methodology:

The coupling of Multiview Clustering techniques, that are unsupervised learning methods for creating multiple and orthogonal groupings of the data, with probabilistic supervised learning methods, that are powerful predictive models. By producing multiple partitions of the data, a natural segmentation of the problem that promotes homogeneity within each cluster arises and can define a subset of data where a single base classifier can operate.

We tried the following approaches:

1. Obtain the clusters and the views, train a classifier for each cluster, and classify each test point by using all the trained classifiers to predict on it.

2. Obtain the clusters and the views, and train a classifier for each cluster on the different subspaces obtained; this time we also project the test points onto the orthogonal space and then use the classifiers trained from each view to predict for the test point. We tried two sub-approaches for this experiment: a) include the test points while clustering for the training phase, or b) do not include the test points while training, but use the projector information to project the test points separately. The former uses more information than the latter, yet we found no significant difference in the results between the two sub-approaches.

3. Obtain the clusters and views, train a classifier for each view instead of each cluster, then follow the same approach as in point 2.
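
Approach 1 can be sketched as follows; a simple nearest-class-mean classifier stands in for the mRVM here, and all names are our illustration rather than the project's code:

```python
import numpy as np

class NearestMean:
    """Toy base classifier: assign each point to the nearest class mean."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self
    def predict(self, X):
        d = ((X[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=2)
        return self.classes[d.argmin(axis=1)]

def train_per_cluster(X, y, cluster_labels):
    """cluster_labels holds one array of cluster ids per view; return a flat
    list of classifiers, one per (view, cluster) pair."""
    models = []
    for labels in cluster_labels:
        for c in np.unique(labels):
            mask = labels == c
            if len(np.unique(y[mask])) > 1:   # need at least two classes
                models.append(NearestMean().fit(X[mask], y[mask]))
    return models

def predict_vote(models, X):
    """Every trained classifier votes on every test point; majority wins."""
    votes = np.stack([m.predict(X) for m in models])
    return np.array([np.bincount(col).argmax() for col in votes.T])
```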

We also trained the same number of classifiers as in the respective approaches above, but on random samples of 60% of the training data. We did this to compare the predictions made by this Random Set with the Multi Ensemble set obtained from approaches 1, 2, or 3. It is a fair comparison because we expect an ensemble of many classifiers to perform better than a single classifier. Upon obtaining the results, we saw that the Random Set usually performed better than a single mRVM classifier, confirming our intuition.

One of the approaches showed improvement over the Random Set; we plan to publish these results soon, and I will write more about them in the future.

Our approach is generic enough to be applied to any data. It is an alternative ensemble technique to approaches such as bagging and boosting.

We also tried all the approaches (1, 2 and 3) for the estimation of Poverty Maps. We made some additional modifications specific to that domain, as we want to utilize the GIS data along with the census and survey data. We are currently collecting results and running additional experiments for this application. I will talk in more detail about these efforts and approaches in a future blog post.

mRVM Progress

Work on the mRVM classifier is coming along steadily. This past weekend I played with the following new tools:

Chef and Vagrant

Chef is a tool for scripted server provisioning, and Vagrant lets you apply such provisioning to virtualized environments. Vagrant relies on Oracle VirtualBox to create virtual server instances. So, in other words, on demand I can say “I'll have one Linux server with the following dependencies, please. Thanks.” And if you mess it up? Go have a cup of coffee while Vagrant re-builds it for you. It's good to test on Linux since I've been developing on Mac OS X. And, apparently, I must have been doing something right, because I didn't experience any portability issues when I tested on Linux.

Valgrind

Valgrind is a tool for memory debugging, memory leak detection, and profiling. It actually includes a few different tools, but I was using it this past weekend for finding memory leaks. I knew I had 'em. I even marked a few sections of code // TODO(jrm) possible leak. But I didn't yet know how to track down the leaks and ensure they were all gone. Before I tracked them down, my virtual server would crash, and when I ran the same code on my main machine I'd watch the memory get gobbled up.

I'll mention that Valgrind works extremely well on Linux, but the OS X port seems like it's still a work in progress, as its usage apparently requires some suppression files to squelch irrelevant output.


Anyway, the mRVM implementation is working quite well. I even wrote a nifty cross-validation script, entirely in shell script! It uses basic Unix tools like split, awk, wc, grep, etc. to divvy the training set up into 10 splits and then recombine them in all (1, 9) train-test pairs. Using some Iris data that Theo gave me, the cross-validation gives high accuracy results (90 to 100 percent).
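
The original script was pure shell; an equivalent sketch of the 10-fold splitting logic in Python (illustrative only) looks like this:

```python
import random

def k_fold_splits(examples, k=10, seed=0):
    """Shuffle the examples, cut them into k roughly equal folds, and return
    (train, test) pairs where each fold serves as the test set exactly once."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = [examples[j] for j in folds[i]]
        train = [examples[j] for f in range(k) if f != i for j in folds[f]]
        pairs.append((train, test))
    return pairs
```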

Below is a “screenshot” of the current parameters that mRVM takes. The next evolution of development will add an implementation of the second mRVM algorithm, which will greatly improve performance and allow for efficient processing of extremely large datasets.

    $ mRVM --help
    mRVM, 0.0.1 multi-class multi-kernel Relevance Vector Machines (mRVM)
    mRVM [options]
      -h, --help         print this help and exit
      -V, --version      print version and exit
      -r, --train   FILE set input file (required)
      -l, --labels  FILE set input file (required)
      -t, --test    FILE set input file (required)
      -a, --answers FILE set input file (required)
      -k, --kernel       specify the kernel:
                           LINEAR (default)
      -v, --verbose      set verbosity level:
                           0 = No output
                           1 = Normal (default)
                           2 = Verbose
                           3 = Debug
      -p, --param n      set param for poly or gauss
                         kernel to n.
      -T, --tau n        set tau parameter
      -u, --upsilon n    set upsilon parameter
    Based upon work by Psorakis, Damoulas, Girolami.
    Implementation by Marcell,

Using Hibernate for the Social Network Discovery Project

As the first step of our Social Network Discovery project, we wanted to pull down a bunch of data from some data sources, so that we could massage and filter it before it went on its way to the next step in the process. We chose to store our data in a MySQL database, and our first source of data was a very large XML file containing metadata about thousands of DBLP entries (books, journal articles, etc.) and their citations.

Essentially, step one of the whole process was to read a big XML file into a MySQL database, preserving the structure and order inherent in the XML. The first challenge was the SAX vs. DOM parser problem: a DOM parser wants to read the whole document into memory first, whereas a SAX parser operates on the fly as it encounters each element. The DOM parser simply would not do, because the dataset was too large and the parser quickly ran out of memory.
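
The streaming behaviour that makes SAX viable for huge files can be illustrated with Python's built-in xml.sax (the project itself used Java, and the element names below are made up):

```python
import xml.sax

# A SAX handler receives a callback per element as the parser streams the
# document, so memory use stays constant regardless of file size.
class TitleCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = 0
    def startElement(self, name, attrs):
        if name == "title":
            self.titles += 1

handler = TitleCounter()
xml.sax.parseString(
    b"<dblp><article><title>A</title></article>"
    b"<article><title>B</title></article></dblp>", handler)
```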

The next problem is: how do you get this stuff into a database? We're working in Java, so one solution is to just start banging together SQL insert statements in code and issue them to the database programmatically using Java's SQL libraries. This gets the job done, but it's not the prettiest thing to read or maintain. A proper Object Relational Mapping (ORM) tool is the appropriate tool for this job; one of the more popular ones for Java is Hibernate, which we used.

In order to use Hibernate, you simply add the appropriate JAR files to your project, annotate (or provide mappings for) your model objects, and then add a little bit of configuration to let Hibernate know about your database and a few other settings. After that, Hibernate provides a nice API for the basic CRUD operations: create, read, update, delete. For simple cases this is arguably overkill, but the real payoff comes when your object model gets more complicated: many-to-many relationships, lazy loading, cascaded operations. Even for simple models, a nice feature is that Hibernate will completely create your whole database and all of its tables from scratch if you so choose (careful with that feature: you could lose valuable data when it recreates all of your tables!)

Admittedly, I had previous experience with NHibernate, Hibernate's sister project on the .NET side of things, so much of this was already familiar to me. But I suffered through a bit of frustration with the initial setup, as things are ever-so-slightly different in Java land. Overcoming the learning curve and the initial setup is well worth it, though: the return on investment comes with every added object or new object relationship.

If you wish to follow the progress of the Social Network Discovery project, the code is publicly viewable on GitHub here.

Multi-Kernel Relevance Vector Machines

Bayesian classification algorithms, the multi-class multi-kernel Relevance Vector Machines (mRVMs) that have been recently proposed.

Since this is a multi-author blog and this is my first contribution, I will start by introducing myself. My name is Jason Marcell. I am a CS Master's student working under the supervision of Dr. Theodoros Damoulas. My primary work for Cornell's ICS is a C++ implementation of two Bayesian classification algorithms, the multi-class multi-kernel Relevance Vector Machines (mRVMs).

We are working openly on this project, and its progress may be observed on GitHub. Progress has been slow but steady this past semester. Our initial goal has been to nail down all of the operations required for implementing a classification machine model: reading user-supplied data from a file into a matrix, basic linear algebra operations, random sampling, etc. We have been using Google's unit test framework for C++ to work slowly and carefully on small pieces of “building block” code that will later find its way into a proper, maintainable, object-oriented architecture. For our scientific math needs, we are using the GNU Scientific Library, a free and open-source library of routines that are of use to the scientific computing community.

More to come as progress is made!

Note: I had originally begun blogging over here, but all subsequent posts will live over here from now on.


