Model Description

We have considered two kinds of models for the MDP. In the first model, the state space is a grid world in which each cell represents a geographical area of 0.1 degrees of latitude by 0.1 degrees of longitude. The action space corresponds to actions taken each day and consists of 9 actions: move to any of the 8 adjacent cells or stay in the current cell. The model also incorporates the time spent in a state, which steadily erodes the reward earned by remaining there. In the second model, a state is a pair of waterpoints, so that each state represents a transition between them; states whose start and end waterpoints are the same are special states that signify staying at a particular waterpoint. We decided to approach the problem using the first model since it is lower-dimensional and has fewer actions per state.
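A minimal sketch of the grid-world formulation is below. The grid extent, origin, and boundary handling (clamping at the edges) are illustrative assumptions, not values from our project; only the 0.1-degree cell size and the 9-action move set come from the description above.

```python
# Sketch of the grid-world MDP: one state per 0.1 x 0.1 degree cell,
# 9 daily actions (8 king moves plus staying in place).

CELL_SIZE_DEG = 0.1
N_ROWS, N_COLS = 20, 20          # assumed grid extent, for illustration only

# 9 actions: the 8 neighbouring-cell offsets plus (0, 0) for "stay".
ACTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def step(state, action):
    """Apply one daily move, clamping at the grid boundary (assumed behaviour)."""
    row, col = state
    dr, dc = action
    new_row = min(max(row + dr, 0), N_ROWS - 1)
    new_col = min(max(col + dc, 0), N_COLS - 1)
    return (new_row, new_col)

def cell_to_latlon(state, lat0=0.0, lon0=0.0):
    """Map a grid cell back to the latitude/longitude of its corner."""
    row, col = state
    return lat0 + row * CELL_SIZE_DEG, lon0 + col * CELL_SIZE_DEG
```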

We also decided to use a reward function that is linear in features, where the features describe a state and can themselves be non-linear, yielding a reward function that is non-linear in the state.
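A minimal sketch of this idea is shown below: the reward R(s) = wᵀφ(s) is linear in the weights w, while the features φ(s) may be non-linear in the state. The particular features used here (distance to the nearest waterpoint, time spent in the state) are illustrative assumptions, not our actual feature set.

```python
import numpy as np

def features(state, waterpoints, time_in_state):
    """phi(s): a small, possibly non-linear feature vector (illustrative)."""
    lat, lon = state
    dists = [np.hypot(lat - wlat, lon - wlon) for wlat, wlon in waterpoints]
    nearest = min(dists)
    return np.array([
        np.exp(-nearest),             # non-linear in the raw coordinates
        1.0 / (1.0 + time_in_state),  # decays as time spent in the state grows
        1.0,                          # bias term
    ])

def reward(state, waterpoints, time_in_state, w):
    """R(s) = w . phi(s): linear in w, non-linear in the state."""
    return float(w @ features(state, waterpoints, time_in_state))
```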

We have successfully tested our model on a toy problem in which we were able to recover a pre-defined reward function (both linear and non-linear). Below is a plot showing the parameters of the recovered (linear) reward function plotted against the parameters of the actual reward function.

Results from toy problem showing actual vs recovered weights for the linear reward
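A plot like the one above could be produced along these lines; the weight arrays here are placeholders for illustration, not our actual toy-problem results. A perfect recovery would place every point on the y = x line.

```python
import numpy as np
import matplotlib.pyplot as plt

true_w = np.array([0.8, -0.3, 0.5, 0.1])           # placeholder true weights
recovered_w = np.array([0.78, -0.28, 0.47, 0.12])   # placeholder recovered weights

plt.scatter(true_w, recovered_w)
plt.plot([true_w.min(), true_w.max()], [true_w.min(), true_w.max()], "k--")
plt.xlabel("Actual reward weights")
plt.ylabel("Recovered reward weights")
plt.title("Toy problem: actual vs recovered linear reward weights")
plt.show()
```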
