Hello World. It's been a long time since the last blog post. I (Karan Kurani) have been working with Dr. Theodoros Damoulas and Aurelie Harou over the last 3 months to build upon the work described in the last 2 posts. I was sidetracked from this project for a little while by working on the Social Network Discovery Project in the fall semester.
In the previous post we looked at methods for obtaining multiple (orthogonal) views of the given data. We want to exploit these views to capture more information from the data and thereby make better predictions. Our approach was to train a separate classifier on each cluster from each view. We used an mRVM classifier and tested on the same datasets as those used in the paper.
Dr. Theodoros Damoulas succinctly described the reasoning behind the methodology:
The coupling of Multiview Clustering techniques, which are unsupervised learning methods for creating multiple and orthogonal groupings of the data, with probabilistic supervised learning methods, which are powerful predictive models. By producing multiple partitions of the data, a natural segmentation of the problem that promotes homogeneity within each cluster arises and can define a subset of data where a single base classifier can operate.
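To make the "orthogonal groupings" idea concrete, here is a minimal sketch of one way to produce a second view that is orthogonal to a first clustering. This is only an illustration, not our actual multiview clustering method from the previous post: it clusters with k-means, then projects the data onto the complement of the direction separating the first view's cluster means, and clusters again in that residual space.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data standing in for a real dataset.
X, _ = make_blobs(n_samples=200, centers=4, n_features=6, random_state=0)

# First view: an ordinary k-means partition.
view1 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
means = view1.cluster_centers_

# Direction that separates the first view's two cluster means.
d = means[1] - means[0]
d /= np.linalg.norm(d)

# Projector onto the orthogonal complement of that direction.
P = np.eye(X.shape[1]) - np.outer(d, d)
X_orth = X @ P

# Second view: cluster in the residual subspace, where the structure
# captured by the first view has been projected out.
view2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_orth)
```

Because `X_orth` lies entirely in the complement of `d`, the second partition is forced to pick up structure that the first grouping did not explain.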
We tried the following approaches:
1. Obtain the clusters and the views, train a classifier on each cluster, and predict on each test point using all the trained classifiers.
2. Obtain the clusters and the views, and train a classifier for each cluster in the different subspaces obtained. This time we also project the test points onto the orthogonal space and then use the classifiers trained in each view to predict for the test point. We tried two sub-approaches for this experiment: a) include the test points in the clustering during the training phase; b) exclude the test points during training, and use the projector information to project the test points separately. The former uses more information than the latter, but we found no significant difference in the results between the two sub-approaches.
3. Obtain the clusters and views, train a classifier for each view instead of each cluster, then follow the same approach as in point 2.
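The first approach above can be sketched as follows. This is a simplified illustration under two stated assumptions: the per-view clusterings are stand-ins produced by differently seeded k-means runs (rather than the orthogonal multiview clustering described earlier), and logistic regression stands in for the mRVM classifier, which has no standard public implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Toy data standing in for the paper's datasets.
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=6, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]

n_views, n_clusters = 2, 3
classifiers = []
for view in range(n_views):
    # Stand-in for a multiview clustering step: each "view" here is just
    # a differently seeded k-means partition of the training data.
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=view).fit_predict(X_train)
    for c in range(n_clusters):
        mask = labels == c
        if len(np.unique(y_train[mask])) < 2:
            continue  # skip clusters containing only one class
        # One base classifier per cluster per view (stand-in for mRVM).
        clf = LogisticRegression(max_iter=1000).fit(X_train[mask], y_train[mask])
        classifiers.append(clf)

# Approach 1: every trained classifier votes on every test point.
votes = np.stack([clf.predict(X_test) for clf in classifiers])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
```

Approaches 2 and 3 follow the same skeleton, except that each classifier sees (and each test point is projected into) the subspace of its own view before training and prediction.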
For comparison, we also trained the same number of classifiers as in each respective approach, but each on a random 60% sample of the training data. This lets us compare the predictions made by this Random Set against our Multi Ensemble set obtained from approaches 1, 2, or 3. It is a fair evaluation because we expect an ensemble of classifiers to perform better than a single classifier. Upon obtaining the results, we saw that the Random Set usually performed better than the single mRVM classifier, confirming our intuition.
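The Random Set baseline can be sketched in a few lines. As before, logistic regression is a stand-in for mRVM, and the ensemble size and data are illustrative, not those from our experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train, X_test = X[:250], y[:250], X[250:]

# Match the number of classifiers in the multiview ensemble being compared.
n_classifiers = 6
random_set = []
for _ in range(n_classifiers):
    # Each baseline classifier sees a random 60% of the training data.
    idx = rng.choice(len(X_train), size=int(0.6 * len(X_train)), replace=False)
    clf = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    random_set.append(clf)

# Majority vote over the Random Set's predictions.
votes = np.stack([clf.predict(X_test) for clf in random_set])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
```

The comparison is then between this vote and the vote of the equally sized cluster-based ensemble, so any improvement is attributable to the multiview segmentation rather than to ensemble size alone.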
One of the approaches showed improvement over the Random Set. We plan to publish these results soon, and I will write more about it in the future.
Our approach is generic enough to be applied to any dataset, and it offers an alternative ensemble technique to approaches such as bagging and boosting.
We also tried all the approaches (1, 2, and 3) for the estimation of Poverty Maps. We made some additional modifications specific to that domain, since we want to utilize the GIS data along with the census and survey data. We are currently collecting results and running additional experiments for this application. I will talk in more detail about these efforts and approaches in a future blog post.