## Bayes Rule and Big Data

Source: http://data-informed.com/applying-bayes-theorem-to-a-big-data-world/

In an increasingly technological world, the collection and utilization of big data is paramount. “Applying Bayes Theorem to a Big Data World” written by Catherine Bernardone and published on DataInformed.com discusses the role of Bayes Theory within the development of machine learning and predictive analytics. In lecture we’ve talked about Bayes Theorem in terms of conditional probability- event A’s likelihood of occurring given that event B has occurred. Bayes Theory uses past occurrences to make inferences about the future. In terms of this course, we’ve used Bayes Theory to better our understanding of information cascades and probability, but this is only the beginning. This article displays just how vital Bayes Theorem can be. By combining this theorem with the mountains of data constantly being mined, the door to accurate predictive analytics is opened.

Within Bernardone’s article, she asserts that the more data that we are able to amass regarding a certain topic, the better machine learning can occur. For example, if we were trying to program a computer to predict the weather, we would provide the machine with past weather data-the temperature, air pressure, wind speed, etc of a specific day. Along with the specific features will be the type of day it was or itâ€™s category-snow, rain, sunshine, etc. Eventually, after enough data is collected, programmers will give the computer the weather related features of the day, but this time, ask the computer to predict which category (snow, rain, sunshine etc) is to be expected. Bernardone divides this machine learning process into two steps. First the training process, which includes uploading the data, and outcomes of past events. This allows the computer to understand the pattern of categorization within the data set. Then comes the predicting process which allows the machine to predict outcomes based on past events- the more data the more accurate predictions will be. This format can be applied to nearly anything, as humans can use past successes and failures to help shape future aspects and predict vital outcomes.