Application of Bayes’ Rule in Neural Nets
https://iknowfirst.com/the-application-of-bayes-theorem-to-deep-learning
The article discusses the basics of neural nets and how Bayes’ rule can be integrated into such a process. In a traditional neural net, the first step is normally to devise a function to quantify loss. This loss function is typically based on the accuracy of the predicted outputs compared with actual sample data. Afterwards, the framework for the neural network is set up, and the weights are learned through stochastic gradient descent, a numerical method that approximately minimizes the loss function over a specific domain. More generally, we can think of the weights as determining how two specific “neurons” of the neural network pass along information. In this sense, the result is determined by the accumulation of values passed on to the final “neuron,” which makes the final decision for the predicted output. However, this process can be modified to cover a different type of problem.
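To make the loss-plus-gradient-descent recipe concrete, here is a minimal sketch (not from the article) of stochastic gradient descent learning a single weight for a toy linear model; the data, learning rate, and batch size are all illustrative assumptions.

```python
import numpy as np

# Toy data: y = 3x plus noise; a single weight w is to be learned.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)

def loss(w, xb, yb):
    """Mean squared error between predictions w*x and targets y."""
    return np.mean((w * xb - yb) ** 2)

w = 0.0    # initial weight
lr = 0.1   # learning rate (assumed for this toy problem)
for step in range(200):
    # "Stochastic": the gradient is estimated on a random mini-batch,
    # not the full data set.
    idx = rng.choice(len(x), size=10, replace=False)
    xb, yb = x[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)  # d(loss)/dw on the batch
    w -= lr * grad

print(w)  # should land close to the true weight 3.0
```

The same update rule, applied weight-by-weight via backpropagation, is what trains a full neural network.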
In class, we learned that Bayes’ rule can be used to calculate conditional probabilities, e.g. “what is the probability that we draw two red marbles given that we have already drawn six blue ones?” In the context of neural nets, we can apply the same formulation. The information passed along between neurons can be subject to a probability distribution rather than a single fixed weight. Specifically, the method calculates the probability of a particular weight conditioned on the observed data. In this sense, the neural net can output different predictions for the same inputs, since the final “neuron” is subject to a probability distribution determined by the “neurons” used earlier in the process. While on paper this sounds problematic, it actually addresses problems such as overfitting and regularization while naturally outputting a confidence level. These advantages demonstrate the importance of Bayes’ rule and how probability can be used to model different problems. Of course, probability is only inherent in some problems, so whether or not to use a Bayesian neural net is still a question of experiment design.
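The idea of a distribution over a weight, conditioned on observed data, can be sketched on the same single-weight toy model. This is a minimal illustration (not the article’s code): with a Gaussian prior and Gaussian noise, Bayesian linear regression gives a closed-form Gaussian posterior over the weight, and sampling weights from it yields a spread of predictions whose standard deviation acts as a confidence level. The prior and noise variances here are assumed for illustration.

```python
import numpy as np

# Same toy data: y = 3x plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)

# Bayes' rule with a Gaussian prior on w and Gaussian observation noise
# gives a Gaussian posterior p(w | data) in closed form.
prior_var = 10.0        # assumed prior variance on w
noise_var = 0.1 ** 2    # assumed observation-noise variance
post_var = 1.0 / (1.0 / prior_var + np.sum(x ** 2) / noise_var)
post_mean = post_var * np.sum(x * y) / noise_var

# Predict at a new input by sampling weights from the posterior:
# each sampled weight gives a different prediction for the SAME input,
# and the spread of predictions is a natural confidence measure.
x_new = 0.5
w_samples = rng.normal(post_mean, np.sqrt(post_var), size=1000)
preds = w_samples * x_new
print(preds.mean(), preds.std())
```

The mean of `preds` plays the role of the usual point prediction, while its standard deviation is the confidence level that a deterministic network cannot provide.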