Data Science Step: Bayes' Theorem

Statistics is fairly new to me, and Bayes' theorem was completely new! So I wanted to write about it and put together some simple Python code to help me digest it more easily :)

So what does it say?

Wikipedia says it describes the probability of an event based on prior knowledge of conditions that might be related to the event. In other words, Bayes' theorem is a way of finding a probability when we already know certain other probabilities.

Looking at the formula,

P(H | E) = P(E | H) × P(H) / P(E)

we can see that P(H | E) is the probability of event H occurring given that E is true. It equals the probability of E occurring given that H is true, P(E | H) (often called the likelihood), times the probability of H, divided by the probability of E. In simple terms, the theorem links the probability of H given E to the probability of E given H, so knowing one lets us compute the other.
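To make the formula concrete, here is a minimal sketch of it as a Python function. The function and parameter names are my own choice for illustration, not taken from any library, and the numbers in the usage line are hypothetical.

```python
def bayes(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Return P(H | E) = P(E | H) * P(H) / P(E)."""
    if p_e <= 0:
        # P(E) is a divisor, so the evidence must have a nonzero probability.
        raise ValueError("P(E) must be positive")
    return p_e_given_h * p_h / p_e


# Hypothetical inputs for illustration: P(E | H) = 0.8, P(H) = 0.3, P(E) = 0.4
print(bayes(0.8, 0.3, 0.4))
```

With these made-up inputs the result is about 0.6 (0.8 × 0.3 / 0.4).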

Example

Suppose we have a test for diagnosing cancer with a 99% true positive rate and a 99% true negative rate, meaning it is wrong only 1% of the time for people with cancer and 1% of the time for people without it. We also know that only 0.5% of people have cancer. If we test a randomly selected person and the result is positive, we cannot say there is a 99% chance that they have cancer. Because people without cancer vastly outnumber people with it, the false positives coming from the large healthy group outweigh the true positives coming from the small group that actually has cancer.

Let's give a numerical example: if we test 1,000 people, we expect only 5 of them to have cancer and 995 not to. Our test is 99% accurate, so among the 995 people without cancer it will wrongly flag 995 × 0.01 ≈ 10 of them (false positives), while among the 5 people with cancer it will miss only 5 × 0.01 = 0.05, essentially 0 (false negatives). So out of about 15 positive results, only about 5 are genuine: 5/15 ≈ 0.33!
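The same counting argument can be sketched in Python. This computes expected counts (it does not simulate random people), and the helper name `expected_counts` is hypothetical, chosen only for this illustration.

```python
def expected_counts(population: int, prevalence: float,
                    sensitivity: float, specificity: float) -> dict:
    """Expected (not simulated) outcome counts for a diagnostic test.

    sensitivity = true positive rate, specificity = true negative rate.
    """
    sick = population * prevalence          # people who have cancer
    healthy = population - sick             # people who do not
    return {
        "true_positives": sick * sensitivity,
        "false_negatives": sick * (1 - sensitivity),
        "false_positives": healthy * (1 - specificity),
        "true_negatives": healthy * specificity,
    }


counts = expected_counts(1000, prevalence=0.005, sensitivity=0.99, specificity=0.99)
positives = counts["true_positives"] + counts["false_positives"]

for name, value in counts.items():
    print(f"{name:>16}: {value:7.2f}")
print(f"share of positives that are genuine: {counts['true_positives'] / positives:.2f}")
```

Without rounding, the expected counts are 4.95 true positives and 9.95 false positives, so the share of genuine positives is 4.95 / 14.9 ≈ 0.33, matching the rounded 5/15.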

Now, applying Bayes' theorem directly gives the same result:
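Here is the calculation written out, with "+" meaning a positive test result. P(+) is not given directly, so it is expanded with the law of total probability over people with and without cancer, a step the text leaves implicit.

```latex
P(\text{cancer} \mid +)
  = \frac{P(+ \mid \text{cancer})\,P(\text{cancer})}
         {P(+ \mid \text{cancer})\,P(\text{cancer}) + P(+ \mid \text{no cancer})\,P(\text{no cancer})}
  = \frac{0.99 \times 0.005}{0.99 \times 0.005 + 0.01 \times 0.995}
  = \frac{0.00495}{0.0149}
  \approx 0.33
```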

You can find the simple Python code on GitHub.
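That GitHub code is not reproduced here, so the following is an independent minimal sketch, not the original code. It computes P(cancer | positive) straight from the rates in the example; the function name `posterior_given_positive` is hypothetical.

```python
def posterior_given_positive(prevalence: float, sensitivity: float,
                             specificity: float) -> float:
    """P(cancer | positive test) via Bayes' theorem.

    prevalence  = P(cancer)
    sensitivity = P(positive | cancer)      (true positive rate)
    specificity = P(negative | no cancer)   (true negative rate)
    """
    p_pos_given_cancer = sensitivity
    p_pos_given_healthy = 1 - specificity   # false positive rate
    # Law of total probability: P(positive) summed over both groups.
    p_pos = (p_pos_given_cancer * prevalence
             + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_cancer * prevalence / p_pos


p = posterior_given_positive(prevalence=0.005, sensitivity=0.99, specificity=0.99)
print(f"P(cancer | positive) = {p:.3f}")
```

Passing the rates as parameters makes it easy to see how strongly the answer depends on the 0.5% prevalence, which is the point of the example.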

Thursday, June 7, 2018
