Friday, June 29, 2018

Tic Tac Toe




Tic Tac Toe (Console)

This is a simple game I built as one of my Python tutorial milestones. I think it's worth keeping and sharing with friends.

The code is uploaded on GitHub: https://github.com/DataScienceStep/Python/blob/master/Tic-Tac.py

Saturday, June 23, 2018

Python: Bit-wise Swap Function



Bit-wise Swap Function

Apart from Python's built-in bitwise operators (<<, >>, ^), I noticed I also needed a bit-wise swap function. It is simple but useful. The idea is to flip a specified bit in the binary representation of an integer: if it is 1 it turns into 0, and vice versa.
The function counts bit positions from right to left, and the first bit is considered position 1. No error handling is implemented.
In the figure you can see the function definition and the sample output as comments.
Enjoy :)
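
A minimal sketch of what such a function could look like (the exact code in the figure may differ):

def swap_bit(number, position):
    # Toggle the bit at the given position, counting from the right, first bit = 1.
    # Build a mask by shifting 1 left by (position - 1), then XOR to flip that bit.
    return number ^ (1 << (position - 1))

# Sample output:
# swap_bit(5, 1)  ->  4   (0b101 -> 0b100)
# swap_bit(5, 2)  ->  7   (0b101 -> 0b111)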


Learning Python via a Project Episode 2

Learning Python via a Project 
Episode 2 / ?
(Crime in Vancouver)

After part 1, I decided to get some analytics results from the dataset for some basic insights.
In the process I learned how to draw custom pie charts and bar charts.


From the data I noticed that the most common crime in Vancouver is theft from vehicle, and the region with the highest number of crimes is the Central Business District. But I thought maybe that region is simply huge, which is why the number is high, so I calculated the percentage of each crime type per region. It turned out Stanley Park has the highest theft-from-vehicle crime rate, with a much lower total number of crimes in the region.
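
As a rough sketch, that percentage calculation could look something like this with pandas (the file name and the column names TYPE and NEIGHBOURHOOD are assumptions; the actual notebook may differ):

import pandas as pd

# Load the Vancouver crime dataset (file name assumed).
df = pd.read_csv('crime.csv')

# Total crimes per region, and theft-from-vehicle crimes per region.
total_by_region = df.groupby('NEIGHBOURHOOD').size()
theft_by_region = df[df['TYPE'] == 'Theft from Vehicle'].groupby('NEIGHBOURHOOD').size()

# Share of theft-from-vehicle within each region, as a percentage.
theft_rate = (theft_by_region / total_by_region * 100).sort_values(ascending=False)
print(theft_rate.head())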

I uploaded the code on GitHub.

Learning Python via a Project Episode 1


If there is any comment or suggestion, please comment here or tweet me at @DataScienceStep

https://twitter.com/DataScienceStep

GitHub link:
https://github.com/DataScienceStep/Python/blob/master/CrimeProject.ipynb



Thursday, June 21, 2018

Learning Python via a Project Episode 1



Learning Python via a Project
Episode 1 / ?

I am a newbie in Python, so I decided to learn it!
The best way to learn a new language is getting your hands dirty with code :)
So I decided to pick a dataset, start an imaginary project, and document whatever I think and do, for my own future reference and also to share with others!

The code is written in a Jupyter Notebook with extra explanations, but not with tutoring in mind :)

Disclaimer! 
This code and these concepts may contain wrong ideas or methods that are not the best; they are just a showcase of a learning process.

Instead of copying and pasting the code here, I just give the GitHub address so you can view it there. Later, after finishing the whole project, I will summarize the concepts I dealt with and update these posts.

Leave comments or tweet any suggestions.
Thanks


Twitter:
https://twitter.com/DataScienceStep
GitHub:
https://github.com/DataScienceStep/Python/blob/master/CrimeProject.ipynb

Saturday, June 9, 2018

Python - Linear Regression with tolerance

OK, here is the situation: in linear regression we come up with a hypothesis to predict a value for a new given input, based on a training dataset. A standard algorithm outputs a real number as the prediction, but the user has no idea about the accuracy of that prediction (as far as I know), so I thought I would write a simple Python script to check how useful giving a tolerance would be.
I wrote a simple script to create a fake dataset with a linear correlation. The output is shown in figure 1:

As we can see, the predicted value at lower values of X has a huge tolerance, but at higher values the prediction is visibly lower than the average of the data. Therefore, knowing how much the predicted value can vary could be useful in some cases. For this purpose the function predict() returns the final predicted value as well as the distance of the predicted value from the minimum and maximum values in the nearby data, to give better insight into the possible range and where the prediction sits within it. For demonstration purposes the code prints the calculated values as follows:
We can see that for the given value 2 the predicted value is 0.258 and the mean of the nearby data around 2 is 0.2612; by averaging these two numbers the final prediction becomes 0.260, which is enhanced a little based on the spread of the nearby data. For the given value 2 the prediction is almost in the middle of the max and min values, but for x = 3 the distance of the prediction from the min is -0.21 and from the max is 0.14, which indicates the prediction is off from the min, and the updated final value shows the enhancement goes in the right direction (toward the min value).

The function considers 5% of the total data as the nearby data.

Note: this method works only within the range of the training data, so the function can't enhance results outside that range, as there is no nearby data there.
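
Since the full code is on GitHub, here is only a minimal sketch of the idea (the function name and details are assumptions, not the exact code in the repository):

import numpy as np

def predict_with_tolerance(x_train, y_train, x_new, frac=0.05):
    # Fit a simple least-squares line y = a*x + b.
    a, b = np.polyfit(x_train, y_train, 1)
    y_pred = a * x_new + b

    # Take the 5% of training points whose x is closest to x_new as "nearby" data.
    k = max(1, int(len(x_train) * frac))
    nearest = np.argsort(np.abs(np.asarray(x_train) - x_new))[:k]
    nearby_y = np.asarray(y_train)[nearest]

    # Enhance the prediction by averaging it with the mean of the nearby data,
    # and report how far the raw prediction sits from the local min and max.
    y_final = (y_pred + nearby_y.mean()) / 2
    return y_final, y_pred - nearby_y.min(), y_pred - nearby_y.max()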

You can find the Python code on GitHub.


Thursday, June 7, 2018

Bayes' Theorem


Statistics is something almost new to me, and Bayes' theorem was completely new to me! So I wanted to write about it and make a simple Python script for it, to digest it more easily :)

So what does it say?
    Wikipedia says it describes the probability of an event, based on prior knowledge of conditions that might be related to the event. In fact, Bayes’ Theorem is a way of finding a probability when we know certain other probabilities.
    Looking at the formula, P(H | E) is the probability of event H occurring given that E is true. It equals the probability of E occurring given that H is true, times the probability of H, divided by the probability of E: P(H | E) = P(E | H) * P(H) / P(E). In simple language, the occurrence of H is related to E, and vice versa, at the same time.

Example
    Suppose we have a test for diagnosing cancer. The test detects 99% true positives and 99% true negatives, meaning there is only a 1% error on both positive and negative results. We also know that only 0.5% of people have cancer. If we test a randomly selected person and the result is positive, we cannot say the test is 99% likely to be correct, because the probability of not having cancer is much higher than that of having cancer, and the errors from the non-cancerous group outweigh the true positive results.
    Let's give a numerical example: if we test 1000 people, we expect only 5 of them to have cancer and 995 not to. Our test has 99% accuracy, so from the 995 non-cancerous people we will get 995 * 0.01 ≈ 10 false positive results, and from the 5 cancerous people 5 * 0.01 ≈ 0 false negatives. So out of roughly 15 positive test results only 5 are genuine: 5/15 ≈ 0.33!
    Now by applying Bayes' theorem we will get the same result:
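
A quick sketch of the same calculation in Python (the numbers follow the example above; this is not necessarily the exact code in the GitHub link):

# Prior: 0.5% of people have cancer; the test is 99% accurate both ways.
p_cancer = 0.005
p_pos_given_cancer = 0.99      # true positive rate
p_pos_given_healthy = 0.01     # false positive rate

# Bayes' theorem: P(cancer | positive) = P(positive | cancer) * P(cancer) / P(positive)
p_positive = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)
p_cancer_given_positive = p_pos_given_cancer * p_cancer / p_positive

print(round(p_cancer_given_positive, 3))   # 0.332, matching the rough 5/15 estimate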


You can find the simple Python code on GitHub.

Python - Fun with matplotlib

    I've just started to learn Python, and I try to have fun with my new knowledge! So I practice in a fun way, and it's even more fun if I share it with others. Here is my first bit of fun with Python!

    This is the output of the code I wrote to use the matplotlib library :)

    You can find the code on GitHub.



Format

Here is all you need to know about string formatting. I have summarized the page  https://pyformat.info/  , in fact I just removed all e...