OK, here is the situation: in linear regression we fit a hypothesis on a training dataset and use it to predict the output for a new input. A standard algorithm returns a single real number as its prediction, but (as far as I know) the user gets no indication of how accurate that prediction is. So I wrote some simple Python code to check how useful it would be to report a tolerance along with the prediction.
I wrote a simple script that creates a fake dataset with a linear correlation. The output is shown in Figure 1.
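A fake dataset like this could be generated roughly as follows. This is only a minimal sketch, not the original script: the function names make_linear_dataset() and fit_line(), the slope, intercept, noise level and x range are all assumptions chosen for illustration.

```python
import numpy as np

def make_linear_dataset(n=200, slope=0.1, intercept=0.05, noise=0.05, seed=0):
    """Return (x, y) where y is roughly linear in x plus Gaussian noise.

    All parameter values are illustrative assumptions, not the original ones.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, size=n)
    y = slope * x + intercept + rng.normal(0.0, noise, size=n)
    return x, y

def fit_line(x, y):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

x, y = make_linear_dataset()
slope, intercept = fit_line(x, y)
print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted slope and intercept should land close to the values used to generate the data, and the scatter around the line is what the tolerance idea tries to describe.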
As we can see, at lower values of X the data are widely spread around the predicted value (a large tolerance), while at higher values the prediction falls visibly below the average of the data. Knowing how much the predicted value can vary could therefore be useful in some cases. For this purpose, the function predict() returns the final predicted value together with the distances from the predicted value to the minimum and maximum values in the nearby data. This gives better insight into the range of possible values and where the prediction sits within that range. For demonstration purposes, the code prints the calculated values as follows:
We can see that for the given value 2 the predicted value is 0.258 and the mean of the nearby data around 2 is 0.2612. Averaging these two numbers gives the final prediction, (0.258 + 0.2612) / 2 ≈ 0.260, which is adjusted slightly according to the spread of the nearby data. At x = 2 the prediction lies almost in the middle of the min and max values, but at x = 3 the distance from the prediction to the min is -0.21 and to the max is 0.14, which indicates that the prediction sits farther from the min than from the max. The final value is shifted toward the min, which shows that the adjustment goes in the right direction.
The function treats 5% of the total data as the nearby data.
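The idea described above can be sketched like this. It is a hedged reconstruction, not the code from the repository: the signature predict(x_new, x, y, frac), the choice of "nearby" as the 5% of points closest to x_new along the x axis, and the sign convention for the distances (min minus prediction, max minus prediction, which matches the -0.21 and 0.14 in the example) are all assumptions.

```python
import numpy as np

def predict(x_new, x, y, frac=0.05):
    """Return (final_prediction, dist_to_min, dist_to_max) for x_new.

    Assumptions: "nearby data" means the frac * n training points whose
    x values are closest to x_new, and distances are measured from the
    plain regression prediction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Plain least-squares line and its prediction at x_new.
    slope, intercept = np.polyfit(x, y, deg=1)
    line_pred = slope * x_new + intercept

    # Pick the nearest 5% of the training points (at least one).
    k = max(1, int(round(frac * len(x))))
    nearest = np.argsort(np.abs(x - x_new))[:k]
    y_near = y[nearest]

    # Final value: average of the line prediction and the local mean.
    final = (line_pred + y_near.mean()) / 2.0
    return final, y_near.min() - line_pred, y_near.max() - line_pred

# Small demonstration on made-up data.
rng = np.random.default_rng(1)
x_demo = rng.uniform(0.0, 10.0, size=200)
y_demo = 0.1 * x_demo + 0.05 + rng.normal(0.0, 0.05, size=200)
for value in (2.0, 3.0):
    final, d_min, d_max = predict(value, x_demo, y_demo)
    print(f"x={value}: final={final:.3f}, to min={d_min:.3f}, to max={d_max:.3f}")
```

When the two distances have similar magnitude, the line prediction sits near the middle of the local data; when one is much larger, the local mean pulls the final value toward that side.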
Note: this method only works within the range of the training data. Outside that range there is no nearby data, so the function cannot adjust the prediction.
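One way to respect this limit is to check the input against the training range and fall back to the plain regression value outside it. The sketch below is an assumption about how that could look, not the repository's behaviour; the name predict_in_range() and the returned flag are hypothetical.

```python
import numpy as np

def predict_in_range(x_new, x, y, frac=0.05):
    """Return (prediction, adjusted).

    adjusted is False when x_new lies outside the training range, in which
    case the plain regression value is returned unchanged (assumed fallback).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    line_pred = slope * x_new + intercept

    # No nearby data outside [min(x), max(x)], so skip the adjustment.
    if x_new < x.min() or x_new > x.max():
        return line_pred, False

    # Inside the range: average with the mean of the nearest 5% of points.
    k = max(1, int(round(frac * len(x))))
    nearest = np.argsort(np.abs(x - x_new))[:k]
    return (line_pred + y[nearest].mean()) / 2.0, True

x_train = np.linspace(0.0, 10.0, 101)
y_train = 0.1 * x_train + 0.05
print(predict_in_range(5.0, x_train, y_train))   # inside the range
print(predict_in_range(20.0, x_train, y_train))  # outside the range
```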
You can find the Python code on GitHub.