"A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data." (DJ Patil, American mathematician and computer scientist)


Outlier Analysis 

What is an outlier, and why is it a big deal in data science? First of all, yes, it is a big deal in the data industry. In data science, an outlier is an observation that lies far from the other observations. It may arise from variability in the measurement, or it may indicate an experimental error.

If you don't detect and handle them, outliers can distort predictions and reduce a model's accuracy.

Let's look at the interquartile range (IQR). It tells us how spread out the middle half of the values is, and it can also be used to flag values that lie too far from the central values. By a widely used rule of thumb (Tukey's rule), a value that lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier.
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
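
The three formulas above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names `iqr_bounds` and `find_outliers` and the sample readings are made up for this example, and the quartiles are computed with linear interpolation between data points, which is one of several common conventions.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower_bound, upper_bound) for the k * IQR rule."""
    # method="inclusive" interpolates linearly between the sorted data points
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR bounds."""
    lower_bound, upper_bound = iqr_bounds(values, k)
    return [v for v in values if v < lower_bound or v > upper_bound]


# Hypothetical sample: eight ordinary readings and one extreme one
sample = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 40.0]

print(iqr_bounds(sample))     # (2.0, 18.0)
print(find_outliers(sample))  # [40.0]
```

Here Q1 = 8 and Q3 = 12, so IQR = 4 and the bounds are 2 and 18; only the reading of 40 falls outside them.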


[Figure: Outliers]
If you are wondering why I used a boxplot in the code, you are on the right path. A boxplot is a great way to understand outliers and deal with them.
A boxplot is built on five boundaries: minimum, Q1, Q2, Q3 and maximum.


[Figure: Boxplot]

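Minimum is Q1 - 1.5 * IQR (the lower limit for the whisker, not necessarily the smallest value in the data).
Q1 is the 25th percentile.
Q2 is the 50th percentile (the median, i.e. the middle value of the dataset).
Q3 is the 75th percentile.
Maximum is Q3 + 1.5 * IQR (the upper limit for the whisker, not necessarily the largest value in the data).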

So, any value that lies below the minimum or above the maximum is an outlier. A boxplot makes these values very easy to spot.
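
To make the five boundaries concrete, here is a rough sketch that computes them for a small list of numbers. The function name `boxplot_summary` and the readings are hypothetical, and the quartiles again assume linear interpolation; plotting libraries may draw the whiskers slightly differently (for example, ending them at the most extreme data point inside the limits).

```python
from statistics import quantiles


def boxplot_summary(values):
    """Return the five boundaries described above as a dictionary."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "minimum": q1 - 1.5 * iqr,  # lower whisker limit
        "q1": q1,                   # 25th percentile
        "q2": q2,                   # 50th percentile (median)
        "q3": q3,                   # 75th percentile
        "maximum": q3 + 1.5 * iqr,  # upper whisker limit
    }


# Hypothetical readings with one value far above the rest
readings = [2.0, 4.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
summary = boxplot_summary(readings)
print(summary)

# Values outside [minimum, maximum] are the points a boxplot draws separately
outside = [v for v in readings if v < summary["minimum"] or v > summary["maximum"]]
print(outside)  # [30.0]
```

Note that the computed minimum here is -2.0, which is lower than any actual reading: the "minimum" and "maximum" of the rule are limits, not observed values.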



[Figure: Outliers in windspeed in the dataset]
[Figure: Boxplot of windspeed after removing outliers]

We can check where the outlier values lie relative to the other variables in the dataset, and also check the boxplot after removing the outliers.
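
As a small sketch of that last step, the snippet below separates the rows whose windspeed is an outlier from the rest, so the flagged rows can be inspected together with their other columns. Everything here is illustrative: the rows are invented, `count` is a made-up second column, and `split_outliers` is a hypothetical helper built on the standard library rather than on any particular data-frame package.

```python
from statistics import quantiles

# Hypothetical rows; "windspeed" follows the text, "count" is an invented column
rows = [
    {"windspeed": 6.0, "count": 120},
    {"windspeed": 7.0, "count": 135},
    {"windspeed": 8.0, "count": 150},
    {"windspeed": 9.0, "count": 160},
    {"windspeed": 10.0, "count": 170},
    {"windspeed": 11.0, "count": 155},
    {"windspeed": 12.0, "count": 140},
    {"windspeed": 13.0, "count": 130},
    {"windspeed": 40.0, "count": 15},
]


def split_outliers(rows, column):
    """Split rows into (kept, flagged) using the 1.5 * IQR rule on one column."""
    values = [row[column] for row in rows]
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound, upper_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [r for r in rows if lower_bound <= r[column] <= upper_bound]
    flagged = [r for r in rows if not lower_bound <= r[column] <= upper_bound]
    return kept, flagged


kept, flagged = split_outliers(rows, "windspeed")
print(flagged)    # the outlier rows, shown with their other columns
print(len(kept))  # 8 rows remain

# Re-running the rule on the cleaned rows shows whether any outliers are left
_, still_flagged = split_outliers(kept, "windspeed")
print(still_flagged)  # []
```

The second pass matters because removing outliers changes the quartiles, so the bounds should be recomputed before checking the cleaned data again.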


Feel free to share your opinions in the comments! 💓
If you are new to this post, check out its previous part.