"A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data." (DJ Patil, American mathematician and computer scientist)
Outlier Analysis
What is an outlier, and why is it a big deal in data science? First of all, yes, it is a big deal in the data industry. In data science, an outlier is an observation that lies far from the other observations. It may be due to variability in the measurement, or it may indicate an experimental error.
If you don't detect and handle them, outliers in your data can distort predictions and reduce accuracy.
Let's understand the interquartile range (IQR). It measures how spread out the middle 50% of the values are (IQR = Q3 - Q1, where Q1 and Q3 are the 25th and 75th percentiles), and it can also be used to tell when some values are too far from the central values. By a widely used rule of thumb (Tukey's fences), a value that lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier.
```python
# Q1 and Q3 are the 25th and 75th percentiles of the data
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
```
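The snippet above assumes Q1 and Q3 are already known. As a minimal sketch, here is a self-contained plain-Python version that also computes the quartiles with `statistics.quantiles` and flags the values outside the bounds. The helper names `iqr_bounds` and `find_outliers` and the sample values are hypothetical, made up for illustration; `method="inclusive"` is chosen on the assumption that you want linear interpolation between data points, which is also the default in NumPy and pandas.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower_bound, upper_bound) using Q1 - k*IQR and Q3 + k*IQR."""
    # n=4 splits the data into quarters and returns [Q1, Q2, Q3];
    # "inclusive" interpolates linearly between neighbouring data points.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying below the lower bound or above the upper bound."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up sample values, for illustration only
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 17, 19, 107, 10]
lower, upper = iqr_bounds(data)
outliers = find_outliers(data)
print(lower, upper)  # 6.75 20.75
print(outliers)      # [102, 107]
```

Here Q1 = 12 and Q3 = 15.5, so IQR = 3.5 and only 102 and 107 fall outside the bounds. The factor `k=1.5` is kept as a parameter because it is a convention rather than a law; a larger value flags fewer points.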
Output: [figure: Outliers]
A boxplot is built on five values: minimum, Q1, Q2, Q3 and maximum.
[figure: Boxplot]
Minimum is the lower fence, Q1 - 1.5 * IQR.
Q1 is the 25th percentile.
Q2 is the 50th percentile (the median, i.e. the middle value of the dataset).
Q3 is the 75th percentile.
Maximum is the upper fence, Q3 + 1.5 * IQR.
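So, values that lie below the minimum or above the maximum are outliers, and a boxplot makes them very easy to spot: they are drawn as individual points beyond the whiskers. (Strictly speaking, each whisker stops at the most extreme data point that still lies within its fence, so a whisker can be shorter than the fence itself.)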
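To connect the five values above with what a plotting library actually draws, here is a small sketch that computes the quartiles, the fences, the whisker ends and the points beyond them. The helper `boxplot_stats` and the sample values are hypothetical; the sketch assumes the common convention (used, for example, by matplotlib's default `whis=1.5`) that each whisker ends at the most extreme data point still inside its fence.

```python
from statistics import quantiles


def boxplot_stats(values, whis=1.5):
    """Compute the numbers a standard boxplot is drawn from (illustrative helper)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points that are still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "lower_fence": lower_fence,
        "whisker_low": min(inside),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_high": max(inside),
        "upper_fence": upper_fence,
        # Points beyond the fences are drawn individually as outliers
        "fliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Made-up sample values, for illustration only
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 17, 19, 107, 10]
stats = boxplot_stats(data)
print(stats["whisker_low"], stats["whisker_high"])  # 10 19
print(stats["fliers"])                              # [102, 107]
```

With this data the fences are 6.75 and 20.75, but the whiskers end at 10 and 19 because no data points lie between those values and the fences.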
Output: [figure: Outliers in windspeed in the dataset]
[figure: Boxplot of windspeed after removing the outliers]
We can check where the outlier values lie relative to the values of the other variables in the dataset, and also check the boxplot after removing the outliers.
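A minimal pandas sketch of both steps: flag the windspeed outliers, look at those rows together with the other variables, and drop them. The toy DataFrame, its `count` column, the helper `iqr_mask` and all values are made up for illustration; in practice `df` would be your own dataset.

```python
import pandas as pd


def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside the IQR bounds."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation by default
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Toy data standing in for the real dataset (hypothetical values)
df = pd.DataFrame({
    "windspeed": [5, 7, 6, 8, 7, 6, 30, 7, 5, 8],
    "count": [120, 135, 128, 140, 132, 126, 60, 138, 118, 142],
})

mask = iqr_mask(df["windspeed"])

# Outlier rows, shown together with the other variables in the dataset
outlier_rows = df[mask]
print(outlier_rows)

# Dataset with the windspeed outliers removed
df_clean = df[~mask]
print(df_clean["windspeed"].describe())

# To redraw the boxplot after removal (requires matplotlib):
# df_clean.boxplot(column="windspeed")
```

One design note: recomputing the IQR on the cleaned data can flag new points, so whether to repeat the removal or filter once with the original bounds is a judgment call that depends on your data.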
Feel free to share your opinions in the comments! 💓
If you are new here, check out the previous part of this series.
