Data Science Project | Bike Rental Count | Part 3

"A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data." (DJ Patil, American mathematician and computer scientist)

Outlier Analysis

What is an outlier, and why is it a big deal in data science? First of all, yes, it is a big deal in the data industry. In data science, an outlier is an observation that is distant from the other observations. It may be due to variability in the measurement, or it may indicate experimental error.

If you don't detect and handle them, outliers can distort predictions and reduce the accuracy of a model.

Let's understand the interquartile range (IQR). It tells how spread out the middle values are, and it can also be used to tell when some values are too far from the central values. By a common statistical rule of thumb, a value that lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier, where Q1 and Q3 are the 25th and 75th percentiles.

IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR

upper_bound = Q3 + 1.5 * IQR
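
The three formulas above can be sketched in a few lines of Python. This is only an illustration, not the code used in this project: the helper name iqr_bounds and the windspeed readings are made up, and the quartiles come from the standard library's statistics.quantiles with linear interpolation. Other tools may use a slightly different quartile convention and so give slightly different bounds.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (lower_bound, upper_bound) from the 1.5 * IQR rule."""
    # method="inclusive" interpolates linearly between neighbouring
    # data points; this is an assumption about the quartile convention.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical windspeed readings, invented for this example.
windspeed = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.28, 0.50]

lower_bound, upper_bound = iqr_bounds(windspeed)

# Keep only the readings that fall outside the two bounds.
outliers = [w for w in windspeed if w < lower_bound or w > upper_bound]
print(lower_bound, upper_bound, outliers)
```

Here only the reading 0.50 falls outside the bounds, so it is the only value flagged.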

Output

Outliers

If you are wondering why I used a boxplot in the code, you are on the right path. A boxplot is a great way to understand outliers.
A boxplot is built on five boundaries: minimum, Q1, Q2, Q3, and maximum.

Boxplot

Minimum is Q1 - 1.5 * IQR (the lower limit for the whisker).
Q1 is the 25th percentile.
Q2 is the 50th percentile (the median of the dataset).
Q3 is the 75th percentile.
Maximum is Q3 + 1.5 * IQR (the upper limit for the whisker).

So, any value that lies below the minimum or above the maximum is an outlier. A boxplot makes this very easy to see.
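
To make the five boundaries concrete, here is a small sketch that computes them for a toy list of numbers and then picks out the values beyond the minimum and maximum. The function name boxplot_boundaries and the sample values are assumptions for illustration only. A plotting library would draw the same box from these numbers, though its quartile convention may differ a little.

```python
from statistics import quantiles


def boxplot_boundaries(values):
    """Return the five boxplot boundaries described above as a dict."""
    # Quartiles with linear interpolation (an assumed convention).
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "minimum": q1 - 1.5 * iqr,  # lower limit for the whisker
        "Q1": q1,                   # 25th percentile
        "Q2": q2,                   # 50th percentile (median)
        "Q3": q3,                   # 75th percentile
        "maximum": q3 + 1.5 * iqr,  # upper limit for the whisker
    }


# Toy data, invented for this example.
values = [2, 4, 5, 6, 7, 8, 9, 10, 30]

b = boxplot_boundaries(values)
below = [v for v in values if v < b["minimum"]]  # outliers on the low side
above = [v for v in values if v > b["maximum"]]  # outliers on the high side
print(b, below, above)
```

For this list the box runs from 5 to 9 with the median at 7, the limits are -1 and 15, and the value 30 sits above the maximum.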

Output

Windspeed outliers in the dataset

Boxplot of windspeed after removing outliers

We can check where the outlier values lie relative to the other variables in the dataset, and also check the boxplot after removing the outliers.
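
A rough sketch of that check, using plain Python records instead of the project's actual dataset: the rows below are invented, and the cnt column name (for the rental count) is an assumption. The idea is to look at the full row for each windspeed outlier and then keep only the remaining rows.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (lower_bound, upper_bound) from the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical rows; "cnt" is an assumed name for the rental count.
rows = [
    {"windspeed": 0.08, "cnt": 120},
    {"windspeed": 0.11, "cnt": 135},
    {"windspeed": 0.13, "cnt": 150},
    {"windspeed": 0.16, "cnt": 160},
    {"windspeed": 0.19, "cnt": 155},
    {"windspeed": 0.21, "cnt": 140},
    {"windspeed": 0.24, "cnt": 130},
    {"windspeed": 0.26, "cnt": 125},
    {"windspeed": 0.55, "cnt": 45},
]

lower, upper = iqr_bounds([r["windspeed"] for r in rows])


def is_outlier(row):
    """True when the row's windspeed is outside the IQR bounds."""
    return row["windspeed"] < lower or row["windspeed"] > upper


# Inspect the other variables on the outlier rows, then drop those rows.
outlier_rows = [r for r in rows if is_outlier(r)]
cleaned_rows = [r for r in rows if not is_outlier(r)]
print(outlier_rows)
print(len(cleaned_rows))
```

Keep in mind that a boxplot drawn on the cleaned data recomputes the quartiles, so a few points may still show up beyond the new whiskers.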

Feel free to share your opinions in the comments! 💓
If you are new to this series, check out the previous part.
