"TORTURE THE DATA, AND IT WILL CONFESS TO ANYTHING." (Ronald Coase, Nobel Prize laureate)


Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the best way to start a project and understand its data. It is a critical part of the methodology: an approach for summarizing, visualizing, and becoming familiar with the important characteristics of a data set.

Skipping this step can lead to choosing the wrong variables for the model and, in turn, to inaccurate models.

Install the "ggplot2", "GGally", "ggthemes", and "wesanderson" packages.
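One way to do this, sketched below, installs only the packages that are still missing and then loads all four. The CRAN mirror URL is an assumption; any CRAN mirror works.

```r
# The four packages used in this post.
pkgs <- c("ggplot2", "GGally", "ggthemes", "wesanderson")

# Install only the packages that are not already present. The mirror is set
# explicitly so this also works in a non-interactive R session.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach all four packages for the rest of the analysis.
invisible(lapply(pkgs, library, character.only = TRUE))
```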

Figure: Whole population distribution by count variable

Before running the code, we need to ask ourselves: What will this code produce? What kind of information will it give? How are we going to interpret the graph? What will we do with that information? We have to answer all of these questions for ourselves.
The first order of business is to understand the data type of each variable.
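A quick way to check data types, as a sketch: `str()` prints every column with its type, and the hypothetical helper `column_types()` below returns the types as a named vector. The column names in the comments (`yr`, `mnth`, `weathersit`) are assumed from the UCI Bike Sharing `day.csv` layout; the post itself only names `cnt`.

```r
# Hypothetical helper: the class of every column, so we know which plot fits.
# Numeric columns such as cnt suit histograms and density plots; integer codes
# such as yr, mnth or weathersit (assumed names) behave more like categories.
column_types <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}
```

For example, `column_types(bike)` (where `bike` is a hypothetical name for the data frame loaded in part 1) should report `cnt` as a numeric or integer column.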

"cnt" is a numeric variable with a wide range, so it can be treated as continuous, and a histogram is a good option here.
But remember that the "ggplot2" package is a powerful visualization tool with almost unlimited options for producing graphs. Understand the variable first, then choose the visualization.

A good graph combines the right choice of colors and theme. It should be easy on the eyes and elegant, although a simple graph often works very well too. Which style to use depends on what the situation demands.

This graph combines a histogram with a density curve. We can clearly observe that most of the daily counts lie between 3000 and 6000. At the low end the frequency is small, then it rises gradually, and at the high end it falls off steadily.
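A minimal sketch of a histogram with a density overlay, assuming the daily data sits in a data frame with a numeric `cnt` column. The helper name `plot_cnt_distribution()`, the bin width of 500, and the colours are illustrative choices, not the code behind the original figure.

```r
library(ggplot2)

# Hypothetical helper: histogram of daily rentals with a density curve on top.
plot_cnt_distribution <- function(df, binwidth = 500) {
  ggplot(df, aes(x = cnt)) +
    # after_stat(density) rescales the bars so they share the density's y axis
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   fill = "steelblue", colour = "white") +
    geom_density(colour = "darkred", linewidth = 1) +
    labs(title = "Whole population distribution by count variable",
         x = "Daily bike rentals (cnt)", y = "Density") +
    theme_minimal()
}
```

Calling `plot_cnt_distribution(bike)` (with `bike` a hypothetical name for your data frame) draws the plot; a larger `binwidth` gives a smoother but coarser histogram.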

So, this graph tells us that on most days the number of bikes rented is between 3000 and 6000.
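To put a number on "most days", one can compute the fraction of days that fall inside the range. `share_in_range()` is a hypothetical helper, and the 3000/6000 bounds simply mirror the range read off the graph.

```r
# Hypothetical helper: fraction of days whose rental count lies in [lower, upper].
share_in_range <- function(cnt, lower = 3000, upper = 6000) {
  # mean() of a logical vector is the proportion of TRUE values
  mean(cnt >= lower & cnt <= upper, na.rm = TRUE)
}
```

For example, `share_in_range(bike$cnt)` gives the share of days in the 3000 to 6000 band, and `quantile(bike$cnt, c(0.25, 0.75))` shows where the middle half of the days falls.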


Figure: Bike rental count by year

This graph gives us an important piece of information. It splits the distribution by year: the red bars are 2011 and the green bars are 2012. We can compare the two years side by side or look at each one separately. Starting with the red bars: in 2011, the most frequent daily count is between 4500 and 5000. In 2012, the bars at the low end are shorter than in 2011, but they keep growing up to the 6000-8000 range.
This difference may be because the company changed its policy in 2012 or introduced new services for its customers.
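A sketch of the yearly comparison, assuming a `yr` column coded 0 = 2011 and 1 = 2012 as in the UCI Bike Sharing `day.csv` (an assumption; the post does not name this column). Both helpers are hypothetical; the median per year adds a number to the visual comparison.

```r
library(ggplot2)

# Hypothetical helper. Assumes yr is coded 0 = 2011 and 1 = 2012; change the
# labels if your data is coded differently.
plot_cnt_by_year <- function(df, binwidth = 500) {
  df$year <- factor(df$yr, levels = c(0, 1), labels = c("2011", "2012"))
  ggplot(df, aes(x = cnt, fill = year)) +
    # position = "dodge" draws the two years' bars side by side in each bin
    geom_histogram(binwidth = binwidth, position = "dodge", colour = "white") +
    labs(title = "Bike rental count by year",
         x = "Daily bike rentals (cnt)", y = "Number of days", fill = "Year") +
    theme_minimal()
}

# Hypothetical helper: median daily rentals for each year.
median_cnt_by_year <- function(df) {
  tapply(df$cnt, factor(df$yr, levels = c(0, 1), labels = c("2011", "2012")), median)
}
```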

Figure: Bike rental count by month

From this graph, we can clearly see the months with the highest rental counts, and therefore probably the most profitable for the business: June, August, September, and October.
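A sketch for the monthly view, assuming a `mnth` column holding month numbers 1 to 12 (an assumed name, as in the UCI `day.csv`). The boxplot shows the spread of daily counts per month, and the hypothetical `rank_months()` orders months by mean daily rentals. The mean is used rather than the total so that shorter months such as February are not penalised.

```r
library(ggplot2)

# Hypothetical helper: one box of daily counts per month.
plot_cnt_by_month <- function(df) {
  df$month <- factor(month.abb[df$mnth], levels = month.abb)
  ggplot(df, aes(x = month, y = cnt)) +
    geom_boxplot(fill = "lightblue") +
    labs(title = "Bike rental count by month",
         x = "Month", y = "Daily bike rentals (cnt)") +
    theme_minimal()
}

# Hypothetical helper: mean daily rentals per month, highest first.
rank_months <- function(df) {
  avg <- tapply(df$cnt, df$mnth, mean)
  # Replace the month numbers with abbreviations such as "Jun"
  avg <- setNames(as.numeric(avg), month.abb[as.integer(names(avg))])
  sort(avg, decreasing = TRUE)
}
```

`head(rank_months(bike), 4)` would then list the four busiest months for comparison with the graph.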


Figure: Bike rental count by weather

How does the bike rental count change with the weather? We can easily compare the three weather graphs. In the red graph, the rental count holds up and business is unaffected, but when the weather is misty or cloudy the count drops. Light snow, rain, and thunderstorms are also bad for business; on such days very few people, if anyone, may rent a bike.
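A sketch of the weather comparison, assuming a `weathersit` column coded as in the UCI Bike Sharing data (1 = clear or partly cloudy, 2 = mist or cloudy, 3 = light snow, light rain or thunderstorm). The column name, the coding, and both helpers are assumptions; codes outside 1 to 3 become `NA` here.

```r
library(ggplot2)

# Assumed coding of weathersit (UCI Bike Sharing convention).
weather_labels <- c("1" = "Clear / partly cloudy",
                    "2" = "Mist / cloudy",
                    "3" = "Light snow / rain / thunderstorm")

# Hypothetical helper: one histogram panel per weather type.
plot_cnt_by_weather <- function(df, binwidth = 500) {
  df$weather <- factor(unname(weather_labels[as.character(df$weathersit)]),
                       levels = unname(weather_labels))
  ggplot(df, aes(x = cnt, fill = weather)) +
    geom_histogram(binwidth = binwidth, colour = "white", show.legend = FALSE) +
    # stacked panels share the x axis, which makes the shift in counts easy to see
    facet_wrap(~ weather, ncol = 1) +
    labs(title = "Bike rental count by weather",
         x = "Daily bike rentals (cnt)", y = "Number of days") +
    theme_minimal()
}

# Hypothetical helper: median daily rentals for each weather type.
median_cnt_by_weather <- function(df) {
  tapply(df$cnt,
         factor(df$weathersit, levels = 1:3, labels = unname(weather_labels)),
         median)
}
```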



If you are new to this post, check out part 1.
Feel free to share your opinions in the comments.💓