"Statistics are like a bikini. What they reveal is interesting, but what they hide is vital." (Aaron Levenstein, author and former business professor)


Source: Lynda.com


Basic and most important terms of statistics

We will look at some of the most common statistical terms we encounter on a daily basis, using the "describe" function from the "psych" package in R. To use this function, you first need to install the package.

The best thing about this function is that it reveals a lot about your data and how it behaves. That's awesome: with a single function call, we get most of the summary information we need. Let's begin...

Just type the function name and pass it the dataset. Are you ready to see the magic? 😎
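As a quick sketch of what that looks like in practice, the snippet below calls describe on mtcars, a dataset that ships with R. The choice of mtcars and the name summary_stats are my own assumptions for illustration; swap in your own data frame.

```r
# One-time setup (uncomment if psych is not installed yet):
# install.packages("psych")

# mtcars is a built-in R dataset, used here only as an example.
if (requireNamespace("psych", quietly = TRUE)) {
  summary_stats <- psych::describe(mtcars)  # one row per variable
  print(summary_stats)
} else {
  message("psych is not installed; run install.packages('psych') first")
}
```

Each row of the result corresponds to one column of the dataset, and each column of the result is one of the statistics explained in the list that follows.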


Basic Statistics

Factor variables are labelled with an asterisk [*].

1. vars : The number of the variable in the dataset, listed serially.
2. n : The number of valid (non-missing) observations in each variable.
3. mean : The average of each variable, found by adding all of its values and then dividing by the number of values.
4. sd : [Standard Deviation] : A measure of how far the values typically lie from the mean. A low sd indicates that the values tend to be close to the mean, while a high sd indicates that the values are dispersed over a wider range.
5. median : The middle value when the values are ranked in order (either ascending or descending). With an even number of values, it is the average of the two middle values.
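6. trimmed : Similar to the mean, except that a fixed fraction of the smallest and largest values is dropped before averaging (by default, describe trims 10% from each end). The regular mean is sensitive to outliers, so a trimmed mean can often be a better summary of the dataset. Also called the truncated mean.
7. mad : [Median Absolute Deviation] : A robust measure of how spread out a dataset is, based on the median of the absolute deviations from the median. If the data is roughly normal, the standard deviation works well; if the data is skewed or contains outliers, mad is usually the safer choice.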
8. min : The smallest value in each variable of the dataset.
9. max : The largest value in each variable of the dataset.
10. range : The difference between the maximum value and the minimum value. Used to understand the dispersion in the data.
11. skew : A measure of the asymmetry of a distribution around its mean. It can be negative, zero, or positive. Negative means a left-skewed distribution (longer tail on the left). Positive means a right-skewed distribution (longer tail on the right). Zero means the distribution is symmetric; a normal distribution has zero skew, but zero skew alone does not prove the data is normal.
12. kurtosis : A measure of how much of the data sits in the tails of the frequency distribution rather than around the centre. A higher value means heavier tails and therefore more extreme values (outliers); a lower value means lighter tails and fewer outliers. describe reports excess kurtosis, so a normal distribution scores about zero.
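13. se : [Standard Error] : The standard error of the mean, calculated as sd divided by the square root of n. Unlike sd, it does not describe the spread of the data itself; it tells you how far your sample mean is likely to be from the actual population mean. The smaller the number, the more precise the estimate, and it shrinks as the sample size grows.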
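To make the definitions above concrete, here is a minimal sketch that computes each statistic by hand in base R on a small made-up vector. The data and the variable names are my own, purely for illustration. The skew and kurtosis formulas are one common convention, which I believe matches the default in psych, but treat that as an assumption rather than a guarantee.

```r
# Made-up example data with one large value (40) acting as an outlier.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 40)

x_n       <- sum(!is.na(x))        # n: number of non-missing values
x_mean    <- mean(x)               # mean: 10.8
x_sd      <- sd(x)                 # standard deviation
x_median  <- median(x)             # median: 8
x_trimmed <- mean(x, trim = 0.1)   # drops 10% from each end: 8.25
x_mad     <- mad(x)                # median absolute deviation (scaled by 1.4826)
x_min     <- min(x)                # 2
x_max     <- max(x)                # 40
x_range   <- x_max - x_min         # 38

# Skew and excess kurtosis from the third and fourth central moments.
# Assumption: this convention is the one psych uses by default.
x_skew     <- mean((x - x_mean)^3) / x_sd^3
x_kurtosis <- mean((x - x_mean)^4) / x_sd^4 - 3

x_se <- x_sd / sqrt(x_n)           # standard error of the mean

print(c(n = x_n, mean = x_mean, sd = x_sd, median = x_median,
        trimmed = x_trimmed, mad = x_mad, min = x_min, max = x_max,
        range = x_range, skew = x_skew, kurtosis = x_kurtosis, se = x_se))
```

Notice how the single outlier pulls the mean (10.8) well above the median (8), while the trimmed mean (8.25) stays close to the median. The positive skew value reflects the same long right tail.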


Feel free to share your opinions in the comments 💓