What is Stratified Random Sampling and its Implementation in R

"An hour sitting with a pretty girl on a park bench passes like a minute, but a minute sitting on a hot stove seems like an hour. That's relativity." - Albert Einstein, Nobel Prize in Physics | Father of Relativity

Source: datasciencemadesimple.com


Stratified Random Sampling

Stratified Random Sampling is a method in which every element of the population is grouped into homogeneous subgroups before sampling. Each subgroup is known as a stratum (plural: strata) and is defined by a characteristic such as age, education, or gender. From every stratum, elements are chosen randomly to represent that stratum.

This sampling method is used to highlight the differences between the groups in a population. We use it when we want representation from each subgroup of the population.

Example

Suppose XYZ is a company with worldwide operations whose business is unfortunately going down, so the CEO wants to make major changes from the ground level to the top of the company. XYZ has 10,000 employees worldwide. To draft new policies, he wants suggestions from everyone, but that is not possible, right? With 10,000 responses, he would go mad. Someone on his team suggests that he doesn't need to hear from all 10,000 people; he only needs to select some people randomly from each city, country, or department in such a way that their opinions represent those of the whole group.

The CEO decides to select people so that each person represents his or her city's operations (city-wise because it is more effective, as each city faces its own issues). So he makes the subgroups city-wise.

For simplicity, let's assume there are five cities, and in each city the company has its own number of employees based on demand.

City1: 2000 employees
City2: 500 employees
City3: 3000 employees
City4: 1500 employees
City5: 3000 employees

Sample size of the stratum = (size of sample) / (population size) * (stratum size)

The CEO wants 100 people out of 10,000. Applying the formula above gives City1: 20 employees, City2: 5 employees, City3: 30 employees, City4: 15 employees, and City5: 30 employees. City1(20) + City2(5) + City3(30) + City4(15) + City5(30) = 100 employees.
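The arithmetic above can be checked with a few lines of base R. This is only a sketch of the allocation formula using the city sizes from the example; the variable names (`strata_size`, `sample_size`, `strata_sample`) are made up for illustration.

```r
# Stratum (city) sizes taken from the example above
strata_size <- c(City1 = 2000, City2 = 500, City3 = 3000,
                 City4 = 1500, City5 = 3000)

population_size <- sum(strata_size)  # total number of employees
sample_size <- 100                   # total number of people the CEO wants

# Proportional allocation:
# (size of sample) / (population size) * (stratum size)
strata_sample <- sample_size * strata_size / population_size

print(strata_sample)       # sample size for each city
print(sum(strata_sample))  # should add up to sample_size
```

In this example every result is a whole number. With other stratum sizes the formula can give fractions, in which case the values need to be rounded and the total checked again.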

That's it, the hard part is done. Now we have to randomly select those people from each city, for example 20 employees at random out of the 2000 employees in City1.

This is just an example to help you understand the concept in an easy way.

Summarizing the example in steps:

Step 1: Choose the factor by which to divide the population into subgroups
Step 2: Make a table representing your data
Step 3: Decide your sample size
Step 4: Apply the formula above to get the sample size for each stratum
Step 5: Perform random sampling within each stratum

Implementation
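One way to carry out the five steps in base R is sketched below. It assumes a hypothetical `employees` data frame built from the city sizes in the example, with made-up column names (`emp_id`, `city`), and uses only `split()`, `sample()` and `rbind()`. The seed is arbitrary and only makes the run repeatable.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Steps 1 and 2: hypothetical employee table, one row per employee,
# with the city used as the stratum
strata_size <- c(City1 = 2000, City2 = 500, City3 = 3000,
                 City4 = 1500, City5 = 3000)
employees <- data.frame(
  emp_id = seq_len(sum(strata_size)),
  city   = rep(names(strata_size), times = strata_size)
)

# Step 3: decide the total sample size
sample_size <- 100

# Step 4: proportional allocation per city (rounded to whole people)
n_per_city <- round(sample_size * strata_size / sum(strata_size))

# Step 5: simple random sampling without replacement inside each city
by_city <- split(employees, employees$city)
picked <- lapply(names(by_city), function(city) {
  rows <- by_city[[city]]
  rows[sample(nrow(rows), n_per_city[[city]]), ]
})
stratified_sample <- do.call(rbind, picked)

# Number of sampled employees per city
print(table(stratified_sample$city))
```

The table printed at the end should show the same per-city counts as the worked example: 20, 5, 30, 15 and 30.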



Alternate method


   
If you are building a predictive model, you can also use stratified sampling to split your dataset into training and test sets, so that each stratum appears in both sets in the same proportion.
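A minimal base R sketch of such a split is shown below. It assumes the built-in `iris` dataset with `Species` as the stratum and a 70/30 split; the ratio, the seed and the variable names are illustrative choices only. Packages such as caret offer helpers for the same job (for example `createDataPartition`), but nothing beyond base R is needed here.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Built-in iris data: 150 rows, 50 per Species; Species is the stratum
data(iris)
train_fraction <- 0.7  # assumed split ratio

# For each Species, draw 70% of its row indices for the training set
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(train_fraction * length(idx)))]
}))

train <- iris[train_idx, ]   # sampled rows
test  <- iris[-train_idx, ]  # all remaining rows

# Class counts in each set
print(table(train$Species))
print(table(test$Species))
```

Because the sampling is done separately inside each species, both sets keep the class balance of the full dataset, which a plain random split does not guarantee.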
  


Feel Free to drop your reviews💓
