Mean and Median

ASK FOR THE MEDIAN, NOT THE MEAN!

Vikas Johal


MAY - 19 - 4 min read

WHY IS THE MEDIAN BETTER THAN THE MEAN FOR SKEWED DATA?

WHAT ARE THE MEAN AND MEDIAN?

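  • [Mean] - The sum of all the values in a data set divided by the number of values, i.e. the average.
  • [Median] - The middle value of a data set once it is arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle values.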
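The two definitions above can be written directly as code. This is a minimal Python sketch; the function names `mean` and `median` are our own, and for an even number of values it follows the usual convention of averaging the two middle values (the same convention used by Python's standard `statistics.median`).

```python
def mean(values):
    """Sum of all values divided by how many values there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd count: exactly one middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle terms


print(mean([1, 2, 3, 10]))    # 4.0
print(median([1, 2, 3, 10]))  # 2.5
print(median([3, 1, 2]))      # 2
```

Note how the single large value 10 pulls the mean (4.0) above every other value in the list, while the median (2.5) stays among them.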

EXAMPLE: A father wants his daughter to marry into a financially stable and sound family. One of his friends suggests a family of five brothers, the Pandavas, the sons of Pandu.

ANNUAL SALARY OF THE FIVE PANDAVAS

No.  Name          Annual salary
1    Yudhishthir   6 lakh
2    Arjuna        40 lakh
3    Bhima         7 lakh
4    Nakula        5 lakh
5    Sahadeva      2 lakh

SOLUTION: Mean salary of the five Pandavas = (sum of the salaries of the five Pandavas) / (number of Pandavas)

Mean salary = (6 + 40 + 7 + 5 + 2) / 5 = 60 / 5 = 12 lakh

Therefore, the average (mean) salary in Pandu's family is 12 lakh.
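The same calculation can be checked in a few lines of Python. This sketch uses the standard library's `statistics.mean`; salaries are in lakh, and the variable names are purely illustrative.

```python
import statistics

# Annual salaries of the five Pandavas, in lakh (from the table above)
salaries = {
    "Yudhishthir": 6,
    "Arjuna": 40,
    "Bhima": 7,
    "Nakula": 5,
    "Sahadeva": 2,
}

total = sum(salaries.values())  # 60
mean_salary = statistics.mean(list(salaries.values()))  # total / number of Pandavas
print(f"Total: {total} lakh, mean: {mean_salary} lakh")
```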

Median

First, arrange the salaries in ascending order: 2 lakh, 5 lakh, 6 lakh, 7 lakh, 40 lakh.

The median is the value of the middle term. Out of the five ordered terms, the 3rd is the middle one, and its value is 6 lakh. So the median salary of Pandu's family is 6 lakh.
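The two median steps (sort, then take the middle term) look like this in Python. The sketch does it by hand and cross-checks the result against the standard library's `statistics.median`; salaries are in lakh, as in the table.

```python
import statistics

# Annual salaries of the five Pandavas, in lakh
salaries = [6, 40, 7, 5, 2]

ordered = sorted(salaries)          # [2, 5, 6, 7, 40]
middle_index = len(ordered) // 2    # 2, i.e. the 3rd term with 0-based indexing
median_by_hand = ordered[middle_index]

# statistics.median applies the same rule for an odd count
assert median_by_hand == statistics.median(salaries)
print(f"Median salary: {median_by_hand} lakh")  # Median salary: 6 lakh
```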

From the above example, the average salary of Pandu's family comes out to 12 lakh, whereas the median is 6 lakh. In this case, Arjuna's high salary of 40 lakh drastically pushes the family's average income up to 12 lakh, which a reader could take to mean that each Pandava earns 12 lakh a year. But in reality, four of the five Pandavas earn less than 8 lakh a year.

The median, by contrast, is 6 lakh, which is much closer to the salaries of four of the five Pandavas. So the median gives the girl's father more accurate information about the financial status of Pandu's family. Here, Arjuna's high salary acts as an outlier. The median reduces the effect of the outlier and so gives better information than the mean: if Arjuna earned even more, the median would stay at 6 lakh while the mean would keep rising.
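One way to see how much each measure depends on Arjuna's salary is to recompute both with and without it. This sketch uses only the figures from the table and the standard `statistics` module; dropping Arjuna is done purely for comparison, not as a recommended data-cleaning step.

```python
import statistics

# Annual salaries in lakh, from the table above
salaries = {"Yudhishthir": 6, "Arjuna": 40, "Bhima": 7, "Nakula": 5, "Sahadeva": 2}

all_five = list(salaries.values())
# Same data with the outlier (Arjuna) left out, for comparison only
without_arjuna = [s for name, s in salaries.items() if name != "Arjuna"]

for label, data in [("all five", all_five), ("without Arjuna", without_arjuna)]:
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```

Removing this single value moves the mean from 12 to 5 lakh, but the median only from 6 to 5.5 lakh (with four values left, the median is the average of 5 and 6).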

In conclusion:

"The median is better than the mean for skewed data or data with outliers, because it reduces the effect of the outliers and gives a value closer to the actual data. So skewed data, rare data, or outliers should not be used to model a general phenomenon."

about the author

Vikas Johal is a Junior Data Scientist and Power BI Instructor at Analogica, Bangalore, and is presently exploring the vast field of Data Science and Business Intelligence.