Measures of Central Tendency

First, what is central tendency?

In statistics, a measure of central tendency is a single value that summarizes a dataset by describing the center of its distribution. A small dataset is easy to analyze by inspection, but with a large dataset it is impractical to scan every value, and that is where these measures come in handy.

Mean, median, and mode are the three most common measures of central tendency.

Mean:

The mean represents the average value of the data. It is commonly denoted by the Greek letter μ (mu).

The mean is defined as the ratio of the sum of all the points in a dataset to the number of points in the dataset.

In mathematical terms, for a dataset of N points x_1, x_2, ..., x_N:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Example:

Consider a dataset that holds the weights of people in a state.

weights=[50,55,54,67,78,44]

The mean can be calculated as (50+55+54+67+78+44)/6 = 348/6 = 58.
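The same calculation can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the function name `mean` is a hypothetical helper chosen here, and the `weights` list is the one from the example above.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        # The mean of an empty dataset is undefined (division by zero).
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Dataset from the example above.
weights = [50, 55, 54, 67, 78, 44]
print(mean(weights))  # 58.0
```

Python's standard library also provides `statistics.mean`, which gives the same result for this data.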

Median:

The median is the second measure of central tendency.

The median is the central value of a sorted dataset. First we will see how the median is calculated, then how it helps when we have an outlier in a dataset.

Steps to calculate the median:

The first step is to arrange the data points in increasing order.

In the second step, if the dataset contains an odd number of values, pick the central value. If the dataset contains an even number of values, compute the average of the two central values.

Let us look at some examples to make this clear.

Example:

Suppose we have the data [20,65,45,76,80].

The first step is to arrange the data points in sorted order: [20,45,65,76,80].

The dataset contains an odd number of values, so we pick the middle value as the median, which is 65.
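The two steps can be sketched in Python as follows. The function name `median` is a hypothetical helper, and the four-value list used for the even case is an illustrative assumption that does not come from the example above.

```python
def median(values):
    """Return the median by sorting, then picking or averaging the central values."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")

    # Step 1: arrange the data points in increasing order.
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2

    # Step 2: odd count -> the single central value;
    #         even count -> the average of the two central values.
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [20, 65, 45, 76, 80]
print(median(data))              # 65

# Illustrative even-sized dataset (an assumption, not from the text).
print(median([20, 45, 65, 76]))  # 55.0
```

Sorting a copy with `sorted` leaves the original list unchanged, which is usually what a caller expects from a summary function.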

The median has an advantage over the mean when the dataset contains an outlier.

Suppose our data looks like [2,3,4,5,6,7,80].

If we calculate the mean, it gives 107/7 ≈ 15.3, whereas the median is 5.

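We can see a large difference between the mean and the median when we have an outlier: the single value 80 pulls the mean above every other point in the dataset, while the median stays among the typical values. So when we know that our data has an outlier, it is better to use the median as the measure of central tendency.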
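The effect of the outlier can be checked with Python's standard `statistics` module. This is a small sketch: dropping the last value to form `without_outlier` is an extra step added here for illustration and is not part of the example above.

```python
import statistics

# Dataset from the example above; 80 is the outlier.
data = [2, 3, 4, 5, 6, 7, 80]

mean_value = statistics.mean(data)      # 107 / 7, about 15.29
median_value = statistics.median(data)  # 5
print(round(mean_value, 2), median_value)

# Illustrative assumption: drop the outlier and recompute both measures.
without_outlier = data[:-1]             # [2, 3, 4, 5, 6, 7]
print(statistics.mean(without_outlier))    # 4.5
print(statistics.median(without_outlier))  # 4.5
```

Removing the single outlier moves the mean from about 15.29 down to 4.5, while the median only moves from 5 to 4.5.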

Mode:

The mode is the third measure of central tendency.

The mode is the value that occurs most often in a dataset. In some cases two or more values are tied for the highest count; in that case all of those values are considered modes of the data.

Example: suppose our dataset looks like [3,4,5,3,6,3,7,83].

Since 3 occurs most frequently in the dataset (three times), it is the mode.
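A minimal Python sketch of finding the mode, including the case where several values tie, could look like this. The function name `modes` is a hypothetical helper, and the tied dataset `[1, 1, 2, 2, 3]` is an illustrative assumption.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    if not values:
        raise ValueError("mode of an empty dataset is undefined")

    # Count how many times each value occurs.
    counts = Counter(values)
    highest = max(counts.values())

    # Keep all values tied for the highest count, in first-seen order.
    return [value for value, count in counts.items() if count == highest]


data = [3, 4, 5, 3, 6, 3, 7, 83]
print(modes(data))             # [3]

# Illustrative dataset with a tie (an assumption, not from the text).
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```

The standard library's `statistics.multimode` (Python 3.8 and later) returns the same kind of list.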

Aspiring to be a Data Scientist
