Unmask Outliers Instantly in Google Sheets (1 Formula!)

Pass a range of data points to a single formula, and you can identify outliers in Google Sheets right away. The formula returns TRUE for each value it flags as an outlier and FALSE otherwise.

You don't need to calculate the first and third quartiles (Q1 and Q3), the interquartile range (IQR), or the lower and upper limits separately. The formula handles all of these steps.

Outliers:

In a given range, outliers are data points that significantly deviate from the majority of other data points.

Example:

Imagine you’re a steel distributor whose customers typically buy between 1 and 20 tons of material. Suddenly, an order for 100 tons arrives. A deviation this large raises eyebrows: is it a typo, a genuine (but unusual) purchase, or something else?

Identifying outliers manually can be tedious. In Google Sheets, a single formula streamlines the process: input your data range, and it flags potential outliers with TRUE or FALSE.

Still, don’t rely blindly on the TRUE/FALSE flags. A flagged value can be perfectly valid, so check it in context before correcting or discarding it.

Detecting Outliers in Google Sheets: Formula

Here is the formula for swiftly identifying outliers in Google Sheets:

=ArrayFormula(LET(
   range, $B$2:$B, 
   q_one, QUARTILE.INC(range, 1), 
   q_three, QUARTILE.INC(range, 3), 
   iqr, q_three-q_one, 
   lower_bound, q_one-1.5*iqr, 
   upper_bound, q_three+1.5*iqr, 
   outlier, NOT(ISBETWEEN(range, lower_bound, upper_bound)), 
   IF(range="", ,outlier)
))

Where:

  • $B$2:$B is the range containing the data points you want to check for outliers. Replace it with your actual data range.

Figure: Example illustrating outlier identification in Google Sheets using statistical techniques

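I’ve used absolute references because the same formula, with minor changes, also works as a conditional formatting rule for highlighting outliers. In that rule, the dollar signs keep the data range fixed while the rule is checked cell by cell. We will explore that after the formula breakdown below.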
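If you'd like to sanity-check the logic outside Sheets, here is a minimal Python sketch of the same steps. It is not how Google Sheets computes anything internally. It assumes blank cells arrive as None, uses Python's statistics.quantiles with method="inclusive" as a stand-in for QUARTILE.INC (both interpolate linearly between sorted values, treating the minimum and maximum as the 0th and 100th percentiles), and treats a value exactly on a limit as normal, as ISBETWEEN does by default. The function name flag_outliers and the order sizes are hypothetical.

```python
from statistics import quantiles


def flag_outliers(column, k=1.5):
    """Mirror of the array formula: True = outlier, False = not, None = blank cell."""
    # QUARTILE.INC ignores blank cells, so only numbers feed the quartiles.
    numbers = [v for v in column if v is not None]
    # method="inclusive" interpolates like QUARTILE.INC (use at least two numbers).
    q_one, _median, q_three = quantiles(numbers, n=4, method="inclusive")
    iqr = q_three - q_one
    lower_bound = q_one - k * iqr
    upper_bound = q_three + k * iqr
    # NOT(ISBETWEEN(...)) with inclusive limits; IF(range="", , ...) keeps blanks blank.
    return [
        None if v is None else not (lower_bound <= v <= upper_bound)
        for v in column
    ]


# Hypothetical order sizes in tons, with one blank row and one 100-ton order.
orders = [5, 12, 8, None, 20, 3, 100, 15]
print(flag_outliers(orders))
# [False, False, False, None, False, False, True, False]
```

Only the 100-ton order falls outside the limits (-10 and 34 for this sample), so it is the only True, and the blank row stays blank (None).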

Formula Breakdown

The LET function plays a key role in the outlier formula: on large datasets it improves performance by computing each intermediate value once instead of repeating the calculation. It also lets us assign names to the data range and to intermediate results, which keeps the formula clear and simple.

For instance, the formula above calculates the interquartile range (IQR) once and names it ‘iqr’. Both lower_bound and upper_bound then reuse that name instead of recalculating Q3 - Q1. In the same way, q_one and q_three are each computed once and reused.

Similarly, we name the data range $B$2:$B ‘range’ and refer to it by that name everywhere else in the formula. This makes the formula more readable and easier to manage: replace $B$2:$B in that one place with your own data range, and the formula will flag the outliers.

Syntax of the LET Function:

LET(name1, value_expression1, [name2, …], [value_expression2, …], formula_expression)

How the Formula Finds Outliers in Google Sheets

The following table explains each part of the outlier formula:

| Name | Value expression | Remarks |
|---|---|---|
| range | $B$2:$B | The data range to test for outliers. |
| q_one | QUARTILE.INC(range, 1) | The first quartile (Q1): the value below which about 25% of the data points fall when sorted in ascending order. |
| q_three | QUARTILE.INC(range, 3) | The third quartile (Q3): the value below which about 75% of the data points fall when sorted in ascending order. |
| iqr | q_three-q_one | The interquartile range (IQR): the spread of the middle 50% of the data. |
| lower_bound | q_one-1.5*iqr | * The lower limit, Q1 - 1.5 * IQR. Values below it are potential outliers. |
| upper_bound | q_three+1.5*iqr | * The upper limit, Q3 + 1.5 * IQR. Values above it are potential outliers. |
| outlier | NOT(ISBETWEEN(range, lower_bound, upper_bound)) | The part that finds outliers. ISBETWEEN returns TRUE when a value lies between the lower and upper limits (both limits included by default). NOT reverses each result, so values outside the limits return TRUE. |
| Formula expression | IF(range="", ,outlier) | Returns blank for empty cells in the range; otherwise returns the result of the outlier part. |

* 1.5 is the most commonly used multiplier for defining these limits. A larger multiplier widens the limits, so fewer values are flagged.
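To see the arithmetic behind the table, here is a hand-rolled Python version of QUARTILE.INC and the two limits. It is a sketch that assumes QUARTILE.INC follows the standard inclusive definition: sort the values, go to the 0-based position (n - 1) × quart / 4, and interpolate linearly between the neighboring values. The function names quartile_inc and fences and the sample values are hypothetical.

```python
import math


def quartile_inc(values, quart):
    """Sketch of QUARTILE.INC: linear interpolation at 0-based position (n - 1) * quart / 4."""
    data = sorted(values)
    pos = (len(data) - 1) * quart / 4
    lower = math.floor(pos)
    frac = pos - lower
    if lower + 1 < len(data):
        # Move part of the way from data[lower] toward the next sorted value.
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return float(data[lower])  # quart = 4 (or a single value) lands on the last item


def fences(values, k=1.5):
    """Lower and upper limits: Q1 - k * IQR and Q3 + k * IQR."""
    q_one = quartile_inc(values, 1)
    q_three = quartile_inc(values, 3)
    iqr = q_three - q_one
    return q_one - k * iqr, q_three + k * iqr


data = [3, 5, 8, 12, 15, 20, 100]  # hypothetical values
print(quartile_inc(data, 1), quartile_inc(data, 3))  # 6.5 17.5
print(fences(data))  # (-10.0, 34.0)
```

For the seven sample values, Q1 sits halfway between 5 and 8 (6.5) and Q3 halfway between 15 and 20 (17.5), so the IQR is 11 and the limits are 6.5 - 16.5 = -10 and 17.5 + 16.5 = 34. Only 100 lies outside them.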

How to Highlight Outliers in Google Sheets

If you prefer highlighting data points to returning TRUE or FALSE, you can adapt the formula with a few adjustments so that outliers stand out directly in the dataset.

To highlight outliers without relying on any helper range, use the following custom rule in Conditional Formatting in Google Sheets:

=LET(
   range, $B$2:$B, 
   q_one, QUARTILE.INC(range, 1), 
   q_three, QUARTILE.INC(range, 3), 
   iqr, q_three-q_one, 
   lower_bound, q_one-1.5*iqr, 
   upper_bound, q_three+1.5*iqr, 
   outlier, NOT(ISBETWEEN(B2, lower_bound, upper_bound)), 
   IF(B2="", ,outlier)
)

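Replace $B$2:$B with your actual range and B2 with the first cell of that range. Keep the dollar signs on the range but not on B2, so the range stays fixed while B2 adjusts to each cell the rule checks.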

Figure: Example of highlighting outliers in Google Sheets

The highlight rule differs only slightly from the formula that finds outliers. In the outlier formula, ‘range’ appears in ISBETWEEN and in the formula expression; in the highlight rule, the cell reference B2 takes its place in both spots.

We use a cell reference there because conditional formatting tests each data point individually, not the whole range at once. For the same reason, I removed the ARRAYFORMULA function: in the outlier formula, it mainly serves to expand the ‘outlier’ and formula expression parts across the range.
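The per-cell behavior can be sketched in Python as a function that receives one cell's value plus the whole column: the limits are still computed from the full (absolute) range, but only that one value is tested. The name should_highlight is hypothetical, statistics.quantiles with method="inclusive" again stands in for QUARTILE.INC, and blank cells are modeled as None.

```python
from statistics import quantiles


def should_highlight(cell, column, k=1.5):
    """Mirror of the conditional-formatting rule evaluated for a single cell."""
    if cell is None:  # IF(B2="", , ...) leaves blank cells unformatted
        return False
    numbers = [v for v in column if v is not None]
    # The limits come from the whole (absolute) range on every call...
    q_one, _, q_three = quantiles(numbers, n=4, method="inclusive")
    iqr = q_three - q_one
    lower_bound, upper_bound = q_one - k * iqr, q_three + k * iqr
    # ...but only this one cell is compared against them (inclusive, like ISBETWEEN).
    return not (lower_bound <= cell <= upper_bound)


on_bound = [0, 2, 3, None, 4, 7]    # Q1 = 2, Q3 = 4, so upper limit = 4 + 1.5 * 2 = 7
past_bound = [0, 2, 3, None, 4, 8]  # same quartiles, but 8 > 7
print([should_highlight(v, on_bound) for v in on_bound])
# [False, False, False, False, False, False]
print([should_highlight(v, past_bound) for v in past_bound])
# [False, False, False, False, False, True]
```

The two prints show the inclusive limits at work: with Q1 = 2 and Q3 = 4, the upper limit is exactly 7, so a 7 is not highlighted, while an 8 is.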

To apply the rule:

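  1. Select the data range (for example, B2:B).
  2. Click on Format > Conditional formatting.
  3. Under “Format rules,” open the “Format cells if” drop-down and choose “Custom formula is.”
  4. Enter the highlight rule formula above.
  5. Pick a formatting style, such as a fill color.
  6. Click “Done.”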

Conclusion

You can use either the formula or the highlight rule to identify outliers in Google Sheets; the choice is yours. I prefer the highlight rule.

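If my dataset is very large, however, I would not apply the highlight rule directly, because conditional formatting evaluates the whole rule, quartile calculations included, separately for each cell in the range. Instead, I would first enter the formula that returns TRUE or FALSE in a helper column and then base the highlighting on those values. This approach improves performance.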

In the example above, where the TRUE/FALSE formula is entered in cell C2, I would select the data range B2:B and use the following custom formula for highlighting:

=C2=TRUE
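Here is a rough Python model of why the helper column helps. It only counts how often the quartiles and limits are computed: once per cell when the full formula lives in the rule, and once in total when a helper column holds the TRUE/FALSE results and the rule just reads them. It does not model how Sheets actually schedules recalculation, and the function names are hypothetical.

```python
from statistics import quantiles


def limits(column, k=1.5):
    """Lower and upper limits from the whole column (blanks = None are ignored)."""
    numbers = [v for v in column if v is not None]
    q_one, _, q_three = quantiles(numbers, n=4, method="inclusive")
    iqr = q_three - q_one
    return q_one - k * iqr, q_three + k * iqr


def rule_with_full_formula(column):
    """Full LET formula inside the rule: limits recomputed for every cell checked."""
    highlights, computations = [], 0
    for v in column:
        lower, upper = limits(column)
        computations += 1
        highlights.append(v is not None and not (lower <= v <= upper))
    return highlights, computations


def rule_with_helper_column(column):
    """Helper column filled once; the rule (like =C2=TRUE) only reads each flag."""
    lower, upper = limits(column)
    computations = 1  # one calculation, shared by every row
    helper = [None if v is None else not (lower <= v <= upper) for v in column]
    highlights = [flag is True for flag in helper]
    return highlights, computations


orders = [5, 12, 8, None, 20, 3, 100, 15]  # hypothetical tonnages
print(rule_with_full_formula(orders))   # same highlights, 8 computations
print(rule_with_helper_column(orders))  # same highlights, 1 computation
```

Both approaches highlight the same cell; the difference is only how much work is repeated, and that gap grows with the number of rows.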

Prashanth KV
Your Trusted Google Sheets and Excel Guide
Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.
