HomeGoogle DocsSpreadsheetFilter Distinct Records in Google Sheets

Filter Distinct Records in Google Sheets

Published on

We can use a SORTN-based dynamic formula to filter distinct records in a multi-column table in Google Sheets.

Before we delve into the formulas, let’s clarify the difference between unique and distinct records.

This is important because Google Sheets has a function named UNIQUE, and we often refer to an array applied with UNIQUE as unique values.

NameUniqueDistinct
Emily✔✔
Sophia✔✔
Eric✔✔
Olivia✔✔
Ben✔
Ben

“Unique” refers to items in a dataset that appear only once. If any item occurs more than once, it won’t be considered unique.

“Distinct” also refers to unique items within a dataset. It considers each unique occurrence of an item in the dataset, removing any duplicates.

Sample Data and Filtered Distinct Records

The following table is in the range A1:B in Google Sheets. I want to filter distinct records from this multi-column table.

NameStatus
EmilyProfessional
SophiaUnprofessional
EricProfessional
OliviaProfessional
BenUnprofessional
BenUnprofessional

The formula should return the following distinct records:

EmilyProfessional
SophiaUnprofessional
EricProfessional
OliviaProfessional
BenUnprofessional

Formula to Filter Distinct Records in Google Sheets

Formula:

=LET(
   range, A2:B, 
   CHOOSECOLS(
      SORT(
         SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE), 
         COLUMNS(range)+1, TRUE
      ), 
      SEQUENCE(1, COLUMNS(range))
   )
)

The above formula filters distinct records in the specified data range.

How do I use this formula for a different data range in Google Sheets?

I’ve made this formula adaptive to data ranges with any number of columns. You should make only two changes:

  1. Replace A2:B with your actual data range.
  2. Replace 1 (highlighted) in the formula with the key column number, i.e., the position of the column in the table that determines distinct.

Formula Explanation

Let’s break down the formula.

Step # 1:

Actually, the formula is as simple as:

=SORTN(A2:B, 9^9, 2, 1, TRUE)

The syntax of the SORTN function (Adapted to our Formula):

SORTN(range, n, display_ties_mode, sort_column, is_ascending)

Where A2:B represents the range, 9^9 represents n, the number of distinct records (since we are unsure about it, we used 9^9, an arbitrarily large number). 2 is the display_ties_mode to make the data distinct, 1 is the sort_column (key column), and TRUE represents is_ascending, sorting the data in ascending order.

Step #2:

The issue with this formula is the sorting performed. So the output won’t be in the original data order.

To solve that, we have made two changes in the formula.

  1. Instead of A2:B, we used the range HSTACK(A2:B, ROW(A2:B)). This adds a row number column as the third column.
  2. Wrapped the SORTN with SORT to sort the distinct records based on the row numbers (last column) in ascending order.

The formula becomes:

=SORT(SORTN(HSTACK(A2:B, ROW(A2:B)), 9^9, 2, 1, TRUE), 3, TRUE)

This solves the sorting issue but introduces a new problem, an extra row number column in the output.

EmilyProfessional2
SophiaUnprofessional3
EricProfessional4
OliviaProfessional5
BenUnprofessional6

Step #3:

We can use the CHOOSECOLS function to return only the columns we want. In our above sample data, we have two columns.

=CHOOSECOLS(SORT(SORTN(HSTACK(A2:B, ROW(A2:B)), 9^9, 2, 1, TRUE), 3, TRUE), {1, 2})

We can use this formula to filter distinct records in Google Sheets. It’s not dynamic, and users may want to make several changes to make this formula adaptive to their data range.

So I have used the LET function.

Implementing LET:

It helps us name the range A2:B as ‘range’ and use that in the formula instead of referring to A2:B.

=LET(range, A2:B, CHOOSECOLS(SORT(SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE), 3, TRUE), {1, 2}))

Another enhancement made is to dynamically find the last column, which is the row number column. Instead of specifying 3, which is as per our data, I used COLUMNS(range)+1.

=LET(range, A2:B, CHOOSECOLS(SORT(SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE), COLUMNS(range)+1, TRUE), {1, 2}))

Finally replaced the number of columns in CHOOSECOLS, which is {1, 2}, with SEQUENCE(1, COLUMNS(range)).

Resources

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Interactive Random Task Assigner in Google Sheets

You have multiple tasks and multiple people. Here's a fun way to randomly assign...

Google Sheets Bar and Column Chart with Target Coloring

To color each data point in the column or bar chart according to the...

Analyzing Column A Between Non-Adjacent Values in Column B: Google Sheets

This tutorial addresses a common scenario encountered in Google Sheets: how to sum, average,...

Excel Word Count: Beyond SUBSTITUTE

You might want to count the number of words in a cell in Excel...

More like this

Interactive Random Task Assigner in Google Sheets

You have multiple tasks and multiple people. Here's a fun way to randomly assign...

Google Sheets Bar and Column Chart with Target Coloring

To color each data point in the column or bar chart according to the...

Analyzing Column A Between Non-Adjacent Values in Column B: Google Sheets

This tutorial addresses a common scenario encountered in Google Sheets: how to sum, average,...

4 COMMENTS

  1. I have a table with the following rows: HF-1, HF-2, HF-3, and so on.

    On another Sheet, I would like to use only the next new (not used before) HF.

    How do to so, please?

    • Hi, Daniil,

      You can use LET and XLOOKUP as per the following example.

      =let(last_value,split(xlookup("HF*",Sheet1!B2:B100,Sheet1!B2:B100,"No Match",2,-1),"-"),index(last_value,0,1)&"-"&index(last_value,0,2)+1)

      B2:B100 is the range that contains HF-1, HF-2, HF-3, and so on.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.