We can use a SORTN-based dynamic formula to filter distinct records in a multi-column table in Google Sheets.
Before we delve into the formulas, let’s clarify the difference between unique and distinct records.
This is important because Google Sheets has a function named UNIQUE, and we often refer to an array applied with UNIQUE as unique values.
Name | Unique | Distinct |
Emily | ✔ | ✔ |
Sophia | ✔ | ✔ |
Eric | ✔ | ✔ |
Olivia | ✔ | ✔ |
Ben | ✔ | |
Ben |
“Unique” refers to items in a dataset that appear only once. If any item occurs more than once, it won’t be considered unique.
“Distinct” also refers to unique items within a dataset. It considers each unique occurrence of an item in the dataset, removing any duplicates.
Sample Data and Filtered Distinct Records
The following table is in the range A1:B in Google Sheets. I want to filter distinct records from this multi-column table.
Name | Status |
Emily | Professional |
Sophia | Unprofessional |
Eric | Professional |
Olivia | Professional |
Ben | Unprofessional |
Ben | Unprofessional |
The formula should return the following distinct records:
Emily | Professional |
Sophia | Unprofessional |
Eric | Professional |
Olivia | Professional |
Ben | Unprofessional |
Formula to Filter Distinct Records in Google Sheets
Formula:
=LET(
range, A2:B,
CHOOSECOLS(
SORT(
SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE),
COLUMNS(range)+1, TRUE
),
SEQUENCE(1, COLUMNS(range))
)
)
The above formula filters distinct records in the specified data range.
How do I use this formula for a different data range in Google Sheets?
I’ve made this formula adaptive to data ranges with any number of columns. You should make only two changes:
- Replace A2:B with your actual data range.
- Replace 1 (highlighted) in the formula with the key column number, i.e., the position of the column in the table that determines distinct.
Formula Explanation
Let’s break down the formula.
Step # 1:
Actually, the formula is as simple as:
=SORTN(A2:B, 9^9, 2, 1, TRUE)
The syntax of the SORTN function (Adapted to our Formula):
SORTN(range, n, display_ties_mode, sort_column, is_ascending)
Where A2:B represents the range
, 9^9 represents n
, the number of distinct records (since we are unsure about it, we used 9^9, an arbitrarily large number). 2 is the display_ties_mode
to make the data distinct, 1 is the sort_column
(key column), and TRUE represents is_ascending
, sorting the data in ascending order.
Step #2:
The issue with this formula is the sorting performed. So the output won’t be in the original data order.
To solve that, we have made two changes in the formula.
- Instead of A2:B, we used the range
HSTACK(A2:B, ROW(A2:B))
. This adds a row number column as the third column. - Wrapped the SORTN with SORT to sort the distinct records based on the row numbers (last column) in ascending order.
The formula becomes:
=SORT(SORTN(HSTACK(A2:B, ROW(A2:B)), 9^9, 2, 1, TRUE), 3, TRUE)
This solves the sorting issue but introduces a new problem, an extra row number column in the output.
Emily | Professional | 2 |
Sophia | Unprofessional | 3 |
Eric | Professional | 4 |
Olivia | Professional | 5 |
Ben | Unprofessional | 6 |
Step #3:
We can use the CHOOSECOLS function to return only the columns we want. In our above sample data, we have two columns.
=CHOOSECOLS(SORT(SORTN(HSTACK(A2:B, ROW(A2:B)), 9^9, 2, 1, TRUE), 3, TRUE), {1, 2})
We can use this formula to filter distinct records in Google Sheets. It’s not dynamic, and users may want to make several changes to make this formula adaptive to their data range.
So I have used the LET function.
Implementing LET:
It helps us name the range A2:B as ‘range’ and use that in the formula instead of referring to A2:B.
=LET(range, A2:B, CHOOSECOLS(SORT(SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE), 3, TRUE), {1, 2}))
Another enhancement made is to dynamically find the last column, which is the row number column. Instead of specifying 3, which is as per our data, I used COLUMNS(range)+1
.
=LET(range, A2:B, CHOOSECOLS(SORT(SORTN(HSTACK(range, ROW(range)), 9^9, 2, 1, TRUE), COLUMNS(range)+1, TRUE), {1, 2}))
Finally replaced the number of columns in CHOOSECOLS, which is {1, 2}
, with SEQUENCE(1, COLUMNS(range))
.
I have a table with the following rows: HF-1, HF-2, HF-3, and so on.
On another Sheet, I would like to use only the next new (not used before) HF.
How do to so, please?
Hi, Daniil,
You can use LET and XLOOKUP as per the following example.
=let(last_value,split(xlookup("HF*",Sheet1!B2:B100,Sheet1!B2:B100,"No Match",2,-1),"-"),index(last_value,0,1)&"-"&index(last_value,0,2)+1)
B2:B100 is the range that contains HF-1, HF-2, HF-3, and so on.
How do you return the C column?
Hi, Sifar,
I have mentioned that in the tutorial itself.
You need to change this part
(A2:A6&"|"&B2:B6)
with this(A2:A6&"|"&B2:B6&"|"&C2:C6)