In this tutorial, you will get the array and non-array formulas to remove duplicates from comma-delimited strings in Google Sheets.
It must not necessarily be a comma delimiter. You can use the formula if the delimiter is a pipe, new line, etc.
I’ll explain what corrections you should make in the formula.
You may already have the formula to remove duplicates from comma-delimited strings in Google Sheets.
But do you have an array formula to expand for each row?
The purpose of the array formula here is to remove duplicates from comma (pipe or any other) delimited strings from each row by residing in the topmost row in the range.
I’ve two types of array formulas: An old-school one and a new one based on the Lambda functions.
Removing Duplicates from Comma-Delimited Strings: Non-Array Formula
Non-Formula:
=ArrayFormula(textjoin(", ",true,unique(trim(split(A2,",")),true)))
Insert the above formula in cell B2. Then copy and paste it down the column as far as you want.
How does it work?
The SPLIT divides the text around the comma and puts each fragment into a separate cell in the row.
The TRIM removes white spaces.
Then UNIQUE removes duplicates. The role of the TEXTJOIN is to join the fragments back into a comma-delimited string.
The ArrayFormula is to support the non-array TRIM function.
The above is a non-array formula to remove duplicates from comma-delimited strings.
If the delimiter is a pipe, replace split(A2,",")
with split(A2,"|")
.
Also in the TEXJOINT, replace textjoin(", "
with textjoin("| "
.
Let’s move to the array formula to remove duplicates from the comma-delimited text.
Removing Duplicates from Comma-Delimited Strings: Array Formula
I suggest you use the Lambda formula which is clean and works better.
Old School Approach (Coded before Lambda): ✖
The array formula is a bit lengthier. Here it is.
Array Formula (Old School):
=ArrayFormula(transpose(split(textjoin(", ",1,IFERROR((if(row(indirect("A2:A"&counta(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))))))-match(REGEXEXTRACT(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))),"\w+"),REGEXEXTRACT(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&substitute(A2:A,", ",","&row(A2:A)&" "),","))<>""))),"\w+"),0)=1,"|",)®EXREPLACE(to_text(unique(filter(flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),",")),flatten(split(if(len(A2:A),row(A2:A),)&" "&trim(substitute(A2:A,",",","&row(A2:A)&" ")),","))<>""))),"^(\w*\W*){1}","")))),"|")))
For the sample sheet with the formula entered (step-by-step), please scroll down to the end of the post.
How does this formula remove duplicate texts from the comma-delimited strings?
We can explain the above formula in 9 steps under three categories.
- Split Strings and Row Wise Unique (3 steps).
- Elements to Combine Strings Back to Its Rows (4 steps).
- Final Steps to Remove Duplicates from Comma Delimited Strings (2 steps).
1. Split Strings and Row Wise Unique
The below formulas go in cells B2, C2, and D2. Please see the image below to understand what they return.
1. Inserting Row Numbers and a White Space Immediately After the Delimiters.
=ArrayFormula(trim(substitute(A2:A,",",","&row(A2:A)&" ")))
2. Inserting Row Numbers at the Beginning of Values in Each Row.
=ArrayFormula(if(len(A2:A),row(A2:A),)&" "&B2:B)
The purpose of the above two formulas is to add row numbers to each string.
3. Split Strings Based on Delimiter, Unique, and Filter.
=ArrayFormula(to_text(unique(filter(flatten(split(C2:C,",")),flatten(split(C2:C,","))<>""))))
When you unique the strings this way, the formula will only remove duplicates from the same row.
We have solved one main issue we may encounter when trying to remove duplicates from comma-delimited strings.
2. Elements to Combine Strings Back to Their Rows
The below formulas go in cells E2, F2, G2, and H2.
The step # 3 formula in the previous step returns unique row number added values.
Below will separate the row numbers (4th step) and the rest of the values (5th step). Please refer to the image below.
4. Extracting Row Numbers.
=ArrayFormula(REGEXEXTRACT(D2:D16,"\w+"))
5. Extracting Strings.
=ArrayFormula(REGEXREPLACE(D2:D16,"^(\w*\W*){1}",""))
Related: Regex to Get All Words after Nth Word in a Sentence in Google Sheets.
The purpose of the following two (steps # 6 and 7) formulas is to return a separator to mark the end of each row based on column D.
We will achieve that by returning the running count of step # 4 output and replacing sequence number 1 with a pipe and the rest of the values with blanks.
6. Running Count Based on Extracted Row Numbers.
=ArrayFormula(row(A2:A16)-match(E2:E16,E2:E16,0))
7. Delimiter to Separate Rows.
=ArrayFormula(if(G2:G16=1,"|",))
Note:- Here, I have used the |
character. If your strings are separated by the same, not a comma, then use here a ~
character instead.
3. Final Formula that Removes Duplicates from Comma-Delimited Strings
Please see columns F and H. We want those columns in the final step. First, we will combine the pipe (or tilde) separators in column H with the strings in column F.
Then join them using a comma separator, split at the pipe (or tilde) separator, and transpose.
The following formulas go in cells I2 and J2.
8. Add Separators with Extracted Strings.
=ArrayFormula(H2:H16&F2:F16)
9. Combine Strings Using a Comma Delimiter and Split Based on Separator.
=transpose(split(textjoin(", ",1,I2:I16),"|"))
This way we can remove duplicates from comma-delimited strings in Google Sheets.
Note: There shouldn’t be any blank rows in the range. Also, this resource-hungry formula may return errors in a larger data range.
New Approach (Lambda): ✔
Array Formula (Lambda – New):
=map(A2:A,lambda(r,iferror(ArrayFormula(textjoin(", ",true,unique(trim(split(r,",")),true))))))
Just insert this code in cell B2. Unlike the non-array formula, you won’t require to copy-paste it down.
It will spill down automatically if there are blank cells below B2.
It is the same non-array formula we used to remove duplicates from the comma-delimited strings.
The only difference is the MAP Lambda helper function.
The LHF helps the non-array formula to spill down.