The title Split to Column and Categorize probably gives you a very vague idea of the content, right? So let me explain what I meant to say about this new Google Sheets tutorial.
I have a category in cell A1 and the items that come under that category are entered as comma separated values in cell B1.
With an array formula, you can split that values into a column and categorize in Google Sheets.
The above categorization is also possible with multiple rows of values. First, let me explain the above concept with two images.
Problem:
Expected Output:
This type of split and categorization is possible in Google Sheets with a formula. Here what happening is as follows.
A single piece of formula splits the comma-separated values in cell B1 and arrange it into a column. Together with that the category in cell A1 got filled down.
My formula has one more advantage. You can use the formula to split comma-separated values into a column and categorize multiple rows of data.
Example Screenshot to Split to Column and Categorize Multiple Rows of Values:
Formula:
=Query(ArrayFormula(TRIM(split(transpose(split(textjoin("^",1,if(len(B1:B),REGEXREPLACE(B1:B&",",",","^"&A1:A&","),)),",")),"^"))),"Select Col2,Col1")
I know I should explain the role of each formula used in this combo. Here is the step-by-step instructions/information.
How to Split Comma Delimited Values into Column and Categorize
The Regexreplace, Textjoin, Split, and Transpose are the main functions in the above split to a column and categorize combo formula. Trim, ArrayFormula, Len, and Query are optional but have their own roles.
Reference: Google Sheets Functions guide.
Step # 1:
Regexreplace to Replace Comma Delimiter with Categories
Look at the middle of the formula from where the entire combo develops. It’s none other than the Regexreplace function.
Formula:
=REGEXREPLACE(B1&",",",","^"&A1&",")
Output:
The B1&","
added an extra comma at the end of the comma-separated text in cell B1. That value acts as the reference cell, I mean, ‘text’ in the Regexreplace.
REGEXREPLACE(text, regular_expression, replacement)
The Regexreplace function (see the regular_expression) substitutes all the existing commas (the second comma in the formula represents it) with the category.
Actually, it not merely replaced the commas with the category but replaced the commas with a caret sign + category + comma. Please refer to the image above.
This caret and extra comma will act as the delimiter in two split functions in the combo later.
In my original formula, you may see infinite rows in the Regexreplace function.
When you use infinite rows, as I have mentioned many times on this site, you must exclude the blank rows using If + Len combo. If you want you can follow this tutorial for that additional info – LEN Function in Google Sheets and Practical Use of It.
=ArrayFormula(if(len(B1:B),REGEXREPLACE(B1:B&",",",","^"&A1:A&","),))
You may not see the ArrayFormula in my master formula as I have moved that to the beginning of that formula to avoid multiple uses of it.
Step # 2:
Textjoin to Join the Values in Column B as a Single Row Value
There are two rows. When joining these two rows (or more rows) using the Textjoin function, I have again used the caret as the delimiter.
Formula:
=ArrayFormula(textjoin("^",1,if(len(B1:B),REGEXREPLACE(B1:B&",",",","^"&A1:A&","),)))
Output:
Step # 3:
Split and Transpose Values Based on Comma Delimiter
We are one step closer to achieve our goal, i.e, split to columns and categorize in Sheets.
This is the first Split and this time I have used the comma delimiter. Also transposed that output to a column.
=ArrayFormula(transpose(split(textjoin("^",1,if(len(B1:B),REGEXREPLACE(B1:B&",",",","^"&A1:A&","),)),",")))
Step # 4:
Split and Trim to Separate Categories and Associated Values
In the second split formula, I’ve used the caret as the delimiter. So that we have now is one column with categories and another with the values. The Trim function is to remove any extra space in the values.
=ArrayFormula(TRIM(split(transpose(split(textjoin("^",1,if(len(B1:B),REGEXREPLACE(B1:B&",",",","^"&A1:A&","),)),",")),"^")))
Actually, this formula is enough to split comma separated values into columns and categorize. But this formula returns the value column first and the category column second which seems absurd.
In the next, I mean final step, we can rearrange the columns.
Step # 5:
Query to Re-Order the Column Position
See the final formula on the top. The Query used there is to re-order the columns.
If you have more than two columns you may just tweak the formula as follows.
Split One Column and Categorize Multiple Columns
Here also the comma-separated values are only in one column. But there are other two columns.
You can simply include that in the formula. See the underlined part.
Additional Resources:
- Split Number to Digits in Google Sheets.
- How to Split Text to Columns or Rows in Google Sheets by Delimiter.
- Replace Multiple Comma Separated Values in Google Sheets.
- How to Count Comma Separated Words in a Cell in Google Sheets.
- Sum, Count, Cumulative Sum Comma Separated Values in Google Sheets.
- Extract Unique Values from a Comma Separated List in Google Sheets.
– screenshot URL remove my admin –
with my attempt …
Hi, Federico Finati,
Thanks for the screenshot. I could understand you want the opposite of the problem attended in this tutorial.
Sample data in A1:B
A | B
Item | Category
Fruit | Orange
Fruit | Lemon
Fruit | Banana
Vegetable | Spinach
Vegetable | Cucumber
As per this sample, you can use these formulas.
D2:
=unique(A2:A)
E2:
=byrow(unique(A2:A),lambda(r, textjoin(", ",true,filter(B2:B,A2:A=r))))
N.B.:- Byrow is a new function. You can find the related tutorial in my Function Guide.
Synthetic and very elegant … I was lost in the query command … Thank you.
Thanks for your feedback.
Fine work, thank you.
I should solve the dual problem (join two categorized columns)
Hi, Federico Finati,
Sorry, I’m in the dark. Please share an example below.
Just what I needed!
You can discard the QUERY() call if you swap around the concatenation inside the regexp replace so the category goes first:
=ArrayFormula(TRIM(split(transpose(split(textjoin("^",1,if(len(B1:B),
REGEXREPLACE(","&B1:B,",",","&A1:A&"^"),)),",")),"^")))
Instead of
Apple, Mango, Orange, Banana,
Apple^Fruits, Mango^Fruits, Orange^Fruits, Banana^Fruits,
it’s now
,Apple, Mango, Orange, Banana
,Fruits^Apple, Fruits^Mango, Fruits^Orange, Fruits^Banana
Hi, Scott,
Thanks for your valuable feedback!
Best,