Knowing how to extract specific text from a cell in Google Sheets can save you time by reducing manual cut-and-paste tasks. There are many real-life scenarios where this skill becomes useful.
Example Scenario
Imagine you have a list of generic medicine names in one column. You need to extract information like the product ID, the chemical name of the drug, and dosage strength into separate cells. This is possible using different Google Sheets functions.
However, there’s no single function that fits all cases. The function you use depends on how the text string is structured.
The benefits of extracting text from cells are numerous. You can apply filters to the extracted data, apply conditional formatting, and even run analysis on specific sections of the text.
How to Extract Different Texts From a Cell in Google Sheets
Generic Name of Medicine | Drug | Dosage 1 | Dosage 2 | Inj. / Tabs/ Caps |
Pantoprazole 40 mg Tabs | Pantoprazole | 40 | Tabs | |
Domperidone 30 mg + Pantoprazole 40 mg Caps | Domperidone | 30 | 40 | Caps |
Pantoprazole 40 mg/ 10ml Inj. | Pantoprazole | 40 | Inj. |
In the table above, the generic names of a few medicines are listed in Column A. Columns B to E show the extracted information from Column A.
You can split the drug name, dosage, and form (tablet, injection, etc.) into different cells. While you could simply use the SPLIT function with spaces as delimiters, this method won’t give you precise control over what you want to extract.
The goal here is to learn how to extract specific text from a cell into different columns using various Google Sheets formulas.
Why SPLIT Isn’t Ideal Here
The SPLIT function might seem like an easy solution, but it splits all text by spaces, which can lead to unwanted results.
Formula:
=SPLIT(A2, " ")
Result:
- “Pantoprazole” in one cell
- “40” in another
- “mg” in a third
- “Tabs” in a fourth
While it splits the text, the SPLIT function doesn’t help you target specific pieces of information. What we need is a more targeted approach for extracting specific texts, such as the drug name, dosage, and form.
Extracting Specific Texts with Formulas
We’ll now go over several different formulas that can extract specific texts from a cell. The example below references data from the table above.
- Extracting the Drug Name
To extract the first word from the text string in cell A2, you can use the REGEXEXTRACT formula:
Formula in B2: =REGEXEXTRACT(A2, "\w+")
This formula extracts the first word, which in our case is the drug name, “Pantoprazole.”
- Extracting the Dosage (First Number)
To extract the dosage from the text, you can use the following REGEXEXTRACT formula:
Formula in C2: =REGEXEXTRACT(A2, "(\d+)")
This formula extracts the first number it finds, which is 40 in “Pantoprazole 40 mg Tabs.”
However, sometimes the dosage may appear multiple times, such as “Domperidone 30 mg + Pantoprazole 40 mg.” In such cases, we need a formula that extracts all the dosages into separate cells.
Final Formula in C2 for Multiple Dosages: =ArrayFormula(REGEXEXTRACT(SPLIT(A2, "+"), "(\d+) mg"))
This formula first splits the text using the “+” symbol as a delimiter and then applies REGEXEXTRACT to pull out any numbers followed by " mg"
. It returns the first dosage in C2 and the second dosage in D2.
- Extracting the Medicine Form (Tabs, Caps, Inj.)
To extract whether the medicine is a tablet, capsule, or injection, you can use the REGEXEXTRACT function:
Formula in E2: =REGEXEXTRACT(A2,"[\w.]+\z")
This formula extracts the last alphanumeric sequence (including dots) from the end of the text in cell A2.
Now, you can select the range B2:E2 and drag the fill handle from E2 down to E4 to apply the formula to the other rows.
Alternative Methods for Different Text Structures
Your text string might follow a different structure, and the formulas above may not always be the best fit. In such cases, you can combine other functions like FIND, LEN, and MID to extract specific parts of the text from cells.
For example:
- Use FIND to locate the position of specific text within a string.
- Use LEN to determine the total length of a string or a specific substring.
- Use MID to extract a substring starting from a specific position.
If you’re interested in learning more about regular expression patterns, feel free to check out this resource: RE2 Syntax on GitHub.
Conclusion
By using functions like REGEXEXTRACT, SPLIT, RIGHT, and more, you can extract specific text from cells in Google Sheets to streamline your workflow and gain more control over your data. Depending on the structure of your text, you may need to use a combination of functions to get the exact result you need.
Dear Prasanth,
I would like to extract all words with capital letters from what is in cell A1. Let us say that the text in cell A1 is the below:
MR SMITH: Hello my name is MS. JONES and the great TOM LOVE"
Can one extract
MR SMITH
,MS. JONES
, andTOM LOVE
into separate columns?Hi, Jason,
I have this formula to extract uppercase strings. But it partially does the job.
=split(regexreplace(A1,"[a-z:]+",)," ",0)
Please note, in the formula, there are two spaces within double quotation marks. Please modify like that when you copy-paste the above formula into your sheet.
Hi Prasanth.
I was able to extract the first year using your
REGEXEXTRACT(B10,"[0-9/]+")
with this text in B10: “name, President of city (1996–2006)”Result: 1996
But now I would like to extract the second year. Do you know how I can do that?
Hi, Jason,
Try this one.
=REGEXEXTRACT(B10,"\–([\d\.]+)\)")
Replace the
–
in the formula with the one that copied from your text itself.Another related question
How to extract the date from the text (when the date could have an unknown number of digits. This grabs the first digit.
=REGEXEXTRACT(B10,"\d+")
6/30/97: “Press Remarks with contry_name Foreign Minister name_of_minister, City, Country”
Hi, Jason,
This would help.
=REGEXEXTRACT(B10,"[0-9/]+")
I have a cell with data posted as follows
D
L
W
W
I want it to show D L W W
Is there a way to do this?
Hi, Chris Durell
Assume cell B6 contains the said characters. I guess the characters are separated by New Lines. Either clean them or replace them with space characters.
To clean the New Line characters, use the following CLEAN formula.
=clean(B6)
To replace New Line characters with space characters, use the SUBSTITUTE formula.
=SUBSTITUTE(B6,char(10)," ")
Related: How to Move New Lines in a Cell to Columns in Google Sheets.
This is super helpful! Another question though, what if you wanted to extract: “Pantoprazole 40 mg” from A10?
Thanks!
Hi, Amanda,
You can use any of the below Regex.
This Regex formula extracts phrase after the “+” sign.
=trim(REGEXEXTRACT(A10, "\+(.+)"))
This Regex is to extract the first three whole words after the “+” sign.
=trim(regexextract(REGEXEXTRACT(A10, "\+(.+)"),"[\w]* [\w]* [\w]* [\w]*"))
VERY helpful. I have the following text in A10:
“Press Remarks with contry_name Foreign Minister name_of_minister, City, Country”
Note: Sample text modified by admin.
But I get an error when I use the above command
=trim(regexextract(REGEXEXTRACT(A10, "\+(.+)"),"[\w]* [\w]* [\w]* [\w]*"))
Hi, Jason,
You haven’t mentioned what you want to extract!