How to Extract Multiple Words Using REGEXEXTRACT in Google Sheets

Published on

We can use the REGEXEXTRACT function in Google Sheets to extract multiple words from a cell and put them into a row or column.

I don’t think it’s necessary to explain the purpose of this text extraction, as it varies depending on your job. However, here’s an example of a scenario where it might be useful.

I want to search a cell containing a list of country names and extract some of the countries if they exist in the list.

These are the names of some of the top countries where the blog Info Inspired has a good presence.

With the help of the REGEXEXTRACT function, I am extracting all of the matching words in a sentence, or comma-separated list, in a cell in Google Sheets.

We can use nested REGEXEXTRACT formulas (as shown in Figure 1 below) or a LAMBDA-based REGEXEXTRACT function (as shown in Figure 3 at the bottom) for this.

extract multiple matching words in a cell
Figure 1

Nested formulas are not ideal in all cases because they can make the formula lengthier and more difficult to understand. So, you would definitely want to try the LAMBDA-based formula instead.

Steps to Extract Multiple Words Using REGEXEXTRACT in Google Sheets

Syntax of the REGEXEXTRACT Function:

REGEXEXTRACT(text, regular_expression)

You can extract a single word from a cell by using the REGEXEXTRACT formula. For example, suppose cell A1 has the following country list:

India, United States, United Kingdom, Canada, Philippines, Australia, France, Brazil, Netherlands, Malaysia, Germany, Indonesia, Pakistan, South Africa, Spain, Singapore, Italy, Mexico, Sweden, Peru, Israel, Hong Kong, Vietnam, Bangladesh, Nigeria

If you want to extract the country name “Hong Kong,” you can use the following formula:

=REGEXEXTRACT(A1,"(?i)Hong Kong")

But don’t just stop there! You can make this REGEX formula behave like a logical test.

That means you can test cell A1 for the word “Hong Kong” and return the word if it is found, or else return another value.

This type of test is necessary. Otherwise, the REGEXEXTRACT function would sometimes return a #NA error.

Here is that formula, where we are making use of the IFERROR function:

=IFERROR(REGEXEXTRACT(A1,"(?i)Hong Kong"),"Not Available")

So, always remember to use the IFERROR function or the IFNA function together with REGEX in such scenarios.

Now, let us go back to our topic, which is how to extract multiple words using REGEXEXTRACT in Google Sheets.

Here, we can follow two approaches. If the number of words to extract is limited to 2-3 words, we can use a nested REGEXEXTRACT formula or a LAMBDA-based formula.

1. Nested REGEXEXTRACT Formula

The following formula extracts multiple words using REGEXEXTRACT in nested form, and the functions are separated by commas and placed within Curly Braces.

=IFERROR(
     {
     REGEXEXTRACT(A1,"(?i)Singapore"),
     REGEXEXTRACT(A1,"(?i)Italy"),
     REGEXEXTRACT(A1,"(?i)Brazil")
     }
)

The result would be as follows.

result showing multiple regex extract as an array
Figure 2

If the formula finds any mismatch, the result will have one or more blank cells.

For example, if cell A1 does not contain the word “Italy”, then cell D1 will be blank.

In our main formula (please refer to Figure 1 above), I have simply used the TRANSPOSE function to return the output into a column.

=TRANSPOSE(
     IFERROR(
          {
          REGEXEXTRACT(A1,"(?i)Singapore"),
          REGEXEXTRACT(A1,"(?i)Italy"),
          REGEXEXTRACT(A1,"(?i)Brazil")
          }
     )
)

You can use the QUERY function to remove blank cells and limit the number of values to a specific number (n).

=QUERY(
     TRANSPOSE(
          IFERROR(
               {
               REGEXEXTRACT(A1,"(?i)Singapore"),
               REGEXEXTRACT(A1,"(?i)Italy"),
               REGEXEXTRACT(A1,"(?i)Brazil")
               }
          )
     ),
"Select * where Col1<>'' Limit 3"
)

Here, the LIMIT clause in the last part controls the number of items to be returned. It’s 3 in the above formula.

2. Extract Multiple Words from a Cell in Google Sheets Using Lambda-Based REGEXEXTRACT

Syntax of the MAP Function:

MAP(array1, [array2, …], lambda)

If you want to extract a large number of matching words from a cell, you cannot use the nested REGEXEXTRACT formula in Google Sheets. The formula will become too large and difficult to manage.

Instead, you can use a MAP (LAMBDA helper function) and REGEXEXTRACT combo. This will allow you to extract multiple matching words without having to create a large and complex formula.

=LET(
     range,A1,
     value,{"Singapore","Italy","Brazil"},
     TOCOL(
          MAP(
               TOCOL(value,1),
               LAMBDA(r,REGEXEXTRACT(TEXTJOIN(" ",TRUE,range),"(?i)"&r))
          ),3
     )
)

Where:

  • Cell A1 contains the list of words to extract.
  • The array {"Singapore", "Italy", "Brazil"} contains the countries (values) to match in the list.

To match more values, you can insert them within the array (Curly Brackets) separated by commas.

This formula is flexible enough to:

  • Use a cell range like A1:A instead of a single cell like A1.
  • Match more values by placing them as a cell range like C1:C instead of a literal array, i.e., {"Singapore", "Italy", "Brazil"}.

Here is an example:

=LET(
     range,A1:A,
     value,C1:C,
     TOCOL(
          MAP(
               TOCOL(value,1),
               LAMBDA(r,REGEXEXTRACT(TEXTJOIN(" ",TRUE,range),"(?i)"&r)
          )
     ),3
     )
)
Extract Multiple Words in Google Sheets Using Lambda-Based REGEXEXTRACT
Figure 3

That’s all I have to say about how to extract multiple words using REGEXEXTRACT in Google Sheets.

Prashanth KV
Prashanth KV
Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

Highlight Upcoming Birthdays in Google Sheets

When highlighting upcoming birthdays in Google Sheets, one important aspect is ignoring the year...

Compare Two Tables for Differences in Excel

To compare two tables with similar data for differences, you can use the XLOOKUP...

Calculate Weighted Average in Pivot Table in Google Sheets

You can calculate a weighted average in a Pivot Table using a custom formula...

Summarize Data and Keep the Last Record in Google Sheets

In Google Sheets, we can summarize data (like sum, average, min, max, count) using...

More like this

Highlight Upcoming Birthdays in Google Sheets

When highlighting upcoming birthdays in Google Sheets, one important aspect is ignoring the year...

Calculate Weighted Average in Pivot Table in Google Sheets

You can calculate a weighted average in a Pivot Table using a custom formula...

Summarize Data and Keep the Last Record in Google Sheets

In Google Sheets, we can summarize data (like sum, average, min, max, count) using...

17 COMMENTS

  1. Hi Prashanth,

    What if we needed to extract multiple or all occurrences of given patterns?

    For example, consider these three patterns:

    REGEXEXTRACT(A1:A,"(?i)(\(.*?\))"),
    REGEXEXTRACT(A1:A,"(?i)(\[.*?\])"),
    REGEXEXTRACT(A1:A,"(?i)(Cf\.\s.*?\.(\s|$))")

    I tried the following formula, but it only extracts the first occurrence of each pattern:

    =ArrayFormula(IF(REGEXMATCH(A1:A, "See Below"), "", IFERROR({
    REGEXEXTRACT(A1:A, "(?i)(\(.*?\))"),
    REGEXEXTRACT(A1:A, "(?i)(\[.*?\])"),
    REGEXEXTRACT(A1:A, "(?i)(Cf\.s.*?\.(\s|$))")
    })))

    • Hi Frank,

      It seems that extracting multiple occurrences of GIVEN PATTERNS with regex is not possible.

      You can try the following formula for the value in cell A2.

      =LET(test,TOCOL(SPLIT(
      SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,
      "(","|("),")",")|"),"[","|["),"]","]|"),"Cf","|Cf"),"*. ","*.|"),"|")),
      JOIN(" ",FILTER(test,(LEFT(test,1)="(")+(LEFT(test,2)="Cf")+(LEFT(test,1)="["))))

      If you want to spill this formula down, you may need to use the MAP function as follows:

      =IFERROR(MAP(A2:A,LAMBDA(row,LET(test,TOCOL(SPLIT(
      SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(row,
      "(","|("),")",")|"),"[","|["),"]","]|"),"Cf","|Cf"),"*. ","*.|"),"|")),
      JOIN(" ",FILTER(test,(LEFT(test,1)="(")+(LEFT(test,2)="Cf")+(LEFT(test,1)="[")))))))

  2. Is there a fix for the error “Calculation limit was reached while trying to compute this formula”?

    Using:

    =LET(range,A27:A8269,value,B27:B212,
    MAP(range,LAMBDA(range,TOROW(
    MAP(TOCOL(value,1),LAMBDA(r,REGEXEXTRACT(range,"(?i)"&r))),3))))

  3. Hi Prashanth,

    It works great! Thank you!

    And it’s all dynamic, as a bonus.

    I’ll try to figure out how it works as soon as possible, with all those new functions.

    Thanks again!

    • Hi Frank,

      Thanks for sharing your sample sheet. I couldn’t enter my formula because I only have view access.

      I understand that you want to extract multiple values from A1:A matching values in B1:B. But place the extracted values in the corresponding row not as a whole list.

      This formula in C1 will match B1:B in A1 and extract the matching values in C1:1, then match B1:B in A2 and extract the matching values in C2:2 and so on.

      =LET(range,A1:A3,value,B1:B,
      MAP(range,LAMBDA(range,TOROW(
      MAP(TOCOL(value,1),LAMBDA(r,REGEXEXTRACT(range,"(?i)"&r))),3))))

      I hope this is helpful!

  4. What if I have more than 100 words over 1000+ cells with multiple occurrences per cell (most between 2 and 6) to extract? Don’t you have a simpler way than hardcoding the words reference strings? How about a solution using cell references? And possibly using a range?

    • Hi Frank,

      Thanks for your response.

      At the time of writing this tutorial, I had no option to do that. We are now armed with the LAMBDA functions. So we can use MAP or BYROW for that. I’ve updated this post to include an example.

  5. Hi,

    I have one issue with REGEXEXTRACT. I want to match and extract the exact word.

    If I use the below formula, it will display the result “Sing.” But I want it to return “NA” because your A1 cell “Sing” word is not there.

    =regexextract(A1,"(?i)Sing")

    So, how can we extract the exact match instead of the partial match?

    Thanks & Regards,

    Vineet

  6. For the final query function, is the syntax off? Because when I try and use it for myself, it leaves me with a parse error.

    I’m not sure about the end with a quotation without a starting quotation.

  7. I do have the start time 10 PM and end time 2 AM. I need to extract only numbers and the required time duration.

    Thanks in Advance.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.