How to Use REGEXEXTRACT Function in Google Sheets

Published on

The purpose of the REGEXEXTRACT function in Google Sheets is to extract the first matching substring from a string based on a given pattern (regular expression).

There are three built-in REGEX functions in Google Sheets:

  1. REGEXEXTRACT: Extracts a substring
  2. REGEXMATCH: Matches a substring
  3. REGEXREPLACE: Replaces a substring

These functions share a similar use, as they all rely on RE2 regular expressions. Therefore, understanding one of these functions is usually sufficient to comprehend the others.

Now, let’s delve into how to use the REGEXEXTRACT function in Google Sheets. We’ll start with the syntax and then explore the most frequently used RE2 expressions.

Syntax and Arguments

Syntax:

REGEXEXTRACT(text, regular_expression)

Arguments:

  • text: The text or cell reference containing the text from which to extract a substring.
  • regular_expression: The regular expression pattern used to match and extract the substring from the text.

It’s important to note that the REGEXEXTRACT function operates exclusively on text values. If the extracted value from the text is numeric, you can use the VALUE function to convert it to a numeric format.

Examples of REGEXEXTRACT Function in Google Sheets

1. Extracting Specific Text

  • Cell value in A2 is “Info Inspired 2017, India”
  • Formula: =REGEXEXTRACT(A2, "Info Inspired")
  • Result: “Info Inspired”

The formula extracts the text “Info Inspired” exactly as it appears in the cell, considering case sensitivity. To make it case-insensitive, you can use the following pattern: "(?i)Info Inspired"

If the formula fails to match the pattern, it will return an #N/A error. To handle this, you can wrap the REGEXEXTRACT function with the IFNA function to replace that error.

The case sensitivity and IFNA apply to the formulas below as well.

2. Extracting Either Text

  • Cell value in A3 is “Info Inspired 2017, India”
  • Formula: =REGEXEXTRACT(A3, "US|India")
  • Result: India

The pattern uses the pipe symbol (|) to separate each text option. To add one more country, follow this pattern: "US|India|UK"

For case-insensitive extraction of either substring, you can use the pattern "(?i)US|India".

3. Extracting Fixed-Length Substrings

  • Cell value in A4 is “Info Inspired 2017”
  • Formula: =REGEXEXTRACT(A4, "....")
  • Result: Info
  • Formula: =REGEXEXTRACT(A4, "In..")
  • Result: Info

In the above pattern used in the REGEXEXTRACT function, the . (period) matches any single character.

The first formula extracts the first four characters from the text. When you want to extract a certain number of characters from a specific position or from the end, you can adjust the regular expression pattern accordingly. For instance:

Formula: =REGEXEXTRACT(A4, "....\z")

Result: 2017

This formula extracts the last four characters from the text in cell A4.

    4. Extracting Numeric Sequence

    • Cell value in A6 is “Info Inspired 2017 Blog”
    • Formula: =REGEXEXTRACT(A6, "\d")
    • Result: 2
    • Formula: =REGEXEXTRACT(A6, "\d+")
    • Result: 2017

    In this pattern, “\d” matches a single digit (0-9), while “\d+” matches one or more consecutive digits (0-9).

    5. Extracting Alphanumeric Sequence

    • Cell value in A8 is “Info Inspired 2017 Blog”
    • Formula: =REGEXEXTRACT(A8, "(\w)")
    • Result: I
    • Formula: =REGEXEXTRACT(A8, "(\w+)")
    • Result: Info

    You can use the following REGEXEXTRACT formulas to extract the last sequence of consecutive word characters from the content of cell A8.

    =REGEXEXTRACT(A8, "(\w+)$") // will return "Blog"
    =REGEXEXTRACT(A8, "(\w+)\z") // will return "Blog"
    Examples of REGEXEXTRACT Function in Google Sheets - Set #1

    6. Extracting Word and Number Combination

    • Cell value in A8 is “Info Inspired 2017 Blog”
    • Formula: =REGEXEXTRACT(A8, "(\w*)\s(\d+)")
    • Result: Inspired (in one cell), 2017 (in another, horizontally)

    7. Extracting Whole Word Based on Partial Match

    • Cell value in A9 is “Google Sheets Tutorials:”
    • Formula: =REGEXEXTRACT(A9,"Tut\w+")
    • Result: Tutorials
      • Extracting Text Matching “Tut” Followed by Alphanumeric Characters
    • Formula: =REGEXEXTRACT(A9,"Tut.*")
    • Result: Tutorials:
      • Extracting Text Matching “Tut” followed by Any Characters

    8. Extracting Lowercase Alphabetic Sequence

    • Cell value in A11 is “INFO info”
    • Formula: =REGEXEXTRACT(A11, "[a-z]")
    • Result: i
    • Formula: =REGEXEXTRACT(A11, "[a-z]+")
    • Result: info

    9. Extracting Uppercase Alphabetic Sequences

    • Cell value in A11 is “INFO info”
    • Formula: =REGEXEXTRACT(A11, "[A-Z]")
    • Result: I
    • Formula: =REGEXEXTRACT(A11, "[A-Z]+")
    • Result: INFO

    10. Extracting Content between Certain Characters

    • Cell value in A15 is “Name: [ Prashanth KV]”.
    • Formula: =REGEXEXTRACT(A15, "\[(.+)\]")
    • Result: Prashanth KV
    • Cell value in A16 is “email me <info@temp.com>”
    • Formula: =REGEXEXTRACT(A24, "<(.+)>")
    • Result: info@temp.com
    Examples of REGEXEXTRACT Function in Google Sheets - Set #2

    Resources

    Above, we’ve explored numerous examples of extracting substrings from strings using the REGEXEXTRACT function in Google Sheets. Below, you’ll find specific tutorials that demonstrate slicing strings.

    1. Extract Username from Email Address Using Regex in Google Sheets
    2. How to Extract Multiple Words Using Regexextract in Google Sheets
    3. Extract All Numbers from the Text and SUM It in Google Sheets
    4. Extract Different Texts From a Cell in Google Sheets
    5. Extract Vowels and Consonants Separately in Google Sheets
    6. Extract Unique Values from a Comma Separated List in Google Sheets
    7. Extract Numbers Within Square Brackets in Each Row in Google Sheets
    8. Extract, Replace, Match Nth Occurrence of a String or Number in Google Sheets
    9. Extract Every Nth Line from Multi-Line Cells in Google Sheets
    10. Extract Numbers Prefixed by Currency Signs from a String in Google Sheets
    11. How to Match | Extract Nth Word in a Line in Google Sheets
    12. Extract First, Last, and Middle Names in Google Sheets (Formula Options)
    13. Extract Last N Values from a Delimiter Separated String in Google Sheets
    14. How to Extract Decimal Part of a Number in Google Sheets
    15. Extract the First Letter of Each Word in Google Sheets – Array Formulas
    16. How to Extract Negative Numbers from Text Strings in Google Sheets
    Prashanth KV
    Prashanth KV
    Your Trusted Google Sheets and Excel Guide Prashanth KV brings a wealth of experience in Google Sheets and Excel, cultivated through years of work with multinational corporations in Mumbai and Dubai. As a recognized Google Product Expert in Docs Editors, Prashanth shares his expertise through insightful blogging since 2012. Explore his blog for practical tips and guidance on maximizing your spreadsheet skills.

    Finding Most Frequent Text in Excel with Dynamic Array Formulas

    Looking to identify the most frequently occurring text in Excel? You can do this...

    Hierarchical Numbering Sequences in Excel

    Creating hierarchical numbering sequences in an Excel spreadsheet can significantly improve the way you...

    How to Easily Repeat a Sequence of Numbers in Excel

    Excel offers multiple ways to accomplish tasks, and the simplicity of each approach depends...

    Create a Sequence of Dates at Every Nth Row in Excel (Dynamic Array)

    Would you like to create a sequence of dates in every nth row in...

    More like this

    XMATCH Row by Row: Finding Values Across a Range in Google Sheets

    Using the BYROW function with XMATCH in Google Sheets allows us to match values...

    Limit Formula Expansion to a Specific Row in Google Sheets

    In this tutorial, I’ll explain how to limit the expansion of an array formula...

    3-D Referencing Structured Data Tables in Google Sheets

    When you have several tables within a single sheet—not across multiple sheets in a...

    68 COMMENTS

    1. I use your site often to get answers for various Google Sheets functionality and find it very helpful. Have a question about the Regexextract function for the following situation

      I want to extract my and other employees’ names, which are of different lengths, from the following format.

      “NEFT-P15031008433074-1-Jonathan Lance”

      I can extract the date after the first “-” (hyphen).

      How do I set up the Regexextract formula to extract the employee’s name after the third hyphen? Thanks.

      Regards,
      Jonathan

    2. Hi, I would like to extract everything after the first dash but before a word.

      Thanks for any help you can provide. I’ve given you editing privileges with the URL below.

      —URL removed by admin—

      • Hi, Spencer,

        Thanks for sharing your sheet with a clear example. Please find the array version of the following formula there.

        =regexextract(regexreplace(A2,"^(?:.*?word_here)","$0@"), "\-(.*)\@")

    3. Hello Prashanth,

      I have 3 lines(ctrl+enter) in one cell:

      “D1ACA
      Cu 4×10/10
      Max 63A”

      I want to extract line 1 (D1ACA) to one cell. 2 to another cell … and so on. Can you help me?

      Thanks in advance from Sweden

    4. Hi,

      Let’s say I would like to pull the month digit out from a date like this>> 23/04/2021.

      May I know how should I form my formula?

      • Hi, Alex Kwan,

        If the date in question is formatted as a text value, then use the below Regex.

        =REGEXEXTRACT(to_text(E3), "\/(.+)\/")

        If the date is actually a date formatted value, the below Month function may help.

        =month(E3)

        You can check the format using =ISDATE(E3)

    5. Thank you. this has partially solved the issue but also created a new issue. If I share a google sheet with you so you can see the cells I’m working with – is that possible? Also, I’m curious, what the “fishes” are in the formula. I’ve never seen that before.

      • Hi, Jesse Yuen,

        The fish is just a delimiter (character code). Instead, you can use the available symbols (characters) on your keyboard such as a ~, | etc.

        Feel free to share the link via reply.

    6. Hi – I’m trying to pull Instagram handles out of a google sheet. this formula: =REGEXEXTRACT(C5,"\@[A-Za-z0-9]+") works for handles that begin with a number or a letter, but it doesn’t work in the following scenarios:

      1. If there are multiple handles in a cell (e.g. @abc and @123), the formula only captures 1 handle.

      2. if the handle includes any punctuation like a “.” or “-” (those are the most common), the formula captures up to the “.” or “-“. I tried this: =REGEXEXTRACT(C4,"\@[A-Za-z0-9]+\.[A-Za-z0-9]+"), but this only works if a handle has “.”, otherwise it returns “#N/A”

      3. If the handle begins with a “.” or “-“, the formula creates an error.

      I’m hoping that I can use 1 formula for all these scenarios, but I have listed the scenarios in priority so I would be happy if 1 formula could solve for the base + scenario 1.

      • Hi, Jesse Yuen,

        This combo formula may help.

        =textjoin(", ",1,filter(split(REGEXreplace(C5,"(\@[\w]+)","🐠$0🐠"),"🐠"),search("@",split(REGEXreplace(C5,"(\@[\w]+)","🐠$0🐠"),"🐠"))))

    7. I really dig your tutorials as you don’t spoon-feed everything to your readers but make them figure some things out on their own via the examples. Great way to learn if you ask me 🙂

      Now I have this situation at work I need help with 😀

      I have the following info in one cell (without quotes):

      “(ABC) Joe Smith, (DEF) Jane Doe, (GHI) Michael Jordan, (ABC) Wayne Gretzky”

      where the letters in parentheses are their job codes

      The result I’m looking to output in another cell is this (without quotes):

      “Joe Smith, Wayne Gretzky”

      where I want to extract and display the names of only those with the job code (ABC) and disregard the rest.

      Hoping you can help Prasanth! Thanks.

      • Hi, Arthur,

        Assume the names to extract without designations are in cell A1. Then you can try the following formula in cell B1.

        =textjoin(",",1,filter(split(trim(REGEXREPLACE(A1, "\(([a-zA-Z]+)\)","")),","),search("(ABC)",split(A1,","))))

        Another formula as an alternative.

        =ArrayFormula(trim(textjoin(",",1,regexreplace(filter(SPLIT(A1,","),search("(ABC)",SPLIT(A1,","))),"\(.*\)",""))))

    8. Hi,

      I would like to extract a date value and price in a specific string.

      I’m creating a trading journal for options. Here is the string.

      Ex: abc201224p00025000

      abc=ticket
      201224 = date (yymmdd)
      p = option strategy (always one letter p or c)
      00025000 = price (always 8 digits) use only first 7 digits to determine price. $25.00.

      Can you help me? I have tried and get stuck with a number and not a date value or price value.

      Thank you,

      Daniel

      • Hi, Daniel,

        The date you can extract and reformat using the below combo formula.

        =date("20"&left(REGEXEXTRACT(A1,"(\d+)"),2),mid(REGEXEXTRACT(A1,"(\d+)"),3,2),right(REGEXEXTRACT(A1,"(\d+)"),2))

        Regarding the price, use the below Regexextract.

        =REGEXEXTRACT(A1,"(\d+)$")

        It would return 00025000. You can further use the Regex or MID function to return the amount from this.

    9. Hi,

      I have to loop thru a range of cells and use REGEXEXTRACT. How do I pass the cell value dynamically to REGEXEXTRACT inside the app script?

      Thanks

    10. Really helpful thank you!

      I have the following:

      2019 Nurture.02 - B2C - BRSM Commerce - Sample: Virtual Booth with Forrester.Email A

      I only need Sample: Virtual Booth with Forrester.Email A what is the formula for this?

      • Hi, Ian Burgers,

        I’ve modified part of your text due to a ‘possible’ SEO issue.

        I could understand that you want to extract the strings after the last hyphen. This may help.

        =REGEXREPLACE(A1, "(.*)\-","")

        To trim any white space, use TRIM with REGEX formula as below.

        =trim(REGEXREPLACE(A1, "(.*)\-",""))

    11. Hi Prashanth,

      Thanks for putting all this useful info together!

      If I have some numbers separated by quotation marks. How can I split them so as to return the number of each group in different cells horizontally?

      For example:

      Input : “2”,”2″,”2″,”2″
      Desired output: 4 cells horizontally with number 2

      Another example:

      Input: “4-0,13″,”1″,”2-3,6″,”3”
      Desired output: 4 cells
      first cel: 4-0,13
      second cell: 1
      third cell: 2-3,6
      last cell: 3

      THANK YOU!

    12. Hi Prashanth,

      First of all, thank you very much for this guide.

      I have the following….. “Communication Services | Telecom Services | USA”

      How would I extract everything up to first | so in the above “Communication Services”

            • hi Prashanth. Many thanks! it works well until I face a more complicated example when we have several “|”.

              For example in “Communication Services | Telecom Services | USA|Germany|Angola”

              REGEXEXTRACT(A1, "\|(.+)\|")

              This returns everything between the first “|” and the last one ( Telecom Services | USA | Germany). Could you please help with advice on how this should be changed to get back “Telecom Services” only???

    13. Hello Prashanth!

      How would I go about extracting a number that duplicated itself in a cell (and it may be worth noting that the number may be different lengths in each cell)?

      For example:
      A1 = 33
      I want to extract just ‘3’

      Or,
      A1 = 123123
      I want to extract just ‘123’

      Thank you in advance.

      • Hi, Nadia,

        This regex-based combo formula will do the job for you.

        =join("",unique(TRANSPOSE(split(REGEXREPLACE(A1&"",".{1}", "$0,"),","))))*1

        But there might be an easier way.

    14. Hi Prashanth,

      What should I do if I want to Extract initials from names?

      Eg: “Rita Ruiz”
      result: “RR”

      Eg: “Don Miguel Rukeyser”
      result: “DMR”

      Eg: “Donald Runnels”
      result: “DR”

      • Hi, Nguyễn Thanh Tuyền,

        You may try this, please.

        =REGEXREPLACE(A2,"[a-z\s]","")

        If the names are in A2:A, then use the below REGEX array formula in B.

        =ArrayFormula(if(len(A2:A),REGEXREPLACE(A2:A,"[a-z\s]",""),))

    15. Thank you, Prashanth. This was really helpful, but I’m still having some trouble.

      I am trying to create a phone log with a tab for each letter of the alphabet. I should note that information comes from a Google Form. Can you provide any guidance? Thank you.

    16. Hi Prashanth,

      For this formula;

      =query(transpose(split(A1,char(10))),"Select * where Col1 contains 'Item'")

      If the item details are in more than one line?

      For example:
      Item: *2 x big
      ball*

      Is there any solution for this problem?

    17. Thanks, its also work 🙂

      =REGEXEXTRACT(clean(SUBSTITUTE(A1,char(10),"~")),"(\d+)~~Congratulations")

      This will returns the number before a specific string. So what should I change if I want to return number after a specific string?

      • It depends on the pattern of value. Here is one example.

        String in cell A1: 10 students scored 500 points

        The formula in cell B1 to return the number present just after the substring “scored”

        =REGEXEXTRACT(A1,"scored (\d+)")

        Result: 500

      • Hi, Sterben,

        Assume the above text with numbers is in cell A1. Then try this Regexextract.

        =REGEXEXTRACT(clean(A1),"(\d+)Congratulations")

        Included the CLEAN function assuming the text entered in multiple lines in cell A1. If not, you can remove the use of it.

          • The new line character makes the issue. Here are two more formulas.

            Option 1: Returns the second number.

            =regexextract(A1,"(?:\D*(\d+)){2}")

            Option 2: Returns the number before a specific string.

            =REGEXEXTRACT(clean(SUBSTITUTE(A1,char(10),"~")),"(\d+)~~Congratulations")

            • Thank you, Prashanth. It work like a charm!

              I have another question.

              Example:
              Service: XXXXX
              Item: *4 × ruler*
              Total price: XXXX

              XXXXX
              XXXXX

              Date: XXXX
              Time: XXXX
              Type: XXXX
              Service: XXXXXX
              Item: *2 ball*
              Total price: XXXXX

              I want to extract both “Item” ‘ s detail

              Result:
              Item: *2 ball*
              Item: *4 × ruler*

              Is it possible?

    18. I have pulled this formula from the web;

      =(regexextract(index(IMPORTHTML("http://finviz.com/quote.ashx?t="&"ggn","table", 11),7,2),"\d{1,}\.\d{1,}") + 0)/12

      As long as I use the stock symbol “ggn” it works fine but when I try to reference “ggn” with the cell reference in the spread sheet it gives me an error. any ideas?

      • Hi, Steve,

        It works perfectly for me.

        =(regexextract(index(IMPORTHTML("http://finviz.com/quote.ashx?t="&A1,"table", 11),7,2),"\d{1,}\.\d{1,}") + 0)/12

        Cell A1 contains the string “ggn” without quotes.

        Best,

    19. Prashanth,

      Thanks for the examples.

      I am trying to extract an options OPRA code into 4 different cells. I cannot use split text because there are no delimiters, and the char or number lengths are never consistent.

      Examples:

      The code has 4 parts: stock symbol, date, C or P, price

      .SCHL191018P22.5 – would split to: SCHL – 191018 – P – 22.5
      .MU190920P46 – would split to MU – 190920 – P – 46
      .NIO19092CP3 – would split to – NIO – 190927 – C – 3

      With the code in A1, the period is not needed, so I can use:=RIGHT(A1, LEN(A5)-1) = SCHL191018P22.5

      With that result in B2, I used =LEFT(B1,SEARCH("19",B1)-1) to get the code of whatever is left of the year 19.

      I don’t think this is a very clean formula since the year will change and I would have to redo the formula every year. Do you have a better suggestion?

      Then I used =REGEXEXTRACT(B1, "[0-9]+") to get the date = 191018. The date is always 6 numbers long. Sheets doesn’t read this as a date, and I have not figured out how to switch the year,month,date around.

      If you have any suggestions on the remaining parts, I would greatly appreciate a point in the right direction.

      !!!EDIT!!!

      As soon I post this, I go back and figure it out…. here is what I came up with:

      =REGEXEXTRACT(A3, "([A-Za-z]+)([0-9]+)([A-Za-z]+)((\d*\.)?\d+)")

      It generates an extra column, but I can just hide it.

        • Yes, it does. Much cleaner, thank you.

          I’ve now moved on to complex option codes, they add a dash or plus, so I can use the SPLIT function.

          Example:

          .AAPL190927C207.5-.AAPL190927C210+.AAPL190927P205-.AAPL190927P202.5

          Formula

          =split(A13, "-|+")

          I wanted to use truncate to expand down rows, but I got errors for the rows below that already had formula data in. So, split will expand out columns instead.

          I still never solved the date problem.

          Using DATE function converts to 1900 rather than 2000, throwing off other calculations:

          W13 cell’s data is “190927”

          =DATE((left(W13,2)),(mid(W13,3,2)),(right(W13,2)))

          This shows 09/27/1919 instead of 09/27/2019. Any ideas?

    20. How would I extract characters between a / and a |? I’m having some trouble?

      Example: electrical/wire-management/cable-ties | Cable Tie Design:Standard

      Also, extract Characters After or Before –like After |

      Any help is appreciated!

      • Hi, Calvin,

        Formula to extract the string between a forward slash and a pipe.

        =REGEXEXTRACT(C2, "\/(.+)\|")

        Since there are multiple forward slashes, replacing the first forward slash with any other character (blank) will give you a more accurate result.

        =REGEXEXTRACT(SUBSTITUTE(C2,"/","",1), "\/(.+)\|")

        To extract the strings before the pipe, use this Regexextract formula.

        =REGEXEXTRACT(C2, "(.+)\|")

        After the pipe, you can try this.

        =REGEXEXTRACT(C2, "\|(.+)")

        Best,

    21. Hi Prashanth,

      I’m trying to extract one or multiple hashtags from a long text cell. I’m able to extract one but I’m not sure how to extract the 2nd, 3rd, 4th, etc. info a row.

      For example, here is my formula:

      =REGEXEXTRACT(D2,"#\w+")

      I also tried this formula:

      =iferror({REGEXEXTRACT(D2,"#\w+"),REGEXEXTRACT(D2,"#\w+")})

      Thanks!

      • Hi, Dontae Mears,

        To extract multiple hashtags, you can possibly use this REGEXREPLACE based formula in Google Sheets.

        =SPLIT(REGEXREPLACE(D2, "(^|\s)[^#]\S*", "")," ")

        Best

    22. Hi I’m trying to regEx a date like this: 2/20/2019, and I’m having trouble with the year: using this did not work REGEXEXTRACT(F2,"\/([0-9]+)"),"") because it extracts the 20

      Any ideas of what I should do?

    23. I have a Google Sheets file that pulls all of my Amazon orders from my Gmail for further processing. I want a second cell that pulls just the total order cost.

      A snip of the email body text in A1:

      Total Before Tax:
      Estimated Tax:
      Order Total:
      $37.61
      $1.92
      $39.53
      We hope to see you again soon.

      I cannot seem to get a REGEXEXTRACT formula to pull the $37.61 figure (I am ok with ignoring the actual total including tax).

    24. Hi, How can I extract characters between @ and . from below email id.
      “corey@g4designhouse.com” OR “accounting@g4designhouse.com”

      Output needed is g4designhouse

      Please help

      • Hi, Shagun Chaudhary,

        To extract the domain name without TLD from email address, I mean the text between “@” and “.”, you can use the below Regexreplace formula.

        =REGEXEXTRACT(A1, "@(.+)\.")

        Replace the cell address.

        Best,

    25. I have the value $1,200.00 at M274

      =REGEXEXTRACT(M274, "[0-9]+")

      only extracts “1”.

      Perhaps your tutorial could use an update? or perhaps I misread something.

    26. The page seems really helpfull. Can you please tell me how to use =REGEXEXTRACT(A22, "\(([A-Za-z]+)\)") if multiple content between special character are to be extracted.

      • Hi, Laura,
        I fear I don’t interpret your question correctly. I guess you want to try something like this.
        Text in Cell A1: “This is a (relatively new tech) blog.”
        Result: “relatively new tech”
        If this is the case, you can use either of the below REGEX formulas.
        Formula 1:
        =REGEXEXTRACT(A1, "\((.+)\)")
        Formula 2:
        =REGEXEXTRACT(A1, "\(([A-Za-z\s]+)\)")

    LEAVE A REPLY

    Please enter your comment!
    Please enter your name here

    This site uses Akismet to reduce spam. Learn how your comment data is processed.