Skip to main content

Last Updated September 6, 2022

As a marketer, you’re often faced with lots of data that needs cleaning up, whether that’s from Google Analytics, Search Console or any of the other hundreds of platforms and services we work with every day. This can be done manually, but there are specific formulas you can employ to speed things up, make your day easier and get more done.

Being able to extract text after a specific character in Google Sheets is a very useful skill that can save you time, by allowing you to combine specific elements for data analysis, or for building out campaigns (ad copy templates, keyword templates etc), or reporting.

Some examples could include: 

  • Extract the domain portion of a URL in order to count unique domains
  • Extract a portion of a URL between occurrences of “/” in order to build out ad copy for your Google Ads campaigns
  • Extract the brand names or product categories from the URLs in a competitors sitemap

How to extract text after a character in Google Sheets or Excel

In order to extract the text after a certain character in Google Sheets, we’ll either be employing a combination of the RIGHT, LEN and FIND formulas, or MID, LEN and FIND

In the example below, we’ll be looking to extract the domain name of the email from the email address field. Because each domain name varies in length, we won’t know the exact number of characters to take from the right-hand side, and therefore where in the string to start. We overcome this by combining RIGHT with the LEN function. This solution works when you have a consistent character across the rows of data you’re looking at. In this instance, that character is the @ symbol.


= RIGHT(C2, LEN(C2) - FIND ("@",C2))



Here’s how to do it

  1. RIGHT extracts text from the right-hand side of the cell. We need to let it know how many characters from the right to extract. As the domain portion of the email address is not a fixed length, we need to work with other formulas to allow this value to be static. We open the formula with RIGHT, and both other formulas go inside it, telling it the [number_of_characters] to extract

  1. LEN calculates the total length of the field, e.g. the number of characters, including spaces. In this example, cell C1 the total length of the field is 23

  1. FIND then brings everything together by finding the position in the string of the chosen character. As we want to extract the domain portion of the email address, we need a consistent character across each one. We can therefore use the @ character. We’ll then minus the position in the string of @ from the total length in turn calculating the [number_of_characters] to extract from the right-hand side. In cell C1, the @ is at position 10.


Bringing this all together then, the LEN formula calculates the length (23), minus the position of the @ with FIND (10), leaves us with 13 characters to place into the RIGHT formula, and the result of “moonfruit.com”.

This approach works when there is a consistent character across the rows you’re looking at. If this character varies across the rows, you’ll need to employ a more advanced formula called REGEXEXTRACT, explained below

How to extract text after a pattern using Regex in Google Sheets

However, sometimes you may be analysing manually captured data with poor data hygiene, and so with inconsistent characters in your rows, so you need a pattern which captures multiple variations. This is where REGEX EXTRACT comes in handy.

= RIGHT (C2, LEN(C2) - FIND(REGEXEXTRACT(C2,”#|%"), C2)))


If there are one or more patterns that could be used, then a solution could be to combine RIGHT, LEN and FIND with REGEXMATCH. Regex allows you to match multiple patterns, and in this example, find characters at the right-hand side of the cell whether they match your chosen patterns

=REGEXEXTRACT (text, regular_expression)


Text – the text to be tested against the regular expression


Regular expression – the regular expression to test


Within Google Sheets, the regular_expression must be wrapped in quotation marks, as above


Here’s how to do it  

  1. Firstly, test your regex pattern to ensure it is working across all of the rows you want to test

  1. Place this within RIGHT and LEN in order to calculate the number of characters from the right to extract


Regex Extract then matches the position of either the # or the % and this is used within the FIND function

Some other solutions that you may find useful when working with text in Google Sheets 

How to extract text before a character in Google Sheets or Excel

Similarly, in order to extract the text before a certain character in Google Sheets, we’ll be employing a combination of the LEFT and FIND formulas. In this example, we’ll be looking to extract the email address without the domain portion, so everything to the left of the @ symbol.

= LEFT (C2, FIND("@",C2)-1)

Here’s how to do it

  1. LEFT extracts characters from the left-hand side of the cell. However, because the length of the name portion of the email address is not consistent, we need to combine LEFT with FIND to identify the exact position to finish at each time

  1. FIND looks for the first position of the matching character within the cell and therefore tells us the number of characters to extract

  1. By combining LEFT with FIND, you tell the LEFT function how many characters from the left to extract, as the number of characters will not be static across the rows. We need to use minus 1, as the FIND formula includes the @ within the calculation, and we need to start 1 before this to only capture the name portion

How to extract text between n’th occurrence’s of a character in Google Sheets or Excel

=substitute(mid(A2,find(char(160),substitute(A2,"/",char(160),3))+1,find(char(160),substitute(A2,"/",char(160),4))-B2-1),"-","

Sometimes it won’t be a specific character that you want to extract text either before, between, or after, but it will be between two specific occurrences of the same character. Such is the case when working with URLs.

As a marketer this can be particularly useful for competitor analysis when snooping on competitors sitemaps to see which brands or categories they’re ranging.

The below example is taken from a small portion of the sitemap of ASOS.com, where we have 6 brands that we want to extract from the URL. If we took the whole sitemap, we could be left with thousands, even hundreds of thousands of URLs.

Say we wanted to understand which brands were ranged on a competitor’s site? One way we could achieve this is by extracting the brand portion of the URL from the competitor’s sitemap. In this case, we would need to extract “reebok”, which is between the third and fourth occurrence of the forward slash. We need to use the forward slash because the length of each brand will differ so the character position will change. Due to the structure of the site, this will be the same for all of the brands.


Here we can see the workings of how to extract the brand name from the url, which makes use of MID, FIND, CHAR and SUBSTITUTE  to find the position of the n’th occurence of /.


Here’s how to do it

  1. Find the position of the third “/”

  1. Find the position of the fourth  “/”

  1. Place the two find formulas inside a MID formula to find the starting position of the brand name, and its length, by subtracting position 4 from position 3.

  1. Finally, wrap the formula in SUBSTITUTE to remove the hyphens and add a space instead

Summary

Being able to work with and manipulate text in Google Sheets is an extremely useful skill to learn, especially when working with lots of different marketing platforms. Most problems when working with text in Google Sheets can be solved by learning how to use and combine RIGHT, LEFT, LEN, FIND, MID, SUBSTITUTE and REGEX formulas as needed. So learn these and you should be fine in most situations.

We’ve shown how to extract text after a character in Google Sheets and Excel, and covered some of the common problems you may face as a marketer when extracting and manipulating text. We’ve also covered extracting text before a character, and using Regex when there’s no consistent character across your rows.