Three Regex Functions in Google Sheets
Continuing on with my series on the many marketing uses of regex (aka regular expressions), in this post I dive into the three regex functions in Google Sheets:
- REGEXMATCH()
- REGEXEXTRACT()
- REGEXREPLACE()
Follow Along
If you want to follow along with the data I use, you can make a copy of the Google Workbook I used and practice.
Breaking It Down
There are a lot of moving parts with these functions, so I’m not going to attempt to capture everything in the video here. I cover more examples in the video and the Google Sheet. But I’ll provide a high-level summary of the functions for more advanced users. If you’re new to regular expressions, you’ll want to start with my comprehensive guide to regex for marketers. I also created a guide for using regex in Google Analytics as well as Google Search Console.
REGEXMATCH()
Syntax
The basic syntax of the function is:
REGEXMATCH(text, regular_expression)
Example
= IF(REGEXMATCH(C2,"medium\.com"),"Medium Domain", IF(REGEXMATCH(C2,"^https?://medium\."),"Medium Subdomain", "Main Domain"))
Explanation
Think of the REGEXMATCH() function as a contains filter but with more options. It’s a fan fave for creating segments on the fly.
To wit, in the example I used in the video, I created a contrived column I called Domain Strategy, which identified if a URL was on the Medium main domain, a Medium subdomain, or one of the main domains. You’re first letting Google Sheets know which cell to look in (e.g., C2) and then what pattern you want to match. I wrapped my REGEXMATCH() functions in IF() functions to assign the label for each pattern match. Pretty straightforward.
Caveat
One important note with these regex functions is it isn’t entirely intuitive when you first start out to put your regex in quotation marks, but it’s required because it’s a string.
REGEXEXTRACT()
Syntax
The basic syntax of the function is:
REGEXEXTRACT(text, regular_expression)
Example
=REGEXEXTRACT(C2,"https?://([^/]+)/")
Explanation
The REGEXEXTRACT() function in Google Sheets allows you to extract a substring from a string based on a regular expression. The parentheses in my function take advantage of something called a capture group. They identify the pattern you want to match and extract. Although you don’t need capture groups for simple extractions, I almost always need them in marketing tasks. They will also come into play in REGEXREPLACE() functions, so it’s good to get comfortable with them.
What you’re doing with your capture group—or groups if you want to match more than one pattern—is identifying what you want to keep and what you want to leave behind. Imagine those parentheses as being a lifeboat, and you need to decide what you want to take with you if your ship goes down. 🌊🚣♀️ In my example, I only wanted to extract the subdomain, so I didn’t want to include the “https://” or anything after the third /.
So what does that regex communicate? The “[^/]” says there can be any character except a /. If we don’t exclude it, our regex will match the entire URL after the subdomain. We don’t want that. The “+” just communicates that there must be at least one more character, sans forward slashes. The next / isn’t actually required; I just think it makes the function a little more readable.
REGEXREPLACE()
Syntax
The basic syntax of the function is:
REGEXREPLACE(text, regular_expression, replacement)
Example
=REGEXREPLACE(B2,"(\w+), (\w+)","$2 $1")
Explanation
The REGEXREPLACE() function allows you to replace parts of a string that match a regular expression pattern with a specified replacement string. In the first example in my sample workbook, I needed to switch the last and first name and drop the comma. So I used capture groups to identify that there’s a group of letters followed by a comma followed by another group of letters. Then it assigns what’s essentially an index (that starts at 1, not 0) to each of the capture groups. So the “$2 $1” just communicates, “Put the second capture group in front of the first and separate them with a space.” Because the comma isn’t included in a capture group, it’s left behind.
Summary
Learning to rock regex in the context of data gathering and cleaning is a game changer and will significantly improve your ability to add contrived columns to your data or clean up your data without needing to manually go through your data to make those fixes. If you’re new to using regex in Google Sheets, I highly recommend you use the data I provide to practice. It all makes sense until you actually have a project that requires one of these functions. 🙃
Leave a Reply