One thing I like to do when doing a competitive analysis for a client is rifle through their backlinks vis-a-vis their competitors’ and reverse engineer their competitors’ marketing strategies. But sometimes that means extracting text from another column of data to organize your data for maximum pivoting. Since this is a common task I do a lot, I’m going to show you how to pull domains out of a list of URLs using a combo meal of the LEFT and SEARCH functions.
First Batter Up: LEFT Function
The LEFT function follows the following structure:
LEFT( text, [number_of_characters] )
text:Β For this you can either put text in quotation marks inside the formula or point to the cell where the text is. Most people choose the latter. It’s much more scalable.
number_of_characters: So anytime you see something in brackets like this, it means that parameter is optional. In the case of the LEFT function, I think it’s kind of silly to make the number of characters optional. If you leave it out, Excel will just grab one character. Can’t really think of many examples that would be useful, but knock yourself out if you find one. If you have a column of data and the text has the same number of characters in each cell, you’re in luck. You can just enter the number of characters.
Most of us aren’t that lucky though, especially Internet marketers who have to do messy tasks like extract the domain from a group of links from different domains. In such occasions the SEARCH or FIND functions are really helpful.
SEARCH vs FIND Functions in Excel
I like SEARCH over find because it’s not case sensitive (and FIND is). Plus, SEARCH allows you to use wildcards and FIND doesn’t. One of my fave trainers, Mike Girvin, did a YouTube video demonstrating the differences between the two. I can only think of one time I needed to use FIND (because I needed it to be case sensitive). But in this case either will do.
Next Up: SEARCH Function
So what the SEARCH and FIND functions do is return the position of a character you search for. This works swimmingly with the LEFT function because you can find a character’s position and return that for the number of characters you want to extract. If you need to stop at the character before the one you can isolate using the SEARCH function, you can subtract 1 (or however many you need) from the value returned by the SEARCH function. Or you might need to include the character(s) just after it; in that case you would just add 1 to the value.
The SEARCH function follows the following structure:
SEARCH( substring, string, [start_position] )
substring: Pure, unbridled geek speak that means whatever you’re searching for (we’re going to use “/”)
string: Typically the cell this text string is in, though you could enter text as long as you flank it with quotation marks
start_position: This is optional because oftentimes you’ll start looking for your character in the beginning of the string. In our case, the two forward slashes at the beginning of each URL (e.g., http:// and https://) prove to be problematic. So what we’ll need to do is tell Excel to start counting at the 9th character (because http:// has 7 characters and https:// has 8).
Final Formula
When I’m finished the formula looks like this:
=LEFT(B3,SEARCH("/",B3,9))
Note: If you’re doing this in a formatted table, Excel will used structured references instead. That’s what I always do. One other little benefit is you don’t have to double-click the top cell to send the formula down. It gets applied to the column automatically.
I just didn’t use a table for this tutorial because the structured references can make the formulas look unnecessarily complicated when you’re just learning.
Watch It In Action
Still a little squeamish? Download the workbook and check out the video.
Stupid Feedback Noise
In quiet spots in the video I get feedback. The only solution I’ve had any success with is to unplug the cord from my PC, but I forgot to do that. Sorry.
How To Use This Data
In my next post I’ll show you how you can then take this data, pop it into a pivot table, and analyze it to steal glean from all of your competitors’ best ideas.
Learn More
If you want to learn more about filters in Excel, check out myΒ Annielytics Dashboard Course, a video course that will teach you how to put your data in stilettos and work the pole. π
Brian Jensen says
Great stuff Annie, the video was a really nice touch. I’m looking forward to the next post in the series and appreciate the heads-up on Mike Girvin!
Annie Cushing says
Thanks, Brian! Yeah, Mike is awesome. Love his work, and he’s a super nice guy to boot. I’m editing the video for the next post right now. π
cayecaye says
Hey Annie, as always excel formula rules! π I have a question though our team does the same url cleaning daily, however we do it using the TEXT to COLUMNS feature using / as the delimiting factor. After we delimit we concatenate the main URL.. does this process about sums up the formula above?
Annie Cushing says
Text to columns also works. It’s just more of a hatchet job approach. But I used it before I learned how to extract text.
Peter Traychev says
Great stuff, thanks π I’ve been using =MID(A2,8,SEARCH(“/”,A2,9)-8) but this one is simpler. If there is a way to strip “http://” and “http://www.” part in a single formula that would be great.
Annie Cushing says
Hey Peter,
Mynda Treacy wrote this post on how to extract just the domain: http://www.myonlinetraininghub.com/excel-mid-function. Let me know if you need any help with it!
Modesto Siotos says
Hi Annie,
Thanks for the tips. One of the example URLs in the spreadsheet is
http://useconomy.about.com/od/suppl1/tp/BP_Oil_Spill.htm but the returned domain appears as http://useconomy.about.com/ which is a subdomain rather than http://about.com/ which is the actual domain.
Do you know what’s the quickest way to return the true domain rather than the subdomain? Thanks!
Annie Cushing says
Well, technically www is a subdomain too. But since I always do my research at the subdomain level, I keep them.
Rob James says
If you want to strip any URL down to the basic domain you end up with a lot of complicated formulae, for example once you’ve remove http://www. the following formula will remove the first subdomain.
=IFERROR(IF(LEN(SUBSTITUTE(LEFT(C2,LEN(C2)-7),”.”,””))= LEN(LEFT(C2,LEN(C2)-7)),C2,RIGHT(C2,LEN(C2)-SEARCH(“.”,C2,1))),””)
Dacey says
Great finding this from a long time
Too much Happy
Annie Cushing says
Wonderful! Happy to be of help!
Brian Jackson says
Thanks Annie for this great little formula π I gave this post a mention in my most recent blog post: http://brianjackson.io/how-i-cleaned-up-a-30000-backlink-profile-in-15-minutes/
Annie Cushing says
Thanks! π
Ti. says
Thank you Annie for sharing knowledge about the extraction of domains through Excel. Thank you Rob for this formula to extract subdomains.
Annie Cushing says
My pleasure!
Marc-jean Desjardins says
Is it possible to search a number of unique rows in a single column and have it count the number of occurrences they appear, respectively, on another tab inside the same workbook?
Ex: Pretend we have two Columns, A and B on sheet2, Column A is the list of 50 states abbreviated, starting at A2 and going down to A51.
I want column B on sheet2 to count the number of occurrences each state appears in column D on sheet1.
I don’t want to use a pivot table. Is there a combo of IF/Search/etc… functions that can do this? I want to be able to drag down the box and have it it populate the count(s), so if the States order switch around (in column A on sheet2) it still remains correct.
Any help is greatly appreciated! π
Annie Cushing says
Oh yeah. These issues are nearly impossible to work through w/o a dummy file. But I’d try wrapping a SEARCH function in a COUNTIF function. I think that’s going to give you your answer. Good luck!
Mark says
Apologies if someone has already mentioned this alternative method or if they linked to another resource that provides instructions, but there is another way to do this very quickly by running multiple “find” and “replace all” queries. Step 1: type in “http://” and replace all with nothing (this will strip all instances of “http://”). Step 2: type in “https://” and replace all with nothing (this will strip all instances of “https://”). Step 3: type in “www.” and replace all with nothing (this will strip all “www” subdomains). Step 4: type in “/*” and replace all with nothing (this will strip all folders beyond the TLD after the first “/” — don’t forget the asterisk). It’s important to follow the steps in the exact order provided.
Annie Cushing says
This is not an ideal way to go about getting what you need because it 1) is manual and 2) does permanent damage to your data. It’s much better to use a formula and keep everything intact.
Jack says
Great Post.
Annie Cushing says
Thanks.