Regex to remove certain characters or text in Excel
Have you ever thought how powerful Excel would be if someone could enrich its toolbox with regular
expressions? We have not only thought but worked on it :) And now, you can add this wonderful RegEx
function to your own workbooks and wipe out substrings matching a pattern in no time!
Last week, we looked at how to use regular expressions to replace strings in Excel. For this, we created
a custom Regex Replace function. As it turned out, the function goes beyond its primary use and can
not only replace strings but also remove them. How could that be? In terms of Excel, removing a value
is nothing else but replacing it with an empty string, something that our Regex function is very good
at!
1 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...
create your own user-defined function. The good news is that such a function is already written,
tested, and ready for use. All you have to do is to copy this code, paste it in your VBA editor, and then
save your file as a macro-enabled workbook (.xlsm).
The first three arguments are required, the last two are optional.
Where:
Replacement - the text to replace with. To remove substrings matching the pattern, use an empty
string ("") for replacement.
Instance_num (optional) - the instance to replace. If omitted, all found matches are replaced
(default).
Match_case (optional) - a Boolean value indicating whether to match or ignore text case. For case-
sensitive matching, use TRUE (default); for case-insensitive - FALSE.
Tip. In simple cases, you can remove specific characters or words from cells with Excel formulas.
But regular expressions provide a lot more options for this.
The RegExpReplace function is designed to find all substrings matching a given regex. Which
occurrences to remove is controlled by the 4th optional argument, named instance_num.
The default is "all matches" - when the instance_num argument is omitted, all found matches are
removed. To delete a specific match, define the instance number.
In the below strings, suppose you want to delete the first order number. All such numbers start with
the hash sign (#) and contain exactly 5 digits. So, we can identify them using this regex:
Pattern: #\d{5}\b
The word boundary \b specifies that a matching substring cannot be part of a bigger string such as
#10000001.
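To see the pattern and the "first match only" idea in action outside Excel, here is a small Python sketch. The sample strings and function names are made up for illustration; `count=1` plays the role that instance_num set to 1 plays in the Excel function.

```python
import re

# '#' followed by exactly 5 digits, ending at a word boundary
ORDER_NUMBER = re.compile(r"#\d{5}\b")


def remove_first_order(text):
    # count=1 limits the substitution to the first match only
    return ORDER_NUMBER.sub("", text, count=1)


def remove_all_orders(text):
    # No count given: every match is removed
    return ORDER_NUMBER.sub("", text)
```

Because of the trailing `\b`, a longer number such as `#10000001` is left untouched by both functions.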
For instance, to standardize telephone numbers written in various formats, first we get rid of specific
characters such as parentheses, hyphens, dots and whitespaces.
Pattern: \(|\)|-|\.|\s
For convenience, you can enter the regex in a separate cell and refer to that cell using an absolute
reference such as $A$2:
And then, you can standardize the formatting the way you want by using the concatenation operator
(&) and Text functions such as RIGHT, MID and LEFT.
For example, to write all phone numbers in the (123) 456-7890 format, the formula is:
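The two steps described here (strip the separators, then reassemble the digits by position) can be sketched in Python as follows. This is only an analogue of the Excel approach: slicing stands in for LEFT, MID and RIGHT, the function name is hypothetical, and the sketch assumes exactly 10 digits remain after cleaning.

```python
import re


def standardize_phone(raw):
    # Step 1: remove parentheses, hyphens, dots and whitespace
    digits = re.sub(r"\(|\)|-|\.|\s", "", raw)
    # Step 2: rebuild as (123) 456-7890 from the first 3, middle 3 and last 4 digits
    return f"({digits[:3]}) {digits[3:6]}-{digits[-4:]}"
```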
The pattern is based on negated character classes - a caret is put inside a character class [^ ] to match
any single character NOT in brackets. The + quantifier forces it to regard consecutive characters as a
single match, so that a replacement is done for a matching substring rather than for each individual
character.
To remove non-alphanumeric characters, i.e. all characters except letters and digits:
Pattern: [^0-9a-zA-Z]+
To also keep spaces, add a space inside the brackets:
Pattern: [^0-9a-zA-Z ]+
To delete all characters except letters, digits and underscores, you can use \W, which stands for any
character that is NOT an alphanumeric character or an underscore:
Pattern: \W+
If you want to keep some other characters, e.g. punctuation marks, put them inside the brackets.
For instance, to strip off any character other than a letter, digit, period, comma, or space, use the
following regex:
Pattern: [^0-9a-zA-Z\., ]+
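The negated-class patterns above can be compared side by side with a short Python sketch. The sample strings are invented, and `re.ASCII` is passed for `\W` because Python's `\W` is Unicode-aware by default, which may not match the VBA flavor.

```python
import re


def strip_chars(text, pattern):
    # Remove every run of characters matched by the given negated class
    return re.sub(pattern, "", text, flags=re.ASCII)


sample = "Hello, World! #42"

letters_digits = strip_chars(sample, r"[^0-9a-zA-Z]+")         # letters and digits only
with_spaces = strip_chars(sample, r"[^0-9a-zA-Z ]+")           # spaces are kept too
with_punctuation = strip_chars(sample, r"[^0-9a-zA-Z\., ]+")   # periods and commas kept
word_chars = strip_chars("snake_case, 42!", r"\W+")            # underscore survives
```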
This successfully eliminates all special characters, but extra whitespace remains.
To fix this, you can nest the above function into another one that replaces multiple spaces with a
single space character.
Or just use the native TRIM function with the same effect:
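In Python terms, the cleanup amounts to something like the sketch below. The function names are hypothetical, and `collapse_spaces` only approximates what Excel's TRIM does (leading and trailing spaces removed, inner runs reduced to one space).

```python
import re


def remove_specials(text):
    # Drop everything except letters, digits and spaces
    return re.sub(r"[^0-9a-zA-Z ]+", "", text)


def collapse_spaces(text):
    # Replace runs of spaces with a single space, then trim both ends
    return re.sub(r" +", " ", text).strip(" ")
```

Removing a character that had spaces on both sides leaves a double space behind, which is why the second pass is needed.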
Pattern: \D+
Pattern: [^0-9]+
Pattern: [^\d]+
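For plain ASCII input, the three patterns above behave the same way, as this small Python sketch suggests. The sample string and the function name are made up for the example.

```python
import re

# The three equivalent ways of saying "one or more non-digits"
PATTERNS = [r"\D+", r"[^0-9]+", r"[^\d]+"]


def digits_only(text, pattern=r"\D+"):
    # Remove every run of non-digit characters
    return re.sub(pattern, "", text)


results = [digits_only("Tel. 555-01 ext 23", p) for p in PATTERNS]
```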
Tip. If your goal is to remove text and spill the remaining numbers into separate cells or place them
all in one cell separated with a specified delimiter, then use the RegExpExtract function as explained
in How to extract numbers from string using regular expressions.
If you have single-line strings that only contain normal spaces (value 32 in the 7-bit ASCII system), it
does not really matter which of the below regexes you use. In case of multi-line strings, it does make a
difference.
This formula will strip anything after the first space in each line. For the results to display correctly, be
sure to turn Wrap Text on.
To strip off everything after a whitespace (including a space, tab, carriage return and new line), the
regex is:
Pattern: \s.*
Because \s matches a few different whitespace types including a new line (\n), this formula deletes
everything after the first space in a cell, no matter how many lines there are in it.
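The difference between a literal space and `\s` on multi-line text can be checked with a short Python sketch. The literal-space pattern ` .*` is an assumption about what the per-line variant looks like, and the sample text is invented. As in the VBA flavor, a period in Python does not match a new line by default.

```python
import re

text = "one two\nthree four"

# Literal space: each line is cut after its own first space
per_line = re.sub(r" .*", "", text)

# \s also matches the line break itself, so the second match
# starts at "\n" and swallows the rest of the cell
whole_cell = re.sub(r"\s.*", "", text)
```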
In single-line strings, this will remove everything after char. In multi-line strings, each line will be
processed individually because in the VBA Regex flavor, a period (.) matches any character except a
new line.
To delete anything after a given character, including new lines, \n is added to the pattern.
For example, to remove text after the first comma in a string, try these regular expressions:
Pattern: ,.*
Pattern: ,(.|\n)*
In the screenshot below, you can examine how the outcomes differ.
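A quick way to compare the two comma patterns is a Python sketch like the one below, with an invented two-line sample. Python's period, like the one in the VBA flavor, stops at a new line unless told otherwise.

```python
import re

text = "a, b\nc, d"

# ,.*  stops at the end of each line, so every line is trimmed separately
per_line = re.sub(r",.*", "", text)

# ,(.|\n)*  lets the match run across line breaks to the end of the string
to_end = re.sub(r",(.|\n)*", "", text)
```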
To match anything up to the last space, this regex will do (quotation marks are added to make a space
after an asterisk noticeable).
To match anything before the last whitespace (including a space, tab, carriage return, and new line),
use this regular expression.
Pattern: .*\s
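Here is how the greedy `.*\s` pattern behaves in a Python sketch, using made-up sample text. The greedy `.*` runs to the end of the line and then backtracks to the last whitespace character it can find.

```python
import re

# Single line: everything up to and including the last space is removed
single = re.sub(r".*\s", "", "one two three")

# Two lines: the line break counts as whitespace, so the first match
# consumes the whole first line, and the second match trims "c "
multi = re.sub(r".*\s", "", "a b\nc d")
```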
Pattern: ^[^ ]* +
From the start of a string ^, we match zero or more non-space characters [^ ]* that are immediately
followed by one or more spaces " +". The last part is added to prevent potential leading spaces in the
results.
To remove text before first space in each line, the formula is written in the default "all matches" mode
(instance_num omitted):
To delete text before the first space in the first line, and leave all other lines intact, the instance_num
argument is set to 1:
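The two modes can be imitated in Python as sketched below. The sketch assumes the regex runs in multi-line mode, so that ^ matches at the start of every line; the "each line" behavior described above suggests this, but it is an assumption here. `count=1` stands in for instance_num set to 1, and the sample text is invented.

```python
import re

text = "one two\nthree four"
PATTERN = r"^[^ ]* +"

# "All matches" mode: text before the first space is removed in every line
every_line = re.sub(PATTERN, "", text, flags=re.MULTILINE)

# First match only: just the first line is affected
first_line_only = re.sub(PATTERN, "", text, count=1, flags=re.MULTILINE)
```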
Translated into a human language, it says: "from the start of a string anchored by ^, match 0 or more
characters except char [^char]* up to the first occurrence of char".
For example, to delete all text before the first colon, use this regular expression:
Pattern: ^[^:]*:
To avoid leading spaces in the results, add \s* (zero or more whitespace characters) to the end. This
will remove everything before the first colon and trim any spaces right after it:
Pattern: ^[^:]*:\s*
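The effect of the trailing `\s*` is easy to see in a short Python sketch. The sample string is invented for the example.

```python
import re

text = "Name: John: Smith"

# Without \s* the space after the first colon stays in the result
without_trim = re.sub(r"^[^:]*:", "", text)

# With \s* the spaces right after the first colon are removed as well
with_trim = re.sub(r"^[^:]*:\s*", "", text)
```

Only the text up to the first colon is removed; the second colon is left alone because `[^:]*` cannot cross a colon.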
Tip. Besides regular expressions, Excel has its own means to remove text by position or match. To
learn how to accomplish the task with native formulas, please see How to remove text before or
after a character in Excel.
For instance, to remove all characters except lowercase letters and dots, the regex is:
Pattern: [^a-z\.]+
In fact, we could do without the + quantifier here as our function replaces all found matches. The
quantifier just makes it a little faster - instead of handling each individual character, you replace a
substring.
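The point about the + quantifier can be illustrated in Python with `re.subn`, which returns the new string together with the number of replacements made. The sample string is made up; the sketch only shows that the output is identical while the number of separate replacements differs.

```python
import re

text = "User.Name_99@Example.com"

# With +, each run of unwanted characters is one replacement
with_plus, runs_replaced = re.subn(r"[^a-z\.]+", "", text)

# Without +, every unwanted character is replaced individually
without_plus, chars_replaced = re.subn(r"[^a-z\.]", "", text)
```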
Given that html tags are always placed within angle brackets <>, you can find them using one of the
following regexes.
Negated class:
Pattern: <[^>]*>
Here, we match an opening angle bracket, followed by zero or more occurrences of any character
except the closing angle bracket [^>]* up to the nearest closing angle bracket.
Lazy search:
Pattern: <.*?>
Here, we match anything from the first opening bracket to the first closing bracket. The question mark
forces .* to match as few characters as possible until it finds a closing bracket.
Whichever pattern you choose, the result will be the same, provided no tag contains a line break (a
period does not match a new line, whereas [^>] does).
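Both tag patterns can be tried out with a Python sketch such as the following. The sample markup and the function name are invented for the example.

```python
import re

NEGATED_CLASS = r"<[^>]*>"
LAZY_SEARCH = r"<.*?>"


def strip_tags(text, pattern=NEGATED_CLASS):
    # Remove everything from "<" up to the nearest ">"
    return re.sub(pattern, "", text)


html = "<b>data1</b> and <i>data2</i>"
by_negated_class = strip_tags(html, NEGATED_CLASS)
by_lazy_search = strip_tags(html, LAZY_SEARCH)
```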
For example, to remove all html tags from a string in A5 and leave text, the formula is:
This solution works perfectly for single text (rows 5 - 9). For multiple texts (rows 10 - 12), the results are
questionable - texts from different tags are merged into one. Is this correct or not? I'm afraid, it's not
something that can be easily decided - all depends on your understanding of the desired outcome.
For example, in B11, the result "A1" is expected; while in B10, you might want "data1" and "data2" to
be separated with a space.
To remove html tags and separate the remaining texts with spaces, you can proceed in this way:
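One possible way to do this, sketched in Python, is to replace each tag with a space and then tidy up the spaces. This is an assumption about the approach rather than a reproduction of the Excel steps; the function name and sample markup are hypothetical.

```python
import re


def strip_tags_spaced(text):
    # Step 1: replace every tag with a single space instead of nothing
    spaced = re.sub(r"<[^>]*>", " ", text)
    # Step 2: collapse runs of spaces and trim both ends
    return re.sub(r" +", " ", spaced).strip(" ")
```

With this approach, texts from adjacent tags end up separated by one space instead of being merged.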
Your part of the job is to construct a regular expression and serve it to the function :) Let me show you
how to do that on a practical example.
In fact, we have already built a similar regex for deleting html tags, i.e. text within angle brackets.
Obviously, the same methods will work for square and round brackets too.
Pattern: (\(.*?\))|(\[.*?\])
The trick is using a lazy quantifier (*?) to match the shortest possible substring. The first group (\(.*?\))
matches anything from an opening parenthesis to the first closing parenthesis. The second group
(\[.*?\]) matches anything from an opening bracket to the first closing bracket. A vertical bar | acts as
the OR operator.
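To see why the lazy quantifier matters, compare it with its greedy counterpart in a Python sketch. The greedy pattern and the sample string are added here only for contrast.

```python
import re

text = "a (b) c [d] e (f) g"

# Lazy: each bracketed part is matched and removed on its own
lazy = re.sub(r"(\(.*?\))|(\[.*?\])", "", text)

# Greedy, for contrast: the first "(" is paired with the LAST ")",
# so everything in between is removed too
greedy = re.sub(r"(\(.*\))|(\[.*\])", "", text)
```

The lazy version leaves double spaces where the brackets used to be, which is what a TRIM-style cleanup afterwards takes care of.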
With the pattern determined, let's "feed" it to our Regex Remove function. Here's how:
1. On the Ablebits Data tab, in the Text group, click Regex Tools.
2. On the Regex Tools pane, select your source strings, enter your regex, choose the Remove option,
and hit Remove.
To get the results as formulas, not values, select the Insert as a formula check box.
To remove text within brackets from strings in A2:A5, we configure the settings as follows:
As a result, the AblebitsRegexRemove function is inserted in a new column next to your original data.
The function can also be entered directly in a cell via the standard Insert Function dialog box, where it
is categorized under AblebitsUDFs.
As AblebitsRegexRemove is designed to remove text, it requires only two arguments - the source string
and regex. Both parameters can be defined directly in a formula or supplied in the form of cell
references. If needed, this custom function can be used together with any native ones.
For example, to trim extra spaces in the resulting strings, you can utilize the TRIM function as a
wrapper:
=TRIM(AblebitsRegexRemove(A5, $A$2))
That's how to remove strings in Excel using regular expressions. I thank you for reading and look
forward to seeing you on our blog next week!