Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
3 views

Regex to remove certain characters or text in Excel

The document provides a comprehensive guide on using regular expressions (RegEx) in Excel to remove specific characters or text from strings. It explains how to create a custom VBA function, RegExpReplace, to facilitate this process, along with various examples and patterns for different removal tasks such as deleting special characters, non-numeric characters, and HTML tags. The document emphasizes the flexibility and power of RegEx in enhancing Excel's text manipulation capabilities.

Uploaded by

s.saneep4678
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Regex to remove certain characters or text in Excel

The document provides a comprehensive guide on using regular expressions (RegEx) in Excel to remove specific characters or text from strings. It explains how to create a custom VBA function, RegExpReplace, to facilitate this process, along with various examples and patterns for different removal tasks such as deleting special characters, non-numeric characters, and HTML tags. The document emphasizes the flexibility and power of RegEx in enhancing Excel's text manipulation capabilities.

Uploaded by

s.saneep4678
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Excel Regex to remove certain characters or text


from strings
by Svetlana Cheusheva, updated on August 22, 2023

Have you ever thought how powerful Excel would be if someone could enrich its toolbox with regular
expressions? We have not only thought but worked on it :) And now, you can add this wonderful RegEx
function to your own workbooks and wipe out substrings matching a pattern in no time!

Last week, we looked at how to use regular expressions to replace strings in Excel. For this, we created
a custom Regex Replace function. As it turned out, the function goes beyond its primary use and can
not only replace strings but also remove them. How could that be? In terms of Excel, removing a value
is nothing else but replacing it with an empty string, something that our Regex function is very good
at!

VBA RegExp function to remove substrings

Excel Regex Remove examples

Delete all matches or speci�c match

Strip certain characters

Remove special characters

Delete non-numeric characters

Strip o� everything after space

Remove text after certain character

Purge everything before space

Eliminate everything before character

Remove everything except

Regular expression to remove HTML tags

Ablebits Regex Remove Tool

VBA RegExp function to remove substrings in Excel


As we all know, regular expressions are not supported in Excel by default. To enable them, you need to

1 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

create your own user-de�ned function. The good news is that such a function is already written,
tested, and ready for use. All you have to do is to copy this code, paste it in your VBA editor, and then
save your �le as a macro-enabled workbook (.xlsm).

The function has the following syntax:

RegExpReplace(text, pattern, replacement, [instance_num], [match_case])

The �rst three arguments are required, the last two are optional.

Where:

Text - the text string to search in.

Pattern - the regular expression to search for.

Replacement - the text to replace with. To remove substrings matching the pattern, use an empty
string ("") for replacement.

Instance_num (optional) - the instance to replace. If omitted, all found matches are replaced
(default).

Match_case (optional) - a Boolean value indicating whether to match or ignore text case. For case-
sensitive matching, use TRUE (default); for case-insensitive - FALSE.

For more information, please see RegExpReplace function.

Tip. In simple cases, you can remove speci�c characters or words from cells with Excel formulas.
But regular expressions provide a lot more options for this.

How to remove strings using regular expressions -


examples
As mentioned above, to remove parts of text matching a pattern, you are to replace them with an
empty string. So, a generic formula takes this shape:

RegExpReplace(text, pattern, "", [instance_num], [match_case])

The below examples show various implementations of this basic concept.

Remove all matches or speci�c match

2 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

The RegExpReplace function is designed to �nd all substrings matching a given regex. Which
occurrences to remove is controlled by the 4th optional argument, named instance_num.

The default is "all matches" - when the instance_num argument is omitted, all found matches are
removed. To delete a speci�c match, de�ne the instance number.

In the below strings, suppose you want to delete the �rst order number. All such numbers start with
the hash sign (#) and contain exactly 5 digits. So, we can identify them using this regex:

Pattern: #\d{5}\b

The word boundary \b speci�es that a matching substring cannot be part of a bigger string such as
#10000001.

To remove all matches, the instance_num argument is not de�ned:

=RegExpReplace(A5, "#\d{5}\b", "")

To eradicate only the �rst occurrence, we set the instance_num argument to 1:

=RegExpReplace(A5, "#\d{5}\b", "", 1)

3 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Regex to remove certain characters


To strip o� certain characters from a string, just write down all unwanted characters and separate
them with a vertical bar | which acts as an OR operator in regexes.

For instance, to standardize telephone numbers written in various formats, �rst we get rid of speci�c
characters such as parentheses, hyphens, dots and whitespaces.

Pattern: \(|\)|-|\.|\s

=RegExpReplace(A5, "\(|\)|-|\.|\s", "")

The result of this operation is a 10-digit number like "1234567890".

For convenience, you can enter the regex is a separate cell, and refer to that cell using an absolute
reference such as $A$2:

=RegExpReplace(A5, $A$2, "")

And then, you can standardize the formatting the way you want by using the concatenation operator
(&) and Text functions such as RIGHT, MID and LEFT.

For example, to write all phone numbers in the (123) 456-7890 format, the formula is:

="("&LEFT(B5, 3)&") "&MID(B5, 4, 3)&"-"&RIGHT(B5, 4)

Where B5 is the output of the RegExpReplace function.

4 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Remove special characters using regex


In one of our tutorials, we looked at how to remove unwanted characters in Excel using inbuilt and
custom functions. Regular expressions make things a lot easier! Instead of listing all the characters to
delete, just specify the ones you want to keep :)

The pattern is based on negated character classes - a caret is put inside a character class [^ ] to match
any single character NOT in brackets. The + quanti�er forces it to regard consecutive characters as a
single match, so that a replacement is done for a matching substring rather than for each individual
character.

Depending on your needs, choose one of the following regexes.

To remove non-alphanumeric characters, i.e. all characters except letters and digits:

Pattern: [^0-9a-zA-Z]+

To purge all characters except letters, digits and spaces:

Pattern: [^0-9a-zA-Z ]+

To delete all characters except letters, digits and underscore, you can use \W that stands for any
character that is NOT alphanumeric character or underscore:

Pattern: \W+

If you want to keep some other characters, e.g. punctuation marks, put them inside the brackets.

For instance, to strip o� any character other than a letter, digit, period, comma, or space, use the
following regex:

Pattern: [^0-9a-zA-Z\., ]+

5 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

This successfully eliminates all special characters, but extra whitespace remains.

To �x this, you can nest the above function into another one that replaces multiple spaces with a
single space character.

=RegExpReplace(RegExpReplace(A5,$A$2,""), " +", " ")

Or just use the native TRIM function with the same e�ect:

=TRIM(RegExpReplace(A5, $A$2, ""))

Regex to remove non-numeric characters


To delete all non-numeric characters from a string, you can use either this long formula or one of the
very simple regexes listed below.

Match any character that is NOT a digit:

Pattern: \D+

Strip non-numeric characters using negated classes:

Pattern: [^0-9]+

6 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Pattern: [^\d]+

Tip. If your goal is to remove text and spill the remaining numbers into separate cells or place them
all in one cell separated with a speci�ed delimiter, then use the RegExpExtract function as explained
in How to extract numbers from string using regular expressions.

Regex to remove everything after space


To wipe out everything after a space, use either the space ( ) or whitespace (\s) character to �nd the
�rst space and .* to match any characters after it.

If you have single-line strings that only contain normal spaces (value 32 in the 7-bit ASCII system), it
does not really matter which of the below regexes you use. In case of multi-line strings, it does make a
di�erence.

To remove everything after a space character, use this regex:

Pattern: " .*"

=RegExpReplace(A5, " .*", "")

This formula will strip anything after the �rst space in each line. For the results to display correctly, be
sure to turn Wrap Text on.

To strip o� everything after a whitespace (including a space, tab, carriage return and new line), the
regex is:

Pattern: \s.*

=RegExpReplace(A5, "\s.*", "")

Because \s matches a few di�erent whitespace types including a new line (\n), this formula deletes

7 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

everything after the �rst space in a cell, no matter how many lines there are in it.

Regex to remove text after speci�c character


Using the methods from the previous example, you can eradicate text after any character that you
specify.

To handle each line separately:

Generic pattern: char.*

In single-line strings, this will remove everything after char. In multi-line strings, each line will be
processed individually because in the VBA Regex �avor, a period (.) matches any character except a
new line.

To process all lines as a single string:

Generic pattern: char(.|\n)*

To delete anything after a given character, including new lines, \n is added to the pattern.

For example, to remove text after the �rst comma in a string, try these regular expressions:

Pattern: ,.*

8 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Pattern: ,(.|\n)*

In the screenshot below, you can examine how the outcomes di�er.

Regex to remove everything before space


When working with long strings of text, you may sometimes want to make them shorter by removing
the same part of information in all cells. Below we'll discuss two such cases.

Remove everything before the last space


As with the previous example, a regular expression depends on your understanding of a "space".

To match anything up to the last space, this regex will do (quotation marks are added to make a space
after an asterisk noticeable).

Pattern: ".* "

To match anything before the last whitespace (including a space, tab, carriage return, and new line),
use this regular expression.

Pattern: .*\s

The di�erence is especially noticeable on multi-line strings.

9 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Strip everything before the �rst space


To match anything up to the �rst space in a string, you can use this regular expression:

Pattern: ^[^ ]* +

From the start of a string ^, we match zero or more non-space characters [^ ]* that are immediately
followed by one or more spaces " +". The last part is added to prevent potential leading spaces in the
results.

To remove text before �rst space in each line, the formula is written in the default "all matches" mode
(instance_num omitted):

=RegExpReplace(A5, "^[^ ]* +", "")

To delete text before the �rst space in the �rst line, and leave all other lines intact, the instance_num
argument is set to 1:

=RegExpReplace(A5, "^[^ ]* +", "", 1)

10 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Regex to strip o� everything before character


The easiest way to remove all text before a speci�c character is by using a regex like this:

Generic pattern: ^[^char]*char

Translated into a human language, it says: "from the start of a string anchored by ^, match 0 or more
characters except char [^char]* up to the �rst occurrence of char.

For example, to delete all text before the �rst colon, use this regular expression:

Pattern: ^[^:]*:

To avoid leading spaces in the results, add a whitespace character \s* to the end. This will remove
everything before the �rst colon and trim any spaces right after it:

Pattern: ^[^:]*:\s*

=RegExpReplace(A5, "^[^:]*:\s*", "")

Tip. Besides regular expressions, Excel has its own means to remove text by position or match. To
learn how to accomplish the task with native formulas, please see How to remove text before or
after a character in Excel.

11 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Regex to remove everything except


To eradicate all characters from a string except the ones you want to keep, use negated character
classes.

For instance, to remove all characters except lowercase letters and dots, the regex is:

Pattern: [^a-z\.]+

In fact, we could do without the + quanti�er here as our function replaces all found matches. The
quanti�er just makes it a little faster - instead of handling each individual character, you replace a
substring.

=RegExpReplace(A5, "[^a-z\.]+", "")

Regex to remove html tags in Excel


First o�, it should be noted that HTML is not a regular language, so parsing it using regular
expressions is not the best way. That said, regexes can de�nitely help strip tags out of your cells to
make your dataset cleaner.

Given that html tags are always placed within angle brackets <>, you can �nd them using one of the
following regexes.

Negated class:

Pattern: <[^>]*>

Here, we match an opening angle bracket, followed by zero or more occurrences of any character
except the closing angle bracket [^>]* up to the nearest closing angle bracket.

Lazy search:

12 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

Pattern: <.*?>

Here, we match anything from the �rst opening bracket to the �rst closing bracket. The question mark
forces .* to match as few characters as possible until it �nds a closing bracket.

Whichever pattern you choose, the result will be absolutely the same.

For example, to remove all html tags from a string in A5 and leave text, the formula is:

=RegExpReplace(A5, "<[^>]*>", "")

Or you can use the lazy quanti�er as shown in the screenshot:

This solution works perfectly for single text (rows 5 - 9). For multiple texts (rows 10 - 12), the results are
questionable - texts from di�erent tags are merged into one. Is this correct or not? I'm afraid, it's not
something that can be easily decided - all depends on your understanding of the desired outcome.
For example, in B11, the result "A1" is expected; while in B10, you might want "data1" and "data2" to
be separated with a space.

To remove html tags and separate the remaining texts with spaces, you can proceed in this way:

1. Replace tags with spaces " ", not empty strings:

=RegExpReplace(A5, "<[^>]*>", " ")

2. Reduce multiple spaces to a single space character:

=RegExpReplace(RegExpReplace(A5, "<[^>]*>", " "), " +", " ")

3. Trim leading and trailing spaces:

=TRIM(RegExpReplace(RegExpReplace(A5, "<[^>]*>", " "), " +", " "))

13 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

The result will look something like this:

Ablebits Regex Remove Tool


If you've had a chance to use our Ultimate Suite for Excel, you probably already discovered the new
Regex Tools introduced with the recent release. The beauty of these .NET based Regex functions is
that they, �rstly, support full-featured regular expression syntax free of VBA RegExp limitations, and
secondly, do not require inserting any VBA code in your workbooks as all code integration is done by
us at the backend.

Your part of the job is to construct a regular expression and serve it to the function :) Let me show you
how to do that on a practical example.

How to remove text in brackets and parentheses using regex


In long text strings, less important information is often enclosed in [brackets] and (parentheses). How
do you remove those irrelevant details keeping all other data?

In fact, we have already built a similar regex for deleting html tags, i.e. text within angle brackets.
Obviously, the same methods will work for square and round brackets too.

Pattern: (\(.*?\))|(\[.*?\])

The trick is using a lazy quanti�er (*?) to match the shortest possible substring. The �rst group (\(.*?\))
matches anything from an opening parenthesis to the �rst closing parenthesis. The second group
(\[.*?\]) matches anything from an opening bracket to the �rst closing bracket. A vertical bar | acts as

14 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

the OR operator.

With the pattern determined, let's "feed" it to our Regex Remove function. Here's how:

1. On the Ablebits Data tab, in the Text group, click Regex Tools.

2. On the Regex Tools pane, select your source strings, enter your regex, choose the Remove option,
and hit Remove.

To get the results as formulas, not values, select the Insert as a formula check box.

To remove text within brackets from strings in A2:A5, we con�gure the settings as follows:

As the result, the AblebitsRegexRemove function is inserted in a new column next to your original data.

15 of 16 2/17/2025, 8:48 AM
Firefox https://www.ablebits.com/office-addins-blog/regex-remove-characters-t...

The function can also be entered directly in a cell via the standard Insert Function dialog box, where it
is categorized under AblebitsUDFs.

As AblebitsRegexRemove is designed to remove text, it requires only two arguments - the source string
and regex. Both parameters can be de�ned directly in a formula or supplied in the form of cell
references. If needed, this custom function can be used together with any native ones.

For example, to trim extra spaces in the resulting strings, you can utilize the TRIM function as a
wrapper:

=TRIM(AblebitsRegexRemove(A5, $A$2))

That's how to remove strings in Excel using regular expressions. I thank you for reading and look
forward to seeing you on our blog next week!

16 of 16 2/17/2025, 8:48 AM

You might also like