
DEV Community

Ayushman Chaturvedi

JavaScript Regular Expressions Made Simple

If you’ve ever worked with strings in JavaScript—maybe trying to check if an email is valid in a form or clean up some messy input—you’ve probably run into something called regular expressions, or regex.

At first glance, regex looked like a bunch of gibberish to me. What the heck is /\d{3}-?\d{3}-?\d{4}/ supposed to mean? But later I realised it's not as scary as it seems. In fact, it's just a way to describe patterns in text. Let me break it down for you in plain English.

Creating a regular expression

In JavaScript, a regular expression is an object, constructed with either the RegExp constructor or with forward slash (/) characters enclosing a pattern as a value (literal notation).

let pattern1 = new RegExp("hello");
let pattern2 = /hello/;

Both expressions do the same thing—they look for the word “hello” in a string. The second one is shorter and more commonly used.
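
One practical difference is worth a quick sketch: the RegExp constructor takes the pattern as a string, so any backslash has to be doubled, because the string literal consumes one level of escaping. The variable names below are only for illustration.

```javascript
// Literal notation: the backslash is written once.
const digitsLiteral = /\d+/;

// Constructor notation: the pattern is a string, so "\\d" is needed to get \d.
const digitsConstructed = new RegExp("\\d+");

console.log(digitsLiteral.source === digitsConstructed.source); // true
console.log(digitsConstructed.test("room 42")); // true

// The constructor is handy when the pattern is only known at runtime.
const word = "hello"; // hypothetical value, e.g. coming from user input
const dynamic = new RegExp(word);
console.log(dynamic.test("say hello")); // true
```

So the literal form is usually the simpler choice for fixed patterns, and the constructor is there for patterns built from variables.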

Basic Matching

Just like normal objects, regular expressions also have methods. The most common method is test(), which accepts a string and returns a Boolean that tells us whether the string matches the pattern in the expression.

console.log(/cat/.test("black cat")); // true
console.log(/dog/.test("black cat")); // false

Match Sets of Characters

SETS OF CHARACTERS

Placing a set of characters between square brackets matches that part of the regular expression to any of the characters within the brackets.

console.log(/[abcdefghijklmnopqrstuvwxyz]/.test("year 2021")); // true

RANGES OF CHARACTERS

The above expression matches any string that contains at least one lowercase English letter. We can make the expression shorter by using a hyphen (-). Inside square brackets, a hyphen between two characters represents a range of characters.

console.log(/[a-z]/.test("hello123")); // true
console.log(/[0-9]/.test("hello123")); // true

For a range of characters indicated with a hyphen, the ordering of the characters is determined by their Unicode code points. For example, the characters a-z (codes 97-122) are next to each other in the Unicode ordering, so the range [a-z] includes every character between them and matches all basic lowercase Latin letters.
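
Here is a small sketch of what that ordering means in practice. It prints the character codes behind [a-z], combines several ranges in one set, and shows that an accented letter falls outside the range.

```javascript
// Character codes at both ends of the range [a-z].
console.log("a".charCodeAt(0)); // 97
console.log("z".charCodeAt(0)); // 122

// Several ranges can share one pair of square brackets.
const letterOrDigit = /[a-zA-Z0-9]/;
console.log(letterOrDigit.test("_-_")); // false
console.log(letterOrDigit.test("_7_")); // true

// "\u00e9" is the accented letter e with an acute accent (code 233),
// which lies outside 97-122, so [a-z] does not match it.
console.log(/[a-z]/.test("\u00e9")); // false
```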


CHARACTER GROUPS SHORTHAND

In regular expressions, character sets/groups have a built-in shorthand for writing them. Digits ([0-9]) can be represented as \d. Here are some common character sets and what they mean:

Shorthand   Meaning
\d          Any digit (0-9)
\w          Any word character (a-z, A-Z, 0-9, _)
\s          Any whitespace character (space, tab, newline, ...)
\D          Anything not a digit
\W          Anything not a word character
\S          Anything not a whitespace character
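
A few quick checks make the table easier to remember. This is just a sketch with made-up test strings.

```javascript
console.log(/\w/.test("_"));    // true  (underscore counts as a word character)
console.log(/\w/.test("-"));    // false (a hyphen does not)
console.log(/\s/.test("a\tb")); // true  (a tab is whitespace)
console.log(/\D/.test("2024")); // false (every character is a digit)
console.log(/\S/.test("   "));  // false (the string holds only spaces)
```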

If we want to match a phone number with format XXX-XXX-XXXX, here’s how we can do it:

let phoneNum = /\d\d\d-\d\d\d-\d\d\d\d/
console.log(phoneNum.test("202-588-6500")); // true
console.log(phoneNum.test("67-500-647")); // false

EXCLUDE CHARACTERS

Placed right after the opening square bracket, the caret (^) character inverts a set of characters. That is, the set matches any character except the ones listed in it.

console.log(/[^\d]/.test("ujdhf345kd")); // true
console.log(/[^\d]/.test("3453")); //  false
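
A common use of an inverted set is to check that a string contains nothing but allowed characters. The helper below is a hypothetical example (isBinary is not a built-in), sketched under the assumption that an empty string should be rejected.

```javascript
// Hypothetical helper: true when the text contains only the characters 0 and 1.
function isBinary(text) {
  // [^01] matches any single character that is NOT 0 or 1,
  // so a successful test means an unwanted character was found.
  return text.length > 0 && !/[^01]/.test(text);
}

console.log(isBinary("10110")); // true
console.log(isBinary("10210")); // false (contains a 2)
console.log(isBinary(""));      // false (empty input is rejected by choice)
```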

SPECIAL CHARACTERS

Characters like plus signs (+) and question marks (?) have special meanings in regular expressions and need to be preceded by a backslash if we want to indicate the character itself.

let helloQuestion = /hello\?/

The shorthand codes from the table above (such as \d) can also be used within square brackets to indicate a set of characters. For example, [\d] represents any digit. When special characters like the plus (+) and the question mark (?) are used between square brackets, they lose their special meaning. So, [+?] matches any plus or question mark.
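
Escaping by hand gets tedious when the text to search for comes from a variable. One widely used approach is a small helper that puts a backslash in front of every special character. The sketch below assumes a hypothetical escapeRegExp function (it is not built into the examples above) and relies on replace() with the g flag, both of which are covered further down.

```javascript
// Hypothetical helper: escapes the characters that are special in a pattern.
function escapeRegExp(text) {
  // "$&" in the replacement stands for the matched character itself.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const needle = "1+1=2?"; // hypothetical search text
const pattern = new RegExp(escapeRegExp(needle));

console.log(escapeRegExp(needle));           // 1\+1=2\?
console.log(pattern.test("is 1+1=2? yes"));  // true
console.log(/1+1=2?/.test("is 1+1=2? yes")); // false (+ and ? act as quantifiers here)
```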

REPEATED PATTERNS

When we want to match things that repeat (like digits in a phone number), we use special symbols:

+ means "one or more times."

console.log(/\d+/.test("123")); // true (because 1, 2, 3 are digits)
console.log(/\d+/.test("abc")); // false (no digits)

It matches if at least one digit is there.

* means "zero or more times."

console.log(/a*/.test("aaa"));   // true (matches all a's)
console.log(/a*/.test(""));      // true (zero a's is also allowed!)
console.log(/a*/.test("bbb"));   // true (even though there's no 'a', it matches zero a's)

So, it doesn't require the pattern to be present. It's okay if it's there many times, or not at all.

We can say how many times something should appear using curly braces {}.

• {3} means exactly 3 times
• {2,4} means between 2 and 4 times (no space after the comma)
• {2,} means 2 or more times

console.log(/\d{3}/.test("123"));    // true (exactly 3 digits)
console.log(/\d{3}/.test("12"));     // false (only 2 digits)
console.log(/\d{2,4}/.test("1234")); // true (4 digits is allowed)
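
Two details are easy to miss here, so here is a short sketch. First, without anything else in the pattern, {3} only asks for three digits in a row somewhere in the string, so longer runs still pass. Second, a space inside the braces turns them into ordinary characters in a regular (non-unicode) pattern. The match() method used to show the matched text is explained later in this post.

```javascript
// A longer run of digits still contains three digits in a row.
console.log(/\d{3}/.test("12345")); // true

// match() shows which part of the string was matched.
console.log("12345".match(/\d{3}/)[0]);   // 123
console.log("12345".match(/\d{2,4}/)[0]); // 1234  (takes as many as allowed)
console.log("12345".match(/\d{2,}/)[0]);  // 12345 (no upper limit)

// With a space, "{2, 4}" is no longer a quantifier but literal text.
console.log(/\d{2, 4}/.test("1234"));    // false
console.log(/\d{2, 4}/.test("1{2, 4}")); // true
```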

OPTIONAL CHARACTERS

To make a part of a pattern optional, we use the question mark (?). It allows the preceding character to occur zero or one time.
For example, phone numbers are usually valid even when they are not hyphenated, so we can make the hyphens optional.

let phoneNum = /\d{3}-?\d{3}-?\d{4}/
console.log(phoneNum.test("202-588-6500")); //  true
console.log(phoneNum.test("2025886500")); //  true

In the above example, the pattern matches even when the hyphen character (-) is omitted.


GROUP CHARACTERS

We use parentheses to group parts of a pattern, so that symbols like +, *, or {} apply to the entire group, not just a single character. When a part of a regular expression is surrounded by parentheses, it is treated as a single element by any operations following it.

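// ^ and $ (covered below) make the pattern cover the whole string;
// without them, "haa" would pass because it contains one "ha".
let laugh = /^(ha)+$/;
console.log(laugh.test("hahaha")); // true (group "ha" repeated)
console.log(laugh.test("haa"));    // false (the extra "a" is not a full "ha")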
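
Groups also work with curly braces. As a rough sketch, the phone number pattern from earlier can be written with a repeated group, and comparing it with an ungrouped version shows what the parentheses change. The variable names are only for illustration.

```javascript
// "Three digits and a hyphen" is grouped, then repeated exactly twice.
const phone = /(\d{3}-){2}\d{4}/;
console.log(phone.test("202-588-6500")); // true
console.log(phone.test("202-5886500"));  // false (second hyphen missing)

// Without the parentheses, {2} applies to the hyphen alone.
const noGroup = /\d{3}-{2}\d{4}/;
console.log(noGroup.test("202-588-6500")); // false
console.log(noGroup.test("202--6500"));    // true (three digits, two hyphens, four digits)
```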

CASE SENSITIVITY

We can add the letter i after the regex to make the pattern case-insensitive.

let greet = /hello/i;
console.log(greet.test("HELLO"));  // true
console.log(greet.test("Hello"));  // true
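
When the pattern is created with the RegExp constructor, the flag goes in as a second argument instead. A minimal sketch:

```javascript
const greetLiteral = /hello/i;
const greetConstructed = new RegExp("hello", "i");

console.log(greetConstructed.test("HeLLo")); // true
console.log(greetLiteral.flags);             // i
console.log(/hello/.test("HELLO"));          // false (no flag, so case matters)
```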

MATCHING WITHIN BOUNDARIES

To make a pattern match the entire string, we use anchors:
• ^ → beginning of string
• $ → end of string


We can use both to make sure the whole string matches the pattern, not just part of it.

let onlyNumbers = /^\d+$/;
console.log(onlyNumbers.test("12345")); // true (only digits)
console.log(onlyNumbers.test("12a45")); // false (has a letter)
console.log(onlyNumbers.test(" 12345")); // false (starts with space)
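
This matters a lot for validation. Here is a sketch that compares the phone number pattern from earlier with an anchored version of it (the names loosePhone and strictPhone are just for this example).

```javascript
const loosePhone = /\d{3}-?\d{3}-?\d{4}/;
const strictPhone = /^\d{3}-?\d{3}-?\d{4}$/;

const input = "call 202-588-6500 now";
console.log(loosePhone.test(input));  // true  (a phone number appears somewhere)
console.log(strictPhone.test(input)); // false (the string holds more than the number)

console.log(strictPhone.test("202-588-6500"));  // true
console.log(strictPhone.test("2025886500"));    // true
console.log(strictPhone.test("202-588-65001")); // false (one digit too many)
```

The unanchored pattern is fine for finding a number inside a longer text, while the anchored one is the safer pick for checking a form field.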

WORD BOUNDARIES

The marker \b refers to a word boundary. \b is like an invisible wall between words: it matches any position in the string that has a word character on one side and a non-word character on the other side. The start or end of the string also counts as a boundary when the character next to it is a word character.

console.log(/\bcat\b/.test("black cat"));  // true (exact word "cat")
console.log(/\bcat\b/.test("category"));   // false (not a whole word)
console.log(/\bcat/.test("catfish"));      // true (starts with "cat")

ALTERNATIVES WITH THE OR OPERATOR

We use the pipe character (|) to indicate a choice between the pattern to its left and the one to its right. For example, we can match text that contains the word “watch” on its own or with one of the endings “es” (plural), “ed” (past tense), or “er” (agent noun).

let word = /\bwatch(es|ed|er)?\b/;
console.log(word.test("watch")); // true
console.log(word.test("watched")); // true
console.log(word.test("watching")); // false
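
The parentheses around the alternatives are doing real work. Without a group, the pipe splits the whole pattern in two, which is easy to get wrong together with ^ and $. A small sketch with made-up words:

```javascript
// Read as: (starts with "cat") OR (ends with "dog").
const loose = /^cat|dog$/;
// Read as: the whole string is "cat" or "dog".
const strict = /^(cat|dog)$/;

console.log(loose.test("catalog"));  // true  (starts with "cat")
console.log(loose.test("hotdog"));   // true  (ends with "dog")
console.log(strict.test("catalog")); // false
console.log(strict.test("dog"));     // true
```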

Other methods for matching

exec()

We already know that the test() method just tells us whether something matches a pattern or not: the result is always either true or false. But if we want to see what was actually matched and where it was found in the string, we use the exec() method.

let execMatch = /\d+/.exec("abc 123");
console.log(execMatch); // Array [ "123" ]
console.log(execMatch.index); // 4
let execMatch2 = /\d+/.exec("abc");
console.log(execMatch2); // null

Here’s what’s happening:

  • exec() finds "123" in the string "abc 123".
  • It gives us an array where the first item is the matched text.
  • It also adds a property called index that shows where in the string the match started (position 4 in this case).
  • If there’s no match, exec() returns null.
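
When the pattern has the g (global) flag, which is introduced in the replace section below, exec() remembers where the last match ended and continues from there on the next call. That makes it possible to walk through every match in a loop. A sketch, with a made-up input string:

```javascript
const digits = /\d+/g;
const text = "a1 b22 c333";
const found = [];

let m;
// Each call picks up after the previous match; null means no more matches.
while ((m = digits.exec(text)) !== null) {
  found.push({ value: m[0], index: m.index });
}

console.log(found.map(f => f.value)); // [ "1", "22", "333" ]
console.log(found.map(f => f.index)); // [ 1, 4, 8 ]
```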

match()

The match() method is called on a string instead of a pattern. Without the g (global) flag, it behaves similarly to exec(); with the g flag, it returns an array of all matched substrings.

console.log("abc 123".match(/\d+/)); // [ "123" ]
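
To see how match() reacts to the g flag and to a string with no match at all, here is a short sketch with a made-up input:

```javascript
const sample = "abc 123 def 45";

// Without g: the first match, plus extra details such as index.
const first = sample.match(/\d+/);
console.log(first[0]);    // 123
console.log(first.index); // 4

// With g: every matched substring, but no index information.
console.log(sample.match(/\d+/g)); // [ "123", "45" ]

// No match at all gives null, so check before reading [0].
console.log("abc".match(/\d+/)); // null
```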

Match and Replace

Sometimes, we want to replace part of a string with something else — for example, changing "a" to "e" in "haha". JavaScript gives us a method called .replace() for this.

console.log("haha".replace("a", "e")); // heha

Here, only the first "a" is replaced with "e".


Using Regex with replace():

We can also use regular expressions as the first argument of replace(). This is powerful because it lets us replace patterns, not just exact text.

console.log("hahehahehe".replace(/a/, "e"));
// hehehahehe   (only replaces the first "a")

Replace All Matches with /g

Want to replace every match, not just the first? Add the g (global) flag:

console.log("hahehahehe".replace(/a/g, "e"));
// hehehehehe   (all "a"s are now "e")

If we just want to replace all exact matches (not patterns), we can also use .replaceAll():

console.log("hahehahehe".replaceAll("a", "e"));
// hehehehehe

Using a Function in replace()

Instead of a string, we can also pass a function as the second argument. This lets us do something dynamic with each match.

Example: Convert some specific words to uppercase:

let phrase = "unicef is a humanitarian ngo.";
let result = phrase.replace(/\b(unicef|ngo)\b/g, word => word.toUpperCase());
console.log(result);
// UNICEF is a humanitarian NGO.

The regex looks for whole words unicef or ngo.

The function takes each matched word and returns it in uppercase.
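
Parentheses pay off in replace() too: each group's text can be reused in the replacement, either as $1, $2, ... inside a replacement string or as extra arguments to the function. As a sketch, this turns an unhyphenated phone number into the XXX-XXX-XXXX format (the parameter names area, prefix, and line are my own labels).

```javascript
const raw = "2025886500";
const phoneParts = /^(\d{3})(\d{3})(\d{4})$/;

// $1, $2 and $3 stand for the text matched by each group.
const formatted = raw.replace(phoneParts, "$1-$2-$3");
console.log(formatted); // 202-588-6500

// The same thing with a function: it receives the full match first,
// then the text of each group.
const viaFunction = raw.replace(
  phoneParts,
  (match, area, prefix, line) => `${area}-${prefix}-${line}`
);
console.log(viaFunction); // 202-588-6500

// If the pattern does not match, the string is returned unchanged.
console.log("12345".replace(phoneParts, "$1-$2-$3")); // 12345
```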

Top comments (8)

Naveen Pandey

Thanks for sharing

Tarun

this helped me clear my doubts🥰

Shriyansh Pandey

Helpful guide!

Anshuman Singh

Brilliant info, devs need more of these blogs.

Rohit Raz

Very Important thing is discussed here. I am very impressed!

Vishwas Dubey

Informative 🔥

Rohit Kumar Sahu

Knowledgeable and engaging

Lovish Verma

Very helpful ✌️🤌