JavaScript Regular Expressions

This section describes JavaScript regular expressions. It provides a brief overview of each syntax element.

The syntax for defining a regular expression:

let re = /pattern/flags;
// or
let re = new RegExp(pattern, flags);
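
As a small illustration of the two forms, the sketch below builds the same pattern both ways. The pattern and the names `literal` and `constructed` are made up for this example. Note that the constructor takes the pattern as a string, so each backslash has to be doubled.

```javascript
// Regex literal: the pattern is written between slashes.
const literal = /\d+-\d+/g;

// Constructor: the pattern is a string, so each backslash is doubled.
const constructed = new RegExp("\\d+-\\d+", "g");

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags); // true
console.log("call 555-1234 or 555-9876".match(constructed)); // [ '555-1234', '555-9876' ]
```

The constructor form is mainly useful when the pattern or flags are only known at run time.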

Some external links:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp (MDN)

Regex flags

Flags are special parameters that can change the way a regular expression is interpreted or the way it interacts with the input text. Each flag corresponds to one accessor property on the RegExp object.

Flag	Description	Corresponding property
d	Generate indices for substring matches.	`hasIndices`
g	Global search.	`global`
i	Case-insensitive search.	`ignoreCase`
m	Makes ^ and $ match the start and end of each line instead of those of the entire string.	`multiline`
s	Allows . to match newline characters.	`dotAll`
u	"Unicode"; treat a pattern as a sequence of Unicode code points.	`unicode`
v	An upgrade to the u mode with more Unicode features.	`unicodeSets`
y	Perform a "sticky" search that matches starting at the current position in the target string.	`sticky`
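
The relationship between flags and their accessor properties can be seen in a short sketch. The pattern and the names `itemRe` and `lines` are chosen only for illustration.

```javascript
const itemRe = /^item \d+$/gim;

console.log(itemRe.flags); // "gim"
console.log(itemRe.global, itemRe.ignoreCase, itemRe.multiline); // true true true
console.log(itemRe.dotAll, itemRe.sticky, itemRe.unicode); // false false false

// With m, ^ and $ match at each line; with i, "ITEM" matches "item".
const lines = "item 1\nITEM 2\nother";
console.log(lines.match(itemRe)); // [ 'item 1', 'ITEM 2' ]
```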

Assertions

Assertions are constructs that test whether the string meets a certain condition at the specified position, but do not consume characters. Assertions cannot be quantified.

Input boundary assertion: ^, $: Asserts that the current position is the start or end of input, or start or end of a line if the m flag is set.
Lookahead assertion: (?=...), (?!...): Asserts that the current position is followed or not followed by a certain pattern.
Lookbehind assertion: (?<=...), (?<!...): Asserts that the current position is preceded or not preceded by a certain pattern.
Word boundary assertion: \b, \B: Asserts that the current position is (\b) or is not (\B) a word boundary.
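
The assertions listed above can be tried out with a few one-liners. This is a minimal sketch; the sample strings and variable names are invented for the example.

```javascript
// Lookahead: match digits only when "px" follows; "px" itself is not consumed.
const pxValues = "12px 5em 30px".match(/\d+(?=px)/g);
console.log(pxValues); // [ '12', '30' ]

// Lookbehind: match digits only when "$" precedes them.
const dollars = "$10 and €20".match(/(?<=\$)\d+/g);
console.log(dollars); // [ '10' ]

// Word boundary: "cat" as a whole word, not inside "concat".
const cats = "cat concat cat's".match(/\bcat\b/g);
console.log(cats.length); // 2

// Input boundary: ^ matches at line starts only when the m flag is set.
console.log(/^b/.test("a\nb"), /^b/m.test("a\nb")); // false true

// Assertions are zero-width: the replacement is inserted, nothing is removed.
const dashed = "abc".replace(/(?=b)/, "-");
console.log(dashed); // a-bc
```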

Atoms

Atoms are the most basic units of a regular expression. Each atom consumes one or more characters in the string, and either fails the match or allows the pattern to continue matching with the next atom.

Backreference: \1, \2: Matches a previously matched subpattern captured with a capturing group.
Capturing group: (...): Matches a subpattern and remembers information about the match.
Character class: [...], [^...]: Matches any character in or not in a set of characters. When the v flag is enabled, it can also be used to match finite-length strings.
Character class escape: \d, \D, \w, \W, \s, \S: Matches any character in or not in a predefined set of characters.
Character escape: \n, \u{...}: Matches a character that may not be able to be conveniently represented in its literal form.
Literal character: a, b: Matches a specific character.
Modifier: (?ims-ims:...): Overrides flag settings in a specific part of a regular expression.
Named backreference: \k<name>: Matches a previously matched subpattern captured with a named capturing group.
Named capturing group: (?<name>...): Matches a subpattern and remembers information about the match. The group can later be identified by a custom name instead of by its index in the pattern.
Non-capturing group: (?:...): Matches a subpattern without remembering information about the match.
Unicode character class escape: \p{...}, \P{...}: Matches a set of characters specified by a Unicode property. When the v flag is enabled, it can also be used to match finite-length strings.
Wildcard: .: Matches any character except line terminators, unless the s flag is set.
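
A few of these atoms in action, as a minimal sketch. The patterns, input strings, and variable names are illustrative only.

```javascript
// Named capturing groups: parts of the match are available by name.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const dateMatch = dateRe.exec("released 2024-03-15");
console.log(dateMatch[0]); // 2024-03-15
console.log(dateMatch.groups.year, dateMatch.groups.month, dateMatch.groups.day); // 2024 03 15

// Backreference: \1 must match the same text that group 1 captured.
const repeated = /\b(\w+) \1\b/;
console.log(repeated.test("it is is here")); // true
console.log(repeated.test("it is here")); // false

// Named backreference: the closing quote must equal the opening quote.
const quoted = /(?<q>["'])(.*?)\k<q>/;
console.log(quoted.exec(`say "it's fine"`)[2]); // it's fine

// Non-capturing group: groups for quantification without a capture slot.
console.log(/(?:ab)+/.exec("ababab").length); // 1 (only the full match)

// Wildcard: . skips line terminators unless the s flag is set.
console.log(/a.b/.test("a\nb"), /a.b/s.test("a\nb")); // false true
```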

Escape sequences

Escape sequences in regexes refer to any kind of syntax formed by \ followed by one or more characters. They may serve very different purposes depending on what follows \. Below is a list of all valid escape sequences:

Escape sequence	Followed by	Meaning
`\B`	None	Non-word-boundary assertion
`\D`	None	Character class escape representing non-digit characters
`\P`	{, a Unicode property and/or value, then }	Unicode character class escape representing characters without the specified Unicode property
`\S`	None	Character class escape representing non-white-space characters
`\W`	None	Character class escape representing non-word characters
`\b`	None	Word boundary assertion; inside character classes, represents U+0008 (BACKSPACE)
`\c`	A letter from A to Z or a to z	A character escape representing the control character with value equal to the letter's character value modulo 32
`\d`	None	Character class escape representing digit characters (0 to 9)
`\f`	None	Character escape representing U+000C (FORM FEED)
`\k`	<, an identifier, then >	A named backreference
`\n`	None	Character escape representing U+000A (LINE FEED)
`\p`	{, a Unicode property and/or value, then }	Unicode character class escape representing characters with the specified Unicode property
`\q`	{, a string, then a }	Only valid inside v-mode character classes; represents the string to be matched literally
`\r`	None	Character escape representing U+000D (CARRIAGE RETURN)
`\s`	None	Character class escape representing whitespace characters
`\t`	None	Character escape representing U+0009 (CHARACTER TABULATION)
`\u`	4 hexadecimal digits; or {, 1 to 6 hexadecimal digits, then }	Character escape representing the character with the given code point
`\v`	None	Character escape representing U+000B (LINE TABULATION)
`\w`	None	Character class escape representing word characters (A to Z, a to z, 0 to 9, _)
`\x`	2 hexadecimal digits	Character escape representing the character with the given value
`\0`	None	Character escape representing U+0000 (NULL)

\ followed by 0 and another digit becomes a legacy octal escape sequence, which is forbidden in Unicode-aware mode. \ followed by any other digit sequence becomes a backreference.

In addition, \ can be followed by some non-letter-or-digit characters, in which case the escape sequence is always a character escape representing the escaped character itself:

\$, \(, \), \*, \+, \., \/, \?, \[, \\, \], \^, \{, \|, \}: valid everywhere
\-: only valid inside character classes
\!, \#, \%, \&, \,, \:, \;, \<, \=, \>, \@, \`, \~: only valid inside v-mode character classes

The other ASCII characters, namely space character, ", ', _, and any letter character not mentioned above, are not valid escape sequences. In Unicode-unaware mode, escape sequences that are not one of the above become identity escapes: they represent the character that follows the backslash. For example, \a represents the character a. This behavior limits the ability to introduce new escape sequences without causing backward compatibility issues, and is therefore forbidden in Unicode-aware mode.
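
Some of the rules above can be checked directly. The sketch below uses arbitrary sample inputs; the variable `threw` exists only for this example.

```javascript
// Character escapes: different spellings of the same character "A" (U+0041).
console.log(/\x41/.test("A"), /\u0041/.test("A"), /\u{41}/u.test("A")); // true true true

// \cJ is the control character with value 74 % 32 = 10, i.e. LINE FEED.
console.log(/\cJ/.test("\n")); // true

// \b is a word boundary outside a character class, BACKSPACE inside one.
console.log(/[\b]/.test("\u0008")); // true

// Syntax characters escaped with \ match themselves.
console.log(/^\$\d+\.\d{2}$/.test("$19.99")); // true

// Identity escapes such as \a are tolerated without the u flag...
console.log(/\a/.test("a")); // true

// ...but rejected in Unicode-aware mode.
let threw = false;
try {
  new RegExp("\\a", "u");
} catch (err) {
  threw = err instanceof SyntaxError;
}
console.log(threw); // true
```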

`RegExp` Static Properties

RegExp.$1, …, RegExp.$9 (Deprecated): Static read-only properties that contain parenthesized substring matches.
RegExp.input ($_) (Deprecated): A static property that contains the last string against which a regular expression was successfully matched.
RegExp.lastMatch ($&) (Deprecated): A static read-only property that contains the last matched substring.
RegExp.lastParen ($+) (Deprecated): A static read-only property that contains the last parenthesized substring match.
RegExp.leftContext ($`) (Deprecated): A static read-only property that contains the substring preceding the most recent match.
RegExp.rightContext ($') (Deprecated): A static read-only property that contains the substring following the most recent match.
RegExp[Symbol.species]: The constructor function that is used to create derived objects.

Static `RegExp.escape(input_text)`

The RegExp.escape() static method escapes any potential regex syntax characters in a string, and returns a new string that can be safely used as a literal pattern for the RegExp() constructor.

When dynamically creating a RegExp with user-provided content, consider using this function to sanitize the input (unless the input is actually intended to contain regex syntax).

RegExp.escape(string)

where string is the string to escape.

The following examples demonstrate various inputs and outputs for the RegExp.escape() method:

RegExp.escape("Buy it. use it. break it. fix it.");
// "\\x42uy\\x20it\\.\\x20use\\x20it\\.\\x20break\\x20it\\.\\x20fix\\x20it\\."
RegExp.escape("foo.bar"); // "\\x66oo\\.bar"
RegExp.escape("foo-bar"); // "\\x66oo\\x2dbar"
RegExp.escape("foo\nbar"); // "\\x66oo\\nbar"
RegExp.escape("foo\uD800bar"); // "\\x66oo\\ud800bar"
RegExp.escape("foo\u2028bar"); // "\\x66oo\\u2028bar"

The primary use case of RegExp.escape() is when you want to embed a string into a bigger regex pattern, and you want to ensure that the string is treated as a literal pattern, not as regex syntax. Consider the following naïve example that replaces URLs:

function removeDomain(text, domain) {
  return text.replace(new RegExp(`https?://${domain}(?=/)`, "g"), "");
}

const input =
  "Consider using [RegExp.escape()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string.";
const domain = "developer.mozilla.org";
console.log(removeDomain(input, domain));
// Consider using [RegExp.escape()](/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/escape) to escape special characters in a string.

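Inserting the domain as-is produces the pattern https?://developer.mozilla.org(?=/), in which each . is a regex wildcard. The pattern therefore also matches look-alike hosts such as developer-mozilla.org, and URLs on those hosts would be changed too. To fix this, we can use RegExp.escape() to ensure that any user input is treated as a literal pattern: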

function removeDomain(text, domain) {
  return text.replace(
    new RegExp(`https?://${RegExp.escape(domain)}(?=/)`, "g"),
    "",
  );
}

Now this function will do exactly what we intend, and will not transform developer-mozilla.org URLs.
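
The difference between the two versions can be checked against a look-alike domain. RegExp.escape() is a recent addition and may be missing from older runtimes, so the sketch below goes through a hypothetical helper, `escapeLiteral`, that calls it when present and otherwise falls back to a simpler backslash-escaping replace. That fallback is an assumption made for this example and does not reproduce the exact output of RegExp.escape(). The names `removeDomainNaive`, `removeDomainSafe`, and `lookalike` are likewise invented here.

```javascript
// Hypothetical helper: prefer RegExp.escape(), else escape syntax characters by hand.
function escapeLiteral(text) {
  if (typeof RegExp.escape === "function") {
    return RegExp.escape(text);
  }
  // Simplified fallback (an assumption, not equivalent to RegExp.escape()).
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function removeDomainNaive(text, domain) {
  return text.replace(new RegExp(`https?://${domain}(?=/)`, "g"), "");
}

function removeDomainSafe(text, domain) {
  return text.replace(
    new RegExp(`https?://${escapeLiteral(domain)}(?=/)`, "g"),
    "",
  );
}

const lookalike = "https://developer-mozilla.org/en-US/";
const domain = "developer.mozilla.org";

console.log(removeDomainNaive(lookalike, domain)); // "/en-US/" (wrongly stripped)
console.log(removeDomainSafe(lookalike, domain)); // unchanged
console.log(removeDomainSafe("https://developer.mozilla.org/en-US/", domain)); // "/en-US/"
```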

`RegExp` Instance Properties

These properties are defined on RegExp.prototype and shared by all RegExp instances.

RegExp.prototype.constructor: The constructor function that created the instance object. For RegExp instances, the initial value is the RegExp constructor.
RegExp.prototype.dotAll: Whether . matches newlines or not.
RegExp.prototype.flags: A string that contains the flags of the RegExp object.
RegExp.prototype.global: Whether to test the regular expression against all possible matches in a string, or only against the first.
RegExp.prototype.hasIndices: Whether the regular expression result exposes the start and end indices of captured substrings.
RegExp.prototype.ignoreCase: Whether to ignore case while attempting a match in a string.
RegExp.prototype.multiline: Whether or not to search in strings across multiple lines.
RegExp.prototype.source: The text of the pattern.
RegExp.prototype.sticky: Whether or not the search is sticky.
RegExp.prototype.unicode: Whether or not Unicode features are enabled.
RegExp.prototype.unicodeSets: Whether or not the v flag, an upgrade to the u mode, is enabled.

The following property is an own property of each RegExp instance:

lastIndex: The index at which to start the next match.
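
How lastIndex moves during repeated matching is easiest to see with a global regex, and a sticky regex shows that it can also be set by hand. This is a minimal sketch; `wordRe`, `text`, and `sticky` are names chosen for the example.

```javascript
const wordRe = /\w+/g;
const text = "one two";

console.log(wordRe.lastIndex); // 0
console.log(wordRe.exec(text)[0], wordRe.lastIndex); // one 3
console.log(wordRe.exec(text)[0], wordRe.lastIndex); // two 7
console.log(wordRe.exec(text), wordRe.lastIndex); // null 0

// lastIndex is writable: a sticky regex only matches exactly at that position.
const sticky = /two/y;
console.log(sticky.test(text)); // false ("two" does not start at index 0)
sticky.lastIndex = 4;
console.log(sticky.test(text)); // true

// lastIndex lives on the instance itself, not on RegExp.prototype.
console.log(Object.prototype.hasOwnProperty.call(sticky, "lastIndex")); // true
```

Because a failed match resets lastIndex to 0, a global or sticky regex that is reused across different strings can give surprising results unless lastIndex is reset first.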

`RegExp` Instance Methods

RegExp.prototype.compile() (Deprecated): (Re-)compiles a regular expression during execution of a script.
RegExp.prototype.exec(): Executes a search for a match in its string parameter.
RegExp.prototype.test(): Tests for a match in its string parameter.
RegExp.prototype.toString(): Returns a string representing the specified object. Overrides the Object.prototype.toString() method.
RegExp.prototype[Symbol.match](): Matches the regular expression against the given string and returns the match result.
RegExp.prototype[Symbol.matchAll](): Returns all matches of the regular expression against a string.
RegExp.prototype[Symbol.replace](): Replaces matches in the given string with a new substring.
RegExp.prototype[Symbol.search](): Searches for a match in the given string and returns the index at which the pattern was found.
RegExp.prototype[Symbol.split](): Splits the given string into an array of substrings, using the matches as separators.
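
The Symbol-keyed methods are rarely called directly; the String methods match(), matchAll(), replace(), search(), and split() call them on the regex they are given. The sketch below exercises the methods through that route, with a made-up pattern `pairRe` and input `config`.

```javascript
const pairRe = /(\w+)=(\d+)/g;
const config = "width=80, height=24";

// test(): is there a match at all? (a fresh regex avoids lastIndex state)
console.log(/height=\d+/.test(config)); // true

// exec(): one match at a time, with capture groups and index.
const first = pairRe.exec(config);
console.log(first[1], first[2], first.index); // width 80 0
pairRe.lastIndex = 0; // reset before reusing the global regex

// String methods delegate to the Symbol-keyed methods of the regex.
console.log(config.match(pairRe)); // [ 'width=80', 'height=24' ]
console.log([...config.matchAll(pairRe)].map((m) => m[1])); // [ 'width', 'height' ]
console.log(config.replace(pairRe, "$1:$2")); // width:80, height:24
console.log(config.search(/height/)); // 10
console.log(config.split(/,\s*/)); // [ 'width=80', 'height=24' ]

console.log(pairRe.toString()); // /(\w+)=(\d+)/g
```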
