Known as “REs”, “regexes”, or “regex patterns”, Regular Expressions are a standard language for matching text patterns. They are used in many programming languages including Python, JavaScript, PHP, Perl, C#, and more. They are also built into text editors like Visual Studio and into command-line utilities like Grep, Sed, and Awk.
In this walkthrough, we will use Regular Expressions in Visual Studio Code to find and update text.
Resources
Getting Started
- Download SampleText.txt
- Open the file in Visual Studio Code
- Open the search bar (Ctrl+F or Edit > Find)
- Turn on “Use Regular Expression” by clicking the .* button on the search bar or by pressing Alt+R
- Turn on “Match Case” by clicking the Aa button on the search bar or by pressing Alt+C
Character Classes and Anchors
The following are the basic character classes, anchors, and related operators.
. | Any single character (most engines exclude the newline by default) |
\w | Word character: any letter, digit, or underscore (\W is the opposite) |
\b | Boundary between word and non-word |
\d | Decimal digit |
\s | Whitespace (space, tab) |
\n | Newline |
[aeiou] | Custom character set. This example matches any lower-case English vowel. |
[a-z] | Custom character range. This example matches any lower-case Roman letter. |
[^q] | Negative character set. This example matches any single character other than the letter 'q'. |
^ | Beginning of a line |
$ | End of a line |
\ | Escape (treat a special character as a literal) |
| | Alternation ("or"): matches either the pattern on its left or the pattern on its right |
Try the following expressions in the Visual Studio Code search box:
Expression | Matches |
---|---|
a | All occurrences of the letter 'a'. |
ab | All occurrences of the letter 'a' followed by 'b' |
^a | The letter 'a' at the beginning of a line. |
b$ | The letter 'b' at the end of a line. |
[0-9ijk] | Any digit or the letters 'i', 'j', and 'k'. |
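The same expressions can be tried outside the editor. The following is a minimal sketch using Python's re module; the sample text is made up for illustration and is not the contents of SampleText.txt. Note that Python needs the re.MULTILINE flag before ^ and $ match at every line rather than only at the start and end of the whole string.

```python
import re

# Hypothetical sample text (not the walkthrough's SampleText.txt).
text = "abacus lab\nalpha crab\nroom 42 jab"

# 'a' followed by 'b', anywhere in the text.
ab_matches = re.findall(r"ab", text)

# re.MULTILINE lets ^ and $ anchor at each line, not just the whole string.
line_start_a = re.findall(r"^a", text, re.MULTILINE)
line_end_b = re.findall(r"b$", text, re.MULTILINE)

# Any digit, or one of the letters i, j, k.
digit_or_ijk = re.findall(r"[0-9ijk]", text)

print(ab_matches)    # ['ab', 'ab', 'ab', 'ab']
print(line_start_a)  # ['a', 'a']
print(line_end_b)    # ['b', 'b', 'b']
print(digit_or_ijk)  # ['4', '2', 'j']
```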
Repetition
The following patterns let you match a specific number of repetitions.
* | Zero or more occurrences of the preceding pattern. |
+ | One or more occurrences of the preceding pattern. |
? | Zero or one occurrence of the preceding pattern. |
{6} | Exactly six occurrences of the preceding pattern. |
{2,5} | Between 2 and 5 occurrences of the preceding pattern. |
() | Parentheses group a pattern of more than one symbol so that a repetition applies to the whole group. |
? | Makes the preceding repetition non-greedy (see below). |
Try the following expressions in the Visual Studio Code search box:
Expression | Matches |
---|---|
a+ | Any run of one or more consecutive 'a' characters |
(ab)+ | Any series of the letters 'ab' repeating at least once |
r.*m | Any sequence of characters, on a single line, that starts with an 'r' and ends with an 'm' |
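As a rough illustration of the repetition operators, here is a short Python sketch. The sample string is an assumption chosen to exercise each operator, and the colou?r pattern is an extra example that does not appear in the tables above.

```python
import re

# Hypothetical sample text chosen to exercise each quantifier.
text = "baaad abab cab 2024 a1 color colour"

# + : one or more 'a' characters in a row.
runs_of_a = re.findall(r"a+", text)

# (ab)+ : the group 'ab' repeated at least once. re.findall would return only
# the group's contents, so we collect the whole match with finditer instead.
runs_of_ab = [m.group(0) for m in re.finditer(r"(ab)+", text)]

# ? : the 'u' is optional, so both spellings match.
color_words = re.findall(r"colou?r", text)

# {4} and {2,5} : exact and bounded repetition counts.
four_digits = re.findall(r"\d{4}", text)
two_to_five_digits = re.findall(r"\d{2,5}", text)

print(runs_of_a)           # ['aaa', 'a', 'a', 'a', 'a']
print(runs_of_ab)          # ['abab', 'ab']
print(color_words)         # ['color', 'colour']
print(four_digits)         # ['2024']
print(two_to_five_digits)  # ['2024'] (the lone '1' in 'a1' is too short)
```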
By default, repetitions are “greedy”. That is, they match as many repetitions as possible. Following the repetition with a ? causes it to be non-greedy. That is, it matches as few repetitions as possible.
Try this variation on the last pattern from the previous table:
r.*?m | How are the matches different from before? |
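To see the difference between greedy and non-greedy matching in code, consider this small Python sketch. The input string is invented for the example; the two patterns are the ones from the walkthrough.

```python
import re

# Hypothetical one-line sample with several 'r...m' candidates.
text = "rum ram rim"

# Greedy: .* grabs as much as it can, so the match runs from the first 'r'
# to the last 'm' on the line.
greedy = re.findall(r"r.*m", text)

# Non-greedy: .*? stops at the first 'm' it can, giving three short matches.
lazy = re.findall(r"r.*?m", text)

print(greedy)  # ['rum ram rim']
print(lazy)    # ['rum', 'ram', 'rim']
```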
Putting things together
Try these patterns and make up your own:
Expression | Matches |
---|---|
[\w.-]+@[\w.-]+ | Email addresses (but may also match other stuff). |
\d[A-Z]{3}\d{3} | California license plates (one digit, three letters, three digits) |
^[A-Z][a-z]+ [A-Z][a-z]+$ | Most two-word capitalized names. |
^[A-Z][a-z]+ [A-Z]\. [A-Z][a-z]+$ | Most general authority names. |
(Fred)|(George) | Fred or George. |
The general authority names pattern does not match names that start with an initial followed by a middle name. How can you combine two patterns with the | operator to match both name styles?
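Several of the combined patterns from the table can be checked in a few lines of Python. This is only a sketch: the sample strings, the example.com address, and the plate number are all invented. It also shows why the email pattern "may also match other stuff": a trailing period is swallowed into the match.

```python
import re

email = re.compile(r"[\w.-]+@[\w.-]+")
plate = re.compile(r"\d[A-Z]{3}\d{3}")
two_word_name = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")

# The sentence-ending period is part of the character set, so it is matched too.
email_hits = email.findall("Write to jane.doe@example.com.")

# One digit, three capital letters, three digits.
plate_hits = plate.findall("Plate 7ABC123 was seen downtown")

# Only the capitalized two-word form matches this pattern.
name_checks = {
    s: bool(two_word_name.search(s))
    for s in ["Ada Lovelace", "ada lovelace", "Ada M. Lovelace"]
}

# Alternation: either name matches. group(0) is the full matched text.
fred_or_george = [m.group(0) for m in re.finditer(r"(Fred)|(George)", "Fred met George")]

print(email_hits)      # ['jane.doe@example.com.']
print(plate_hits)      # ['7ABC123']
print(name_checks)     # {'Ada Lovelace': True, 'ada lovelace': False, 'Ada M. Lovelace': False}
print(fred_or_george)  # ['Fred', 'George']
```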
Lookahead
Lookahead operators let you specify content that must or must not immediately follow a pattern.
(?=Joe) | The text 'Joe' must immediately follow the pattern, but it is not included in the match. |
(?!Joe) | The text 'Joe' must NOT immediately follow the pattern. |
Expression | Matches |
---|---|
Isaac(?= Asimov) | Isaac if it is immediately followed by Asimov. |
Isaac(?! Asimov) | Isaac if it is NOT immediately followed by Asimov (e.g. Isaac Newton) |
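The lookahead patterns above can be sketched in Python as follows. The sample sentence is made up. The point to notice is that the matched text is only 'Isaac' in both cases, because the lookahead checks what follows without consuming it.

```python
import re

# Hypothetical sample sentence containing both names.
text = "Isaac Asimov and Isaac Newton"

# Positive lookahead: 'Isaac' only where ' Asimov' follows.
followed = [(m.group(0), m.start()) for m in re.finditer(r"Isaac(?= Asimov)", text)]

# Negative lookahead: 'Isaac' only where ' Asimov' does NOT follow.
not_followed = [(m.group(0), m.start()) for m in re.finditer(r"Isaac(?! Asimov)", text)]

print(followed)      # [('Isaac', 0)]
print(not_followed)  # [('Isaac', 17)]
```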
Replacement
A pattern enclosed in parentheses forms a “group”, which may be referenced in the replacement text. In the replacement text, a dollar sign followed by a digit indicates that the value of the corresponding group should be substituted. For example, $1 references the first group in the match.
In Visual Studio Code, press Ctrl+H or choose Edit > Replace to open the replace bar. Make sure regular expressions are turned on.
Expression | Replacement | Effect |
---|---|---|
^([A-Z][a-z]+) ([A-Z][a-z]+)$ | $2 $1 | Swaps first and last names. |
([\w.-]+)@[\w.-]+ | $1@gmail.com | Rewrites every email address to use the gmail.com domain. |
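The same replacements can be sketched in Python with re.sub. One difference to be aware of: Python's re module writes group references in the replacement text as \1 and \2 rather than $1 and $2. The names and addresses below are invented sample data.

```python
import re

# Hypothetical sample data.
names = "Ada Lovelace\nIsaac Newton"
emails = "jane.doe@example.com, bob@work.org"

# Swap first and last names. Python uses \2 \1 where the editor uses $2 $1.
# re.MULTILINE lets ^ and $ anchor on each line.
swapped = re.sub(r"^([A-Z][a-z]+) ([A-Z][a-z]+)$", r"\2 \1", names, flags=re.MULTILINE)

# Keep the part before the @ (group 1) and replace the domain.
rewritten = re.sub(r"([\w.-]+)@[\w.-]+", r"\1@gmail.com", emails)

print(swapped)    # Lovelace Ada
                  # Newton Isaac
print(rewritten)  # jane.doe@gmail.com, bob@gmail.com
```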