Regular Expressions (RegEx) can be quite complex to understand. For a simple start, I have made a "Regex for Dummies" guide, which you will find below:
Simplifications
In order to make it an easy start, I made 2 assumptions:
- As an input string, we just look at one line. Multi-line strings are excluded from this tutorial.
- A result can only be "True" or "False". "True" means the Regex matches the input string at least once. "False" means the Regex does not match the input string at all.
Overview
In essence, we have an input string and a Regex. We can then check if the Regex matches the string. Example:
Input string: This is a test input string
Regex: npu
Result: True
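As a minimal sketch, the check above can be reproduced in Python with the standard re module. The helper name regex_matches is made up for this illustration; re.search is used because it looks for a match anywhere in the string, which fits the True/False simplification of this guide.

```python
import re


def regex_matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the text at least once, anywhere."""
    return re.search(pattern, text) is not None


result = regex_matches("npu", "This is a test input string")
print(result)  # True, because "npu" appears inside "input"
```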
Literal characters
In the example above, we were using the Regex consisting of the literal characters npu. The Regex is case sensitive, so NPU would give a result of False in the example above.
Literal characters include letters, digits, dashes and many others. Most of the ASCII character set consists of literal characters in the Regex sense. There are 12 exceptions, which are discussed below.
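A short Python sketch of case sensitivity and literal characters, assuming the standard re module. The second input string "Order 12-34" is invented for this illustration.

```python
import re

text = "This is a test input string"

# Literal characters must match exactly, including upper/lower case.
lower_found = re.search("npu", text) is not None
upper_found = re.search("NPU", text) is not None

# Digits and dashes are literal characters as well.
dash_found = re.search("2-3", "Order 12-34") is not None

print(lower_found)  # True
print(upper_found)  # False
print(dash_found)   # True
```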
Special characters
The following 12 characters have a special meaning:
. matches any character (exactly one)
\ escape character
* 0, 1 or more times
+ 1, 2 or more times
? 0 or 1 time
^ Beginning of input string
$ End of input string
| Alternation
( Start grouping
) End grouping
[ Start character group
{ Start quantifier group
Many of those special characters cannot stand on their own; they relate to the character before them. Example:
Regex: a+b would match ab, but also aaaaaaaab.
The + relates to character "a" and indicates how many times it should appear in a row.
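The following Python sketch tries a few of the special characters from the list above. The helper name found and the sample input strings (other than ab and aaaaaaaab) are assumptions made for this illustration.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


# + relates to the "a" before it: one or more "a", then "b".
plus_results = [found("a+b", "ab"), found("a+b", "aaaaaaaab"), found("a+b", "b")]
print(plus_results)  # [True, True, False]

# . matches exactly one character of any kind.
print(found("c.t", "cat"))  # True

# ^ and $ tie the match to the beginning and end of the input string.
anchor_results = [
    found("^This", "This is a test"),
    found("^is", "This is a test"),
    found("test$", "This is a test"),
]
print(anchor_results)  # [True, False, True]
```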
Escape character
The escape character must be used to "convert" a special character into a literal character. So if you would like, for example, to match the "plus" sign, you have to "escape" it first by inserting the escape character (backslash) just before the special character:
The Regex a\+b would match a+b.
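A small Python sketch of escaping, assuming the standard re module. Raw strings (r"...") are used so that Python itself leaves the backslash alone. re.escape is a standard helper that inserts the backslashes for you.

```python
import re

# Escaped plus: matches the three literal characters a, +, b.
escaped_found = re.search(r"a\+b", "a+b") is not None

# Unescaped plus: means "one or more a, then b", which "a+b" does not contain.
unescaped_found = re.search(r"a+b", "a+b") is not None

print(escaped_found)    # True
print(unescaped_found)  # False

# re.escape builds the escaped form of a literal string.
escaped_pattern = re.escape("a+b")
print(escaped_pattern)  # a\+b
```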
Quantifiers
Quantifiers (* + ?) regulate how many times the preceding character must be matched. Example:
Input string: aaaa
Regex: a*
Result: True
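The three quantifiers can be compared side by side in a Python sketch like the one below. The helper name found and the inputs bbbb, color, colour and colouur are assumptions for this illustration. Note that a* is also True for a string without any "a", since "0 times" is allowed.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


# * allows zero occurrences, so it even matches a string without "a".
star_results = [found("a*", "aaaa"), found("a*", "bbbb")]
print(star_results)  # [True, True]

# + needs at least one occurrence.
plus_results = [found("a+", "aaaa"), found("a+", "bbbb")]
print(plus_results)  # [True, False]

# ? makes the preceding character optional (0 or 1 time).
optional_results = [
    found("colou?r", "color"),
    found("colou?r", "colour"),
    found("colou?r", "colouur"),
]
print(optional_results)  # [True, True, False]
```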
Quantifier groups
The Quantifiers in the preceding chapter are shortcuts for Quantifier groups:
* or {0,} 0, 1 or more times
+ or {1,} 1, 2 or more times
? or {0,1} 0 or 1 time
A single number in the braces means "exactly that many times". Example: the RegEx a{3} matches aaa, but not aa.
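A Python sketch of quantifier groups follows. The helper name found and the inputs aaaa and b are assumptions for this illustration. Because a result is True as soon as the Regex matches somewhere, a{3} is also True for aaaa; adding ^ and $ restricts the match to the whole input string.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


exact_results = [found("a{3}", "aaa"), found("a{3}", "aa"), found("a{3}", "aaaa")]
print(exact_results)  # [True, False, True]

# With ^ and $ the whole input string must consist of exactly three "a".
anchored_results = [found("^a{3}$", "aaa"), found("^a{3}$", "aaaa")]
print(anchored_results)  # [True, False]

# The shortcuts behave like their quantifier groups.
same_star = found("ba*", "b") == found("ba{0,}", "b")
same_plus = found("ba+", "b") == found("ba{1,}", "b")
same_optional = found("^ba?$", "baa") == found("^ba{0,1}$", "baa")
print(same_star, same_plus, same_optional)  # True True True
```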
Character Groups
Also called Character Sets or Character Classes, they list the characters that are allowed at one position. A character group matches exactly one character from the list. Example:
[a-z] exactly 1 lowercase letter
[A-Z] exactly 1 capital letter
[0-9] exactly 1 digit
[ae] a or e
[0-9a-zA-Z] exactly 1 digit or letter
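A Python sketch of character groups, with the helper name found and all input strings being assumptions for this illustration. The last lines show why the case of the range ends matters: in ASCII, the range A-z (with a lowercase z) also covers the punctuation characters between Z and a, such as the underscore.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


lowercase_results = [found("[a-z]", "ABC"), found("[a-z]", "ABc")]
print(lowercase_results)  # [False, True]

digit_results = [found("[0-9]", "abc"), found("[0-9]", "abc7")]
print(digit_results)  # [False, True]

# [ae] accepts either "a" or "e" at that position.
vowel_results = [found("gr[ae]y", "gray"), found("gr[ae]y", "grey"), found("gr[ae]y", "groy")]
print(vowel_results)  # [True, True, False]

# A-Z stops at the capital Z; A-z silently includes characters like "_".
strict_class = found("[0-9a-zA-Z]", "_")
loose_class = found("[0-9a-zA-z]", "_")
print(strict_class, loose_class)  # False True
```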
Alternation
The last special character is the pipe |. It lets you match one of several alternative strings. Example:
Input string 1: I have a cat
Input string 2: I have a dog
RegEx: cat|dog
Result: True for both Input strings
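The alternation example can be checked with a Python sketch like this one. The helper name found and the third input string "I have a bird" are assumptions for this illustration. The last pattern combines the pipe with grouping, so that the alternation only applies to the part inside the parentheses.

```python
import re


def found(pattern: str, text: str) -> bool:
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


inputs = ["I have a cat", "I have a dog", "I have a bird"]

alternation_results = [found("cat|dog", text) for text in inputs]
print(alternation_results)  # [True, True, False]

# Grouping limits the alternation to "cat" or "dog" at the end of the sentence.
grouped_results = [found("^I have a (cat|dog)$", text) for text in inputs]
print(grouped_results)  # [True, True, False]
```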
Regex online tester
The web site below is an excellent resource for testing your expressions with regex:
Example for testing a valid email address
RegEx: [a-zA-Z0-9._%-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]+$
- We first check for one or more letters or digits, including ._%-
- Then we expect the @ sign
- We then check for the domain name (or subdomain name), made of letters or digits, including -
- Then we expect a dot
- Further subdomain/domain names, each followed by a dot, may follow
- At the end we expect the top-level domain name, which can only contain letters
Matches:
Not matching:
- john.doe.example.com
- [email protected].
- john.doe@
- john.doe@.
- john.doe@example.
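The email RegEx can be tried out with a Python sketch like the one below. The names EMAIL, STRICT_EMAIL and looks_like_email are made up for this illustration, and the two matching addresses are assumed examples, not taken from the list above. The RegEx as written has no ^ at the beginning, so it also accepts input with unexpected characters in front of the address; the STRICT_EMAIL variant with a leading ^ is an assumption about how one might tighten it.

```python
import re

# The RegEx from this section, unchanged.
EMAIL = re.compile(r"[a-zA-Z0-9._%-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]+$")

# Assumed stricter variant: ^ forces the match to start at the beginning.
STRICT_EMAIL = re.compile(r"^[a-zA-Z0-9._%-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]+$")


def looks_like_email(text: str, pattern: re.Pattern = EMAIL) -> bool:
    """True if the pattern matches somewhere in the text."""
    return pattern.search(text) is not None


# Assumed examples of addresses that should match.
matching = ["john.doe@example.com", "john.doe@mail.example.com"]
# Inputs from the "Not matching" list that survive in this guide.
not_matching = ["john.doe.example.com", "john.doe@", "john.doe@.", "john.doe@example."]

match_results = [looks_like_email(text) for text in matching]
no_match_results = [looks_like_email(text) for text in not_matching]
print(match_results)     # [True, True]
print(no_match_results)  # [False, False, False, False]

# Without ^, leading junk is ignored; with ^ it is rejected.
loose = looks_like_email("!!john.doe@example.com")
strict = looks_like_email("!!john.doe@example.com", STRICT_EMAIL)
print(loose, strict)  # True False
```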