Regex: The simple approach

Regular Expressions (RegEx) can be quite complex to understand. For a simple start, I have made a "Regex for Dummies" guide, which you will find below:

Simplifications

To keep the start easy, I made two assumptions:

  1. As an input string, we just look at one line. Multi-line strings are excluded from this tutorial.
  2. A result can only be "True" or "False". "True" means the Regex matches the input string at least once. "False" means the Regex does not match the input string at all.

Overview

In essence, we have an input string and a Regex. We can then check if the Regex matches the string. Example:

Input string: This is a test input string
Regex: npu
Result: True
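
As a minimal sketch, this is how the check could look in Python using the standard re module. The helper name matches is my own choice for illustration and is not part of any library; it simply reduces the result to True or False, as assumed in this guide.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    # re.search scans the whole string and returns None when nothing matches.
    return re.search(pattern, text) is not None

print(matches("npu", "This is a test input string"))  # True
print(matches("xyz", "This is a test input string"))  # False
```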

Literal characters

In the example above, we were using the Regex consisting of the literal characters npu. The Regex is case sensitive, so NPU would give a result of False in the example above.

Literal characters include letters, digits, dashes and many others. In fact, most characters of the ASCII character set are literal characters in the sense of Regex. There are 12 exceptions, which are discussed below.
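
The case sensitivity described above can be tried out with a short Python sketch. The helper matches is a hypothetical name used only for this example. The re.IGNORECASE flag is an extra that this guide does not otherwise cover; it is shown only to make the contrast visible.

```python
import re

def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, flags) is not None

text = "This is a test input string"
print(matches("npu", text))                 # True
print(matches("NPU", text))                 # False, because Regex is case sensitive
print(matches("NPU", text, re.IGNORECASE))  # True, case is ignored on request

# Digits and dashes are literal characters as well.
print(matches("a-1", "sample-a-1"))         # True
```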

Special characters

The following 12 characters have a special meaning:

.              matches any character (exactly one)

\              escape character

*             0, 1 or more times

+             1, 2 or more times

?             0 or 1 time

^             Beginning of input string

$             End of input string

|              Alternation

(              Start grouping

)              End grouping

[              Start character group

{              Start quantifier group
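
To get a feeling for some of these characters, here is a small Python sketch for the dot and for the two anchors ^ and $. The helper matches is a hypothetical name chosen for this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

text = "This is a test"
print(matches("t.st", text))   # True, the dot stands for the "e" in "test"
print(matches("^This", text))  # True, the string begins with "This"
print(matches("^test", text))  # False, "test" is not at the beginning
print(matches("test$", text))  # True, the string ends with "test"
```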

Many of those special characters cannot stand on their own; they relate to the character before them. Example:

Regex: a+b would match ab, but also aaaaaaaab.

The + relates to the character "a" and indicates how many times it may appear in a row (here: at least once).
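
The a+b example can be checked with a few lines of Python. This is only a sketch; the helper matches is a hypothetical name, not a library function.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches("a+b", "ab"))         # True, one "a" followed by "b"
print(matches("a+b", "aaaaaaaab"))  # True, many "a" followed by "b"
print(matches("a+b", "b"))          # False, at least one "a" is required
```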

Escape character

The escape character must be used to "convert" a special character into a literal character. So if you would like, for example, to match the "plus" sign, you have to "escape" it first by inserting the escape character (backslash) just before the special character:

The Regex a\+b would match a+b.
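
Here is a short Python sketch of escaping. The helper matches is a hypothetical name for this example. The pattern is written as a raw string (r"...") so that Python itself leaves the backslash alone. The function re.escape from the standard library is an extra not covered in this guide; it inserts the backslashes for you.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"a\+b", "a+b"))  # True, the plus sign is matched literally
print(matches(r"a\+b", "aab"))  # False, there is no plus sign in the input
print(matches("a+b", "a+b"))    # False, unescaped + means "one or more a"

# re.escape turns every special character in a string into a literal one.
print(re.escape("a+b") == r"a\+b")  # True
```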

Quantifiers

Quantifiers (* + ?) regulate how many times the preceding character must be matched. Example:

Input string: aaaa
Regex: a*
Result: True
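
The three quantifiers can be compared side by side in Python. This is a sketch with a hypothetical helper named matches. Note one consequence of "0 times" being allowed: a* on its own also matches an input string that contains no "a" at all.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches("a*", "aaaa"))         # True
print(matches("a*", "bbb"))          # True, zero occurrences of "a" are allowed
print(matches("ab*c", "ac"))         # True, "b" appears 0 times
print(matches("ab+c", "ac"))         # False, "b" must appear at least once
print(matches("ab+c", "abbbc"))      # True
print(matches("colou?r", "color"))   # True, "u" appears 0 times
print(matches("colou?r", "colour"))  # True, "u" appears 1 time
```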

Quantifier groups

The Quantifiers in the preceding section are shortcuts for Quantifier groups:

* or {0,}            0, 1 or more times

+ or {1,}            1, 2 or more times

? or {0,1}            0 or 1 time

A single number in the braces means "exactly that many times". Example: the Regex a{3} matches aaa, but it does not match aa.
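
Quantifier groups can be tried out in Python as follows. This is a sketch with a hypothetical helper named matches. Keep in mind that we only ask whether the Regex matches somewhere in the input, so a{3} also gives True for aaaa, because aaaa contains aaa.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches("a{3}", "aaa"))       # True
print(matches("a{3}", "aa"))        # False, only two "a" in a row
print(matches("a{3}", "aaaa"))      # True, the input contains "aaa"

# {0,1} behaves like the ? quantifier.
print(matches("ba{0,1}c", "bc"))    # True
print(matches("ba{0,1}c", "bac"))   # True
print(matches("ba{0,1}c", "baac"))  # False, two "a" are one too many
```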

Character Groups

Also called Character Sets or Character Classes, they indicate a range of characters. Example:

[a-z]              exactly 1 lowercase letter
[A-Z]              exactly 1 capital letter
[0-9]              exactly 1 digit
[ae]               a or e
[0-9a-zA-Z]        exactly 1 digit or letter
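
The character groups can be tested with a short Python sketch; the helper matches is a hypothetical name. The last two lines show a common pitfall: a range follows the ASCII order, and between Z and a there are six other characters ([ \ ] ^ _ and the backtick), so a range written as A-z matches more than just letters.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches("[a-z]", "ABC"))   # False, no lowercase letter in the input
print(matches("[A-Z]", "ABC"))   # True
print(matches("[0-9]", "abc"))   # False, no digit in the input
print(matches("[ae]", "grey"))   # True, the input contains an "e"

# Pitfall: A-z also covers the characters between "Z" and "a" in ASCII.
print(matches("[A-z]", "_"))     # True, although "_" is not a letter
print(matches("[A-Za-z]", "_"))  # False
```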

Alternation

The last special character is the pipe |. It lets you match one of several alternative strings. Example:

Input string 1: I have a cat
Input string 2: I have a dog
RegEx: cat|dog
Result: True for both Input strings
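
The same check in Python, as a sketch with a hypothetical helper named matches:

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches("cat|dog", "I have a cat"))   # True
print(matches("cat|dog", "I have a dog"))   # True
print(matches("cat|dog", "I have a bird"))  # False, neither alternative appears
```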

Regex online tester

The website below is an excellent resource for testing your regular expressions:

https://regex101.com

Example: testing for a valid email address

RegEx: [a-zA-Z0-9._%-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]+$

  • We first check for one or more letters/digits, including the characters . _ % -
  • Then we expect the @ sign
  • We then check for the domain name (or subdomain name), which consists of letters/digits, including -
  • Then we expect a dot
  • The group "name followed by a dot" may repeat, which covers further subdomains/domains
  • At the end ($) we expect the top-level domain name, which can only contain letters

Matches:

  • john.doe@example.com
  • great@my-example.de
  • go@mail.example.com

Not matching:

  • john.doe.example.com
  • john.doe@example.com.
  • john.doe@
  • john.doe@.
  • john.doe@example.
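
The lists above can be verified with a short Python sketch. The names EMAIL and matches are my own choices for this example. One assumption to be aware of: the Regex has no ^ at the beginning, so when we only ask whether it matches somewhere in the input, extra characters in front of the address are accepted as well.

```python
import re

# The Regex from this section, written as a raw string.
EMAIL = r"[a-zA-Z0-9._%-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]+$"

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

good = ["john.doe@example.com", "great@my-example.de", "go@mail.example.com"]
bad = [
    "john.doe.example.com",
    "john.doe@example.com.",
    "john.doe@",
    "john.doe@.",
    "john.doe@example.",
]

for address in good + bad:
    print(address, matches(EMAIL, address))

# Without ^ at the start, leading characters are not rejected.
print(matches(EMAIL, "#john.doe@example.com"))        # True
print(matches("^" + EMAIL, "#john.doe@example.com"))  # False
```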
