As a software engineer with over a decade of experience, I've witnessed firsthand the transformative power of regular expressions. While their syntax may appear daunting at first glance, regular expressions are an incredibly valuable tool that can optimize code, enhance performance, and simplify complex string manipulation tasks.

Regular expressions, often abbreviated as regex, are patterns used to match, search, and manipulate strings. They provide a concise and efficient way to perform complex text processing operations that would otherwise require verbose and error-prone code.

One of the primary benefits of using regular expressions is their ability to handle complex matching scenarios with ease. For example, a regular expression can be used to extract specific data from a large block of text, validate user input, or perform advanced search and replace operations.

Moreover, regular expressions can simplify code and, in many cases, speed it up. A single pattern handed to the regex engine can identify and manipulate specific parts of a string, replacing hand-written loops and chains of conditional statements.

Despite their power, regular expressions have often been met with reluctance due to their perceived complexity. However, with a little practice and understanding of their syntax, developers can harness the full potential of regular expressions to enhance their software projects.

In this blog post, we will delve into the fundamentals of regular expressions, exploring their syntax, common use cases, and best practices. By the end of this post, you will have a solid understanding of how to leverage regular expressions to optimize your code and elevate your software development skills.

So, let's dive into the world of regular expressions and unlock the power of efficient and effective string manipulation!

Introduction

Regular expressions (regex) are not just cool 😉 they are an incredibly powerful tool for software developers. They provide a flexible and efficient way to search, replace, match, extract, and split text-based data based on a provided pattern.

With the help of regular expressions, software developers can:

  • Validate user input to ensure it meets specific criteria
  • Extract meaningful information from large blocks of text
  • Perform complex search and replace operations with precision
  • Automate repetitive text manipulation tasks

Despite their power, regular expressions have often been shrouded in mystery and perceived as complex and difficult to master. However, with a little practice and understanding of their syntax, developers can harness the full potential of regular expressions to enhance their software projects.

To dispel the misconception that regular expressions are some kind of "cat magic," let's break down their syntax into manageable chunks:

  • Pattern: A regular expression is essentially a pattern that defines the criteria for matching specific text.
  • Metacharacters: Regular expressions use special characters, called metacharacters, to represent specific elements within the pattern. For example, the "." metacharacter matches any single character, while the "*" metacharacter matches zero or more occurrences of the preceding element.
  • Quantifiers: Quantifiers are used to specify how many times a particular element can occur within the pattern. For example, the "?" quantifier matches the preceding element zero or one time, while the "+" quantifier matches the preceding element one or more times.

By combining these elements, developers can create powerful regular expressions that can handle complex matching scenarios with ease.
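
The metacharacters and quantifiers described above can be tried out directly. The following is a minimal sketch; the variable names and sample words are chosen purely for illustration:

```javascript
// "." matches any single character except a line break.
const anyChar = /h.t/;
console.log(anyChar.test('hat')); // true
console.log(anyChar.test('ht'));  // false, "." needs exactly one character

// "*" allows zero or more of the preceding element.
const zeroOrMore = /^go*gle$/;
console.log(zeroOrMore.test('ggle'));    // true (zero "o")
console.log(zeroOrMore.test('gooogle')); // true (three "o")

// "?" allows zero or one of the preceding element.
const optionalU = /^colou?r$/;
console.log(optionalU.test('color'));   // true
console.log(optionalU.test('colour'));  // true
console.log(optionalU.test('colouur')); // false

// "+" requires one or more of the preceding element.
const oneOrMore = /^\d+$/;
console.log(oneOrMore.test('2024')); // true
console.log(oneOrMore.test(''));     // false
```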

In the next section, we will explore some common use cases of regular expressions and provide practical examples to help you better understand their application in software development.

What Are Regular Expressions?

Regex, short for Regular Expression, is a sequence of characters that defines a search pattern. Regular Expressions are a powerful tool often used to match a specific String, but they can also be used to search, replace, split, and extract patterns from any text String.

Regex is supported in most programming languages and is based on the mathematical concept of regular languages (regular sets).

A Regex can be used to perform four primary operations:

  • Extraction/Find: look through an input String to find pieces that match the provided pattern.
  • Substitute/Replace: look through an input String to find Substrings that match the provided pattern and replace them with the replacement String.
  • Split: cut the String wherever the pattern matches, remove the matched portions, and return an array with the remaining values.
  • Validation/Matching: determine if the provided pattern matches the String, returning a boolean.

Most Used Regex Patterns In JavaScript

Before going into detail and explaining the different operations doable with Regex, you should know the most used Regex Patterns in JavaScript.

  • ^: Matches the start of the input.
  • $: Matches the end of the input.
  • []: Defines a character set.
  • (): Defines a capturing group that groups multiple characters.
  • [abcdef]: Matches any single occurrence of any of the characters abcdef.
  • [a-z]: Matches any single lowercase letter.
  • [A-Z]: Matches any single uppercase letter
  • [a-zA-Z]: Matches any single letter from a-z ignoring the case
  • .: Matches any character except line breaks.
  • ?: Matches 0 or 1 occurrence of the preceding character, set, or group
  • *: Matches 0 or more occurrences of the preceding character, set, or group
  • +: Matches 1 or more occurrences of the preceding character, set, or group
  • {n}: Matches exactly n occurrences of the preceding character, set, or group
  • {n,m}: Matches between n and m occurrences (inclusive) of the preceding character, set, or group. Write it without a space, otherwise it is not treated as a quantifier.
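
To see how these building blocks combine, here is a small sketch. The patterns and sample inputs are made up for illustration and are not taken from any particular project:

```javascript
// ^ and $ anchor the pattern to the whole input: exactly three uppercase letters.
const threeUpper = /^[A-Z]{3}$/;
console.log(threeUpper.test('ABC'));  // true
console.log(threeUpper.test('ABCD')); // false, one letter too many
console.log(threeUpper.test('abc'));  // false, lowercase is not in [A-Z]

// Without anchors the same set and quantifier may match anywhere in the input.
console.log(/[A-Z]{3}/.test('xxABCDxx')); // true

// {n,m} accepts a range of repetitions: here 2 to 4 digits.
const twoToFourDigits = /^[0-9]{2,4}$/;
console.log(twoToFourDigits.test('7'));     // false
console.log(twoToFourDigits.test('42'));    // true
console.log(twoToFourDigits.test('12345')); // false

// () captures parts of the match so they can be read individually.
const isoDate = /^([0-9]{4})-([0-9]{2})-([0-9]{2})$/;
const parts = '2024-05-17'.match(isoDate);
console.log(parts[1], parts[2], parts[3]); // 2024 05 17
```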

How To Extract A Substring With Regex

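Substrings can be extracted from a larger String using the match() function and a Regular Expression in JavaScript. match is a native JavaScript String function that takes a Regex and returns the result of matching the String against it.

The JavaScript match function returns an array containing the matches, or null if nothing matches. With the g flag, the array holds every match; without it, the array holds the first match (followed by any capturing groups) and carries extra properties such as index.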

See the following example to better understand match:

const paragraph = 'The quick brown fox jumps over the lazy dog.';
const regex = {
  uppercase: /[A-Z]/g,
  fox: /fox/
};

console.log(paragraph.match(regex.uppercase)); // Array ["T"]
console.log(paragraph.match(regex.fox)); // Array ["fox"]

In this example, a paragraph String, an uppercase Regex, and a fox Regex are defined:

  • paragraph: A String containing the text: The quick brown fox jumps over the lazy dog.
  • uppercase: A regex that matches any uppercase letter
  • fox: A regex that matches the String "fox"

Next, the two console.log calls use each Regex to retrieve the results from the provided paragraph String. Be aware that match returns an array of Strings even if the result is only one String.
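
The behavior of match depends on the g flag and on whether anything matches at all. The sketch below illustrates the three cases; findAll is a hypothetical helper name introduced here, not a built-in:

```javascript
const paragraph = 'The quick brown fox jumps over the lazy dog.';

// Without the g flag: only the first match, plus details such as its index.
const first = paragraph.match(/fox/);
console.log(first[0], first.index); // fox 16

// With the g flag: every match, but no index or group details.
const all = paragraph.match(/o\w/g);
console.log(all); // ["ow", "ox", "ov", "og"]

// No match at all: match() returns null, not an empty array.
console.log(paragraph.match(/cat/)); // null

// Hypothetical helper that always returns an array, so callers can loop safely.
const findAll = (text, pattern) => text.match(pattern) || [];
console.log(findAll(paragraph, /cat/g).length); // 0
```

Falling back to an empty array is one possible design choice; code that needs to distinguish "no match" from "matched nothing useful" may prefer to check for null explicitly.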

How To Replace A Substring With Regex

Another feature of Regex in JavaScript is replacing part of a String with a new value using the String replace function. Most JavaScript developers know how to use the replace function with two Strings, but it is also possible, and less widely known, to use it with a Regex.

One benefit of using a Regex instead of a simple String within the replace function is that it is possible to ignore the case or to replace every occurrence (global replacement). See the following example:

const regex = /apples/g;
const regexIgnoreCase = /apples/gi;
const regexSingle = /apples/i;
const str = "Apples are round, and apples are juicy.";
const newstr1 = str.replace(regex, "oranges"); // "Apples are round, and oranges are juicy."
const newstr2 = str.replace(regexIgnoreCase, "oranges"); // "oranges are round, and oranges are juicy."
const newstr3 = str.replace(regexSingle, "oranges"); // "oranges are round, and apples are juicy."

This example contains three different Regular Expressions:

  • regex: This Regex will replace "apples" globally but will be case sensitive
  • regexIgnoreCase: This Regex will replace "apples" globally and will be case insensitive
  • regexSingle: This Regex will replace "apples" one time but will be case insensitive

In addition to simple replacements of Strings, it is possible to switch words with Regex in JavaScript:

const re = /(\w+)\s(\w+)/;
const str = "Hoa Nguyen";
const newstr = str.replace(re, "$2, $1");
console.log(newstr); // Nguyen, Hoa

This Regex captures the first word as group 1 and the second word as group 2. In the replacement String, $1 and $2 refer to those groups, so "$2, $1" produces the result String "Nguyen, Hoa".
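
Two variations on the word-switching example may be useful: named capturing groups and a replacer function. This is a sketch under the assumption that the runtime supports named groups (ES2018 or later); the variable names are illustrative:

```javascript
const fullName = 'Hoa Nguyen';

// Named groups: (?<name>...) in the pattern, $<name> in the replacement String.
const namePattern = /(?<first>\w+)\s(?<last>\w+)/;
const swapped = fullName.replace(namePattern, '$<last>, $<first>');
console.log(swapped); // Nguyen, Hoa

// Replacer function: called with the full match followed by each captured group.
const shouted = fullName.replace(
  /(\w+)\s(\w+)/,
  (match, first, last) => `${last.toUpperCase()}, ${first}`
);
console.log(shouted); // NGUYEN, Hoa
```

Named groups tend to keep longer patterns readable, while a replacer function helps when the replacement has to be computed rather than just rearranged.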

How To Split A String With Regex

In JavaScript, any String can be split into an array of Substrings by using the split() function with a separator like " ", ",", or ".". When split() is used with a separator, the input String is split wherever the separator occurs.

const str = 'The quick brown fox jumps over the lazy dog.';
const words = str.split(' ');
console.log(words); // Array ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog."]

In this code snippet, the String str is split wherever a space (" ") is present, resulting in the output array ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog."].

In addition to using a separator, the split() function works with a Regular Expression. By using a Regex, the split() function will split the String wherever the provided Regex matches. This comes in handy if you want to extract specific parts of an input String, for example to split an HTML String into an array containing any ... elements. A Regex is also needed if multiple separators have to be used to split the String.

For example, let's take the previously defined String str and replace the spaces with different whitespace characters (\n, \t, and " "). With this input, split(' ') no longer works as intended because it only splits the String on a single space (" "). By providing the Regex /\s+/ instead, the String is split on any run of whitespace characters.

const str = "The\tquick\tbrown\tfox jumps over\nthe lazy dog";
const words = str.split(/\s+/);
console.log(words); // Array ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

A more complex example can be found in the next code snippet from the MDN web docs:

const names = "Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ";
const regex = /\s*(?:;|$)\s*/;
const nameList = names.split(regex); // Array ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand", ""]

In this example, a rather complex Regex is used to split the String. The split() function searches for zero or more whitespace characters, followed by either a semicolon or the end of the String, followed by zero or more whitespace characters. Every such match is treated as a separator and removed. The resulting array nameList contains the names from the input String names, plus an empty String at the end, because the trailing space in names also matches the pattern (whitespace followed by the end of the String).

By using Regular Expressions as a separator for the split() function, you can split Strings in many ways in a single line of code resulting in a well-structured array.
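
If the empty String at the end of the array is unwanted, one option is to trim the input before splitting. The sketch below shows that, along with two other Regex separators; the sample inputs other than names are invented for illustration:

```javascript
const names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';

// Trim first so the end of the String no longer produces an empty entry,
// then split on semicolons with optional whitespace around them.
const nameList = names.trim().split(/\s*;\s*/);
console.log(nameList);
// ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand"]

// A character set lets several different separators be handled at once.
const fields = 'a,b;c d'.split(/[,; ]/);
console.log(fields); // ["a", "b", "c", "d"]

// A capturing group in the separator keeps the matched separators in the result.
const withSeparators = 'a1b2c'.split(/(\d)/);
console.log(withSeparators); // ["a", "1", "b", "2", "c"]
```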

How To Match A String With Regex

The best-known usage of Regular Expressions is testing whether a String matches a provided pattern by using the test() function of a JavaScript Regular Expression.

For example, checking whether a String is an Email can be done with this code snippet:

const isEmailValid = (email) =>
    /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/.test(email);

isEmailValid('user@example.com'); // true (placeholder address)
isEmailValid('etsta'); // false
isEmailValid(''); // false

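In this example, a function isEmailValid() is created, which uses a fairly complex Regular Expression to check whether a provided String email looks like a valid Email address. The pattern is a simplification: for instance, it rejects addresses whose last domain label is longer than three characters (such as .info).

As the Regular Expression /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/ looks very complex, it will be explained in detail: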

  • The first character / and the last character / delimit the Regular Expression pattern.
  • ^: matches the beginning of the String.
  • \w+: matches word characters (alphanumeric & underscore). The + indicates that there has to be at least one word character.
  • ([\.-]?\w+)*: this group may occur zero or more times (*). Each occurrence can (but does not have to) start with a single . or - and must be followed by at least one word character (+).
  • @: will match the '@' character.
  • \w+: matches word characters (alphanumeric & underscore). The + indicates that there has to be at least one word character.
  • ([\.-]?\w+)*: this group may occur zero or more times (*). Each occurrence can (but does not have to) start with a single . or - and must be followed by at least one word character (+).
  • (\.\w{2,3})+: this group must occur at least once (+). Each occurrence starts with a literal . and is followed by 2 or 3 word characters.
  • $: Matches the end of the String.
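
Running the pattern against a few inputs shows both what it accepts and where it is stricter than real-world addresses. This is a sketch only; all addresses are made-up placeholders, and the constant name emailPattern is introduced here for readability:

```javascript
const emailPattern = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;
const isEmailValid = (email) => emailPattern.test(email);

// Accepted: dots inside the local part and multi-part domains.
console.log(isEmailValid('jane.doe@example.com'));    // true
console.log(isEmailValid('jane@mail.example.co.uk')); // true

// Rejected: two separators in a row, or a domain without a dot.
console.log(isEmailValid('jane..doe@example.com')); // false
console.log(isEmailValid('jane@example'));          // false

// Also rejected, although such addresses can exist in practice:
console.log(isEmailValid('jane@example.info'));     // false, last label has 4 characters
console.log(isEmailValid('jane+tag@example.com'));  // false, "+" is not a word character
```

For that reason a pattern like this is probably best treated as a quick sanity check rather than a definitive validator.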

Conclusion

Regular Expressions are powerful! Using them in your JavaScript project can make text-handling code shorter and more robust. However, overusing them can make the code harder to read for developers who are unfamiliar with Regular Expressions.

By fully understanding Regular Expressions, you should be well-equipped to tackle complex JavaScript projects and build robust, high-quality software.

However, mastering Regular Expressions could take a lot of time and practice, so please don't get discouraged if you do not fully understand them immediately. Keep experimenting with them to further increase your skill.

I recommend either regexr.com or regex101.com, which both have an easily understandable UI and will explain every Regex pattern in detail. Regexr also has developed a Chrome extension that can be installed and used within the browser.

Hopefully, this article gave you a quick and neat overview of how to use Regular Expressions in your JavaScript project.