Why You Should Learn Regex (Regular Expressions) - It's Not Just for Code

I spent a significant part of my career thinking of regular expressions as tools only for complex pattern matching in code—and the inevitable debugging nightmare when I had to revisit that code later. Now I know how helpful they are for tasks that support writing code, and I use them regularly for the massive productivity boost they can provide.

This isn’t a tutorial on how to use regular expressions—there are already plenty of those out there.

Using Regular Expressions in Code

Let’s start with some quick thoughts on using regular expressions in code before we cover the main topic.

Many developers are familiar with the famous quote from Jamie Zawinski about regular expressions from 1997:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

There is some truth to this. Using regular expressions in code can be problematic if not done wisely. This post isn’t about using regular expressions in code, but I will state the following about doing so:

  1. Always document what a regular expression does, either with a comment or, preferably, by wrapping it in a function whose name describes what the pattern checks.

So, instead of doing this:

// Some C# code here.
// Assume we have a variable called accountNumber that contains a customer's account number.

if (Regex.IsMatch(accountNumber, @"^\d{4}-\d{3}-\d{4}-\d{2}-[A-Z]$"))
{
    // Some more logic here if validation passes.
}

Write something more like this:


// Some C# code here.
// Assume we have a variable called accountNumber that contains a customer's account number.

if (IsValidAccountNumber(accountNumber))
{
    // Some more logic here if validation passes
}

/// <summary>
/// <para>Ensures accountNumber matches pattern 9999-999-9999-99-X</para>
/// <para>Where 9 is any number 0 through 9 and X is any letter A-Z, upper case.</para>
/// </summary>
/// <param name="accountNumber">The account number to verify</param>
/// <returns>True if accountNumber matches the expected format; otherwise, false.</returns>
public static bool IsValidAccountNumber(string accountNumber)
{
    return Regex.IsMatch(accountNumber, @"^\d{4}-\d{3}-\d{4}-\d{2}-[A-Z]$");
}

  2. Use regular expressions sparingly in code and favor library functions for pattern matching when possible.

In the previous example, we’re using an imagined, proprietary account number format. A regular expression makes sense in this case. However, many modern libraries include built-in functions for common validation and parsing tasks. Examples include date parsing with functions like .NET’s DateTime.TryParse() and PHP’s DateTimeImmutable::createFromFormat(), and numeric parsing with functions like .NET’s Decimal.TryParse() and Java’s Double.parseDouble().

Favor whatever is available in your language or its libraries before resorting to using regular expressions in code. However, do use them when they make sense.
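The same idea carries over to other languages. Here is a minimal sketch in Python (not the C# used above) that leans on the standard library instead of a pattern. The helper names parse_date and parse_amount are hypothetical, and the default date format is an assumption for illustration only.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_date(text: str, fmt: str = "%Y-%m-%d") -> Optional[datetime]:
    """Return a datetime if text matches fmt, otherwise None (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # Raised for a wrong layout and for impossible dates alike.
        return None


def parse_amount(text: str) -> Optional[Decimal]:
    """Return a Decimal if text is a valid decimal number, otherwise None."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


print(parse_date("2024-02-29"))  # a real leap day
print(parse_date("2023-02-29"))  # not a real date, so None
print(parse_amount("19.99"))
print(parse_amount("19.9.9"))    # not a number, so None
```

One thing a library call buys you here: a simple pattern like \d{4}-\d{2}-\d{2} would happily accept 2023-02-29, while the date parser rejects it because that day doesn't exist.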

A More Powerful Reason to Learn Regex: Productivity

There is, in my opinion, a much more important reason to learn regular expressions than for pattern matching in code. That reason is productivity.

Let’s look at a quick example using Visual Studio Code. For this example, we’ll be using a CSV file called SampleAccounts.csv. In our fictional example, we’ll pretend we received this data from the Sales team and now we need to import it into a database table or some other system that supports CSV. Also, let’s add a little more to the puzzle and say that the last character in the account number is a code indicating the region the customer is in. Account numbers ending in “-A” represent a region called “East” and account numbers ending in “-B” represent a region called “West”.

The contents of SampleAccounts.csv

Name,AccountNumber,PremiumSubscription,Active
Customer One,1234-123-1234-12-A,True,True
Customer Two,2234-223-1234-12-B,True,True
Customer Three,3234-323-1234-12-A,True,True
Customer Four,4234-423-1234-12-A,True,True
Customer Five,5234-423-1234-12-B,True,True

Let’s look at the specs in the README.txt for our fictional import utility that we will use to load the CSV:

This import utility is used to import data into the customer management database.
Data must be in CSV format.
The columns must be in the following order: AccountNumber,Name,PremiumSubscription,Active,Region

We have two problems here (no pun intended, given the famous quote referenced earlier). First, the import utility requires the columns in the CSV to be in a different order than what we received from the Sales team. Second, it requires a Region column, which our file doesn’t have.

Since there are only 5 customer rows in this CSV file, it won’t take much time to edit it by hand. In a real-world scenario though, the CSV could have hundreds or thousands of rows in it.

We’re software professionals so data manipulation like this is easy, right? We’ll just write a quick utility program to reformat the data. We can probably get that done in under an hour! Sound familiar?

Let’s consider two scenarios:

Scenario #1:

What we don’t realize at the time is that this import is going to be a manual process that we’ll have to do for a long time. Every time the Sales team sends us the CSV data, the columns are going to be in a different order because new versions of the Sales team’s application are released regularly. More regions are going to be added every few months. Unbeknownst to our past selves, this pattern continues for years, and leadership keeps bumping full automation of this process in favor of more important projects. If we knew all of this up-front, maybe writing a custom utility would make sense, but right now we don’t know any of that. So, let’s just write our utility to reformat the data and call it good. We end up spending an extra hour updating the logic every time we get a new export from the Sales team.

Scenario #2:

We only have to do this once and never again. So, we just spent an hour building a throw-away utility.

This is how I used to think about problems like this, until I learned what regular expressions can do. A custom utility might take 15 minutes to an hour to write, depending on your experience, plus debugging time to troubleshoot any problems. And because we aren’t aware that the format of the data from the Sales team is going to change frequently, it won’t be robust and will have to be updated every time we need to use it.

All of this can be done in a few minutes with regular expressions, using the regex-aware find and replace that is already built into most text editors and IDEs.

Yes, I know I could ask an AI agent like ChatGPT, Claude, or Gemini to help with this type of data manipulation, but this isn’t always an option. Maybe the CSV contains sensitive personal information that we ethically or legally can’t give to an AI agent. Furthermore, data manipulation like this can sometimes still take longer to get right with AI assistance than it would to just use regular expressions in a text editor that supports them.

So, let’s take a look at how to reformat this data using regular expressions in Visual Studio Code.

Here is a screenshot of the file, opened in Visual Studio Code:

[Screenshot: CSV file opened in Visual Studio Code showing account data]

First, let’s get the existing columns in the right order. We need to go from:

Name,AccountNumber,PremiumSubscription,Active

To:

AccountNumber,Name,PremiumSubscription,Active

We’ll deal with the Region column in a second pass.

NOTE: Always make a backup copy, or ensure the existence of one, before making changes.

The first thing to do is press Ctrl+H to open the Find and Replace option in Visual Studio Code:

[Screenshot: Find and Replace option in Visual Studio Code]

If we hover over the option with the “.*” icon, we can see from the tooltip “Use Regular Expressions (Alt+R)” that this is the button to activate regular expressions.

[Screenshot: Tooltip for Use Regular Expressions in Visual Studio Code]

Click that button to activate regular expression functionality in the find/replace feature.

Then enter the following regular expression into the find box:

(.+?)(,)(.+?)(,)(.+?)(,)(.+)

While it might look a bit intimidating at first, this pattern is approachable once you understand the basics of regular expressions.

Each pair of parentheses creates a capturing group, which stores the matched text. You can refer to these groups using placeholders like $1, $2, etc., in the Replace box.

The . matches any single character except a newline by default. The + matches one or more occurrences of the preceding token, in this case the “.”. The ? makes the preceding + “non-greedy” (also called lazy), meaning it matches as few characters as possible before the next part of the pattern matches. Without the ? after the +, a group would grab as much of the line as it could and stop only at the last comma that still lets the rest of the pattern match.

The comma is not a metacharacter in this context, so simply writing a comma in the regular expression will match a literal comma in the data.

We basically continue this pattern until we have a regular expression that matches our entire line.
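If you want to see lazy versus greedy matching outside the editor, here is a small sketch using Python's re module on one row of the sample data. Python is my choice for illustration only; its pattern syntax for these particular constructs behaves the same way as described above.

```python
import re

line = "Customer One,1234-123-1234-12-A,True,True"

# Lazy: (.+?) stops at the first comma it can.
lazy = re.match(r"(.+?)(,)", line)
# Greedy: (.+) runs on to the last comma on the line.
greedy = re.match(r"(.+)(,)", line)

print(lazy.group(1))    # Customer One
print(greedy.group(1))  # Customer One,1234-123-1234-12-A,True

# The full pattern from the Find box splits the line into seven groups:
# four columns and the three commas between them.
full = re.match(r"(.+?)(,)(.+?)(,)(.+?)(,)(.+)", line)
groups = full.groups()
for number, text in enumerate(groups, start=1):
    print(number, repr(text))
```

Groups 1, 3, 5, and 7 hold the column values, and groups 2, 4, and 6 each hold a comma.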

In our example, which is fairly simple, we need to swap the positions of the first two columns. To do this, we’ll use the placeholders and write the following in the Replace box:

$3$2$1$2$5$2$7

This uses the placeholders to tell the find/replace engine in Visual Studio Code how to reorder the data:

$3 inserts the text captured by the 3rd pair of parentheses in the regular expression used in the Find box. In our case, this is column 2 in the CSV, the account number.

$2 inserts the text captured by the 2nd pair of parentheses, which is the “,”. Since this is the delimiter for the entire line, we can reuse it as the delimiter between columns in the replacement.

$1 inserts the text captured by the 1st pair of parentheses, which is column 1, the name.

We repeat this pattern until every column is accounted for in the replacement string: $5 is the third column, and $7 is the fourth.
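To check the find and replace pair before touching a real file, the same reordering can be sketched in Python. One difference to be aware of: Python's re.sub writes group references as \1, \2, and so on, rather than the $1, $2 used in Visual Studio Code. The variable names below are my own.

```python
import re

FIND = r"(.+?)(,)(.+?)(,)(.+?)(,)(.+)"
# Python spells the editor's $3$2$1$2$5$2$7 as \3\2\1\2\5\2\7.
REPLACE = r"\3\2\1\2\5\2\7"

original = """Name,AccountNumber,PremiumSubscription,Active
Customer One,1234-123-1234-12-A,True,True
Customer Two,2234-223-1234-12-B,True,True"""

# "." does not match a newline, so each line is matched and rewritten on its own.
reordered = re.sub(FIND, REPLACE, original)
print(reordered)
```

The header row goes through the same replacement as the data rows, which is what we want here since the header needs reordering too.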

Now, we’ll click the Replace All button to apply the regex-assisted find/replace.

And just like that, our columns have been reordered.

[Screenshot: Reordered columns in Visual Studio Code]

It did take a little trial and error for me to get the find pattern right, and especially to work out the correct order of the placeholders in the Replace box. Even so, it only took about 5 minutes, which is much faster than programming a custom utility to do the same.

Now, let’s add the Region column using regular expressions too. This one is a little bit simpler.

First, let’s add the Region column header to the data:

AccountNumber,Name,PremiumSubscription,Active,Region
1234-123-1234-12-A,Customer One,True,True
2234-223-1234-12-B,Customer Two,True,True
3234-323-1234-12-A,Customer Three,True,True
4234-423-1234-12-A,Customer Four,True,True
5234-423-1234-12-B,Customer Five,True,True

Now, enter the following in the find box:

(.+-A.+)

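This matches any line that contains “-A” with at least one character before it and after it. In our data, that only happens at the end of the account number, so the pattern selects exactly the East rows. It would also match a line where “-A” appears in another column, such as a hyphenated customer name, so check the matches with Find before replacing.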

And enter this into the Replace field:

$1,East

Click Replace All and our East regions have been added to the appropriate lines:

AccountNumber,Name,PremiumSubscription,Active,Region
1234-123-1234-12-A,Customer One,True,True,East
2234-223-1234-12-B,Customer Two,True,True
3234-323-1234-12-A,Customer Three,True,True,East
4234-423-1234-12-A,Customer Four,True,True,East
5234-423-1234-12-B,Customer Five,True,True

This works because the parentheses capture the entire line wherever “-A” occurs, $1 reinserts that captured line in the replacement, and the literal “,East” is appended to the end of it.
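The loose pattern is fine for consistent data like ours, but it is worth seeing where it could go wrong. The Python sketch below adds one hypothetical row (a West customer named Mary-Ann Smith, not part of the original file) and compares the pattern from the Find box with a stricter variant of my own that ties “-A” to the end of the account number in the first column. As before, Python writes the group reference as \1 instead of $1.

```python
import re

rows = """1234-123-1234-12-A,Customer One,True,True
2234-223-1234-12-B,Customer Two,True,True
6234-623-1234-12-B,Mary-Ann Smith,True,True"""  # last row is hypothetical

# The pattern from the Find box: any line with "-A" somewhere inside it.
loose = re.sub(r"(.+-A.+)", r"\1,East", rows)

# A stricter variant (an assumption, not from the original walkthrough):
# "-A" must end the account number at the start of the line.
strict = re.sub(
    r"^(\d{4}-\d{3}-\d{4}-\d{2}-A,.+)$",
    r"\1,East",
    rows,
    flags=re.MULTILINE,  # let ^ and $ match at each line boundary
)

print(loose)
print()
print(strict)
```

With the loose pattern, the Mary-Ann row is tagged East even though its account number ends in “-B”, because “-A” appears in the name. The stricter pattern leaves that row alone.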

To do the same for the West region, enter this in the Find field to match accounts that contain “-B”:

(.+-B.+)

And enter this into the replace field:

$1,West

Click Replace All again, and our West region has been added:

AccountNumber,Name,PremiumSubscription,Active,Region
1234-123-1234-12-A,Customer One,True,True,East
2234-223-1234-12-B,Customer Two,True,True,West
3234-323-1234-12-A,Customer Three,True,True,East
4234-423-1234-12-A,Customer Four,True,True,East
5234-423-1234-12-B,Customer Five,True,True,West

Now, our CSV is reformatted and meets the requirements of our fictional import utility. It only took a few short minutes to work out the regular expression format and we were able to do this without having to use any special tools, other than Visual Studio Code. Given this example, you can see how much time regular expressions can save.

A few caveats:

  1. There are a few different types of regular expression engines, referred to as “flavors”. The syntax between them varies to some degree.
  2. Make a backup of the original file, or ensure the existence of one, before starting the manipulation phase.
  3. “Find” and “Undo” are your friends. Work through the regular expression incrementally, periodically using Find to make sure it’s matching what you expect. Use Undo if the replacement didn’t do what you expected it to do.
  4. If the data to be manipulated is too large or complex, a custom written transformation utility may be a better choice.
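For the last caveat, here is a rough idea of what a small transformation utility could look like, sketched in Python with the standard csv module. The function name reformat_accounts and the REGION_BY_CODE mapping are hypothetical, and the sketch assumes the incoming file has exactly the four columns from the example.

```python
import csv
import io

# Region codes from the example; the mapping name is an assumption.
REGION_BY_CODE = {"A": "East", "B": "West"}
TARGET_COLUMNS = ["AccountNumber", "Name", "PremiumSubscription", "Active", "Region"]


def reformat_accounts(source_text: str) -> str:
    """Reorder columns by header name and add a Region column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=TARGET_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        code = row["AccountNumber"][-1]       # last character is the region code
        row["Region"] = REGION_BY_CODE[code]  # raises KeyError on an unknown code
        writer.writerow(row)
    return out.getvalue()


sample = (
    "Name,AccountNumber,PremiumSubscription,Active\n"
    "Customer One,1234-123-1234-12-A,True,True\n"
    "Customer Two,2234-223-1234-12-B,True,True\n"
)
result = reformat_accounts(sample)
print(result)
```

Because the rows are read by header name rather than by position, this version would keep working if the Sales team shuffled the column order. A new region code would still need a new entry in the mapping.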

I hope this gave you an idea of how powerful regular expressions are and the types of things they can be used for beyond code. This kind of utility work in text editors and IDEs is where regular expressions really shine, and understanding how they work can pay dividends in the long run. Once I took the time to learn just the very basics of regular expressions, my productivity improved significantly. If this has inspired you to learn more, I would encourage you to check out the same tutorial I stumbled upon years ago at regular-expressions.info, which is still available to this day.

 
 

