Searches are a fundamental part of life on the web. Search is so essential that some people no longer even think about the “search” in “search engine”. Consider the fortune Google has amassed from the basic need to quickly scan the ocean of data we call the Internet. While you probably won’t challenge Google’s dominance any time soon, that shouldn’t deter you from learning about regular expressions and how they can help you, no matter which programming language you prefer.
According to Wikipedia,
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that define a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory. (https://en.wikipedia.org/wiki/Regular_expression)
Regular expressions are used in search engines, word processors, text editors, and a whole slew of other applications. They are so useful that most programming languages support Regex, either built in or through libraries.
How it works
To get the most out of Regex, it is important to know how it works. Conceptually, a pointer moves progressively through the search string. When it comes across a character that matches the beginning of the regular expression, it stops. A second pointer then starts at that position and moves forward character by character, checking at each step whether the pattern still matches. If it reaches the end of the pattern and the pattern still holds, we have found a match. If it fails at any point, the second pointer is discarded and the main pointer continues through the string. Let’s look at some examples to illustrate the process.
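The two-pointer scan described above can be sketched for the simplest case, a pattern made only of literal characters. This is an illustrative sketch, not how any particular regex engine is implemented; the function name find_literal is hypothetical, and real engines handle metacharacters, backtracking, and many optimizations on top of this idea.

```python
import re


def find_literal(pattern, text):
    """Return the start index of each non-overlapping occurrence of a literal pattern."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    matches = []
    i = 0  # main pointer: candidate start position in the text
    while i <= len(text) - len(pattern):
        j = 0  # second pointer: how much of the pattern has matched so far
        while j < len(pattern) and text[i + j] == pattern[j]:
            j += 1
        if j == len(pattern):
            # Reached the end of the pattern: record the match and resume after it.
            matches.append(i)
            i += len(pattern)
        else:
            # Mismatch: discard the second pointer and advance the main pointer.
            i += 1
    return matches


text = "Freedom isn’t free. It’s a hefty fricking fee."
positions = find_literal("fr", text)

# Cross-check against Python's built-in re module.
expected = [m.start() for m in re.finditer("fr", text)]
print(positions == expected)
```

For a literal pattern, the hand-written scan and re.finditer should report the same start positions.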
Below are a few samples of how Regex analyzes strings for patterns.
Matching a specific sequence of characters.
Here I would like to check for every instance of “fr”:
Freedom isn’t free. It’s a hefty fricking fee.
You will notice it found two examples of “f” followed by “r”. Why didn’t it trigger on the word “Freedom”? Regex is case-sensitive by default. The pattern asks for a lowercase “f”, so “F” is not recognized as part of the match.
Hint: In tools and languages that write patterns between slashes, you can make a query case-insensitive by adding the i flag after the closing slash, for example /fr/i.
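As a rough illustration in Python, the same literal search can be run with the standard re module. Python does not use the slash syntax, so case-insensitivity is requested with the re.IGNORECASE flag instead. The variable names here are just for the example.

```python
import re

text = "Freedom isn’t free. It’s a hefty fricking fee."

# Case-sensitive search: only lowercase "f" followed by "r" matches.
case_sensitive = re.findall(r"fr", text)

# Case-insensitive search: "Fr" in "Freedom" now matches as well.
case_insensitive = re.findall(r"fr", text, flags=re.IGNORECASE)

print(case_sensitive)    # ['fr', 'fr']
print(case_insensitive)  # ['Fr', 'fr', 'fr']
```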
Searching for any character
Metacharacters are characters that have a special meaning during pattern processing. The dot is a basic example of this. It matches any single character (in most engines, anything except a line break by default). Here I am searching for “b”, followed by any character, followed by “t”, which is written b.t:
Baggins begged for his butter but instead I battled the bastard.
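A minimal Python sketch of this search, assuming the pattern is b.t as described in the text:

```python
import re

text = "Baggins begged for his butter but instead I battled the bastard."

# "b", then any single character, then "t".
dot_matches = re.findall(r"b.t", text)

print(dot_matches)  # ['but', 'but', 'bat']
```

The matches come from “butter”, “but”, and “battled”. “bastard” is skipped because the character two places after the “b” is “s”, not “t”.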
We can look for multiple “any characters” by stacking the dots.
Baggins begged for his bagel but instead I blessed his burger.
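The exact pattern used for this sentence is not shown, so the two patterns below, b..e and b...e, are assumptions chosen only to show how each extra dot demands one more character between the “b” and the “e”.

```python
import re

text = "Baggins begged for his bagel but instead I blessed his burger."

# Two dots: "b", any two characters, then "e".
two_dots = re.findall(r"b..e", text)

# Three dots: "b", any three characters, then "e".
three_dots = re.findall(r"b...e", text)

print(two_dots)    # ['bage']
print(three_dots)  # ['begge', 'burge']
```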
Metacharacters allow for flexible search options and are the backbone of how Regex patterns are built.
Other Simple Metacharacters
Let’s see how other metacharacters can be used to search for specific instances.
Search the beginning of a string with ^
This might be that. That isn’t this. That could be this. This is not that.
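To sketch the effect of ^ in Python, assume the pattern is ^This (the original pattern is not shown). Without the anchor, “This” is found twice in the sentence; with it, only the occurrence at the very start counts.

```python
import re

text = "This might be that. That isn’t this. That could be this. This is not that."

# Unanchored: every capitalized "This" in the string.
anywhere = [m.start() for m in re.finditer(r"This", text)]

# Anchored with ^: only a "This" at position 0 can match.
at_start = [m.start() for m in re.finditer(r"^This", text)]

print(len(anywhere))  # 2
print(at_start)       # [0]
```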
Search the end of a string with $
Are you ready Freddy? Freddy said he’s ready. That punctuation ruined ready
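A comparable sketch for $, assuming the pattern is ready$ (again, the original pattern is not shown). The word “ready” appears three times, but only the final one sits at the end of the string, and a trailing period is enough to prevent a match.

```python
import re

text = "Are you ready Freddy? Freddy said he’s ready. That punctuation ruined ready"

# Unanchored: every "ready" in the string.
all_ready = re.findall(r"ready", text)

# Anchored with $: only a "ready" at the very end of the string.
at_end = re.findall(r"ready$", text)

# The period after "ready" means the string no longer ends with the word.
with_period = re.search(r"ready$", "Freddy said he’s ready.")

print(len(all_ready))       # 3
print(at_end)               # ['ready']
print(with_period is None)  # True
```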
Exact String Match
^Love Me Tender$
I want you to love me tender. Love Me Tender is a song. I think Elvis sang Love Me Tender
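Combining ^ and $ means the whole string has to match the pattern, with nothing before or after it. The sketch below treats each sentence as a separate candidate string and adds one candidate, the bare title, that is not in the sample text; both choices are assumptions made for illustration.

```python
import re

pattern = re.compile(r"^Love Me Tender$")

candidates = [
    "I want you to love me tender.",
    "Love Me Tender is a song.",
    "I think Elvis sang Love Me Tender",
    "Love Me Tender",  # added for illustration: the only exact match
]

# Keep only the candidates that match from the first character to the last.
exact_matches = [s for s in candidates if pattern.search(s)]

print(exact_matches)  # ['Love Me Tender']
```

In Python specifically, $ also matches just before a single trailing newline, so re.fullmatch may be the safer choice when a strict whole-string match is needed.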
As you can see, regular expressions are integral to exploring data, and there are many ways to query for patterns. The examples above are just a few to get you started. Below are a few links to help you move on to intermediate and more advanced techniques. Given Regex’s usefulness and availability, there is no excuse to ignore one of the most essential programming tools. Happy coding!
RegExr (interactive Regex tester and reference) https://regexr.com/
Regex Cheatsheet https://www.rexegg.com/regex-quickstart.html