
Introduction

When working with strings or large amounts of text, you are probably going to encounter situations where you need to count how many times a specific substring occurred within another string.

In this article, we'll take a look at how to use JavaScript to count the number of substring occurrences in a string. We will look at the various approaches and methods for obtaining that number.

But before we begin, let's first define what a substring is.

What Is a Substring?

A substring is a clearly defined sequence of consecutive characters in a string. For example, if we have the string "My name is John Doe", then "name is" is a substring, but "is name" is not because it is no longer a consecutive sequence (we've changed the order of words). Individual words such as "is" and "name" are always substrings.

Note: "y name is Jo" is also a valid substring of "My name is John Doe". In other words, substrings are not always whole words, and they can be far less readable.
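
To make the definition concrete, here is a quick check using the built-in includes() string method, which returns true only when its argument appears as a consecutive run of characters. The variable name sentence is just an illustrative choice:

```javascript
const sentence = "My name is John Doe";

// Consecutive characters in the original order: a substring
console.log(sentence.includes("name is"));      // true
// Same words in a different order: not a substring
console.log(sentence.includes("is name"));      // false
// Substrings do not have to line up with word boundaries
console.log(sentence.includes("y name is Jo")); // true
```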

There are many ways to count substring occurrences in JavaScript, but the two main approaches are the split() method and regular expressions.

Count the Number of Substrings in String With split() Method

split() is a JavaScript string method that splits a string into an array of substrings while leaving the original string unchanged. It accepts a separator and splits the string wherever that separator occurs. If no separator is supplied, split() returns an array with only one element - the original string.

Note: Probably the most obvious example of the separator is the blank space. When you provide it as a separator for the split() method, the original string will be sliced up whenever a blank space occurs. Therefore, the split() method will return an array of individual words from the original string.
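
As a small sketch of the behavior described above (the sample text and variable names are made up for illustration), splitting on a blank space and splitting with no separator look like this:

```javascript
const phrase = "John has five oranges";

// Splitting on a blank space yields the individual words
const words = phrase.split(" ");
console.log(words); // ['John', 'has', 'five', 'oranges']

// With no separator, the whole string comes back as a single element
const whole = phrase.split();
console.log(whole); // ['John has five oranges']

// The original string is left untouched
console.log(phrase); // 'John has five oranges'
```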

In this article, we'll use one handy trick to get the number of occurrences of a substring in a string. We'll set the substring to be the separator in the split() method. That way, we can extract the number of occurrences of the substring from the array that the split() method returned:

let myString = "John Doe has 5 oranges while Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
let mySubString = "orange";

let count = myString.split(mySubString).length - 1;
console.log(count); // 3

The code above returned 3, even though myString contains the standalone lowercase word "orange" only once. Let's inspect what happened by examining the array created after we've split the original string with "orange" as the separator:

console.log(myString.split(mySubString));

This will give us:

['John Doe has 5 ', 's while Jane Doe has only 2 ', 's, Jane gave Mike 1 of her ', ' so she is now left with only 1 Orange.']

Essentially, the split() method removed all occurrences of the string "orange" from the original string and sliced it in those places where the substring was removed.

Note: Notice how that applies to the string "oranges" - the "orange" is its substring, therefore, split() removes "orange" and leaves us only with "s".

Since we've found three occurrences of the string "orange", the original string was sliced in three places, which produced four substrings. That's why we need to subtract 1 from the array length when we calculate the number of occurrences of the substring.

That's all good, but there is one more orange in the original string: the last word is "Orange". Why haven't we counted it in the previous example? Because the split() method is case-sensitive, it treats "orange" and "Orange" as different strings. If you need to make your code case-insensitive, a good solution is to first convert both the string and the substring to the same case before checking for occurrences:

let myString = "John Doe has 5 oranges while Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";
let mySubString = "ORANGE";

let myStringLC = myString.toLowerCase();
let mySubStringLC = mySubString.toLowerCase();

let count = myStringLC.split(mySubStringLC).length - 1;
console.log(count); // 4

Finally, we can make our code reusable by wrapping it in a function:

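const countOccurence = (string, word) => {
    // Lowercase both values so the comparison is case-insensitive
    const stringLC = string.toLowerCase();
    const wordLC = word.toLowerCase();

    // Splitting on the word yields one more piece than there are occurrences
    const count = stringLC.split(wordLC).length - 1;

    return count;
};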
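
As a usage sketch, here is a variant of the function above, renamed countSubstring purely for illustration, with one extra guard that the original does not include. Splitting on an empty string breaks the text into single characters, so without the guard an empty search term would return the string length minus one. Returning 0 in that case is an assumption about the desired behavior:

```javascript
const countSubstring = (string, word) => {
    // Assumption: an empty search term should count as zero occurrences
    if (word === "") return 0;

    const stringLC = string.toLowerCase();
    const wordLC = word.toLowerCase();

    return stringLC.split(wordLC).length - 1;
};

const text = "One orange, two Oranges, three ORANGES";
console.log(countSubstring(text, "orange")); // 3
console.log(countSubstring(text, "apple"));  // 0
console.log(countSubstring(text, ""));       // 0

// split() consumes each match, so overlapping occurrences are not counted
console.log(countSubstring("aaaa", "aa"));   // 2
```

Note that the last call returns 2 rather than 3: the approach counts non-overlapping occurrences only.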

Count the Number of Substrings in String With RegEx

Another method for counting the number of occurrences is to use regular expressions (RegEx). They are patterns of characters used to search, match, and validate strings. Probably the most common use case for regular expressions is form validation - checking whether the string is a (valid) email, a phone number, etc. But in this article, we'll use it to count the number of occurrences of a substring in a string.

If you want to get to know more about regular expressions in JavaScript, you should read our comprehensive guide - "Guide to Regular Expressions and Matching Strings in JavaScript".

First of all, we need to define a regular expression that will match the substring we are looking for. Assuming we want to find the number of occurrences of the string "orange" in a larger string, our regular expression will look as follows:

let regex = /orange/gi;

In JavaScript, we write a regular expression pattern between two forward slashes - /pattern/. Optionally, after the second forward slash, you can put a list of flags - special characters used to alter the default behavior when matching patterns. For example, by default, regular expressions match only the first occurrence of the pattern in a search string. Also, matching is case-sensitive, which may not be what we want when searching for substrings. Because of that, we'll introduce the two flags we'll be using in this article:

  • g - makes sure that we get all occurrences of the pattern (not just the first one)
  • i - makes sure that matching is case-insensitive

Note: Based on your needs, you can choose what flags you will use. These are not mandatory.
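
To see what each flag changes, the following sketch runs the same pattern with different flag combinations against a made-up sample string:

```javascript
const sample = "Orange juice, orange peel, ORANGE zest";

// No flags: case-sensitive, and only the first match is reported
console.log(sample.match(/orange/)[0]);  // 'orange'

// g only: every case-sensitive match
console.log(sample.match(/orange/g));    // ['orange']

// i only: first match, ignoring case
console.log(sample.match(/orange/i)[0]); // 'Orange'

// g and i together: every match, ignoring case
console.log(sample.match(/orange/gi));   // ['Orange', 'orange', 'ORANGE']
```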

Now, let's use a previously created regular expression to count the number of occurrences of the string "orange" in the myString:

let myString = "John Doe has 5 oranges while Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";

let regex = /orange/gi;
let count = (myString.match(regex) || []).length;

console.log(count); // 4

Note: We've added || [] because match() returns null when there is no match. Falling back to an empty array means the number of occurrences will be 0 in that case.
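
The fallback matters because match() returns null, not an empty array, when the pattern is not found. A short sketch with an illustrative string that contains no match:

```javascript
const fruitText = "Only apples here";

// match() returns null when nothing matches
const result = fruitText.match(/orange/gi);
console.log(result); // null

// Reading .length from null would throw a TypeError, so fall back to []
const safeCount = (result || []).length;
console.log(safeCount); // 0
```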

Alternatively, we can use the RegExp() constructor to create a regular expression. It accepts a search pattern as the first argument, and flags as the second:

let myString = "John Doe has 5 oranges while Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";

let regex = new RegExp("orange", "gi");
let count = (myString.match(regex) || []).length;

console.log(count); // 4

Additionally, we can make this reusable by wrapping it in a separate function:

let countOcurrences = (str, word) => {
    // Note: word is interpreted as a regular expression pattern
    const regex = new RegExp(word, "gi");
    const count = (str.match(regex) || []).length;
    return count;
};
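
One caveat with the constructor approach: the search term is treated as a pattern, so characters such as . or ? keep their special regex meaning. If the term should be matched literally, it can be escaped first. The sketch below uses a hypothetical helper named escapeRegExp (it is not a built-in function in this example) together with a widely used escaping pattern; countLiteral and the sample string are likewise illustrative names:

```javascript
// Hypothetical helper: prefix regex metacharacters with a backslash
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const countLiteral = (str, word) => {
    const regex = new RegExp(escapeRegExp(word), "gi");
    return (str.match(regex) || []).length;
};

const prices = "1.5 kg for $1.5, or 125 g for $0.2";

// Unescaped, "." matches any character, so "125" is counted too
console.log((prices.match(new RegExp("1.5", "gi")) || []).length); // 3
// Escaped, only the literal text "1.5" is counted
console.log(countLiteral(prices, "1.5")); // 2
```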

Strict Matching Exact Phrases

Sometimes, you want to match for a strict phrase or word - so that "oranges" isn't included in your counts, or any word that includes "orange" in itself, but isn't strictly "orange". This is a more specific use case of searching for strings within strings, and is fortunately fairly easy!

let regex = /\Worange\W/gi;

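By wrapping our term in \W on both sides, we require a non-word character (anything other than a letter, digit, or underscore) immediately before and after "orange". With the i flag the match is case-insensitive, and this regex matches only twice in our sentence (neither "oranges" is matched). Keep in mind that \W has to match an actual character, so an "orange" at the very start or end of the string would not be counted; the word-boundary assertion \b does not have that limitation.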
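
As an alternative sketch, not the pattern used above, the word-boundary assertion \b matches a position rather than a character, so it gives the same count on our sentence and also handles a term at the very start or end of the string. The variable names wholeWord and single are illustrative:

```javascript
const myString = "John Doe has 5 oranges while Jane Doe has only 2 oranges, Jane gave Mike 1 of her orange so she is now left with only 1 Orange.";

// \b matches the position between a word character and a non-word character
const wholeWord = /\borange\b/gi;
console.log((myString.match(wholeWord) || []).length); // 2

// Unlike \W, \b also works when the term starts or ends the string
const single = "orange";
console.log((single.match(/\Worange\W/gi) || []).length); // 0
console.log((single.match(/\borange\b/gi) || []).length); // 1
```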

Benchmarking Performance

When we ran both methods through JS Benchmark, the split() approach came out faster than the regex approach, though the difference is barely noticeable even for fairly large text corpora. You'll probably be fine using either.

Note: Do not rely on these benchmarks as your final decision. Instead, test them out yourself to determine which one is the best fit for your specific use case.
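
If you want to run a rough comparison yourself, a minimal timing sketch might look like the one below. It assumes an environment where performance.now() is available globally (modern browsers and recent Node.js versions); the corpus, the run count, and the helper name timeIt are arbitrary choices for illustration, and the timings will vary between machines and engines:

```javascript
// Build a large test string by repeating a sentence (sizes are arbitrary)
const corpus = "Jane gave Mike 1 orange and kept 2 Oranges. ".repeat(50000);

// Run a function several times and return the total elapsed milliseconds
const timeIt = (fn, runs = 20) => {
    const start = performance.now();
    for (let i = 0; i < runs; i++) fn();
    return performance.now() - start;
};

const viaSplit = () => corpus.toLowerCase().split("orange").length - 1;
const viaRegex = () => (corpus.match(/orange/gi) || []).length;

// Both approaches should agree before their timings are compared
console.log(viaSplit() === viaRegex()); // true

console.log(`split: ${timeIt(viaSplit).toFixed(1)} ms`);
console.log(`regex: ${timeIt(viaRegex).toFixed(1)} ms`);
```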

Conclusion

In this article, we learned about two standard methods for calculating the number of occurrences of substrings in a string. We also benchmarked the results, noting that it doesn't really matter which approach you take as long as it works for you.

Reference: stackabuse.com
