Replace Occurrences of a Substring in a String with Python
Replacing all or n occurrences of a substring in a given string is a fairly common problem in string manipulation and text processing in general. Luckily, most of these tasks are made easy in Python by its vast array of built-in functions, including this one. Let's say we have a string that contains the following sentence:
The brown-eyed man drives a brown car.
Our goal is to replace the word "brown" with the word "blue":
The blue-eyed man drives a blue car.
In this article, we'll be using the replace() method as well as the sub() and subn() functions with patterns to replace all occurrences of a substring in a string.
The simplest way to do this is by using the built-in replace() string method:
string.replace(oldStr, newStr, count)
The first two parameters are required, while the third one is optional.
oldStr is the substring we want to replace with newStr. What's worth noting is that the method returns a new string with the transformation applied, without affecting the original one.
Let's give it a try:
string_a = "The brown-eyed man drives a brown car."
string_b = string_a.replace("brown", "blue")
print(string_a)
print(string_b)
We've performed the operation on string_a, packed the result into string_b and printed them both.
This code results in:
The brown-eyed man drives a brown car.
The blue-eyed man drives a blue car.
Again, the string in memory that string_a is pointing to remains unchanged. Strings in Python are immutable, which simply means you can't change a string once it has been created. However, you can re-assign the variable to a new value.
To seemingly perform this operation in place, we can simply re-assign string_a to itself after the operation:
string_a = string_a.replace("brown", "blue")
print(string_a)
Here, the new string generated by the replace() method is assigned to the string_a variable.
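Because re-assignment only re-binds the name, any other variable that referred to the original string still sees the old text. A small sketch of that behavior (the name alias is purely illustrative):

```python
string_a = "The brown-eyed man drives a brown car."
# A second name bound to the very same string object
alias = string_a

# replace() builds a new string; only string_a is re-bound to it
string_a = string_a.replace("brown", "blue")

print(alias)     # The brown-eyed man drives a brown car.
print(string_a)  # The blue-eyed man drives a blue car.
```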
Replace n Occurrences of a Substring
Now, what if we don't wish to change all occurrences of a substring? What if we want to replace the first n?
That's where the third parameter of the replace() method comes in. It represents the maximum number of occurrences that are going to be replaced, counting from the left. The following code only replaces the first occurrence of the word "brown" with the word "blue":
string_a = "The brown-eyed man drives a brown car."
string_a = string_a.replace("brown", "blue", 1)
print(string_a)
And this prints:
The blue-eyed man drives a brown car.
If count is omitted, all occurrences are replaced.
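To see how count behaves at its edges, here's a quick sketch on a made-up sentence with three matches: a count smaller than the number of matches, a count of 0, and a count larger than the number of matches.

```python
sentence = "brown, brown, brown"

# Replace at most two occurrences, scanning left to right
first_two = sentence.replace("brown", "blue", 2)
# A count of 0 replaces nothing
unchanged = sentence.replace("brown", "blue", 0)
# A count larger than the number of matches simply replaces them all
too_many = sentence.replace("brown", "blue", 10)

print(first_two)  # blue, blue, brown
print(unchanged)  # brown, brown, brown
print(too_many)   # blue, blue, blue
```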
Substring Occurrences with Regular Expressions
To escalate the problem even further, let's say we want to not only replace all occurrences of a certain substring, but replace all substrings that fit a certain pattern. Even this can be done with a one-liner, using regular expressions and the standard library's re module.
Regular expressions are a complex topic with a wide range of uses in computer science, so we won't go too much in-depth in this article, but if you need a quick start you can check out our guide on Regular Expressions in Python.
In essence, a regular expression defines a pattern. For example, let's say we have a text about people who own cats and dogs, and we want to replace both terms with the word "pet". First, we need to define a pattern that matches both terms, like (cat|dog).
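Before substituting anything, it can help to check what a pattern actually matches. A quick sketch using re.findall(), which this article doesn't otherwise cover; the sample sentences are made up:

```python
import re

# The alternation (cat|dog) matches either literal; the parentheses form group 1
pattern = r'(cat|dog)'

# With a single group, re.findall() returns the text captured by that group
matches = re.findall(pattern, "Mark owns a dog and Mary owns a cat.")
print(matches)  # ['dog', 'cat']

# The pattern matches substrings, so it also fires inside longer words
partial = re.findall(pattern, "The category of dogma")
print(partial)  # ['cat', 'dog']

# \b (word boundary) restricts the match to whole words
whole_words = re.findall(r'\b(cat|dog)\b', "The category of dogma")
print(whole_words)  # []
```

Whether matching inside longer words is acceptable depends on the text you're processing; the replacement functions below behave the same way.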
Using the sub() Function
With the pattern sorted out, we're going to use the re.sub() function, which has the following syntax:
re.sub(pattern, repl, string, count=0, flags=0)
The first argument is the pattern we're searching for (a string or a Pattern object), repl is what we're going to insert (it can be a string or a function; if it's a string, any backslash escapes in it are processed), and string is the string we're searching in.
The optional arguments are count and flags, which indicate how many occurrences need to be replaced and the flags used to process the regular expression, respectively. The default count of 0 replaces all occurrences.
If the pattern doesn't match any substring, the original string is returned unchanged. Applying our pattern to a sentence about cats and dogs looks like this:
import re
string_a = re.sub(r'(cat|dog)', 'pet', "Mark owns a dog and Mary owns a cat.")
print(string_a)
This code prints:
Mark owns a pet and Mary owns a pet.
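The optional count argument, the two kinds of repl, and the no-match case can all be tried on the same sentence. A minimal sketch; the replacement strings are arbitrary choices for illustration:

```python
import re

text = "Mark owns a dog and Mary owns a cat."

# count=1: only the first match (from the left) is replaced
first_only = re.sub(r'(cat|dog)', 'pet', text, count=1)
print(first_only)  # Mark owns a pet and Mary owns a cat.

# A string repl may refer back to groups; \1 is the text matched by group 1
tagged = re.sub(r'(cat|dog)', r'pet \1', text)
print(tagged)  # Mark owns a pet dog and Mary owns a pet cat.

# A function repl receives a Match object and returns the replacement text
shouted = re.sub(r'(cat|dog)', lambda m: m.group(1).upper(), text)
print(shouted)  # Mark owns a DOG and Mary owns a CAT.

# No match: the original string comes back unchanged
no_match = re.sub(r'(cat|dog)', 'pet', "Mark owns a fish.")
print(no_match)  # Mark owns a fish.
```

Passing count as a keyword argument keeps the call readable and avoids mixing it up with flags, which comes right after it in the signature.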
Case-Insensitive Pattern Matching
To perform case-insensitive pattern matching, for example, we'll set the flags parameter to re.IGNORECASE:
import re
string_a = re.sub(r'(cats|dogs)', "Pets", "DoGs are a man's best friend", flags=re.IGNORECASE)
print(string_a)
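For comparison, here's a sketch of the same substitution without the flag, with it, and with the equivalent inline (?i) flag placed at the very start of the pattern:

```python
import re

text = "DoGs are a man's best friend"

# Without the flag, "DoGs" doesn't match the lowercase alternative "dogs"
case_sensitive = re.sub(r'(cats|dogs)', "Pets", text)
# With re.IGNORECASE, any mix of upper and lower case matches
case_insensitive = re.sub(r'(cats|dogs)', "Pets", text, flags=re.IGNORECASE)
# The same flag written inline at the start of the pattern
inline_flag = re.sub(r'(?i)(cats|dogs)', "Pets", text)

print(case_sensitive)    # DoGs are a man's best friend
print(case_insensitive)  # Pets are a man's best friend
print(inline_flag)       # Pets are a man's best friend
```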
Now any case combination of "cats" or "dogs" will also be matched. When matching the pattern against multiple strings, we can compile it once into a Pattern object with re.compile() to avoid copying it in multiple places. Pattern objects also have a sub() method with the syntax:
Pattern.sub(repl, string, count=0)
Using Pattern Objects
Let's define a Pattern for cats and dogs and check a couple of sentences:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.sub("Pets", "Dogs are a man's best friend.")
string_b = pattern.sub("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
Which gives us the output:
Pets are a man's best friend.
Animals enjoy sleeping.
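Since Pattern.sub() has no flags parameter, flags such as re.IGNORECASE are passed to re.compile() instead. A sketch that compiles one case-insensitive pattern and reuses it across a small, made-up list of sentences:

```python
import re

# Flags are given to re.compile(); Pattern.sub() itself takes no flags argument
pattern = re.compile(r'(cats|dogs)', re.IGNORECASE)

sentences = ["Dogs are a man's best friend.", "cats enjoy sleeping.", "Birds sing."]
# Reuse the one compiled pattern for every sentence
results = [pattern.sub("Pets", s) for s in sentences]

for line in results:
    print(line)
```

Sentences without a match, like the last one, pass through unchanged.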
The subn() Function
There's also an re.subn() function with the syntax:
re.subn(pattern, repl, string, count=0, flags=0)
The subn() function returns a tuple containing the new string and the number of substitutions made:
import re
string_a = re.subn(r'(cats|dogs)', 'Pets', "DoGs are a man's best friend", flags=re.IGNORECASE)
print(string_a)
The tuple looks like:
("Pets are a man's best friend", 1)
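The substitution count is handy for checking whether anything changed at all. A sketch that unpacks the tuple directly (the variable names are illustrative):

```python
import re

# Unpack the (new_string, number_of_substitutions) tuple directly
new_text, n = re.subn(r'(cat|dog)', 'pet', "Mark owns a dog and Mary owns a cat.")
print(new_text)  # Mark owns a pet and Mary owns a pet.
print(n)         # 2

# A count of 0 tells us the pattern never matched
_, n_fish = re.subn(r'(cat|dog)', 'pet', "Mark owns a fish.")
if n_fish == 0:
    print("no substitutions made")
```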
The Pattern object contains a similar subn() method:
Pattern.subn(repl, string, count=0)
And it's used in a very similar way:
import re
pattern = re.compile(r'(Cats|Dogs)')
string_a = pattern.subn("Pets", "Dogs are a man's best friend.")
string_b = pattern.subn("Animals", "Cats enjoy sleeping.")
print(string_a)
print(string_b)
This results in:
("Pets are a man's best friend.", 1)
('Animals enjoy sleeping.', 1)
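When count limits the replacements, the number that subn() reports is the number of substitutions actually made, not the number of possible matches. A short sketch on a made-up sentence with three matches:

```python
import re

pattern = re.compile(r'(Cats|Dogs)')

# With count=2, only the first two of the three matches are replaced
result = pattern.subn("Pets", "Dogs and Cats and Dogs", count=2)
print(result)  # ('Pets and Pets and Dogs', 2)
```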
Python offers easy and simple functions for string handling. The easiest way to replace all occurrences of a given substring in a string is to use the replace() method.
If needed, the standard library's re module provides a more diverse toolset that can be used for more niche problems like finding patterns and case-insensitive searches.