This tutorial shows several ways to remove punctuation from a string in Python.

How do we remove punctuation from a string in Python? We might have data in a text file that contains a lot of punctuation marks. Processing this data can become difficult if we do not remove the punctuation marks first.

What is a punctuation character?

A punctuation character is any of the characters in the string below (the value of Python's string.punctuation constant):

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
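As a quick check, the same set of characters can be printed straight from the standard library. This is only a small sketch; the backslash escapes that sometimes show up when the string is displayed come from its repr, not from the string itself.

```python
import string

# string.punctuation is a plain str constant with the ASCII punctuation characters
print(string.punctuation)

# repr() shows the escaped form, which is why backslashes may appear in some listings
print(repr(string.punctuation))

# the constant holds 32 characters
print(len(string.punctuation))  # 32
```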

In this tutorial, you will learn how to remove punctuation from a string in Python using the following techniques:

  • Accumulating characters with a for-loop
  • Making use of the replace() method
  • Using regular expressions
  • With the translate() method

The tutorial then wraps up with an example of how to use one of these techniques to remove punctuation from a text file.

Remove Punctuation from String by Accumulating Characters

We will start with the code, then explain what is happening.


from string import punctuation as punct

# the string to remove punctuations from
punct_string = "H[e&ll]o W*o(rl)d!"

# variable to hold final string without punctuation
final_string = ""

# loop over the string with punctuation
for char in punct_string:
    if char not in punct:
        # accumulate character if it is not a punctuation character
        final_string += char

# display the results
print("Remove punctuation by accumulating non punctuation characters in a loop:")
print(f"\tBefore removing punctuation : {punct_string}")
print(f"\tAfter removing punctuation : {final_string}")

Output:

Remove punctuation by accumulating non punctuation characters in a loop:
    Before removing punctuation : H[e&ll]o W*o(rl)d!
    After removing punctuation : Hello World

All the ASCII punctuation characters are defined in the string module in a constant called punctuation, which we import under the name punct. The input string is stored in a variable called punct_string. The for-loop iterates over the string and appends the current character to final_string only if it is not a punctuation character. As the output shows, this technique removes all the punctuation marks from the input string.
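The same accumulation idea can be written more compactly with str.join() and a generator expression. The following is a minimal sketch rather than a replacement for the loop above; the punct_set name is introduced here only for illustration.

```python
from string import punctuation as punct

punct_string = "H[e&ll]o W*o(rl)d!"

# a set gives constant-time membership checks (punct_set is an illustrative name)
punct_set = set(punct)

# keep only the non-punctuation characters and join them in one step
final_string = "".join(char for char in punct_string if char not in punct_set)

print(final_string)  # Hello World
```

Building the result with join() avoids creating a new string on every loop iteration, which can matter for long inputs.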

Remove Punctuation from String using replace() method

The replace() method replaces occurrences of a substring with another string. We can use it to remove all punctuation from a string by replacing each punctuation character with an empty string.

To demonstrate how to use Python's replace() method to remove punctuation, we will make slight changes to the previous code. This time, the condition in the loop checks whether the current character is a punctuation character. If it is, the code calls replace() to replace all occurrences of that character with an empty string. Because strings are immutable, replace() returns a new string, which we assign back to final_string; the for-loop keeps iterating over the original string object.


from string import punctuation as punct

# the string to remove punctuations from
punct_string = "H[e&ll]o W*o(rl)d!"

# variable to hold final string without punctuation
final_string = punct_string

# loop over the string with punctuation
for char in final_string:
    if char in punct:
        # replace all occurrences of char
        final_string = final_string.replace(char, "")

# display the results
print("Remove punctuation by using replace() method:")
print(f"\tBefore removing punctuation : {punct_string}")
print(f"\tAfter removing punctuation : {final_string}")

Output:

Remove punctuation by using replace() method:
    Before removing punctuation : H[e&ll]o W*o(rl)d!
    After removing punctuation : Hello World
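An alternative sketch of the same replace() technique loops over the punctuation characters instead of over the input string. This makes one replace() call per punctuation character regardless of how long the input is. The loop variable name mark is an illustrative choice.

```python
from string import punctuation as punct

punct_string = "H[e&ll]o W*o(rl)d!"
final_string = punct_string

# try every punctuation character once; replace() is a no-op when the character is absent
for mark in punct:
    final_string = final_string.replace(mark, "")

print(final_string)  # Hello World
```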
 

Remove Punctuation from String using Regular Expressions

Python's regular expression module, re, provides search and replace functionality for strings. We can also use it to remove punctuation characters from a string. To carry out this task, we first import the re module, then use its sub() function. The pattern should match any character in punct_string that is neither a word character (a letter, a digit, or an underscore) nor a whitespace character (such as a space or a tab). The next example shows how to specify the pattern and use sub() to replace all punctuation with empty strings.


import re

# the string to remove punctuations from
punct_string = "H[e&ll]o W*o(rl)d!"

# punctuation characters are neither word characters nor whitespace characters
pattern = r"[^\w\s]"

replacement = ""

# use the sub() function for string substitution
final_string = re.sub(pattern, replacement, punct_string)

# display the results
print("Remove punctuation by using regular expressions:")
print(f"\tBefore removing punctuation : {punct_string}")
print(f"\tAfter removing punctuation : {final_string}")

Output:

Remove punctuation by using regular expressions:
    Before removing punctuation : H[e&ll]o W*o(rl)d!
    After removing punctuation : Hello World

The pattern to search for within punct_string is defined as all characters that are neither word characters nor whitespace characters. In regex notation, we write it as [^\w\s]. Square brackets define a character class, and the leading caret (^) negates it, so we can read the pattern like this: “match any character that is NOT a word character and NOT a whitespace character”. Note that writing the class as [\W\S] would not work: it matches any character that is either a non-word character or a non-whitespace character, which is every character, so the whole string would be deleted. Also keep in mind that the underscore counts as a word character, so this pattern leaves underscores in place. As with the previous examples, the matched characters are replaced with empty strings.
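One thing to watch with a pattern based on word and whitespace classes: the underscore is a word character, so a negated class such as [^\w\s] does not remove it. The sketch below, assuming you want exactly the characters in string.punctuation removed, builds the character class from that constant with re.escape(). The sample string and the punct_pattern name are illustrative.

```python
import re
from string import punctuation as punct

sample = "snake_case, kebab-case!"

# [^\w\s] keeps the underscore because "_" is a word character
without_nonword = re.sub(r"[^\w\s]", "", sample)
print(without_nonword)  # snake_case kebabcase

# build a character class from string.punctuation, escaping regex-special characters
punct_pattern = f"[{re.escape(punct)}]"
without_punct = re.sub(punct_pattern, "", sample)
print(without_punct)  # snakecase kebabcase
```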

Remove Punctuation from String using translate() method

This technique uses the translate() method. translate() is a built-in method of Python string (str) objects, so nothing extra needs to be imported to use it; we only import the punctuation constant from the string module, as in the earlier examples. As the example illustrates, translate() removes all occurrences of punctuation characters by making use of a translation table.


from string import punctuation as punct

# the string to remove punctuations from
punct_string = "H[e&ll]o W*o(rl)d!"

# create the translation table (maketrans() is a static method of str)
translation_table = str.maketrans('', '', punct)

# remove the punctuation using translate()
final_string = punct_string.translate(translation_table)

# display the results
print("Using translate() method:")
print(f"\tBefore removing punctuation : {punct_string}")
print(f"\tAfter removing punctuation : {final_string}")

Output:

Using translate() method:
    Before removing punctuation : H[e&ll]o W*o(rl)d!
    After removing punctuation : Hello World

The maketrans() method creates the translation table needed by the translate() method. The first and second arguments are empty strings because we are not replacing any characters with other characters. The third argument is a string containing all the characters we want to delete from punct_string. In this case, the variable containing all the punctuation characters (that is, the punct variable) is passed as the third argument.
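To make the translation table less abstract, the following sketch prints a small one and also shows a variant that maps punctuation to spaces instead of deleting it. The names table and to_space and the sample strings are illustrative.

```python
from string import punctuation as punct

# each key is a character code point; None means "delete this character"
table = str.maketrans("", "", "!?")
print(table)  # {33: None, 63: None}
print("Really?!".translate(table))  # Really

# variant: map every punctuation character to a space instead of deleting it
to_space = str.maketrans(punct, " " * len(punct))
print("high-level".translate(to_space))  # high level
```

Mapping to spaces can be preferable when deleting a hyphen or slash would glue two words together.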

Removing punctuation from a text file

Some machine learning tasks involve cleaning up text data before it becomes useful for analysis. Removing punctuation is one of the common pre-processing steps used to prepare text for analysis. This final example shows how to remove punctuation from a text file. The technique used here is the one involving the translate() method.


from string import punctuation as punct

# read the input.txt file into a string variable; the with-block closes the file for us
with open('input.txt', 'r') as input_file:
    input_from_file = input_file.read()

# display the input
print("input.txt file contents:\n")
print(input_from_file, "\n\n\n")

# create the translation table (maketrans() is a static method of str)
translation_table = str.maketrans('', '', punct)

# remove the punctuation using translate()
final_string = input_from_file.translate(translation_table)

# display the result
print("final string without punctuations: \n")
print(final_string)

Output:

input.txt file contents:

Python is an interpreted high-level general-purpose programming language. 
Its design philosophy emphasizes code readability with its use of significant indentation. 
Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically-typed and garbage-collected. 
It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. 
It is often described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s, as a successor to the ABC programming language, and first released it in 1991 as Python 0.9.0.
Python 2.0 was released in 2000 and introduced new features, such as list comprehensions and a cycle-detecting garbage collection system (in addition to reference counting). 
Python 3.0 was released in 2008 and was a major revision of the language that is not completely backward-compatible. 
Python 2 was discontinued with version 2.7.18 in 2020. 



final string without punctuations: 

Python is an interpreted highlevel generalpurpose programming language 
Its design philosophy emphasizes code readability with its use of significant indentation 
Its language constructs as well as its objectoriented approach aim to help programmers write clear logical code for small and largescale projects
Python is dynamicallytyped and garbagecollected 
It supports multiple programming paradigms including structured particularly procedural objectoriented and functional programming 
It is often described as a batteries included language due to its comprehensive standard library
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 090
Python 20 was released in 2000 and introduced new features such as list comprehensions and a cycledetecting garbage collection system in addition to reference counting 
Python 30 was released in 2008 and was a major revision of the language that is not completely backwardcompatible 
Python 2 was discontinued with version 2718 in 2020

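The text file (input.txt) contains text with a lot of punctuation. After the file is read into memory, its contents (with punctuation) are printed. Then, after translate() removes the punctuation, the final result is displayed. Notice that deleting punctuation also joins hyphenated words ("high-level" becomes "highlevel") and strips the dots from version numbers ("0.9.0" becomes "090").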
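For larger files, it may be preferable to process the text line by line and write the cleaned result to a second file rather than holding everything in one string. The sketch below is one way to do that under stated assumptions: the function name strip_punctuation_file, the output file name, and the UTF-8 encoding are illustrative choices, and a temporary directory is used so the example runs without an existing input.txt.

```python
import tempfile
from pathlib import Path
from string import punctuation as punct

# translation table that deletes every ASCII punctuation character
TABLE = str.maketrans("", "", punct)


def strip_punctuation_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, dropping punctuation."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line.translate(TABLE))


# demo with a throwaway file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "input.txt"
    dst_file = Path(tmp) / "output.txt"
    src_file.write_text("Python is high-level, general-purpose.\n", encoding="utf-8")
    strip_punctuation_file(src_file, dst_file)
    cleaned = dst_file.read_text(encoding="utf-8")

print(cleaned)  # Python is highlevel generalpurpose
```

In real use, the two paths would point at your own input and output files instead of the temporary ones.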

Conclusion

This tutorial showed four different ways to remove punctuation from a string. We used a for-loop to accumulate non-punctuation characters, the replace() method, regular expressions, and finally the translate() method with a translation table. We then worked through an example of removing punctuation from a file.
