A regular expression (RegEx) is an extremely powerful tool for processing and extracting character patterns from text. Regular expressions are fast and help you avoid unnecessary loops in your program when matching and extracting the information you need. In this post, we will show you how to use regular expressions in Python to solve certain types of problems.

No prior knowledge of regular expressions is required to follow this post. Let's look at how you can use RegEx to solve various text-processing problems. In this post we focus on extracting words from strings. To start using regular expressions in Python, you need to import Python's re module:

    import re

We have divided this post into 3 sections that are not strictly related to each other, so you can head straight to any one of them, but if you are not familiar with RegEx, we suggest you follow the post in order. We will use the findall function provided by the re module throughout this post.

Using the "|" Operator to Extract All Occurrences of Specific Words

Let's assume you have the following paragraph of text, which describes various cities, and you want a list of all occurrences of a particular city.

    text = "[...] It's the capital of the state of Tamil Nadu. Chennai has an area close to 430 kilometer squares. Well chennai is not as large as mumbai which has an area of 603.4 kilometer squares. By road, Chennai is about 1500 kilometers away from Mumbai. Whereas, it is about 2200 kilometers away from Delhi, the capital of India."

To extract all the occurrences of Chennai, you can do something like this:

    cities_record = 'Chennai'
    re.findall(cities_record, text)

Here, findall is a function in re that takes two parameters: the first is the pattern to search for, in this case 'Chennai', and the second is the string to search in. It returns all non-overlapping matches of the pattern (held in the cities_record variable) in the string (held in the text variable) as a list of strings. The code above will therefore return a list containing every occurrence of the word 'Chennai' in our string.

But wait a second. Our document had Chennai occurring 4 times, yet the list only shows 2. If you look carefully at the paragraph, you will see that the third time, the name of the city was written as "chennai" with a lowercase 'c'.

By default, regular expressions are case sensitive. So how do you capture 'chennai' too in the same pass? This gives us an opportunity to introduce the third parameter of findall, flags. You can set its value to re.IGNORECASE as follows:

    cities_record = 'Chennai'
    re.findall(cities_record, text, flags=re.IGNORECASE)

By setting the flags parameter to re.IGNORECASE, you are telling the regex engine to ignore case while performing the search. Running this code returns the lowercase "chennai" along with the capitalized occurrences.

Now, along with Chennai, you want to extract all occurrences of the city name "Mumbai" from this paragraph of text. You can do this by using the | operator in your pattern:

    cities_record = 'Chennai|Mumbai'
    re.findall(cities_record, text, flags=re.IGNORECASE)

So essentially, | is a "special character" telling regex to search for pattern one or pattern two in the provided text. What if you want to search for occurrences of '|' itself in your document? Since '|' has a special meaning, you need to write it in your pattern with a backslash, as \|. The backslash \ tells regex to read the next character literally, without applying its special meaning.

With this search, it doesn't matter whether the name of the city is written as "mUMBAI", "MUMBAI", "CHENNAI" or "cHENNAI" in your document. All these cases will be captured, as long as the city name is spelled correctly. If you want to include more cities in your search, you can add them using the | operator in the same way.

Extracting Words Containing only Alphabets

There are times when you want to extract only the words that consist entirely of letters. A good example is a text document that lists the names of all the fruits and vegetables a person bought, along with the quantity in kilograms, in the following format:

    text = "\
    Banana 1.051 48.25\
    Apple 1.024 180.54\
    Carrot 0.524 47.20\
    Radish 0.251 27.14\
    Tomato 0.508 41.05"

To extract only the names of the fruits and vegetables that were bought, you can create a pattern using a character class that contains only letters. The pattern will be as follows:

    words_pattern = '[a-z]+'

In this pattern, [a-z] denotes a class of characters from a to z. The + operator denotes one or more occurrences of this character class.
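As a quick recap, the techniques covered in this post can be put together in one short sketch. The sample strings and the variable names (cities_text, shopping_text and so on) are made up for this illustration and are not the post's own data. The sketch also assumes that the letters-only pattern is combined with re.IGNORECASE so that capitalized names are matched.

```python
import re

# Hypothetical sample paragraph, written only for this sketch.
cities_text = (
    "Chennai is on the coast. Well chennai is smaller than mumbai. "
    "By road, Chennai is far from Mumbai | Delhi is farther."
)

# Case-sensitive search: the lowercase "chennai" is skipped.
exact = re.findall('Chennai', cities_text)
print(exact)  # ['Chennai', 'Chennai']

# flags=re.IGNORECASE makes the same pattern match any casing.
any_case = re.findall('Chennai', cities_text, flags=re.IGNORECASE)
print(any_case)  # ['Chennai', 'chennai', 'Chennai']

# The | operator means "either the left pattern or the right one".
either_city = re.findall('Chennai|Mumbai', cities_text, flags=re.IGNORECASE)
print(either_city)  # ['Chennai', 'chennai', 'mumbai', 'Chennai', 'Mumbai']

# A backslash turns | into an ordinary character to search for.
pipes = re.findall(r'\|', cities_text)
print(pipes)  # ['|']

# Hypothetical shopping list: name, then two numeric columns per line.
shopping_text = "Banana 1.051 48.25\nApple 1.024 180.54\nCarrot 0.524 47.20"

# [a-z]+ matches runs of letters; IGNORECASE lets it match capitals too.
names = re.findall('[a-z]+', shopping_text, flags=re.IGNORECASE)
print(names)  # ['Banana', 'Apple', 'Carrot']
```

Writing the escaped pattern as a raw string (r'\|') keeps Python from treating the backslash as a string escape before the regex engine sees it.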