Chapter 7 – Pattern Matching with Regular Expressions Flashcards Preview

Flashcards in Chapter 7 – Pattern Matching with Regular Expressions Deck (23):
1

Let's say we coded a function to check whether a string is a phone number



def isPhoneNumber(text):
    if len(text) != 12:                  # e.g. '415-555-4242' is 12 characters
        return False
    for i in range(0, 3):                # area code: three digits
        if not text[i].isdecimal():
            return False
    if text[3] != '-':                   # first dash
        return False
    for i in range(4, 7):                # next three digits
        if not text[i].isdecimal():
            return False
    if text[7] != '-':                   # second dash
        return False
    for i in range(8, 12):               # last four digits
        if not text[i].isdecimal():
            return False
    return True

2

Then we want it to scan any string and find the phone numbers in it

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]              # 12-character window starting at i
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')

3

Now let's do the same search for phone numbers in one step,
using a regular expression

\d\d\d-\d\d\d-\d\d\d\d

A regular expression is called a regex for short.
This one looks for three digits, a dash, three more digits, another dash, and four more digits.

4

{ } and regex

\d{3}-\d{3}-\d{4}

is the same as

\d\d\d-\d\d\d-\d\d\d\d
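To check that the two spellings really behave the same, here is a minimal sketch. The sample strings and the helper name matched_text are made up for illustration and are not part of the deck.

```python
import re

# Two spellings of the same pattern: repeated \d versus {n} repetition counts.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

# Hypothetical sample strings, chosen only for illustration.
samples = ['415-555-4242', 'Call 415-555-1011 now', '555-4242', 'no digits here']

def matched_text(regex, text):
    """Return the matched text, or None when the pattern is not found."""
    mo = regex.search(text)
    return mo.group() if mo else None

results_long = [matched_text(long_form, s) for s in samples]
results_short = [matched_text(short_form, s) for s in samples]
print(results_long == results_short)  # prints True
```

Both patterns find the full 12-character numbers and both skip '555-4242', because each requires three digits, a dash, three digits, a dash, and four digits.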

5

What module is used for regular expressions (regex)?
The re module

import re

6

re.compile()
Why would we use a raw string?

>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Turns a string value into a Regex object

A raw string means we don't have to escape every backslash, and backslashes are common in regexes
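A small sketch of what the raw string saves you. The variable names escaped and raw are chosen here for illustration; the point is only that both literals spell the same characters.

```python
import re

# Without the r prefix, each backslash must be doubled to survive
# Python's own string escaping before the regex engine sees it.
escaped = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'
raw = r'\d\d\d-\d\d\d-\d\d\d\d'

same_string = (escaped == raw)
print(same_string)  # prints True

phone_regex = re.compile(raw)
mo = phone_regex.search('My number is 415-555-4242.')
found = mo.group()
print(found)  # prints 415-555-4242
```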

7

\d

Digit

8

search()

>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242

9

search()

A Regex object’s search() method searches the string it is passed for any matches to the regex. The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object.
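Because search() can return None, calling group() directly on the result fails when nothing matches. One way to guard against that is sketched below; the helper name find_phone is hypothetical.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phone_regex.search(text)
    if mo is None:          # pattern not found, so there is no Match object
        return None
    return mo.group()       # the actual matched text

hit = find_phone('My number is 415-555-4242.')
miss = find_phone('No number in this sentence.')
print(hit)   # prints 415-555-4242
print(miss)  # prints None
```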

10

What does search() return?

A Match object, or None if the pattern is not found. Match objects have a group() method that will return the actual matched text from the searched string.

11

group()
Once search() has found a match and returned a Match object, you call this method on it.

It returns the group whose number is passed as the argument. In this case there is no argument, so it returns the entire matched text.

>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
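As a rough illustration of the "no argument" case: group() and group(0) both give the whole match, and asking for a numbered group the pattern does not define raises IndexError. The variable names below are made up for this sketch.

```python
import re

# This pattern has no parentheses, so it defines no numbered groups.
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
mo = phone_regex.search('My number is 415-555-4242.')

whole = mo.group()        # no argument: the entire match
also_whole = mo.group(0)  # group 0 is the entire match as well
print(whole == also_whole)  # prints True

try:
    mo.group(1)           # there is no group 1 in this pattern
    group_one_exists = True
except IndexError:
    group_one_exists = False
print(group_one_exists)   # prints False
```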

12

Grouping with parentheses
\d\d\d-\d\d\d-\d\d\d\d

(\d\d\d)-(\d\d\d-\d\d\d\d)

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'

13

Group method for multiple groups

groups()
notice the PLURAL
Returns a tuple

>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242

14

What if the text you want to match contains a parenthesis?

You have to escape it with a backslash: \( and \)



>>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
>>> mo.group(1)
'(415)'
>>> mo.group(2)
'555-4242'
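To see why the escaping matters, this sketch runs the escaped pattern next to an unescaped one on the same text. The names literal and grouping_only are chosen here for illustration.

```python
import re

text = 'My phone number is (415) 555-4242.'

# Escaped: \( and \) match literal parenthesis characters in the text.
literal = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
# Unescaped: ( and ) only create a group, so this pattern expects
# the digits to be followed directly by a space, as in '415 555-4242'.
grouping_only = re.compile(r'(\d\d\d) (\d\d\d-\d\d\d\d)')

literal_match = literal.search(text)
grouping_match = grouping_only.search(text)

print(literal_match.group(1))  # prints (415)
print(grouping_match)          # prints None
```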

15

What is a pipe?

|
It means "or":

r'Batman|Tina Fey'
will match either 'Batman' or 'Tina Fey'.

What if both appear in the searched string?
search() returns the first occurrence
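"First occurrence" refers to position in the searched text, not to the order of the alternatives in the pattern. A quick sketch, where the second sentence is a made-up variation with the names swapped:

```python
import re

hero_regex = re.compile(r'Batman|Tina Fey')

# Same pattern, two texts: whichever name appears first in the text wins.
first = hero_regex.search('Batman and Tina Fey.').group()
swapped = hero_regex.search('Tina Fey and Batman.').group()

print(first)    # prints Batman
print(swapped)  # prints Tina Fey
```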

16

Quote about regular expressions

“Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.”[1]

17

What if you want to match several strings that share the same base,
like spiderman, spiderwoman and spidercar?

>>> text = 'spidercar has lost a wheel'
>>> spiderRegex = re.compile(r'spider(man|woman|car)')
>>> mo = spiderRegex.search(text)
>>> mo.group()
'spidercar'

18

Search a part of a pipe group

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

19

Basic pipe

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'

20

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

The method call mo.group() returns the full matched text 'Batmobile', while mo.group(1) returns just the part of the matched text inside the first parentheses group, 'mobile'.

21

What if there is an optional part you want to match?

Use ?

It means: match zero or one of the group preceding this question mark.

22

Using ? to search for Batman or (optionally) Batwoman

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'

>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'

23

What if you want to search for a phone number that may be listed with or without an area code?

>>> phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> mo1 = phoneRegex.search('My number is 415-555-4242')
>>> mo1.group()
'415-555-4242'

>>> mo2 = phoneRegex.search('My number is 555-4242')
>>> mo2.group()
'555-4242'
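One detail worth checking with an optional group: when the optional part is absent, that group's value is None rather than an empty string. A minimal sketch, with variable names made up for illustration:

```python
import re

phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phone_regex.search('My number is 415-555-4242')
without_area = phone_regex.search('My number is 555-4242')

# Group 1 is the optional area code, including its trailing dash.
area_present = with_area.group(1)
area_absent = without_area.group(1)

print(area_present)  # prints 415-
print(area_absent)   # prints None
```

So code that uses mo.group(1) after an optional match should be ready for None.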