Chapter 11: Regex Flashcards

1
Q

Code that works when the input data is in a particular format but is prone to breakage if the input deviates from that format; in other words, code that is easily broken.

A

brittle code

2
Q

regex + and * characters expand outward to match the LARGEST possible string

A

greedy matching
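
A minimal sketch of greedy matching in Python's re module; the sample line is an illustrative assumption, not part of the card.

```python
import re

line = "From: using the : character"

# '.+' is greedy: it takes as many characters as it can while still
# leaving a ':' to finish the pattern, so the match ends at the LAST colon.
greedy = re.findall("^F.+:", line)
print(greedy)  # ['From: using the :']
```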

3
Q

A command available in most Unix systems that searches through text files looking for lines that match regular expressions.

A

grep
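The name comes from the ed editor command g/re/p (globally search for a regular expression and print); it is also sometimes expanded as Generalized Regular Expression Parser.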

4
Q

A language for expressing more complex search strings. It may contain special characters that, for example, restrict a match to the beginning or end of a line, along with many other similar capabilities.

A

regular expression
(regex)

5
Q

A special character that matches any character. In regular expressions it’s the period.

A

wild card

6
Q

Python's regular expression module

A

re
import re

7
Q

regex function that looks for a specified regular expression anywhere in the text and returns a match object (or None if nothing matches)

A

re.search(pattern, string)
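
A short sketch of how the return value of re.search() can be used; the sample lines are made up for illustration.

```python
import re

lines = ["From: alice@example.com", "Subject: hello"]

for line in lines:
    match = re.search("From:", line)
    if match:  # a match object is truthy; None means no match
        print(match.group(), "found at index", match.start())
    else:
        print("no match in:", line)

first = re.search("From:", lines[0])   # match object
second = re.search("From:", lines[1])  # None
```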

8
Q

regex that matches beginning of line

A

'^'
re.search('^From:', line)
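
To see what the anchor changes, this sketch compares the same search with and without '^'. The two header lines are hypothetical.

```python
import re

lines = ["From: a@example.com", "X-From: relay"]

# Without the anchor, 'From:' may appear anywhere in the line.
anywhere = [ln for ln in lines if re.search("From:", ln)]

# With '^', the line must START with 'From:'.
at_start = [ln for ln in lines if re.search("^From:", ln)]

print(anywhere)  # both lines
print(at_start)  # ['From: a@example.com']
```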

9
Q

regex that matches any character (a wildcard)

A

. (period/full stop)

re.search('F..m', line) matches From, Flam, F#om, etc.
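
A quick check of the wildcard, using a made-up word list: each period must consume exactly one character, so 'Fm' is too short to match.

```python
import re

words = ["From", "Flam", "F#om", "Fm", "Foam!"]

# 'F..m' needs an F, any two characters, then an m.
hits = [w for w in words if re.search("F..m", w)]
print(hits)  # ['From', 'Flam', 'F#om', 'Foam!']
```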

10
Q

regex that applies to the immediately preceding character(s) and indicates to match zero or more times.

A

*

11
Q

regex that applies to the immediately preceding character(s) and indicates to match one or more times.

A

+

12
Q

regex method that returns a list of substring(s) that matches a regular expression

A

re.findall(pattern, string)
Returns a list such as ['substring1', 'substring2'] (empty if nothing matches), which can be walked with a for loop.
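
A small sketch of re.findall() returning a list that a for loop can walk; the sample text is an assumption for illustration.

```python
import re

text = "2 apples, 15 pears and 7 plums"

numbers = re.findall("[0-9]+", text)
for n in numbers:
    print(n)  # prints 2, 15, 7 on separate lines (as strings)

# When nothing matches, the result is an empty list, not None.
nothing = re.findall("[0-9]+", "no digits here")
```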

13
Q

regex that matches non-whitespace character

A

\S

14
Q

regex format to accept specific characters

A

'[]'
Set notation

re.findall('[a-zA-Z0-9]', string)
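
A sketch of set notation on a made-up string: a bare set matches one character at a time, and adding + groups consecutive matches into runs.

```python
import re

s = "Hi, Bob-42!"

# One character per match: only letters and digits are accepted.
single = re.findall("[a-zA-Z0-9]", s)
print(single)  # ['H', 'i', 'B', 'o', 'b', '4', '2']

# With '+', consecutive accepted characters are returned together.
runs = re.findall("[a-zA-Z0-9]+", s)
print(runs)  # ['Hi', 'Bob', '42']
```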

15
Q

regex format to match an actual period

A

[.] or \.
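
The difference between a wildcard period and a literal one can be sketched as follows; the version strings are invented for the example.

```python
import re

s = "v1.2 or v1x2"

# Unescaped, '.' is a wildcard and also accepts the 'x'.
wildcard = re.findall("1.2", s)
print(wildcard)  # ['1.2', '1x2']

# Both spellings below match only an actual period.
escaped = re.findall(r"1\.2", s)
in_set = re.findall("1[.]2", s)
print(escaped, in_set)  # ['1.2'] ['1.2']
```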

16
Q

When added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().

A

()

re.findall('substring(part I want)', string)
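
A minimal sketch of extraction with parentheses. The mail header line is an illustrative sample; the pattern must still match the whole 'From ...' prefix, but findall() returns only the parenthesized part.

```python
import re

line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# No parentheses: findall() returns the entire matched text.
whole = re.findall(r"^From \S+@\S+", line)
print(whole)  # ['From stephen.marquard@uct.ac.za']

# Parentheses: same match, but only the group is returned.
address = re.findall(r"^From (\S+@\S+)", line)
print(address)  # ['stephen.marquard@uct.ac.za']
```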

17
Q

technique to match a regex special character as a literal character

A

backslash
\$ matches a literal '$'
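
A sketch of escaping with a backslash, on an assumed sample sentence. It also shows re.escape(), a standard-library helper that adds the backslashes when the whole search text is meant literally.

```python
import re

text = "We just received $10.00 for cookies."

# '\$' is a literal dollar sign; inside [] the period is already literal.
prices = re.findall(r"\$[0-9.]+", text)
print(prices)  # ['$10.00']

# re.escape() backslash-escapes every special character in its argument.
literal = re.search(re.escape("$10.00"), text)
print(literal.group())  # $10.00
```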

18
Q

regex that anchors to end of line

A

$
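
A sketch of the end-of-line anchor '$' in Python, with hypothetical file names: the pattern only succeeds when '.png' is the last thing in the string.

```python
import re

names = ["photo.png", "photo.png.bak", "png"]

# '$' anchors the match to the end of the string.
ends_with_png = [n for n in names if re.search(r"\.png$", n)]
print(ends_with_png)  # ['photo.png']
```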

19
Q

regex that matches a whitespace character

A

\s

20
Q

regex that applies to the immediately preceding character(s) and indicates to match zero or more times in “non-greedy mode”.

A

*?

21
Q

regex that applies to the immediately preceding character(s) and indicates to match one or more times in “non-greedy mode”.

A

+?

22
Q

regex that applies to the immediately preceding character(s) and indicates to match zero or one time.

A

?
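
A sketch of the zero-or-one quantifier '?' on an invented word list: the 'u' may appear once or not at all, but not twice.

```python
import re

words = ["color", "colour", "colouur"]

# 'u?' accepts zero or one 'u' between 'colo' and 'r'.
hits = [w for w in words if re.search("colou?r", w)]
print(hits)  # ['color', 'colour']
```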

23
Q

regex that applies to the immediately preceding regular expression and indicates to match zero or one time in “non-greedy mode”.

A

??

24
Q

regex that matches a single character as long as that character is in the specified set. In this example, it would match “a”, “e”, “i”, “o”, “u”, or “-” but no other characters.

A

[aeiou-] or [-aeiou]

25
Q

You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.

A

[a-z0-9]

26
Q

When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.

A

[^A-Za-z]

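
Ranges and inverted sets can be tried side by side; the strings below are assumptions chosen to include letters, digits, and punctuation.

```python
import re

# A range set: lowercase letters or digits only ('A' and '-' are skipped).
lower_or_digit = re.findall("[a-z0-9]", "Ab3-x")
print(lower_or_digit)  # ['b', '3', 'x']

# A leading caret inverts the set: everything EXCEPT letters.
non_letters = re.findall("[^A-Za-z]", "R2-D2 & C-3PO")
print(non_letters)  # ['2', '-', '2', ' ', '&', ' ', '-', '3']
```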
27
Q

regex that asserts where a word begins or ends. This means that r'\bat\b' matches 'at', 'at.', '(at)', and 'as at ay' but not 'attempt' or 'atlas'.

A

\b

28
Q

regex that asserts where a word does NOT begin or end. This means that r'at\B' matches 'athens', 'atom', 'attorney', but not 'at', 'at.', or 'at!'.

A

\B

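
The \b and \B examples from the two cards above can be checked directly. Note the raw strings: in an ordinary Python string literal, '\b' would mean a backspace character.

```python
import re

samples = ["at", "at.", "(at)", "as at ay", "attempt", "atlas"]
# \b on both sides: 'at' must stand alone as a word.
whole_word = [s for s in samples if re.search(r"\bat\b", s)]
print(whole_word)  # ['at', 'at.', '(at)', 'as at ay']

others = ["athens", "atom", "attorney", "at", "at.", "at!"]
# \B after 'at': the word must continue past the 't'.
word_continues = [s for s in others if re.search(r"at\B", s)]
print(word_continues)  # ['athens', 'atom', 'attorney']
```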
29
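Q

regex that matches any decimal digit; for ASCII text, equivalent to the set [0-9]. (In Python 3 string patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set.)

A

\d
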
30
Q

regex that matches any non-digit character; the opposite of \d, equivalent to the set [^0-9] for ASCII text.

A

\D

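
A sketch of \d and \D working as opposites; the sample strings are made up. re.sub() is used here only to show \D stripping everything that is not a digit.

```python
import re

# \d+ collects runs of digits.
digits = re.findall(r"\d+", "Room 101, floor 3")
print(digits)  # ['101', '3']

# \D matches each non-digit, so replacing it with '' keeps digits only.
digits_only = re.sub(r"\D", "", "(555) 123-4567")
print(digits_only)  # 5551234567
```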
31
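Q

In Unix/Linux, command-line program similar to the search() function

A

grep (the name comes from the ed command g/re/p, globally search for a regular expression and print; sometimes expanded as Generalized Regular Expression Parser)
$ grep '^From:' mbox.short.txt
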
32
Q

Unix/Linux regex for a non-blank character (any character other than a space)

A

[^ ]

33
Q

adding ? after the regex + and * characters (+? and *?) makes them match the SMALLEST possible string

A

non-greedy matching

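
A minimal sketch of non-greedy matching, on an assumed sample line: with '+?', the pattern stops at the first place the rest of the expression can succeed.

```python
import re

line = "From: using the : character"

# '.+?' takes as FEW characters as possible, so the match ends
# at the first colon rather than the last.
non_greedy = re.findall("^F.+?:", line)
print(non_greedy)  # ['From:']
```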
34
Q

Specifies that exactly m copies of the previous regular expression should be matched; fewer matches cause the entire regular expression not to match

A

{m}

35
Q

Causes the resulting regular expression to greedily match from m to n repetitions of the preceding regular expression.

A

{m,n}

36
Q

Causes the resulting regular expression to non-greedily match from m to n repetitions of the preceding regular expression

A

{m,n}?

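
The three repetition forms {m}, {m,n}, and {m,n}? can be compared on small made-up inputs, as a sketch:

```python
import re

# {4}: exactly four digits (the \b anchors reject longer digit runs).
four_digit = re.findall(r"\b\d{4}\b", "In 2008, 123 of 45678 items")
print(four_digit)  # ['2008']

runs = "a aa aaa aaaa"

# {2,3} is greedy: it takes three when three are available.
greedy_run = re.findall(r"a{2,3}", runs)
print(greedy_run)  # ['aa', 'aaa', 'aaa']

# {2,3}? is non-greedy: it stops as soon as it has two.
lazy_run = re.findall(r"a{2,3}?", runs)
print(lazy_run)  # ['aa', 'aa', 'aa', 'aa']
```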
37
Q

Creates a regular expression that will match either A or B. This operation is never greedy

A

A|B
cat|dog = cat or dog

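
A sketch of alternation with '|' in Python; the inputs are invented. The second call illustrates "never greedy": alternatives are tried left to right, and the first one that matches wins even if a later one would match more text.

```python
import re

found = re.findall("cat|dog", "hotdog, catalog, bird")
print(found)  # ['dog', 'cat']

# 'a' matches first, so the longer alternative 'ab' is never tried.
first_wins = re.search("a|ab", "ab").group()
print(first_wins)  # a
```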
38
Q

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). Likewise, (abc)\1 matches abcabc: (abc) is a capturing group, and \1 is a backreference that matches the same text the group captured, so \1 matches abc too. (Inside a non-raw string literal the backslash must be doubled, as in '(abc)\\1'.)

A

\number
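
The backreference examples can be checked with a short Python sketch. Raw strings (the r prefix) keep \1 from being read as a character escape.

```python
import re

# '(.+) \1': some text, a space, then the SAME text again.
doubled = re.search(r"(.+) \1", "the the")
print(doubled.group())  # the the

# No space between the copies, so the pattern cannot match.
no_space = re.search(r"(.+) \1", "thethe")
print(no_space)  # None

# '(abc)\1': the group captures 'abc' and \1 requires it once more.
repeated = re.search(r"(abc)\1", "abcabc")
print(repeated.group())  # abcabc
```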