2 Python Regular Expressions Flashcards

(48 cards)

1
Q

Regex Syntax Overview

A

. (dot) → Matches any character except a newline

^ → Matches the beginning of the string

$ → Matches the end of the string

\b → Matches a word boundary (the position between a word character and a non-word character)

\d → Matches any digit (0-9)

\w → Matches any word character (letters, digits, and underscores)

\s → Matches any whitespace character (spaces, tabs, newlines)

* → Zero or more occurrences of the preceding element

+ → One or more occurrences of the preceding element

? → Zero or one occurrence of the preceding element

\ → Signals a special sequence (can also be used to escape special characters)

| → Either/or (alternation) operator

[ ] → Matches any one of the characters inside the brackets

{} → Exactly the specified number of occurrences

() → Capture and group
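A short sketch that exercises several of the symbols above on one string. The sample sentence and variable names are made up for illustration.

```python
import re

# Hypothetical sample sentence, used only to exercise the symbols above
sample = "Order 66 shipped to Ann_B on day 7"

digits = re.findall(r"\d+", sample)            # runs of one or more digits
words = re.findall(r"\w+", sample)             # runs of word characters
starts_with_order = bool(re.search(r"^Order", sample))  # ^ anchors at the start
ends_with_digit = bool(re.search(r"\d$", sample))       # $ anchors at the end
either = re.findall(r"Ann|day", sample)        # | picks either alternative
two_digit = re.findall(r"\b\d{2}\b", sample)   # {2} means exactly two digits

print(digits)
print(words)
print(starts_with_order, ends_with_digit)
print(either)
print(two_digit)
```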

2
Q

RegEx Functions

A

findall → Returns a list containing all matches
search → Returns a Match object if there is a match anywhere in the string
split → Returns a list where the string has been split at each match
sub → Replaces one or many matches with a string
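A minimal sketch of the four functions side by side. The sample text and variable names are illustrative only.

```python
import re

# Hypothetical sample text with inconsistent spacing around commas
text = "red, green,blue ,  yellow"

all_words = re.findall(r"[a-z]+", text)      # list of every match
first = re.search(r"g\w+", text)             # Match object for the first hit, or None
parts = re.split(r"\s*,\s*", text)           # split at each comma and its spacing
replaced = re.sub(r"\s*,\s*", "; ", text)    # replace every match with "; "

print(all_words)
print(first.group(), first.span())
print(parts)
print(replaced)
```

Note that search() returns None when nothing matches, so check the result before calling .group() on it.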

3
Q

Match 3-letter words like cat, bat, rat

A

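. (dot) in regex: matches any one character (except newline). Add \b on both sides so that only whole 3-letter words match; without it, "lat" inside "flat" would also be returned.

import re

text = "cat bat rat mat flat"
matches = re.findall(r"\b.at\b", text)
print(matches)

Output:
['cat', 'bat', 'rat', 'mat']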

4
Q

Find any 2-character pattern ending with a

A

import re

text = "ba be bi bo bu ka ke ki ko ku"
matches = re.findall(r".a", text)
print(matches)

Output:
['ba', 'ka']

5
Q

Find numbers that follow the 9X9 pattern

A

import re

text = "Call me at 909 or 929 or 939"
matches = re.findall(r"9.9", text)
print(matches)

Output:
['909', '929', '939']

6
Q

Match any character followed by ing (like sing, ring, etc.)

A

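import re

text = "sing ring ping king string"
matches = re.findall(r".ing", text)
print(matches)

Output:
['sing', 'ring', 'ping', 'king', 'ring']

The last 'ring' comes from inside "string". Use r"\b.ing\b" to match whole 4-letter words only.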

7
Q

Extract all numbers from a sentence.

A

import re

text = "The address is 1234 Main Street, 5th floor."
numbers = re.findall(r"\d+", text)
print(numbers)

Output:
['1234', '5']

8
Q

^ (Caret)

A

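^ anchors the pattern to the start of the string, so ^pattern matches only if the pattern is right at the start.
For single-line text, use re.match() (which already matches only at the start). For multi-line text, use re.findall(…, re.MULTILINE), which makes ^ match at the start of every line.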
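A small sketch of the difference between the single-line and multi-line cases. The sample text is made up for illustration.

```python
import re

# Hypothetical three-line sample
text = "alpha one\nbeta two\nalpha three"

# Without re.MULTILINE, ^ matches only at the very start of the string
start_only = re.findall(r"^alpha", text)

# With re.MULTILINE, ^ also matches right after each newline
every_line = re.findall(r"^alpha", text, re.MULTILINE)

# re.match() looks only at the start; re.search() scans the whole string
match_beta = re.match(r"beta", text)
search_beta = re.search(r"beta", text)

print(start_only)
print(every_line)
print(match_beta is None, search_beta is not None)
```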

9
Q

How to check if a sentence starts with “Hello”?

A

import re

text = "Hello world, NLP is powerful."
result = re.match(r"^Hello", text)
print(bool(result))  # True

10
Q

From a multi-line string, get lines that start with a digit.

A

import re

text = """123 data points collected
Text starts here
56 more samples"""

matches = re.findall(r"^(\d+.*)", text, re.MULTILINE)
print(matches)

Output:
['123 data points collected', '56 more samples']

11
Q

Get lines where the first character is uppercase.

A

import re

text = """apple is healthy
Banana is yellow
grapes are green
Mango is sweet"""

matches = re.findall(r"^[A-Z].*", text, re.MULTILINE)
print(matches)

Output:
['Banana is yellow', 'Mango is sweet']

12
Q

Validate if a sentence begins with a specific phrase

A

import re

sentence = "The model achieved high accuracy."
if re.match(r"^The model", sentence):
    print("✅ Starts with 'The model'")
else:
    print("❌ Doesn't start with 'The model'")

13
Q

Filter words starting with a specific letter (in list)

A

import re

words = ["Apple", "Banana", "Avocado", "Mango", "Apricot"]

for word in words:
    if re.match(r"^A", word):
        print(word)

Output:
Apple
Avocado
Apricot

14
Q

$

A

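$ → Matches only if the pattern appears at the end of the string (or just before a single trailing newline). With re.MULTILINE, it also matches at the end of each line.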
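A brief sketch of how $ behaves with a trailing newline and with re.MULTILINE. The sample strings are made up for illustration.

```python
import re

# $ still matches when the string ends with a single trailing newline
line = "status: done\n"
ends_before_newline = bool(re.search(r"done$", line))

# Hypothetical multi-line log
log = "first done\nsecond pending\nthird done"

# Default: $ matches only at the end of the whole string
end_of_string_only = re.findall(r"done$", log)

# re.MULTILINE: $ matches at the end of every line
end_of_each_line = re.findall(r"done$", log, re.MULTILINE)

print(ends_before_newline)
print(end_of_string_only)
print(end_of_each_line)
```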

15
Q

Does the sentence end with “India”?

A

import re

text = "I love my country India"
match = re.search(r"India$", text)
print(bool(match))

Output:
True

16
Q

Check if a file ends with “.txt”

A

import re

text = "notes_backup_2025.txt"
# Escape the dot so it matches a literal "." rather than any character
match = re.search(r"\.txt$", text)
print(bool(match))

Output:
True

17
Q

Check if a sentence ends with a number

A

import re

text = "Token count is 99"
match = re.search(r"\d+$", text)
print(bool(match))

Output:
True

18
Q

Check if a line ends with punctuation (like .)

A

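import re

text = "This is a full sentence."
# An unescaped . would match any final character; a character class matches punctuation only
match = re.search(r"[.!?]$", text)
print(bool(match))

Output:
True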

19
Q

Filter strings that end with a specific word (like ‘done’)

A

import re

text = "Task completed and marked done"
match = re.search(r"done$", text)
print(bool(match))

Output:
True

20
Q

How can I find all vowels in a sentence?

A

import re

text = "NLP is fun and powerful!"
vowels = re.findall(r"[aeiouAEIOU]", text)
print(vowels)

Output:
['i', 'u', 'a', 'o', 'e', 'u']

21
Q

How do I find all punctuation marks like ., !, or ,?

A

import re

text = "Hello, world! NLP is amazing."
punct = re.findall(r"[.,!]", text)
print(punct)

Output:
[',', '!', '.']

22
Q

How can I find all characters that are NOT vowels?

A

Use [^aeiouAEIOU] — the ^ inside brackets means NOT.

import re

text = "ChatGPT"
non_vowels = re.findall(r"[^aeiouAEIOU]", text)
print(non_vowels)

Output:
['C', 'h', 't', 'G', 'P', 'T']

23
Q

How do I match only digits from 1 to 5?

A

Use [1-5] to match any single digit from 1 through 5.

import re

text = "My scores: 1, 2, 5, 7, 9"
result = re.findall(r"[1-5]", text)
print(result)

Output:
['1', '2', '5']

24
Q

How do I match all lowercase letters?

A

Use [a-z] to match any lowercase letter.

import re

text = "Deep Learning!"
lowers = re.findall(r"[a-z]", text)
print(lowers)

Output:
['e', 'e', 'p', 'e', 'a', 'r', 'n', 'i', 'n', 'g']

25
Q

How can I match alphanumeric characters only (both letters and digits)?

A

import re

text = "User123@ai.com"
alphanumeric = re.findall(r"[a-zA-Z0-9]", text)
print(alphanumeric)

Output:
['U', 's', 'e', 'r', '1', '2', '3', 'a', 'i', 'c', 'o', 'm']

26
Q

How can I extract all numbers from a sentence?

A

Use \d+ with re.findall() to find all digit sequences (like "123", "99").

import re

text = "I have 2 cats, 1 dog, and 12 fishes."
numbers = re.findall(r"\d+", text)
print(numbers)

Output:
['2', '1', '12']

27
Q

How can I remove all digits from a string?

A

import re

text = "Version 2.0 released on 2025-05-01"
cleaned = re.sub(r"\d", "", text)
print(cleaned)

Output:
Version . released on --

28
Q

How can I count how many digits are in a sentence?

A

import re

text = "Room 401 has 3 chairs and 2 tables."
digits = re.findall(r"\d", text)
print("Total digits:", len(digits))

Output:
Total digits: 5

29
Q

How to extract only the years from a sentence?

A

import re

text = "We launched in 2019, updated in 2021, and expanded in 2023."
years = re.findall(r"\d{4}", text)
print(years)

Output:
['2019', '2021', '2023']

\d{4} matches exactly four consecutive digits (common year format).

30
Q

How can I extract words starting with a followed by any characters?

A

import re

text = "apple ant banana anchor ball"
# \b keeps the match at the start of a word; without it, "anana" and "all" would also match
matches = re.findall(r"\ba\w*", text)
print(matches)

Output:
['apple', 'ant', 'anchor']

\ba\w* matches a at the start of a word, followed by zero or more word characters (so "a", "ap", "apple" etc.).

31
Q

How can I match everything between an opening and a closing tag, such as <b> and </b>?

A

import re

# <b> is an illustrative tag name
text = "<b>Hello NLP!</b> <b>Data Science</b>"
matches = re.findall(r"<b>(.*?)</b>", text)
print(matches)

Output:
['Hello NLP!', 'Data Science']

.*? is non-greedy: it matches the shortest content between two tags.

32
Q

How can I match the word “color” whether it’s written as “color” or “colour”?

A

import re

text = "color or colour — both are valid."
matches = re.findall(r"colou?r", text)
print(matches)

Output:
['color', 'colour']

u? means: the letter "u" may or may not be there.

33
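Q

How can I match a phone number with an optional +91- prefix?

A

import re

text = "My numbers are +91-9876543210 and 9876543210"

# With a capturing group, findall() returns only the group's text
matches = re.findall(r"(\+91-)?\d{10}", text)
print(matches)

Output:
['+91-', '']

# Spelling out both alternatives without a group returns the full matches
matches = re.findall(r"\+91-\d{10}|\d{10}", text)
print(matches)

Output:
['+91-9876543210', '9876543210']
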
34
Q

What does () mean in regex?

A

It groups patterns together.
It can capture matched text (used in findall(), finditer(), etc.).
You can apply quantifiers (*, +, ?, etc.) to the entire group.

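A short sketch of how grouping changes what findall() returns. The phone-number text is illustrative, and the non-capturing form (?:...) is standard re syntax for grouping without capturing.

```python
import re

# Illustrative sample text
text = "My numbers are +91-9876543210 and 9876543210"

# One capturing group: findall() returns only what the group captured
capturing = re.findall(r"(\+91-)?\d{10}", text)

# Non-capturing group (?:...): groups for the ? quantifier, returns full matches
non_capturing = re.findall(r"(?:\+91-)?\d{10}", text)

# Two capturing groups: findall() returns one tuple per match
both_groups = re.findall(r"(\+91-)?(\d{10})", text)

print(capturing)
print(non_capturing)
print(both_groups)
```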
35
Q

How can I extract the country code and number separately from +91-9876543210?

A

import re

text = "+91-9876543210"
match = re.search(r"(\+91-)(\d{10})", text)
print(match.groups())

Output:
('+91-', '9876543210')

36
Q

How do I match “ha” repeated 3 times like "hahaha"?

A

import re

text = "hahaha hehe"
match = re.search(r"(ha){3}", text)
print(match.group())

Output:
hahaha

37
Q

What does \b mean?

A

\b matches the position between a word character ([a-zA-Z0-9_]) and a non-word character (like a space or punctuation), or between a word character and the start or end of the string. It does not match any character, just a position.

Think of it like:
"start of a word" → \bword
"end of a word" → word\b
"exact word" → \bword\b

Pattern → Meaning
\bword\b → Match exact word “word”
\bun\w+ → Words starting with “un”
\w+ed\b → Words ending in “ed”
\b\w+\b → Match complete words (tokenizing)
\b\w{3}\b → Match all 3-letter words

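The word-boundary patterns listed in this card can be tried out with a small sketch like the following. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample sentence
text = "The undo button was unused and the cat napped"

starts_with_un = re.findall(r"\bun\w+", text)     # words starting with "un"
ends_with_ed = re.findall(r"\w+ed\b", text)       # words ending in "ed"
three_letter = re.findall(r"\b\w{3}\b", text)     # all 3-letter words
tokens = re.findall(r"\b\w+\b", text)             # complete words

print(starts_with_un)
print(ends_with_ed)
print(three_letter)
print(tokens)
```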
38
Q

How do I find the word cat but not scatter or catalog?

A

import re

text = "I saw a cat. It was not in scatter or catalog."
matches = re.findall(r"\bcat\b", text)
print(matches)

Output:
['cat']

Only exact cat is matched. scatter and catalog are ignored.

39
Q

What does \s match?

A

A space
A tab (\t)
A newline (\n)
A carriage return (\r)

Basically, any whitespace (invisible spacing)

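A quick sketch confirming that \s picks up each of the whitespace characters listed in this card. The sample strings are made up for illustration.

```python
import re

# One of each whitespace character from the list: space, tab, newline, carriage return
text = "a b\tc\nd\re"
found = re.findall(r"\s", text)

# \s+ treats any run of mixed whitespace as a single separator
pieces = re.split(r"\s+", "one  two\tthree\nfour")

print(found)
print(pieces)
```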
40
Q

How can I find all the spaces in a sentence?

A

import re

text = "Natural Language Processing"
matches = re.findall(r"\s", text)
print(matches)

Output:
[' ', ' ']

41
Q

How can I remove all extra spaces, tabs, or newlines from a sentence?

A

import re

text = "Hello \tWorld \n NLP "
cleaned = re.sub(r"\s+", " ", text).strip()
print(cleaned)

Output:
Hello World NLP

\s+ matches each run of whitespace, and re.sub() replaces it with just one space. .strip() then removes the space left at the start or end.

42
Q

What does \w match?

A

Any letter → A to Z, a to z
Any digit → 0 to 9
The underscore → _

⚠️ Note: It does not match special characters or spaces.

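One detail worth knowing: for str patterns in Python 3, \w also matches letters and digits outside ASCII unless the re.ASCII flag is passed. A small sketch, with a made-up sample string written using escapes so the accented characters are unambiguous:

```python
import re

# "naïve café_1" written with escapes (\u00ef is ï, \u00e9 is é)
text = "na\u00efve caf\u00e9_1"

# Default for str patterns: accented letters count as word characters
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)
print(ascii_words)
```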
43
Q

How can I find all the word characters in a sentence?

A

import re

text = "NLP_101 is #awesome!"
matches = re.findall(r"\w", text)
print(matches)

Output:
['N', 'L', 'P', '_', '1', '0', '1', 'i', 's', 'a', 'w', 'e', 's', 'o', 'm', 'e']

44
Q

How can I extract full words made of word characters only?

A

import re

text = "AI_2025 is great!"
words = re.findall(r"\w+", text)
print(words)

Output:
['AI_2025', 'is', 'great']

45
Q

How to remove everything that's not a word character or whitespace (like punctuation)?

A

import re

text = "AI is amazing! #future-ready_2025"
cleaned = re.sub(r"[^\w\s]", "", text)
print(cleaned)

Output:
AI is amazing futureready_2025

46
Q

How many letters, digits, or underscores are in the string?

A

import re

text = "Email: ai_dev2025@example.com"
count = len(re.findall(r"\w", text))
print(count)

Output:
25

47
Q

\A

A

\A matches only at the start of the whole string.

import re

# Match at the start
re.search(r'\AData', 'Data Science is powerful')  # ✅ Match: 'Data'

# No match if not at start
re.search(r'\AScience', 'Data Science is cool')  # ❌ No match

# Even with re.MULTILINE, still only matches start of full string
text = '''AI is rising
Machine learning is growing'''
re.search(r'\AAI', text, re.MULTILINE)  # ✅ Match: 'AI'

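A minimal sketch contrasting ^ and \A under re.MULTILINE. The two-line sample is made up for illustration.

```python
import re

# Hypothetical sample where both lines start with "AI"
text = "AI is rising\nAI is growing"

# ^ with re.MULTILINE matches at the start of every line
caret_hits = re.findall(r"^AI", text, re.MULTILINE)

# \A ignores re.MULTILINE and matches only at the start of the string
a_hits = re.findall(r"\AAI", text, re.MULTILINE)

print(caret_hits)
print(a_hits)
```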
48
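Q

\Z

A

\Z matches only at the very end of the whole string.

import re

# Match at the end
re.search(r'end\Z', 'This is the real end')  # ✅ Match: 'end'

# No match if not at end ('world' ends a line here, but not the string)
re.search(r'world\Z', 'Hello world\nGoodbye')  # ❌ No match

# Match at true end even with newlines
text = 'Durga is learning\nRegex is fun\nEND'
re.search(r'END\Z', text)  # ✅ Match: 'END'

Tip: When dealing with full documents, .strip() helps clean trailing spaces/newlines before using \Z.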
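A small sketch of why the .strip() tip matters: $ tolerates a single trailing newline, while \Z in Python's re module does not. The sample text is made up for illustration.

```python
import re

# Hypothetical document that ends with a trailing newline
text = "Regex is fun\nEND\n"

dollar_match = bool(re.search(r"END$", text))           # $ matches before the final newline
z_match = bool(re.search(r"END\Z", text))               # \Z needs the true end of the string
z_after_strip = bool(re.search(r"END\Z", text.strip())) # stripping removes the newline

print(dollar_match, z_match, z_after_strip)
```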
Match at the end re.search(r'end\Z', 'This is the real end') # ✅ Match: 'end' No match if not at end re.search(r'world\Z', 'Hello world!\nGoodbye world') # ❌ No match Match at true end even with newlines text = 'Durga is learning\nRegex is fun\nEND' re.search(r'END\Z', text) # ✅ Match: 'END' Tip: When dealing with full documents, .strip() helps clean trailing spaces/newlines before using \Z.