Cleaning Text Data Flashcards

1
Q

What is text data?

A

Text data is one of the most common data types. Examples include names, phone numbers, addresses, and emails. Common text data problems include inconsistent formatting, values that do not meet a required length, and typos.

2
Q

How do you fix columns with inconsistent string values?

A

dataframe['column'] = dataframe['column'].str.replace('current string', 'new string')
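As a minimal sketch of the replacement above, assuming a hypothetical `phone` column with inconsistent separators (the data and column name are illustrative, not from the card). Passing `regex=False` is a choice made here so that `+` is treated as a literal character rather than a regex quantifier.

```python
import pandas as pd

# Hypothetical data: phone numbers written with different separators
df = pd.DataFrame({"phone": ["+1-555-0100", "555 0101", "+1-555-0102"]})

# Replace literal substrings one at a time; regex=False means plain-text matching
df["phone"] = df["phone"].str.replace("+", "00", regex=False)
df["phone"] = df["phone"].str.replace("-", "", regex=False)
df["phone"] = df["phone"].str.replace(" ", "", regex=False)

print(df["phone"].tolist())
```

Each call returns a new Series, so the result has to be assigned back to the column for the change to stick.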

3
Q

How do you fix columns whose values have the wrong length?

A
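1. Store the length of each value: variable = dataframe['column'].str.len()

2. Set values that are too short (or too long) to missing: dataframe.loc[variable < number, 'column'] = np.nan (use > instead of < for an upper bound)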
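The two steps can be sketched as follows, assuming a hypothetical `phone` column and an illustrative minimum length of 10 characters (both are assumptions made for this example).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second number is too short to be valid
df = pd.DataFrame({"phone": ["0015550100", "5550101", "0015550102"]})

# Step 1: length of each string in the column
digits = df["phone"].str.len()

# Step 2: mark values shorter than the assumed minimum (10) as missing
df.loc[digits < 10, "phone"] = np.nan

print(df)
```

Setting the value to `np.nan` rather than dropping the row keeps the rest of the record, which can matter when other columns are still usable.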

4
Q

What is .any()?

A

It returns True if at least one item in an iterable (in pandas, at least one element of a Series) is true; otherwise it returns False. If the iterable is empty, any() returns False.
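A short sketch of both forms, assuming a hypothetical Series of phone numbers: the pandas `.any()` method on a boolean Series, and Python's built-in `any()` on a list.

```python
import pandas as pd

# Hypothetical data: one value still contains a dash
phones = pd.Series(["0015550100", "555-0101", "0015550102"])

# Boolean Series: True where the string contains a literal dash
has_dash = phones.str.contains("-", regex=False)

# True if at least one element is True
found = has_dash.any()

# An empty boolean Series has no True element, so the result is False
empty_result = pd.Series([], dtype=bool).any()

# The built-in any() follows the same rule for plain iterables
builtin_result = any([False, False, True])

print(found, empty_result, builtin_result)
```

A check like this is often used before cleaning, to confirm that a problem pattern is actually present in the column.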

5
Q

What is .str.contains()?

A

It tests whether a pattern or regex is contained within each string of a Series or Index. It returns a boolean Series or Index with one True or False per element, indicating whether the pattern was found in that element.
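A minimal sketch, assuming a hypothetical Series of email addresses. The regex used here is deliberately simple and is only an illustration, not a complete email validator; `na=False` is chosen so that missing values count as "no match".

```python
import pandas as pd

# Hypothetical data: one malformed address and one missing value
emails = pd.Series(["ana@example.com", "bob.example.com", None])

# Literal match: does each string contain "@"?
mask_literal = emails.str.contains("@", regex=False, na=False)

# Regex match (the default): word characters, "@", word characters, ".com"
mask_regex = emails.str.contains(r"^\w+@\w+\.com$", na=False)

# Keep only the rows that match the pattern
valid = emails[mask_regex]

print(mask_literal.tolist(), valid.tolist())
```

The boolean result can be passed straight to `.loc` or `.any()`, which is how it combines with the other cards in this deck.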
