Section 5 - Data Analysis and Visualization with Python Flashcards

(14 cards)

1
Q

What is Kaggle?

A

A website that hosts many datasets. It is regularly used by data scientists and students.

2
Q

What is the full form of CSV?

A

Comma-Separated Values

3
Q

What is the loc method in pandas?

A

loc selects rows (and optionally columns) of a DataFrame by label or by a boolean condition, so it can retrieve the rows that match a certain condition.
E.g. us_babies.loc[us_babies['Year'] == 2014, :]
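A minimal sketch of the same selection, assuming a tiny stand-in for us_babies (the names and counts below are made-up placeholders, not real data):

```python
import pandas as pd

# Hypothetical stand-in for us_babies; all values are invented for illustration
us_babies = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2014, 2014, 2013, 2013],
    "Count": [10, 20, 30, 40],
})

# The boolean mask picks the rows; ":" keeps every column
us_babies_2014 = us_babies.loc[us_babies["Year"] == 2014, :]

# loc can also restrict the columns by label in the same call
names_2014 = us_babies.loc[us_babies["Year"] == 2014, ["Name", "Count"]]

print(us_babies_2014)
```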

4
Q

Write a sort_values statement.

A

us_babies_2014.sort_values('Count', ascending=False)
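A short sketch of how this behaves, assuming a made-up us_babies_2014 with placeholder values. Note that sort_values returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical data; names and counts are placeholders
us_babies_2014 = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Count": [5, 30, 12],
})

# Largest counts first; the result is a new, sorted DataFrame
sorted_babies = us_babies_2014.sort_values("Count", ascending=False)

# The two most frequent names
top_two = sorted_babies.head(2)

print(sorted_babies)
```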

5
Q

What is an iloc statement?

A

An iloc statement retrieves rows by integer position, for example a range of rows such as the first 5.
E.g. dataframe.iloc[0:5]
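The sketch below contrasts position-based iloc with label-based loc, assuming a small made-up DataFrame whose index is the letters a to g:

```python
import pandas as pd

# Hypothetical DataFrame with a non-numeric index to make the difference visible
dataframe = pd.DataFrame({"value": range(10, 80, 10)}, index=list("abcdefg"))

# Positions 0 to 4; the end position (5) is excluded
first_five = dataframe.iloc[0:5]

# Negative positions count from the end
last_row = dataframe.iloc[-1]

# A label slice with loc includes both endpoints
by_label = dataframe.loc["a":"e"]

print(first_five)
```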

6
Q

How do you check for null values in a dataset?

A

dataset.isnull()
Returns a DataFrame of the same shape with True or False in every cell. True indicates that the cell holds a null (missing) value.
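Because the full True/False table is hard to read on a large dataset, it is often summed per column. A minimal sketch, assuming a made-up dataset with a few missing cells:

```python
import pandas as pd

# Hypothetical dataset; None marks a missing value
dataset = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["x", None, None],
})

# Same shape as dataset, True where the cell is null
mask = dataset.isnull()

# True counts as 1, so the sum is the number of nulls per column
nulls_per_column = mask.sum()

# Single yes/no answer for the whole dataset
any_nulls = bool(mask.any().any())

print(nulls_per_column)
```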

7
Q

What is a pandas Series?

A

A column in a pandas DataFrame.
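A quick check of this, assuming a small made-up DataFrame: selecting one column with single brackets gives a Series, while double brackets give a one-column DataFrame.

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({"Year": [2013, 2014], "Count": [5, 7]})

# Single brackets: a Series that carries the column name and the row index
year = df["Year"]

# Double brackets: still a DataFrame, just with one column
year_frame = df[["Year"]]

print(type(year).__name__, type(year_frame).__name__)
```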

8
Q

What does dataset[series name].unique() do? Provide one application.

A

Returns an array of the unique values in a series.
E.g. crime['offense'].unique()

It can be used to check for misspellings.
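A sketch of the misspelling check, assuming a made-up crime table in which one offense was typed incorrectly. Pairing unique() with value_counts() is an addition here; it shows how rare each suspicious value is.

```python
import pandas as pd

# Hypothetical data; "thfet" is a deliberate typo
crime = pd.DataFrame({"offense": ["theft", "assault", "thfet", "theft"]})

# Distinct values in order of first appearance
values = crime["offense"].unique()

# How often each value occurs; a misspelling usually shows up with a low count
counts = crime["offense"].value_counts()

print(values)
print(counts)
```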

9
Q

What are some good practices for data cleaning?

A

Data cleaning determines the quality of the data analysis, which makes it an essential step.

  1. Note every change made to a series so that it can be tracked during analysis.
  2. Use caution when the analysis relies on series that have not been cleaned.
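One possible way to follow the first practice is sketched below. The cleaning_log list and the offense_clean column are hypothetical names chosen for illustration, and the data is made up:

```python
import pandas as pd

# Hypothetical raw data with inconsistent spelling and spacing
crime = pd.DataFrame({"offense": ["theft", "thfet", " Theft "]})

# Plain list used as a record of every change made to a series
cleaning_log = []

# Keep the raw column and write the cleaned values to a new one
crime["offense_clean"] = crime["offense"].str.strip().str.lower()
cleaning_log.append("offense: stripped whitespace and lowercased")

crime["offense_clean"] = crime["offense_clean"].replace({"thfet": "theft"})
cleaning_log.append("offense: corrected 'thfet' to 'theft'")

print(cleaning_log)
```

Keeping the raw column alongside the cleaned one makes it easy to see later which series have been cleaned and which have not.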
10
Q

Using seaborn, create a bar graph displaying the number of Airbnb listings in each neighbourhood group of New York.

The DataFrame is called "listing".
The column for the neighbourhood group is called "neighbourhood_group".
The table consists of rows of Airbnb listings, with a column for neighbourhood_group.

A

sn.countplot(x="neighbourhood_group", data=listing)
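The bar heights in a count plot are the number of rows per category. The sketch below reproduces those numbers with pandas alone, assuming a tiny made-up listing table, which can help when checking the plot:

```python
import pandas as pd

# Hypothetical listings; one row per listing
listing = pd.DataFrame({
    "neighbourhood_group": [
        "Manhattan", "Brooklyn", "Manhattan",
        "Queens", "Brooklyn", "Manhattan",
    ],
})

# Rows per neighbourhood group: the quantity each bar of the count plot represents
counts = listing["neighbourhood_group"].value_counts()

print(counts)
```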

11
Q

Use seaborn to create a bar graph with x axis as “neighbourhood_group” and y axis as “price”.

A

sn.barplot(x="neighbourhood_group", y="price", data=listings)

The bar height will be the average (mean) price, since there are multiple entries for each neighbourhood group with different prices and barplot aggregates them with the mean by default.
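The averaging can be reproduced with a pandas groupby, which is a handy cross-check for the bar heights. A minimal sketch, assuming a made-up listings table with placeholder prices:

```python
import pandas as pd

# Hypothetical listings; prices are placeholders
listings = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn"],
    "price": [200, 100, 80, 120],
})

# Mean price per group: the value each bar represents by default
mean_price = listings.groupby("neighbourhood_group")["price"].mean()

print(mean_price)
```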

12
Q

Explain a histogram in simple terms.

A

A histogram represents the distribution of data, i.e., the values are divided into multiple ranges (bins) and the number of values in each bin is displayed. X axis - the bins (ranges of values). Y axis - the quantity in each bin.
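The binning step can be seen without drawing anything. A small sketch using numpy.histogram on made-up numbers, split into three equal-width bins:

```python
import numpy as np

# Hypothetical values to bin
data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]

# Three equal-width bins between the minimum (1) and the maximum (10)
counts, edges = np.histogram(data, bins=3)

# edges holds the bin boundaries (x axis); counts holds the bar heights (y axis)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count}")
```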

13
Q

Make a scatterplot with matplotlib.pyplot (plt)

A
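No answer is recorded on this card. A minimal sketch of one way to do it, assuming two made-up lists x and y of equal length:

```python
import matplotlib.pyplot as plt

# Hypothetical data; any two equal-length sequences of numbers work
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# One marker per (x, y) pair
points = plt.scatter(x, y)

plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot")

# plt.show()  # uncomment to display the figure in an interactive session
```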
14
Q

Make a histogram with matplotlib.pyplot (plt)

A
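No answer is recorded on this card. A minimal sketch of one way to do it, assuming a made-up list of values and an arbitrary choice of three bins:

```python
import matplotlib.pyplot as plt

# Hypothetical values to plot
data = [1, 2, 2, 3, 5, 6, 8, 9, 9, 10]

# plt.hist bins the data, draws the bars, and returns the counts and bin edges
counts, edges, patches = plt.hist(data, bins=3)

plt.xlabel("value")
plt.ylabel("count")
plt.title("Histogram")

# plt.show()  # uncomment to display the figure in an interactive session
```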