Lesson 2: Basic data exploration Flashcards
(5 cards)
P_____ is the primary tool data scientists use for exploring and manipulating data.
Pandas
The most important part of the Pandas library is the D___F____.
DataFrame
A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel, or a table in a SQL database.
How would you read the CSV file at the path “path_houses” into a DataFrame called “df_houses”?
df_houses = pandas.read_csv(path_houses)
How can you get a summary of the data held in the “df_houses” DataFrame?
df_houses.describe()
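The two cards above can be tried together in a minimal sketch. It assumes pandas is installed and uses a tiny made-up CSV held in memory in place of a real file; the column names and values are hypothetical. The names use underscores because hyphens are not valid in Python identifiers.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real housing file.
csv_text = """Rooms,Price
2,300000
3,450000
4,600000
"""

# read_csv accepts a file path or a file-like object; an in-memory
# buffer is used here so the sketch runs without a file on disk.
path_houses = io.StringIO(csv_text)
df_houses = pd.read_csv(path_houses)

# describe() returns a DataFrame of summary statistics
# (count, mean, std, min, quartiles, max) for each numeric column.
summary = df_houses.describe()
print(summary)
```

With a real dataset, path_houses would simply be a string holding the file's location.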
What is standard deviation?
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
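Step 4: Divide by the number of data points. (This gives the population variance. The "std" row from describe() divides by one less than the number of data points, giving the sample standard deviation.)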
Step 5: Take the square root.
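The five steps can be sketched in plain Python as follows. The function name population_std and the sample values are illustrative assumptions, not part of the lesson. Because Step 4 divides by the number of data points, this computes the population standard deviation.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation, following the five steps."""
    # Step 1: find the mean.
    mean = sum(values) / len(values)
    # Step 2: square each data point's distance to the mean.
    squared_distances = [(x - mean) ** 2 for x in values]
    # Step 3: sum the squared distances.
    total = sum(squared_distances)
    # Step 4: divide by the number of data points.
    variance = total / len(values)
    # Step 5: take the square root.
    return math.sqrt(variance)


# Hypothetical data chosen so the arithmetic is easy to follow:
# the mean is 5 and the squared distances sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
result = population_std(data)
print(result)  # 2.0

# The standard library's pstdev computes the same quantity.
print(statistics.pstdev(data))
```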