G4. Manipulating data Flashcards

1
Q

Which operations can be applied to tabular data structures, and what is the result type?

A

Projection
Selection (retrieving a subset of records)
Filter (retrieving a subset of records given a condition).
Result type: DataFrame
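
As a minimal sketch of the three operations listed above, the following uses a small made-up DataFrame (the data and the names `projection`, `selection` and `filtered` are assumptions for illustration, not part of the original deck) and checks the type of each result.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
df = pd.DataFrame({
    "Nombre": ["Ana", "Luis", "Marta", "Pedro"],
    "Edad": [23, 35, 41, 19],
})

projection = df[["Nombre", "Edad"]]  # subset of columns
selection = df.loc[1:2]              # subset of records (label slice, end inclusive)
filtered = df[df["Edad"] > 30]       # records that satisfy a condition

# Each operation returns a new DataFrame.
for result in (projection, selection, filtered):
    print(type(result).__name__)
```

Note that selecting a single column with `df["Nombre"]` (single brackets) returns a Series rather than a DataFrame.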

2
Q

Retrieving a subset of columns/attributes

A

Projection, roughly a read (shows everything I am going to work with)

# Perform a projection to select only the 'Nombre' and 'Edad' columns
proyeccion = df[['Nombre', 'Edad']]

print(proyeccion)

3
Q

Retrieving a subset of records

A

Selection (shows a range I am interested in).

edu.loc[90:94][['TIME', 'GEO']]
Selection = df[df['Nombre']

4
Q

What is another way to select a subset of data, by applying Boolean indexing?

A

Filtering (shows whatever satisfies a logical condition)

edu[edu['Value'] > 6.5].tail()
filtered_data = df[df['column'] > value][['column1', 'column2']]

5
Q

Boolean indexes

A

Uses the result of a Boolean operation over the data, returning a mask with True or False for each row. The rows marked True in the mask will be selected.
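
To make the mask concrete, here is a small sketch on made-up values (the frame `values` and its contents are assumptions for illustration). The comparison yields one True/False per row, and indexing with that mask keeps only the True rows.

```python
import pandas as pd

# Hypothetical values, chosen only to illustrate the mask.
values = pd.DataFrame({"Value": [5.0, 7.2, 6.5, 8.1]})

# The comparison is evaluated row by row and returns a Boolean Series.
mask = values["Value"] > 6.5
print(mask.tolist())  # [False, True, False, True]

# Indexing with the mask keeps the rows marked True.
selected = values[mask]
print(selected)
```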

6
Q

Which special value ("not a number") is used to represent missing values?

A

NaN

7
Q

Give examples, particularly of how null values can be filtered. How does this work in R?

A

edu[edu['Value'].isnull()].head()
# R: keep only the rows without missing values in a specific column (for example, 'columna1')
new_data <- original_data[!is.na(original_data$columna1), ]
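
For comparison with the R line, a pandas sketch of both directions (rows with a missing value and rows without) might look as follows. The frame `data` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries.
data = pd.DataFrame({"Value": [4.9, np.nan, 5.6, np.nan]})

# Rows where 'Value' is missing.
missing = data[data["Value"].isnull()]

# Rows where 'Value' is present; this mirrors the R expression with !is.na(...).
present = data[data["Value"].notnull()]

# dropna restricted to one column gives the same rows as the notnull mask.
present_alt = data.dropna(subset=["Value"])
```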

8
Q

What is the form of the expression for adding columns to a DataFrame?

A

Assign a Series to a selection of a column that does not exist.
edu['ValueNorm'] = edu['Value'] / edu['Value'].max()

9
Q

What is the form of the expression for adding rows to a DataFrame?

A

The append function receives the new row as its argument, represented as a dictionary whose keys are the column names and whose values are the associated values. DataFrame.append was removed in pandas 2.0; on current versions, pd.concat is the replacement.
edu = edu.append({'TIME': 2000, 'Value': 5.00, 'GEO': 'a'}, ignore_index=True)
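
On pandas versions where `DataFrame.append` is no longer available, the same row can be added with `pd.concat`. This is a sketch on a made-up stand-in for `edu` (the name `edu_small` and its first two rows are assumptions); only the appended row comes from the card.

```python
import pandas as pd

# Hypothetical stand-in for the edu DataFrame.
edu_small = pd.DataFrame({
    "TIME": [2010, 2011],
    "GEO": ["Spain", "Spain"],
    "Value": [4.98, 4.82],
})

# Wrap the dictionary in a list so it becomes a one-row DataFrame.
new_row = pd.DataFrame([{"TIME": 2000, "Value": 5.00, "GEO": "a"}])

# ignore_index=True renumbers the rows 0..n-1, as in the append call.
edu_small = pd.concat([edu_small, new_row], ignore_index=True)
print(edu_small)
```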

10
Q

How can rows or columns be deleted?

A

Use the drop function. It removes the indicated rows if axis=0, or the indicated columns if axis=1.
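
A short sketch of both uses of `drop`, on a made-up frame (the name `frame` and its values are assumptions for illustration). By default `drop` returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data with a derived column, built as in the ValueNorm card.
frame = pd.DataFrame({"Value": [2.0, 4.0, 8.0]})
frame["ValueNorm"] = frame["Value"] / frame["Value"].max()

# axis=1 removes the named column.
without_column = frame.drop("ValueNorm", axis=1)

# axis=0 removes the row with index label 0.
without_row = frame.drop(0, axis=0)

print(without_column)
print(without_row)
```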

11
Q

Do these operators belong to the data definition or the data manipulation language?

A

Data manipulation

12
Q

How can default values be added to attributes containing missing or null values?

A

fillna(), specifying which value has to be used.
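
As an illustration, the sketch below fills missing entries in one column with a default of 0. The frame `scores` and the chosen default are assumptions; passing a dictionary lets each column have its own fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing entry.
scores = pd.DataFrame({"Value": [4.9, np.nan, 5.6]})

# Map column name -> default value; fillna returns a new DataFrame.
filled = scores.fillna(value={"Value": 0})
print(filled)
```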

13
Q

Give an example of the use of the groupby() method applied to a DataFrame.

A

group = edu[['GEO', 'Value']].groupby('GEO').mean()
group.head()
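
The same pattern can be tried on a small made-up frame where the group means are easy to verify by hand (the name `spending` and its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: two rows per country.
spending = pd.DataFrame({
    "GEO": ["Spain", "Spain", "France", "France"],
    "Value": [4.0, 6.0, 5.0, 7.0],
})

# Group the rows by 'GEO' and average 'Value' within each group.
means = spending.groupby("GEO")["Value"].mean()
print(means)
```

Selecting the column after `groupby` returns a Series indexed by group key; selecting the columns first, as in the card, returns a DataFrame.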

14
Q

How are the manipulation operators associated with DataFrames useful for implementing data science processes?

A

They provide a powerful and flexible set of tools for data scientists to explore, clean, and analyze data efficiently.
