Week 2: Data Extraction Flashcards

1
Q

How do you get the unique values in a list?

A

list.unique()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the degree of a relation?

A

The number of columns

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How do you get column sums in python?

A

df.sum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the requirement of attribute values in a relation?

A

attribute values are normally required to be atomic: indivisible (not containing more than one value)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the 2 ways to create dataframes in python pandas?

A
data = {‘State’: [‘Ohio, ‘Ohio’, ‘Ohio’, ‘Nevada’, ‘Nevada’, ’Nevada'], ‘Year’: [2000, 2001, 2002, 2001, 2002, 2003], ‘Population’: [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]} 
df = pd.DataFrame(data) 
data = [[‘Ohio’, 2000, 1.5], [‘Ohio’, 2001, 1.7], [‘Ohio’, 2002, 3.6], [Nevada, 2001, 2.4], [‘Nevada, 2002, 2.9], [‘Nevada’, 2003, 3.2]] 
cols = [‘State’, ‘Year’, ‘Population’] 
df = pd.DataFrame(data, columns = cols)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How do you get a vector containing the column names in python?

A

df.columns

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How do you add a new column to a dataframe in python?

A

df[‘newcolumn’] = …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How do you do a left outer join in python?

A

pd.merge(df1, df2, on=‘attribute’, how = ‘left’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do you find mismatches of strings?

A

WHERE … NOT LIKE …

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How do you match an exact substring and the beginning of a string?

A

‘Intro%’ #matches any string beginning with “Intro”
‘%Comp%’ #matches any string containing “Comp” as a substring

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the distributive laws?

A
A ^ (BvC) = (A^B) v (A^C) 
A v (B^C) = (AvB) ^ (AvC)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the basic form of a select from where statement?

A

SELECT desired attributes
FROM one or more tables
WHERE condition about tuples of the tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you concatenate dataframes row-wise in python?

A

pd.concat([df1, df2])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What does 5> unknown return?

A

Unknown

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are 2 ways to extract a single column by name in python?

A

df.director_name
df[“director_name”]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you extract the first 5 rows and column 2 in python?

A

df.iloc[0:5,1]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you force the elimination of duplicates in an SQL query?

A

SELECT DISTINCT

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are the comparison signs in SQL?

A

=, <> (not equal), , <=, >=

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you get the first and last rows of a dataframe in python?

A

df. head(n) #get first n rows
df. tail(n) #get last n rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is a tuple comparison?

A

WHERE (instructor.ID, dept_name) = (teaches.ID, ‘Biology’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you display the number of columns in python?

A

print(len(df.columns))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

How do you extract multiple columns by name in python?

A

df[[‘actor’, ‘facebook_likes’, ‘content_rating’]]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is a superkey of R?

A

K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What are the absorptive laws?

A
A v (A^B) = A 
A ^ (AvB) = A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is a self join?

A

Self join is where a table joins with itself on two columns with different names but storing data from the same domain. distinguish copies by renaming the relations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How do you display the number of rows in python?

A

print(len(df))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is the cardinality of a relation?

A

The number of rows

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

How do you select all attributes in an SQL query?

A

SELECT *

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

How do you rename an attribute in SQL? (2 ways)

A

using “as”
e.g. SELECT ID, name, salary/12.0 as monthly_salary
FROM instructor
or simply write salary/12.0 monthly_salary
do not need the “as”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What happens to unknown values in the where clause?

A

treated as false if it evaluates to unknown

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What is the relational algebra symbol for renaming?

A

rho (p)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What are the advantages of composing relational algebra operators?

A
  • the output of a relational algebra (RA) is a relation that can be input for another RA operator
  • don’t need heaps of functions for each line of code
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

How do you order the results of an SQL query?
What order does it put them in?

A

SELECT *
FROM instructor
WHERE dept_name = ‘Physics’
ORDER BY name
-puts results in ascending order

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

What aggregate functions can be used in SQL? (5)

A
  • avg: average value
  • min: minimum value
  • max: maximum value
  • sum: sum of values
  • count: number of values
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

What is the double negation law?

A

~(~A) = A

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

What are the complement laws?

A

Av ~A = T
A^ ~A = F

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

What are the 4 boolean laws including true and false

A
A^F = F 
AvF = A 
A^T = A 
AvT = T
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

How do you get a true/false vector for whether the value is null in a column in python?

A

df[‘column’].isnull()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

What is the result of SELECT ‘A’ FROM instructor

A

returns a relation with one column and N rows(the number of tuples in instructor) where each row will contain the value ‘A’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

How do you delete a row in python?

A

df.drop([‘row1’])

41
Q

What is a left outer join and its code in SQL?

A

All attributes from left table are listed. Matching attributes from the right table are added where they are available, null when not available
SELECT *
FROM course
LEFT OUTER JOIN prereq
ON course.course_id = prereq.course_id

42
Q

How do you add values to the end of a list?

A

list.append(c)

43
Q

How can you do a full outer join in python?

A

pd.merge(df1, df2, on=‘attribute’, how = ‘outer’)

44
Q

When do you use WHERE vs HAVING?

A

Having must be used after a GROUP BY clasue

45
Q

What is a full outer join and its code in SQL?

A

takes all attributes from left and right tables, fills in all matching attributes and puts null where not available
SELECT *
FROM course
FULL OUTER JOIN prereq
ON course.course_id = prereq.course_id

46
Q

How do you find the average salary of instructors in each department

A

SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name

47
Q

What does a cartesian product do?

A

Gets all the tuples from relation R and pair them with all the tuples from relation S, end up with huge table

48
Q

What do you use in the where clause to meet more than one criteria?

A

SELECT ID, name
FROM student
WHERE dept_name = ‘Math’ AND tot_cred >= 24

can also use OR

49
Q

What are the three boolean operators?

A
  • and (^): retrieves records that satisfy both conditions
  • or (v): retrieves records that satisfy one of the conditions
  • not (~): retrieves records that don’t satisfy the condition
50
Q

How do you get the summary statistics of dataframe columns in python?

A

df.describe()

51
Q

What is an SQL query for getting the name of instructors who have the same salary, avoids pairs like (miller, miller) and produces pairs in alphabetical order. From the relation instructor

A

SELECT T.name AS N1, S.name AS N2
FROM instructor AS T, instructor AS S
WHERE T.salary = S.salary AND T.name < S.name

52
Q

What are 3 ways to extract the rows with the condition that content rating = PG and the columns with names facebook likes and budget or index 3 and 5?

A

df.loc[df[‘content_rating’]==‘PG_13’, [‘facebook_likes’, ‘budget’]]
#or
df.iloc[(df[‘content_rating]==‘PG_13’).values,[3,5]]
#or
df[df.content_rating == ‘PG_13’][[‘facebook_likes’, ‘budget’]]

53
Q

How do you match a string in the where clause?

A

WHERE name LIKE ‘%Van der%’

54
Q

How do you use union, intersection and set difference in relational algebra?

A

Can do the union, intersection or set difference (-) of full relational algebra statements - but not include them within the statements

55
Q

What does a natural join do?

A

set of all combinations of tuples in R and S that are equal on their common attribute names. and includes all other attributes from both relations filled in

56
Q

What is a right outer join and its code in SQL?

A

All attributes from the right table are listed. Matching attributes from the left table are added where they are available, null when not available
SELECT *
FROM course
RIGHT OUTER JOIN prereq
ON course.course_id = prereq.course_id

57
Q

What is data manipulation language?

A

Language for accessing and manipulating the data in a model
DML is used for adding, deleting and modifying the records in a database

58
Q

What is data definition language?

A
  • used for defining the database schema
  • DDL is used to create and modify database objects such as tables, indexes and users
59
Q

How do you check for null values using SQL?

A

WHERE salary IS NULL

60
Q

What is the arity of a relation?

A

The number of columns

61
Q

How do you match a string of 3 characters and a string of at least 3 characters?

A

‘___’ #matches any string of exactly three characters
‘___%’ #matches any string of at least three characters

62
Q

What are the 5 basic types in SQL?

A
  • char(n) or character(n): character string with fixed length n
  • varchar(n): variable length character string
  • int or integer: integer of any length
  • numeric(p,d): fixed point number with p digits and d of the digits are to the right of the decimal
  • float(n): floating point number with precision at least n digits
63
Q

What is another way to write
Where dept_name = ‘Comp Sci’ or dept_name = ‘Biology’

A

WHERE dept_name IN (‘Comp sci’, ‘Biology’)

64
Q

How do you concatenate dataframes column-wise in python?

A

pd.concat([df1, df2], axis=1)

65
Q

What is the form of relational algebra statement for the general sql query:
select A1, sum (A3)
from r1, r2
where P
group by A1, A2

A

Pi(A1, sum(A3)) (A1, A2 G sum(A3) (sigma P (r1, r2)))

66
Q

How can you extract row 5 in python by index (2) and rowname (1)?

A

df. iloc[5]
df. iloc[5, :]
df. loc[‘rowname’]

67
Q

What is a candidate key of R?

A

Superkey K is a candidate key if K is minimal

68
Q

How do you do a natural join in SQL?

A

SELECT *
FROM course
NATURAL JOIN prereq
ON course.course_id = prereq.course_id

69
Q

What is the primary key of R?

A

A selected candidate key used to represent the relation

70
Q

What are the unknowns combined with the and operator?

A
  • true AND unknown = unknown
  • false AND unknown = false
  • unknown AND unknown = unknown
71
Q

How do you transpose a dataframe in python?

A

df.T

72
Q

How do you get the index and data types of a dataframe in python?

A

df.info()

73
Q

What are de morgans laws?

A
~(A^B) = ~A v ~B 
~(AvB) = ~A ^ ~B
74
Q

How do you find the difference of two SQL queries?

A

(SELECT course_id FROM section WHERE sem = ‘Fall’ AND year = 2009)
EXCEPT
(SELECT course_id FROM section WHERE sem = ‘Spring’ AND year = 2010)

75
Q

How would you match the string “100%”?

A

LIKE ‘100\%’ ESCAPE ‘'

76
Q

How do you get the minimum of column 1 in python?

A

df.columns[1].min()

77
Q

What type of subqueries can be nested in each of the select, from and where clauses?

A

SELECT A1, A2, …, An
FROM r1, r2, …, rn
WHERE P

as follows:

  • Ai can be replaced by a sub query that generates a single value
  • ri can be replaced by any valid subquery
  • P can be replaced with an expression of the form: B (subquery)
78
Q

How can you extract multiple columns by index in python?

A

df[df.columns[[5,7,9]]][:] #extracts columns 5,7,9

79
Q

What are the 3 main components of data models?

A

-structures: e.g. rows and columns, nodes and edges, key-value pairs
-constraints: e.g. all rows must have same number of columns, age must be positive, c
* operations: e.g. return the values of record x, find rows where the column “name” is “Jordan”

80
Q

What is a foreign key of R?

A

column or set of columns in table whose values correspond to the primary key in another table

81
Q

How can you find the names of all instructors with salary between $90,000 and $100,000 (this includes the values equal to)

A

SELECT name
FROM instructor
WHERE salary BETWEEN 90000 and 100000

82
Q

What are the 3 database levels of abstraction?

A
  • Physical level: describes how a record is stored
  • Logical level: describes what data stored in the database and the relationships among the data e.g. name: string, salary: real
  • View level: application programs which hide details of data types. Views can also hide information for security/privacy purposes
83
Q

What is the relational algebra symbol for grouping and aggregation?

A

G or gamma

84
Q

Code to list all departments along with the number of instructors in each department in SQL? where there is an instructor relation and department relation

A

SELECT dept_name,
(SELECT COUNT(*) FROM instructor WHERE department.dept_name = instructor.dept_name) AS num_instructors
FROM department

85
Q

How do you group by a particular attribute and count how many in each group in python?

A

df.groupby([‘column’]).count()

86
Q

What are the unknowns combined with the or operator?

A
  • unknown OR true = true
  • unknown OR false = unknown
  • unknown OR unknown = unknown
87
Q

What is a subquery in SQL?

A

A subquery is a SELECT-FROM-WHERE expression that is nested within another query

88
Q

What is A^A and AvA?

A
A^A = A 
AvA = A
89
Q

What is a natural join and an inner join?

A

takes only the attributes that are common in both tables
Inner join keeps both the columns in the output even though they will be the same, while natural join will only keep one common column

90
Q

How do you get the count for each column in python?

A

df.count()

91
Q

What is the result in python of adding two dataframes df1+df2

A

Where columns match, it will add the data in that column from each df, e.g. “BIOBIO” because strings will concatenate. Where columns don’t match, will include all columns from both dataframes but entries will all be NaN

92
Q

How do you find the average salary of instructors in each department which has average salary greater than 80000

A

SELECT dept_name, AVG(salary) AS avg_salary
FROM instructor
GROUP BY dept_name
HAVING avg_salary > 80000

93
Q

What is the result of
SELECT *
FROM instructor, teaches

A

generates every possible instructor - teaches pair, with all attributes from both relations/ common attributes in the resulting table are renamed using the relation name
cartesian product

94
Q

What is a query?

A

a statement requesting the retrieval of information

95
Q

What are the 2 parts that a relation is made up of?

A
  • Schema: specifies name of relation, plus name and type of each column
  • Instance: a table with rows and columns. rows = records = tuples, column = attributes = fields = features (instance is a snapshot of the schema at a point in time)
96
Q

How do you do an inner join in SQL?

A

SELECT *
FROM course
JOIN prereq
ON course.course_id = prereq.course_id

INNER JOIN is also the same as JOIN

97
Q

What are the 6 main relational algebra operators?

A

Union (U): total set of records in both relations, only records each record once even if it is in both
intersection: common records
difference (-): records in one table but not the other
selection (sigma): a specific set of rows
projection (pi): a specific set of columns. Duplicate tuples are removed
join: the natural join of two tables

98
Q

How do you delete a column in python? (2 ways)

A

del df[‘column1’]
df.drop(‘column1’, axis=1)

99
Q

What is the result of SELECT ‘542’

A

results in a relation with one column and one row with value ‘542’