Which is a limit of relational databases?
Joins can be costly.
Which type of NoSQL database is based on sets of nodes and edges between nodes?
Graph database
Which is not an advantage of NoSQL databases?
Support for joins
T/F. The expressiveness of the query language is important for data science tasks.
TRUE
T/F. Data quality should be taken into account during data preparation.
TRUE
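One simple way to account for data quality during preparation is to count missing or blank values per field before loading. A minimal sketch follows. The field names and sample records are hypothetical, and "missing" is assumed to mean absent, None, or a blank string.

```python
def count_missing(records, fields):
    """Count records where each field is absent, None, or an empty/blank string."""
    missing = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    return missing

# Hypothetical sample records with some gaps.
records = [
    {"name": "Ann", "age": "34", "city": "Austin"},
    {"name": "Bo", "age": "", "city": "Boston"},
    {"name": "  ", "city": None},
]
print(count_missing(records, ["name", "age", "city"]))
# → {'name': 1, 'age': 2, 'city': 1}
```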
What technique is used to visualize correlations between two variables?
Scatter plots
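A scatter plot shows the relationship visually. The Pearson correlation coefficient summarizes the same linear relationship as a single number between -1 and 1. Here is a minimal standard-library sketch with hypothetical data. The plot itself would usually be drawn with a plotting library such as matplotlib, which is not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print(round(pearson(hours, scores), 3))  # close to 1: strong positive correlation
```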
Which is not a type of data set used in data science model building?
What markup language can be used for sharing models between tools?
Predictive Model Markup Language (PMML)
CSV/TSV
JSON
Python
MongoDB
cursor
a pointer to the result set of a query, used to step through the matching records one at a time; helpful when an entire result set can’t fit into memory
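In pymongo, `collection.find()` returns a cursor that fetches documents in batches as you iterate. The sketch below shows the same batch-at-a-time idea with Python's built-in `sqlite3` module. It is used only as a stand-in because it needs no database server. The table name and batch size are hypothetical.

```python
import sqlite3

def sum_in_batches(conn, batch_size=2):
    """Walk a query result through a cursor, batch by batch, without materializing it all in Python."""
    cur = conn.cursor()
    cur.execute("SELECT value FROM readings ORDER BY id")
    total = 0
    batches = 0
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size rows per call
        if not rows:
            break
        batches += 1
        total += sum(value for (value,) in rows)
    return total, batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO readings (value) VALUES (?)", [(v,) for v in range(1, 6)])
print(sum_in_batches(conn))  # → (15, 3)
```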
outlier
find a dirty data set
Documents consist of keys and values.
TRUE
In JSON objects, keys must be what data type?
String
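Python dictionaries allow non-string keys, but JSON object keys are always strings. The standard `json` module therefore converts them on the way out. A small illustration with hypothetical data:

```python
import json

# Python dict keys can be ints, but JSON object keys must be strings,
# so json.dumps converts them and json.loads brings back strings.
scores = {1: "gold", 2: "silver", "event": "relay"}
text = json.dumps(scores)
print(text)                # → {"1": "gold", "2": "silver", "event": "relay"}
restored = json.loads(text)
print(restored == scores)  # → False, because the keys are now "1" and "2"
print(restored["1"])       # → gold
```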
Python's csv.DictReader class can be used to read tabular data so that each row becomes a Python dictionary.
TRUE
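A minimal example of `csv.DictReader`. The first row supplies the keys, and each later row becomes one dictionary. Every value is read as a string. The CSV content is hypothetical and is wrapped in `io.StringIO` so the example needs no file.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from open("file.csv", newline="").
data = io.StringIO("name,age,city\nAnn,34,Austin\nBo,29,Boston\n")

reader = csv.DictReader(data)  # uses the first row as field names
rows = list(reader)            # each row becomes a dict keyed by column name
print(rows[0])  # → {'name': 'Ann', 'age': '34', 'city': 'Austin'}
```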
Which driver is used in Python to work with MongoDB?
pymongo
When converting data to MongoDB format, we can use Python dictionary functions.
TRUE
Data type conversions rarely need to be performed during data loads.
FALSE
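Because CSV values arrive as strings, preparing documents for MongoDB usually means converting types with ordinary dictionary operations. This is a sketch assuming a hypothetical schema (`age` as int, `height_m` as float, `tags` as a semicolon-separated list). The actual insert is shown only as a comment, because it needs pymongo and a running server.

```python
import csv
import io

def to_document(row):
    """Convert one CSV row (all strings) into a MongoDB-ready dict with typed values."""
    doc = dict(row)                              # copy so the input row is untouched
    doc["age"] = int(doc["age"])                 # string -> int
    doc["height_m"] = float(doc["height_m"])     # string -> float
    doc["tags"] = doc["tags"].split(";") if doc["tags"] else []  # string -> list
    return doc

data = io.StringIO("name,age,height_m,tags\nAnn,34,1.70,admin;dev\nBo,29,1.82,\n")
docs = [to_document(row) for row in csv.DictReader(data)]
print(docs[0])  # → {'name': 'Ann', 'age': 34, 'height_m': 1.7, 'tags': ['admin', 'dev']}
# With pymongo and a running server, these could be loaded with
# collection.insert_many(docs)
```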
In MongoDB, what can we create to reduce the time needed to find a document based on the value of an attribute?
Index
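An index trades extra storage for faster lookups by mapping attribute values to the documents that hold them. In MongoDB, indexes are B-tree based and can be created from pymongo with, e.g., `collection.create_index("city")`. The sketch below is only a simplified in-memory analogy that uses a dictionary, with hypothetical documents.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "name": "Ann", "city": "Austin"},
    {"_id": 2, "name": "Bo", "city": "Boston"},
    {"_id": 3, "name": "Cy", "city": "Austin"},
]

def find_by_scan(docs, field, value):
    """Without an index: examine every document (linear scan)."""
    return [d for d in docs if d.get(field) == value]

def build_index(docs, field):
    """Simplified index: map each field value to the list of matching documents."""
    index = defaultdict(list)
    for d in docs:
        index[d.get(field)].append(d)
    return index

city_index = build_index(docs, "city")
# One dictionary lookup replaces the scan over all documents.
print([d["_id"] for d in city_index.get("Austin", [])])  # → [1, 3]
```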
Data frames provide what type of abstract structure?
Table
Embedded documents are used to avoid joins.
TRUE
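The contrast can be sketched with plain Python data. The relational layout needs a join to bring an order together with its line items. The document layout embeds the items, so one lookup returns everything. All names here are hypothetical.

```python
# Relational style: orders and line items live in separate tables and are joined by order_id.
orders = [{"order_id": 101, "customer": "Ann"}]
line_items = [
    {"order_id": 101, "sku": "A1", "qty": 2},
    {"order_id": 101, "sku": "B7", "qty": 1},
]
joined = [
    {**o, **li} for o in orders for li in line_items if li["order_id"] == o["order_id"]
]

# Document style: the line items are embedded in the order document,
# so a single lookup returns everything with no join step.
order_doc = {
    "_id": 101,
    "customer": "Ann",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
print(sum(item["qty"] for item in order_doc["items"]))  # → 3
```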
What is the top-level data structure in Cassandra called?
Keyspace
Duplicating data in wide column databases is uncommon.
FALSE
Which command is used to specify the nodes in a cluster to connect to?
Cluster
One way to catch errors when preparing data is to use which Python statement?
try/except (a try statement with an except clause)
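Python's `try` statement with an `except` clause lets a data load record bad values and keep going instead of stopping at the first one. Here is a minimal sketch. The function name and the sample values are hypothetical.

```python
def parse_ages(values):
    """Convert strings to ints, collecting values that fail instead of crashing the load."""
    parsed, errors = [], []
    for raw in values:
        try:
            parsed.append(int(raw))
        except ValueError as exc:  # int() raises ValueError for text like "n/a"
            errors.append((raw, str(exc)))
    return parsed, errors

good, bad = parse_ages(["34", "29", "n/a", " 41 "])
print(good)                     # → [34, 29, 41]
print([raw for raw, _ in bad])  # → ['n/a']
```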
Spark and Cassandra can run on the same cluster nodes.
TRUE
A Cassandra data model should be based on how you will query the database.
TRUE
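Query-driven modeling in Cassandra usually means one table per query, partitioned by the value that query filters on. The same data is deliberately duplicated across tables. The sketch below mimics this with Python dicts standing in for two hypothetical tables, `videos_by_user` and `videos_by_tag`. It illustrates the idea only and does not model Cassandra's storage engine or CQL.

```python
from collections import defaultdict

# Two "tables", each keyed (partitioned) by the column its query filters on.
videos_by_user = defaultdict(list)  # supports: "all videos uploaded by user X"
videos_by_tag = defaultdict(list)   # supports: "all videos with tag Y"

def add_video(video):
    """Write the same video into every table that a query needs (denormalization)."""
    videos_by_user[video["user"]].append(video["title"])
    for tag in video["tags"]:
        videos_by_tag[tag].append(video["title"])

add_video({"title": "Intro to CQL", "user": "ann", "tags": ["cassandra", "cql"]})
add_video({"title": "Spark + Cassandra", "user": "bo", "tags": ["cassandra", "spark"]})

# Each query is a single partition lookup; no join is needed.
print(videos_by_user["ann"])       # → ['Intro to CQL']
print(videos_by_tag["cassandra"])  # → ['Intro to CQL', 'Spark + Cassandra']
```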
Graphs consist of rows and tables.
FALSE
Directed edges are not used in hierarchical relations.
FALSE
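Hierarchies are a natural use of directed edges, because the direction records who is above whom. This is a minimal sketch of a hypothetical org chart with child-to-parent edges. It assumes the hierarchy has no cycles.

```python
# Directed edges (child -> parent) encode a hierarchy such as an org chart.
reports_to = {
    "dev1": "lead",
    "dev2": "lead",
    "lead": "cto",
    "cto": "ceo",
}

def chain_of_command(person):
    """Follow directed edges upward until reaching a node with no outgoing edge (assumes no cycles)."""
    chain = []
    while person in reports_to:
        person = reports_to[person]
        chain.append(person)
    return chain

print(chain_of_command("dev1"))  # → ['lead', 'cto', 'ceo']
```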
Using separate files for nodes and edges can simplify data loading.
TRUE
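With separate node and edge files, you can load all nodes first and then attach edges whose endpoints are already known. Edges that point to missing nodes then show up right away. Here is a sketch with hypothetical file contents, wrapped in `io.StringIO` so it runs without files. It builds a plain adjacency list rather than calling a real graph database.

```python
import csv
import io

# Hypothetical files: one row per node, one row per edge.
nodes_csv = io.StringIO("id,name\n1,Ann\n2,Bo\n3,Cy\n")
edges_csv = io.StringIO("source,target,type\n1,2,FRIENDS_WITH\n2,3,FRIENDS_WITH\n")

# Load nodes first so edges can be checked against known node ids.
nodes = {row["id"]: row["name"] for row in csv.DictReader(nodes_csv)}

adjacency = {node_id: [] for node_id in nodes}
for row in csv.DictReader(edges_csv):
    if row["source"] not in nodes or row["target"] not in nodes:
        raise ValueError(f"edge refers to unknown node: {row}")
    adjacency[row["source"]].append((row["type"], row["target"]))

print(adjacency)
# → {'1': [('FRIENDS_WITH', '2')], '2': [('FRIENDS_WITH', '3')], '3': []}
```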
What driver is used with Python to work with Neo4j databases?
py2neo
What data structure can be used to map from nodes and edges to a table data structure?
DataFrame
Graphs are especially useful for modeling networks like social networks and road systems.
TRUE
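Road systems map directly onto weighted graphs. Towns are nodes and roads are edges weighted by distance, so route finding becomes a shortest-path query. This is a minimal sketch of Dijkstra's algorithm on a hypothetical network. It assumes non-negative distances.

```python
import heapq

# Hypothetical road network: town -> list of (neighbor, distance in km).
roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("A", 4), ("C", 2), ("D", 5)],
    "C": [("A", 1), ("B", 2), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm: shortest total distance from start to goal (non-negative weights)."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # goal unreachable

print(shortest_distance(roads, "A", "D"))  # → 8 (A -> C -> B -> D)
```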