Comp 1314 Data Management 1 Flashcards

(108 cards)

1
Q

What are the three key features of an OS?

A

Multi-user: many users can use the same system at the same time
Multi-processing: multiple processors can be used at the same time
Multi-tasking: multiple processes can run at the same time

2
Q

What is the philosophy of UNIX?

A

A set of cultural norms for minimalist, modular software development

3
Q

What is special about UNIX?

A

Programs can be strung together, yet this remains secure because the programs do not know about each other

4
Q

What is Linux based on?

A

UNIX

5
Q

What is piping?

A

Redirecting the output of one program so that it becomes the input of another

6
Q

What is the symbol for piping?

A

|
Program1 | Program2

7
Q

What is the Input/Output redirection symbol?

A

<: Input
>: Output
>>: Append

8
Q

List some programs that you can use

A

head
tail
sort
wc
uniq
du
xargs
more
cut
find
tar
gzip
nohup
parallel
basename

9
Q

What are the environmental variables?

A

A set of variables that every running process has access to

Set using the export command, e.g. export NAME=value

10
Q

How do you write a for loop in bash?

A

for var in directory/*
do
    something "$var"
done

11
Q

How do you do a while loop in bash?

A

cat file | while read -r line
do
    something "$line"
done

12
Q

What does grep do?

A

Searches the text provided (files or standard input) for lines that match a given pattern

The pattern can be a regular expression

13
Q

What does SED stand for?

A

Stream Editor

14
Q

What does SED do?

A

Reads the input provided, modifies it as specified by the command, and then writes the result to standard output

sed [options] command [file]

15
Q

What does scp do?

A

Securely copies files between hosts over SSH, for example to or from a remote server

16
Q

What does awk do?

A

Processes structured text files, treating each line as a row (record) split into columns (fields)

17
Q

What are wildcards for?

A

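Allow one pattern to expand into multiple arguments for a command, so accessing multiple files becomes easy. Wildcards such as * and ? are glob patterns, which are similar to but not the same as regular expressions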

18
Q

What are the three permission categories?

A

u: user (the owner of the file)
g: group
o: others

19
Q

How do you change the permissions of a file?

A

chmod u+x file
chmod g-w *

You can also specify an octal number with one digit each for user, group and others. Each digit is read as three bits (r=4, w=2, x=1), and a permission is granted wherever the bit is 1
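
The numeric form can be illustrated with a small sketch. The helper name mode_to_string is hypothetical and only decodes the nine basic permission bits; it is not part of chmod itself.

```python
def mode_to_string(mode: int) -> str:
    """Render the low nine permission bits as an rwxrwxrwx string."""
    flags = "rwxrwxrwx"
    # Bit 8 is user-read and bit 0 is others-execute.
    return "".join(
        flag if mode & (1 << (8 - i)) else "-"
        for i, flag in enumerate(flags)
    )


# 7 = rwx for the user, 5 = r-x for the group, 4 = r-- for others.
print(mode_to_string(0o754))  # rwxr-xr--
```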

20
Q

Who can change the permissions of a file?

A

The owner of a file and the superuser

21
Q

What is everything in UNIX?

A

Either a file or a process; this includes directories.

22
Q

How can you move a process to the background?

A

bg (resumes a suspended job in the background)

Adding & to the end of a command starts the process in the background

23
Q

What are the two options with the kill command?

A

SIGTERM: A gentle request to terminate, giving the process time to clean up and close
SIGKILL: A forced kill with no clean-up time; the process cannot catch or ignore it.

23
Q

How can you prevent all processes terminating when you log off?

A

With screen sessions, which keep running after you log off

24
What is a CSV file?
Comma-separated values
Easy to manipulate and process, but newlines and commas inside text fields can be problematic
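
A minimal sketch of why embedded commas and newlines are a problem, using Python's standard csv module. The sample rows are made up for illustration.

```python
import csv
import io

rows = [["id", "comment"], ["1", "Hello, world"], ["2", "two\nlines"]]

# Write the rows as CSV; the writer quotes fields containing commas or newlines.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Naive splitting breaks on the embedded comma and the embedded newline.
naive = [line.split(",") for line in text.splitlines()]

# A real CSV parser honours the quoting and recovers the original rows.
parsed = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive), len(parsed))
```

The naive version sees four lines instead of three, and splits "Hello, world" into two fields.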
25
What does YAML stand for?
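Yet Another Markup Language (now officially the recursive acronym "YAML Ain't Markup Language")
Uses indentation to express structure rather than a fixed row and column format
Widely used in config files and for passing messages between applications
Made of key-value pairs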
26
What does JSON stand for?
JavaScript Object Notation
27
What are the benefits of data being machine-understandable?
Searching
Aggregation and summarisation
Prediction
28
What is metadata?
Data about data: information describing other data, mainly useful to the machine rather than to the human reader.
29
What is markup?
Annotations (tags) added to a document to describe its structure and meaning; these can include semantic links to other pages.
30
What is SGML?
Standard Generalised Markup Language
A superset of markup languages, from which languages such as HTML and XML are derived
Separates structure from content
31
What is XML?
Extensible Markup Language
Designed to carry data, not display data
32
What must XML have?
A root element
33
What is the issue with namespaces?
XML files can reference other XML files, and different authors can define two tags with the same name, which then conflict
34
How is the namespace problem resolved?
Using the special xmlns attribute to declare a namespace (usually with a prefix), which tells the two sets of tags apart.
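
A small sketch of two same-named tags kept apart by namespaces, parsed with Python's xml.etree.ElementTree. The example.org URIs, the f and h prefixes and the table elements are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two different <table> tags, distinguished by their namespace prefixes.
doc = """<root xmlns:f="http://example.org/furniture"
      xmlns:h="http://example.org/html">
  <h:table>markup table</h:table>
  <f:table>wooden table</f:table>
</root>"""

root = ET.fromstring(doc)

# Map prefixes to namespace URIs for the lookups below.
ns = {"f": "http://example.org/furniture", "h": "http://example.org/html"}

furniture = root.find("f:table", ns).text
markup = root.find("h:table", ns).text
print(furniture, "|", markup)
```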
35
What is a URI?
Uniform Resource Identifier: the unique full name of a namespace. By convention a URL is used as the URI, but the two are not the same thing.
36
What does a schema enable us to do?
Gives meaning to the structured data we work with. Lots of people have already done this, so there are many different schemas
37
Give an example schema
Pan Cheddar
38
How can we specify the number of occurrences in a sequence?
With the minOccurs and maxOccurs attributes
39
What does DTD stand for?
Document Type Definition
40
What is XSD?
XML Schema Definition
Every element is either simple or complex
41
How many parsers does HTML5 contain?
2: one which can parse HTML and one which can parse XML
42
Why is XML not used for large scale solutions?
We could easily lose all of the data; a database can mitigate this.
43
What is a DBMS?
Database Management System
A collection of software that manages a database
44
Why do we need DBMS?
Handles all the data exchange and creates data independence
Applications don't need to care about how the database works internally
45
What is logical and physical independence?
Logical: Protects applications from changes in the data structure
Physical: Protects applications from changes in how the data is stored
46
What does the DBMS manage?
The data model
Storing large amounts of data persistently, conveniently and efficiently
Transaction management
Concurrency control
Access control
Resiliency
47
What is a data model?
A collection of mathematical concepts for describing data
Every model has a data language
48
What are the two parts of a data language?
A data definition language (DDL)
A data manipulation language (DML)
49
What is XPath?
The data manipulation language for XML, used to select nodes from a document
50
What properties does a relation have?
Each row represents a k-tuple of R
The ordering of rows is immaterial
All rows are distinct
The order of attributes should not be significant
The significance of each column is conveyed by the name we give it
51
What is a k-ary relation scheme?
A relation name and an ordered sequence of k attributes
52
What is an instance of a relation scheme?
A relation that conforms to the schema
Arities match
Data types of attributes match
53
What is a key?
A set of attributes on which no two different tuples can have the same values
54
What is a superkey?
For a relation R and a set X of attributes of R, X is a superkey of R if X -> A for every attribute A of R
The set of all attributes will always be a superkey
55
What is a candidate key?
X is a candidate key of R if X is a minimal superkey of R
That is, X is a superkey and there is no superkey Y such that Y is a proper subset of X
56
Describe the steps of the closure algorithm
X+ = {Ai : F ⊧ X → Ai} is the closure of X with respect to F
F ⊧ X → Y if and only if Y ⊆ X+
INPUT: R, attribute set U, F, X is a subset of U
OUTPUT: X+ = {Ai : F ⊧ X → Ai}
Repeat until: X_n+1 == X_n
Example: R(ABCDEFG), F = {AB -> CD, C -> EG, D -> H}
0. X = {A,C}
1. X_0 = {A,C}, only path is to EG
2. X_1 = {A
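
The closure algorithm can be sketched in a few lines of Python. The function name closure and the representation of F as a list of (left side, right side) set pairs are assumptions for illustration; the dependencies are the ones from the card.

```python
def closure(attrs, fds):
    """Return the closure of attrs under the functional dependencies fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until X_n+1 == X_n
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when the whole left side is already known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"E", "G"}), ({"D"}, {"H"})]
print(sorted(closure({"A", "C"}, fds)))  # ['A', 'C', 'E', 'G']
```

Starting from {A, C}, only C -> EG can fire, because AB -> CD needs B and D -> H needs D.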
57
Why is bad database design a problem?
It can lead to anomalies
58
What is a functional dependency?
When two tuples within a relation agree on the values of A1,...,An, they must also agree on B
Written A1,...,An -> B
So the right side cannot change where the left side is the same
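
A minimal sketch of checking whether a functional dependency holds in one relation instance. The helper fd_holds and the sample rows are hypothetical; note that an instance can only refute a dependency, not prove it for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"module": "COMP1314", "lecturer": "Smith", "room": "B1"},
    {"module": "COMP1314", "lecturer": "Smith", "room": "B2"},
    {"module": "COMP1202", "lecturer": "Jones", "room": "B1"},
]
print(fd_holds(rows, ["module"], ["lecturer"]))  # True
print(fd_holds(rows, ["module"], ["room"]))      # False
```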
59
What is splitting/combining in Functional dependencies?
We can split the right side of a bigger functional dependency into smaller ones and vice versa: A -> BC is equivalent to A -> B and A -> C.
60
What is normalisation for?
Avoiding anomalies
61
Give examples of anomalies
Redundancy
Update anomalies
Insert anomalies
Deletion anomalies
62
What is 1st Normal Form?
A relation that contains only atomic values with no repeating groups
63
What is 2nd Normal Form?
No partial key dependencies
Every non-key attribute is dependent on all attributes of every candidate key
64
What is 3rd Normal Form?
All non-key attributes are determined only by the keys, with no transitive dependencies
65
What is transitive dependence?
A -> B -> C but B does not determine A
66
What is Boyce-Codd Normal Form?
Every determinant is a candidate key
67
What are the benefits and drawbacks of BCNF?
Advantages:
No redundancy
Efficiency
No duplication
Changes can cascade across relations
Disadvantages:
More tables
More complex
More relationships
Queries become more complex
68
What is SQL?
Structured Query Language
Converts a data model to a physical database by specifying a DDL and a DML
69
What is SQLite?
A database engine in which the whole database is contained within a single file
Suited to simple databases
70
What types does SQLite use?
INTEGER
REAL
TEXT
BLOB
NULL
71
What joins are possible in SQLite?
LEFT: All of table 1 and the matches in table 2
RIGHT: All of table 2 and the matches in table 1
FULL OUTER: Everything from both
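
A minimal sketch of a LEFT join using Python's built-in sqlite3 module. The student and mark tables are made up for illustration. Only LEFT is shown because, as far as I know, RIGHT and FULL OUTER joins need a fairly recent SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mark (student_id INTEGER, score INTEGER);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO mark VALUES (1, 70);
""")

# Every student appears; students with no mark get NULL (None) for score.
rows = conn.execute("""
    SELECT s.name, m.score
    FROM student AS s
    LEFT JOIN mark AS m ON m.student_id = s.id
    ORDER BY s.id
""").fetchall()

print(rows)  # [('Ada', 70), ('Bob', None)]
```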
72
What are views?
Named virtual tables defined by a stored query, which can themselves be queried
Often pre-joined for convenience
73
What are indexes?
Data structures associated with tables to support queries, logically ordered by the values of the key.
74
Why are indexes used?
To improve the performance of looking up data
75
How is an index created?
CREATE INDEX index_name ON table_name (column_name)
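
A short sketch of creating an index with Python's sqlite3 module. The table person, the column surname and the index name idx_person_surname are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT)")

# Build an index so lookups by surname need not scan the whole table.
conn.execute("CREATE INDEX idx_person_surname ON person (surname)")

# SQLite records its indexes in the sqlite_master catalogue table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'person'"
    )
]
print(index_names)  # ['idx_person_surname']
```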
76
What does NoSQL stand for?
Not Only SQL
Less adherence to ACID and schemas
77
What does NoSQL do?
Storage and retrieval in a non-tabular format
More flexible with various types of data
78
Why is NoSQL more flexible?
Because it has no fixed schema and no fields are mandatory
79
What is scalability?
Ability of a system to handle increasing amounts of workload or data by adding resources to the system.
80
What is vertical scalability?
Scaling by adding more resources, such as CPUs, to a single machine to handle increased workload.
81
What is horizontal scalability?
As workload increases, we add more nodes and distribute the workload between them.
82
What architecture does NoSQL operate as?
Distributed Architecture
83
What is sharding?
Partitioning data across multiple nodes to distribute the workload
84
What does sharding require and allow?
Every node is responsible for the data on it, so requests can be handled in parallel. The system requires a shard management system to keep track of shards.
85
What is a shard key?
An attribute used to determine how data is distributed between nodes
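
One common way to turn a shard key into a node is to hash it. This is a minimal sketch under that assumption; the function shard_for and the user keys are hypothetical, and real systems may use range-based or other schemes instead.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number in the range [0, num_shards)."""
    # A stable hash means the same key always lands on the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


placement = {user: shard_for(user, 4) for user in ["alice", "bob", "carol"]}
print(placement)
```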
86
Why is data replicated between multiple nodes?
To provide fault tolerance and high availability
The replication factor determines the number of replicas
87
What is CAP theorem?
Consistency: every read returns the most recent write
Availability: every request receives a response, without a guarantee that it reflects the most recent write
Partition tolerance: the system continues to operate despite network partitions
88
What are trade offs with CAP?
You can't have all three of C, A and P at once, so when a network partition occurs we choose CP or AP
CP: availability is sacrificed during the partition
AP: consistency is sacrificed during the partition
89
When is CA possible?
In a single-node system or partition free environment.
90
What is BASE?
BA: Basically Available. The system guarantees availability and will always respond to a request
S: Soft State. The state of the database may change over time even with no new input
E: Eventual Consistency. The system does not guarantee immediate consistency across all nodes after a write operation; instead it ensures that, if no new updates are made, all nodes will eventually converge to the same state
91
What are the types of NoSQL database?
Key-value
Document
Graph
Column-family
92
What is key-value style NoSQL?
Storing data as a collection of key-value pairs. Each item has a unique key used to access it. With no schema
93
What is Document style NoSQL?
Semi-structured format, typically JSON, BSON or XML
Each document is a self-contained unit of data that can contain key-value pairs
Allows hierarchical data
94
What is graph style NoSQL?
Storing data in the form of a graph with nodes and edges. This is good for interconnected data with lots of relationships.
95
What is a column-family style NoSQL?
Organises data into columns rather than rows
Handles large volumes of data with support for distributed architectures
Each row can have a different set of columns
96
What are hybrid NoSQL databases?
Combining features from multiple types to offer flexibility to handle diverse data models.
97
What factors should be considered when picking a type of NoSQL database?
Current data
Application requirements
Evaluate NoSQL database types
Consider consistency models
Assess scalability and performance
Examine operational conditions
Evaluate ecosystem and integration
Perform a proof of concept
98
Explain the basics of using XPATH
// Skip to a matching node at any depth
/ Step to the next (child) node
@attribute Select an attribute
[condition] Filter by a condition
text() Only the text stored within the element
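
A brief sketch of these XPath steps using Python's xml.etree.ElementTree, which supports only a subset of XPath. In particular it has no text() step, so the .text attribute is used instead. The library document is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<library>
  <shelf><book id="1"><title>Dune</title></book></shelf>
  <shelf><book id="2"><title>Emma</title></book></shelf>
</library>""")

# .// skips to matching nodes at any depth below the current node.
all_titles = [t.text for t in doc.findall(".//title")]

# [@id='2'] filters by attribute value; / steps down to the child.
second = doc.find(".//book[@id='2']/title").text

# A fully spelled-out path, one child step at a time.
ids = [b.get("id") for b in doc.findall("./shelf/book")]

print(all_titles, second, ids)
```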
99
How do indexes work?
The chosen attribute for the index becomes the index key
The index is implemented as either a tree (typically a B-tree) or a hash index
The DBMS plans the best way to execute a query with its indexes
100
What is the drawback of indexes?
They slow down INSERT, UPDATE and DELETE queries as they must account for the indexes and update them as well.
101
How do you create views?
CREATE VIEW Name AS SELECT ...
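
A minimal sketch of creating and querying a view with Python's sqlite3 module. The tables and the view name student_marks are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mark (student_id INTEGER, score INTEGER);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO mark VALUES (1, 70), (2, 55);

    -- The view stores the join once so later queries stay short.
    CREATE VIEW student_marks AS
        SELECT s.name, m.score
        FROM student AS s
        JOIN mark AS m ON m.student_id = s.id;
""")

# The view is queried like an ordinary table.
passed = conn.execute(
    "SELECT name FROM student_marks WHERE score >= 60"
).fetchall()
print(passed)  # [('Ada',)]
```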
102
What is ACID in relational database transactions?
Atomicity
Consistency
Isolation
Durability
103
What is atomicity in a transaction?
Each transaction is treated as a single unit. Either the whole of it is executed, or none of it is.
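
Atomicity can be sketched with Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back on an exception. The account table and the CHECK constraint are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "UPDATE account SET balance = balance + 150 WHERE name = 'bob'"
        )
        # This violates the CHECK constraint, so the transfer fails.
        conn.execute(
            "UPDATE account SET balance = balance - 150 WHERE name = 'alice'"
        )
except sqlite3.IntegrityError:
    pass

# The credit to bob was rolled back together with the failed debit.
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100, 'bob': 0}
```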
104
What is consistency in a transaction?
Changes to tables can only happen in predefined, predictable ways.
105
What is isolation in a transaction?
Isolation of transactions between multiple users to ensure that transactions from multiple users don't affect others.
106
What is durability in a transaction?
Ensures that changes to your data model by successful transactions are saved, even in the event of system failure.
107
Give an example of each type of NoSQL database.
Column-family: Cassandra
Key-value: DynamoDB
Document: MongoDB
Graph: Neo4j