Data Management Flashcards

(209 cards)

1
Q

Operating System

A

Intermediary between software and hardware, managing hardware allocation.

2
Q

UNIX Philosophy

A
  1. Make each program do one thing well
  2. Expect the output of every program to become the input of another
  3. Try software early, and expect to throw away clumsy parts and rebuild them
  4. Use tools in preference to unskilled help to lighten a programming task
3
Q

Linux Benefits

A
  1. Largely virus-free, as users have limited access to the system and few viruses are written for Linux
  2. The kernel is separate from the rest of the OS, which stops bugs elsewhere in the OS from crashing the whole system
4
Q

Index Node (inode)

A

Describes a file-system object (file/directory). Stores its metadata: attributes (e.g. owner, permissions, size) and the location of its data, but not its name.

5
Q

Inode number

A

Identifies an inode within a file system. A directory entry associates a file name with an inode number.

6
Q

Separating metadata benefits.

A
  1. Files can be moved quickly: within the same file system only the directory entry changes, not the data
  2. A file can be altered while it is open in another application.
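A minimal Python sketch of both points. It assumes a POSIX system (Linux/macOS) with both names on the same file system, and the file names are made up. Renaming keeps the inode number, and a handle opened before the rename can still read the file.

```python
import os
import tempfile

# Throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    old_path = os.path.join(folder, "report.txt")   # hypothetical file names
    new_path = os.path.join(folder, "archive.txt")
    with open(old_path, "w") as f:
        f.write("hello\n")

    inode_before = os.stat(old_path).st_ino

    with open(old_path) as handle:             # another "application" has the file open
        os.rename(old_path, new_path)          # only the directory entry changes
        inode_after = os.stat(new_path).st_ino
        contents = handle.read()               # the open handle still refers to the same inode

print(inode_before == inode_after)  # True
print(repr(contents))               # 'hello\n'
```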
7
Q

pwd

A

Prints the absolute path of the current working directory.

8
Q

ls

A

Lists the files at the current location.

9
Q

cd

A

cd [directory] - Changes the current directory.

10
Q

Meaning of UNIX files starting with a dot?

A

They are hidden: ls does not show them unless run with -a.

11
Q

man

A

man [cmd] - Shows the manual page (help) for a command.

12
Q

mkdir

A

mkdir [directory name] - Creates a directory (folder).

13
Q

rmdir

A

rmdir [directory name] - Removes a directory. The directory must be empty.

14
Q

touch

A

touch [filename] - Creates an empty file if it does not exist; otherwise updates its timestamps.

15
Q

cat

A

Displays the entire contents of a file: cat [filename]

16
Q

less

A

Displays part of file allowing for forwards and backwards movement.

17
Q

head | UNIX

A

Shows the first 10 lines of a file by default (head -n N shows N lines)

18
Q

tail

A

Shows the last 10 lines of a file by default (tail -n N shows N lines)

19
Q

program to program piping

A

program1 | program2 (program 1 output goes to program2 input)

20
Q

program to file piping

A

program > file - program output is written to the file, overwriting it. program >> file appends instead.

21
Q

file to program piping

A

program < file - program takes input from file
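The redirections from these cards and a pipe can be tried together from Python. This sketch assumes a POSIX shell with printf, echo, wc, sort and uniq on PATH; fruit.txt is a made-up file name.

```python
import subprocess
import tempfile

def sh(command, cwd):
    """Run one command line through the shell in cwd and return its standard output."""
    result = subprocess.run(command, shell=True, cwd=cwd,
                            capture_output=True, text=True, check=True)
    return result.stdout

with tempfile.TemporaryDirectory() as folder:
    sh("printf 'pear\\napple\\npear\\n' > fruit.txt", folder)  # > : create/overwrite the file
    sh("echo banana >> fruit.txt", folder)                     # >> : append to it
    line_count = int(sh("wc -l < fruit.txt", folder))          # < : the file becomes stdin
    unique = sh("sort < fruit.txt | uniq", folder)             # | : sort's output feeds uniq

print(line_count)  # 4
print(unique)      # apple, banana, pear (one per line)
```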

22
Q

filter

A

A program that reads text, transforms it and outputs the result, e.g. sort, uniq, grep

23
Q

pipe

A

Connection between two filters

24
Q

wc

A

Prints the number of lines, words and bytes in a file (-l, -w and -m give just lines, words or characters)

25
uniq
Removes adjacent duplicate lines, so input is usually sorted first (sort | uniq)
26
Version Control System
Records changes to files over time so they can be undone and viewed
27
Local VCS
Database on your computer holds all the changes. Does not allow collaboration
28
Centralised VCS
Single server stores changes, with users checking out files. Allows for collaboration but single point of failure
29
Distributed VCS
Every user has a full copy of the repository locally as well as on the server, and changes are copied between them.
30
VCS add
Stage a file (or its changes) so it is included in the next commit to the local repository
31
VCS commit
Record the staged changes in the local repository as a new version
32
VCS commit message
Message describing a commit
33
VCS check in/push
Upload local repository content to remote repository
34
VCS check out/pull
Download file from remote repository
35
Conflict
When changes made cannot be merged automatically, resolved by manually applying changes to latest version
36
Reverse Integration
Copies finished features from a branch into main. Until then, the new code is kept out of main
37
Forward Integration
Copies latest changes from main to branch, keeping branch up to date
38
ps
Views current processes
39
top
Live view of running processes and their CPU usage
40
kill | UNIX
kill PID (PID is the process ID). Signals: SIGTERM (the default) - asks the process to stop, giving it time to shut down gracefully; SIGKILL (kill -9) - forces the process to stop immediately
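Here is a minimal Python sketch of both signals. It assumes a POSIX system with a sleep command, with the 30-second sleep standing in for any long-running process. Python's subprocess reports a child ended by a signal as a negative return code.

```python
import signal
import subprocess

def stop(sig):
    """Start a long-running child, send it `sig`, and return its exit status."""
    child = subprocess.Popen(["sleep", "30"])
    child.send_signal(sig)   # same effect as: kill -TERM <pid> / kill -KILL <pid>
    return child.wait()      # negative value = ended by that signal number

term_code = stop(signal.SIGTERM)  # a polite request; sleep does not catch it, so it exits
kill_code = stop(signal.SIGKILL)  # cannot be caught or ignored
print(term_code, kill_code)       # -15 -9 on Linux
```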
41
bg/fg | UNIX
Moves a process to the background/foreground
42
screen | UNIX
Creates terminal sessions that keep processes running in the background, even after you detach from them
43
Environment Variables
Named values set in the shell that are passed on to every process it runs
44
PATH
Ordered list of directories searched for an executable when a command is run; the first match is used
45
export
Sets an environment variable: export variableName='value'. With no argument, lists all exported variables
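A small Python sketch that uses GREETING as a hypothetical variable name. It shows that a variable added to the environment (as export does) is inherited by child processes, and how PATH splits into its ordered search list.

```python
import os
import shutil
import subprocess
import sys

# Like `export GREETING='hello'`: add it to this process's environment.
os.environ["GREETING"] = "hello"

# A child process inherits a copy of the environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['GREETING'])"],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # hello

# PATH is an ordered list separated by os.pathsep (":" on UNIX); the first match wins.
search_order = os.environ.get("PATH", "").split(os.pathsep)
print(search_order[:3])
print(shutil.which("sh"))    # full path of the first `sh` found on PATH, or None
```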
46
grep
Searches for lines containing the given pattern: grep [pattern] [input]
47
Special Characters | Regular Expressions (grep)
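* - zero or more of the previous, ? - zero or one of the previous, + - one or more of the previous, . - any single character (wildcard), [] - one character from a set or range, e.g. [a-z]. In grep's default (basic) syntax ? and + are literal characters; use grep -E for these meanings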
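The special characters can be tried with Python's re module. The grep helper below is a hypothetical stand-in that behaves roughly like grep -E (extended syntax), and the word list is made up.

```python
import re

WORDS = ["color", "colour", "colouur", "cat", "cut", "cot", "ct"]

def grep(pattern, lines):
    """Keep the lines that contain a match, roughly like `grep -E pattern`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

print(grep(r"colou*r", WORDS))  # * zero or more u: ['color', 'colour', 'colouur']
print(grep(r"colou?r", WORDS))  # ? zero or one u:  ['color', 'colour']
print(grep(r"colou+r", WORDS))  # + one or more u:  ['colour', 'colouur']
print(grep(r"c.t", WORDS))      # . any one char:   ['cat', 'cut', 'cot']
print(grep(r"c[au]t", WORDS))   # [] a or u:        ['cat', 'cut']
```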
48
sed
Takes in text and modifies it: sed [command] [file]. Example commands: 's/Hello/Hi/' (replaces the first Hello on each line with Hi; adding g, as in 's/Hello/Hi/g', replaces every Hello on the line), '/Run/d' (deletes all lines that contain Run)
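These two hypothetical Python helpers mimic the sed commands above line by line. They use Python's regex syntax, which differs in detail from sed's.

```python
import re

def sed_substitute(text, old, new, every=False):
    """Like sed 's/old/new/' (first match per line) or 's/old/new/g' (every match)."""
    count = 0 if every else 1            # re.sub: count=0 means replace all matches
    return "\n".join(re.sub(old, new, line, count=count) for line in text.split("\n"))

def sed_delete(text, pattern):
    """Like sed '/pattern/d': drop each line that contains a match."""
    return "\n".join(line for line in text.split("\n") if not re.search(pattern, line))

text = "Hello Hello\nRun now\nHello"
print(sed_substitute(text, "Hello", "Hi"))              # Hi Hello / Run now / Hi
print(sed_substitute(text, "Hello", "Hi", every=True))  # Hi Hi / Run now / Hi
print(sed_delete(text, "Run"))                          # Hello Hello / Hello
```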
49
awk
Processes text as a table of columns: awk 'pattern {action}' [file]. By default, actions run on every line. $n gives the nth column ($0 is the whole line)
50
BEGIN | for awk
action run once at the start
51
END | for awk
action run once at the end
52
applying conditions | for awk
(condition){action}, action only run on lines that meet condition
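The following Python loop roughly mirrors awk 'BEGIN {total = 0} $2 > 50 {total += $2} END {print total}' marks.txt, showing where BEGIN, the condition and END run. The file marks.txt and its two columns are hypothetical.

```python
# Hypothetical contents of marks.txt: a name and a mark, separated by whitespace.
MARKS = """alice 72
bob 45
carol 90"""

total = 0                          # BEGIN {total = 0}: runs once, before any line
for line in MARKS.splitlines():
    fields = line.split()          # awk splits on whitespace: $1 = fields[0], $2 = fields[1]
    mark = int(fields[1])
    if mark > 50:                  # condition ($2 > 50)
        total += mark              # action {total += $2}
print(total)                       # END {print total}: runs once, after the last line -> 162
```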
53
LaTeX benefits over Word
1. Easily displays complex equations 2. Compiles large documents easily 3. Placement of figures and tables is easy 4. Automatic referencing 5. OS independent
54
Creating LaTeX documents
Typed as a .tex file and the LaTeX engine compiles to .pdf
55
Math Mode | LaTeX
Open and close with $. Allows for mathematical symbols
56
Wildcards | UNIX
Allows for operating on multiple files at once: * any string of characters (including none), ? exactly one character, [] one character out of those given
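Python's glob module follows the same wildcard rules, including skipping dot files for patterns like *.txt. This sketch creates made-up empty files in a temporary directory and expands a few patterns.

```python
import glob
import os
import tempfile

NAMES = ["notes.txt", "data1.csv", "data2.csv", "data10.csv", ".hidden.txt"]  # made-up files

with tempfile.TemporaryDirectory() as folder:
    for name in NAMES:
        open(os.path.join(folder, name), "w").close()   # like `touch name`

    def match(pattern):
        """Expand a wildcard pattern inside `folder` and return sorted file names."""
        paths = glob.glob(os.path.join(glob.escape(folder), pattern))
        return sorted(os.path.basename(p) for p in paths)

    star = match("*.txt")              # ['notes.txt'] (dot files skipped, as in the shell)
    one_char = match("data?.csv")      # ['data1.csv', 'data2.csv']
    char_set = match("data[13]*.csv")  # ['data1.csv', 'data10.csv']

print(star, one_char, char_set)
```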
57
chmod | UNIX
Changes permissions for files/directories. In numeric form, each octal digit is a 3-bit binary number, read (4) + write (2) + execute (1), for owner, group and other, e.g. chmod 754 file
58
ls -l permission information
The first column shows permission information as a 10-character string. The first character shows the type (d for a directory, - for a file), then the rest is split into 3-character chunks for each accessor (owner, group and other). The three characters show whether the file/directory is readable/writable/executable (r, w, x, or - if not).
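A small Python sketch that converts a numeric mode, digit by digit, into the rwx form shown by ls -l. The helpers digit_to_rwx and mode_string are hypothetical. stat.filemode is part of Python's standard library and builds the full 10-character string, including the type character.

```python
import stat

def digit_to_rwx(digit):
    """One octal digit is 3 bits: 4 = read, 2 = write, 1 = execute."""
    return ("r" if digit & 4 else "-") + ("w" if digit & 2 else "-") + ("x" if digit & 1 else "-")

def mode_string(octal_text):
    """'754' -> 'rwxr-xr--' (owner, group, other)."""
    return "".join(digit_to_rwx(int(ch)) for ch in octal_text)

print(mode_string("754"))                   # rwxr-xr--
print(stat.filemode(stat.S_IFREG | 0o754))  # -rwxr-xr--  (regular file)
print(stat.filemode(stat.S_IFDIR | 0o755))  # drwxr-xr-x  (directory)
```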
59
Directory Permissions | UNIX
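Execute permission on a directory allows it to be entered (cd) and the files in it to be accessed; read permission allows its contents to be listed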
60
TSV
Tab-Separated Values, a form of structured data
61
Benefits of machine readable data
* Searching * aggregation and summation * Prediction * Linking - links info from different sources
62
Relationship Modelling
A way to make human data machine readable. Involves creating a model that shows relationships between elements
63
Hierarchical | Relationship Modelling
Entities are connected with each other and attributes in a tree structure
64
Network | Relationship Modelling
Entities and attributes are connected in a directed graph.
65
Object Oriented | Relationship Modelling
Entities and attributes are connected in a directed graph. Additionally, entities are classes, with attribute values coming with a pointer to the attribute
66
Entity Relational | Relationship Modelling
Entities are now tables. They are linked by a key property
67
Markup
Addition of metadata to a document. Gives structure and additional meaning to text, allowing a machine to glean meaning from it
68
YAML
Uses indentation (whitespace) for structure, which makes it easy to read but harder to write. Written with key-value pairs, e.g. variableName: data
69
Uses of YAML
* Config files * Passing data between applications * Storing simple application states
70
Issues with YAML
The syntax can be ambiguous, so may get different results with different parsers (code that splits up text). Not widely used
71
JSON
Stores objects. Subset of YAML. Can be read by most languages. Contains: * Objects - {"key": "value", ...} * Values - strings, numbers, true/false, null, objects or lists * Lists (arrays) - "key": [value1, value2]
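A short sketch using Python's built-in json module with a made-up record. It parses a JSON object holding a string, a list and a boolean, changes it, then serialises it back to text.

```python
import json

# A made-up record: an object containing a string, a list (array) and a boolean.
text = '{"name": "Ada", "languages": ["Python", "SQL"], "active": true}'

record = json.loads(text)            # object -> dict, array -> list, true -> True
print(record["languages"][1])        # SQL
print(record["active"] is True)      # True

record["languages"].append("XPath")  # change the data, then serialise it again
print(json.dumps(record, sort_keys=True))
# {"active": true, "languages": ["Python", "SQL", "XPath"], "name": "Ada"}
```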
72
JSON uses
Sending data on the web/between programs. Sometimes used for config data
73
HTML
A markup language used for documents with hypertext (links). Tags say how to display data, e.g. <tag>TEXT HERE</tag>
74
Liquid | structured data
Template markup language created by Shopify
75
SGML | Markup
Standard Generalised Markup Language - A standard for defining markup languages. HTML (up to version 4) was defined with it, and XML is a simplified subset of it
76
SGML issues | Markup
* complex * no strict structure * Requires a definition of structure
77
XML benefits | Markup
* Easier to parse * Simplifies SGML * Don't need to define structure
78
XML | Markup
eXtensible Markup Language - Hierarchical with only tags, attributes and content. Made to carry data not display data. No defined tags
79
XML Syntax | Markup
Defines how it is written: * closing tags for all tags * case sensitive * must have root element * attributes are quoted
80
Schema | XML
Used as a template to ensure an XML file is written in a certain way
81
SimpleType | XML element
Only contains text
82
ComplexType | XML element
Can contain attributes and children
83
Namespaces | XML
Gives a prefix to tags with the same name, so they can be told apart. Declared with xmlns:prefix="someurl"; all tags in the namespace are then written prefix:tag
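A sketch with Python's xml.etree.ElementTree, where the prefixes f and h and the example.com URLs are placeholders. Both elements are called table, but their namespaces keep them apart. Internally ElementTree stores each tag as {url}name.

```python
import xml.etree.ElementTree as ET

doc = """<root xmlns:f="https://example.com/furniture" xmlns:h="https://example.com/html">
  <f:table><f:name>Coffee table</f:name></f:table>
  <h:table><h:tr>Row</h:tr></h:table>
</root>"""

root = ET.fromstring(doc)
ns = {"f": "https://example.com/furniture", "h": "https://example.com/html"}  # prefix -> URL

furniture = root.findall("f:table", ns)    # only the furniture <table>
html = root.findall("h:table", ns)         # only the HTML <table>
print(len(furniture), len(html))               # 1 1
print(root.find("f:table/f:name", ns).text)    # Coffee table
print(furniture[0].tag)                        # {https://example.com/furniture}table
```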
84
XML + CSS
Can apply a stylesheet to add presentation to the XML file
85
XPath
XML path language. The way to query data from an XML file
86
/path/to/element | XPath
Selects the nodes at that absolute path, starting from the root
87
// | XPath
Selects all matching nodes at any depth, e.g. //title
88
* | XPath
Matches any node
89
@attribute | XPath
Selects the given attribute; inside square brackets it filters nodes by it, e.g. //tag[@attribute = 'value']
90
text() | XPath
Gets the text directly inside of the node
91
node[n] | XPath
Gets the nth occurrence of that node among its siblings (counting from 1)
92
node[last()] | XPath
Gets the last occurrence of that node
93
node1[node2] | XPath
Gets node1 elements that have a node2 child
94
node1[@attribute="value"] | XPath
Gets node1s with the given attribute being equal to the given value
95
.. | XPath
Gets the parent of the current node
96
and/or/not | XPath
Combine conditions inside square brackets; parentheses group them, e.g. //book[@lang='en' and not(@price > 20)]
97
| | XPath
Gets both the nodes returned from the query on the left and right
98
contains(location, "text") | XPath
Used in square brackets, true for nodes containing the given text in the given location. location examples: @attribute, text()
99
string-length(string) | XPath
Gets the length of the given string
100
starts-with(location, "text") | XPath
True if the location (attribute/text) starts with the text.
101
ends-with(string, "text") | XPath
True if the string ends with the text. Only available in XPath 2.0 and later, not XPath 1.0.
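Python's xml.etree.ElementTree supports a limited subset of XPath, enough to try several of these cards. The library document is made up. Paths are relative to the element you search from, so .//book stands in for //book. text(), contains(), starts-with() and the | union are not supported. A full XPath 1.0 engine, such as the xpath() method in the third-party lxml library, handles those.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring("""<library>
  <shelf id="a">
    <book lang="en"><title>Dune</title></book>
    <book lang="fr"><title>Candide</title></book>
  </shelf>
  <shelf id="b">
    <book lang="en"><title>Emma</title><review/></book>
  </shelf>
</library>""")

def titles(path):
    """Titles of every book element matched by an ElementTree path expression."""
    return [book.find("title").text for book in library.findall(path)]

all_books = titles(".//book")              # like //book
english = titles(".//book[@lang='en']")    # like //book[@lang='en']
first = titles("shelf/book[1]")            # first book on each shelf (1-based)
last = titles("shelf/book[last()]")        # last book on each shelf
reviewed = titles(".//book[review]")       # books that have a review child
shelf_count = len(library.findall("*"))    # * : any child element
parent_title = library.find(".//review/..").find("title").text  # .. : the review's parent

print(all_books, english, first, last, reviewed, shelf_count, parent_title)
```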
102
DBMS
Database Management System - Collection of software that manages a database
103
Data Independence
Idea that other applications and users should be insulated from the data's structure (logical independence) and storage (physical independence).
104
Logical Independence
Protection from changes to the logical structure of data, e.g. its schema
105
Physical Independence
Protection from changes to the physical storage of data, e.g. whether it is stored on a hard drive
106
Data Model
Defines how data is represented, organised and structured. Relational model is most widely used. Has a data language containing DDL and DML
107
Data Language
Used to define, modify and retrieve data. Contains a data definition language (DDL) and a data manipulation language (DML)
108
DDL
Data definition language - syntax for describing database templates. Includes creating tables and defining keys e.g. XML schema
109
DML
Data manipulation language - used for querying and modifying data, e.g. XPath, SQL SELECT/INSERT/UPDATE/DELETE
110
Relation | Relational Database Model
A table, formally defined in set theory as a subset of the Cartesian product of its attribute domains: R ⊆ S1 × S2 × … × Sk, i.e. R ∈ P(S1 × S2 × … × Sk)
111
k-tuple | relations
ordered sequence of k elements
112
k-ary relation | relations
A subset of the Cartesian product of k sets (attribute domains), so its tuples are unordered. Contains many k-tuples. An instance of a k-ary relation schema
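A tiny Python sketch of these definitions using two made-up attribute domains. itertools.product builds the Cartesian product, and a 2-ary relation is just a set of 2-tuples drawn from it.

```python
from itertools import product

# Two made-up attribute domains.
names = {"alice", "bob"}
grades = {"A", "B"}

cartesian = set(product(names, grades))     # names × grades: every possible 2-tuple
enrolled = {("alice", "A"), ("bob", "B")}   # one 2-ary relation: an unordered set of tuples

print(len(cartesian))              # 4
print(enrolled <= cartesian)       # True: the relation is a subset of the product
print(("A", "alice") in enrolled)  # False: inside a tuple, order matters
```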
113
k-ary relation schema | relations
An ordered sequence of k attributes. A template for a k-ary relation
114
relational database schema | relations
set of relation schemas
115
relational database instance | relations
Set of relations, each of which is an instance of a relation schema. Called a database
116
Intension | relations
schemas. Changes rarely
117
Extension | relations
instances i.e. relations
118
Key
A set of attributes which is unique for all tuples. Can be made by combining attributes
119
Functional Dependency
A relation r satisfies a functional dependency A -> B if all tuples in r with the same values for the attributes in A have the same values for the attributes in B. Allows the value of B to be deduced from a given value of A
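A small Python sketch that checks whether one relation instance satisfies an FD, using made-up rows and attribute names. An instance can only show that an FD is violated or consistent with the current data. Whether the FD holds in general comes from what the data means.

```python
def satisfies(rows, lhs, rhs):
    """True if every pair of rows that agrees on `lhs` also agrees on `rhs` (lhs -> rhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        if seen.setdefault(key, value) != value:
            return False   # same determinant values, different dependent values
    return True

# Made-up relation instance.
people = [
    {"id": 1, "postcode": "AB1", "city": "Aberdeen"},
    {"id": 2, "postcode": "AB1", "city": "Aberdeen"},
    {"id": 3, "postcode": "EH1", "city": "Edinburgh"},
]
print(satisfies(people, ["postcode"], ["city"]))  # True
print(satisfies(people, ["city"], ["id"]))        # False: Aberdeen maps to ids 1 and 2
```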
120
Determinant | Functional Dependency
The set of attributes on the left-hand side of a functional dependency
121
Dependent Set | Functional Dependency
The set of attributes on the right-hand side of a functional dependency
122
Splitting/Combining Rule | Functional Dependency
A -> B is equivalent to A -> b for every attribute b in B
123
Trivial Dependency | Functional Dependency
If B ⊆ A then A -> B always holds, e.g. if height and weight are known, then height is known
124
Implication | Functional Dependency
A set of functional dependencies S implies A -> B (S ⊨ A -> B) if every relation instance that satisfies all of S also satisfies A -> B
125
Equivalence | Functional Dependency
S is equivalent to T if S ⊨ T and T ⊨ S
126
superkey
A set of attributes that functionally determines every attribute of the relation. The set of all attributes is always a superkey
127
Surrogate Key | Database
An artificial attribute created solely to uniquely identify each tuple (row), e.g. an auto-incrementing ID
128
Candidate Key
A superkey that has no other superkey included in it. e.g. if {height, weight} and {height} were superkeys, only {height} would be a candidate key
129
Closure Algorithm | Functional Dependency
Determines whether a set of attributes is a superkey by computing its closure. Steps: 1. Start with the given attributes 2. Find a functional dependency whose determinant is contained in the current attributes and whose dependent set has new attributes 3. Union the current attributes with that dependent set 4. Repeat 2-3 until no dependency adds anything new. The original attributes are a superkey if the result is the set of all attributes
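A minimal Python sketch of the closure in its usual iterative form, which keeps applying FDs until nothing new is added. It also includes a brute-force candidate-key search. The schema R(A, B, C, D), its FDs and the helper names are hypothetical, and the brute force is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by `attrs` under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:                       # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs            # determinant reached, so add the dependent set
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)

def candidate_keys(all_attrs, fds):
    """Superkeys with no smaller superkey inside them (checked smallest first)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            attrs = set(combo)
            if is_superkey(attrs, all_attrs, fds) and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

# Hypothetical schema R(A, B, C, D) with A -> B and B -> C.
R = {"A", "B", "C", "D"}
FDS = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, FDS)))       # ['A', 'B', 'C']: no D, so {A} is not a superkey
print(is_superkey({"A", "D"}, R, FDS))   # True
print(candidate_keys(R, FDS))            # one candidate key: {A, D}
```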
130
Poor Relation | Functional Dependency
A relation is poor if X -> A holds and X is not a superkey, as this can lead to redundant data
131
Anomalies | Databases
Issues that can happen in a bad database: * Redundancy - the same data in multiple places * Updates - updates can make data inconsistent * Inserts - forced to fill in extra irrelevant attributes * Deletion - deleting a row also deletes other data that was not meant to be deleted
132
0th Normal Form | Databases
Unnormalised, all data stored in one table
133
1st Normal Form | Databases
* Cannot have multiple values for one attribute * Cannot have the same attribute in multiple columns
134
Minimal Set of Functional Dependencies
* Each FD has 1 attribute on the right hand side * Minimal amount of attributes on the left hand side * No redundant FDs, i.e. implied by other FDs
135
Partial Key Dependency | Normalisation
Where an attribute only depends on part of any candidate key
136
2nd Normal Form | Databases
* 1st Normal Form * All attributes not part of any candidate key depend on all parts of every candidate key (no partial key dependencies)
137
1st Normal Form Creation | Databases
Take the illegal data and place it in a new table using the key
138
2nd Normal Form Creation | Databases
If attributes that are not part of any candidate key depend on only part of a candidate key, move them to a new table whose key is the part they depend on
139
3rd Normal Form | Databases
* 2nd Normal Form * Non-key attributes are determined only by candidate keys, not by other non-key attributes (no transitive dependencies)
140
3rd Normal Form Creation | Databases
If a non-key attribute depends on other non-key attribute(s), create a new table with the depended-on attribute(s) as the key, holding the attributes that depend on them
141
Boyce-Codd Normal Form | Databases
* 3rd Normal Form * Every determinant is a candidate key
142
Boyce-Codd Normal Form Creation | Databases
If a functional dependency exists whose determinant is not a key, a new table is made with that determinant as its key
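A sketch of the BCNF check in Python. A non-trivial FD breaks BCNF when the closure of its determinant is not the whole relation. The Lecture example, where each lecturer teaches one course, is hypothetical. The FDs for the decomposed table are written out by hand rather than computed.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose determinant is not a superkey; each one breaks BCNF."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)]

# Hypothetical Lecture(student, course, lecturer): each lecturer teaches one course.
R = {"student", "course", "lecturer"}
FDS = [({"student", "course"}, {"lecturer"}), ({"lecturer"}, {"course"})]
print(bcnf_violations(R, FDS))   # lecturer -> course violates BCNF

# Decompose on the violating FD: its determinant becomes the key of a new table.
teaches = {"lecturer", "course"}   # key: lecturer
attends = {"student", "lecturer"}  # the rest, linked through lecturer
print(bcnf_violations(teaches, [({"lecturer"}, {"course"})]))  # []
```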
143
Normalisation Benefits
* Less Redundancy, so less storage * More efficient to query * No duplication so no inconsistency
144
Relation | Databases
A collection of tuples. Visually represented as a table
145
Relationship | Databases
A link between relations
146
Conceptual Modelling | Databases
Type of modelling that identifies entity names and relationships, sometimes attributes. Made directly from the requirements. No database design.
147
Logical Modelling | Databases
Type of modelling that identifies attributes and attribute types(e.g. int)
148
Physical Modelling | Databases
Type of modelling aiming to represent the database structure. Has actual tables and attributes. Implements relationships, i.e. keys, join tables, indexes
149
SQLite Benefits
* Serverless * No configuration
150
SQLite issues
* Not multi-user * Limited concurrency (only one writer at a time) * Just a file
151
creating a table | SQL
CREATE TABLE table ( column TYPE (NOT) NULL, ... PRIMARY KEY (column,...) )
152
types | SQL
INTEGER, REAL, TEXT, BLOB (any uninterpreted data, e.g. an image), NULL
153
Renaming a table | SQL
ALTER TABLE oldTable RENAME TO newTable
154
Deleting a table | SQL
DROP TABLE table
155
Retrieving data | SQL
Done with SELECT column,... followed by FROM tables and then any number of optional further constraints.
156
ORDER BY | SQL
ORDER BY column ASC/DESC
157
LIMIT | SQL
LIMIT x. Result only shows the first x results
158
WHERE | SQL
WHERE condition. Result only shows the rows that match the given condition. At most basic level is column = value. Can use AND, OR and NOT, comparison(>), !=.
159
LIKE | SQL
Used in a WHERE condition for pattern matching on strings: % is a wildcard for any sequence of characters (and _ for exactly one). e.g. WHERE id LIKE '%1%'
160
(NOT) IN | SQL
Used in a WHERE condition to check whether the column's value is in a given list. e.g. WHERE id NOT IN (1,3,5)
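A sketch using Python's built-in sqlite3 module with a made-up student table. It combines CREATE TABLE and INSERT with SELECT queries using WHERE, NOT IN, LIKE, ORDER BY and LIMIT.

```python
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway in-memory database
con.execute("""CREATE TABLE student (
                   id   INTEGER NOT NULL,
                   name TEXT    NOT NULL,
                   mark REAL,
                   PRIMARY KEY (id))""")
con.executemany("INSERT INTO student (id, name, mark) VALUES (?, ?, ?)",
                [(1, "Ann", 71.0), (2, "Ben", 55.5), (3, "Cal", 64.0),
                 (4, "Dee", 80.0), (5, "Eve", 90.0)])

# WHERE with AND / NOT IN, then ORDER BY and LIMIT.
top_two = con.execute("""SELECT name, mark FROM student
                         WHERE mark > 60 AND id NOT IN (5)
                         ORDER BY mark DESC
                         LIMIT 2""").fetchall()

# LIKE with % wildcards; SQLite's LIKE ignores case for ASCII letters by default.
with_e = con.execute("SELECT name FROM student WHERE name LIKE '%e%' ORDER BY name").fetchall()

print(top_two)  # [('Dee', 80.0), ('Ann', 71.0)]
print(with_e)   # [('Ben',), ('Dee',), ('Eve',)]
```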
161
JOIN | SQL
Allows for selecting data from multiple tables. Done with: JOIN table2 ON table1.foreignkey=table2.primarykey. If the primary key is composite, the ON condition must match every column in it (combined with AND)
162
LEFT JOIN | SQL
Returns all matched rows plus the rows from the first (left) table that do not match any row in the second table, with NULL for the second table's columns. e.g. with people as the first table and banks as the second, a person whose bank is not in the second table is still displayed
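Using the people/banks example from the card (table and column names are hypothetical), this sqlite3 sketch contrasts JOIN and LEFT JOIN. Cal's bank_id has no matching bank, so only the LEFT JOIN keeps him, with NULL (None in Python) for the bank name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE bank   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, bank_id INTEGER);
    INSERT INTO bank   VALUES (1, 'Northern'), (2, 'Coastal');
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Ben', 2), (3, 'Cal', 7);  -- no bank 7
""")

inner = con.execute("""SELECT person.name AS person_name, bank.name AS bank_name
                       FROM person JOIN bank ON person.bank_id = bank.id
                       ORDER BY person.id""").fetchall()
left = con.execute("""SELECT person.name AS person_name, bank.name AS bank_name
                      FROM person LEFT JOIN bank ON person.bank_id = bank.id
                      ORDER BY person.id""").fetchall()

print(inner)  # [('Ann', 'Northern'), ('Ben', 'Coastal')]
print(left)   # [('Ann', 'Northern'), ('Ben', 'Coastal'), ('Cal', None)]
```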
163
using multiple tables in SQL statement | SQL
Columns should be referenced by their table, e.g. table1.column; this is required when a column name appears in more than one table
164
AS | SQL renaming
SELECT table.column1 AS column1, table2.column2 AS column2. Allows for renaming of columns for a query
165
INSERT | SQL
INSERT INTO table (column,...) VALUES (value,...) Adds an entry to the table. The list of columns can be omitted if the entry has values for all columns. Can use a SELECT query in place of VALUES
166
View | SQL
Acts as a virtual table. Is a query allowing for data to be seen from tables in a specific way but does not store any data itself
167
View Commands | SQL
CREATE VIEW view AS SELECT ... DROP VIEW view
168
UPDATE | SQL
Specifies new values for columns in the table: UPDATE table SET column=value... WHERE conditions. The WHERE clause determines which rows the changes apply to; it is optional, but without it every row is updated
169
DELETE | SQL
DELETE FROM table WHERE conditions. The WHERE clause determines which rows will be deleted; it is optional, but without it every row is deleted
170
functions | SQL
Allows for applying operations such as count, sum, avg, min and max to column values. Used in place of a column after SELECT, e.g. SELECT function(column) FROM table
171
GROUP BY | SQL
Applies a function separately to each group of rows that share the same value in the given column. e.g. SELECT student, avg(mark) FROM scores GROUP BY student gives the average mark of each student rather than the overall average mark
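A sqlite3 sketch of the scores example with made-up marks. It compares one overall average with per-student averages.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (student TEXT, mark INTEGER)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("ann", 60), ("ann", 80), ("ben", 40), ("ben", 50), ("ben", 90)])

overall = con.execute("SELECT avg(mark) FROM scores").fetchone()[0]
per_student = con.execute(
    "SELECT student, avg(mark), count(*) FROM scores GROUP BY student ORDER BY student"
).fetchall()

print(overall)      # 64.0: one average over every row
print(per_student)  # [('ann', 70.0, 2), ('ben', 60.0, 3)]: one row per student
```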
172
defining foreign keys | SQL
In CREATE TABLE: FOREIGN KEY (column) REFERENCES table(column). This ensures that every value in the foreign key column of the current table exists in the other table
173
Ensuring Referential Integrity | SQL
Add actions when doing CREATE TABLE after a FOREIGN KEY ON DELETE action - happens when parent record deleted ON UPDATE action - happens when parent key updated
174
actions | SQL foreign key referencing
CASCADE - the delete/update is also applied to the matching rows in the current (child) table RESTRICT - prevent the delete/update SET DEFAULT - change the value to its default SET NULL - change the value to NULL
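A sqlite3 sketch of a foreign key with ON DELETE CASCADE, using hypothetical customer/orders tables. In SQLite, foreign keys are only enforced once PRAGMA foreign_keys = ON has been run on that connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces foreign keys when this is on
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customer(id)
            ON DELETE CASCADE ON UPDATE CASCADE
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders   VALUES (10, 1), (11, 1), (12, 2);
""")

rejected = False
try:
    con.execute("INSERT INTO orders VALUES (13, 99)")   # customer 99 does not exist
except sqlite3.IntegrityError:
    rejected = True                                     # FOREIGN KEY constraint failed

con.execute("DELETE FROM customer WHERE id = 1")        # CASCADE also removes Ann's orders
remaining = con.execute("SELECT id FROM orders ORDER BY id").fetchall()
print(rejected, remaining)  # True [(12,)]
```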
175
Index | SQL
Data structure associated with a table to improve query speed. Ordered by the value of a regularly accessed key (usually stored as a B-tree). Increases the time taken to modify the table
176
CREATE | SQL Index
CREATE (UNIQUE) INDEX index ON table(column,...). The optional UNIQUE keyword prevents duplicate values in the indexed column(s)
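A sqlite3 sketch with hypothetical table and index names. A UNIQUE index rejects a duplicate value, and EXPLAIN QUERY PLAN shows the index being used for a lookup. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
con.execute("CREATE UNIQUE INDEX account_email ON account(email)")  # hypothetical names

con.execute("INSERT INTO account (email) VALUES ('ann@example.com')")
try:
    con.execute("INSERT INTO account (email) VALUES ('ann@example.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False     # UNIQUE constraint failed: account.email

# Each plan row ends with a text description of one step of the query plan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM account WHERE email = 'ann@example.com'"
).fetchall()
print(duplicate_allowed)             # False
print([row[-1] for row in plan])     # e.g. ['SEARCH account USING COVERING INDEX account_email (email=?)']
```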
177
Relational Databases Limitations
* Inflexible - changing requirements are tough and e.g. lists are hard to do * ACID costs - limits performance and scalability * JOINs complexity - creates complex queries * Structure issues - Optional data makes bad tables
178
Vertical Scaling | Databases
Add more capacity to the server, not great as has a limit and is expensive. Relational databases have to mainly use this
179
Horizontal Scaling | Databases
* Replication - duplicate data stored on many servers; expensive and has synchronisation issues * Partitioning - store parts of the database on many servers; prevents joins across tables on different servers * Neither works well for relational databases
180
NoSQL | Databases
Not Only SQL. Some NoSQL databases can still be queried with SQL-like languages. Less strict schemas.
181
Document | Type Of NoSQL
Typically JSON/BSON. Mostly schema-free (a schema is optional). Structured as a tree with nodes being documents
182
Key-Value | Type Of NoSQL
Contains keys that are linked to values. Keys are often split into a partition key, which decides which partition stores the data, and a sort key, which orders entries within a partition; together they identify a single entry
183
Column | Type Of NoSQL
A row represents an entry but only has columns where there is data for it. Like RDBMS and Key-Value
184
Graph | Type Of NoSQL
Entries represented as nodes, with edges connecting the data together
185
Sharding | NoSQL
Distributing data across multiple nodes, allows for easy scaling
186
Hot Partition | NoSQL
When one database partition is overloaded with traffic while others are underused
187
Consistency Levels | Databases
strict/strong - all reads must wait for writes to be consistent, sequential - all writes happen in order, causal - operations that can change the outcome of each other happen in order, eventual - all copies of the data will eventually become consistent
188
CAP | databases
consistency - all users see the same data, availability - all users can always get a response, partition tolerance - the database works if communication breaks between nodes
189
CAP theorem | databases
For databases with data split across multiple servers, at most 2 of the 3 attributes of CAP can be met
190
NoSQL Advantages
* Fast and simple * Flexible structure choices * Scale horizontally easily
191
NoSQL Issues
* Design has to be done right early * Can only access data in the way it's designed * Functions are very difficult to use * Changes later on get very expensive
192
MongoDB | NoSQL
Document-based using JSON/BSON documents. The documents are key value pairs, but MongoDB is not a key-value type database. Can have schema
193
MongoDB relationships | NoSQL
Can add sections to certain documents referencing another document
194
MongoDB Indexes | NoSQL
Needed to allow for searching by certain fields and/or sorting by them
195
DynamoDB | NoSQL
AWS key-value based serverless NoSQL database. Everything stored in 1 table
196
DynamoDB issues | NoSQL
* Only allows basic lookups * The data has to be modelled specifically for this purpose, so difficult to leave
197
DynamoDB secondary indexes | NoSQL
Provide an alternative partition and/or sort key, maintained automatically, so entries can be searched by other parts of their data.
198
DTD | XML
Document Type Definition - Defines the structure of an XML document. (kinda like a schema)
199
XSD | XML
XML Schema Definition - Using XML defines a structure for an XML document.
200
XML Suite | XML
Syntax - How XML is written Namespaces - Gives IDs to tags to make them unique Schema - Defines how an XML document is structured XPath - Finds data in an XML document
201
Schema Benefits | XML
* Ensures valid file * Can be used as template * Identifies errors * Eases Parsing
202
Normalisation Issues
* More Complex * More tables/relationships * Longer Queries
203
Denormalisation
Improves query speed by duplicating data into places where it is not strictly needed, but makes operations such as updates more complex
204
Identifying/non-identifying | Databases
When a foreign key is/is not part of the primary key of a child table
205
Cardinality | Databases
Maximum number of times an entity can be related to another entity
206
Modality | Databases
Minimum number of times an entity can be related to another
207
Turning on foreign key constraints | SQL
PRAGMA foreign_keys = ON; (SQLite, set per connection) Ensures that values in a foreign key must be in the table that it is referencing
208
OS Potential Features
* Multi-user - Allows for multiple people to use a computer concurrently/at different times * Multi-processing - Utilizing multiple processors * Multi-tasking - Running multiple processes at the same time
209
~ | UNIX
Shorthand for the current user's home directory in a file path, e.g. ~/downloads