Midterm Flashcards
1
Q
Python language
A
- Simple
- Intuitive and Open Source
- Interpreted scripting language
- Multi-platform
- Multi-paradigm
2
Q
A
3
Q
interpreted language
A
- No cross-compiling needed
- Higher computational cost at run time
- Multi-platform
4
Q
module
A
- A module is a file containing definitions (functions, classes, variables) that can be imported in other programs
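A minimal sketch of importing a module. The standard library modules `math` and `random` are used only as illustrations:

```python
# Import a whole module and call one of its functions.
import math

root = math.sqrt(16)

# Import a single name from a module.
from random import randint

roll = randint(1, 6)  # an integer between 1 and 6, both ends included
print(root, roll)
```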
5
Q
Strongly typed language
A
The language enforces strong constraints on data types: values of unrelated types are not implicitly converted into each other
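A short sketch of what this looks like in Python, assuming the usual example of adding a string to an integer:

```python
# Mixing unrelated types is rejected instead of being silently converted.
try:
    result = "1" + 1
except TypeError as error:
    result = None
    print("TypeError:", error)

# An explicit conversion makes the intent clear.
total = int("1") + 1
print(total)
```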
6
Q
Dynamically typed language
A
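The type belongs to the value, not to the variable, so a variable can be rebound to a value of a different type at run time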
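A minimal sketch, with an arbitrary variable name, of the same name holding values of two different types:

```python
value = 42
first_type = type(value).__name__   # "int"

value = "forty-two"                 # same name, now bound to a str
second_type = type(value).__name__  # "str"

print(first_type, second_type)
```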
7
Q
Python interpreter
A
- Python is interpreted at run time
- Code -> Python syntax checker and translator -> (input ->) Python execution environment (Python Virtual Machine, PVM) -> output
- Will work independently from the platform
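The pipeline above can be sketched with the built-ins `compile` and `exec`. This assumes the standard CPython interpreter, and the source string is a made-up example:

```python
import dis

source = "result = 2 + 3"

# Step 1: syntax check and translation to a code object (bytecode).
code_object = compile(source, "<flashcard>", "exec")

# Step 2: the Python virtual machine executes the bytecode.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])

# Optional: inspect the bytecode instructions (output varies by Python version).
dis.dis(code_object)
```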
8
Q
random function
A
- Generate pseudo-random numbers
- The seed identifies a sequence: every time the function is called it returns the next value of that sequence
- Two executions with the same seed will return the same sequence
- It is possible to generate real numbers in [0, 1) with random.random(), or integer numbers with random.randint(from, to), where from and to define the range (both ends included)
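A minimal sketch of seeding. The seed value 123 is arbitrary:

```python
import random

random.seed(123)                       # 123 is an arbitrary seed
first_run = [random.random() for _ in range(3)]

random.seed(123)                       # same seed again
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)         # the two sequences match

die = random.randint(1, 6)             # integer in 1..6, both ends included
print(die)
```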
9
Q
for loop example print 1 to 9
A
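One possible answer, as a minimal sketch. Note that `range(1, 10)` stops before 10:

```python
for number in range(1, 10):
    print(number)
```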
10
Q
information systems
A
- Collection, storage, and processing of data
- Include a computer system for task automation
- A database management system (DBMS) is software that supports data management
11
Q
data modeling
A
- The process of creating a data model for an information system
- Needed to give data a structure that is useful in the context of a specific business process
- It can have several abstraction layers
- conceptual (drawing) -> logical (details of tables) -> physical (MySQL)
12
Q
Entity-Relationship model
A
- The Entity-Relationship model is an abstract model to represent the structure of data for a business process
- Typically used for relational databases
- Can represent concepts and relations among them
13
Q
entities
A
object categories with shared properties
14
Q
relationships
A
connections among entities.
15
Q
many-to-many relationship
A
16
Q
one-to-many relationship
A
17
Q
one-to-one relationship
A
18
Q
attributes
A
- An entity always has a primary key (black bullet)
- Surrogate Key
- Composed Key
- Foreign Key
19
Q
Logical data model
A
A logical data model allows the translation of a conceptual model into data structures
20
Q
SQL
A
- The Structured Query Language (SQL) is the most used language for querying relational databases
- Three sub-languages:
– DDL (Data Definition Language)
– DML (Data Manipulation Language)
– DCL (Data Control Language)
21
Q
Query Language: DDL
A
- Defines the metadata of structures and objects in a database
- Useful in the design phase
- Create and manipulate a schema
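A minimal sketch of DDL statements. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL, so the exact SQL dialect differs slightly. The table and column names are hypothetical:

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
connection = sqlite3.connect(":memory:")
connection.execute("PRAGMA foreign_keys = ON")

# DDL: define two tables; one customer can have many purchases (one-to-many).
connection.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
connection.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
    )
""")

# DDL: change the schema after creation.
connection.execute("ALTER TABLE customer ADD COLUMN email TEXT")

# List the tables now defined in the schema.
tables = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
connection.close()
```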
22
Q
MySQL
A
- An open-source DBMS
- Community Edition is open-source
- Community Edition is available in the Ubuntu repository
23
Q
MySQL - Workbench
A
- The Workbench handles connections to running MySQL instances
- Once a connection is activated, it allows the management of the database
24
Q
MySQL - Schema
A
- A schema is a collection of tables
- In MySQL, schema and database are synonyms
25
Indexing
* An additional data structure designed to get faster query response times
* Different types of indexes: BTREE, HASH
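A minimal sketch of creating an index. It assumes Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite builds B-tree indexes). The table, column, and index names are hypothetical:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO item (name) VALUES (?)",
    [(f"item-{n}",) for n in range(1000)],
)

# Extra data structure that lets lookups by name avoid a full table scan.
connection.execute("CREATE INDEX idx_item_name ON item (name)")

# Ask SQLite how it would run the query; the plan text names the index used.
plan_rows = connection.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM item WHERE name = ?", ("item-42",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)
connection.close()
```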
26
B-tree
A self-balancing tree data structure (a generalization of the binary search tree)
In a B-tree of order n:
* Every node has at most n children
* Every internal node except the root has at least ⌈n/2⌉ children
* The root has at least 2 children unless it is a leaf
* All leaves appear on the same level and carry no information
* A non-leaf node with k children contains k−1 keys
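The limits above can be sketched as a small helper. The function name `btree_bounds` is hypothetical, and the minimum uses the ceiling of n/2, as in the usual definition:

```python
import math

def btree_bounds(order):
    """Return the child and key limits for a B-tree of the given order."""
    max_children = order
    min_children = math.ceil(order / 2)   # internal nodes other than the root
    return {
        "max_children": max_children,
        "min_children_internal": min_children,
        "max_keys": max_children - 1,      # k children -> k - 1 keys
        "min_keys_internal": min_children - 1,
    }

bounds = btree_bounds(5)
print(bounds)
```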
27
Structured data
* Predefined schema
28
Unstructured data
* Unstructured data have no referring schema and cannot be queried easily
* Examples: Text Files, Media data, Emails
29
Semi-structured data
* Semi-structured data impose some form of structure, with fewer constraints than structured data
* Tags can be used to define a partial structure
– XML
– JSON
– CSV
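A minimal sketch with JSON and the standard `json` module. The records are made up and deliberately do not share the same set of fields:

```python
import json

# Two records with the same tags but not the same set of fields.
raw = '[{"name": "Ada", "email": "ada@example.com"}, {"name": "Linus"}]'

records = json.loads(raw)
names = [record["name"] for record in records]
emails = [record.get("email") for record in records]  # None when missing
print(names, emails)
```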
30
Pandas objects
Three main data structures:
* Series
* DataFrame
* Index
31
Pandas Series
* Indexes can be customized
* Elements are then accessible as in NumPy arrays
* Can be used as a dictionary mapping typed keys to typed values
* Typing allows for high performance
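A minimal sketch of a Series with a custom index, assuming pandas is installed. The labels and values are made up:

```python
import pandas as pd

# Custom index labels instead of the default 0, 1, 2.
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "pear", "plum"])

by_label = prices["pear"]        # dictionary-style access by key
by_position = prices.iloc[0]     # array-style access by position
print(by_label, by_position, prices.dtype)
```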
32
Pandas dataframe
* A DataFrame can be seen as an enhanced two-dimensional array
* Both row indexes and column names are flexible
* A sequence of Series sharing the same indexes
* Can be created from loops, arrays
* Columns of a DataFrame can be accessed in different ways: as individual Series, or as a multi-dimensional array
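A minimal sketch of a DataFrame built from two Series that share an index, assuming pandas is installed. The column names and values are made up:

```python
import pandas as pd

# Two Series sharing the same index labels (values are made up).
height = pd.Series({"a": 1.70, "b": 1.82})
weight = pd.Series({"a": 65, "b": 80})

frame = pd.DataFrame({"height": height, "weight": weight})

column = frame["height"]      # one column as a Series
matrix = frame.to_numpy()     # all values as a two-dimensional NumPy array
print(type(column).__name__, matrix.shape)
```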
33
Access rows in indexed data
* Using custom indexes (the one defined by the data structure) --> loc
* Using native indexes (the implicit ones from underlying arrays) --> iloc
* Both can also be used with ranges [from:to]
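A minimal sketch of `loc` and `iloc` on a Series with a custom integer index, assuming pandas is installed. The data are made up. Label ranges with `loc` include the end label, while position ranges with `iloc` exclude the end position:

```python
import pandas as pd

letters = pd.Series(["a", "b", "c", "d"], index=[10, 20, 30, 40])

by_label = letters.loc[20]                # custom index label 20
by_position = letters.iloc[1]             # second element by position

label_slice = list(letters.loc[10:30])    # labels 10, 20 and 30
position_slice = list(letters.iloc[1:3])  # positions 1 and 2
print(by_label, by_position, label_slice, position_slice)
```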
34