Big data & Databases Flashcards

DVA260 Exam (43 cards)

1
Q

(crucial) What is a container pod?

A

A wrapper (like a folder) that groups one or more containers together so they can share the same environment and resources, such as network and storage. Pods are specific to Kubernetes.
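
To make the grouping concrete, here is a minimal sketch of a Pod manifest written as a Python dictionary and serialized to JSON, which Kubernetes accepts alongside YAML. The pod name, container names, images, and paths are made up for the example.

```python
import json

# Hypothetical Pod: two containers that share one scratch volume.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web-with-logger"},
    "spec": {
        "volumes": [{"name": "shared-logs", "emptyDir": {}}],
        "containers": [
            {
                "name": "web",
                "image": "nginx:1.25",
                "volumeMounts": [{"name": "shared-logs", "mountPath": "/var/log/nginx"}],
            },
            {
                "name": "log-reader",
                "image": "busybox:1.36",
                "command": ["sh", "-c", "tail -F /logs/access.log"],
                "volumeMounts": [{"name": "shared-logs", "mountPath": "/logs"}],
            },
        ],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

Both containers mount the same volume, which is the kind of shared resource the card refers to.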

2
Q

(crucial) What is a container image?

A

Container images are read-only templates containing the application, its dependencies, and the instructions for creating a container. A Docker image, for example, is used to create containers that run on the Docker platform.

3
Q

What is a container replica?

A

An identical copy of a container pod, run so that an important workload stays available in a cluster if one copy fails. Controllers monitor the running pods and add or delete replicas to match the desired count defined in a “ReplicaSet”.
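
The controller behaviour can be sketched as a reconciliation step that compares the desired replica count with what is running. This is a toy model, not Kubernetes code; the function name and the pod naming scheme are assumptions for illustration.

```python
def reconcile(desired: int, running: list[str]) -> list[str]:
    """Return the pod list after one reconciliation pass (toy model)."""
    pods = list(running)
    next_id = 0
    # Too few replicas: start new pods until the desired count is reached.
    while len(pods) < desired:
        name = f"pod-{next_id}"
        if name not in pods:
            pods.append(name)
        next_id += 1
    # Too many replicas: drop the surplus.
    return pods[:desired]

print(reconcile(3, ["pod-0"]))                    # scales up
print(reconcile(2, ["pod-0", "pod-1", "pod-2"]))  # scales down
```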

4
Q

(crucial) What is MongoDB and when is it used?

A

MongoDB is a NoSQL document database that stores data as JSON-like documents grouped into collections, which makes searching for data easy. It is used for unstructured and semi-structured data, since fields may vary from one document to the next.
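
The idea of documents with varying fields can be sketched in plain Python. This is not the MongoDB driver API; the `find` helper below only imitates a simple equality filter, and the sample data is invented.

```python
# A "collection" modeled as a list of dicts; fields differ per document.
players = [
    {"_id": 1, "name": "Ana", "team": "Alpha", "rank": 12},
    {"_id": 2, "name": "Bo", "team": "Beta"},                            # no rank
    {"_id": 3, "name": "Cy", "team": "Alpha", "socials": {"x": "@cy"}},  # nested field
]

def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in query.items())]

alpha = find(players, {"team": "Alpha"})
print([doc["name"] for doc in alpha])
```

No schema change is needed when a new document carries an extra or missing field, which is the property the card highlights.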

5
Q

(crucial) What is Neo4j and when is it used?

A

Neo4j is a graph database, ideal for data with rich relationships, such as social networks, recommendation systems, and even esports ecosystems. A node is a person or thing; a relationship shows how two nodes are connected (same school, esports team, company).
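
A small Python sketch of the node/relationship model follows. It does not use Neo4j or its query language; the names, relationship types, and the `connected_via` helper are assumptions chosen to mirror the card's examples.

```python
# Relationships stored as (start node, relationship type, end node) triples.
relationships = [
    ("Ana", "PLAYS_FOR", "Team Alpha"),
    ("Cy", "PLAYS_FOR", "Team Alpha"),
    ("Bo", "PLAYS_FOR", "Team Beta"),
    ("Ana", "ATTENDED", "School X"),
    ("Bo", "ATTENDED", "School X"),
]

def connected_via(person, rel_type):
    """Find other people who share a rel_type target with person."""
    targets = {end for start, rel, end in relationships
               if start == person and rel == rel_type}
    return sorted({start for start, rel, end in relationships
                   if rel == rel_type and end in targets and start != person})

print(connected_via("Ana", "PLAYS_FOR"))  # teammates
print(connected_via("Ana", "ATTENDED"))   # schoolmates
```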

6
Q

(crucial) What is Memcached?

A

Memcached is an in-memory caching system used to enhance the performance and scalability of web applications by reducing the load on databases. It stores frequently accessed data in memory, allowing faster retrieval than disk-based databases. When a user requests data, the system checks Memcached first and goes to the database only on a miss. It can be vulnerable to attacks if it is not inside a protected network.
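
The "check the cache first" flow can be sketched as follows. A plain dictionary stands in for Memcached and another for the database; the key names and the `stats` counter are assumptions for the example.

```python
cache = {}                                     # stands in for Memcached
database = {"user:1": "Ana", "user:2": "Bo"}   # stands in for the slow database
stats = {"db_reads": 0}

def get(key):
    """Check the cache first; on a miss, read the database and cache the result."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value

get("user:1")   # miss: goes to the database
get("user:1")   # hit: served from memory
print(stats["db_reads"])
```

Only the first request reaches the database, which is how the cache reduces database load.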

7
Q

(crucial) What are Saltzer and Schroeder’s design principles?

A

Saltzer and Schroeder’s design principles are foundational guidelines for building secure computer systems. 8 principles:
Least Privilege – Give only the access needed.
Fail-Safe Defaults – Deny by default, allow explicitly.
Economy of Mechanism – Keep it simple.
Complete Mediation – Check every access, every time.
Open Design – Don’t rely on secrecy of design.
Separation of Privilege – Require multiple conditions for access.
Least Common Mechanism – Minimize shared components.
Psychological Acceptability – Make security easy to use.
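
Three of the principles above (least privilege, fail-safe defaults, and complete mediation) can be illustrated with a small access check. The users, resource, and function names are hypothetical.

```python
# Explicit allow-list with only the rights each user needs (least privilege).
PERMISSIONS = {
    ("alice", "report.pdf"): {"read"},
    ("bob", "report.pdf"): {"read", "write"},
}

def is_allowed(user, resource, action):
    """Deny unless this exact user/resource/action is explicitly granted."""
    # Fail-safe default: an unknown user or resource yields an empty set.
    return action in PERMISSIONS.get((user, resource), set())

def read_file(user, resource):
    # Complete mediation: the check runs on every call, with no cached decision.
    if not is_allowed(user, resource, "read"):
        raise PermissionError(f"{user} may not read {resource}")
    return f"contents of {resource}"

print(is_allowed("alice", "report.pdf", "write"))
print(is_allowed("eve", "report.pdf", "read"))
```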

8
Q

(important) What is virtualization?

A

Virtualization is the partitioning of physical resources by a hypervisor to create multiple isolated virtual machines (VMs), each acting like a full computer with its own OS. Each VM can access only the resources allocated to it. This maximizes resource utilization on a single physical server, and a VM can be scaled vertically by allocating it more of the host’s resources.

9
Q

(important) What is containerization?

A

A technology that packages an application and all its dependencies into a single, isolated, and portable environment called a container. This allows applications to run consistently across different infrastructures without compatibility issues.

10
Q

(important) What is enabler software?

A

Software that enables other software to run, such as an OS, a hypervisor, an API, or Docker.

11
Q

(important) What is VM migration?

A

VM migration, or live migration, is the process of moving a running virtual machine (VM) from one physical host to another, usually without shutting it down. Reasons for doing this: Maintenance - move the VM off a server that will be taken offline so the VM can keep running. Load balancing - move the VM from an overloaded server to one with spare capacity. It gives low downtime and high flexibility.

12
Q

(important) What is VM scaling?

A

VM scaling means adjusting the resources (CPU, RAM, storage) of a virtual machine, or the number of virtual machines, to meet performance needs. There are 2 types: vertical scaling (more resources per VM) and horizontal scaling (more VMs). It’s used for:
Traffic spikes - handle increased user load.
Cost control - scale down during low-usage periods.

It’s flexible, but not all apps scale well.
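
As a rough sketch of a horizontal scaling decision, the number of VMs can be derived from the current load and the capacity of one VM. The function name, the numbers, and the `min_vms` floor are assumptions for illustration.

```python
import math

def vms_needed(requests_per_second, capacity_per_vm, min_vms=1):
    """Horizontal scaling: number of identical VMs needed for the current load."""
    return max(min_vms, math.ceil(requests_per_second / capacity_per_vm))

print(vms_needed(950, 200))   # traffic spike: scale out
print(vms_needed(120, 200))   # quiet period: scale in to save cost
```

Rounding up ensures capacity is never below the load, and the floor keeps at least one VM running.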

13
Q

(important) What is semi-structured data?

A

Semi-structured data has some organization, like tags or key-value pairs, but doesn’t follow a strict table structure like SQL databases. It can have nested or inconsistent fields and uses tags, keys, or labels to identify data. It is great for web data and APIs but harder to query than structured data.

Examples: XML, JSON, YAML, NoSQL data
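
A short example with Python's standard `json` module shows what nested and inconsistent fields look like in practice. The sample records are invented.

```python
import json

raw = '''
[
  {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Bo", "tags": ["admin", "beta"]}
]
'''

records = json.loads(raw)

# Keys label each value, but a field may be missing or nested,
# so access has to be defensive.
emails = [rec.get("contact", {}).get("email") for rec in records]
print(emails)
```

The second record has no `contact` field, so its email comes back as `None` instead of causing an error.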

14
Q

(important) What is structured data?

A

Data that follows a fixed schema and can be stored in tables linked by common identifiers (relational). It is queried with SQL in systems such as MySQL, PostgreSQL, Oracle, and SQL Server. Used when a strict schema is needed; it is easy to validate and fast to search.
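
A minimal sketch using Python's built-in `sqlite3` module shows a strict schema in action. SQLite is used only because it ships with Python; the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        year  INTEGER NOT NULL CHECK (year BETWEEN 1 AND 5)
    )
""")
conn.executemany("INSERT INTO student (id, name, year) VALUES (?, ?, ?)",
                 [(1, "Ana", 2), (2, "Bo", 3)])

# The schema rejects rows that do not fit (here: a missing name).
try:
    conn.execute("INSERT INTO student (id, name, year) VALUES (3, NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT name FROM student WHERE year = 3").fetchall()
print(rows, rejected)
```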

15
Q

(important) What is unstructured data?

A

Unstructured data is information that doesn’t follow a predefined format or schema, such as free text, images, or video. It can’t be stored in rows and columns like in a traditional database, which makes it hard to query and search. It’s typically stored in NoSQL databases or HDFS (Hadoop Distributed File System).

16
Q

How and why is the cloud a good security control?

A

The cloud is a good security control because it offers strong built-in protections like encryption, identity management, and continuous monitoring.

17
Q

GDPR compliance for cookies and data privacy

A

Under GDPR, websites must ensure transparent and informed consent for cookies and data collection. This means:

  1. No cookies before consent (except strictly necessary ones).
  2. Clear opt-in: Users must actively agree (no pre-ticked boxes).
  3. Specific consent: Users must know what data is collected, why, and by whom.
  4. Easy withdrawal: Users must be able to change or withdraw consent anytime.
  5. Data privacy: Collected data must be stored securely and used lawfully.
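
Rules 1 and 2 above can be sketched as a filter that decides which cookies may be set. This is a simplified illustration, not a compliance implementation; the cookie names, purposes, and function name are assumptions.

```python
STRICTLY_NECESSARY = {"session_id", "csrf_token"}

def cookies_to_set(requested, consent):
    """Return the cookies that may be set given the user's consent choices.

    `requested` maps cookie name -> purpose; `consent` maps purpose -> bool.
    A purpose missing from `consent` counts as not consented (no pre-ticked boxes).
    """
    return {
        name for name, purpose in requested.items()
        if name in STRICTLY_NECESSARY or consent.get(purpose, False)
    }

requested = {"session_id": "necessary", "_ga": "analytics", "ad_id": "marketing"}
print(sorted(cookies_to_set(requested, {})))                   # before any consent
print(sorted(cookies_to_set(requested, {"analytics": True})))  # analytics opted in
```

Withdrawing consent maps to setting the purpose back to `False`, after which the same check excludes the cookie again.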
18
Q

What is fog computing?

A

Fog computing pushes computing closer to where data is created, for faster response, better efficiency, and less reliance on centralized cloud data centers.

The fog computing nodes can be gateways or routers.

19
Q

What is edge computing?

A

Computing done at or closest to the data source. This gives fast response times, but the devices are hard to manage and update. Edge computations run on the devices themselves, for example a Raspberry Pi attached to sensors and actuators.

20
Q

What is cloud computing?

A

Cloud computing is the on-demand delivery of computing services (servers, storage, software) over the internet. It allows users to access and utilize these resources without directly managing the underlying infrastructure. It operates using a pay-as-you-go business model.

21
Q

What is a cloud data center?

A

A cloud data center is a large facility filled with servers, storage systems, and networking equipment that runs cloud services and stores data for users and businesses — all managed by a cloud provider like Amazon (AWS), Google (GCP), or Microsoft (Azure).

Thousands of physical servers
Massive storage systems
High-speed networking
Cooling systems
Power backup (generators, batteries)
Security systems (both physical and digital)

22
Q

Major characteristics of a data center

A

7 major characteristics:

Reliable and redundant infrastructure
Robust security measures
Effective cooling systems
Flexible, scalable design
Wide connectivity
Disaster recovery plan
Data management

23
Q

Key challenge in ensuring GDPR compliance in cloud services?

A

The main challenge is maintaining control and accountability over personal data when it’s stored, processed, or transferred in cloud environments — especially across borders.

What makes it so hard: data location and sovereignty, and accountability and transparency.

24
Q

What is a hypervisor?

A

A hypervisor is virtualization software that allows multiple guest operating systems (OS) to run on a single host system at the same time. A hypervisor is also called a virtual machine monitor (VMM).

25
Q

What services does cloud computing offer?

A

Software-as-a-Service (SaaS): delivers software applications.
Platform-as-a-Service (PaaS): offers a platform for developing and managing software applications without managing the underlying infrastructure.
Infrastructure-as-a-Service (IaaS): provides access to fundamental computing infrastructure such as servers, storage, and networking.

26
Q

Cloud computing model limitations?

A

1. Public cloud: less security and privacy, limited customization, compliance challenges.
2. Private cloud: high cost, scalability limits, management complexity.
3. Hybrid cloud: complex integration, security management, data transfer issues.
4. Community cloud: limited availability, shared cost and governance, scalability challenges.

27
Q

3 main components of a cloud data center

A

COMPUTE: Runs applications and processes data.
STORAGE: Stores data, backups, and system images.
NETWORKING: Connects servers internally and to the internet.

28
Q

What is NoSQL, and what types are there?

A

NoSQL is a family of database management systems (DBMS) designed to handle large volumes of unstructured and semi-structured data. There are 4 categories:
Document databases - store JSON or XML documents, e.g. MongoDB
Key-value stores - Redis, Memcached
Column-family stores - Apache Cassandra, Apache HBase
Graph databases - Neo4j

29
Q

Hadoop and clustering

A

Hadoop is a distributed cluster framework for large datasets, including unstructured data. It takes a big block of data, distributes it over different nodes, and lets the nodes compute on their parts in parallel. This is faster than letting one computer process the whole block by itself.

30
Q

Components of Hadoop?

A

HDFS: splits data into blocks and distributes them to nodes.
YARN: manages system resources and schedules jobs.
MapReduce: a processing model that consists of mapping and reducing data; every node uses it to process its share of the data.
Hadoop Common: shared utilities and libraries used by all Hadoop modules.

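
The map and reduce steps can be sketched with a word count in plain Python. This runs in one process and only imitates the phases that Hadoop would spread across nodes; the function names and sample text are assumptions.

```python
from collections import defaultdict

def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

blocks = ["big data big cluster", "data node data"]   # one block per node
mapped = [pair for block in blocks for pair in map_phase(block)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Each block can be mapped independently, which is what allows the work to run on many nodes at once.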
31
Q

Difference between bare metal node and virtual node?

A

Virtual nodes live inside physical (bare metal) nodes, managed by virtualization software. You can run multiple virtual nodes on one bare metal node.

32
Q

Vendor lock-in?

A

When a customer becomes heavily reliant on a specific vendor's products or services, making it difficult and costly to switch to a different provider or technology.

33
Q

What is Kubernetes?

A

An open-source platform that automates the deployment, scaling, and management of containerized applications across clusters.

34
Q

What is a bare metal server?

A

A physical server dedicated to one customer.

35
Q

What is a virtual server?

A

A server that runs as a VM on a resource-partitioned physical host, which hosts multiple VMs for paying customers.

36
Q

What is a Kubernetes cluster?

A

A Kubernetes (K8s) cluster is a group of computing nodes, or worker machines, that run containerized applications. A cluster consists of several nodes, and a single node can contain multiple pods.

37
Q

Explain bare metal hypervisor

A

A bare metal (Type 1) hypervisor runs directly on the hardware, with no host OS underneath it, and hosts multiple VMs.

38
Q

Hosted hypervisor

A

A hosted (Type 2) hypervisor runs on top of an existing OS, like VMware Workstation started on a computer that itself runs Windows.

39
Q

What is HDFS?

A

HDFS (Hadoop Distributed File System) breaks big files into blocks, spreads them across multiple computers, and keeps copies for safety, so you can store and process massive data reliably.

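
The split-and-replicate idea can be sketched as follows. The block size, node names, and round-robin placement are simplifications for illustration; real HDFS uses much larger blocks and its own placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` nodes, round-robin.

    The nodes are distinct as long as replication <= len(nodes).
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c"], replication=2)
print(blocks)
print(placement)
```

With two copies of every block, losing any single node still leaves each block readable from another node.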
40
Q

Characteristics of cloud computing

A

Broad network access: Services are available and can be accessed over the network.
On-demand self-service: Cloud users can rent resources themselves anytime, anywhere.
Resource pooling: Cloud resources are offered in a virtualized form and shared between multiple users.
Rapid elasticity: Cloud users can automatically provision and release resources whenever required.
Measured service: Cloud users pay only for the resources they use, based on a selected business model.

41
Q

Cloud storage types

A

Object storage: data stored as objects with unique identifiers; best for unstructured data.
File storage: data stored in a hierarchical file structure; best for shared access.
Block storage: data stored in fixed-size blocks; best for databases and VMs.

42
Q

4 categories of cloud computing deployment model

A

A deployment model describes how the cloud infrastructure is set up depending on the need. The rights over the infrastructure may vary: who can access it and who creates it. The models are:
Public model - a service offered by third-party providers over the internet, shared by multiple users.
Private model - dedicated infrastructure operated solely for one organization, either on-premise or hosted.
Hybrid model - combines public and private clouds, allowing data and applications to move between them.
Community model - shared infrastructure for several organizations with common concerns, like security or compliance.

43
Q

Cloud adoption framework

A

A Cloud Adoption Framework is a structured guide that helps organizations plan, implement, and manage their move to the cloud.