Big data & Databases Flashcards
DVA260 Exam (43 cards)
(crucial) What is a container pod?
A wrapper (like a folder) that groups one or more containers together so they can share the same environment and resources. Kubernetes-specific.
(crucial) What is a container image?
A container image is a read-only template containing the instructions for creating a container. A Docker image, for example, creates containers that run on the Docker platform.
What is a container replica?
A copy of a container pod, kept running so that an important workload in the cluster stays available if another copy fails. Controllers monitor the cluster and add or delete replicas with the help of “replica sets”.
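The monitoring described above can be pictured as a reconcile loop: compare the desired replica count with what is running, then add or remove pods. This is a minimal sketch in plain Python, not the Kubernetes API; `reconcile` and the `pod-N` names are hypothetical.

```python
def reconcile(desired: int, running: list[str]) -> list[str]:
    """Return the pod list after one reconcile pass (hypothetical helper)."""
    pods = list(running)
    next_id = 0
    # Too few replicas: start new ones until the count matches.
    while len(pods) < desired:
        name = f"pod-{next_id}"
        if name not in pods:
            pods.append(name)
        next_id += 1
    # Too many replicas: drop the surplus.
    return pods[:desired]


print(reconcile(3, ["pod-0"]))           # ['pod-0', 'pod-1', 'pod-2']
print(reconcile(1, ["pod-0", "pod-1"]))  # ['pod-0']
```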
(crucial) What is MongoDB and when is it used?
MongoDB is a NoSQL document database that stores data as JSON-like documents, which makes searching for data easy. It is used for unstructured and semi-structured data, since the fields may vary from one document to the next.
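To illustrate documents whose fields vary, here is a small sketch using plain Python dicts instead of a real MongoDB server. The `players` data and the `find` helper are made up for illustration; the helper only imitates a simple equality filter of the kind a MongoDB query would express.

```python
# Three documents in one "collection"; note that the fields differ.
players = [
    {"name": "Ana", "team": "Blue", "rank": 3},
    {"name": "Bo", "team": "Red"},                         # no rank field
    {"name": "Cy", "team": "Blue", "roles": ["support"]},  # extra field
]


def find(docs, query):
    """Return the documents whose fields equal every key/value in query."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


blue = find(players, {"team": "Blue"})
print([d["name"] for d in blue])  # ['Ana', 'Cy']
```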
(crucial) What is Neo4j and when is it used?
Neo4j is a graph database, ideal for data with rich relationships, such as social networks, recommendation systems, and even esports ecosystems. A node is a person or thing; a relationship shows how nodes are connected (same school, esports team, company).
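The node and relationship model can be sketched in plain Python to show why it suits recommendations. This is not Neo4j or its query language, just an in-memory illustration; the names and the `suggestions` helper are hypothetical.

```python
from collections import defaultdict

# Each tuple is (start node, relationship type, end node).
relationships = [
    ("Ana", "TEAMMATE", "Bo"),
    ("Bo", "TEAMMATE", "Cy"),
    ("Cy", "SAME_SCHOOL", "Dee"),
]

neighbours = defaultdict(set)
for a, _rel, b in relationships:
    neighbours[a].add(b)
    neighbours[b].add(a)  # assumption: relationships are treated as undirected


def suggestions(person):
    """People two hops away who are not already directly connected."""
    direct = neighbours[person]
    two_hops = set()
    for n in direct:
        two_hops |= neighbours[n]
    return two_hops - direct - {person}


print(sorted(suggestions("Ana")))  # ['Cy']
```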
(crucial) What is Memcached?
Memcached is an in-memory caching system used to improve the performance and scalability of web applications by reducing the load on databases. It stores frequently accessed data in memory, allowing faster retrieval than disk-based databases. When a user requests data, the system checks Memcached first and only queries the database on a miss. It can be vulnerable to attacks if it is not inside a protected network.
(crucial) What are Saltzer and Schroeder’s design principles?
Saltzer and Schroeder’s design principles are foundational guidelines for building secure computer systems. The 8 principles:
Least Privilege – Give only the access needed.
Fail-Safe Defaults – Deny by default, allow explicitly.
Economy of Mechanism – Keep it simple.
Complete Mediation – Check every access, every time.
Open Design – Don’t rely on secrecy of design.
Separation of Privilege – Require multiple conditions for access.
Least Common Mechanism – Minimize shared components.
Psychological Acceptability – Make security easy to use.
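A few of these principles can be seen together in a small access check. This is an illustrative sketch only; the `PERMISSIONS` table, user names, and function names are hypothetical.

```python
# Least privilege: each user gets only the actions they need.
# Fail-safe defaults: anything not listed here is denied.
PERMISSIONS = {
    ("alice", "report.txt"): {"read"},
    ("bob", "report.txt"): {"read", "write"},
}


def is_allowed(user, resource, action):
    return action in PERMISSIONS.get((user, resource), set())


def access(user, resource, action):
    # Complete mediation: the check runs on every call, nothing is cached.
    if not is_allowed(user, resource, action):
        raise PermissionError(f"{user} may not {action} {resource}")
    return f"{user} {action} {resource}: ok"


print(access("alice", "report.txt", "read"))  # alice read report.txt: ok
```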
(important) What is virtualization?
Virtualization is the partitioning of physical resources by a hypervisor to create multiple isolated virtual machines (VMs), each acting like a full computer with its own OS. These VMs only access the resources allocated to them. This enables vertical scaling by maximizing resource utilization on a single physical server.
(important) What is containerization?
A technology that packages an application and all its dependencies into a single, isolated, and portable environment called a container. This allows applications to run consistently across different infrastructures without compatibility issues.
(important) What is enabler software?
Software that helps other software run, for example an OS, a hypervisor, an API, or Docker.
(important) What is VM migration?
VM (live) migration is the process of moving a running virtual machine (VM) from one physical host to another, usually without shutting it down. Reasons for doing this: Maintenance - move the VM off a server that is being taken out of service so the VM can keep running. Load balancing - move the VM from a server that is full to one with spare capacity. It gives low downtime and high flexibility.
(important) What is VM scaling?
VM scaling means adjusting the resources (CPU, RAM, storage) of virtual machines, or the number of virtual machines, to meet performance needs. There are 2 types: vertical scaling (more resources per VM) and horizontal scaling (more VMs). It’s used for:
Traffic spikes - Handle increased user load.
Cost control - Scale down during low-usage periods.
It’s flexible, but not all apps scale well.
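A horizontal scaling decision can be sketched as a simple rule on average CPU load. The thresholds and limits below are hypothetical values chosen for illustration, not settings from any particular cloud provider.

```python
def desired_vm_count(current, avg_cpu, low=0.30, high=0.75, min_vms=1, max_vms=10):
    """Return how many VMs to run next, given average CPU load (0.0 to 1.0)."""
    if avg_cpu > high:                      # traffic spike: scale out
        return min(current + 1, max_vms)
    if avg_cpu < low:                       # low usage: scale in to save cost
        return max(current - 1, min_vms)
    return current                          # load is in the comfortable range


print(desired_vm_count(2, 0.90))  # 3
print(desired_vm_count(2, 0.10))  # 1
```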
(important) What is semi structured data?
Semi-structured data has some organization, like tags or key-value pairs, but doesn’t follow a strict table structure like SQL databases. It can have nested or inconsistent fields and uses tags, keys, or labels to identify data. Great for web data and APIs, but harder to query than relational tables.
Examples: XML, JSON, YAML, NoSQL data
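A short example of nested and inconsistent fields, using made-up JSON records. Reading a field that may be missing needs a fallback, which is part of what makes this kind of data harder to query.

```python
import json

raw = """
[
  {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Bo", "tags": ["new"]}
]
"""

records = json.loads(raw)

# The second record has no "contact" key, so fall back to an empty dict.
emails = [r.get("contact", {}).get("email") for r in records]
print(emails)  # ['ana@example.com', None]
```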
(important) What is structured data?
Data with a fixed schema and common identifiers (relational), so it can be stored in tables and queried with SQL in systems such as MySQL, PostgreSQL, Oracle, and SQL Server. Used when a strict schema is needed; it is easy to validate and fast to search.
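A strict schema can be tried out with Python's built-in `sqlite3` module, used here only as a convenient stand-in for the database systems listed above. The `student` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, year INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO student (id, name, year) VALUES (?, ?, ?)",
    [(1, "Ana", 2), (2, "Bo", 3)],
)

# The schema validates input: a row without a name is rejected.
try:
    conn.execute("INSERT INTO student (id, name, year) VALUES (3, NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT name FROM student WHERE year = 3").fetchall()
print(rejected, rows)  # True [('Bo',)]
```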
(important) What is unstructured data?
Unstructured data is information that doesn’t follow a predefined format or schema. It can’t be stored in rows and columns like in a traditional database, which makes it hard to query and search. It’s stored in NoSQL databases or HDFS (Hadoop Distributed File System).
How and why is the cloud a good security control?
The cloud is a good security control because it offers strong built-in protections like encryption, identity management, and continuous monitoring.
GDPR compliance for cookies and data privacy
Under GDPR, websites must ensure transparent and informed consent for cookies and data collection. This means:
- No cookies before consent (except strictly necessary ones).
- Clear opt-in: Users must actively agree (no pre-ticked boxes).
- Specific consent: Users must know what data is collected, why, and by whom.
- Easy withdrawal: Users must be able to change or withdraw consent anytime.
- Data privacy: Collected data must be stored securely and used lawfully.
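The "no cookies before consent" and "clear opt-in" rules can be sketched as a filter applied before any cookie is set. The category names and the `cookies_to_set` helper are hypothetical and this is an illustration of the logic, not legal guidance.

```python
# Assumption: only this category counts as strictly necessary.
STRICTLY_NECESSARY = {"session"}


def cookies_to_set(requested, consent):
    """consent maps category -> bool; a missing category means no consent."""
    return {
        c for c in requested
        if c in STRICTLY_NECESSARY or consent.get(c, False)
    }


wanted = {"session", "analytics", "ads"}
print(cookies_to_set(wanted, {}))  # {'session'}
```

Because a missing entry defaults to `False`, nothing optional is set until the user actively opts in, and withdrawing consent is just flipping the value back.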
What is fog computing?
Fog computing pushes computing closer to where data is created, for faster response, better efficiency, and less reliance on centralized cloud data centers.
The fog computing nodes can be gateways or routers.
What is edge computing?
Computing done closest to the data source. This gives fast response times, but the devices are hard to manage and update. Edge computations run on devices such as actuators or a Raspberry Pi.
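As a rough illustration of computing at the data source, an edge device might reduce raw sensor readings to a small summary and send only that upstream. The `summarise` function, the temperature readings, and the 30.0 limit are all hypothetical.

```python
def summarise(readings, limit=30.0):
    """Run on the edge device; only this small summary is sent upstream."""
    peak = max(readings)
    return {"count": len(readings), "max": peak, "alert": peak > limit}


print(summarise([21.0, 22.5, 31.2]))
# {'count': 3, 'max': 31.2, 'alert': True}
```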
What is cloud computing?
Cloud computing is the on-demand delivery of computing services (servers, storage, software) over the internet. It allows users to access and utilize these resources without directly managing the underlying infrastructure. It operates using a pay-as-you-go business model.
What is a cloud data center?
A cloud data center is a large facility filled with servers, storage systems, and networking equipment that runs cloud services and stores data for users and businesses, all managed by a cloud provider like Amazon (AWS), Google (GCP), or Microsoft (Azure). It typically contains:
Thousands of physical servers
Massive storage systems
High-speed networking
Cooling systems
Power backup (generators, batteries)
Security systems (both physical and digital)
Major characteristics of a data center
7 major characteristics:
Reliable and redundant infrastructure
Robust security measures
Effective cooling systems
Flexible, scalable design
Wide connectivity
Disaster recovery plan
Data management
Key challenge in ensuring GDPR compliance in cloud services?
The main challenge is maintaining control and accountability over personal data when it’s stored, processed, or transferred in cloud environments, especially across borders.
What makes it so hard: data location & sovereignty, accountability & transparency.
What is a hypervisor?
A hypervisor is a software layer for hardware virtualization that allows multiple guest operating systems (OS) to run on a single host system at the same time. A hypervisor is sometimes also called a virtual machine monitor or virtual machine manager (VMM).