Own questions Flashcards
(32 cards)
What does ETL stand for?
Extract, Transform and Load.
What are the aggregate functions of the Cosmos DB SQL API?
COUNT, SUM, AVG, MAX, MIN
Unlike the ANSI SQL query language it does not support GROUP BY!
What are the three supported blob types in an Azure blob storage?
Block blobs: Up to 50’000 Blocks with size <100MB, the smallest unit for a r/w is one whole Block
Page blobs: A Collection of fixed 512-byte pages, total size up to 4.7TB, optimized for random r/w opperations, used for virtual disk storage for Azure virtual machines
Append blobs: Blob size <4MB, total size up to 195GB, only supports appending of blobs, no UPDATE or DELETE
Where are single blobs stored inside an Azure blob storage?
Blobs are stored in containers.
A container provides a convenient way of grouping related blobs together, and you can organize blobs in a hierarchy of folders inside a container, similar to files in a file system on disk.
What is the difference between DDL and DML?
DDL (Data Description Language): DDL statements are used to create or change databases, schemes, constraints etc.
E.g. CREATE, ALTER, DROP, RENAME
DML (Data Manipulation Language): DML statements are used to insert update or delete records.
E.g. SELECT, INSERT, UPDATE, DELETE
What is IaaS?
IaaS (Infrastructure as a Service): Azure enables you to create a virtual infrastructure in the cloud that mirrors the way an on-premises data center might work.
You take responsibility for installing and configuring the software, such as the DBMS, on these virtual machines.
Very similar to how you would run a system inside an organization but you don’t have to concern yourself with the hardware.
E.g. SQL Server running on a Azure virtual machine.
What is PaaS?
PaaS (Platform as a Service): You specify the resources that you require and Azure creates the necessary virtual machines, networks, and other devices for you.
IaaS + database management and operating systems.
E.g. Azure SQL Databases
What is SaaS?
SaaS (Software as a Service): SaaS services are typically specific software packages that are installed and run on virtual hardware in the cloud.
E.g. Office 365
What are the tasks performed by an analytical data processing system?
Data ingestion > Data transformation / Data processing > Data querying > Data visualization
What is the difference between a relational and a non-relational database?
Relational database: Relational database store the data in rows and columns similar to how a spreadsheet would work. The data is easily structured into categories and the relationships between the tables are well defined. Most of the time the data has to go through a normalization process so that it can fit inside the structure of the database.
E.g. MySQL, postgreSQL, MariaDB etc.
Non-Relational Databases: Non-relational databases enable you to store the data in a format closer to the original structure. Queries can be executed faster as you don’t have to join as many tables to get all the necessary data. Data with a variable amount of data-points can be easily saved inside a non-relational database as the number of data-points inside on row may vary.
E.g MongoDB, Cassandra, Azure Cosmos DB
What are the ACID properties and when are they used?
The ACID properties are there to ensure that a transactional database remains consistent while processing transactions.
Atomicity: Atomicity guarantees that each transaction is treated as a single unit. This means that it is not possible that the first half of the unit succeeds and the second half does not. This has to be true even in case of an error, crashes or power outages during the transaction.
Consistency: Consistency ensures that the data used in the transaction is valid before and after the transaction.
E.g It is not possible to deduct the funds of a bank account to below zero.
Isolation: Isolation guarantees that concurrent transactions have the same result as if when they would have been done sequentially.
Durability: Durability guarantees that once a transaction has been finished, it doesn’t get reverted by a crash or a power outage.
What are the three key roles in the world of data?
Database Administrators, Data Engineers and Data Analysts
What is an index?
An Index contains a sorted list of references from one column of a table. When using a query for that specific column the database management system can use the index to fetch the data more quickly.
You can create as many indices of a table as you want, however every index consumes more storage space and each time you either INSERT, UPDATE or DELETE the data all indices have to be updated as well.
What is a view?
A view is a virtual table based on the result set of a query. In the simplest case, you can think of a view as a window on specified rows in an underlying table.
What is semi-structured data?
Semi-structured data is data that contains fields. The fields don’t have to be the same in every entity. You only define the fields that you need on a per-entity basis. The Customer entities shown in the previous unit are examples of semi-structured data. The data must be formatted in such a way that an application can parse and process it. One common way of doing this is to store the data for each entity as a JSON document.
What is unstructured data?
Unstructured data is data that doesn’t naturally contain fields. Each item is an amorphous blob of binary data. You can’t search for specific elements in this data.
E.g. video, audio and other media streams
What is the difference between a Azure Single Database deployment and an Azure Elastic Pool
In an elastic pool by default multiple databases can share the same resources (memory, data storage space, processing power). Those resources are then referred to as a pool.
An elastic pool can help to reduce costs for databases with variable resource requirements.
E.g. You have two databases one for payroll processing and one for running reports. The first one will be used a lot at the end of the month while the other will be used more during the month.
What is the difference between an Azure SQL Database managed instance and an Single Database or Elastic Pool?
Both the single database and the elastic pool restrict some of the administrative features available to SQL Server. A managed instance runs an effectively fully controllable SQL Server in the cloud.
What is automated in a Azure SQL Database managed instance.
The Managed instance service automates backups, software patching, database monitoring, and other general tasks, but you have full control over security and resource allocation for your databases.
What entails the Azure Database option for PostgreSQL Hypersacle?
Hyperscale (Citus) is a deployment option that scales queries across multiple server nodes to support large database loads. Your database is split across nodes. Data is split into chunks based on the value of a partition key or sharding key.
What is Provisioning?
Provisioning is the act of running series of tasks that a service provider, such as Azure SQL Database, performs to create and configure a service. Behind the scenes, the service provider will set up the various resources (disks, memory, CPUs, networks, and so on) required to run the service. You’ll be assigned these resources, and they remain allocated to you (and charged to you), until you delete the service.
What are the different tools you can use for the provisioning of an azure service?
The Azure portal: Website / you are prompted for the necessary parameters.
The Azure command-line interface (CLI): CMD for creating services. Can be used to create scripts and automate service creation.
Azure PowerShell: Azure provides some Azure-specific commands that can be used in PowerShell
Azure Resource Manager templates: An Azure Resource Manager template describes the service that you want to deploy as a JSON file.
What is the default connectivity for Azure relational data services?
The default connectivity for Azure relational data services is to disable access to the world.
What are the 4 core functionalities of the SQL Database gateway service (DoSGuard)
- DoSGuard actively tracks failed logins from IP addresses. If there are multiple failed logins from a specific IP address within a period of time, the IP address is blocked from accessing any resources in the service for a short while.
- It validates all connections to the database servers, to ensure that they are from genuine clients.
- It encrypts all communications between a client and the database servers.
- It inspects each network packet sent over a client connection. The gateway validates the connection information in the packet, and forwards it to the appropriate physical server based on the database name that’s specified in the connection string.