Definitions Flashcards

1
Q
* (asterisk)
A

Means “all columns” in a SELECT list; useful when you need to retrieve every column at once.
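
As an illustration, the sketch below uses Python's built-in sqlite3 module to show the asterisk returning every column. The `customers` table and its columns are hypothetical, chosen only for this example.

```python
import sqlite3

# In-memory database with a hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")

# SELECT * retrieves every column without naming each one.
cursor = conn.execute("SELECT * FROM customers")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['id', 'name', 'city']
print(rows)          # [(1, 'Ada', 'London')]
```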

2
Q

5 V’s of Big Data

A

Volume
Velocity
Variety
Veracity
Value

3
Q

@ (at) symbol

A

Used for variable identifiers in SQL Server

4
Q

ABFS (Azure Blob Filesystem)

A

One of the primary access methods for data in Azure Data Lake Storage Gen2 is via the Hadoop FileSystem. Data Lake Storage Gen2 allows users of Azure Blob Storage access to a new driver, the Azure Blob File System driver or ABFS. ABFS is part of Apache Hadoop and is included in many of the commercial distributions of Hadoop. Using this driver, many applications and frameworks can access data in Azure Blob Storage without any code explicitly referencing Data Lake Storage Gen2.

5
Q

ACID

A

OLTP concept:

-Atomicity
-Consistency
-Isolation
-Durability

6
Q

AD (Active Directory)

A

Azure Active Directory integrated authentication uses the same SID assigned to the user currently logged in to the client computer. This authentication method lets you define security boundaries by Active Directory group: you create SQL logins for the group instead of for each user one by one. The Active Directory admin can then assign or revoke access permissions by managing group memberships directly.

7
Q

ADF (Azure Data Factory)

A

PaaS data movement and orchestration engine that shines in cloud or hybrid scenarios. It has a handy web UI for developing your pipelines. ADF integrates closely with Azure DevOps, provides a rich set of REST APIs to interact with, and has a prebuilt monitoring dashboard that lets you keep track of execution outcomes and resource consumption. You can also monitor activities through the Azure Monitor service.

8
Q

AES (Advanced Encryption Standard)

A

Symmetric encryption algorithm: the same key is used to encrypt and decrypt the protected data. When a managed service applies it for you, the key is protected by an automatically generated certificate that is rotated as needed, so you do not have to manage it yourself.

9
Q

ALTER

A

Changes part of the definition of an existing object, for example adding, deleting, or modifying columns in a table. Not all changes are permitted: you can add a column to a table, but you cannot change the type of an existing column from a string type to a numeric one if the data contains non-numeric characters.

10
Q

Always On availability groups

A

SQL Server solution for high availability and disaster recovery. It runs on a Windows Server Failover Cluster and maintains replicas across the members of the cluster. Replicas can be committed asynchronously when long distances must be covered, but synchronous commit is the usual choice.

11
Q

Apache Hive

A

Data warehouse system for Apache Hadoop. Hive enables summarization, querying, and analysis of data.

12
Q

Apache Kafka

A

Open-source distributed streaming platform that can be used to build real-time streaming data pipelines and applications.

13
Q

Apache Oozie

A

Workflow and coordination system that manages Hadoop jobs. Oozie is integrated with the Hadoop stack, and it supports the following jobs:
- Apache Hadoop MapReduce
- Apache Pig
- Apache Hive
- Apache Sqoop

You can also use Oozie to schedule jobs that are specific to a system, like Java programs or shell scripts.

14
Q

Apache Spark

A

Unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, and an optimized engine that supports general execution graphs. See also Azure Databricks.

15
Q

ARM (Azure Resource Manager)

A

Service used to perform administrative tasks in Azure. The portal relies on ARM to do the work: the actions and parameters you choose in the portal are sent to ARM, which carries them out.

16
Q

Atomicity

A

Derives from the concept of an atom: something that must stay together, “all or nothing.” It ensures that all the information is stored as one block, with all the parts written at the same time or not at all.
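
A minimal sketch of “all or nothing,” using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are hypothetical and exist only to illustrate the idea: when the second half of the transfer cannot happen, the first half is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (source,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
        return True
    except ValueError:
        return False

ok = transfer(conn, "alice", "bob", 500)  # fails, so the debit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'alice': 100, 'bob': 50}
```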

17
Q

Attributes

A

Columns

18
Q

HiveQL

A

SQL-like query language used to write queries in Apache Hive.

19
Q

WASB (Windows Azure Storage Blob)

A

Azure Blob Storage is Microsoft’s object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data.

20
Q

Access Management

A

Azure provides various mechanisms that you can use to control access to your resources, built on authentication and authorization.

21
Q

Acquisition

A

Produced data is pushed to one or multiple endpoints, where a stream transport and/or processing engine is listening for incoming data events.

22
Q

Adjacency List

A

A collection of unordered lists used to represent a finite graph.
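
A small sketch of the idea in Python, assuming a hypothetical undirected graph with four vertices. Each vertex maps to the unordered collection of its neighbours.

```python
# Hypothetical undirected graph, given as a list of edges.
edges = [("A", "B"), ("A", "C"), ("B", "D")]

# Adjacency list: one unordered collection of neighbours per vertex.
adjacency_list = {}
for u, v in edges:
    adjacency_list.setdefault(u, set()).add(v)
    adjacency_list.setdefault(v, set()).add(u)

print(sorted(adjacency_list["A"]))  # ['B', 'C']
```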

23
Q

Adjacency Matrix

A

A square matrix used to represent a finite graph.
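
A small sketch of the idea in Python, assuming a hypothetical undirected graph with four vertices. Cell `[i][j]` is 1 when an edge joins vertex `i` and vertex `j`, and 0 otherwise.

```python
# Hypothetical undirected graph, given as vertices and edges.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D")]

# Map each vertex to a row/column position in the square matrix.
index = {vertex: i for i, vertex in enumerate(vertices)}
n = len(vertices)
matrix = [[0] * n for _ in range(n)]

for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected, so the matrix is symmetric

for row in matrix:
    print(row)
```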

24
Q

Aggregation

A

Aggregation is usually performed over time, grouping events by windows. Tumbling, hopping, sliding, and session windows are commonly used to identify specific rules for aggregation.
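
As a rough illustration of one of these window types, the sketch below counts events per tumbling window (fixed size, non-overlapping) in plain Python. The function name, the event shape, and the 10-second window are assumptions made for the example, not part of any streaming engine's API.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    `events` is an iterable of (timestamp_seconds, payload) pairs.
    Returns a dict mapping each window's start time to its event count.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (19, "e"), (25, "f")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)  # {0: 2, 10: 3, 20: 1}
```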

25
Q

Transformation

A

Filtering out unwanted values from the data, enriching the data by joining it with static data sets or other streams, or passing it to a machine learning service to be scored or used as the target of a prediction.
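
A minimal sketch of the first two steps (filtering and enrichment) in plain Python. The event fields, the `device_locations` lookup, and the `transform` function are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical static reference data used to enrich the stream.
device_locations = {"dev-1": "Berlin", "dev-2": "Oslo"}

def transform(events):
    """Drop events with no reading, then add each device's location."""
    for event in events:
        if event["temperature"] is None:
            continue  # filter out unwanted values
        location = device_locations.get(event["device_id"], "unknown")
        yield {**event, "location": location}  # enrich by joining static data

stream = [
    {"device_id": "dev-1", "temperature": 21.5},
    {"device_id": "dev-2", "temperature": None},
    {"device_id": "dev-3", "temperature": 19.0},
]
transformed = list(transform(stream))
print(transformed)
```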

26
Q

AI (Integrating in Pipelines)

A

Offerings like Azure Cognitive Services make it possible to integrate AI in your pipelines with just API calls, sparing you the burden of building a complex platform yourself.

27
Q

ALIAS

A

In the FROM clause of a SELECT, each table name can be followed by AS and a short alias, typically one or two letters. The rest of the query can then refer to the table by its alias instead of its full name.

*In practice, the AS is not needed in the query when aliasing tables.
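
A short sketch using Python's built-in sqlite3 module shows both forms side by side. The `customers` and `orders` tables are hypothetical; `customers` is aliased with AS and `orders` without it.

```python
import sqlite3

# In-memory database with two hypothetical tables, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 99.5);
""")

# "customers AS c" uses the AS keyword; "orders o" omits it. Both are aliases.
query = """
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders o ON o.customer_id = c.id
"""
alias_rows = conn.execute(query).fetchall()
print(alias_rows)  # [('Ada', 99.5)]
```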