Extra Flashcards
Data Overview
-Data = Units of information
-Data Documents = Collective form in which data exists (Datasets, Databases, Datastores, Data Warehouses)
-Data Sets = Logical groupings of units of data that generally are closely related or share the same data structure
-Data Types = How single units of data are intended to be used
-Batch and Streaming Data = How do we move our data around?
-Relational and Non Relational = How do we access, query and search our data?
-Data Modelling = How do we prepare and design our data?
-Schemas and Schemaless = How do we structure our data for search?
-Data Integrity and Data Corruption = How do we trust our data?
-Normalized and Denormalized = How do we trade quality vs speed?
Schema
Schema = A formal language which describes the structure of the data in a database.
In a relational database, a schema can define many different data structures that serve different purposes:
-Tables, Fields, Relationships, Views, Indexes, Packages, Procedures, Functions, XML schemas, Queues, Triggers, Sequences, etc.
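A minimal T-SQL sketch of a schema (hypothetical table and column names), showing how tables, fields and data types are declared up front:
CREATE TABLE dbo.Product (
    ProductId   INT            NOT NULL PRIMARY KEY,      -- field with a fixed data type
    ProductName NVARCHAR(100)  NOT NULL,
    UnitPrice   DECIMAL(10, 2) NOT NULL,
    CreatedAt   DATETIME2      NOT NULL DEFAULT SYSDATETIME()
);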
Schemaless
Schemaless = When the primary “cell” of a database can accept many types.
-This allows developers to forgo upfront data modelling
Common schemaless databases are:
-Key/Value
-Document
-Column (Wide Column)
-Graph
Query & Querying
A query is a request for data results (reads) or to perform operations such as inserting, updating, or deleting data (writes)
Querying is the act of performing a query
Query language is a scripting or programming language designed as the format to submit requests or actions to the database. (SQL, GraphQL, Kusto, Gremlin, etc.)
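A minimal sketch of a read query and a write query in SQL, assuming the hypothetical dbo.Product table sketched above:
-- Read query: a request for data results
SELECT ProductId, ProductName
FROM dbo.Product
WHERE UnitPrice > 10;

-- Write query: an operation that changes data
UPDATE dbo.Product
SET UnitPrice = UnitPrice * 1.05
WHERE ProductId = 42;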
Relational Data
-Tables = A logical grouping of rows and columns
-Views = The result set of a stored query on data, computed in memory when accessed
-Materialized Views = The result set of a stored query, stored (persisted) on disk
-Indexes = A copy of data sorted by one or multiple columns, improving the speed of reads at the cost of extra storage
-Constraints = Rules applied to writes that ensure data integrity
-Triggers = A function that is triggered on specific database events
-Primary Key = One or multiple columns that uniquely identify a row in a table
-Foreign Key = A column which holds the value of a primary key from another table, to establish a relationship
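A minimal T-SQL sketch (hypothetical names) showing a few of these structures together; it assumes the dbo.Product table sketched earlier:
CREATE TABLE dbo.[Order] (
    OrderId   INT  NOT NULL PRIMARY KEY,                         -- primary key
    ProductId INT  NOT NULL REFERENCES dbo.Product (ProductId),  -- foreign key
    Quantity  INT  NOT NULL CHECK (Quantity > 0),                -- constraint on writes
    OrderDate DATE NOT NULL
);

CREATE INDEX IX_Order_OrderDate ON dbo.[Order] (OrderDate);      -- index for faster reads
GO
CREATE VIEW dbo.RecentOrders AS                                   -- view: a stored query
SELECT OrderId, ProductId, Quantity
FROM dbo.[Order]
WHERE OrderDate >= '2024-01-01';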
Row-store vs Column-store
Row-store (RS):
-Data is organized in rows
-Traditional relational databases are row-stores
-Good for general purpose databases
-Suited for Online Transaction Processing (OLTP)
-Great when needing all possible columns in a row
-Not the best at analytics or massive amounts of data
Column-store (CS):
-Data is organized by columns
-Faster at aggregating values for analytics
-Typically NoSQL stores or SQL-like databases
-Great for vast amounts of data
-Suited for Online Analytical Processing (OLAP)
-Great when you only need a few columns
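A minimal T-SQL sketch of a column-store, assuming a hypothetical dbo.SalesFact table (SQL Server and Synapse expose this as a clustered columnstore index):
-- Store the table column-by-column, which speeds up analytical aggregations
CREATE CLUSTERED COLUMNSTORE INDEX cci_SalesFact ON dbo.SalesFact;

-- Column-stores shine when only a few columns are aggregated over many rows
SELECT ProductId, SUM(Amount) AS TotalSales
FROM dbo.SalesFact
GROUP BY ProductId;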
Data Integrity & Data Corruption
Data Integrity ensures data is recorded exactly as intended (data quality)
-Have a well-defined and documented data modelling process
-Logical constraints
-Redundant copies and versions of your data
-Hash functions
Data Corruption is the act or state of data not being in its intended state
-Hardware failure
-Human error
-Malicious actors
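A minimal sketch of using a hash function to detect corruption, assuming a hypothetical dbo.Documents table whose Content column is NVARCHAR:
-- Store a hash alongside the data; recomputing and comparing it later
-- reveals whether Content changed unexpectedly (corruption or tampering)
SELECT DocumentId,
       HASHBYTES('SHA2_256', Content) AS ContentHash
FROM dbo.Documents;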
Normalized vs Denormalized Data
Normalized is a schema designed to store non-redundant and consistent data
-Data integrity is maintained
-Little to no redundant data
-Many tables
-Optimizes for storage of data
Denormalized combines data so that accessing data (querying) is fast
-Data integrity is not maintained
-Redundant data is common
-Fewer tables
-Excessive data, so storage is less optimal
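A minimal T-SQL sketch of the trade-off (hypothetical tables):
-- Normalized: customers and orders are split, so nothing is duplicated
CREATE TABLE dbo.Customer (
    CustomerId   INT PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL
);
CREATE TABLE dbo.Orders (
    OrderId    INT PRIMARY KEY,
    CustomerId INT NOT NULL REFERENCES dbo.Customer (CustomerId),  -- lookup by join
    OrderDate  DATE NOT NULL
);

-- Denormalized: the customer name is repeated on every order row, so reads need no join
CREATE TABLE dbo.OrdersDenormalized (
    OrderId      INT PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,  -- redundant data
    OrderDate    DATE NOT NULL
);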
Pivot Table
A table of statistics that summarizes the data of a more extensive table, drawn from a database, spreadsheet or Business Intelligence (BI) tool
-A technique in data processing
-Draws attention to useful information
-Leads to finding figures and facts quickly
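A minimal T-SQL sketch of pivoting, assuming a hypothetical dbo.Sales table with one row per (product, year):
-- Turn one row per (product, year) into one summary row per product
SELECT ProductName, [2023] AS Sales2023, [2024] AS Sales2024
FROM (
    SELECT ProductName, SalesYear, Amount
    FROM dbo.Sales
) AS src
PIVOT (
    SUM(Amount) FOR SalesYear IN ([2023], [2024])
) AS p;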
Strongly Consistent vs Eventually Consistent
Data consistency describes whether data kept in two different places exactly matches or not
SC = Every time you request data, you can expect consistent data to be returned within X time (e.g. 1 sec) (never returns old data)
EC = When you request data, you may get back inconsistent data within X time (e.g. 2 secs) (whatever data is currently in the db; you may get new data or old data)
Synchronous vs Asynchronous
Can refer to mechanism for data transmission or data replication
Synchronous = continuous stream of data that is synchronized by a timer or clock
-Can only access data once the transfer is complete
-Guaranteed consistency of the data returned at time of access
-Slower access times
Asynchronous = continuous stream of data separated by start and stop bits (no guarantee of time)
-Can access data anytime but may return older version or empty placeholder
-Faster access times, but no guarantee of consistency
Non Relational Data
A non-relational database stores data in a non-tabular form and will be optimized for different kinds of data-structures
Types of non-relational databases:
-Key/Value = Each value has a key, designed to scale, only simple lookups
-Document = Primary entity is a JSON-like data-structure called a document
-Columnar = Has a table-like structure but data is stored around columns instead of rows
-Graph = Data is represented with nodes and edges. Used where relationships matter.
Data Warehouse
A relational datastore designed for analytic workloads, generally a column-oriented data-store.
Companies will have terabytes and millions of rows of data, and they need a fast way to be able to produce analytics reports.
-They can return queries very very fast even though they have vast amounts of data
-Are infrequently accessed meaning they aren’t intended for real-time reporting
-They need to consume data from a relational database on a regular basis
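A minimal sketch of a typical warehouse (OLAP) query over hypothetical fact and dimension tables:
-- Aggregate a large fact table joined to dimensions (star schema)
SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON f.DateKey    = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category
ORDER BY d.CalendarYear, TotalSales DESC;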
Data Mart
A data mart is a subset of a data warehouse.
-It will store under 100GB and has a single business focus
-Allows different departments to have control over their own dataset
-Generally designed to be read-only
-Increase the frequency at which data can be accessed
-The cost to query the data is much lower
Data Lakes
A data lake is a centralized storage repository that holds a vast amount of raw data (big data) in either a semi-structured or unstructured format.
Commonly accessed for data workloads such as: Visualizations, Real-time analytics, Machine Learning, On-premise data
Data Concepts
Data Mining is the extraction of patterns and knowledge from large amounts of data (not the extraction of data itself)
Data Wrangling is the process of transforming and mapping data from one “raw” data form into another format.
A Data Model organizes elements of data and standardizes how they relate to one another.
Data Modelling is a process used to define and analyze data requirements needed to support the business processes.
Data Analytics is concerned with examining, transforming, and arranging data so that you can extract and study useful information.
ETL vs ELT
ETL and ELT are used when you want to move data from one location to another where the datastores/databases have different data structures, so you need to transform the data for the target system.
ELT
-Loads data directly into a target system
-Used for scalable cloud structured and unstructured data sources
-Used for large amounts of data
-Provides data lake support
-Requires specialized skills to implement and maintain
-Support for unstructured data readily available
ETL
-Loads data first into a staging server and then into the target system
-Used for on-premises, relational and structured data
-Used for small amounts of data
-Doesn’t provide data lake support
-Easy to implement
-Mostly supports relational data
Data Analytics Techniques
- Descriptive Analytics - What happened?
-Metrics ~ KPI & ROI
-Generating sales and financial reports
-Accurate, comprehensive, live-data and effective visualizations
- Diagnostic Analytics - Why did it happen?
-Investigate descriptive metrics to determine root cause
-Find and isolate anomalies into their own datasets and apply techniques
- Predictive Analytics - What will happen?
-Use historical data to predict trends or recurrence
-Statistical and ML techniques applied
- Prescriptive Analytics - How can we make it happen? What actions should we take?
-Goes a step further than predictive and uses ML by ingesting hybrid data to predict future scenarios that are exploitable
-The result highlights what you can now make happen: prescriptive analytics combines data, mathematical models, and various business rules to infer actions that influence future desired outcomes
- Cognitive Analytics - What if this happens?
-Uses analytics to draw patterns, create what-if scenarios, and determine what actions can be taken if those scenarios become reality
-A type of analytics that uses machine learning algorithms to analyze unstructured data such as text, images, and speech
Azure Synapse Analytics
Synapse Analytics is a data warehouse and unified analytics platform.
-Build ETL/ELT processes = ingest data from more than 95 native connectors
-Integrated Apache Spark
-Use T-SQL queries on both your data warehouse and Spark engines
-Supports multiple languages = T-SQL, Python, Scala, Spark SQL, and .Net
-Integrated with Artificial Intelligence (AI) and Business Intelligence (BI) tools = Azure ML, Cognitive Services and Power BI
Synapse SQL and pools
Synapse SQL is a distributed version of T-SQL designed for data warehouse workloads
-Extends T-SQL to address streaming and machine learning scenarios
-Use built-in streaming capabilities to land data from cloud data sources into SQL tables
-Integrate AI with SQL by using ML models to score data using the T-SQL PREDICT function
-Offers:
Serverless = For unpredictable workloads, use the always-available, serverless SQL endpoint
-Serverless SQL pool is a query service over the data in your data lake
Predictable = Create dedicated SQL pools to reserve processing power for data stored in SQL tables
-Dedicated SQL pool is a query service over the data in your data warehouse. The unit of scale is an abstraction of compute power known as a Data Warehouse Unit (DWU)
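A minimal sketch of querying data-lake files from a serverless SQL pool with OPENROWSET (hypothetical storage path):
-- Query Parquet files in the data lake without loading them into tables first
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/files/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS src;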
Synapse Key Features
Apache Spark:
-Synapse deeply integrates with Apache Spark.
-Simplified resource model that frees you from having to worry about managing clusters
-Fast Spark start-up and aggressive autoscaling
Data Lake:
-Tables defined on files in the data lake are seamlessly consumed by either Spark or Hive
-SQL and Spark can directly explore and analyze Parquet, CSV, TSV, and JSON files stored in the data lake
-Fast, scalable data loading between SQL and Spark databases
Azure Data Lake (Gen2)
A data lake is a centralized data repository for unstructured and semi-structured data
-Intended to store vast amounts of data
-Generally use objects (blobs) or files as its storage medium
-Designed to handle petabytes of data and hundreds of gigabits of throughput
-Data Lake Storage adds a “hierarchical namespace” to Blob Storage
-Collect: Pulling from various data sources
-Transform: Change or blend data into new semi-structured data using ETL/ELT engines
-Publish: Publish dataset to meta catalogs so analysts can quickly find useful data
-Distribution: Allow access to data to various programs or APIs
PolyBase
PolyBase is a data virtualization feature for SQL Server.
Enables your SQL Server instance to query data with T-SQL directly from:
-SQL Server
-Oracle
-Teradata
-MongoDB
-Hadoop clusters
-Cosmos DB
without separately installing client connection software.
-Allows you to join data from a SQL Server instance with external data
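A minimal T-SQL sketch of PolyBase external tables (hypothetical names and paths; credentials and exact options vary by SQL Server / Synapse version and are omitted here):
CREATE EXTERNAL DATA SOURCE MyDataLake
WITH (LOCATION = 'abfss://files@mydatalake.dfs.core.windows.net', TYPE = HADOOP);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

CREATE EXTERNAL TABLE dbo.ExternalSales (
    OrderId INT,
    Amount  DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/', DATA_SOURCE = MyDataLake, FILE_FORMAT = CsvFormat);

-- The external data can now be joined with local tables using plain T-SQL
SELECT TOP 10 * FROM dbo.ExternalSales;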
Azure Synapse Analytics - ELT
You can perform ELT using Synapse SQL in Azure Synapse.
-The fastest and most scalable way to load data is through PolyBase external tables and the COPY statement
-With PolyBase and the COPY statement, you can access external data stored in Azure Blob storage or Azure Data Lake Storage via the T-SQL language
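A minimal sketch of the COPY statement in a dedicated SQL pool (hypothetical table and storage path; authentication options omitted):
COPY INTO dbo.StagingSales
FROM 'https://mydatalake.blob.core.windows.net/files/sales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');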