Azure Data Lists Flashcards
(52 cards)
Data architectures
- Lambda architecture
- Kappa architecture
Lambda architecture layers
- Batch layer
- Speed layer
- Serving layer
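A minimal sketch of how the three layers relate, assuming a toy workload that totals amounts per key. The function names and the sample data are hypothetical and only illustrate the division of work, not any Azure service API.

```python
def total_by_key(events):
    """Sum the amounts of (key, amount) pairs per key."""
    totals = {}
    for key, amount in events:
        totals[key] = totals.get(key, 0) + amount
    return totals


def batch_view(master_dataset):
    """Batch layer: recompute totals from the full historical data set."""
    return total_by_key(master_dataset)


def speed_view(recent_events):
    """Speed layer: total only the events not yet covered by a batch run."""
    return total_by_key(recent_events)


def serve(key, batch, speed):
    """Serving layer: answer a query by combining the batch and real-time views."""
    return batch.get(key, 0) + speed.get(key, 0)


# Hypothetical data: historical events and events that arrived after the last batch run.
history = [("sensor-a", 10), ("sensor-b", 5), ("sensor-a", 7)]
recent = [("sensor-a", 1), ("sensor-c", 4)]

batch = batch_view(history)
speed = speed_view(recent)
```

With these inputs, `serve("sensor-a", batch, speed)` combines the batch total of 17 with the recent total of 1.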
Data warehouse workload types
- Relational
- Non-relational
- Batch
- Streaming
Main phases of a data stream flow
- Production
- Acquisition
- Aggregation and transformation
- Storage
Time window aggregation types
- Tumbling window
- Hopping window
- Sliding window
- Session window
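As a rough illustration of how the first two window types differ, the sketch below assigns event timestamps (in seconds) to tumbling and hopping windows. The function names, the half-open `[start, end)` convention, the assumption that windows start at time 0, and the 10-second and 5-second sizes are all illustrative choices, not part of any Azure API.

```python
from collections import defaultdict


def tumbling_window(ts, size):
    """Return the single [start, end) window of length `size` that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts, size, hop):
    """Return every [start, end) window of length `size`, started every `hop`
    seconds from time 0, that contains ts. Windows overlap when hop < size."""
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


def count_per_window(timestamps, assign):
    """Count events per window; `assign` maps a timestamp to a list of windows."""
    counts = defaultdict(int)
    for ts in timestamps:
        for window in assign(ts):
            counts[window] += 1
    return dict(counts)


events = [1, 4, 7, 12, 13]
tumbling_counts = count_per_window(events, lambda ts: [tumbling_window(ts, 10)])
hopping_counts = count_per_window(events, lambda ts: hopping_windows(ts, 10, 5))
```

Each event falls into exactly one tumbling window, while an event such as timestamp 7 is counted in two overlapping hopping windows, `(0, 10)` and `(5, 15)`.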
Data stream concepts
- Watermarks
- Consumer groups
- Time window aggregations
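The watermark concept can be sketched as follows, assuming the common definition in which the watermark is the largest event time seen so far minus an allowed lateness. The function name `split_late_events` and the tolerance of 4 seconds are hypothetical and chosen only for illustration.

```python
def split_late_events(event_times, allowed_lateness):
    """Separate events into on-time and late, based on a moving watermark.

    The watermark trails the largest event time seen so far by
    `allowed_lateness`; an event older than the watermark counts as late.
    """
    max_seen = None
    on_time, late = [], []
    for ts in event_times:
        if max_seen is None or ts > max_seen:
            max_seen = ts
        watermark = max_seen - allowed_lateness
        if ts < watermark:
            late.append(ts)
        else:
            on_time.append(ts)
    return on_time, late


# Event times in arrival order; the event at time 3 arrives after time 10 was seen.
on_time, late = split_late_events([1, 5, 10, 3, 9, 12], allowed_lateness=4)
```

Here the event at time 9 is still accepted because it is within 4 seconds of the largest time seen (10), while the event at time 3 is treated as late.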
Batch processing scenarios
- Data set transformation and preparation
- ETL and ELT workloads
- Machine learning model training
- Applying machine learning models on data sets for scoring
- Report generation
Azure batch processing services
- Azure Synapse Analytics
- Azure Data Lake Analytics
- Azure HDInsight
- Azure Databricks
Batch processing tools
- Azure Synapse Analytics
- Azure Data Lake Analytics
- Azure HDInsight
- Azure Databricks
- Apache Hive
- Apache Pig
- Apache Spark
Analytical data stores
- Azure Synapse Analytics
- Spark SQL
- HBase
- Apache Hive
Five V’s of big data
- Volume
- Velocity
- Variety
- Veracity
- Value
Analytics techniques
- Descriptive analysis
- Diagnostic analysis
- Predictive analysis
- Prescriptive analysis
TDSP phases
- Business needs
- Data discovery and acquisition
- Model development
- Model deployment
Common TDSP roles
- Subject matter expert
- Data engineer
- Data scientist
- Application developer
MLOps best practices
- Exploratory data analysis (EDA)
- Data prep and feature engineering
- Model training and tuning
- Model review and governance
- Model inference and serving
- Model deployment and monitoring
- Automated model retraining
Azure Data Factory runtime types
- Azure
- Self-hosted
- SSIS (SQL Server Integration Services)
Azure Data Factory transformation types
- External services
- Mapping data flows (translated to Apache Spark code and run on Spark clusters managed by Data Factory)
- Wrangling data flows (Power Query editor in Microsoft Power BI)
Azure Data Factory external services for transformations
- Azure SQL Database
- Azure Synapse Analytics
- Azure Databricks
- Azure HDInsight
- Azure Functions
- SQL Server Integration Services (SSIS)
Azure Synapse Analytics features
- Provisioned or on-demand SQL pools
- Provisioned or on-demand Spark pools
- Stream processing capabilities through window aggregations
- ML model scoring through the T-SQL PREDICT function
- Azure DevOps integration
- Data Factory-like pipelines development experience
- Power BI report editor integration
Macro-layers for analytics
- Analytical access
- Reporting access
- Dashboarding access
Azure SQL Database purchasing models
- vCore-based
- DTU-based
Services needed to run SQL Server on an Azure VM
- Azure Storage to contain the virtual disks
- Azure Virtual Network
- Azure Compute Service to run the VM
Extra PostgreSQL data types
- Document
- Geometry
- JSON
- Composite
- Custom
Azure Database for MariaDB and MySQL pricing tiers
- Basic
- General Purpose
- Memory Optimized