Data Engineer Flashcards
(92 cards)
Data Engineer
A professional responsible for designing and maintaining data systems and pipelines.
Pipeline
A series of automated processes that move and transform data from one system to another.
ETL (Extract-Transform-Load)
A data integration process that extracts data from source systems, transforms it (e.g., cleaning and reformatting), and loads it into a target storage system such as a data warehouse.
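A minimal ETL sketch in Python, assuming a hypothetical CSV export as the source and an in-memory SQLite database as the target. The function names extract, transform and load, the sales table, and the sample rows are illustrative only and do not come from any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export with inconsistent formatting.
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,bob,
3,Carol,7
"""

def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: trim and capitalize names, cast types, drop rows missing an amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(row["amount"])))
    return cleaned

def load(rows: list[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall())
```

Row 2 is dropped during the transform step because its amount is empty, so only the Alice and Carol rows reach the target table.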
Structured Data
Data organized in a predefined format, typically tables of rows and columns (e.g., in SQL databases).
Unstructured Data
Data that does not have a predefined format (e.g., images, videos, text documents).
Big Data
Datasets so large, fast-growing, or varied that they require specialized (often distributed) processing and analysis tools.
Scalable
Able to handle growing data volumes or workloads without a significant drop in performance, typically by adding resources.
Workflow
A sequence of tasks or processes to achieve a goal.
Collaboration
Working together with team members to achieve a shared goal.
Proficiency
A high level of skill in a particular area (e.g., “proficient in SQL”).
Background
Professional history, experience, and education.
Database
A structured collection of data stored electronically.
Data Warehouse
A central repository of integrated data used for reporting and analysis.
Data Lake
A large storage repository that holds raw data in its native format, whether structured, semi-structured, or unstructured.
Schema
The structure or blueprint of a database (tables, columns, data types).
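A small illustration of a schema, assuming SQLite via Python's built-in sqlite3 module: the CREATE TABLE statement declares the tables, columns, data types and constraints, and SQLite's PRAGMA table_info lets you read that structure back. The users table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: table name, column names, data types and constraints.
conn.execute("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    email      TEXT NOT NULL UNIQUE,
    created_at TEXT NOT NULL
)
""")

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(users)")]
print(columns)
```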
Ingestion
The process of bringing data from various sources into a system.
Transformation
The process of cleaning, modifying, or enriching data to prepare it for analysis.
Batch Processing
Data processing that happens in groups or batches at scheduled intervals.
Stream Processing
Processing data continuously, in real time or near real time, as it arrives.
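A toy contrast between batch and stream processing in Python, assuming a hypothetical list of purchase amounts stands in for incoming events. The batch function waits for the whole group and produces one result; the streaming generator updates its result after every event. Real systems would use a scheduler or a streaming engine rather than plain functions.

```python
from typing import Iterable, Iterator

# Hypothetical event amounts, e.g. purchase values arriving over time.
EVENTS = [5, 3, 8, 2]

def batch_total(events: list[int]) -> int:
    """Batch: collect the whole group first, then process it in one pass."""
    return sum(events)

def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Streaming: update the result incrementally as each event arrives."""
    total = 0
    for amount in events:
        total += amount
        yield total  # an up-to-date result is available after every event

print(batch_total(EVENTS))          # 18 (one result per batch)
print(list(stream_totals(EVENTS)))  # [5, 8, 16, 18] (one result per event)
```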
Data Modeling
Designing the structure and relationships of data in a database.
Normalization
Organizing a relational database's tables and columns to reduce redundancy and improve data integrity.
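A sketch of a normalized design, assuming SQLite via Python's sqlite3 module and hypothetical customers and orders tables. Instead of repeating a customer's name and city on every order row, the customer is stored once and orders reference it by id, so an update touches a single row and every order sees it through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: customer details live in one table, referenced by id.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Alice', 'Oslo');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0);
""")

# The city is stored once, so changing it is a single-row update...
conn.execute("UPDATE customers SET city = 'Bergen' WHERE customer_id = 1")

# ...and every order picks up the new value through a join.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM orders o JOIN customers c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(10, 'Alice', 'Bergen', 20.0), (11, 'Alice', 'Bergen', 35.0)]
```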
Indexing
Creating auxiliary data structures (indexes) that speed up data retrieval, usually at the cost of extra storage and slower writes.
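A quick way to see an index at work, assuming SQLite via Python's sqlite3 module: EXPLAIN QUERY PLAN reports a full table scan before the index exists and an index search afterwards. The events table, the idx_events_user index and the sample rows are hypothetical, and the exact wording of the plan text varies between SQLite versions, so no literal output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(1000)],  # 1,000 rows spread over 100 users
)

QUERY = "SELECT * FROM events WHERE user_id = 42"

def plan(sql: str) -> str:
    """Return SQLite's query-plan text (the 'detail' column of EXPLAIN QUERY PLAN)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(QUERY))  # without an index: a full scan of events
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
print(plan(QUERY))  # with the index: a search using idx_events_user
```

The index changes how rows are found, not which rows are returned; the query still matches the same ten rows.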
Metadata
Data that describes other data (e.g., file name, size, creation date).
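A minimal sketch of file metadata in Python, assuming a hypothetical CSV file written to a temporary directory. The file's contents are the data; its name, size and timestamp are metadata about that data. The modification time is used here because a true creation time is not available portably across operating systems.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sales_2024.csv"        # hypothetical file name
    path.write_bytes(b"id,amount\n1,10.5\n")   # the data itself (17 bytes)
    info = path.stat()
    metadata = {
        "file_name": path.name,
        "size_bytes": info.st_size,
        # st_mtime is the last-modification time, used instead of a creation date.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }

print(metadata["file_name"], metadata["size_bytes"])  # sales_2024.csv 17
```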
SQL
Structured Query Language used to manage and query relational databases.