AWS Certified Data Engineer - Associate (DEA-C01) glossary
Terms selected for the AWS Certified Data Engineer - Associate (DEA-C01) exam, based on the language of its objectives and common practice-question focus.

Data Lake Architecture
Scalable storage and processing pattern for structured and unstructured data, often using staged zones for ingestion, refinement, and consumption.

S3 Partitioning Strategy
Folder/key design that organizes data by commonly filtered attributes (such as date) so query engines can prune partitions and scan less data.
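
A minimal sketch of the idea in Python (the prefix, table, and file names are hypothetical): keys follow the Hive-style `column=value` layout that Athena and Glue recognize for partition pruning.

```python
from datetime import date

def partition_key(prefix: str, table: str, d: date, filename: str) -> str:
    """Build a Hive-style S3 key so engines like Athena can prune by date."""
    return (f"{prefix}/{table}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/{filename}")

key = partition_key("raw", "orders", date(2024, 3, 7), "part-0000.parquet")
# raw/orders/year=2024/month=03/day=07/part-0000.parquet
```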

Parquet
Columnar file format optimized for analytical queries, compression, and predicate pushdown.
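
A conceptual pure-Python sketch of why columnar layout helps (not the actual Parquet format): reading one field from row-oriented data touches every record, while a columnar layout scans one contiguous column, and per-chunk min/max statistics allow predicate pushdown to skip chunks entirely.

```python
# Row-oriented: every record is touched just to read one field.
rows = [{"id": i, "amount": i * 10, "region": "eu"} for i in range(5)]
total_rows = sum(r["amount"] for r in rows)

# Column-oriented (Parquet-like): one contiguous column is scanned.
columns = {"id": [0, 1, 2, 3, 4],
           "amount": [0, 10, 20, 30, 40],
           "region": ["eu"] * 5}
total_cols = sum(columns["amount"])

# Per-chunk statistics enable predicate pushdown: skip the chunk
# when its max cannot satisfy the filter.
stats = {"amount": (min(columns["amount"]), max(columns["amount"]))}

def chunk_may_match(col: str, predicate_min: int) -> bool:
    lo, hi = stats[col]
    return hi >= predicate_min
```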

AWS Glue Crawler
Service component that scans data stores and infers table schemas into the Glue Data Catalog.

Glue Job Bookmark
State tracking feature that allows incremental ETL processing by remembering previously processed data.
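
The bookmark behavior can be sketched in plain Python (the S3 keys and the pass-through "transform" are hypothetical stand-ins for a real Glue job): each run processes only objects not recorded by a previous run, then persists the updated state.

```python
def run_incremental(files: dict, bookmark: set) -> tuple:
    """Process only objects unseen on a previous run, like a Glue job bookmark."""
    new = sorted(f for f in files if f not in bookmark)
    processed = [files[f] for f in new]       # stand-in for the real transform
    return processed, bookmark | set(new)     # persist the updated bookmark

files = {"s3://bucket/a.json": 1, "s3://bucket/b.json": 2}
out1, bm = run_incremental(files, set())      # first run: both objects
files["s3://bucket/c.json"] = 3
out2, bm = run_incremental(files, bm)         # second run: only the new object
```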

Athena Partition Projection
Technique for generating partition metadata at query time without storing every partition entry in a catalog.
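
A sketch of the idea, assuming a daily `dt=` partition scheme: partition values are computed from a range specification at query time (as Athena's `projection.dt.range` and `projection.dt.format` table properties do) rather than stored one-by-one in a catalog.

```python
from datetime import date, timedelta

def project_date_partitions(start: date, end: date) -> list:
    """Enumerate dt= partition values from a range spec instead of a catalog."""
    days = (end - start).days + 1
    return [f"dt={(start + timedelta(days=d)).isoformat()}" for d in range(days)]

parts = project_date_partitions(date(2024, 1, 1), date(2024, 1, 3))
# ['dt=2024-01-01', 'dt=2024-01-02', 'dt=2024-01-03']
```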

Redshift Spectrum
Feature that allows Amazon Redshift to query data directly in S3 alongside local warehouse tables.

Redshift Sort Key
Column ordering strategy that improves scan efficiency for filtered and range-based queries.
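
The mechanism behind the speedup is zone maps: because data is sorted on the key, each block covers a narrow value range, and blocks whose min/max cannot match the filter are skipped. A minimal sketch with hypothetical block metadata:

```python
def prune_blocks(blocks: list, lo: int, hi: int) -> list:
    """Skip blocks whose min/max range cannot overlap the filter range,
    the way zone maps work on a sort-key column."""
    return [b for b in blocks if not (b["max"] < lo or b["min"] > hi)]

# Data sorted on the key, so each block spans a narrow, disjoint range.
blocks = [{"min": 0, "max": 99}, {"min": 100, "max": 199}, {"min": 200, "max": 299}]
scanned = prune_blocks(blocks, 120, 180)   # only the middle block survives
```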

Redshift Distribution Style
Data placement method across cluster nodes that affects join performance and data movement.
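
KEY distribution can be sketched as hashing the distribution key to pick a slice (the MD5-based routing below is illustrative, not Redshift's internal hash): rows from two tables that share a join key land on the same slice, so the join needs no network shuffle.

```python
import hashlib

def key_dist_slice(dist_key: str, n_slices: int) -> int:
    """Illustrative KEY distribution: hash the key so matching join keys
    co-locate on the same slice and joins avoid data movement."""
    h = int(hashlib.md5(dist_key.encode()).hexdigest(), 16)
    return h % n_slices

# Rows from orders and customers with the same customer_id co-locate:
same_slice = key_dist_slice("cust-42", 8) == key_dist_slice("cust-42", 8)
```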

Kinesis Data Streams Shard
Capacity unit in Kinesis Data Streams; each shard supports up to 1 MB/s (or 1,000 records/s) of ingestion and 2 MB/s of reads, so stream throughput scales with shard count.
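
Records are routed to shards by hashing the partition key: Kinesis takes the MD5 of the key as a 128-bit number and maps it into the hash-key range owned by each shard. A sketch, assuming evenly split shards:

```python
import hashlib

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map an MD5-hashed partition key into a 128-bit space split evenly
    across shards, mirroring Kinesis routing for evenly split shards."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * shard_count // 2 ** 128
```

A practical consequence: a single hot partition key always lands on the same shard, so its throughput is capped at one shard's limits regardless of total shard count.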

Kinesis Data Firehose
Managed streaming delivery service that buffers, optionally transforms, and writes data to destinations like S3 and Redshift.
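
The buffering behavior can be sketched as flushing on whichever hint is reached first, size or interval (Firehose exposes these as buffering hints, e.g. MiB and seconds; the class below is a hypothetical illustration):

```python
import time

class DeliveryBuffer:
    """Flush when either the size hint or the interval elapses,
    like Firehose buffering hints (e.g., 5 MiB / 300 s)."""
    def __init__(self, max_bytes: int, max_seconds: float):
        self.max_bytes, self.max_seconds = max_bytes, max_seconds
        self.records, self.size = [], 0
        self.started = time.monotonic()

    def put(self, record: bytes):
        self.records.append(record)
        self.size += len(record)
        if (self.size >= self.max_bytes
                or time.monotonic() - self.started >= self.max_seconds):
            batch, self.records, self.size = self.records, [], 0
            self.started = time.monotonic()
            return batch          # in Firehose, this batch is written to S3
        return None               # still buffering
```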

EMR Spark Processing
Distributed data processing on Amazon EMR using Apache Spark for large-scale ETL and analytics workloads.

Lake Formation Permissions
Fine-grained data access controls for lake resources including table, column, and row-level permissions.
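
The effect of a data cell filter can be sketched in plain Python (the rows, columns, and predicate are hypothetical): disallowed columns are dropped and non-matching rows are hidden before results reach the querying principal.

```python
def apply_cell_filter(rows: list, allowed_columns: list, row_predicate) -> list:
    """Apply column- and row-level permissions the way a Lake Formation
    data filter would before query results are returned."""
    return [{c: r[c] for c in allowed_columns} for r in rows if row_predicate(r)]

rows = [{"id": 1, "region": "eu", "ssn": "x"},
        {"id": 2, "region": "us", "ssn": "y"}]
visible = apply_cell_filter(rows, ["id", "region"], lambda r: r["region"] == "eu")
# ssn column and the non-EU row are never exposed
```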

Data Quality Rule
Validation logic that enforces constraints such as null checks, uniqueness, ranges, and referential integrity.
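
Three of those rule types (null check, uniqueness, range) can be sketched as a simple evaluator returning the names of failed rules; the rule names and thresholds are hypothetical:

```python
def check_rules(rows: list) -> list:
    """Evaluate simple data-quality rules; return the names of failed rules."""
    ids = [r.get("id") for r in rows]
    failures = []
    if any(i is None for i in ids):                              # null check
        failures.append("id_not_null")
    if len(set(ids)) != len(ids):                                # uniqueness
        failures.append("id_unique")
    if any(not (0 <= r.get("amount", 0) <= 1_000_000) for r in rows):  # range
        failures.append("amount_in_range")
    return failures
```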

Change Data Capture (CDC)
Pattern that captures inserts, updates, and deletes from source systems for incremental downstream processing.
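
Applying a CDC stream can be sketched as replaying ordered events onto a target keyed by primary key (the event shape below is a hypothetical simplification of what a tool such as AWS DMS emits):

```python
def apply_cdc(target: dict, events: list) -> dict:
    """Replay ordered insert/update/delete events onto a target table
    keyed by primary key."""
    for e in events:
        if e["op"] in ("insert", "update"):
            target[e["pk"]] = e["row"]
        elif e["op"] == "delete":
            target.pop(e["pk"], None)
    return target

state = apply_cdc({}, [
    {"op": "insert", "pk": 1, "row": {"name": "a"}},
    {"op": "update", "pk": 1, "row": {"name": "b"}},
    {"op": "insert", "pk": 2, "row": {"name": "c"}},
    {"op": "delete", "pk": 2},
])
# {1: {'name': 'b'}}
```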

Glue Data Catalog
Central metadata repository used by AWS analytics services to discover and query datasets.

Schema Evolution
Controlled process for handling structural data changes over time while preserving pipeline compatibility.
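
A common compatibility policy can be sketched as additive merging (the schema representation is a hypothetical column-to-type mapping): new columns are accepted as nullable additions, while type changes on existing columns are rejected so downstream readers do not break.

```python
def evolve_schema(current: dict, incoming: dict) -> dict:
    """Merge an incoming schema additively: new columns are accepted,
    type changes on existing columns are rejected."""
    merged = dict(current)
    for col, typ in incoming.items():
        if col in merged and merged[col] != typ:
            raise ValueError(f"incompatible type change for {col}")
        merged.setdefault(col, typ)
    return merged

v2 = evolve_schema({"id": "bigint", "name": "string"},
                   {"id": "bigint", "email": "string"})
# {'id': 'bigint', 'name': 'string', 'email': 'string'}
```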

Data Pipeline Orchestration
Scheduling and dependency management for multi-step data workflows including retries and alerts.
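
Dependency ordering plus bounded retries can be sketched with the standard library's topological sorter (the DAG shape and retry policy are hypothetical; a real orchestrator such as Step Functions or MWAA adds scheduling and alerting on top):

```python
from graphlib import TopologicalSorter

def run_pipeline(dag: dict, tasks: dict, max_retries: int = 2) -> list:
    """Run tasks in dependency order (dag maps task -> predecessors),
    retrying each a bounded number of times before failing the run."""
    completed = []
    for name in TopologicalSorter(dag).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise    # stand-in for alerting and marking the run failed
    return completed
```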
