Terms Flashcards
(1730 cards)
What is AWS?
Amazon Web Services is a cloud platform that manages the necessary hardware for application services, allowing users to provision and access resources through a web application.
Amazon Web Services (Cloud Supplier).
A cloud services platform such as Amazon Web Services owns and maintains the network-connected hardware required for application services, while you provision and use what you need via a web application.
What is Amazon Athena (Analytics)?
Amazon Athena is a serverless query service that enables easy analysis of data in Amazon S3 using standard SQL. It allows users to run interactive ad-hoc queries without managing infrastructure, making it ideal for quick data analysis. Athena supports various data formats and integrates with tools like Amazon QuickSight for visualization. It uses a managed Data Catalogue for schema storage and can handle complex queries, while simpler queries can be executed using the S3 Select feature. Users pay only for the queries they run.
Amazon Athena is a query service that allows for easy data analysis in Amazon S3 by using standard SQL.
Services like Amazon Athena, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR, all address different needs and use cases.
Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to set up or manage any servers.
Primary use case: Query
When to use: Run interactive queries against data directly in Amazon S3 without worrying about formatting data or managing infrastructure. Can use with other services such as Amazon Redshift.
Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL.
Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Athena is easy to use – simply point to your data in Amazon S3, define the schema, and start querying using standard SQL.
Amazon Athena uses Presto with full standard SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet and Avro.
While Amazon Athena is ideal for quick, ad-hoc querying and integrates with Amazon QuickSight for easy visualization, it can also handle complex analysis, including large joins, window functions, and arrays.
Amazon Athena uses a managed Data Catalogue to store information and schemas about the databases and tables that you create for your data stored in Amazon S3.
Amazon Athena is an analytics service that makes it easy to query data in Amazon S3 using standard SQL commands. AWS customers can also use an Amazon S3 feature called S3 Select to query data on S3 using SQL commands; however, S3 Select can only be used to perform simple SQL queries on a single S3 Object.
Query data in S3 using SQL (Analytics).
Amazon Athena allows you to query data in S3 using SQL (Analytics). Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL. AWS customers can also use an Amazon S3 feature called S3 Select to query data on S3 using SQL commands; however, S3 Select can only be used to perform simple SQL queries on a single S3 Object.
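A local sketch of the kind of ad-hoc, standard SQL query Athena runs. The table and data here are hypothetical, using an in-memory SQLite database as a stand-in for an S3-backed table:

```python
import sqlite3

# Stand-in for a table Athena would define over log files in S3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_logs (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_logs VALUES (?, ?)",
    [("/home", 200), ("/login", 500), ("/api", 500), ("/api", 500)],
)

# An interactive ad-hoc query: count server errors per path.
rows = conn.execute(
    "SELECT path, COUNT(*) AS errors FROM access_logs "
    "WHERE status >= 500 GROUP BY path ORDER BY errors DESC"
).fetchall()
print(rows)  # [('/api', 2), ('/login', 1)]
```

With Athena you would point the table definition at S3 data instead of inserting rows, but the query itself is the same standard SQL.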
What is Amazon DynamoDB?
Amazon DynamoDB is a fully managed NoSQL database service that offers key-value and document storage with fast performance and seamless scalability. It offloads administrative tasks such as hardware provisioning and software patching to AWS, allowing users to focus on their data. AWS manages the security and infrastructure, while customers are responsible for their stored data.
Amazon DynamoDB is a fully managed NoSQL database service.
Amazon DynamoDB is not a storage service.
Amazon DynamoDB is a key-value and document database service.
DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. DynamoDB enables customers to offload the administrative burdens of operating and scaling distributed databases to AWS so that they do not have to worry about hardware provisioning, setup and configuration, throughput capacity planning, replication, software patching, or cluster scaling.
DynamoDB is a fully managed NoSQL offering provided by AWS and is available in most AWS Regions.
For more information on AWS DynamoDB, please refer to the below URL: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Introduction.html
Part of abstracted services for which AWS is responsible for the security & infrastructure layer. Customers are responsible for data that is saved on these resources.
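An illustrative sketch of the key-value/document model described above (this is not the DynamoDB API; the table, key, and attribute names are hypothetical):

```python
# Items are documents of attributes, addressed by a key.
table = {}  # key -> item

def put_item(key, item):
    # Store (or overwrite) the item under its key.
    table[key] = item

def get_item(key):
    # Fast lookup by key; returns None if the item does not exist.
    return table.get(key)

put_item("user#42", {"name": "Ada", "plan": "pro", "logins": 7})
item = get_item("user#42")
```

In DynamoDB itself the key is a partition key (optionally plus a sort key), and AWS handles the replication, scaling, and hardware behind this simple put/get model.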
What is Amazon Elastic Compute Cloud (EC2) (Compute)?
Amazon Elastic Compute Cloud (EC2) is a web service that offers secure, resizable compute capacity in the cloud. Users can quickly boot server instances and pay only for what they use. EC2 allows the installation of any database software, but users are responsible for managing it. It supports batch processing jobs, requiring users to manage the necessary software and server clusters. EC2 is a core AWS service that runs virtual machines and requires a security group for instances. Users can configure various settings through the AWS console.
Resize compute capacity: Amazon Elastic Compute Cloud (EC2) is a (web) service that provides secure, resizable, compute capacity in the cloud.
Use secure, resizable compute capacity
* Boot server instances in minutes
* Pay only for what you use
You can install and run any database software you want on Amazon EC2. In this case, you are responsible for managing everything related to this database.
Amazon EC2 can be used to run any number of batch processing jobs, but you are responsible for installing and managing a batch computing software and creating the server clusters.
EC2 is a core AWS service and runs VMs. Resize compute capacity. You cannot have an EC2 instance without a security group.
Pay as you go (PAYG). Broad selection of hardware and software, and choice of where to host.
- Log into AWS console.
- Choose Region.
- Launch EC2 wizard.
- Select Amazon Machine Image (AMI) - the software platform, e.g. Windows or Linux.
- Select Instance Type (number of cores, RAM, etc.)
- Configure network
- Configure storage
- Configure key pairs/tags (for connecting to instance after we launch it e.g. name)
- Configure firewall security groups.
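The wizard steps above can also be expressed as the parameters you would pass to a programmatic launch call, in the shape accepted by boto3's `ec2.run_instances`. A sketch only (not executed here): the AMI ID, key pair, subnet, and security group values are hypothetical placeholders.

```python
launch_params = {
    "ImageId": "ami-0123456789abcdef0",  # step: choose the AMI (software platform)
    "InstanceType": "t3.micro",          # step: instance type (cores, RAM)
    "MinCount": 1,
    "MaxCount": 1,
    "KeyName": "my-key-pair",            # step: key pair for connecting to the instance
    "SecurityGroupIds": ["sg-0abc123"],  # step: firewall security groups
    "SubnetId": "subnet-0abc123",        # step: network configuration
    "BlockDeviceMappings": [             # step: storage configuration
        {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 8}}
    ],
    "TagSpecifications": [               # step: tags (e.g. a Name)
        {"ResourceType": "instance",
         "Tags": [{"Key": "Name", "Value": "web-server-1"}]}
    ],
}
```

Every choice made in the console wizard maps to a parameter like these, which is why the same launch can be scripted or automated.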
What is Amazon ElastiCache?
Amazon ElastiCache is a managed in-memory storage service that enhances application performance by providing high throughput and low latency data retrieval. It supports Redis and Memcached, improving memory performance by caching CPU-intensive and I/O queries. Redis serves as a distributed key-value database and message broker, while Memcached is a general-purpose caching system that speeds up dynamic websites by reducing external data source reads.
In memory storage for fast, managed information retrieval.
Amazon ElastiCache is used to improve the performance of your existing apps by retrieving data from high throughput and low latency in-memory data stores.
Amazon ElastiCache is a memory cache system service on the cloud and supports Redis and Memcached.
ElastiCache improves performance by caching the results of CPU-intensive and I/O-heavy queries in memory for quick retrieval.
Redis is an in-memory data structure store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indices.
Memcached is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read. Memcached is free and open-source software, licensed under the Revised BSD license.
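A minimal sketch of the cache-aside pattern that ElastiCache (Redis or Memcached) supports: check the in-memory cache first, and only fall back to the slow external data source on a miss. The "database" dict here is a stand-in for that external source.

```python
database = {"user:1": "Ada", "user:2": "Grace"}  # slow external data source
cache = {}                                       # fast in-memory cache
db_reads = 0                                     # how often we hit the slow source

def get_user(key):
    global db_reads
    if key in cache:       # cache hit: served from memory
        return cache[key]
    db_reads += 1          # cache miss: read the slow data source
    value = database[key]
    cache[key] = value     # populate the cache for next time
    return value

get_user("user:1")
get_user("user:1")  # second lookup is served from the cache
get_user("user:2")
```

After these three lookups, `db_reads` is 2: the repeated read of `user:1` never touched the database, which is exactly the reduction in external reads described above.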
What is Amazon Elastic MapReduce (EMR)? (Analytics)
Amazon EMR (Elastic MapReduce) is a cost-effective web service for running distributed processing frameworks like Hadoop, Spark, and Presto on Amazon EC2 and S3. It simplifies data processing tasks such as machine learning, analytics, and ETL, allowing users to customize compute, memory, and storage parameters. EMR supports log analysis and other data-intensive applications, using a managed Hadoop framework to process large datasets. It enables users to launch clusters, access underlying operating systems, and manage tasks programmatically.
Amazon EMR makes it simple and cost effective to run highly distributed processing frameworks such as Hadoop, Spark, and Presto when compared to on-premises deployments.
Services like Amazon Athena, data warehouses like Amazon Redshift, and sophisticated data processing frameworks like Amazon EMR, all address different needs and use cases.
Primary use case: Data Processing
When to use: Highly distributed processing frameworks such as Hadoop, Spark, and Presto. Run a wide variety of scale-out data processing tasks for applications such as machine learning, graph analytics, data transformation, streaming data.
Amazon EMR is flexible – you can run custom applications and code, and define specific compute, memory, storage, and application parameters to optimize your analytic requirements.
Amazon Elastic MapReduce (EMR) is a web service that enables you to process vast amounts of data across dynamically scalable Amazon EC2 instances.
Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.
EMR utilizes a hosted Hadoop framework running on Amazon EC2 and Amazon S3.
Managed Hadoop framework for processing huge amounts of data.
Also support Apache Spark, HBase, Presto and Flink.
Most commonly used for log analysis, financial analysis, or extract, transform and load (ETL) activities.
A Step is a programmatic task for performing some process on the data (e.g. count words).
A cluster is a collection of EC2 instances provisioned by EMR to run your Steps.
EMR uses Apache Hadoop as its distributed data processing engine, which is an open source, Java software framework that supports data-intensive distributed applications running on large clusters of commodity hardware.
EMR is a good place to deploy Apache Spark, an open-source distributed processing framework used for big data workloads which utilizes in-memory caching and optimized query execution.
You can also launch Presto clusters. Presto is an open-source distributed SQL query engine designed for fast analytic queries against large datasets.
EMR launches all nodes for a given cluster in the same Amazon EC2 Availability Zone.
You can access Amazon EMR by using the AWS Management Console, Command Line Tools, SDKs, or the EMR API.
With EMR you have access to the underlying operating system (you can SSH in).
A tool for big data processing and analysis. Amazon EMR processes big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) (Analytics).
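The classic Step mentioned above (count words) reduces to two phases, map and reduce. A local Python sketch of what an EMR cluster would run distributed across many nodes with Hadoop or Spark:

```python
from collections import Counter

# Input records: on EMR these would be files in S3 split across the cluster.
documents = ["big data big clusters", "data processing at scale"]

# Map phase: emit a (word, 1) pair for every word in every record.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Reduce phase: sum the counts for each word.
counts = Counter()
for word, n in pairs:
    counts[word] += n

print(counts["big"], counts["data"])  # 2 2
```

The value of EMR is that the same map and reduce logic runs in parallel over terabytes of data on a managed cluster of EC2 instances, instead of one process on one machine.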
What is Amazon Inspector?
Amazon Inspector is a security assessment service that automatically assesses applications for exposure, vulnerabilities, and deviations from best practices.
Amazon Inspector is an automated security assessment service that helps improve the security and compliance of applications deployed on AWS.
Amazon Inspector is a vulnerability management service that continuously scans your AWS workloads for vulnerabilities. Amazon Inspector automatically discovers and scans Amazon EC2 instances and container images residing in Amazon Elastic Container Registry (Amazon ECR) for software vulnerabilities and unintended network exposure.
- Amazon Inspector allows you to analyse Application Security.
- An automated security assessment service.
- Assesses applications for security vulnerabilities or deviations from best practices
- Produces a report with security findings and prioritised next steps
- AWS doesn’t guarantee that the findings are complete, but the report presents useful information.
- Can build into DevOps process to proactively spot things and make part of build and deployment process.
- Can access Inspector through the console, SDKs, API and CLI.
Amazon Inspector can be used to analyse potential security threats for an Amazon EC2 instance against an assessment template with predefined rules. It does not provide historical data for configurational changes done to AWS resources.
What is Amazon Kinesis (Analytics)?
Amazon Kinesis is an analytics service that allows you to easily collect, process, and analyse video and data streams in real time.
Amazon Kinesis makes it easy to collect, process, and analyse real-time, streaming data so you can get timely insights and react quickly to new information.
Collection of services for processing streams of various data.
Data is processed in “shards”.
There are four types of Kinesis service: Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams.
Amazon Kinesis makes it easy to collect, process, and analyse real-time streaming data so you can get timely insights and react quickly to new information (Analytics). Reliably load real-time streams into data lakes, warehouses, and analytics services. A real-time data streaming service.
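A sketch of how a Kinesis stream routes records to the shards mentioned above: each record's partition key is MD5-hashed to a 128-bit integer, and each shard owns a contiguous range of that hash space. The shard count and partition key names below are hypothetical.

```python
import hashlib

NUM_SHARDS = 4
HASH_SPACE = 2 ** 128  # range of a 128-bit MD5 hash

def shard_for(partition_key: str) -> int:
    # Hash the partition key and map it onto one of the shard ranges.
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h * NUM_SHARDS // HASH_SPACE

# All records with the same partition key land on the same shard,
# which preserves per-key ordering within the stream.
same_shard = shard_for("device-17") == shard_for("device-17")
shards = {shard_for(f"device-{i}") for i in range(100)}
```

Because keys spread across the hash space, adding shards increases the parallel throughput of the stream.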
What is Amazon Macie (Security)?
Amazon Macie is a data security and data privacy service.
Amazon Macie is a machine learning powered security service to discover, classify and protect sensitive data.
Amazon Macie is a data security and data privacy service that uses machine learning (ML) and pattern matching to discover and protect your sensitive data.
Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally identifiable information (PII) or intellectual property, and provides you with dashboards and alerts that give visibility into how this data is being accessed or moved.
Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect sensitive data stored in Amazon S3. Macie automatically detects a large and growing list of sensitive data types, including personally identifiable information (PII) such as names, addresses, and credit card numbers. Macie automatically provides an inventory of Amazon S3 buckets including a list of unencrypted buckets, publicly accessible buckets, and buckets shared with other AWS accounts. Then, Macie applies machine learning and pattern matching techniques to the buckets you select to identify and alert you to sensitive data. Amazon Macie can also be used in combination with other AWS services, such as AWS Step Functions to take automated remediation actions. This can help you meet regulations, such as the General Data Privacy Regulation (GDPR).
Amazon Macie primarily matches and discovers sensitive data such as personally identifiable information (PII).
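A toy version of the pattern-matching side of Macie: scanning text for sensitive data such as credit card numbers. The regex below is a simplified illustration, not Macie's actual detector (which combines machine learning with many managed data identifiers).

```python
import re

# Naive pattern for 16-digit card numbers written in groups of four.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

def find_sensitive(text):
    # Return every substring that looks like a card number.
    return CARD_PATTERN.findall(text)

findings = find_sensitive("Order ref 12, card 4111 1111 1111 1111, thanks.")
```

Macie applies this kind of detection (plus ML-based classification) across the objects in the S3 buckets you select, then surfaces the findings in dashboards and alerts.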
What is AWS Artifact (Security)?
AWS Artifact provides on-demand access to AWS’ security and compliance reports. Used to download AWS’ security & compliance documents.
Examples of these reports include Service Organization Control (SOC) reports, Payment Card Industry (PCI) reports.
AWS Artifact enables you to download AWS security and compliance documents.
What are AWS Availability Zones?
One or more discrete data centres with redundant power, networking, and connectivity in an AWS Region.
Availability Zones (AZs) may consist of multiple data centres and are used for deploying highly available applications.
Deploying your resources across multiple Availability Zones helps you maintain high availability of your infrastructure.
What is AWS Billing Console?
The AWS Billing console allows you to easily:
- understand your AWS spending;
- view and pay invoices;
- manage billing preferences and tax settings; and
- access additional Cloud Financial Management services.
Quickly evaluate whether your monthly spend is in line with prior periods, forecast, or budget, and investigate and take corrective actions in a timely manner.
The Billing Console offers you a number of different ways to view and monitor your AWS usage.
What is AWS Budgets?
AWS Budgets gives you the ability to set custom budgets that alert you when your costs or usage exceed (or are forecasted to exceed) your budgeted amount.
Set custom budgets that alert you when you have exceeded your budgeted thresholds.
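A sketch of the check AWS Budgets performs: alert when actual spend, or forecasted spend, crosses the budgeted amount. The figures are illustrative, not real billing data.

```python
def budget_alerts(budget, actual, forecast):
    # Return the alert messages that would fire for this budget period.
    alerts = []
    if actual > budget:
        alerts.append("actual spend exceeded budget")
    if forecast > budget:
        alerts.append("forecasted spend will exceed budget")
    return alerts

# Spend is still under budget, but the forecast says it won't stay there.
alerts = budget_alerts(budget=100.0, actual=80.0, forecast=120.0)
```

The forecast case is what makes Budgets proactive: you can be alerted before the overspend actually happens.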
What is AWS CloudTrail?
AWS Monitoring and Logging Services.
AWS CloudTrail is a web service that records activity made on your account and delivers log files to an Amazon S3 bucket.
CloudTrail is for auditing (CloudWatch is for performance monitoring).
CloudTrail is about logging and saves a history of API calls for your AWS account.
Provides visibility into user activity by recording actions taken on your account.
Logs API calls made via:
- AWS Management Console.
- AWS SDKs.
- Command line tools.
- Higher-level AWS services (such as CloudFormation).
CloudTrail records account activity and service events from most AWS services and logs the following records:
- The identity of the API caller.
- The time of the API call.
- The source IP address of the API caller.
- The request parameters.
- The response elements returned by the AWS service.
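A simplified CloudTrail-style record illustrating the fields listed above. The field names (`userIdentity`, `eventTime`, `sourceIPAddress`, `requestParameters`, `responseElements`) are real CloudTrail record fields, but the values are hypothetical and real records carry many more fields:

```python
import json

record = json.loads("""
{
  "userIdentity": {"userName": "alice"},
  "eventTime": "2023-05-01T12:00:00Z",
  "sourceIPAddress": "203.0.113.10",
  "eventName": "RunInstances",
  "requestParameters": {"instanceType": "t3.micro"},
  "responseElements": {"instancesSet": {"items": [{"instanceId": "i-0abc123"}]}}
}
""")

# Who made the call, and what did they do?
caller = record["userIdentity"]["userName"]
action = record["eventName"]
```

Because every API call is logged in this structured form, the records can be filtered and audited programmatically, which is what makes the security analysis and compliance auditing described below possible.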
CloudTrail is enabled by default.
CloudTrail is per AWS account.
You can consolidate logs from multiple accounts using an S3 bucket:
- Turn on CloudTrail in the paying account.
- Create a bucket policy that allows cross-account access.
- Turn on CloudTrail in the other accounts and use the bucket in the paying account.
You can integrate CloudTrail with CloudWatch Logs to deliver data events captured by CloudTrail to a CloudWatch Logs log stream.
CloudTrail log file integrity validation feature allows you to determine whether a CloudTrail log file was unchanged, deleted, or modified since CloudTrail delivered it to the specified Amazon S3 bucket.
API history enables security analysis, resource change tracking, and compliance auditing.
CloudTrail logs all API calls made to AWS services with credentials linked to your accounts.
Track user activity and API usage:
- security analysis
- resource tracking
- troubleshooting
CloudTrail is a service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. With CloudTrail, you can log, continuously monitor, and retain account activity related to actions across your AWS infrastructure.
What is AWS Consolidated Billing?
Track the combined costs of all of the AWS accounts in your organisation.
What is AWS Cost Explorer?
Visualise, understand, and manage your AWS costs and usage over time.
Additional information:
AWS Cost Explorer is a free tool that you can use to view your costs and usage. You can view data for up to the last 13 months and forecast how much you are likely to spend over the next twelve months. You can use AWS Cost Explorer to see patterns in how much you spend on AWS resources over time, identify areas that need further inquiry, and see trends that you can use to understand your costs. AWS Cost Explorer allows you to explore your AWS costs and usage at both a high level and at a detailed level of analysis, empowering you to dive deeper using a number of filtering dimensions (e.g., AWS Service, Region, Linked Account, etc.)
What are AWS Edge Locations?
AWS edge locations are used by the CloudFront service to cache and serve content to end-users from a nearby geographical location to reduce latency. Edge locations are used by the CloudFront service to distribute content globally.
A data centre owned by a trusted partner of AWS which has a direct connection to the AWS network, allowing low latency no matter where the end user is located geographically. Edge locations outnumber Availability Zones.
An edge location is where end users access services located at AWS. Edge locations exist in most of the major cities around the world and are used specifically by CloudFront (a CDN) to distribute content to end users with reduced latency. They act as a frontend for the services we access in the AWS Cloud: local points of presence for performant delivery of content (Amazon CloudFront). Cache = Edge Location.
Benefits of using Edge Locations include:
- Edge locations are used by CloudFront to improve your end users’ experience when uploading files
- Edge locations are used by CloudFront to distribute content to global users with low latency
- Edge locations are used by CloudFront to cache the most recent responses
What is AWS Elastic Beanstalk (Compute)?
AWS Elastic Beanstalk is a Platform as a Service (PaaS) for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.
Elastic Beanstalk provides an answer to the question “how can I quickly get my app to the Cloud?”.
You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling, to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time. Choose instance type, choose database, adjust autoscaling.
A developer centric view of deploying an application on AWS. Beanstalk = Platform as a Service (PaaS).
Developers can easily deploy the services and web applications developed with .NET, Java, PHP, Python and more without provisioning any infrastructure (Compute).
What is AWS Glue? (Analytics)
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics (Analytics).
Primary use case: ETL Service
When to use: Transform and move data to various destinations. Used to prepare and load data for analytics. The data source can be S3, Redshift or another database. The Glue Data Catalog can be queried by Athena, EMR and Redshift Spectrum.
AWS Glue is a fully managed, pay-as-you-go, extract, transform, and load (ETL) service that automates the time-consuming steps of data preparation for analytics.
AWS Glue automatically discovers and profiles data via the Glue Data Catalogue, recommends and generates ETL code to transform your source data into target schemas.
AWS Glue runs the ETL jobs on a fully managed, scale-out Apache Spark environment to load your data into its destination.
AWS Glue also allows you to set up, orchestrate, and monitor complex data flows.
You can create and run an ETL job with a few clicks in the AWS Management Console.
Use AWS Glue to discover properties of data, transform it, and prepare it for analytics.
Glue can automatically discover both structured and semi-structured data stored in data lakes on Amazon S3, data warehouses in Amazon Redshift, and various databases running on AWS.
It provides a unified view of data via the Glue Data Catalogue that is available for ETL, querying and reporting using services like Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
Glue automatically generates Scala or Python code for ETL jobs that you can further customize using tools you are already familiar with.
AWS Glue is serverless, so there are no compute resources to configure and manage.
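A miniature extract-transform-load flow of the kind a Glue job automates. The source and target schemas here are hypothetical; in Glue, logic like this runs as generated Python or Scala code on managed Spark, reading from and writing to real data stores.

```python
import csv
import io

# Extract: read rows from the source (a CSV string standing in for S3 data).
raw = "name,amount\nada,10\ngrace,0\nlin,25\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: cast types, drop zero-amount rows, rename to the target schema.
transformed = [
    {"customer": r["name"].title(), "total": int(r["amount"])}
    for r in rows
    if int(r["amount"]) > 0
]

# Load: write to the destination (here, just an in-memory list).
warehouse = []
warehouse.extend(transformed)
```

Glue's value-add is discovering the source schema for you (via the Data Catalogue), generating transformation code like the middle step, and running it serverlessly at scale.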
What is AWS Identity and Access Management (AWS IAM)?
Tools to control access and authentication to your network-facing applications and resources.
AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources. You use IAM to control who is authenticated (signed in) and authorized (has permissions) to use resources.
Securely manage access to services and resources. IAM is free to use on top of other services.
IAM Permissions let you specify the desired access to AWS resources. Permissions are granted to IAM entities (users, user groups, and roles) and by default these entities start with no permissions. In other words, IAM entities can do nothing in AWS until you grant them your desired permissions.
AWS IAM is a global service.
AWS IAM is used to control access to AWS services or resources. It is not suited for authenticating large numbers of users to mobile applications.
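A minimal IAM identity-based policy document illustrating how permissions are granted: entities start with no permissions, and a policy like this grants exactly the listed actions on the listed resources. The `Version` value and field names are the real IAM policy grammar; the bucket name is a hypothetical example.

```python
import json

policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
""")

# The entity attached to this policy may read objects and list this bucket,
# and nothing else.
allowed_actions = policy["Statement"][0]["Action"]
```

Attaching this document to a user, group, or role is what turns a "can do nothing" IAM entity into one with precisely these two S3 permissions.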
What is an AWS Local Region?
An AWS Local Region is a single data centre designed to complement an existing AWS Region. Like all AWS Regions, AWS Local Regions are completely isolated from other AWS Regions.
What is the AWS Management Console?
AWS Management Console is a web application for managing Amazon Web Services.
You can interact with AWS services via the management console web interface. Can use a command line, SDK or code interface (web, terminal, code).
AWS Management Console lets you access and manage individual AWS resources through a web-based user interface.
What is AWS Marketplace?
AWS Marketplace is a digital catalogue with thousands of software listings from independent software vendors that make it easy to find, test, buy, and deploy software that runs on AWS.
What is the AWS Pricing Calculator?
The AWS Pricing Calculator is a tool to help predict monthly bills. It is used to create estimates.
AWS Pricing Calculator does not record any information about your AWS cost and usage.
AWS Pricing Calculator is just a tool for estimating your monthly AWS bill based on your expected usage.
For example, to estimate your monthly AWS CloudFront bill, you just enter your expected CloudFront usage (Data Transfer Out, Number of requests, etc.) and AWS Pricing Calculator provides an estimate of your monthly bill for CloudFront.
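A back-of-the-envelope version of the CloudFront estimate described above. The unit prices are illustrative assumptions, not current AWS rates; the calculator applies the real (tiered, per-Region) prices to the same kind of arithmetic.

```python
# Expected monthly CloudFront usage (example inputs).
data_transfer_out_gb = 500
https_requests = 2_000_000

# Assumed unit prices for the sketch (not real AWS rates).
price_per_gb = 0.085            # $ per GB of data transfer out
price_per_10k_requests = 0.01   # $ per 10,000 HTTPS requests

# Estimate = usage × unit price, summed over the dimensions.
monthly_estimate = (
    data_transfer_out_gb * price_per_gb
    + https_requests / 10_000 * price_per_10k_requests
)
print(round(monthly_estimate, 2))  # 44.5
```

Note that this is purely an estimate from expected usage; like the Pricing Calculator itself, it records nothing about actual cost and usage.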