Pillar Five - Operational Excellence Flashcards

Flashcards in Pillar Five - Operational Excellence Deck (14):
1

Operational Excellence

  1. Includes operational practices and procedures to manage production workloads
  2. Includes how planned changes are executed, and responses to unexpected events
  3. Recommendation:
    1. Change execution and responses should be automated
    2. All processes and procedures should be documented, tested, and regularly reviewed

2

Design principles

  1. Perform operations with code
  2. Align operations processes to business objectives (e.g., how operations is meeting business needs)
  3. Make regular, small, incremental changes
  4. Test for responses to unexpected events
  5. Learn from operational events and failures
  6. Keep operations procedures current (documentation, runbooks, playbooks, procedures, etc.)
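A minimal Python sketch of "perform operations with code": a routine procedure is written as an ordered list of functions that a runner executes and logs, so the runbook itself can be versioned, tested, and reviewed like any other code. The `Runbook` class, the `rotate-logs` example, and its steps are hypothetical illustrations, not an AWS API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Runbook:
    """A routine procedure expressed as ordered, reviewable code (hypothetical helper)."""
    name: str
    steps: List[Callable[[dict], None]] = field(default_factory=list)

    def step(self, func: Callable[[dict], None]) -> Callable[[dict], None]:
        """Decorator that registers a function as the next step of the runbook."""
        self.steps.append(func)
        return func

    def run(self, context: dict) -> List[str]:
        """Execute steps in order, stopping at the first failure; return a log of results."""
        log: List[str] = []
        for func in self.steps:
            try:
                func(context)
                log.append(f"OK   {func.__name__}")
            except Exception as exc:
                # Record the failure and stop, so a human can follow the playbook instead.
                log.append(f"FAIL {func.__name__}: {exc}")
                break
        return log


# Hypothetical runbook for a routine task.
rotate_logs = Runbook("rotate-logs")


@rotate_logs.step
def check_disk_usage(ctx: dict) -> None:
    if ctx["disk_used_pct"] > 95:
        raise RuntimeError("disk nearly full")


@rotate_logs.step
def archive_old_logs(ctx: dict) -> None:
    ctx["archived"] = True  # placeholder for the real archiving action


if __name__ == "__main__":
    for line in rotate_logs.run({"disk_used_pct": 70}):
        print(line)
```

Because each step is an ordinary function, the same runbook can be exercised in tests before it is needed in production.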

3

There are three best practice areas for operational excellence in the cloud

  1. Preparation
  2. Operation
  3. Response

4

Best Practices: Preparation (part 1)

  1. Preparation drives operational excellence
  2. Checklists ensure that workloads are ready for production and help prevent mistakes
  3. Workloads should have
    1. Runbooks - which offer guidance that operations teams can refer to for normal tasks
    2. Playbooks - which offer guidance for responding to unexpected events (response plans, escalation paths, and stakeholder notification)
  4. AWS CloudFormation
    1. Can be used to ensure environments contain all required resources, and that the configurations are correct based on tested best practices
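Checklists can themselves be code. The sketch below assumes a workload is described by a plain dictionary and that each checklist item is a predicate over it; the item wording and keys such as `runbook_url`, `playbook_url`, and `rollback_tested` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical production-readiness checklist: (description, predicate over the workload).
CHECKLIST: List[Tuple[str, Callable[[Dict], bool]]] = [
    ("Runbook exists for routine tasks", lambda w: bool(w.get("runbook_url"))),
    ("Playbook exists for unexpected events", lambda w: bool(w.get("playbook_url"))),
    ("Monitoring alarms are configured", lambda w: len(w.get("alarms", [])) > 0),
    ("Rollback procedure has been tested", lambda w: bool(w.get("rollback_tested"))),
]


def readiness_report(workload: Dict) -> List[str]:
    """Return the checklist items the workload fails; an empty list means it is ready."""
    return [desc for desc, check in CHECKLIST if not check(workload)]


if __name__ == "__main__":
    workload = {
        "runbook_url": "https://example.com/runbooks/shop",  # hypothetical URL
        "alarms": ["high-latency"],
        "rollback_tested": True,
    }
    print(readiness_report(workload))
```

Running a check like this in a pipeline before promotion turns the checklist into a gate rather than a document that may or may not be read.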

5

Best Practices: Preparation (part 2)

  1. Auto Scaling
    1. Provide automated scaling mechanisms to respond to business-related events that affect operational needs
  2. Tagging
    1. To make sure all resources in a workload can be easily identified when needed during responses
  3. Accurate Documentation
    1. Information can become stale and needs to be updated regularly and tested
    2. Should include:
      1. Application designs
      2. Environment configurations
      3. Resource configurations
      4. Response plans
      5. Mitigation plans
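A sketch of a tag compliance check, assuming resources are represented as dictionaries with an `id` and a `tags` map. The required keys (`Application`, `Environment`, `Owner`) are a hypothetical policy; `find_by_tag` shows why consistent tags pay off during a response, since every resource in one application can be listed with a single filter.

```python
from typing import Dict, List

# Hypothetical tagging policy: every resource must carry these keys.
REQUIRED_TAGS = {"Application", "Environment", "Owner"}


def missing_tags(resources: List[Dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to the sorted tag keys it lacks."""
    report: Dict[str, List[str]] = {}
    for resource in resources:
        missing = REQUIRED_TAGS - set(resource.get("tags", {}))
        if missing:
            report[resource["id"]] = sorted(missing)
    return report


def find_by_tag(resources: List[Dict], key: str, value: str) -> List[str]:
    """Return the IDs of resources tagged key=value, e.g. everything in one application."""
    return [r["id"] for r in resources if r.get("tags", {}).get(key) == value]


if __name__ == "__main__":
    inventory = [
        {"id": "i-web-1", "tags": {"Application": "shop", "Environment": "prod", "Owner": "team-a"}},
        {"id": "db-1", "tags": {"Application": "shop"}},
    ]
    print(missing_tags(inventory))                        # {'db-1': ['Environment', 'Owner']}
    print(find_by_tag(inventory, "Application", "shop"))  # ['i-web-1', 'db-1']
```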

6

Best Practices: Preparation (part 3)

  1. Deployments
    1. CI/CD pipelines (e.g., source code repository, build systems, deployment and testing automation)
    2. Release management - small, tested, incremental, and tracked changes
    3. Rollback - revert changes without introducing operational issues or causing operational impact
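Small, incremental releases with automatic rollback can be sketched as a batched deployment that checks health after each batch. This is a simplified model under stated assumptions (a fleet is a dict of host name to version, and `healthy` is a caller-supplied check); it is not how AWS CodeDeploy is implemented.

```python
from typing import Callable, Dict


def rolling_deploy(
    fleet: Dict[str, str],
    new_version: str,
    batch_size: int,
    healthy: Callable[[str, str], bool],
) -> bool:
    """Update hosts batch by batch; on the first failed health check, restore every
    host to the version it had before the deploy and return False."""
    previous = dict(fleet)  # snapshot taken before any change, used for rollback
    hosts = sorted(fleet)
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            fleet[host] = new_version
        if not all(healthy(host, new_version) for host in batch):
            fleet.update(previous)  # roll back every host to its recorded version
            return False
    return True


if __name__ == "__main__":
    fleet = {"web-1": "v1", "web-2": "v1", "web-3": "v1"}
    # Simulate a release that fails on web-2: the deploy stops and all hosts return to v1.
    ok = rolling_deploy(fleet, "v2", batch_size=1, healthy=lambda host, v: host != "web-2")
    print(ok, fleet)
```

A smaller `batch_size` limits how many hosts a bad release can touch before the rollback starts, at the cost of a slower deployment.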

7

Best Practices: Operation

  1. Operations should be standardized and manageable on a routine basis
  2. Automation, small changes, regular quality assurance testing
  3. Mechanisms to track, audit, roll back, and review changes
  4. Avoid changes that are large, infrequent, manual, or require scheduled downtime
  5. KPIs should be collected and reviewed
  6. Automate responses to failures
  7. Avoid manual processes for deployments, release management, changes, rollbacks
  8. Align monitoring to business needs
  9. Avoid ad hoc and non-centralized monitoring
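Aligning monitoring to business needs means KPIs are judged against targets the business cares about. The sketch below assumes request latencies and an error count are already collected; the KPI names and target values are hypothetical, and the 95th percentile uses the nearest-rank method.

```python
from typing import Dict, List


def p95(values: List[float]) -> float:
    """95th percentile by the nearest-rank method (values must be non-empty)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


def kpi_review(latencies_ms: List[float], errors: int, targets: Dict[str, float]) -> Dict[str, bool]:
    """Report, per KPI, whether the measured value meets its business target."""
    error_rate = errors / len(latencies_ms)  # assumes one latency sample per request
    return {
        "p95_latency_ms": p95(latencies_ms) <= targets["p95_latency_ms"],
        "error_rate": error_rate <= targets["error_rate"],
    }


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]            # 100 requests, 1..100 ms
    targets = {"p95_latency_ms": 100.0, "error_rate": 0.01}  # hypothetical targets
    print(kpi_review(samples, errors=2, targets=targets))
    # {'p95_latency_ms': True, 'error_rate': False}
```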

8

Best Practices: Response

  1. Responses should be automated (mitigation, remediation, rollback, and recovery)
  2. Alerts should be timely, and invoke escalations when automated responses are not enough
  3. QA mechanisms should be in place to automatically roll back failed deployments
  4. Responses should follow a pre-defined playbook
  5. Escalation paths should be defined and include both functional and hierarchical escalation paths
  6. Hierarchical escalations should be automated
  7. Escalated priority should result in stakeholder notifications
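The response flow on this card can be sketched as: automated remediation first, then a functional escalation path, then a hierarchical one in which each step raises priority and notifies stakeholders. The team and role names are hypothetical, and `remediate`, `resolves`, and `notify` stand in for whatever automation and paging system is actually in use.

```python
from typing import Callable, List

# Hypothetical escalation paths.
FUNCTIONAL_PATH: List[str] = ["app-oncall", "platform-oncall"]
HIERARCHICAL_PATH: List[str] = ["engineering-manager", "director"]


def handle_event(
    remediate: Callable[[], bool],
    resolves: Callable[[str], bool],
    notify: Callable[[str], None],
) -> str:
    """Try automated remediation, then functional escalation, then hierarchical escalation.
    Every hierarchical step raises priority and sends a stakeholder notification."""
    if remediate():
        return "auto-remediated"
    for responder in FUNCTIONAL_PATH:
        notify(f"page {responder}")
        if resolves(responder):
            return f"resolved by {responder}"
    for level, manager in enumerate(HIERARCHICAL_PATH, start=1):
        notify(f"escalate to {manager}")
        notify(f"stakeholder update: priority raised to level {level}")
        if resolves(manager):
            return f"resolved by {manager}"
    return "unresolved"


if __name__ == "__main__":
    sent: List[str] = []
    outcome = handle_event(
        remediate=lambda: False,  # automation could not fix the problem
        resolves=lambda who: who == "engineering-manager",
        notify=sent.append,
    )
    print(outcome)
    print(sent)
```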

9

AWS Key Services: Preparation

  1. Preparation
    1. AWS Config - provides a detailed inventory of your AWS resources and their configurations, and continuously records configuration changes
    2. AWS Service Catalog - helps create a standardized set of service offerings that are aligned to best practices
    3. Design workloads to use automation with services such as Auto Scaling and Amazon SQS
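AWS Config records configuration changes for you; the sketch below only illustrates the underlying idea of comparing two configuration snapshots to surface drift. The snapshot format (resource ID mapped to an attribute dict) and the example values are assumptions, not the AWS Config data model.

```python
from typing import Any, Dict, List, Tuple

Config = Dict[str, Dict[str, Any]]  # hypothetical format: resource ID -> attributes


def diff_configs(before: Config, after: Config) -> List[Tuple[str, str, Any, Any]]:
    """Return (resource_id, attribute, old, new) for every changed attribute.
    Added or removed resources and attributes show None on the missing side."""
    changes: List[Tuple[str, str, Any, Any]] = []
    for rid in sorted(before.keys() | after.keys()):
        old_attrs = before.get(rid, {})
        new_attrs = after.get(rid, {})
        for attr in sorted(old_attrs.keys() | new_attrs.keys()):
            old, new = old_attrs.get(attr), new_attrs.get(attr)
            if old != new:
                changes.append((rid, attr, old, new))
    return changes


if __name__ == "__main__":
    before = {"sg-1": {"ingress": "10.0.0.0/8"}, "bucket-a": {"versioning": True}}
    after = {"sg-1": {"ingress": "0.0.0.0/0"}, "bucket-a": {"versioning": True}}
    for change in diff_configs(before, after):
        print(change)
```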

10

AWS Key Services: Operation

  1. Tools to manage and automate code changes to AWS workloads
    1. AWS CodeCommit
    2. AWS CodeDeploy
    3. AWS CodePipeline
  2. Use AWS SDKs to automate operational changes
  3. Use AWS CloudTrail to audit and track changes made to AWS environments
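A minimal sketch of auditing with CloudTrail output: CloudTrail log files are JSON documents whose events sit under a `Records` array, with fields such as `eventTime`, `eventSource`, `eventName`, and `userIdentity`. Treating calls whose names start with `Describe`, `Get`, or `List` as read-only is a rough heuristic assumed here, not an official classification, and the sample record below is made up.

```python
import json
from typing import List, Tuple

READ_PREFIXES = ("Describe", "Get", "List")  # heuristic only, not an AWS classification


def change_events(log_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (eventTime, user ARN, eventSource, eventName) for likely mutating calls
    in the JSON of a CloudTrail log file, sorted by time."""
    records = json.loads(log_text).get("Records", [])
    changes: List[Tuple[str, str, str, str]] = []
    for rec in records:
        if rec.get("eventName", "").startswith(READ_PREFIXES):
            continue  # skip calls that only read state
        changes.append((
            rec.get("eventTime", ""),
            rec.get("userIdentity", {}).get("arn", "unknown"),
            rec.get("eventSource", ""),
            rec.get("eventName", ""),
        ))
    return sorted(changes)


if __name__ == "__main__":
    sample = json.dumps({"Records": [
        {"eventTime": "2024-01-01T10:00:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "DescribeInstances",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
        {"eventTime": "2024-01-01T10:05:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "AuthorizeSecurityGroupIngress",
         "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"}},
    ]})
    for event in change_events(sample):
        print(event)
```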

11

AWS Key Services: Response

  1. Response
    1. Amazon CloudWatch - monitoring that enables effective automated responses
    2. Amazon CloudWatch - to set alerting and notifications
    3. Amazon CloudWatch - to trigger automated responses
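CloudWatch alarms can be configured to fire when M out of N recent datapoints breach a threshold. The sketch below is a simplified model of that evaluation plus a hook that runs an automated response in the `ALARM` state; it assumes a greater-than comparison and always treats missing datapoints as not breaching, whereas CloudWatch lets you choose how missing data is treated. The function names are hypothetical.

```python
from typing import Callable, List, Optional


def evaluate_alarm(
    datapoints: List[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Simplified M-out-of-N evaluation over the most recent `evaluation_periods` values.
    Missing datapoints (None) count as not breaching."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for v in window if v is not None and v > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def on_state(state: str, action: Callable[[], None]) -> None:
    """Invoke the automated response only when the alarm is in the ALARM state."""
    if state == "ALARM":
        action()


if __name__ == "__main__":
    cpu = [40.0, 85.0, 90.0, None, 88.0]  # hypothetical CPU utilization samples (%)
    state = evaluate_alarm(cpu, threshold=80.0, evaluation_periods=3, datapoints_to_alarm=2)
    on_state(state, lambda: print("triggering automated response: scale out"))
```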

12

Questions: Preparation

  1. What best practices for cloud operations are you using?
  2. How are you doing configuration management for your workload?

13

Questions: Operations

  1. How are you evolving your workload while minimizing the impact of change?
  2. How are you monitoring your workload to ensure it is operating as expected?

14

Questions: Response

  1. How do you respond to unplanned operational events?
  2. How is escalation managed when responding to unplanned operational events?