Operational Excellence Flashcards
(41 cards)
Design principles
- Perform operations as code: define infrastructure as code and automate procedures with scripts.
- Make frequent, small, reversible changes.
- Refine operations procedures frequently.
- Anticipate failure: test failure scenarios, fail fast.
- Learn from all operational failures: document and share.
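As a minimal sketch of the "operations as code" and "small, reversible changes" principles above, the following Python snippet models a change as an apply step bundled with its rollback and a verification check. The names `ReversibleChange` and `run_change`, and the in-memory `config` dictionary, are hypothetical stand-ins for real infrastructure tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Config = Dict[str, int]


@dataclass
class ReversibleChange:
    """A small change bundled with the step that undoes it (hypothetical type)."""
    name: str
    apply: Callable[[Config], None]
    rollback: Callable[[Config], None]
    verify: Callable[[Config], bool]


def run_change(config: Config, change: ReversibleChange) -> bool:
    """Apply the change, verify it, and roll back if verification fails."""
    change.apply(config)
    if change.verify(config):
        return True
    change.rollback(config)
    return False


# Illustrative in-memory "infrastructure" state (an assumption for this sketch).
config = {"instance_count": 2}

scale_up = ReversibleChange(
    name="scale-up-by-one",
    apply=lambda c: c.update(instance_count=c["instance_count"] + 1),
    rollback=lambda c: c.update(instance_count=c["instance_count"] - 1),
    verify=lambda c: c["instance_count"] <= 3,
)

first = run_change(config, scale_up)   # stays within the limit, so it is kept
second = run_change(config, scale_up)  # exceeds the limit, so it is rolled back
```

Because each change is small and carries its own rollback, a failed verification leaves the state exactly as it was before the change.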
Practice areas
- Organization
- Prepare
- Operate
- Evolve
Organization practice area
Understand your organization's priorities, its structure, and how it supports team members so that they can support business outcomes.
Organization Priorities
- Evaluate external customer needs.
- Evaluate internal customer needs.
- Evaluate governance requirements.
- Evaluate compliance requirements.
- Evaluate threat landscape.
- Evaluate tradeoffs.
- Manage benefits and risks.
Organization operating model
Understand roles, responsibilities, and how decisions are made. These are the models that govern how the company operates.
Operating model 2 by 2 representations
Understand the relationships between teams in your environment: who does what.
Operating model - Fully separated model
Application and platform activities are handled by fully separate teams. Work is passed between teams through mechanisms such as work requests, work queues, tickets, or by using an IT service management (ITSM) system.
Operating model - Separated AEO and IEO with centralized governance
This model follows the “you build it, you run it” methodology: engineers are responsible for both the engineering and the operation of their workload. To organize the teams, use AWS Organizations and AWS Control Tower. The platform engineering team provides the application team with a standardized set of services (for example, development or monitoring tools) and access to cloud services. AWS Service Catalog can be used to govern the tooling.
PRO
Standards are distributed, provided, or shared
Strong feedback loop
Platform team supports Application team
Adopting standards may reduce reviews to enter production
CON
For any change or addition, the application team always needs to consult the platform team
AEO
Application Engineering and Operations
IEO
Infrastructure Engineering and Operations
Operating model - Separated AEO and IEO with centralized governance and a Service Provider
Similar to the centralized governance model, but you offload some operations tasks, such as patching and updating, to managed services. These services are run by AWS, which takes care of these tasks.
PRO
Offload “boring” operational tasks
Gain advantage of your providers’ standards, best practices, processes, and expertise
Latest service offerings
CON
Does not address the bottlenecks and delays created by transition of tasks between teams
Operating model - Separated AEO and IEO with centralized governance and an internal service provider consulting partner
This model also establishes the “you build it, you run it” methodology. The difference from the previous model is that it adds a Cloud Operations and Platform Enablement (COPE) team, which supports application teams that lack cloud-related skills. It provides a forum to ask questions, discuss needs, and identify solutions. The platform engineering team builds the core shared platform capabilities and governs them via AWS Service Catalog.
PRO
Adopts more of a DevOps culture
Enables cloud transformation for teams, establishes centralized cloud governance, and defines account and organization management standards
Application team gets its CI/CD pipeline from COPE
Removes barriers that slow application team adoption of beneficial cloud capabilities
CON
Involves significant effort to facilitate cloud adoption and organization standards
CCoE
Cloud Center of Enablement
COPE
Cloud Operations and Platform Enablement
Operating model - Separated AEO and IEO with decentralized governance
In this model, application engineers and developers perform both the engineering and the operation of their workloads. Standards are still distributed by the platform team, but the application teams have more freedom to engineer and operate their own capabilities in support of their workload.
PRO
Fewer constraints
More freedom in choosing tooling
CON
More responsibility for application engineers
Risk of rework is higher
Policies still need to be enforced (governance via AWS Organizations and AWS Control Tower)
Operating model - relationship and ownership - Resources have identified owners
Understand who has ownership of each application, workload, platform, and infrastructure component, what business value is provided by that component, and why that ownership exists.
1. Define forms of ownership and how they are assigned
2. Define who owns an organization, account, collection of resources, or individual components
3. Capture ownership in the metadata for the resources
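One way to picture step 3 is a check that every resource carries ownership metadata as tags. This is only a sketch: the tag keys `owner` and `business-value`, the function `missing_ownership_tags`, and the sample inventory are assumptions for illustration, not an AWS-defined standard.

```python
from typing import Dict, List

# Assumed tag keys; each organization defines its own tagging standard.
REQUIRED_OWNERSHIP_TAGS = ("owner", "business-value")


def missing_ownership_tags(tags: Dict[str, str]) -> List[str]:
    """Return the required ownership tag keys that are absent or empty."""
    return [key for key in REQUIRED_OWNERSHIP_TAGS if not tags.get(key, "").strip()]


# Hypothetical resource inventory: resource identifier -> tags.
resources = {
    "orders-db": {"owner": "team-orders", "business-value": "order processing"},
    "legacy-queue": {"owner": ""},
}

# Collect every resource whose ownership metadata is incomplete.
unowned: Dict[str, List[str]] = {}
for resource_id, tags in resources.items():
    missing = missing_ownership_tags(tags)
    if missing:
        unowned[resource_id] = missing
```

A report like `unowned` gives a concrete list of components that still need an owner assigned.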
Operating model - relationship and ownership - Processes and procedures have identified owners
Understand who has ownership of the definition of individual processes and procedures, why those specific processes and procedures are used, and why that ownership exists.
1. Identify processes and procedures
2. Define who owns the definition of a process or procedure
3. Capture ownership in the metadata of the activity artifact
Operating model - relationship and ownership - Operations activities have identified owners responsible for their performance
Understand who has responsibility to perform specific activities on defined workloads and why that responsibility exists. Understanding who has responsibility to perform activities informs who will conduct the activity, validate the result, and provide feedback to the owner of the activity.
Operating model - relationship and ownership - Team members know what they are responsible for
Understanding the responsibilities of your role and how you contribute to business outcomes informs the prioritization of your tasks and why your role is important. This enables team members to recognize needs and respond appropriately.
Operating model - relationship and ownership - Mechanisms exist to identify responsibility and ownership
Where no individual or team is identified, there are defined escalation paths to someone with the authority to assign ownership or plan for that need to be addressed.
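The mechanism described above can be sketched as an owner lookup with a fallback to an escalation contact. The component hierarchy (`PARENT`), the registry (`OWNERS`), the contact name, and the function `resolve_owner` are all hypothetical; walking up a parent hierarchy is one possible design, not something the card prescribes.

```python
from typing import Dict, Optional, Set

# Hypothetical data: which group each component belongs to, and known owners.
PARENT: Dict[str, str] = {
    "checkout-cache": "checkout-service",
    "checkout-service": "payments-platform",
}
OWNERS: Dict[str, str] = {"payments-platform": "team-payments"}

# Assumed contact with the authority to assign ownership.
ESCALATION_CONTACT = "operations-director"


def resolve_owner(component: str) -> str:
    """Return the nearest identified owner, or the escalation contact if none exists."""
    current: Optional[str] = component
    seen: Set[str] = set()
    while current is not None and current not in seen:
        seen.add(current)  # guards against cycles in the hierarchy
        if current in OWNERS:
            return OWNERS[current]
        current = PARENT.get(current)
    return ESCALATION_CONTACT


cache_owner = resolve_owner("checkout-cache")
orphan_owner = resolve_owner("orphan-bucket")
```

Here `checkout-cache` inherits its owner from the platform it belongs to, while the unregistered `orphan-bucket` falls through to the escalation contact.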
Operating model - relationship and ownership - Mechanisms exist to request additions, changes, and exceptions
You are able to make requests to owners of processes, procedures, and resources. Make informed decisions to approve requests where viable and determined to be appropriate after an evaluation of benefits and risks.
Operating model - relationship and ownership - Responsibilities between teams are predefined or negotiated
Have defined or negotiated agreements between teams describing how they work with and support each other (for example, response times, service level objectives, or service level agreements).
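To make a service level objective concrete, an availability target can be converted into a downtime budget for a given window. The sketch below assumes a 30-day window by default; the function name `allowed_downtime_minutes` is hypothetical.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime budget in minutes for an availability SLO."""
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be greater than 0 and at most 100")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - slo_percent / 100)


# A 99.9% objective over 30 days leaves roughly 43.2 minutes of downtime.
budget = allowed_downtime_minutes(99.9)
```

Stating the budget in minutes gives both teams a shared, measurable number to negotiate around.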
Organizational culture
- Executive Sponsorship
- Team members are empowered to take action when outcomes are at risk
- Escalation is encouraged
- Communications are timely, clear, and actionable
- Experimentation is encouraged
- Team members are enabled and encouraged to maintain and grow their skill sets
- Resource teams appropriately
- Diverse opinions are encouraged and sought within and across teams
Prepare
Understand your workloads and their expected behaviors. You will then be able to design them to provide insight into their status and build the procedures to support them.
To prepare for operational excellence, you need to perform the following:
1. Design telemetry
2. Design for operations
3. Mitigate deployment risks
4. Operational readiness and change management