High Availability Flashcards

Question 1

Q

What is the minimum supported node configuration for high availability?

Answer

A

three-node system

Question 2

Q

Can you have more than one Gateway process running on a single machine? On a cluster?

Answer

A

On a single machine, no, and there would be no point.

On a cluster, yes, and they are all active.

Question 3

Q

What is the goal of HA within TS?

Answer

A

The goal is to minimize UNPLANNED downtime.

Question 4

Q

What does TS attempt to do in the event of a component failure?

Answer

A

It tries to restart the failed component automatically.

Question 5

Q

What components need to be redundant in order to achieve system HA?

Answer

A

Every single component.

Question 6

Q

Redundancy across nodes is possible for all processes except _________ service.

Why?

Answer

A

Licensing service

It can only run on the dedicated Primary node.

Question 7

Q

Gateway process can run on any node of the TS cluster.

True or false?

Answer

A

True

The Gateway process can run on any and all nodes of the TS cluster.

Question 8

Q

We can only run the Gateway process on one node at a time.

True or false?

Answer

A

False

Gateway process can run on any and all nodes of the TS cluster. They are all active.

Question 9

Q

What does Tableau recommend in order to make your Gateway process HA?

Answer

A

Have more than one node in a Tableau Server cluster and configure more than just one to run the Gateway process.

They recommend that we run an instance of the Gateway on each node.

Question 10

Q

What happens when the Gateway fails?

Answer

A

If no Gateway processes are running, the entire TS cluster will be unavailable. If other Gateway processes remain running, requests made to those working Gateways will be processed normally. However, any requests received by the failed Gateway will not be redirected and will continue to fail, despite the presence of other functioning Gateways.

Question 11

Q

How do you make your system robust to Gateway failures?

Answer

A

Run multiple Gateways across the cluster and configure an external load balancer to route traffic accordingly.
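
As an illustration only, the routing idea can be sketched as a toy round-robin balancer that skips Gateways reported as down. The class name, node names, and the `is_healthy` callback are all hypothetical; a real external load balancer is a separate product with its own health-check configuration.

```python
from typing import Callable, List, Optional


class RoundRobinBalancer:
    """Toy load balancer: rotates over gateways, skipping unhealthy ones."""

    def __init__(self, gateways: List[str], is_healthy: Callable[[str], bool]):
        self.gateways = list(gateways)
        self.is_healthy = is_healthy  # hypothetical health-check callback
        self._next = 0

    def pick(self) -> Optional[str]:
        """Return the next healthy gateway, or None if every gateway is down."""
        for _ in range(len(self.gateways)):
            candidate = self.gateways[self._next]
            self._next = (self._next + 1) % len(self.gateways)
            if self.is_healthy(candidate):
                return candidate
        return None


# Hypothetical three-node cluster where the Gateway on node2 has failed.
down = {"node2"}
balancer = RoundRobinBalancer(["node1", "node2", "node3"], lambda g: g not in down)
print([balancer.pick() for _ in range(4)])  # node2 is never chosen
```

Without such a balancer in front, requests sent to the failed Gateway keep failing, which is the situation described in the previous card.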

Question 12

Q

How can you achieve HA with the Application Server?

Answer

A

Configure instances of the App Server on each node in the cluster.

Question 13

Q

What happens when an Application Server process fails?

Answer

A

Requests being handled by that instance of the App Server will fail, but subsequent requests will be routed to other running Application Servers.

If the node containing the failed App Server is still running, the failed process should automatically restart itself within seconds.

Question 14

Q

How many Coordination Service processes will be installed if I am running 4 nodes in a cluster? How about 3? 5?

Question 15

Q

What is a quorum?

Explain how this is important in the context of TS?

Answer

A

A quorum is just another way of saying an absolute majority.

A complete Tableau Server outage will occur if the number of running Coordination Service processes does not constitute a quorum, which is based on the total number of configured Coordination Service processes.
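
The majority rule can be written down as a small sketch. The function names are made up for illustration, and this is plain arithmetic rather than anything TS exposes.

```python
def quorum_size(configured: int) -> int:
    """Smallest absolute majority of `configured` processes."""
    if configured < 1:
        raise ValueError("need at least one configured process")
    return configured // 2 + 1


def has_quorum(running: int, configured: int) -> bool:
    """True while the running processes still form a majority."""
    return running >= quorum_size(configured)


for configured in (1, 3, 5):
    print(configured, "configured -> quorum of", quorum_size(configured))
```

Note that the quorum is computed from the configured total, not from the number of processes that happen to be running.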

Question 16

Q

How many Coordination Service processes are installed on a three node cluster? How many node failures can be tolerated without crashing the entire system?

Answer

A

A cluster with either three or four computers is able to tolerate the loss of, at most, one node (one instance of the Coordination Service).

i.e.
Total CS processes: Three
Quorum equals: 2 (to have a majority)
Tolerates: 1 CS failure

Question 17

Q

Why do you need three nodes to achieve HA? Why not two?

Answer

A

A cluster with only two nodes cannot tolerate the loss of even a single Coordination Service process.
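
A short sketch of the arithmetic behind this answer. The card does not say how many Coordination Service processes a two-node cluster runs, so the loop below tries one and two as assumptions; either way nothing can be lost, while three processes tolerate one failure.

```python
def tolerated_failures(configured: int) -> int:
    """How many processes can fail while a majority still remains."""
    quorum = configured // 2 + 1  # absolute majority
    return configured - quorum


for configured in (1, 2, 3):
    print(f"{configured} configured -> tolerates {tolerated_failures(configured)} failure(s)")
```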

Question 18

Q

What happens when a Coordination Service process fails?

Answer

A

Nothing, as long as the number of remaining Coordination Service processes still constitutes a quorum.

If the number of functioning Coordination Service processes falls below a quorum, the entire Tableau Server cluster becomes unavailable in order to protect the referential integrity of the underlying Postgres database.

Question 19

Q

How is the Cluster Controller process installed on a cluster?

Answer

A

One instance is installed on each node of the cluster. No explicit config is necessary.

Question 20

Q

What happens when a Cluster Controller process fails?

Answer

A

All other TS components on that same node will become unavailable and display as “unavailable” on the TS status page. Any Repository process running on that node will also be unavailable.

Question 21

Q

How do you protect against a Cluster Controller process failure?

Answer

A

Ensure that each unique server component has redundancy and is running on at least two different nodes in the cluster.

Question 22

Q

When a Cluster Controller restarts, it also restarts any _________ process configured on that node.

Answer

A

Repository

Question 23

Q

What happens if there is no fully-functioning Repository process running on the cluster?

Answer

A

The entire TS cluster will be unavailable.

Question 24

Q

How can you improve Repository availability?

Answer

A

Configure an additional “passive” Repository on a different node of the cluster.

Question 25

Q

How does the passive Repo relate to the active one?

Answer

A

The contents of the active Repository constantly stream to the passive Repository.

Question 26

Q

In an entire cluster, there can be a max of only ______ Repositories.

Can they be on the same node?

Answer

A

Two

No. The second (passive) Repository is configured on a different node of the cluster.

Question 27

Q

The ________ _________ manages Repository startup, shutdown, and any failover from active to passive.

Answer

A

Cluster Controller

Question 28

Q

What happens if a Cluster Controller that started a Repository process fails?

Answer

A

The Repository will also fail.

Question 29

Q

What happens when a Repository process fails?

Hint: Depends…

Answer

A

If the passive Repository fails, then users should experience no impact. Everything will continue to work since the active Repository is still functioning. In the background, the passive Repository will be restarted, and data replication will resume, though there may be some delay before the passive Repository is again fully synchronized with the active Repository.
If the active Repo fails, and if there is no fully-synchronized passive Repo, then TS will be unavailable until the active Repo can be restarted. The system will attempt to do this automatically, but, depending on the reason for the failure, this may not be possible. The active Repo is a single point of failure for the entire system if there is no synchronized passive Repo.
If the active Repo fails, and there is a fully-synchronized passive Repo available, and the cluster is configured for HA, then failover to the passive Repository will be automatically triggered. After the failover, the previously passive Repo will be the new active Repo. The system will restart the previously active Repo as the new passive Repo and begin synchronization. It also restarts other relevant processes automatically so they become aware of the newly-promoted active Repo and can reconnect. During this short window of restarts, users will experience a service disruption.
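
The three cases above can be summarized as a small decision function. This is only a sketch of the card's logic; the function name, parameters, and returned strings are invented for illustration and do not correspond to any TS API.

```python
def repository_failure_outcome(failed: str, passive_synced: bool, ha_configured: bool) -> str:
    """Map a Repository failure onto the outcome described in the card."""
    if failed == "passive":
        return "no user impact; passive restarts and resynchronizes"
    if failed != "active":
        raise ValueError("failed must be 'active' or 'passive'")
    if passive_synced and ha_configured:
        return "automatic failover; brief disruption while processes restart"
    return "server unavailable until the active Repository restarts"


print(repository_failure_outcome("passive", passive_synced=True, ha_configured=True))
print(repository_failure_outcome("active", passive_synced=False, ha_configured=True))
print(repository_failure_outcome("active", passive_synced=True, ha_configured=True))
```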

Question 30

Q

How do you manually switch the “active” designation between the two Repos?

Answer

A

You can do so using the tabadmin command:

tabadmin failoverrepository

Question 31

Q

How do you make the backgrounder HA?

Answer

A

Configure more than one Backgrounder process to run on multiple nodes in the cluster.

Question 32

Q

What happens when the Backgrounder process goes down?

Answer

A

The jobs that the Backgrounder is working on will fail and will not be retried.

However, most background jobs are scheduled to run periodically, and the same background task will be picked up and performed normally at the next scheduled time by a functioning Backgrounder process.

If the computer is functioning normally, the Backgrounder process will be restarted automatically, but the failed jobs will not be retried.

Question 33

Q

How do you ensure Data Server HA?

Answer

A

Configure one or more Data Server processes to run on multiple nodes of the cluster.

Question 34

Q

What happens if a Data Server process fails?

Answer

A

Queries running via a proxy through the Data Server process will fail, resulting in a failed view rendering. Subsequent requests, including a retry of the failed operation, should succeed as long as a working Data Server exists that can accept rerouted requests.

Question 35

Q

Is Tableau Server dependent on the Data Server?

What happens if we take the process out of the configuration?

Answer

A

No, TS is not dependent on Data Server.

Without a running Data Server process, the cluster loses its ability for workbooks to proxy through to external data sources. Any view that does not use Data Server for one of its data sources should still function correctly.

Question 36

Q

What is the role of the Data Server process?

Answer

A

It is a component by which TS provides sharing and centralized management of Tableau Data Extracts and shared proxy database connections.

Question 37

Q

What five items can IT centrally manage thanks to the Data Server?

Answer

A

Data connections and joins
Calculated fields (e.g. a common definition of profit)
Field definitions
Sets and groups
User filters

Question 38

Q

Published data sources can be of two types. What are they? Describe each.

Answer

A

TDEs. Users can connect directly to a published data extract. This is fast and takes load off critical systems. It also prevents the proliferation of data silos around an organization.
Shared proxy connections. Users can connect directly to live data with a proxy database connection. This means each user does not have to set up a separate connection. There is also no need for users to install database drivers, reducing the load on IT to distribute drivers and keep them up to date.

Question 39

Q

What does the Cache Server do?

Answer

A

It provides a shared external query cache. It’s a cache of key/value pairs that hold information from previous queries and speed up future requests.

Question 40

Q

What happens if the Cache Server becomes unavailable?

Answer

A

TS will continue to be available, but actions may take longer as they do not have pre-cached results available.

The impact is more on the end user performance/experience.
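
A minimal sketch of why losing the cache costs speed rather than availability. The plain dict standing in for the Cache Server and the `execute` callback standing in for the real query path are both assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


def run_query(key: str, cache: Optional[Dict[str, str]], execute: Callable[[str], str]) -> str:
    """Serve from the cache when possible; otherwise run the query itself."""
    if cache is not None and key in cache:
        return cache[key]      # fast path: result of a previous query
    result = execute(key)      # slow path: always still available
    if cache is not None:
        cache[key] = result    # remember the result for future requests
    return result


calls = []

def slow_execute(key: str) -> str:
    """Stand-in for an expensive query; records each real execution."""
    calls.append(key)
    return key.upper()

cache: Dict[str, str] = {}
run_query("sales by region", cache, slow_execute)
run_query("sales by region", cache, slow_execute)   # served from the cache
run_query("sales by region", None, slow_execute)    # cache unavailable: runs again
print(len(calls))  # number of times the slow path actually ran
```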

Question 41

Q

What is the role of the Data Engine component?

Answer

A

It loads and queries data extracts when using in-memory analytics.

Question 42

Q

How do we make the Data Engine HA?

Answer

A

Configure one or more Data Engine processes to run on multiple nodes of the cluster.

Question 43

Q

All Data Engine processes run on _____ /_____ mode, meaning they _______ ____ ________ __________.

Answer

A

active / active

perform the same functions

Question 44

Q

Any node configured to run a Data Engine process will also be configured to run the ______________ process.

Answer

A

File Store process

Question 45

Q

What happens when a Data Engine process goes down?

Answer

A

Queries currently running on that Data Engine process will fail, resulting in a failed view rendering or failed extract refresh. If the same operation is run again, it will automatically be reassigned to a different, functioning Data Engine.

Question 46

Q

A File Store process will instantiate on any node that runs one or more ____________ processes.

Answer

A

Data Engine

Question 47

Q

What does the File Store do?

Overview…

Answer

A

It manages storage and replication of extract files between nodes.

Question 48

Q

How does the File Store work?

Detail…

Answer

A

An extract file is created in the system when a user first publishes it to TS or an extract refresh of it occurs. Immediately after one of these events, the extract exists on a single File Store on a single node. The specific extract file cannot be said to be HA yet as it does not have redundancy and therefore is a single point of failure.

The File Store processes communicate with each other to quickly replicate local extracts to all other File Store nodes in the cluster. The File Store process is designed to copy files as quickly as the cluster's network resources allow, but replication can take a variable amount of time depending on the size of the extract.

Once a copy is available on multiple nodes within the cluster, the extract file is considered HA.
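
The lifecycle described above can be modelled with a toy sketch. The node names, the extract name, and the instantaneous `replicate` step are all simplifications invented for illustration; real replication takes time that depends on extract size and network capacity.

```python
from typing import Dict, Set


def replicate(stores: Dict[str, Set[str]]) -> None:
    """Copy every extract to every File Store (instantaneous in this toy model)."""
    all_extracts = set().union(*stores.values())
    for extracts in stores.values():
        extracts |= all_extracts


def is_ha(extract: str, stores: Dict[str, Set[str]]) -> bool:
    """An extract counts as HA once a copy exists on at least two nodes."""
    return sum(extract in extracts for extracts in stores.values()) >= 2


# A freshly published extract lives on a single node only.
stores = {"node1": {"sales.tde"}, "node2": set(), "node3": set()}
print(is_ha("sales.tde", stores))  # single copy, so not yet HA
replicate(stores)
print(is_ha("sales.tde", stores))  # copies now exist on several nodes
```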

Question 49

Q

What happens if a File Store process fails?

Answer

A

Two consequences:

Copying extract files to and from the affected node stops.
Removal of no-longer needed extract files on the affected node is suspended. This removal process is usually referred to as “extract reaping”.

Question 50

Q

What is the main consequence of suspended extract reaping?

Answer

A

Accumulation of unwanted extracts consuming disk space.

Question 51

Q

What happens to extracts added to the failed File Store?

Answer

A

Extracts will be added to the targeted File Store but they will not sync up with other File Stores until the process is restarted.

Question 52

Q

How do you achieve Search and Browse HA?

Answer

A

Configure the process to run on multiple computers.

Question 53

Q

What happens if the Search and Browse process fails?

Answer

A

TS will largely be unusable, and though users can still log in to the system, workbook content will appear to be missing.

Note: The content is not actually missing; it is simply not being returned in the search results. It will be redisplayed after the Search and Browse process restarts.

If more than one Search and Browse process is configured and running on multiple nodes when a failure occurs, requests made to the failed Search and Browse process will also fail, but subsequent requests will be routed to working Search and Browse processes.

As long as one process is running in the cluster, results can be returned across all nodes.

Question 54

Q

How do you ensure VizQL HA?

Answer

A

Configure one or more instances to run on multiple computers in the cluster.

Question 55

Q

What happens if a VizQL Server process fails?

Answer

A

If there is only one VizQL Server process running and it fails then TS will no longer be able to render any views. HA requires configuring redundant VizQL processes.

If multiple VizQL processes are running, then failure of a single process will result in the failure of requests and loss of session data at the time of that process's failure. Any future requests will be routed to the other working VizQL Server processes across the TS cluster.

Question 56

Q

How does Tableau designate the primary TS node?

How is this node unique?

Answer

A

It is identified by TS as the server where the first TS installation occurred.

The primary node is unique in that it includes a unique license management process and other admin functions, in addition to potentially being a fully-fledged server installation.

Question 57

Q

TS processes _____ the _______ node every __________ hours to perform ___________ checks.

Answer

A

poll

primary

72

licensing

Question 58

Q

What happens if the Primary node of the cluster is not available during a license check?

Answer

A

The check will fail and our TS deployment will become “unlicensed” and be disabled.

Question 59

Q

How do you prevent licensing problems due to a bad Primary node?

Answer

A

Create a Backup Primary and keep it on standby.

Question 60

Q

Once configured and ready, the “backup” primary should be turned on and connected to the cluster.

True or false?

Why?

Answer

A

False!

It SHOULD NOT be turned on or connected to the cluster.

This ensures that licensing and admin functions continue to work against the primary server node.
