Data Skewing Flashcards

1
Q

Provide a solution for the following situation:

The customer had hundreds of thousands of account records and 15 million invoices, stored in a custom object that had a master-detail relationship with Account. Each account record took a long time to display because the Invoices related list took a long time to render.

A

The delay in displaying the Invoices related list was caused by data skew. While most account records had few invoice records, a few accounts had thousands of them.

To reduce the delay, the customer reduced the number of invoice records under those parents and kept data skew in child objects to a minimum. Enabling the Enable Separate Loading of Related Lists setting also let the account detail render while the related list query was still completing.

2
Q

What is the best practice for making deployments more efficient when a batch contains many parent/child records?

A

Distribute child records so that no parent has more than 10,000 child records. For example, in a deployment that has many contacts but does not use accounts, set up several dummy accounts and distribute the contacts among them.
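A minimal sketch of the dummy-account distribution, written in plain Python rather than Apex purely for illustration. The contact and account IDs are hypothetical placeholders, and `assign_round_robin` and `accounts_needed` are illustrative helpers, not Salesforce APIs; the 10,000 limit is the guideline from this card.

```python
from collections import Counter
from itertools import cycle

MAX_CHILDREN_PER_PARENT = 10_000  # guideline from this card


def accounts_needed(num_contacts, limit=MAX_CHILDREN_PER_PARENT):
    """Smallest number of dummy accounts that keeps every account at or below `limit`."""
    return -(-num_contacts // limit)  # ceiling division


def assign_round_robin(contact_ids, dummy_account_ids):
    """Assign contacts to dummy accounts in round-robin order.

    Returns a dict mapping (hypothetical) contact ID -> dummy account ID.
    """
    if not dummy_account_ids:
        raise ValueError("need at least one dummy account")
    accounts = cycle(dummy_account_ids)
    return {contact_id: next(accounts) for contact_id in contact_ids}


# Example: 25,000 contacts need 3 dummy accounts, each ending up with about 8,333 contacts.
contacts = [f"C{i}" for i in range(25_000)]
dummies = [f"A{i}" for i in range(accounts_needed(len(contacts)))]
per_account = Counter(assign_round_robin(contacts, dummies).values())
```

Round robin keeps the accounts evenly filled, so no single parent accumulates enough children to become skewed.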

3
Q

What is record ownership skew?

A

A large number of records of the same object type owned by a single user.

Record ownership is a powerful feature for managing record access. When individual users own the records they create, the role hierarchy makes sure managers have access to the data owned by their subordinates. But when a single user owns a high percentage of the data for any one object, Force.com must perform large sharing recalculations when you move that user in the hierarchy. The recalculations can be even worse when you add the user to, or remove the user from, a role or public group that uses a sharing rule to make its data visible to other users in the organization.

4
Q

How can you limit record ownership skew?

A
  • Design an ownership strategy from the beginning so that users own the data they create, then use the role hierarchy and sharing rules to provide access to others.
  • When a single owner has a large amount of data, place that user in their own role at the top of the hierarchy.
5
Q

What is parent/child data skew?

A

It occurs when a large number of child records are associated with the same parent record. Performance can then degrade when, for example, the ownership of those contacts changes.

6
Q

How can you avoid parent/child data skew?

A

SFDC recommends keeping the number of child records assigned to a single parent below 10,000.
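To check whether existing data already violates this guideline, you can count children per parent. The sketch below assumes you have exported the parent ID of every child record (for example, the `AccountId` of each contact) into a Python iterable; `find_skewed_parents` is a hypothetical helper, not a Salesforce API. Children with no parent (`None`) are ignored because they are not attached to any parent.

```python
from collections import Counter

CHILD_LIMIT = 10_000  # guideline from this card


def find_skewed_parents(child_parent_ids, limit=CHILD_LIMIT):
    """Return {parent_id: child_count} for every parent with more than `limit` children.

    child_parent_ids: one parent ID per child record (None = no parent).
    """
    counts = Counter(pid for pid in child_parent_ids if pid is not None)
    return {pid: n for pid, n in counts.items() if n > limit}


# Example with hypothetical IDs: only A1 exceeds 10,000 children.
sample = ["A1"] * 10_001 + ["A2"] * 10_000 + ["A3"] * 5 + [None] * 20_000
skewed = find_skewed_parents(sample)
```

The same count works for lookup skew if you feed it the values of a lookup field instead of a master-detail parent.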

7
Q

What is account data skew?

A

Accounts and Opportunities have special data relationships that maintain parent and child record access under private sharing models. Too many child records associated with the same parent record cause data skew.

8
Q

What two issues can happen with account data skew?

A

Record locking

Sharing issues

9
Q

Why (or when) does record locking occur?

A

It occurs with account data skew, when a large number of contacts under the same account are updated in multiple threads. For each update, the system locks both the contact and its parent account to maintain data integrity.
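One way to reduce this contention, assumed here rather than stated on the card, is to partition the work so that all contacts of a given account are handled by the same thread or batch; two workers then never wait on each other for the same parent lock. A minimal plain-Python sketch; `partition_by_parent` and the IDs are hypothetical.

```python
from collections import defaultdict


def partition_by_parent(updates, num_workers):
    """Split (contact_id, account_id) pairs across workers so that every
    contact of a given account is handled by exactly one worker.

    Two workers therefore never need the lock on the same parent account
    at the same time (one worker still locks it once per update).
    """
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    by_parent = defaultdict(list)
    for contact_id, account_id in updates:
        by_parent[account_id].append(contact_id)
    workers = [[] for _ in range(num_workers)]
    # Greedy balancing: largest parent group goes to the least-loaded worker.
    for account_id, contacts in sorted(by_parent.items(), key=lambda kv: -len(kv[1])):
        target = min(workers, key=len)
        target.extend((c, account_id) for c in contacts)
    return workers


# Example: A1 has 6 contacts, A2 and A3 have 3 each, split over 2 workers.
updates = ([(f"C{i}", "A1") for i in range(6)]
           + [(f"D{i}", "A2") for i in range(3)]
           + [(f"E{i}", "A3") for i in range(3)])
worker_lists = partition_by_parent(updates, 2)
```

This does not help a single heavily skewed account, whose updates still serialize on one worker; for that case the fix is to distribute the children across more parents.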

10
Q

Why (or when) do sharing issues occur?

A

They occur with account data skew. If you change the owner of an account, the system may need to examine every one of the account’s child records and adjust their sharing as well, which may include recalculating the role hierarchy and sharing rules.

11
Q

What is lookup skew?

A

When a very large number of records are associated with a single record in the lookup object (the object you’re searching against).

12
Q

What strategies/techniques can you follow for mitigating problems related to Lookup Skew?

A

When you encounter any type of lock exception, try the following:

  1. Reducing record save time (i.e., improving save performance, optimizing trigger/class code, reducing workflow, considering asynchronous operations, etc.)
  2. Distributing the skew
  3. Using a picklist field instead of a lookup field
  4. Reducing the load (i.e., from automated processes and integrations running concurrently)
13
Q

Describe why locking happens (for example, with account data skew).

A

When you add a new contact and click Save, the database automatically locks the parent account when it begins the DML operation, before it actually inserts the contact. The database releases the lock after executing the triggers and standard save operations.

14
Q

What is a side effect of account skew (and the data locking issue)?

A

Parent Implicit Sharing

In a private sharing model, the built-in implicit sharing feature provides record accessibility, and its parent implicit sharing provides read access to an account for users who have access to standard child objects such as Contacts, Cases and Opportunities.

So when you create a contact, sharing calculations determine during the save operation if a parent implicit share to the account should be created.

15
Q

Explain the impact of account data skew using a scenario

A

Suppose 300,000 contacts are attached to a single generic account. User Jane has access to contact Bob Smith and has a parent share to the single generic account. Her manager changes ownership of the contact Bob Smith to another salesperson and clicks Save.

The sharing calculations now run for a longer period of time because they have to determine whether to delete the parent implicit share. The calculations check if Jane has access to the remaining 299,999 contacts under the single generic account.

If another salesperson tries to add a new contact for the same account while the sharing calculations are occurring, that request will wait for Force.com to release the lock on the account, resulting in lock contention and reduced database concurrency. Because this is a synchronous request, this request starts counting against the concurrent Apex request limit if the wait exceeds 5 seconds. If the wait exceeds 10 seconds, the salesperson will get an “UNABLE_TO_LOCK_ROW” error.
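The two thresholds in this scenario can be summarized as a small decision function. This is only an illustration of the 5-second and 10-second rules as stated on this card, not how the platform is implemented; `lock_wait_outcome` is a hypothetical name.

```python
def lock_wait_outcome(wait_seconds):
    """Classify a synchronous save that is waiting on a parent-record lock,
    using the 5-second and 10-second thresholds described in this scenario."""
    if wait_seconds < 0:
        raise ValueError("wait_seconds must be non-negative")
    if wait_seconds > 10:
        return "UNABLE_TO_LOCK_ROW error"
    if wait_seconds > 5:
        return "waiting; counts against the concurrent Apex request limit"
    return "waiting"


# Example: a 7-second wait is still waiting but now consumes concurrency headroom.
outcome = lock_wait_outcome(7)
```

Both thresholds use "exceeds", so a wait of exactly 5 or exactly 10 seconds stays in the lower category.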

16
Q

How can you avoid account data skew? (Provide three options.)

A

Design your architecture to limit each account to 10,000 child records. (Possible methods include creating a pool of accounts and assigning children to them in round-robin fashion, or using Custom Settings to track the current account and its number of children.)
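The Custom Settings approach amounts to a fill-then-advance allocator: keep the "current account" and its child count, assign new children there until the limit is reached, then move to the next account in the pool. The sketch below models that state in a Python object; in an org the state would live in a Custom Setting updated from Apex, and `ParentAllocator` and the account IDs are hypothetical.

```python
class ParentAllocator:
    """Fill-then-advance allocation over a pool of accounts.

    `index` and `count` stand in for the "current account" and
    "number of children" values that would be stored in a Custom Setting.
    """

    def __init__(self, account_pool, limit=10_000):
        if not account_pool:
            raise ValueError("account_pool must not be empty")
        self.pool = list(account_pool)
        self.limit = limit
        self.index = 0  # position of the current account in the pool
        self.count = 0  # children already assigned to the current account

    def next_parent(self):
        """Return the account the next child record should be attached to."""
        if self.count >= self.limit:
            self.index += 1
            self.count = 0
            if self.index >= len(self.pool):
                raise RuntimeError("account pool exhausted; add more accounts")
        self.count += 1
        return self.pool[self.index]


# Example with a tiny limit: two children per account.
allocator = ParentAllocator(["A1", "A2"], limit=2)
assigned = [allocator.next_parent() for _ in range(4)]
```

Unlike round robin, this fills one account at a time, which keeps the number of active parents small; a real implementation would also need to guard the shared counter against concurrent updates.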

If possible, consider a Public Read/Write sharing model, in which the parent account is still locked but sharing calculations don’t occur.

If you have a skewed account, redistribute child objects in chunks during off-peak hours to lessen the impact of record-level lock contention. Batch Apex or the Bulk API are useful ways to re-parent.
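A minimal sketch of planning such a re-parenting job: spread the skewed account's children across new parents (contiguous blocks, so each chunk touches few new parents), then cut the work into chunks that can be submitted one after another during off-peak hours. The default `chunk_size` of 200 mirrors the default Batch Apex scope size; the function name and IDs are hypothetical, and the sketch only builds the plan, it does not call any Salesforce API.

```python
import math


def reparent_chunks(child_ids, new_parent_ids, chunk_size=200, limit=10_000):
    """Plan a re-parenting job as a list of chunks of (child_id, new_parent_id) pairs.

    Children are assigned to new parents in contiguous blocks of equal size,
    and no new parent receives more than `limit` children.
    """
    children = list(child_ids)
    if not new_parent_ids:
        raise ValueError("need at least one new parent")
    per_parent = math.ceil(len(children) / len(new_parent_ids)) if children else 0
    if per_parent > limit:
        raise ValueError("not enough new parents to stay within the limit")
    pairs = [(c, new_parent_ids[i // per_parent]) for i, c in enumerate(children)]
    return [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]


# Example: move 1,000 contacts from a skewed account to 3 new accounts.
plan = reparent_chunks([f"C{i}" for i in range(1000)], ["P0", "P1", "P2"])
```

Running the chunks serially, rather than all at once, limits how long the skewed parent stays locked at any moment.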

17
Q

When architecting record ownership, what’s the suggested parameter for child records per parent to avoid data skew?

A

No more than 10,000 child records per parent

18
Q

What is the difference between ownership skew and lookup skew?

A

Ownership skew involves a large number of records of the same object type being owned by a single user, while lookup skew occurs when a large number of records are associated with a single record in the lookup object.

19
Q

Why is lookup skew bad?

A

Lookup fields are essentially foreign key relationships between objects. Every time a record is inserted or updated, the system locks the target records selected in each lookup field until the data is committed to the database.

Lock contention can occur when you try to insert or update records in a large data volume (LDV) environment where lookup skew exists.

20
Q

With respect to Lookup Skew, what is an alternative to having a “catch all” lookup value?

A

Leave the lookup value blank, which reduces or eliminates the skew.

21
Q

When should you use a picklist field instead of a lookup field (to avoid lookup skew)?

A

When there is a relatively small number of values.

22
Q

What is Index skew?

A

Essentially similar to lookup skew: it occurs when a large number of records point to the same index value.

23
Q

What is a symptom of Index Skew?

A

Index row locks, which occur when two updates happen at the same time and the index that needs to be rebuilt is large.