Data Loading/Migration Flashcards

1
Q

In what scenario would an administrator defer sharing calculations?

A

When an admin needs to perform a large number of configuration changes which could lead to very long sharing rule evaluations or timeouts.

The deferral helps users process a large number of sharing-related configuration changes quickly during working hours and then lets the recalculation run overnight, between business days, or over a weekend.

2
Q

What is the best practice when you want to use the most efficient operations when loading data from the API?

A

Use the fastest operation possible: insert() is fastest, update() is next, and upsert() is the slowest.

Ensure the data is clean before loading when using the Bulk API. Errors in batches trigger single-row processing for that batch, and that processing heavily impacts performance.
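
For illustration, a minimal Python sketch of why the operations differ in cost, using the Salesforce REST API (assumes an OAuth token already in hand; the instance URL, token, and External_Id__c field are placeholders):

```python
# Sketch: insert vs. upsert over the Salesforce REST API.
import requests

INSTANCE = "https://yourInstance.my.salesforce.com"  # placeholder org URL
TOKEN = "00D...session_token"                        # placeholder OAuth token
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# insert(): a plain POST -- fastest, no existing-record lookup required
resp = requests.post(f"{INSTANCE}/services/data/v58.0/sobjects/Account",
                     json={"Name": "Acme"}, headers=HEADERS)
print(resp.json())  # e.g. {"id": "001...", "success": true, "errors": []}

# upsert(): a PATCH keyed on an external ID -- slower, since Salesforce must
# first check whether a record with that key already exists
resp = requests.patch(
    f"{INSTANCE}/services/data/v58.0/sobjects/Account/External_Id__c/A-0001",
    json={"Name": "Acme"}, headers=HEADERS)
```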

3
Q

What is an option when you want to reduce the amount of data to transfer and process when loading data from the API?

A

Send only the fields that have changed (delta-only loads)
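
A minimal Python sketch of preparing a delta-only payload: compare each record against the version sent in the prior load and transmit only the fields that changed (field values here are illustrative):

```python
# Sketch: build a delta-only update payload.
def delta(previous: dict, current: dict) -> dict:
    """Return only the fields whose values differ from the prior load."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

previous = {"Name": "Acme", "Phone": "555-0100", "Industry": "Retail"}
current  = {"Name": "Acme", "Phone": "555-0199", "Industry": "Retail"}

print(delta(previous, current))  # {'Phone': '555-0199'} -- all that needs sending
```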

4
Q

What is the best practice when you want to reduce transmission time and interruptions when loading data from the API?

A

For custom integrations:

  • Authenticate once per load, not on each record
  • Use GZIP compression and HTTP keep-alive to avoid drops during lengthy save operations
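
A Python sketch of both bullets using the requests library: a Session reuses its TCP connection (HTTP keep-alive), and the Salesforce REST API accepts gzip-compressed request bodies when Content-Encoding is set. The token and instance URL are placeholders:

```python
# Sketch: one authenticated Session with keep-alive and gzip compression.
import gzip
import json
import requests

session = requests.Session()                 # connection pooling = HTTP keep-alive
session.headers.update({
    "Authorization": "Bearer 00D...token",   # authenticate once per load
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",              # request body is compressed
    "Accept-Encoding": "gzip",               # ask for compressed responses too
})

body = gzip.compress(json.dumps({"Name": "Acme"}).encode("utf-8"))
resp = session.post(
    "https://yourInstance.my.salesforce.com/services/data/v58.0/sobjects/Account",
    data=body)
```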
5
Q

What is the best practice when you want to avoid unnecessary overhead when loading data from the API?

A

For custom integrations, authenticate once per load, not on each record.

6
Q

What is the best practice when you want to avoid computations when loading data from the API?

A

Use Public Read/Write security during initial load to avoid sharing calculation overhead

7
Q

What are the possible steps to take when you want to reduce computations when doing initial loads from the API?

A

If possible for initial loads, populate roles before populating sharing rules

  1. Load users into roles
  2. Load record data with owners, triggering calculations in the role hierarchy
  3. Configure public groups and queues, and let those computations propagate
  4. Add sharing rules one at a time, letting computations for each rule finish before adding the next one

If possible, add people and data before creating and assigning groups and queues

  1. Load the new users and new record data
  2. Optionally, load new public groups and queues
  3. Add sharing rules one at a time, letting computations for each rule finish before adding the next one
8
Q

What is the best practice when you want to defer computations and speed up load throughput when loading data from the API?

A

Disable Apex triggers, workflow rules, and validations during loads; investigate the use of Batch Apex to process records after the load is complete

9
Q

What is the best practice when you want to balance efficient batch sizes against potential timeouts when loading data using the SOAP API?

A

When using the SOAP API, use the largest batch size, up to the 200-record maximum, that still avoids network timeouts when:

  • Records are large
  • Save operations entail a lot of processing that cannot be deferred
10
Q

What is the best practice when you want to use an optimized client library for the Lightning Platform when loading data from the API?

A

Use Lightning Platform Web Service Connector (WSC) instead of other Java API clients, like Axis

11
Q

What is the best practice when you want to minimize parent record-locking conflicts when loading data from the API?

A

When changing child records, group them by parent -

ex: place records that share the same ParentId value in the same batch to minimize locking conflicts
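
A small Python sketch of this grouping, assuming batches are assembled client-side before submission (the 200-record chunk size mirrors the SOAP API limit):

```python
# Sketch: sort child records by ParentId, then chunk into batches so that
# children of the same parent land together, minimizing parent-row locking.
from itertools import islice

records = [
    {"ParentId": "001A", "Name": "child1"},
    {"ParentId": "001B", "Name": "child2"},
    {"ParentId": "001A", "Name": "child3"},
]

records.sort(key=lambda r: r["ParentId"])     # co-locate children of each parent

def batches(rows, size=200):                  # 200 = SOAP API batch limit
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

for batch in batches(records):
    print([r["ParentId"] for r in batch])     # ['001A', '001A', '001B']
```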

12
Q

What is the best practice when you want to defer sharing calculations when loading data from the API?

A

Use the defer sharing calculation permission to defer sharing calculations until after all data has been loaded

13
Q

What is the best practice when you want to avoid loading data into Salesforce?

A

Use mashups to create coupled integrations of applications

14
Q

What is the best practice when you want to use the most efficient operations when extracting data from the API?

A

Use the getUpdated() and getDeleted() SOAP API calls to sync an external system with Salesforce at intervals greater than 5 minutes.

Use the outbound messaging feature for more frequent syncing

When using a query that can return more than one million results, consider using the query capability of the Bulk API, which might be more suitable
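
For illustration, the REST API exposes equivalent "updated" and "deleted" sObject resources; a hedged Python sketch of an incremental sync using them (the instance URL, token, and date window are placeholders):

```python
# Sketch: incremental sync via the REST "updated" / "deleted" resources.
import requests

INSTANCE = "https://yourInstance.my.salesforce.com"
HEADERS = {"Authorization": "Bearer 00D...token"}
window = {"start": "2024-01-01T00:00:00+00:00",   # keep the window's end at
          "end": "2024-01-02T00:00:00+00:00"}     # least 5 minutes in the past

updated = requests.get(f"{INSTANCE}/services/data/v58.0/sobjects/Account/updated/",
                       params=window, headers=HEADERS).json()
print(updated["ids"])             # IDs of records modified in the window

deleted = requests.get(f"{INSTANCE}/services/data/v58.0/sobjects/Account/deleted/",
                       params=window, headers=HEADERS).json()
print(deleted["deletedRecords"])  # [{'id': ..., 'deletedDate': ...}, ...]
```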

15
Q

Name 7 possible best practices you can follow when you want to upload data (that will make your life easier in terms of locking, calculation time, etc.)

A
  1. Disable triggers (or have bypass logic for triggers), workflow rules, validation rules (but not at the cost of data integrity)
  2. Defer calculation of sharing rules
  3. Insert + Update is faster than upsert
  4. Group and sequence data to avoid parent record locking
  5. Tune the batch size (HTTP keep-alives, GZIP compression)
  6. Minimize the number of fields loaded for each record. Foreign keys, lookup relationships, and roll-up summary fields are likely to increase processing times.
  7. Minimize the number of triggers where possible, or alternatively, convert complex trigger code to Batch Apex that processes asynchronously after data is loaded
16
Q

What is the impact of organization-wide sharing defaults when loading data?

A

When you load data with a private sharing model, the system calculates sharing as the records are being added. If you load with a public read/write sharing model, you can defer this processing until after cutover

17
Q

What is the impact of complex object relationships when loading data?

A

The more lookups you have defined on an object, the more checks the system has to perform during data loading. If you can establish some of these relationships in a later phase, loading will be quicker

18
Q

What is the impact of the two types of sharing rules (ownership-based vs. criteria-based) when loading data?

A

If you have ownership-based sharing rules configured before loading data, the insert requires sharing calculations if the owner belongs to a role or group that defines the data to be shared.

If you have criteria-based sharing rules configured before loading data, each record with fields that match the criteria also requires sharing calculations

19
Q

What is the impact of workflow rules, validation rules, and triggers when loading data?

A

They can slow down processing if they are enabled during massive data loads

20
Q

When loading lean, what are the 3 things you should load first?

A
  1. Parent records with master-detail children (Parent record has to exist before loading child records)
  2. Record owners (users): Owners need to exist in the system before you can load the data
  3. Role Hierarchy: No benefit to deferring setting up the role hierarchy in the beginning (versus at the end)
21
Q

When you turn off validation rules, workflow rules, assignment rules and triggers, what can you do to preserve data integrity prior to the data loads?

A

Before loading the data

  1. Query the data set before loading to find and fix records that don’t conform to the rules
  2. Extract parent IDs and update the source data to include the parent IDs before loading the child records
22
Q

When loading data and you don’t want triggers to fire off, what can you do?

A

Create a custom setting with a corresponding checkbox to control when a trigger should fire, then check that setting in your trigger code before executing its logic.

23
Q

If you had turned off validation rules, workflow rules, assignment rules, and triggers, what are 5 possible steps you should take after you have loaded the data (and before turning the rules back on)?

A
  1. Add lookup relationships between objects, roll-up summary fields to parent records and other data relationships between records
  2. Enhance records in SFDC with foreign keys or other data to facilitate integration with your other systems.
  3. Batch Apex and the Bulk API are efficient methods for performing these updates on a large number of records
  4. Reset the fields on the custom settings you created for triggers
  5. Turn validation, workflow and assignment rules back on
24
Q

What are the 4 sequenced steps for loading data?

A
  1. Configure org for the data load
  2. Prepare to load data
  3. Execute the data load
  4. Configure org for production
25
Q

When configuring an org for a data load, what should be done?

A
  • Consider enabling parallel recalculation and defer sharing calculation (contact SFDC Customer Support)
  • Create Role Hierarchy
  • Load users, assigning them to appropriate roles
  • Configure public read/write org-wide sharing defaults on the object you plan to load
26
Q

How can you defer sharing calculations?

A

Contact SFDC Customer Support

27
Q

When you are preparing to load data, what should be done?

A
  • Clean data, especially foreign key relationships. When there’s an error, parallel loads switch to single-execution mode, slowing down the load considerably
  • Suspend events that fire on insert
  • Perform advance testing to tune batch sizes for throughput
28
Q

When you are executing the data load, what should be done?

A
  • Load parent objects first, extract keys as needed for later loading
  • Use the fastest operation possible (insert and update are faster than upsert)
  • For updates, only send fields that have changed for existing records
  • Group child records by ParentId so that separate batches don’t reference the same ParentId values (this can reduce or eliminate the risk of record-locking errors)
29
Q

When you have finished loading your data and are configuring the org for production, what should be done?

A

Defer sharing calculations before performing some or all of the following operations:

  • Change public read/write org-wide settings
  • Create or configure public groups and queues
  • Configure sharing rules

If you are not using deferred sharing calculation, create public groups, queues and sharing rules one at a time and let the calculations complete before moving on to the next one

Resume events that fire on insert so validation and data enhancement processes run properly

30
Q

What is the max batch size for SOAP API?

A

2,000 records

Unless there are two or more custom fields of type long text, in which case the batch size drops to 200

31
Q

What is the max number of records that can be created or updated in one call when using SOAP API?

A

200 records

32
Q

How many batches can you submit over a rolling 24-hour period?

A

10,000 batches per day

33
Q

If you want to load up to 250,000 records, which API should typically be used?

A

SOAP API

34
Q

If you want to load more than 250,000 records, which API should typically be used?

A

Bulk API

35
Q

What is the rolling time period for the batch submission limit?

A

24 hours

36
Q

If you need synchronous communication, which APIs would be an option?

A

REST API

SOAP API

37
Q

If you need asynchronous communication, which APIs would be an option?

A

Bulk API

Streaming API

38
Q

If you need to use XML principles, which APIs would be an option?

A

REST API
SOAP API
Bulk API

39
Q

If you need to use JSON principles, which APIs would be an option?

A

REST API
Bulk API
Streaming API

40
Q

Which API would be the best for server-to-server integrations?

A

SOAP API

41
Q

Which API would be the best when dealing with large amounts of data

A

Bulk API

42
Q

Which API would be the best when you need to get frequent updates?

A

Streaming API

43
Q

Scenario: As part of a large Account data load of about 10 million records, Account IDs need to be extracted for building relationships in child records. What mechanism should the customer employ?
A. Use of SOAP API to load data and extract IDs
B. Use of Bulk API to load data and extract IDs
C. Use of Bulk API to load data and SOAP API to extract IDs
D. Use of Bulk API to load data and Bulk API to extract IDs

A

B
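
A hedged Python sketch of answer B using Bulk API 2.0: the same ingest job that loads the Accounts also returns the new record IDs through its successfulResults resource, so no separate extract pass is needed (the instance URL and token are placeholders):

```python
# Sketch: Bulk API 2.0 insert, then read back the generated Account IDs.
import time
import requests

BASE = "https://yourInstance.my.salesforce.com/services/data/v58.0"
H = {"Authorization": "Bearer 00D...token"}
J = {**H, "Content-Type": "application/json"}

# 1. Create the ingest job (content type defaults to CSV)
job = requests.post(f"{BASE}/jobs/ingest",
                    json={"object": "Account", "operation": "insert"},
                    headers=J).json()

# 2. Upload the CSV data, then signal that the upload is complete
requests.put(f"{BASE}/jobs/ingest/{job['id']}/batches",
             headers={**H, "Content-Type": "text/csv"},
             data="Name\nAcme\nGlobex\n")
requests.patch(f"{BASE}/jobs/ingest/{job['id']}",
               json={"state": "UploadComplete"}, headers=J)

# 3. Poll until the job finishes, then pull the created IDs (sf__Id column)
while requests.get(f"{BASE}/jobs/ingest/{job['id']}", headers=H).json()["state"] \
        not in ("JobComplete", "Failed"):
    time.sleep(10)

results = requests.get(f"{BASE}/jobs/ingest/{job['id']}/successfulResults/",
                       headers=H)
print(results.text)  # CSV whose sf__Id column holds the new Account IDs
```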

44
Q

If you need to use CSV principles, which API would be an option?

A

Bulk API

45
Q

If you need to use pub/sub, which API would be best?

A

Streaming API

46
Q

What are some of the ETL tools you can use for extracting, transforming, and loading large volumes of data from multiple data sources?

A

Talend
Informatica
Jitterbit
Data Migrator

47
Q

What tool can you use to load 50,000 to 5,000,000 records?

A

You can use Data Loader, provided by Salesforce

48
Q

If you want to migrate historical data from a legacy system into Salesforce, what are some of the steps you can follow?

A
  1. Use a staging database consisting of two layers (a transformation layer and a target layer). The transformation layer is a set of intermediate database structures used for transformation and data-quality rules. The target layer has tables structured identically to the Salesforce objects.
  2. Code your ETL logic to replicate the checks performed by triggers, workflow rules, and validation rules (since these will be turned off during the load)
  3. Turn off triggers, workflow rules, validation rules, and sharing rules
  4. Load reference data
  5. Run full migration into sandbox
  6. Prepare reports on records in source system and in Sandbox. Identify any missing data
  7. Fix issues and repeat until no errors
  8. Run full migration in a full sandbox environment
  9. Load records in increments of 1M into production using a job-run table so that failed jobs can be restarted. If you have to rerun a load that failed, ensure it is restarted with Upsert turned on.
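
A minimal sketch of the job-run table from step 9, here in Python with SQLite (the table and column names are invented for illustration): each 1M-record increment is tracked so a failed run can be located and restarted without reprocessing completed slices:

```python
# Sketch: a local job-run table for restartable, incremental production loads.
import sqlite3

db = sqlite3.connect("migration.db")
db.execute("""CREATE TABLE IF NOT EXISTS job_run (
    increment  INTEGER PRIMARY KEY,                 -- which 1M-record slice
    first_row  INTEGER NOT NULL,
    last_row   INTEGER NOT NULL,
    status     TEXT NOT NULL DEFAULT 'PENDING'      -- PENDING/RUNNING/DONE/FAILED
)""")

def next_increment():
    """Return the first slice not yet done -- failed runs get picked up again."""
    return db.execute(
        "SELECT increment, first_row, last_row FROM job_run "
        "WHERE status != 'DONE' ORDER BY increment LIMIT 1").fetchone()

def mark(increment, status):
    db.execute("UPDATE job_run SET status = ? WHERE increment = ?",
               (status, increment))
    db.commit()
```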
49
Q

Scenario: You are migrating data from an on-premises system to SFDC. Your initial loads go quickly (ex: loading the parent Account object), but subsequent loads of some related objects are plagued with lock contention errors. What could be the reason?

A

The child records in the input CSV were not pre-sorted by parent ID. Pre-sorting them lessens the chance of parent record lock contention among parallel load batches.

Sharing configuration set up prior to the data load could further contribute to the poor loading performance. Defer the org’s sharing calculations until after the data load is done.

50
Q

To avoid going too lean when loading lean, all but one of these should be in place before the load. Which one should be changed for the load?

  • Record owners
  • Organization-wide sharing defaults
  • Role hierarchy
  • Parent records with master-detail children
A

Organization-wide sharing defaults

51
Q

Universal Containers would like to import over 1 million Lead records into Salesforce. What API should be used for optimal performance during the data load?
(Choose one answer)

A. SOAP API

B. Bulk API in serial mode

C. Streaming API

D. Bulk API in parallel mode

A

D. Bulk API in parallel mode

52
Q

Universal Containers (UC) has two existing Salesforce orgs used for Sales and Service. UC plans to merge one of the Salesforce orgs (the source) into the other org (the target).

Which two tasks should be completed first to document the target data architecture?

Choose two answers

A. Analyze the source and target Salesforce orgs to identify data model conflicts and gaps.

B. Gather and analyze the system audit logs from each org to reconcile the metadata change history

C. Gather and analyze the new business requirements for the combined Salesforce orgs.

D. Deploy all customizations and data as is from the source org to the target org to identify conflicts

A

A. Analyze the source and target Salesforce orgs to identify data model conflicts and gaps.

C. Gather and analyze the new business requirements for the combined Salesforce orgs.