Data Loading/Migration Flashcards

1
Q

In what scenario would an administrator defer sharing calculations?

A

When an admin needs to perform a large number of configuration changes which could lead to very long sharing rule evaluations or timeouts.

The deferral helps users process a large number of sharing-related configuration changes quickly during working hours and then lets the recalculation run overnight, between business days, or over a weekend.

2
Q

What is the best practice when you want to use the most efficient operations when loading data from the API?

A

Use the fastest operation possible: insert() is fastest, update() is next, and upsert() is the slowest.

Ensure the data is clean before loading when using the Bulk API. Errors in batches trigger single-row processing for that batch, and that processing heavily impacts performance.
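
For illustration, a minimal Python sketch of why the operations differ in cost, using the Salesforce REST API (assumes an OAuth token already in hand; the instance URL, token, and External_Id__c field are placeholders):

```python
# Sketch: insert vs. upsert over the Salesforce REST API.
import requests

INSTANCE = "https://yourInstance.my.salesforce.com"  # placeholder org URL
TOKEN = "00D...session_token"                        # placeholder OAuth token
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# insert(): a plain POST -- fastest, no existing-record lookup required
resp = requests.post(f"{INSTANCE}/services/data/v58.0/sobjects/Account",
                     json={"Name": "Acme"}, headers=HEADERS)
print(resp.json())  # e.g. {"id": "001...", "success": true, "errors": []}

# upsert(): a PATCH keyed on an external ID -- slower, since Salesforce must
# first check whether a record with that key already exists
resp = requests.patch(
    f"{INSTANCE}/services/data/v58.0/sobjects/Account/External_Id__c/A-0001",
    json={"Name": "Acme"}, headers=HEADERS)
```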

3
Q

What is an option when you want to reduce the amount of data to transfer and process when loading data from the API?

A

Send only the fields that have changed (delta-only loads)
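
A minimal Python sketch of preparing a delta-only payload: compare each record against the version sent in the prior load and transmit only the fields that changed (field values here are illustrative):

```python
# Sketch: build a delta-only update payload.
def delta(previous: dict, current: dict) -> dict:
    """Return only the fields whose values differ from the prior load."""
    return {k: v for k, v in current.items() if previous.get(k) != v}

previous = {"Name": "Acme", "Phone": "555-0100", "Industry": "Retail"}
current  = {"Name": "Acme", "Phone": "555-0199", "Industry": "Retail"}

print(delta(previous, current))  # {'Phone': '555-0199'} -- all that needs sending
```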

4
Q

What is the best practice when you want to reduce transmission time and interruptions when loading data from the API?

A

For custom integrations:

  • Authenticate once per load, not on each record
  • Use GZIP compression and HTTP keep-alive to avoid drops during lengthy save operations
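
A Python sketch of both bullets using the requests library: a Session reuses its TCP connection (HTTP keep-alive), and the Salesforce REST API accepts gzip-compressed request bodies when Content-Encoding is set. The token and instance URL are placeholders:

```python
# Sketch: one authenticated Session with keep-alive and gzip compression.
import gzip
import json
import requests

session = requests.Session()                 # connection pooling = HTTP keep-alive
session.headers.update({
    "Authorization": "Bearer 00D...token",   # authenticate once per load
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",              # request body is compressed
    "Accept-Encoding": "gzip",               # ask for compressed responses too
})

body = gzip.compress(json.dumps({"Name": "Acme"}).encode("utf-8"))
resp = session.post(
    "https://yourInstance.my.salesforce.com/services/data/v58.0/sobjects/Account",
    data=body)
```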
5
Q

What is the best practice when you want to avoid unnecessary overhead when loading data from the API?

A

For custom integrations, authenticate once per load, not on each record.

6
Q

What is the best practice when you want to avoid computations when loading data from the API?

A

Use Public Read/Write security during initial load to avoid sharing calculation overhead

7
Q

What are the possible steps to take when you want to reduce computations when doing initial loads from the API?

A

If possible for initial loads, populate roles before populating sharing rules

  1. Load users into roles
  2. Load record data with owners, triggering calculations in the role hierarchy
  3. Configure public groups and queues, and let those computations propagate
  4. Add sharing rules one at a time, letting computations for each rule finish before adding the next one

If possible, add people and data before creating and assigning groups and queues

  1. Load the new users and new record data
  2. Optionally, load new public groups and queues
  3. Add sharing rules one at a time, letting computations for each rule finish before adding the next one
8
Q

What is the best practice when you want to defer computations and speed up load throughput when loading data from the API?

A

Disable Apex triggers, workflow rules, and validations during loads; investigate the use of Batch Apex to process records after the load is complete

9
Q

What is the best practice when you want to balance efficient batch sizes against potential timeouts when loading data using the SOAP API?

A

When using the SOAP API, use the largest batch size, up to the 200-record maximum, that still avoids network timeouts when:

  • Records are large
  • Save operations entail a lot of processing that cannot be deferred
10
Q

What is the best practice when you want to use an optimized client library for the Lightning Platform when loading data from the API?

A

Use Lightning Platform Web Service Connector (WSC) instead of other Java API clients, like Axis

11
Q

What is the best practice when you want to minimize parent record-locking conflicts when loading data from the API?

A

When changing child records, group them by parent -

ex: place records that share the same ParentId value in the same batch to minimize locking conflicts
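
A small Python sketch of this grouping, assuming batches are assembled client-side before submission (the 200-record chunk size mirrors the SOAP API limit):

```python
# Sketch: sort child records by ParentId, then chunk into batches so that
# children of the same parent land together, minimizing parent-row locking.
from itertools import islice

records = [
    {"ParentId": "001A", "Name": "child1"},
    {"ParentId": "001B", "Name": "child2"},
    {"ParentId": "001A", "Name": "child3"},
]

records.sort(key=lambda r: r["ParentId"])     # co-locate children of each parent

def batches(rows, size=200):                  # 200 = SOAP API batch limit
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

for batch in batches(records):
    print([r["ParentId"] for r in batch])     # ['001A', '001A', '001B']
```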

12
Q

What is the best practice when you want to defer sharing calculations when loading data from the API?

A

Use the defer sharing calculation permission to defer sharing calculations until after all data has been loaded

13
Q

What is the best practice when you want to avoid loading data into Salesforce?

A

Use mashups to create coupled integrations of applications

14
Q

What is the best practice when you want to use the most efficient operations when extracting data from the API?

A

Use the getUpdated() and getDeleted() SOAP API calls to sync an external system with Salesforce at intervals greater than 5 minutes.

Use the outbound messaging feature for more frequent syncing

When using a query that can return more than one million results, consider using the query capability of the Bulk API, which might be more suitable
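
For illustration, the REST API exposes equivalent "updated" and "deleted" sObject resources; a hedged Python sketch of an incremental sync using them (the instance URL, token, and date window are placeholders):

```python
# Sketch: incremental sync via the REST "updated" / "deleted" resources.
import requests

INSTANCE = "https://yourInstance.my.salesforce.com"
HEADERS = {"Authorization": "Bearer 00D...token"}
window = {"start": "2024-01-01T00:00:00+00:00",   # keep the window's end at
          "end": "2024-01-02T00:00:00+00:00"}     # least 5 minutes in the past

updated = requests.get(f"{INSTANCE}/services/data/v58.0/sobjects/Account/updated/",
                       params=window, headers=HEADERS).json()
print(updated["ids"])             # IDs of records modified in the window

deleted = requests.get(f"{INSTANCE}/services/data/v58.0/sobjects/Account/deleted/",
                       params=window, headers=HEADERS).json()
print(deleted["deletedRecords"])  # [{'id': ..., 'deletedDate': ...}, ...]
```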

15
Q

Name 7 possible best practices you can follow when you want to upload data (that will make your life easier in terms of locking, calculation time, etc.)

A
  1. Disable triggers (or have bypass logic for triggers), workflow rules, validation rules (but not at the cost of data integrity)
  2. Defer calculation of sharing rules
  3. Insert + Update is faster than upsert
  4. Group and sequence data to avoid parent record locking
  5. Tune the batch size (HTTP keep-alives, GZIP compression)
  6. Minimize the number of fields loaded for each record. Foreign keys, lookup relationships, and roll-up summary fields are likely to increase processing times.
  7. Minimize the number of triggers where possible, or alternatively, convert complex trigger code to Batch Apex that processes asynchronously after data is loaded
16
Q

What is the impact of organization-wide sharing defaults when loading data?

A

When you load data with a private sharing model, the system calculates sharing as the records are being added. If you load with a public read/write sharing model, you can defer this processing until after cutover

17
Q

What is the impact of complex object relationships when loading data?

A

The more lookups you have defined on an object, the more checks the system has to perform during data loading. If you can establish some of these relationships in a later phase, loading will be quicker

18
Q

What is the impact of the two types of sharing rules (ownership-based vs. criteria-based) when loading data?

A

If you have ownership-based sharing rules configured before loading data, the insert requires sharing calculations if the owner belongs to a role or group that defines the data to be shared.

If you have criteria-based sharing rules configured before loading data, each record with fields that match the criteria also requires sharing calculations

19
Q

What is the impact of workflow rules, validation rules, and triggers when loading data?

A

They can slow down processing if they are enabled during massive data loads

20
Q

When loading lean, what are the 3 things you should load first?

A
  1. Parent records with master-detail children (Parent record has to exist before loading child records)
  2. Record owners (users): Owners need to exist in the system before you can load the data
  3. Role Hierarchy: No benefit to deferring setting up the role hierarchy in the beginning (versus at the end)
21
Q

When you turn off validation rules, workflow rules, assignment rules and triggers, what can you do to preserve data integrity prior to the data loads?

A

Before loading the data

  1. Query the data set before loading to find and fix records that don’t conform to the rules
  2. Extract parent IDs and update the source data to include the parent IDs before loading the child records
22
Q

When loading data and you don’t want triggers to fire off, what can you do?

A

Create a custom setting with a corresponding checkbox to control when a trigger should fire, then check that setting in your trigger code before executing its logic.

23
Q

If you had turned off validation rules, workflow rules, assignment rules, and triggers, what are 5 possible steps you should take after you have loaded the data (and before turning the rules back on)?

A
  1. Add lookup relationships between objects, roll-up summary fields to parent records and other data relationships between records
  2. Enhance records in SFDC with foreign keys or other data to facilitate integration with your other systems.
  3. Batch Apex and the Bulk API are efficient methods for performing these updates on a large number of records
  4. Reset the fields on the custom settings you created for triggers
  5. Turn validation, workflow and assignment rules back on
24
Q

What are the 4 sequenced steps for loading data?

A
  1. Configure org for the data load
  2. Prepare to load data
  3. Execute the data load
  4. Configure org for production
25
Q

When configuring an org for a data load, what should be done?

A
  • Consider enabling parallel recalculation and defer sharing calculation (contact SFDC Customer Support)
  • Create Role Hierarchy
  • Load users, assigning them to appropriate roles
  • Configure public read/write org-wide sharing defaults on the object you plan to load
26
Q

How can you defer sharing calculations?

A

Contact SFDC Customer Support

27
Q

When you are preparing to load data, what should be done?

A
  • Clean data, especially foreign key relationships. When there’s an error, parallel loads switch to single-execution mode, slowing down the load considerably
  • Suspend events that fire on insert
  • Perform advance testing to tune batch sizes for throughput
28
Q

When you are executing the data load, what should be done?

A
  • Load parent objects first, extract keys as needed for later loading
  • Use the fastest operation possible (insert and update are faster than upsert)
  • For updates, only send fields that have changed for existing records
  • Group child records by ParentId so that separate batches don’t reference the same ParentId values (this can reduce or eliminate the risk of record-locking errors)
29
Q

When you have finished loading your data and are configuring the org for production, what should be done?

A

Defer sharing calculations before performing some or all of the following operations:

  • Change public read/write org-wide settings
  • Create or configure public groups and queues
  • Configure sharing rules

If you are not using deferred sharing calculation, create public groups, queues and sharing rules one at a time and let the calculations complete before moving on to the next one

Resume events that fire on insert so validation and data enhancement processes run properly

30
Q

What is the max batch size for SOAP API?

A

2,000 records

Unless there are two or more custom fields of type long text, in which case the batch size drops to 200

31
Q

What is the max number of records that can be created or updated in one call when using SOAP API?

A

200 records

32
Q

How many batches can you submit over a rolling 24-hour period?

A

10,000 batches per day

33
Q

If you want to load up to 250,000 records, which API should typically be used?

A

SOAP API

34
Q

If you want to load more than 250,000 records, which API should typically be used?

A

Bulk API

35
Q

What is the rolling time period for the batch submission limit?

A

24 hours

36
Q

If you need synchronous communication, which APIs would be an option?

A

REST API

SOAP API

37
Q

If you need asynchronous communication, which APIs would be an option?

A

Bulk API

Streaming API

38
Q

If you need to use XML principles, which APIs would be an option?

A

REST API
SOAP API
Bulk API

39
Q

If you need to use JSON principles, which APIs would be an option?

A

REST API
Bulk API
Streaming API

40
Q

Which API would be the best for server-to-server integrations?

A

SOAP API

41
Q

Which API would be the best when dealing with large amounts of data

A

Bulk API

42
Q

Which API would be the best when you need to get frequent updates?

A

Streaming API

43
Q

Scenario: As part of a large Account data load of about 10 million records, Account IDs need to be extracted for building relationships in child records. What mechanism should the customer employ?
A. Use of SOAP API to load data and extract IDs
B. Use of Bulk API to load data and extract IDs
C. Use of Bulk API to load data and SOAP API to extract IDs
D. Use of Bulk API to load data and Bulk API to extract IDs

A

B
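
A hedged Python sketch of answer B using Bulk API 2.0: the same ingest job that loads the Accounts also returns the new record IDs through its successfulResults resource, so no separate extract pass is needed (the instance URL and token are placeholders):

```python
# Sketch: Bulk API 2.0 insert, then read back the generated Account IDs.
import time
import requests

BASE = "https://yourInstance.my.salesforce.com/services/data/v58.0"
H = {"Authorization": "Bearer 00D...token"}
J = {**H, "Content-Type": "application/json"}

# 1. Create the ingest job (content type defaults to CSV)
job = requests.post(f"{BASE}/jobs/ingest",
                    json={"object": "Account", "operation": "insert"},
                    headers=J).json()

# 2. Upload the CSV data, then signal that the upload is complete
requests.put(f"{BASE}/jobs/ingest/{job['id']}/batches",
             headers={**H, "Content-Type": "text/csv"},
             data="Name\nAcme\nGlobex\n")
requests.patch(f"{BASE}/jobs/ingest/{job['id']}",
               json={"state": "UploadComplete"}, headers=J)

# 3. Poll until the job finishes, then pull the created IDs (sf__Id column)
while requests.get(f"{BASE}/jobs/ingest/{job['id']}", headers=H).json()["state"] \
        not in ("JobComplete", "Failed"):
    time.sleep(10)

results = requests.get(f"{BASE}/jobs/ingest/{job['id']}/successfulResults/",
                       headers=H)
print(results.text)  # CSV whose sf__Id column holds the new Account IDs
```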

44
Q

If you need to use CSV principles, which API would be an option?

A

Bulk API

45
Q

If you need to use pub/sub, which API would be best?

A

Streaming API

46
Q

What are some of the ETL tools you can use for extracting, transforming, and loading large volumes of data from multiple data sources?

A

Talend
Informatica
Jitterbit
Data Migrator

47
Q

What tool can you use to load 50,000 to 5,000,000 records?

A

You can use Data Loader, provided by Salesforce

48
Q

If you want to migrate historical data from a legacy system into Salesforce, what are some of the steps you can follow?

A
  1. Use a staging database consisting of two layers (a transformation layer and a target layer). The transformation layer is a set of intermediate database structures used for transformation and data-quality rules. The target layer has tables structured identically to the Salesforce objects.
  2. Code your ETL logic to replicate the checks performed by triggers, workflow rules, and validation rules (since these will be turned off during the load)
  3. Turn off triggers, workflow rules, validation rules, and sharing rules
  4. Load reference data
  5. Run full migration into sandbox
  6. Prepare reports on records in source system and in Sandbox. Identify any missing data
  7. Fix issues and repeat until no errors
  8. Run full migration in a full sandbox environment
  9. Load records in increments of 1M into production using a job-run table so that failed jobs can be restarted. If you have to rerun a load that failed, ensure it is restarted with Upsert turned on.
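
A minimal sketch of the job-run table from step 9, here in Python with SQLite (the table and column names are invented for illustration): each 1M-record increment is tracked so a failed run can be located and restarted without reprocessing completed slices:

```python
# Sketch: a local job-run table for restartable, incremental production loads.
import sqlite3

db = sqlite3.connect("migration.db")
db.execute("""CREATE TABLE IF NOT EXISTS job_run (
    increment  INTEGER PRIMARY KEY,                 -- which 1M-record slice
    first_row  INTEGER NOT NULL,
    last_row   INTEGER NOT NULL,
    status     TEXT NOT NULL DEFAULT 'PENDING'      -- PENDING/RUNNING/DONE/FAILED
)""")

def next_increment():
    """Return the first slice not yet done -- failed runs get picked up again."""
    return db.execute(
        "SELECT increment, first_row, last_row FROM job_run "
        "WHERE status != 'DONE' ORDER BY increment LIMIT 1").fetchone()

def mark(increment, status):
    db.execute("UPDATE job_run SET status = ? WHERE increment = ?",
               (status, increment))
    db.commit()
```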
49
Q

Scenario: You are migrating data from an on-premises system to SFDC. Your initial loads go quickly (ex: loading the parent Account object), but subsequent loads of some related objects are plagued with lock contention errors. What could be the reason?

A

The child records in the input CSV were not pre-sorted by parent ID. Pre-sorting them lessens the chance of parent record lock contention among parallel load batches.

Sharing configuration set up prior to the data load could further contribute to the poor loading performance. Defer the org’s sharing calculations until after the data load is done.

50
Q

To avoid going too lean when loading lean, all but one of these should be in place before the load. Which one should be changed for the load?

  • Record owners
  • Organization-wide sharing defaults
  • Role hierarchy
  • Parent records with master-detail children
A

Organization-wide sharing defaults

51
Q

Universal Containers would like to import over 1 million Lead records into Salesforce. What API should be used for optimal performance during the data load?
(Choose one answer)

A. SOAP API

B. Bulk API in serial mode

C. Streaming API

D. Bulk API in parallel mode

A

D. Bulk API in parallel mode

52
Q

Universal Containers (UC) has two existing Salesforce orgs used for Sales and Service. UC plans to merge one of the Salesforce orgs (the source) into the other org (the target).

Which two tasks should be completed first to document the target data architecture?

Choose two answers

A. Analyze the source and target Salesforce orgs to identify data model conflicts and gaps.

B. Gather and analyze the system audit logs from each org to reconcile the metadata change history

C. Gather and analyze the new business requirements for the combined Salesforce orgs.

D. Deploy all customizations and data as is from the source org to the target org to identify conflicts

A

A. Analyze the source and target Salesforce orgs to identify data model conflicts and gaps.

C. Gather and analyze the new business requirements for the combined Salesforce orgs.