Bulk API Flashcards

1
Q

What is the maximum number of records that can be processed by Batch Apex?

A

50 million

2
Q

How many GB of data can bulk queries retrieve, and how are the results divided up?

A

A bulk query can retrieve up to 15GB of data, divided into fifteen 1GB files

3
Q

In which scenario would Batch Apex not work well?

A

Anything synchronous, such as a Visualforce (VF) page that needs to query more than 50,000 records

4
Q

What two query operations does Bulk API support?

A

Query and queryAll

5
Q

What does the queryAll operation do?

A
  • Returns records that have been deleted because of a merge or delete
  • Returns information about archived Task and Event records
6
Q

What time limit is there on executing bulk API queries, and what error is thrown?

A

2 minutes; beyond that, the query fails with a QUERY_TIMEOUT error

7
Q

For Bulk API, what happens when the results exceed a 1GB file size (or take longer than 10 minutes to generate)?

A

The completed results are cached and another attempt is made.

8
Q

How many attempts are made for Bulk API queries when they time out (or the file size is greater than 1GB), and what type of error is thrown?

A
15 attempts. After that, the query fails with the error "Retried more than 15 times".
9
Q

How long are Bulk API results stored?

A

7 days

10
Q

Which API would be good to use when loading a few thousand to millions of records?

A

Bulk API

11
Q

On which principle is Bulk API based?

A

REST

12
Q

What are the benefits of using Bulk API?

A
  • Developed to simplify and optimize the process of loading or deleting large data sets
  • Super-fast processing speeds
  • Reduced client-side programmatic logic
  • Easy-to-monitor job status
  • Automatic retry of failed records
  • Support for parallel processing
  • Minimal round trips to Force.com
  • Minimal API calls
  • Limited dropped connections
  • Easy-to-tune batch size
13
Q

What is the default chunk size for Bulk API?

A

100,000 record chunks by default

14
Q

How can you configure the chunk size for Bulk API?

A

Use the chunkSize header to configure smaller chunks, or larger ones up to 250,000 records
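
The chunkSize option is passed via the Sforce-Enable-PKChunking request header when creating a bulk query job. A minimal sketch (the header name and chunkSize option are from the Bulk API docs; the helper functions themselves are illustrative, and the actual job-creation request is omitted):

```python
def pk_chunking_header(chunk_size=100_000):
    """Build the header that splits a bulk query into PK-based chunks."""
    if not 1 <= chunk_size <= 250_000:
        raise ValueError("chunkSize must be between 1 and 250,000")
    return {"Sforce-Enable-PKChunking": f"chunkSize={chunk_size}"}

def chunk_count(total_records, chunk_size=100_000):
    """How many chunks a query over total_records produces."""
    return -(-total_records // chunk_size)  # ceiling division
```

With the 100,000-record default, a query over one million records is split into ten chunks.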

15
Q

What is the maximum chunk size for Bulk API?

A

250,000

16
Q

What is the file size limit for Bulk API?

A

10 MB

17
Q

What is the limit on the number of records that can be processed in a batch by the Bulk API?

A

10,000 records

18
Q

What is the maximum character limit for all the data in a batch when using Bulk API?

A

10 million characters of data

19
Q

What is the maximum number of characters in a field for Bulk API?

A

32,000 characters

20
Q

What is the limit for fields per record for Bulk API?

A

5,000 fields

21
Q

What is the limit on total characters per record for Bulk API?

A

400,000 characters per record
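
The per-batch limits from the cards above can be collected into a single pre-flight check. A sketch (the limit values come from the cards; the validator itself is a hypothetical helper, not part of any Salesforce SDK):

```python
# Bulk API batch limits, as listed on the cards above.
MAX_RECORDS_PER_BATCH = 10_000
MAX_CHARS_PER_BATCH = 10_000_000
MAX_CHARS_PER_FIELD = 32_000
MAX_FIELDS_PER_RECORD = 5_000
MAX_CHARS_PER_RECORD = 400_000

def validate_batch(records):
    """records: list of dicts mapping field name -> string value."""
    if len(records) > MAX_RECORDS_PER_BATCH:
        return False
    total = 0
    for rec in records:
        if len(rec) > MAX_FIELDS_PER_RECORD:
            return False
        rec_chars = sum(len(v) for v in rec.values())
        if rec_chars > MAX_CHARS_PER_RECORD:
            return False
        if any(len(v) > MAX_CHARS_PER_FIELD for v in rec.values()):
            return False
        total += rec_chars
    return total <= MAX_CHARS_PER_BATCH
```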

22
Q

What is the max size of a file that can be loaded using Bulk API?

A

10MB

23
Q

For binary content, what is the max zip file size when using Bulk API?

A

10MB

24
Q

For binary content, what is the max total size of the unzipped content when using Bulk API?

A

The total size of the unzipped content can’t exceed 20MB

25
Q

What is the degree of parallelism?

A

The amount of work completed (as a duration) divided by the actual amount of time it took to complete that work

26
Q

How does the Bulk API work? (3 main steps)

A
  1. Data streamed in large batches directly to temporary storage over a simple HTTP connection
    (The client creates the job, sends all data to the server in batches, checks status, and retrieves the result at the end)
  2. Data set is managed in a job that can be monitored and controlled (aborted) from Admin Setup (The jobs are then split into multiple data batches that will return multiple results)
  3. Data set can be processed faster by allocating multiple servers to process in parallel (The Processing servers dequeue batch from job, insert or update records, save the results back to the job)
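Step 1 above starts with the client POSTing a jobInfo payload to create the job, then streaming batches to it. A sketch of building that payload (the XML shape follows Bulk API 1.0 jobInfo; the helper function and any endpoint details are illustrative):

```python
def job_info_xml(operation, sobject, content_type="CSV"):
    """Build the jobInfo payload POSTed to /services/async/{version}/job."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<jobInfo xmlns="http://www.salesforce.com/2006/04/asyncapi">'
        f"<operation>{operation}</operation>"
        f"<object>{sobject}</object>"
        f"<contentType>{content_type}</contentType>"
        "</jobInfo>"
    )
```

For example, `job_info_xml("insert", "Account")` builds the payload for a CSV insert job on Account; batches are then added to the job, and the client polls status until the results can be retrieved.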
27
Q

Which of these is true about bulk queries?

  • Bulk API can access or query compound address and compound geolocation fields
  • Bulk queries always time out when querying more than 100,000 records
  • A bulk query can retrieve up to 15GB of data divided into 1GB files
  • In order to keep results lean, bulk query does not support queryAll operations
A

A bulk query can retrieve up to 15GB of data divided into 1GB files

28
Q

Which of these is one of the advantages of using Bulk API when uploading large volumes of data?

  • Bulk API loads data in bite-size chunks, increasing the speed of jobs
  • Bulk API is optimized for real-time client applications
  • Bulk API only allows batches to be processed serially
  • Bulk API moves the functionality and work from your client application to the server
A

Bulk API moves the functionality and work from your client application to the server

29
Q

Which of the following is true about the hard delete function in the Bulk API?

  • Is disabled by default
  • Allows deleted records to stay in the Recycle Bin for 15 days
  • Can be used only when deleting fewer than 10,000 records
  • Is not a recommended strategy for deleting large data volumes
A

The hard delete function in the Bulk API is disabled by default

30
Q

What is enabled for the Bulk API by default?

A

Parallel Mode

31
Q

Describe Parallel Mode within the Bulk API

A

It is enabled by default. It allows for faster loading of data by processing batches in parallel

32
Q

What are the trade-offs with respect to Parallel Mode?

A

There is a risk of lock contention. Serial mode is an alternative to parallel mode that avoids lock contention

33
Q

When should you use Parallel Mode versus Serial Mode?

A

Whenever possible, as it is a best practice.

34
Q

When should you use Serial Mode versus Parallel Mode?

A

When there is a risk of lock contention and you cannot reorganize the batches to avoid these locks.

35
Q

How can you organize data load batches to avoid risks of lock contention?

A

By organizing the data by parent Id.

Suppose that you are inserting AccountTeamMember records and you have references to the same Account Id within multiple batches. You risk lock timeouts as these batches process (for example, in parallel) and attempt to lock the Account record at the same time. To avoid lock contention, organize your data by Account Id so that all AccountTeamMember records referencing the same Account Id are in the same batch
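
The batching strategy above can be sketched as: sort by parent Id, then cut batches only on parent-Id boundaries, so no parent's children span two batches. (The AccountId field name matches the AccountTeamMember example; the helper itself is illustrative.)

```python
from itertools import groupby

def batches_by_parent(records, batch_size=5_000, parent_field="AccountId"):
    """Sort by parent Id, then cut batches only on parent-Id boundaries."""
    ordered = sorted(records, key=lambda r: r[parent_field])
    batches, current = [], []
    # groupby requires sorted input; each group shares one parent Id
    for _, group in groupby(ordered, key=lambda r: r[parent_field]):
        group = list(group)
        if current and len(current) + len(group) > batch_size:
            batches.append(current)  # close the batch before splitting a parent
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches
```

A batch may slightly exceed batch_size when a single parent has more children than the limit; in that case splitting is unavoidable and serial mode may be the safer choice.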

36
Q

What does the Bulk API do when it encounters locks?

A
  1. Waits a few seconds for the lock to be released
  2. If the lock is not released, the record is marked as failed
  3. If there are problems acquiring locks for more than 100 records in the batch, the remainder of the batch is put back in the queue and will be tried again later.
  4. When a batch is reprocessed, records that are marked as failed will not be retried. Resubmit these in a separate batch to have them processed
  5. The batch will be tried again up to 10 times before the batch is marked as failed
  6. As some records have succeeded, you should check the results of the data load to confirm success/error details
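The retry behaviour described above can be modelled roughly as follows (the thresholds, 100 lock failures per batch and 10 batch attempts, come from the card; the code is a simulation for illustration, not the Salesforce implementation):

```python
MAX_LOCK_FAILURES = 100   # per-batch lock failures before requeueing
MAX_BATCH_ATTEMPTS = 10   # batch retries before the batch is marked failed

def process_batch(records, acquire_lock, attempts=0):
    """Returns (status, failed_records). acquire_lock(record) -> bool."""
    failed, lock_failures = [], 0
    for rec in records:
        if acquire_lock(rec):
            continue                     # lock obtained: record succeeds
        lock_failures += 1
        failed.append(rec)               # lock held too long: mark failed
        if lock_failures > MAX_LOCK_FAILURES:
            if attempts + 1 >= MAX_BATCH_ATTEMPTS:
                return "failed", failed
            return "requeued", failed    # remainder retried later
    return "completed", failed
```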
37
Q

With respect to data loads, any batch job that takes longer than this amount of time is suspended and returned to the queue for later processing

A

10 minutes.

38
Q

With respect to data loads, how can you optimize batch sizes?

A

All batches should run in under 10 minutes. Start with 5,000 records per batch and adjust based on processing time: if processing takes more than 5 minutes, reduce the batch size; if it takes only a few seconds, increase it. If you get a timeout error, split your batches into smaller batches.
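
The tuning rule above can be sketched as a simple feedback loop. (The 10-minute ceiling, 5,000-record starting point, and 10,000-record batch limit come from the cards; the specific thresholds and scaling factors below are illustrative.)

```python
MAX_BATCH = 10_000   # Bulk API hard limit on records per batch

def next_batch_size(current, processing_seconds):
    """Adjust the batch size based on how long the last batch took."""
    if processing_seconds > 600:        # timed out: split into smaller batches
        return max(current // 2, 1)
    if processing_seconds > 300:        # over 5 minutes: reduce batch size
        return max(int(current * 0.75), 1)
    if processing_seconds < 60:         # finishes quickly: increase batch size
        return min(current * 2, MAX_BATCH)
    return current                      # comfortably inside the window
```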