07 Data Analysis and Reporting Tools Flashcards
(40 cards)
Understanding the data, determining whether predication exists, and building a profile of potential frauds are all steps of which phase of the data analysis process?
A. The post-analysis phase
B. The testing and interpretation phase
C. The preparation phase
D. The planning phase
D. The planning phase
As with most tasks, proper planning is essential in a data analysis engagement. Without sufficient time and attention devoted to planning early on, the fraud examiner risks analyzing the data inefficiently, lacking focus or direction for the engagement, running into avoidable technical difficulties, and possibly overlooking key areas for exploration.
The first phase of the data analysis process is the planning phase. This phase consists of several important steps, including:
* Understanding the data
* Defining examination objectives
* Building a profile of potential frauds
* Determining whether predication exists
During which phase of the data analysis process does the fraud examiner identify, obtain, and verify the relevant or requested data?
A. The planning phase
B. The preparation phase
C. The post-analysis phase
D. The testing and interpretation phase
B. The preparation phase
The second phase of the data analysis process is the preparation phase. The results of a data analysis test will only be as good as the data used for the analysis. Thus, before running tests on the data, the fraud examiner must make certain the data being analyzed are relevant and reliable for the objective of the engagement. During the preparation phase of the data analysis process, the fraud examiner must complete several important steps, including:
* Identifying the relevant data
* Obtaining the requested data
* Verifying the data
* Cleansing and normalizing the data
Which of the following is a limitation of Benford’s Law?
A. Benford’s Law cannot be applied to data sets with non-natural numbers, such as check or invoice numbers.
B. Benford’s Law only works on data sets with assigned numbers, such as bank account or telephone numbers.
C. Benford’s Law applies best to data sets with three-digit numbers.
D. Benford’s Law can only be applied to data sets listed in currency amounts.
A. Benford’s Law cannot be applied to data sets with non-natural numbers, such as check or invoice numbers.
Benford’s Law distinguishes between natural and non-natural numbers, and it is important to understand the difference between the two types because Benford’s Law cannot be applied to data sets with non-natural numbers. Natural numbers are those numbers that are not ordered in a particular numbering scheme and are not human-generated or generated from a random number system. For example, most vendor invoice totals will be populated by currency values that are natural numbers. Conversely, non-natural numbers (e.g., employee identification numbers and telephone numbers) are designed systematically to convey information that restricts the natural nature of the number. Any number that is arbitrarily determined, such as the price of inventory held for sale, is considered a non-natural number.
Which of the following functions does a Benford’s Law analysis help to achieve?
A. Measuring the relationship between items on financial statements by expressing accounts as percentages
B. Extracting usable information from unstructured text data
C. Identifying fictitious numbers
D. Identifying duplicate payments
C. Identifying fictitious numbers
The goal of a Benford’s Law analysis is to identify fictitious numbers. Benford’s Law provides that the distribution of the digits in multi-digit natural numbers is not random; instead, it follows a predictable pattern. That is, Benford’s Law maintains that certain digits show up more than others do when dealing with natural numbers. A “1” appears as the first non-zero digit roughly 30 percent of the time; “2” is the leading digit almost 18 percent of the time; and “9” leads off only 4.6 percent of the time. Moreover, “0” is most likely to be the second digit, appearing 12 percent of the time.
Many fraudsters fail to consider the Benford’s Law pattern when creating false documentation or transactions to cover their tracks. Consequently, testing data sets for the occurrence or non-occurrence of the predictable digit distribution can help identify included numbers that are not legitimate.
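The percentages quoted above follow from the standard Benford formula, under which a first digit d is expected with probability log10(1 + 1/d). The following is a minimal Python sketch, assuming the data set is a plain list of natural currency amounts. The function names are illustrative and do not come from any particular data analysis package.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose first non-zero digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def first_digit(value):
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation always puts the leading digit first, e.g. '4.73...e+02'.
    return int(f"{abs(value):.15e}"[0])


def first_digit_shares(amounts):
    """Observed share of each leading digit among the non-zero amounts."""
    leading = [first_digit(a) for a in amounts if a != 0]
    if not leading:
        return {d: 0.0 for d in range(1, 10)}
    counts = Counter(leading)
    return {d: counts.get(d, 0) / len(leading) for d in range(1, 10)}


def benford_deviation(amounts):
    """Observed minus expected share for each leading digit."""
    observed = first_digit_shares(amounts)
    return {d: observed[d] - benford_expected(d) for d in range(1, 10)}
```

Digits whose observed share departs sharply from the expected share point to records worth a closer look. How large a departure matters is a judgment call that this sketch does not settle.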
Why do fraud examiners perform textual analytics?
A. To reveal patterns, sentiments, and relationships indicative of fraud
B. To gauge the pressures/incentives, opportunities, and rationalizations to commit fraud
C. To uncover warning signs of potentially fraudulent employee behavior
D. All of the above
D. All of the above
Textual analytics is a method of using software to extract usable information from unstructured text data. Through the application of linguistic technologies and statistical techniques—including weighted fraud indicators (e.g., fraud keywords) and scoring algorithms—textual analytics software can categorize data to reveal patterns, sentiments, and relationships indicative of fraud. For example, an analysis of email communications might help fraud examiners to gauge the pressures/incentives, opportunities, and rationalizations to commit fraud that exist in an organization. Textual analytics provides the ability to uncover additional warning signs of potentially fraudulent employee behavior.
Depending on the type of fraud risk present in a fraud examiner’s investigation, he will want to come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, fraud schemes, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails. Additionally, it can be helpful to consider the three factors identified in the Fraud Triangle when coming up with a keyword list.
Which of the following data analysis functions can be used to determine the relationship between two variables in raw data?
A. Benford’s Law analysis
B. Correlation analysis
C. Duplicate testing
D. Gap testing
B. Correlation analysis
Using the correlation analysis function, fraud examiners can determine the relationships among different variables in the raw data. Fraud examiners can learn a lot about data files by learning the relationship between two variables. For example, a strong correlation should be expected between the following pairs of independent and dependent variables because a direct relationship exists between them: hotel costs should increase as the number of days traveled increases, and gallons of paint used should increase as the number of houses painted increases.
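The relationship described above can be measured with a correlation coefficient. The source does not say which statistic a given tool computes; the sketch below assumes the Pearson coefficient, and the function name is illustrative. A result near 1 means the two variables rise together, as hotel costs and days traveled should.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return covariance / math.sqrt(spread_x * spread_y)
```

A traveler whose hotel costs show little or no correlation with days traveled would be a candidate for further review.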
Black, a fraud examiner, is conducting textual analytics on emails sent to and from specific employees that his client has identified as fraud suspects. He is using the Fraud Triangle to come up with a list of fraud keywords to use in his search. Which of the following words found in email text might indicate a fraudster is rationalizing his actions?
A. Quota
B. Override
C. Deserve
D. Write off
C. Deserve
In conducting a textual analytics examination, the fraud examiner should come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, the suspected fraud schemes or types of fraud risk present, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails.
The factors identified in the Fraud Triangle are helpful when coming up with a fraud keyword list. One of these factors is rationalization; consequently, the fraud examiner should consider how someone in the entity might be able to rationalize committing fraud. Because most fraudsters do not have a criminal background, justifying their actions is a key part of committing fraud. Some keywords that might indicate a fraudster is rationalizing his actions include reasonable, deserve, and temporary.
Other keywords can be used to identify the other factors indicated by the Fraud Triangle. For example, write off and override would indicate opportunity to commit fraud, while quota suggests pressure to commit fraud.
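A keyword list organized by Fraud Triangle factor can be sketched as a simple lookup. The keywords below are the ones named in this explanation; the dictionary layout and the function name are assumptions made for illustration, and a real list would be tailored to the industry, scheme, and data set.

```python
import re

# Keywords taken from the explanation above, grouped by Fraud Triangle factor.
FRAUD_KEYWORDS = {
    "pressure": ["quota"],
    "opportunity": ["override", "write off"],
    "rationalization": ["reasonable", "deserve", "temporary"],
}


def tag_message(text):
    """Return {factor: [matched keywords]} for one email or journal entry text."""
    hits = {}
    for factor, keywords in FRAUD_KEYWORDS.items():
        found = [
            keyword
            for keyword in keywords
            # Whole-word, case-insensitive match; inflections such as
            # "deserved" are not matched by this simple pattern.
            if re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            hits[factor] = found
    return hits
```

A hit is only a pointer to a message worth reading, not evidence of fraud in itself.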
Scott, a fraud examiner, is concerned that employees are abusing their expense accounts and are spending more than the $30 per day allowed for meals. Which of the following is the most appropriate data analysis function for locating meal expenses greater than $30?
A. Compliance verification
B. Multi-file processing
C. Duplicate search
D. Gap testing
A. Compliance verification
Compliance verification determines whether company policies are met by employee transactions. If a company limits the amount of its reimbursements, the software can check to see that this limit is being observed. Many times, fraud examiners can find early indications of fraud by testing detail data for values above or below specified amounts. For example, when employees are out of town, do they adhere to company policy of spending no more than $30 per day for meals? To start, fraud examiners can look at all expense report data and select those with daily meal expenses exceeding $30. With the information returned from this simple query, there is a starting point for suspecting fraud.
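The query described above can be sketched in a few lines of Python. The record layout (employee, date, category, amount) is an assumption for illustration. Amounts are summed per employee per day because the policy is a daily limit.

```python
from collections import defaultdict
from decimal import Decimal

DAILY_MEAL_LIMIT = Decimal("30.00")  # policy limit from the example above


def daily_meal_exceptions(expenses, limit=DAILY_MEAL_LIMIT):
    """Return {(employee, day): total} for daily meal totals above the limit.

    `expenses` is an iterable of (employee, day, category, amount) tuples,
    with amount given as a string such as "12.50".
    """
    totals = defaultdict(Decimal)
    for employee, day, category, amount in expenses:
        if category == "meals":
            totals[(employee, day)] += Decimal(amount)
    return {key: total for key, total in totals.items() if total > limit}
```

Decimal is used instead of float so that a total of exactly $30.00 is never flagged because of rounding error.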
Which of the following is a data analysis tool that is effective in identifying indirect relationships and relationships with several degrees of separation?
A. Link analysis
B. Word maps
C. Geospatial analysis
D. Tree maps
A. Link analysis
Link analysis software is used by fraud examiners to create visual representations (e.g., charts with lines showing connections) of data from multiple data sources to track the movement of money; demonstrate complex networks; and discover communications, patterns, trends, and relationships.
Link analysis is very effective for identifying indirect relationships and relationships with several degrees of separation. For this reason, link analysis is particularly useful when conducting a money laundering investigation, since it can track the placement, layering, and integration of money as it moves around unexpected sources. It could also be used to detect a fictitious vendor (shell company) scheme. For instance, the investigator could map visual connections between a variety of entities that share an address and bank account number to reveal a fictitious vendor created to embezzle funds from a company.
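Dedicated link analysis software draws these connections as charts. The underlying idea can be sketched as a graph search, shown below under the assumption that each record is an (entity, attribute, value) triple. The function names and record layout are hypothetical. Entities and their attribute values become nodes, so two entities that share an address or bank account are joined through that shared value, and a breadth-first search finds the shortest chain between any two entities.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Link each entity to nodes for its attribute values (address, account, ...)."""
    graph = defaultdict(set)
    for entity, attribute, value in records:
        node = f"{attribute}:{value.strip().lower()}"  # light normalization
        graph[entity].add(node)
        graph[node].add(entity)
    return graph


def connection_path(graph, start, goal):
    """Shortest chain of entities and shared values from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

A path such as vendor, shared address, employee, shared bank account, second vendor is the kind of indirect relationship that is hard to spot by reading records one at a time.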
Which of the following data analysis functions is most useful in testing for hidden journal entries?
A. Identifying duplicates
B. Statistical sampling
C. Gap testing
D. Aging analysis
C. Gap testing
Gap testing is used to identify missing items in a sequence or series, such as missing checks or invoice numbers. It can also be used to find sequences where none are expected to exist (e.g., employee government identification numbers). In reviewing journal entries, gaps might signal possible hidden entries.
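A gap test on a numeric sequence can be sketched as follows, assuming the check, invoice, or journal entry numbers have already been extracted as integers. The function name is illustrative.

```python
def find_gaps(numbers):
    """Return the integers missing between the smallest and largest value."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

For example, `find_gaps([1001, 1002, 1004, 1007])` returns `[1003, 1005, 1006]`, the numbers whose absence would need to be explained.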
A fraud examiner is conducting textual analytics on journal entry data and runs a keyword search using the terms override, write off, and reserve/provision. With which leg of the Fraud Triangle are these fraud keywords typically associated?
A. Opportunity
B. Capability
C. Pressure
D. Rationalization
A. Opportunity
In conducting a textual analytics examination, the fraud examiner should come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, the suspected fraud schemes or types of fraud risk present, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails.
The factors identified in the Fraud Triangle are helpful when coming up with a fraud keyword list. One of these factors is opportunity; consequently, the fraud examiner should consider how someone in the entity might have the opportunity to commit fraud. Examples of keywords that indicate the opportunity to commit fraud include override, write off, recognize revenue, adjust, discount, and reserve/provision.
Which of the following is an example of a data analysis function that can be performed to help detect fraud through examination of payroll accounts?
A. Compare customer credit limits and current or past balances.
B. Identify paycheck amounts over a certain limit.
C. Generate depreciation to asset cost reports.
D. Compare approved vendors to the cash disbursement payee list.
B. Identify paycheck amounts over a certain limit.
The following are examples of data analysis queries that can be performed by data analysis software on payroll accounts to help detect fraud:
* Summarize payroll activity by specific criteria for review.
* Identify changes to payroll or employee files.
* Compare timecard and payroll rates for possible discrepancies.
* Prepare check amount reports for amounts over a certain limit.
* Check proper supervisory authorization on payroll disbursements.
In data analysis, date fields are generally not a problem when importing and exporting data because standard formats are always used.
A. True
B. False
B. False
When conducting data analysis, the fraud examiner must consider the data format and structure. This consideration is important when the fraud examiner wishes to import or export data with his computer. A date can be formatted into a number of different styles, such as mm/dd/yyyy. The structure of the data will also be important, along with the extension. A text file will have a .txt extension associated with it. In what format is the current data? What format will the computer require? How does the fraud examiner get the data from here to there if the data formats and structures are different?
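One way to handle mixed date styles on import is to try a short list of known formats and convert each value to a single date type. The sketch below uses Python's standard `datetime.strptime`. The list of candidate formats is an assumption and would need to be confirmed against the actual source system.

```python
from datetime import datetime

# Candidate formats are assumptions; confirm them with the data owner.
CANDIDATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%Y%m%d")


def normalize_date(text, formats=CANDIDATE_FORMATS):
    """Parse a date string in any of the candidate formats into a date object."""
    cleaned = text.strip()
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {text!r}")
```

A value such as 03/05/2024 is ambiguous between mm/dd/yyyy and dd/mm/yyyy, and no parser can resolve that on its own, so the format in use has to be verified before the data are imported.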
Which of the following is TRUE regarding textual analytics?
A. Textual analytics is used to figure out whether someone is lying or telling the truth based on context clues.
B. Textual analytics can be used to categorize data to reveal patterns, sentiments, and relationships indicative of fraud.
C. The purpose of performing textual analytics is to search for and find an admission of fraud that can be presented in court.
D. There is a universal list of fraud keywords to use when implementing textual analytics that is applicable to any fraud examination.
B. Textual analytics can be used to categorize data to reveal patterns, sentiments, and relationships indicative of fraud.
Textual analytics is a method of using software to extract usable information from unstructured text data. Through the application of linguistic technologies and statistical techniques—including weighted fraud indicators (e.g., fraud keywords) and scoring algorithms—textual analytics software can categorize data to reveal patterns, sentiments, and relationships indicative of fraud. For example, an analysis of email communications might help fraud examiners to gauge the pressures/incentives, opportunities, and rationalizations to commit fraud that exist in an organization. Textual analytics provides the ability to uncover additional warning signs of potentially fraudulent employee behavior.
Depending on the type of fraud risk present in a fraud examiner’s investigation, he will want to come up with a list of fraud keywords that are likely to point to suspicious activity. This list will depend on the industry, fraud schemes, and the data set the fraud examiner has available. In other words, if he is running a search through journal entry details, he will likely search for different fraud keywords than if he were running a search of emails. Additionally, it can be helpful to consider the three factors identified in the Fraud Triangle when coming up with a keyword list.
Which of the following steps is included in the planning phase of the data analysis process?
A. Defining examination objectives
B. Determining whether predication exists
C. Building a profile of potential frauds
D. All of the above
D. All of the above
As with most tasks, proper planning is essential in a data analysis engagement. Without sufficient time and attention devoted to planning early on, the fraud examiner risks analyzing the data inefficiently, lacking focus or direction for the engagement, running into avoidable technical difficulties, and possibly overlooking key areas for exploration.
The first phase of the data analysis process is the planning phase. This phase consists of several important steps, including:
* Understanding the data
* Defining examination objectives
* Building a profile of potential frauds
* Determining whether predication exists
All of the following are advantages to using data analysis software EXCEPT:
A. Fraud examiners can use data analysis software to produce accurate results from bad data.
B. Fraud examiners can use data analysis software to search for red flags of possible fraud.
C. Fraud examiners can use data analysis software to ensure that an investigation is accurate and complete.
D. Fraud examiners can use data analysis software to centralize fraud investigations.
A. Fraud examiners can use data analysis software to produce accurate results from bad data.
There are five significant advantages to using data analysis software. First, data analysis software allows the fraud examiner to centralize an investigation, relying less on others to gather data. Second, data analysis software allows the fraud examiner to ensure that an investigation is accurate and complete. Third, data analysis allows the fraud examiner to base predictions about the probability of a fraudulent situation on reliable statistical information. Fourth, data analysis allows the fraud examiner to search entire data files for red flags of possible fraud. Finally, data analysis can assist the fraud examiner in developing reference files for ongoing fraud detection and investigation work.
Which of the following is an example of a data analysis function that can be performed on cash disbursements to help detect fraud?
A. Generate vendor cash activity summary for further analysis
B. Identify disbursements by department, supervisor approval, or amount limits
C. Verify audit trail for all disbursements by purchase order, vendor, department, etc.
D. All of the above
D. All of the above
The following are examples of data analysis queries that can be performed by data analysis software on cash disbursements to help detect fraud:
* Summarize cash disbursements by account, bank, department, vendor, etc.
* Verify audit trail for all disbursements by purchase order, vendor, department, etc.
* Generate vendor cash activity summary for analysis.
* Identify disbursements by department, supervisor approval, or amount limits.
Link analysis can be used to visually map connections between entities that share an address.
A. True
B. False
A. True
Link analysis software is used by fraud examiners to create visual representations (e.g., charts with lines showing connections) of data from multiple data sources to track the movement of money; demonstrate complex networks; and discover communications, patterns, trends, and relationships.
Link analysis is very effective for identifying indirect relationships and relationships with several degrees of separation. For this reason, link analysis is particularly useful when conducting a money laundering investigation, since it can track the placement, layering, and integration of money as it moves around unexpected sources. It could also be used to detect a fictitious vendor (shell company) scheme. For instance, the investigator could map visual connections between a variety of entities that share an address and bank account number to reveal a fictitious vendor created to embezzle funds from a company.
Text-based data is typically considered:
A. Narrative data
B. Structured data
C. Unstructured data
D. Documentary data
C. Unstructured data
Data are either structured or unstructured. Structured data is the type of data found in a database, consisting of recognizable and predictable structures. Examples of structured data include sales records, payment or expense details, and financial reports. Unstructured data, by contrast, is data that would not be found in a traditional spreadsheet or database. It is typically text based.
For the purpose of a Benford’s Law analysis, an employee identification number would be considered a “natural number.”
A. True
B. False
B. False
Benford’s Law distinguishes between natural and non-natural numbers, and it is important to understand the difference between the two types because Benford’s Law cannot be applied to data sets with non-natural numbers. Natural numbers are those numbers that are not ordered in a particular numbering scheme and are not human-generated or generated from a random number system. For example, most vendor invoice totals will be populated by currency values that are natural numbers. Conversely, non-natural numbers (e.g., employee identification numbers and telephone numbers) are designed systematically to convey information that restricts the natural nature of the number. Any number that is arbitrarily determined, such as the price of inventory held for sale, is considered a non-natural number.
On which of the following data fields would a fraud examiner be most likely to run a duplicate test to search for a duplicate value?
A. Inventory counts
B. Product numbers
C. Customer account balances
D. Invoice numbers
D. Invoice numbers
Duplicate testing is used to identify transactions with duplicate values in specified fields. This technique can quickly review the file, or several files joined together, to highlight duplicate values of key fields. In many systems, the key fields should contain only unique values (no duplicate records).
For example, a fraud examiner would expect fields such as check numbers, invoice numbers, and government identification numbers to contain only unique values within a data set; searching for duplicates within these fields can help the fraud examiner find anomalies that merit further examination.
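A duplicate test on a key field can be sketched with a simple count, assuming each record is a dictionary and the key field (here a hypothetical `invoice_no`) is expected to be unique.

```python
from collections import Counter


def find_duplicates(records, key_field):
    """Return {value: count} for key-field values that appear more than once."""
    counts = Counter(record[key_field] for record in records)
    return {value: n for value, n in counts.items() if n > 1}
```

Given three invoice records where two share the number INV-100, `find_duplicates(records, "invoice_no")` returns `{"INV-100": 2}`, identifying the records to pull for review.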
Which of the following is an example of a data analysis function that can be performed to detect fraud through an examination of the general ledger?
A. Create actual-to-budget comparison reports
B. Calculate financial ratios
C. Analyze and confirm specific ledger accounts for legitimate transaction activity
D. All of the above
D. All of the above
The following are typical examples of data analysis queries that can be performed by data analysis software on the general ledger:
* Select specific journal entries for analysis.
* Create actual-to-budget comparison reports.
* Analyze and confirm specific ledger accounts for legitimate transaction activity.
* Speed account reconciliation through specialized account queries.
* Calculate financial ratios.
* Calculate percentage comparison ratio between accounts.
* Prepare custom reports, cash flow, profit/loss, and asset and liability total reports.
* Compare summaries by major account in any order (low-high, high-low).
* Create reports in any format by account, division, department, etc.
Karen is undertaking a data analysis engagement to identify potential fraud at XYZ Corporation. Which of the following lists the most appropriate order in which she should conduct the steps involved in the data analysis process?
I. Cleanse and normalize the data.
II. Build a profile of potential frauds.
III. Analyze the data.
IV. Obtain the data.
V. Monitor the data.
A. II, IV, III, I, V
B. IV, II, I, V, III
C. IV, I, III, V, II
D. II, IV, I, III, V
D. II, IV, I, III, V
To ensure the most accurate and meaningful results, a formal data analysis process should be applied that begins several steps before the tests are run and concludes with active and ongoing review of the data. While the specific process will vary based on the realities and needs of the organization, the following approach contains steps that should be considered and implemented, to the appropriate extent, in each data analysis engagement:
1. Planning phase
* Understand the data.
* Define examination objectives.
* Build a profile of potential frauds.
* Determine whether predication exists.
2. Preparation phase
* Identify the relevant data.
* Obtain the data.
* Verify the data.
* Cleanse and normalize the data.
3. Testing and interpretation phase
* Analyze the data.
4. Post-analysis phase
* Respond to the analysis findings.
* Monitor the data.
Which of the following is an example of a data analysis function that can be performed to help detect fraud through examination of asset accounts?
A. Compare book and tax depreciation and indicate variances
B. Select samples for asset existence verification
C. Recalculate expense and reserve amounts using replacement costs
D. All of the above
D. All of the above
The following are examples of data analysis queries that can be performed by data analysis software on asset accounts to help detect fraud:
* Generate depreciation to cost reports.
* Compare book and tax depreciation and indicate variances.
* Sort asset values by asset type or monetary amount.
* Select samples for asset existence verification.
* Recalculate expense and reserve/provision amounts using replacement costs.