Use of data and information Flashcards
(18 cards)
A description of data from a customer satisfaction survey includes a note stating that all the survey responses were scored on a scale of 1 to 5, in which 1 = “very dissatisfied” and 5 = “very satisfied.” This note is referring to which of the following factors that define a data set?
A. Units of measurement of data elements.
B. Sources from which the data was collected.
C. Nature of each data element.
D. Population of events or instances that were included.
A. Units of measurement of data elements.
A data set is a collection of related data that share similar characteristics. A data set’s users should be provided with a description of the data. The data description helps users understand the nature of the data, its purpose, which data points are included or excluded, and the data’s limitations.
Among other factors, the data description should define the units of measurement (eg, length in meters or feet) for each data element. This information provides context when analyzing the data. In this scenario, knowing the scale (ie, 1 to 5) of responses in the survey can help users interpret the data in terms of the range of possible values of each response.
(Choice B) The sources define where the data originated (eg, commercial data provider).
(Choice C) The nature of each data element defines how it relates to other data elements in the data set.
(Choice D) The population of events defines the factors that determined the inclusion or exclusion of certain data elements or data points in the data set.
Things to remember:
A data description helps users understand the nature of the data, its purpose, which data points are included or excluded, and the data’s limitations. The description should also define the units of measurement (eg, response scale of 1 to 5) for each data element when appropriate.
Software used for extract, transform, load (ETL) processes can perform each of the following functions, except:
A. Retrieve data from semi-structured sources.
B. Calculate summary statistics.
C. Maintain a metadata repository.
D. Operate enterprise resource planning (ERP) systems.
D. Operate enterprise resource planning (ERP) systems.
Preparing data for analysis requires a three-step process called extract, transform, load (ETL). Designing and maintaining ETL procedures is time consuming and resource intensive. Instead of writing custom programming code, organizations use ETL software for administration and operation of ETL processes. ETL software expedites and automates ETL procedures.
ETL software can be used in all three ETL phases. It can:
Connect to and retrieve data from different sources, including those containing semi-structured data, during the extract phase. Semi-structured data (eg, email) contains tags that make it easier to process than unstructured data (Choice A).
Provide data management capabilities such as calculating summary statistics (eg, means) and performing data quality checks (Choice B).
Maintain a metadata repository. Metadata (ie, data about data) includes characteristics such as the data type (eg, integer, decimal) and the length of the data (eg, 50 characters). By maintaining a repository, ETL software centralizes the metadata for ongoing and future ETL initiatives (Choice C).
Enterprise resource planning (ERP) software applications automate many of the core business processes (eg, inventory management) within an organization. Although ETL software can extract data from ERP systems, it cannot operate those systems.
Things to remember:
Retrieving, preparing, and loading data from multiple sources for data analysis involves a three-step process called extract, transform, load (ETL). The ETL process is time consuming and resource intensive, so organizations use ETL software to expedite and automate all three phases of the process.
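To make the three phases concrete, here is a minimal ETL sketch in Python. It is illustrative only: the source file name, the "amount" column, and the SQLite warehouse are hypothetical assumptions, not part of the card.

```python
# Minimal ETL sketch (illustrative only); file names and the "amount"
# column are hypothetical assumptions, not taken from the flashcard.
import sqlite3
import pandas as pd

# Extract: connect to and retrieve data from a source
raw = pd.read_csv("sales_export.csv")

# Transform: data management tasks such as quality checks and summary statistics
raw = raw.drop_duplicates()
print(raw["amount"].describe())  # summary statistics (mean, min, max, etc.)

# Load: insert the transformed data into the target database
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("sales", conn, if_exists="replace", index=False)
```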
An organization is moving data from various files into a data warehouse. The extracted data includes total sales by zip code. The target database requires total sales by region (ie, North, South, West, East). Which of the following transformation methods most likely will be used to transform the data to meet the requirements of the target database?
A. Transposing.
B. Joining.
C. Aggregating.
D. Sorting.
C. Aggregating.
Data extracted from different sources is often not usable for analysis due to inconsistencies and formatting issues. The data may need to be transformed before being loaded into the target database. The objectives of the transformation phase include removing inconsistencies, correcting mismatches and formatting, and enriching data sets by supplying additional information.
One technique used to transform data is aggregation. Aggregating data involves summarizing smaller data categories into one larger category to make the data more usable. For example, if sales data by zip code needs to be transformed into sales by region, sales for all relevant zip codes would be combined.
(Choice A) Transposing data involves switching columns and rows or pivoting data by turning multiple columns into multiple rows or vice versa.
(Choice B) Joining data connects multiple data points (eg, combining weather data from online sources with internal sales data to forecast sales).
(Choice D) Sorting rearranges data into a useful order (eg, ascending, descending).
Things to remember:
Data extracted from different sources is often not usable due to inconsistencies and formatting issues; it may require transformation before it is loaded. One transformation method is to aggregate the data, which summarizes or groups data into a new category. Although the transposing, joining, and sorting methods are also useful transformation techniques, they are not the preferred techniques for summarizing data.
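A minimal sketch of the aggregation described above, using pandas; the zip-to-region mapping, column names, and figures are hypothetical examples.

```python
# Aggregation sketch: summarize sales by zip code into sales by region.
# The mapping and figures are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "zip_code": ["10001", "10002", "73301", "73344"],
    "total_sales": [1200.0, 800.0, 650.0, 950.0],
})
region_map = {"10001": "East", "10002": "East", "73301": "South", "73344": "South"}

sales["region"] = sales["zip_code"].map(region_map)  # enrich with region
by_region = sales.groupby("region", as_index=False)["total_sales"].sum()
print(by_region)  # one summarized row per region
```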
Each of the following is considered to be a characteristic of big data, except
A. Versatility.
B. Volume.
C. Velocity.
D. Variety.
A. Versatility.
Big data refers to amounts of data so voluminous that they cannot be analyzed using traditional processing techniques such as spreadsheets. This data is obtained from multiple sources (eg, spreadsheets, emails, apps on personal electronics and smart devices, questionnaires, social media posts).
Big data is often described as having four characteristics, all of which begin with a “V” (ie, the four Vs). These attributes illustrate the nature of data that is used in data analytics.
Volume: The size/amount of data sets to be analyzed (Choice B).
Velocity: The frequency/speed at which data is generated, processed, and analyzed (Choice C).
Variety: The form of the data (structured, unstructured, or semi-structured) (Choice D).
Veracity: The quality/trustworthiness, accuracy, and precision of the data.
Versatility refers to the ability to adapt items to many functions or purposes. Data does not have to be versatile to be useful. Some websites suggest considering a fifth “V” for value. Essentially, this is a cost/benefit assessment (and not a true characteristic) and asks if the data is worth the cost of extraction and analysis.
Things to remember:
Big data refers to massive amounts of data that cannot be analyzed using traditional processing techniques. Big data is often described as having four characteristics: volume, variety, veracity, and velocity.
Once an auditor has planned an audit data analytic, the data must be accessed and prepared. This preparation process may include which of the following procedures?
A. Determining the characteristics of data requiring further investigation.
B. Determining the data population to be analyzed.
C. Sorting data by categories to identify outliers.
D. Merging data from different sources.
D. Merging data from different sources.
The AICPA’s Audit Data Analytics Guide provides five steps to perform audit data analytics (ADAs). In Step 2 of an ADA, the auditor accesses and prepares the data for the purposes of the ADA by performing the following:
Extract the data from the system in which it resides (eg, ERP, accounting software)
Transform the data into a format the auditor can use (eg, merge the data into one spreadsheet)
Load the data into the analysis software
For example, if an auditor wanted to perform an ADA on inventory for several segments of an entity, the auditor should obtain data from every location, clean the data, and merge all the workbooks together. The auditor should then load the data into the analysis software.
(Choices A and B) Determining the characteristics of data requiring further investigation and determining the data population to be analyzed would both occur in planning the ADA (Step 1). During Step 1, the auditor determines the overall purpose and objective of the ADA, the ADA techniques, and the data set to be used.
(Choice C) Sorting data by categories to identify outliers is an example of a technique used to perform the ADA (Step 4).
Things to remember:
While accessing and preparing the data, the auditor should extract data from the system in which it resides, transform the data into a format the auditor can use (eg, merge the data), and load the data into the analysis software.
Audit data analytics is the process of transforming large amounts of raw data into useful information for the purpose of identifying patterns and irregularities in the audit. Audit data analytics would be least useful in
A. Planning the audit for a new client.
B. Performing substantive procedures when the reliability of internal data is questionable.
C. Performing risk assessment procedures when inherent risks are high.
D. Forming a conclusion on the audit of a client that operates in a volatile industry.
B. Performing substantive procedures when the reliability of internal data is questionable.
An audit data analytic (ADA) is a procedure used to help the auditor discover patterns and identify anomalies that may require further investigation. It can be used in all phases of an audit, including risk assessment procedures, tests of controls, substantive analytical procedures, tests of details, and in forming an audit opinion.
The AICPA’s nonauthoritative guidance on ADAs includes five basic steps. Auditors should consider the relevance and reliability of the data being used (Step 3). This includes examining the nature and source of the data, the processes used to produce the data, and whether additional audit procedures are needed to verify the reliability of the data. If the data is unreliable, the conclusions produced by the ADA may mislead the auditor; therefore, the auditor would use alternative procedures.
(Choice A) ADAs can be used for new clients if the auditor concludes that the underlying data is relevant and reliable.
(Choices C and D) Factors such as a high inherent risk and the volatility of the client’s industry do not make the ADA less useful as long as the underlying data is relevant and reliable.
Things to remember:
As part of the AICPA’s five-step approach to audit data analytics (ADAs), auditors should consider the relevance and reliability of the data used. If an auditor concludes that the data is relevant and reliable, ADAs may be performed using the data regardless of the circumstances (eg, new client, volatility of industry, inherent risk).
Which of the following procedures would an accountant be least likely to perform when transforming raw data from different sources into a usable database?
A. Add a data dictionary.
B. Insert calculated fields.
C. Join data by a common field.
D. Delete repeated columns.
A. Add a data dictionary.
Accountants often need to extract, transform, and load (ETL) data from different sources into a database. Extraction involves obtaining data from various sources and formats. The extracted data is often not usable for analysis due to formatting issues and may need to be transformed into a usable format. The load phase involves inserting the transformed data into the target database (eg, data warehouse) for analysis.
A data dictionary is a centralized repository of metadata, or data about data, that addresses the descriptions and format of the data. It may include information such as variable names, descriptions, data types, formats, time and date of creation, and who has access to what data. If it doesn’t already exist, the dictionary would be created after the load phase, not during the transform phase.
(Choices B, C, and D) Inserting calculated fields, joining data from different sources, and deleting (deduplicating) repeated columns are all performed during the transform stage.
Things to remember:
A data dictionary is a repository of metadata, or data about data, and describes the types and format of the data. It may include variable names, descriptions, data types, formats, creation date, and who has access to the data. It may already exist before the data is extracted or would be created after the data have been loaded.
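A minimal sketch of the transform steps named in Choices B, C, and D, assuming hypothetical data and column names:

```python
# Transform-phase sketch: calculated field, join, and repeated-column removal.
# Data and column names are hypothetical.
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2], "qty": [3, 5], "unit_price": [10.0, 4.0]})
vendors = pd.DataFrame({"order_id": [1, 2], "vendor": ["Acme", "Blick"]})

orders["total"] = orders["qty"] * orders["unit_price"]  # insert a calculated field
merged = orders.merge(vendors, on="order_id")           # join data by a common field
merged = merged.loc[:, ~merged.columns.duplicated()]    # delete any repeated columns
print(merged)
```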
Which of the following is a descriptive analytic technique that would be most useful in identifying transactions recorded shortly after the balance sheet date?
A. Calculating summary statistics.
B. Time-period sampling.
C. Record counting.
D. Sorting.
D. Sorting.
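Sorting rearranges records into a useful order (eg, by posting date), which places transactions recorded just after the balance sheet date together for review. A minimal sketch, assuming hypothetical column names and dates:

```python
# Sorting sketch: surface transactions posted shortly after the balance
# sheet date. Dates and column names are hypothetical.
import pandas as pd

entries = pd.DataFrame({
    "entry_id": [101, 102, 103],
    "posted": pd.to_datetime(["2024-12-30", "2025-01-02", "2025-01-05"]),
})
balance_sheet_date = pd.Timestamp("2024-12-31")

after_year_end = entries[entries["posted"] > balance_sheet_date]
print(after_year_end.sort_values("posted"))  # entries recorded shortly after year end
```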
The primary and foreign keys that are used in a relational database are examples of properties of the:
A. Master data.
B. Data classification.
C. Logical data model.
D. Physical data model.
D. Physical data model.
A primary key is a field or group of fields that uniquely identifies a data record in a table in a relational database. Each table has one primary key and each primary key must contain a value (eg, customer number).
A foreign key is used within a database table to identify a unique record that resides in a different table. The primary and foreign keys that are used in the specific database are properties of the physical data model. The physical data asset model shows how data are stored in the organization’s accounting system.
(Choice A) Master data are the core data that uniquely identify entities such as customers, suppliers, employees, products, and services.
(Choice B) Data classification defines the privacy and security properties of data.
(Choice C) A logical data asset model shows the data at the level of business requirements.
Things to remember:
A primary key is a field or group of fields that uniquely identifies a record in a table in a relational database. A foreign key is used within a database table to identify a unique record that resides in a different table. The primary and foreign keys are properties of the physical data model, which shows how data are stored in the organization's accounting system.
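A minimal sketch of primary and foreign keys in a relational schema, using Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
# Primary/foreign key sketch; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""
    CREATE TABLE customers (
        customer_number INTEGER PRIMARY KEY,  -- uniquely identifies each record
        name            TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE invoices (
        invoice_id      INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL,     -- points to a record in customers
        FOREIGN KEY (customer_number) REFERENCES customers (customer_number)
    )
""")
```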
Which of the following statements is not true of the test data approach to testing an accounting system?
A. Test data are processed by the client’s computer programs under the auditor’s control.
B. The test data need consist of only those valid and invalid conditions that interest the auditor.
C. Only one transaction of each type need be tested.
D. The test data must consist of all possible valid and invalid conditions.
D. The test data must consist of all possible valid and invalid conditions.
Choice D (Correct) and Choices A, B, C (Incorrect): Under the test data approach, the auditor runs both valid and invalid conditions through a client’s computer system. The test data need consist of only those items that the auditor is interested in testing and does not have to include all possible conditions.
An audit client is using a computer program to process financial data. The auditor’s goal is to verify that transactions are processed correctly. There is no appropriate audit program available. Which of the following approaches most appropriately achieves the auditor’s goal?
A. The controlled reprocessing approach where the auditor verifies that a program to be tested is the one used to process actual data.
B. The test data approach where the auditor includes several examples of each type of transaction error.
C. The parallel simulation approach where the auditor runs client data through a generalized audit software program and the client’s program.
D. The integrated test facility approach where audit data is processed in the client’s system along with the client’s data.
D. The integrated test facility approach where audit data is processed in the client’s system along with the client’s data.
Edits are system controls to ensure authorized access, correct processing, and proper interactions with other computer systems. Computer-assisted audit techniques (CAATs) are used to determine the presence of appropriate edits.
Here, the integrated test facility is the most appropriate CAAT to verify proper transaction processing. The auditor enters single examples of correct and erroneous transactions and the client’s system processes them along with actual client data. The auditor then determines whether the test transactions are handled appropriately (eg, edits detect and reject erroneous transactions).
(Choice A) Controlled reprocessing involves reprocessing client data using the client’s program run on the auditor’s computer. This verifies that the program provided to the auditor is the same one that processed client data but does not verify proper transaction processing.
(Choice B) The test data approach is similar to the integrated test facility approach. However, only a single (not several) example of each type of transaction error is required (eg, 1 unauthorized access attempt). A computer program does not have the ability to modify the way it processes multiple instances of a single type of transaction (eg, all unauthorized users should be denied access).
(Choice C) Parallel simulation involves processing client data using both the client’s program and the auditor’s program. Here, there is no appropriate audit program available.
Things to remember:
An integrated test facility involves entering and processing audit test data into the client’s system while simultaneously processing actual client data. The auditor uses test results to determine whether transactions process properly.
Which of the following components of a database is responsible for maintaining the referential integrity of the data in the system?
A. Database management system (DBMS)
B. Data query language (DQL).
C. Data manipulation language (DML).
D. Data definition language (DDL).
A. Database management system (DBMS)
(Choice A) The database management system (DBMS) controls the storage and retrieval of the information maintained in a database and is responsible for maintaining the referential integrity of the data.
(Choice B) Data query language (DQL) is used to extract information from the database.
(Choice C) Data manipulation language (DML) is used to add, update, and delete data.
(Choice D) Data definition language (DDL) is used to create tables and define the fields of information within those tables.
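A minimal sketch pairing each language with a typical SQL statement, again using sqlite3; the table and values are hypothetical.

```python
# DDL/DML/DQL sketch; table and values are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create tables and define their fields
conn.execute("CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, name TEXT)")

# DML: add, update, and delete data
conn.execute("INSERT INTO vendors (vendor_id, name) VALUES (1, 'Acme')")
conn.execute("UPDATE vendors SET name = 'Acme Co' WHERE vendor_id = 1")

# DQL: extract information from the database
print(conn.execute("SELECT vendor_id, name FROM vendors").fetchall())
```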
In an audit of a nonissuer, the auditor plans to recalculate the nonissuer’s year end employee vacation accrual using a management-provided list of employees’ salaries and banked vacation hours. In order to validate the completeness of the employees’ information, the auditor would most appropriately
A. Use the payroll system to validate a sample of employee salaries.
B. Agree the number of employees included in the vacation-accrual calculation to the final payroll register.
C. Review a list of employees who were terminated after year end to verify that they have been removed from the vacation-accrual calculation.
D. Verify that the number of vacation hours accrued by each employee for each pay period complies with the corresponding policy from the human resources department.
B. Agree the number of employees included in the vacation-accrual calculation to the final payroll register.
The completeness assertion is a management claim that everything that should have been recorded is included in the financial statements (F/S). When testing the completeness of liabilities, auditors look for accrued vacation amounts that have been omitted from the F/S. To verify completeness, auditors trace the data by comparing the information provided back to the source data.
In this scenario, the auditor needs to verify the completeness of the employees’ information before recalculating the client’s vacation accrual. Agreeing the number of employees from the management-provided list to the final payroll register would be the best way to determine the completeness of the accrual. If the list and the register agree, it would indicate that all people employed at year end were included when the vacation expense accrual was computed.
(Choices A and D) Neither sampling employee salaries nor verifying that the number of vacation hours accrued complies with company policy would ensure that all employees are included in the accrual calculation.
(Choice C) Employees who were terminated after year end may need to be removed from the vacation accrual calculation for the following year. However, this review would not detect whether any employees who should have been included in the accrual calculation at year end were omitted.
Things to remember:
To verify completeness, auditors trace the information by comparing the data provided to the source data.
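A minimal sketch of the agreement test in Choice B, assuming hypothetical employee IDs: any employee on the final payroll register who is missing from the accrual list is a completeness exception.

```python
# Completeness sketch: agree the accrual list to the final payroll register.
# Employee IDs are hypothetical.
import pandas as pd

payroll_register = pd.DataFrame({"employee_id": [1, 2, 3, 4]})
accrual_list = pd.DataFrame({"employee_id": [1, 2, 4]})

# Anti-join: employees on the register but absent from the accrual calculation
omitted = payroll_register[
    ~payroll_register["employee_id"].isin(accrual_list["employee_id"])
]
print(omitted)  # employee_id 3 was left out of the accrual
```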
An accountant imported vendor invoice data from a comma-separated values (CSV) file into an analytics program. When verifying the import, the accountant expected to see the following:
Invoice | Date     | Items                      | Amount
7589    | 04/03/X1 | Roses, tulips, and violets | 102.56
Instead, the accountant saw the following:
Invoice | Date     | Items | Amount | Unknown field 1 | Unknown field 2
7589    | 04/03/X1 | Roses | tulips | and violets     | 102.56
Which of the following strings could have created the error?
A. 7589,04/03/X1, Roses tulips and violets, 102.56
B. 7589,04/03/X1, Roses, tulips, and violets,102.56
C. 7589,04/03/X1,”Roses, tulips, and violets”,102.56
D. 7589,04/03/X1,”Roses” “tulips” “and violets”,102.56
B. 7589,04/03/X1, Roses, tulips, and violets,102.56
An accountant often needs to extract, transform, and load (ETL) data from different sources into a centralized database. Extracted data may not be immediately usable for analysis because of inconsistencies and formatting issues. The accountant likely needs to transform the data before loading it into the target database. Data transformation involves converting the raw data gathered from the extraction phase into a consistent, useful format that the accountant can load into the target database. Once loaded, the data can be analyzed.
In this situation, the raw data did not properly convert. In a CSV file, the commas separate the values. As a result, the words “Roses,” “tulips,” and “and violets” appeared in separate fields instead of in one field. The accountant did not realize that the commas in the raw data file would incorrectly separate those words.
The accountant could reformat the data set in several ways to make it load correctly:
Removing all commas other than the ones needed to separate fields would result in a proper upload (Choice A)
The accountant could encapsulate the data in the “Items” field in quotation marks. The first double quotation mark would indicate the beginning of the data, and the next double quotation mark would identify the end of that string of text (Choice C)
(Choice D) This string would result in Rosestulipsandviolets appearing in the “Items” field.
Things to remember:
Data transformation involves converting the raw data gathered from the extract phase into a consistent, useful format for loading the data into the target database.
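A minimal sketch reproducing the error with Python's csv module, using the strings from Choices B and C:

```python
# CSV parsing sketch using the strings from Choices B and C.
import csv
import io

bad = '7589,04/03/X1, Roses, tulips, and violets,102.56'    # Choice B
good = '7589,04/03/X1,"Roses, tulips, and violets",102.56'  # Choice C

print(next(csv.reader(io.StringIO(bad))))   # 6 fields: the Items text is split
print(next(csv.reader(io.StringIO(good))))  # 4 fields: quotes keep Items together
```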
Which of the following factors would most likely influence an auditor’s consideration of the reliability of data generated through the client’s use of artificial intelligence (AI)?
A. Whether the AI systems and related infrastructure were designed to be ethically responsible.
B. Whether the AI systems and related infrastructure were subject to robust security controls.
C. Whether the AI systems and related infrastructure were flexible enough to adapt to future technological advancements.
D. Whether the AI systems and related infrastructure enhance operational effectiveness.
B. Whether the AI systems and related infrastructure were subject to robust security controls.
Audit data analytics (ADAs) are used to analyze patterns, identify anomalies, and extract other useful information from a data set. When ADAs are performed, the auditor must consider the reliability and relevance of the underlying information.
In this scenario, the client provided data that was generated using artificial intelligence (AI). The data could have been created using an AI tool embedded within the client’s accounting system or by using a third-party AI tool (eg, ChatGPT). Regardless of the source, the auditor should verify the information technology general controls, including security controls, over the technology used to ensure data reliability. If the client used third-party tools, it may be more challenging to determine data reliability.
(Choices A and D) The client may consider ethical responsibility and enhanced operational effectiveness important to the use of AI. However, these factors do not impact the auditor’s assessment of data reliability.
(Choice C) Although an AI system’s flexibility to adapt to future technological advancements may impact system availability in the future, it would not impact the auditor’s assessment of data reliability for the current year.
Things to remember:
Audit data analytics (ADAs) rely on reliable and relevant data. When using client AI-generated data, auditors must critically assess the reliability of the technology and its controls.
When performing data analytics, the auditor would identify nominal data by which of the following characteristics?
A. It is in equal increments.
B. It can be ranked.
C. It is named.
D. It is quantitative.
C. It is named.
Auditors need to understand measurement scales to effectively perform audit data analytics. A measurement scale is a system for classifying data based on how the assigned values are interpreted. The type of measurement scale used (nominal, ordinal, interval, or ratio) determines the types of analyses auditors can perform. Incorrect use of measurement scales can result in misleading or erroneous conclusions.
Nominal scales categorize data into mutually exclusive groups using qualitative or named variables (Choice D). For example, nominal scales are used to collect demographic information, such as a customer’s state of residence.
(Choice A) Some measurement scales use equal increments between each level to reflect a consistent difference (eg, age measured in whole years). These types of increments are found in interval and ratio scales, not in nominal scales.
(Choice B) Ordinal data can be ranked based on magnitude, using the direction of the measurement scale, such as ascending (least to most) or descending (highest to lowest) order. Nominal scales are not ranked.
Things to remember:
Nominal scales categorize data into mutually exclusive groups using qualitative variables, such as a customer’s state of residence in a survey collecting demographic information.
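A minimal sketch contrasting nominal and ordinal data with pandas categoricals; the values are hypothetical.

```python
# Nominal vs ordinal sketch; values are hypothetical.
import pandas as pd

# Nominal: named categories with no inherent order
state = pd.Categorical(["TX", "NY", "CA"], ordered=False)

# Ordinal: categories that can be ranked
satisfaction = pd.Categorical(
    ["low", "high", "medium"],
    categories=["low", "medium", "high"],
    ordered=True,
)
print(state.ordered, satisfaction.ordered)     # False True
print(satisfaction.min(), satisfaction.max())  # low high -- ranking is meaningful
```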
Your firm has been engaged to audit a computer hardware manufacturer. At the request of the audit team, the client has provided data files for all sales transactions for the year under audit. The audit team plans to use these files to perform a non-statistical analysis of sales revenue trends by product type.
The following information pertains to the files obtained:
The record count of transactions in the file agrees with the number of transactions completed for the year under audit.
The file provided is encrypted.
The data does not include a field for product type.
Total sales revenue agrees to the general ledger.
What primary concern should the auditor have with the data files obtained?
A. Data integrity.
B. Data accuracy.
C. Data freshness.
D. Data clarity and relatedness.
D. Data clarity and relatedness.
(Choice A) Incorrect. The information states that the files are encrypted, which gives the auditor some assurance that the file was not tampered with or changed during transmission to the auditor.
(Choice B) Incorrect. The data agrees to the general ledger, which rules out some obvious pre-audit errors.
(Choice C) Incorrect. Freshness means that the data is up to date. Since the data provided covers the entire year under audit, the data appear to be up to date.
(Choice D) Correct! Data clarity and relatedness involve whether the data has the elements requested and needed for the objective. The objective of this ADA is to evaluate sales trends by product. Without the product type field, the auditor will not be able to perform the desired analysis.
Which of the following would an auditor use to aid in determining if a database table is reliable for testing purposes?
A. Data dictionary.
B. Database schema.
C. Data normalization tools.
D. Data visualization tools.
A. Data dictionary.
Auditors may submit data extraction requests to obtain financial information for testing purposes. This information is contained in a database table. To test the integrity of the table, the auditor can review the data dictionary (ie, metadata), which defines field attributes such as the correct data type, value range, format, descriptions, and constraints. If any of these attributes are incorrect, the data may have errors and may not be reliable.
(Choice B) A database schema is a diagram of the underlying structure of the database. Although database schemas include information similar to that of the data dictionary, such as tables, attributes, and fields, they do not provide the same level of detail.
(Choice C) Data normalization tools organize a relational database into fewer tables with fewer fields to reduce data redundancy and improve data consistency.
(Choice D) Data visualization tools are used for presentation purposes to make audit results more understandable (eg, graphs, charts).
Things to remember:
A data dictionary stores metadata defining the properties of data fields, such as the correct data type, value range, format, descriptions, and constraints, in a relational database table. The data dictionary provides auditors with information necessary to determine the integrity of data used to test the information presented in the F/S.
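A minimal sketch of using data dictionary attributes to check an extract, assuming hypothetical field names, types, and value ranges:

```python
# Data dictionary check sketch; field names, types, and ranges are hypothetical.
import pandas as pd

data_dictionary = {
    "invoice_id": {"dtype": "int64", "min": 1},
    "amount": {"dtype": "float64", "min": 0.0},
}
extract = pd.DataFrame({"invoice_id": [1, 2], "amount": [102.56, -5.0]})

for field, rules in data_dictionary.items():
    # Verify the data type matches the dictionary definition
    assert str(extract[field].dtype) == rules["dtype"], f"{field}: wrong data type"
    # Flag values outside the allowed range
    bad = extract[extract[field] < rules["min"]]
    if not bad.empty:
        print(f"{field}: {len(bad)} value(s) below the allowed minimum")  # flags -5.0
```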