Chapter 2: Domain 2: Architecture and Design Flashcards
An email is an example of what type of data?
A. Structured data
B. Semi-structured data
C. RFC-defined data
D. Unstructured data
Answer: D. Unstructured data
Emails and other freeform text are examples of unstructured data. Structured data like the data found in databases is carefully defined, whereas semi-structured data like XML and JSON applies structure without being tightly controlled. While email itself is defined by an RFC, the term RFC-defined data is not used in this context.
Nick wants to ensure that data is properly handled once it is classified. He knows that data labeling is important to the process and will help his data loss prevention tool in its job of preventing data leakage and exposure. When should data be labeled in his data lifecycle?
A. Creation
B. Storage
C. Use
D. Destruction
Answer: A. Creation
Data labeling typically occurs during the Creation phase of the data lifecycle. Data labels may also be changed or added during use as data is modified.
Jacinda is planning to deploy a data loss prevention (DLP) system in her cloud environment. Which of the following challenges is most likely to impact the ability of her DLP system to determine whether sensitive data is being transmitted outside of her organization?
A. Lack of data labeling
B. Use of encryption for data in transit
C. Improper data labeling
D. Use of encryption for data at rest
Answer: B. Use of encryption for data in transit
Jacinda is likely to face challenges using her DLP system due to the broad and consistent use of encryption for data in transit or data in motion in cloud environments. She will need to take particular care to design and architect her environment to allow the DLP system to have access to the traffic it needs. Data labeling can be a challenge, if it is lacking or if it isn’t done properly, but DLP systems can use pattern matching and other techniques to identify data. Data at rest is typically not as much of a concern for a DLP system since preventing loss requires understanding when data is going somewhere, not when it is remaining in a location.
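The pattern matching mentioned above can be sketched in a few lines. This is a minimal illustration, not how any particular DLP product works: the regular expression, the function name find_ssn_like, and the sample message are all assumptions made for the example, and it only works on traffic the DLP system can read in cleartext.

```python
import re

# Hypothetical rule: U.S. Social Security numbers written as NNN-NN-NNNN.
# Real DLP rules are far more thorough; this only illustrates pattern matching
# on unlabeled content.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list[str]:
    """Return every substring of text that looks like an SSN."""
    return SSN_PATTERN.findall(text)


# Made-up outbound message that carries no data label at all.
outbound = "Customer 1142 asked to update SSN 078-05-1120 on file."
matches = find_ssn_like(outbound)
print(matches)
```

If the same message were sent over an encrypted channel that the DLP system cannot inspect, the function would never see the text, which is the challenge the answer describes.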
Susan wants to ensure that superuser access in her cloud environment can be properly audited. Which of the following is not a common item required for auditing of privileged user access?
A. The remote IP address
B. The account used
C. The password used
D. The local IP address
Answer: C. The password used
Passwords are intentionally not captured or logged since creating an audit log that contains passwords would be a significant security issue. The source and destination IP address as well as the account used for privileged access are all common log data that can help when events need to be audited.
Ben’s organization uses the same data deletion procedure for their on-site systems and their third-party-provided, cloud-hosted systems. Ben believes there is a problem with the process currently in use, which involves performing a single-pass zero-wipe of the disks and volumes in use before they are reused. What problem with this approach should Ben highlight for the cloud environment?
A. Crypto-shredding is a secure option for third-party-hosted cloud platforms.
B. Zero-wiping alone is not sufficient, and random patterns should also be used.
C. Zero-wiping requires multiple passes to ensure that there will be no remnant data.
D. Drives should be degaussed instead of wiped or crypto-shredded to ensure that data is fully destroyed at the physical level.
Answer: A. Crypto-shredding is a secure option for third-party-hosted cloud platforms.
Cryptographic erasure, or crypto-shredding, is the only way to ensure that drives and volumes hosted by third parties are securely cleared. Zero-wiping may result in remnant data, particularly where drives are dynamically allocated space in a hosted environment. Degaussing or other physical destruction is typically not possible with third-party-hosted systems without a special contract and dedicated hardware.
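The idea behind crypto-shredding can be shown with a toy example. This sketch assumes a one-time pad built from the Python standard library purely for illustration; real deployments would typically use a standard cipher with keys held in a key management system, and the names xor_bytes, plaintext, and stored are hypothetical.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with an equal-length key (one-time pad, illustration only)."""
    if len(data) != len(key):
        raise ValueError("key must be the same length as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"customer record 42"
key = secrets.token_bytes(len(plaintext))  # random key held by the data owner
stored = xor_bytes(plaintext, key)         # what the provider's disk holds

# While the key exists, the data can be recovered.
recovered = xor_bytes(stored, key)

# Crypto-shredding: destroy every copy of the key. Any ciphertext remnants left
# on the provider's storage can no longer be decrypted.
# Note: rebinding the name does not securely wipe memory; it only marks the
# step in this sketch where the key would be destroyed.
key = None
```

The point of the sketch is that the data owner never needs access to the physical media: destroying the key is what renders the stored bytes useless.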
Jason has been informed that his organization needs to place a legal hold on information related to pending litigation. What action should he take to place the hold?
A. Restore the files from backups so that they match the dates for the hold request.
B. Search for all files related to the litigation and provide them immediately to opposing counsel.
C. Delete all the files named in the legal hold to limit the scope of litigation.
D. Identify scope files and preserve them until they need to be produced.
Answer: D. Identify scope files and preserve them until they need to be produced.
Legal holds require organizations to identify and preserve data that meets the hold’s scope. Jason should identify the files and preserve them until they are required. Restoring files might erase important data, deleting files is completely contrary to the concept of the hold, and holds do not require immediate production; they are just what they sound like, a requirement to hold the data.
Murali is reviewing a customer’s file inside of his organization’s customer relationship management tool and sees the customer’s Social Security number listed as XXX-XX-8945. What data obfuscation technique has been used?
A. Anonymization
B. Masking
C. Randomization
D. Hashing
Answer: B. Masking
Masking data involves replacing data with alternate characters like X or *. This is typically done via controls in the software or database itself, as the underlying data remains intact in the database. Anonymization or deidentification removes data that might allow individuals to be identified. Hashing is used to allow data to be referenced by a hash without displaying the actual data, but it causes properties of the data that may be needed for testing to be lost. Randomization or shuffling data moves it around, disassociating the data but leaving real data in place to be tested.
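Masking as described above can be sketched as a display-time function. This is a minimal example under stated assumptions: the function name mask_ssn, its parameters, and the sample number are made up, and the stored value is left untouched while only the displayed string changes.

```python
def mask_ssn(ssn: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    to_mask = max(total_digits - visible, 0)
    out = []
    for ch in ssn:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)  # hide this digit
            to_mask -= 1
        else:
            out.append(ch)         # keep separators and trailing digits
    return "".join(out)


stored_value = "123-45-8945"       # made-up value; remains intact in storage
displayed = mask_ssn(stored_value)
print(displayed)                   # XXX-XX-8945
```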
An XML file is considered what type of data?
A. Unstructured data
B. Restructured data
C. Semi-structured data
D. Structured data
Answer: C. Semi-structured data
XML and JSON are both examples of semi-structured data. Other examples include CSV files, NoSQL databases, and HTML files.
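A short sketch shows what "semi-structured" means in practice: the tags and keys describe the data, but nothing outside the document enforces a schema. The two sample documents and their field names are made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical customer record in two semi-structured formats.
json_doc = '{"customer": {"id": 7, "name": "Ada", "tags": ["vip"]}}'
xml_doc = "<customer id='7'><name>Ada</name><tag>vip</tag></customer>"

# Keys and tags are self-describing, so fields can be pulled out by name.
record = json.loads(json_doc)["customer"]
root = ET.fromstring(xml_doc)

json_name = record["name"]
xml_name = root.findtext("name")
xml_id = root.get("id")  # XML attributes are always strings

print(json_name, xml_name, xml_id)
```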
Lucca wants to implement logging in an infrastructure as a service cloud service provider’s environment for his Linux instances. He wants to capture events like the creation and destruction of systems, as part of scaling requirements for performance. What logging tool or service should he use to have the most insight into these events?
A. Syslog from the Linux systems
B. The cloud service provider’s built-in logging function
C. Syslog-NG from the Linux systems
D. Logs from both the local event log and application log from the Linux systems
Answer: B. The cloud service provider’s built-in logging function
The provider’s own logging function is the best option. Information about systems being created and destroyed won’t exist on the local systems, and thus syslog, syslog-ng, and local logs won’t work. In addition, Linux typically doesn’t have an application log—both event and application logs are common for Windows systems.
Joanna’s company uses a load balancer to distribute traffic between multiple web servers. What data point is often lost when traffic passes through load balancers to local web servers in a cloud environment?
A. The source IP address
B. The destination port
C. The query string
D. The destination IP address
Answer: A. The source IP address
Original source IP addresses may not be visible in the local web server log. Fortunately, load-balancer logs can be used if they are available. The destination IP address will typically remain, as well as the destination port and the actual query.
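One common way the original source address is preserved is for the load balancer to add an X-Forwarded-For header, which is not mentioned in the question and is assumed here for illustration. The sketch below, with a hypothetical helper name, reads the left-most entry and falls back to the connecting peer when the header is absent.

```python
def client_ip_from_xff(header_value: str, peer_ip: str) -> str:
    """Return the left-most X-Forwarded-For entry, or the peer IP if absent.

    Assumption: the header was set by a trusted load balancer. A client can
    send its own X-Forwarded-For value, so it should not be trusted otherwise.
    """
    entries = [part.strip() for part in header_value.split(",") if part.strip()]
    return entries[0] if entries else peer_ip


# The web server sees the load balancer (10.0.0.5) as the connecting peer.
with_header = client_ip_from_xff("203.0.113.7, 10.0.0.5", peer_ip="10.0.0.5")
without_header = client_ip_from_xff("", peer_ip="10.0.0.5")
print(with_header, without_header)
```

Without the header, the only address the local web server can log is the load balancer's own, which is the loss the answer describes.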
Isaac is using a hash function for both integrity checking and to allow address data to be referenced without the actual data being exposed. Which of the following attributes of the data will not be lost when the data is hashed?
A. Its ability to be uniquely identified
B. The length of the data
C. The formatting of the data
D. The ability to sort the data based on street number
Answer: A. Its ability to be uniquely identified
Hashing converts variable-length data to fixed-length outputs, meaning that the length, formatting, and the ability to perform operations on the data using strings or numbers will be lost. Its ability to be uniquely identified won’t be lost: Isaac just needs to know the hash of a given address to continue to reference that data element.
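The properties described above are easy to check. This sketch assumes SHA-256, which the question does not specify, and uses made-up addresses and a hypothetical helper name.

```python
import hashlib


def address_token(address: str) -> str:
    """Return the SHA-256 hex digest of an address string."""
    return hashlib.sha256(address.encode("utf-8")).hexdigest()


short = address_token("9 Elm St")
long = address_token("12345 North Example Boulevard, Apartment 6B")

# Length and formatting are lost: both digests are 64 hex characters.
print(len(short), len(long))

# Identity is kept: the same input always yields the same digest.
print(address_token("9 Elm St") == short)
```

Sorting the digests would order the records by hash value, not by street number, which is why that ability is lost as well.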
Lisa runs Windows instances in her cloud-hosted environment. Each Windows instance is created with a C: drive that houses the operating system and application files. What type of storage best describes the C: drive for these Windows instances?
A. Long-term storage
B. Ephemeral storage
C. Raw storage
D. Volume-based storage
Answer: B. Ephemeral storage
Storage that is associated with an instance and that will be destroyed when the instance is shut down is ephemeral storage. Raw storage is storage that gives you direct access to the underlying device, such as a hard drive or an SSD. Long-term storage is storage that is intended to continue to exist, and is often used for logs or data storage. Volume-based storage is storage allocated as a virtual drive or device within the cloud.
Amanda’s operating procedures for secure data storage require her to ensure that she is using data dispersion techniques. What does Amanda need to do to be compliant with this requirement?
A. Delete all data not in secure storage.
B. Store data in more than one location or service.
C. Avoid storing data in intact form, requiring data from more than one location to use a data set.
D. Geographically separate data by at least 15 miles to ensure that a single natural disaster cannot destroy it.
Answer: B. Store data in more than one location or service.
Data dispersion is the practice of ensuring that important data is stored in more than one location or service. It does not necessarily require specific distances or geographic limits, doesn’t require deletion of data not in secure storage, and doesn’t require you to use multiple data sets to access data.
Gary is gathering data to support a legal case on behalf of his company. Why might he digitally sign files as they are collected and preserve them along with the data in a documented, validated way?
A. To allow for data dispersion
B. To ensure the files are not copied
C. To keep the files secure by encrypting them
D. To support nonrepudiation
Answer: D. To support nonrepudiation
Chain of custody documentation, often including actions like hashing files to ensure they are not changed from their original form, is commonly done to support nonrepudiation. Digitally signing files does not prevent them from being copied, does not encrypt them, and is unrelated to data dispersion.
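The hashing step mentioned above can be sketched as follows. This covers only the integrity check, not a full digital signature, which would also need a private key. The file name, its contents, and the helper file_digest are made up, and SHA-256 is an assumption.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp) / "evidence.txt"
    evidence.write_bytes(b"original contents")

    # Digest recorded in the custody documentation at collection time.
    recorded = file_digest(evidence)

    # Later verification: an untouched file still matches the record.
    unchanged = hmac.compare_digest(recorded, file_digest(evidence))

    # Any modification changes the digest and is detected.
    evidence.write_bytes(b"altered contents")
    altered_detected = not hmac.compare_digest(recorded, file_digest(evidence))

print(unchanged, altered_detected)
```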
Steve is working to classify data based on his organization’s data classification policies. Which of the following is not a common type of classification?
A. Size of the data
B. Sensitivity of the data
C. Jurisdiction covering the data
D. Criticality of the data
Answer: A. Size of the data
The size of the data or files is not typically a data classification type or field. Sensitivity, jurisdiction, and criticality are all commonly used to classify data.
Chris is reviewing his data lifecycle and wants to take actions in the data creation stage that can help his data loss prevention system be more effective. Which of the following actions should he take to improve the success rate of his DLP controls?
A. Data labeling
B. Data classification
C. Hashing
D. Geolocation tagging
Answer: A. Data labeling
Data labels can help DLP systems identify and manage data, so Chris should ensure that data is labeled as part of its creation process to help his DLP identify and protect it.
Classification is important, but without tags it won’t be useful to the DLP. Hashing can be used to help a DLP identify specific files, but tends to be done by the DLP system itself if needed, and geolocation tagging is not a typical DLP protection.
Valerie is performing a risk assessment for her cloud environment and wants to identify risks to her organization’s ephemeral volume-based storage used for system drives in a scalable, virtual machine–based environment. Which of the following is not a threat to ephemeral storage?
A. Inadvertent exposure
B. Malicious access due to credential theft
C. Poor performance due to its ephemeral nature
D. Loss of forensic artifacts
Answer: C. Poor performance due to its ephemeral nature
Ephemeral storage will have the performance of its overall storage type, so low performance isn’t an expected issue. Inadvertent exposure, malicious access, and loss of forensic artifacts are all concerns for ephemeral storage.
Which storage type is most likely to have remnant data issues in an environment in which the storage is reused for other customers after it is reallocated if it is not crypto-shredded when it is deallocated and instead is zero-wiped?
A. Ephemeral storage
B. Raw storage
C. Long-term storage
D. Magneto-optical storage
Answer: B. Raw storage
Raw storage provides direct access to a disk, and without crypto-shredding is likely to have remnant data on the disk after it is used. Since long-term and ephemeral storage is typically abstracted, it is less likely to have remnant data in unallocated or reallocated sectors that would not be purged through typical wipe operations.
Kathleen wants to perform data discovery across a large data set and knows that some data types are more difficult to perform discovery on than others. Which of the following data types is the hardest to perform discovery actions on?
A. Unstructured data
B. Semi-structured data
C. Rigidly structured data
D. Structured data
Answer: A. Unstructured data
Unstructured data is the most difficult to perform discovery against because the data is unlabeled and requires discovery to be done using searches or other techniques that can handle arbitrary data. Rigidly structured data is not a common description of a type of data for the purposes of the CCSP exam.
Isaac wants to filter events based on the country of origin for authentications. What log information should he use to perform a best-effort match for logins?
A. userID
B. IP address
C. Geolocation
D. MAC address
Answer: C. Geolocation
While it isn’t always perfectly accurate, geolocation (sometimes called geoIP) data attempts to identify the location of a given IP address. Isaac can use that data to attempt to match authentication events to a country of origin, although VPNs and other tools may obscure the actual login location for users.
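A best-effort geolocation filter might look like the sketch below. Everything in the lookup table is hypothetical: the networks are documentation ranges, the country codes "AA" and "BB" are placeholders, and a real deployment would rely on a maintained geoIP database rather than a hard-coded dictionary.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table using documentation address ranges; the country
# codes are placeholders and do not reflect real allocations.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "AA",
    ipaddress.ip_network("198.51.100.0/24"): "BB",
}


def country_for(ip: str) -> Optional[str]:
    """Return the placeholder country code for an IP, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE.items():
        if addr in network:
            return country
    return None


# Made-up authentication events.
events = [
    {"user": "isaac", "ip": "192.0.2.10"},
    {"user": "guest", "ip": "198.51.100.4"},
    {"user": "vpnuser", "ip": "203.0.113.9"},
]
from_aa = [e["user"] for e in events if country_for(e["ip"]) == "AA"]
print(from_aa)
```

The third event returns no country at all, a reminder that the match is best effort and that a VPN exit address would be attributed to the VPN's location rather than the user's.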
Charleen wants to use a data obfuscation method that allows realistic data to be used without the data being actual data associated with specific users or individuals. What data obfuscation method should she use?
A. Hashing
B. Shuffling
C. Randomization
D. Masking
Answer: B. Shuffling
While there may be some concerns about real data being used, Charleen’s goal is to have actual data for testing, making shuffling her best option. Hashing, randomization, and masking all remove or modify data in meaningful ways, resulting in testing not being as accurate as it might be against real-world data.
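Shuffling can be sketched as reordering one column across records so that real values stay in the data set but are no longer tied to their original rows. The function name shuffle_column, the seed parameter, and the sample rows are assumptions for the example.

```python
import random


def shuffle_column(rows: list[dict], column: str, seed: int = 0) -> list[dict]:
    """Return copies of rows with the values of one column shuffled."""
    values = [row[column] for row in rows]
    random.Random(seed).shuffle(values)  # seeded only to make test runs repeatable
    return [{**row, column: value} for row, value in zip(rows, values)]


# Made-up customer rows.
customers = [
    {"id": 1, "zip": "10001"},
    {"id": 2, "zip": "60601"},
    {"id": 3, "zip": "94105"},
    {"id": 4, "zip": "73301"},
]
shuffled = shuffle_column(customers, "zip")
print(shuffled)
```

The set of ZIP codes is unchanged, so tests still run against realistic values. A shuffle can leave some values in their original rows by chance, which is part of why concerns about using real data remain.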
Michelle wants to track deletion of files in an object storage bucket. What potential issue should she be aware of if her organization makes heavy use of object-based storage for storage of ephemeral files?
A. The logging may not be accurate.
B. Logging may be automatically disabled if too many events occur.
C. Creation and deletion events cannot be logged in filesystems.
D. The high volume of logging may increase operational costs.
Answer: D. The high volume of logging may increase operational costs.
Michelle should be aware that logging deletion events, like any other high-volume event, may incur additional costs for her organization. Since the question specifically mentions ephemeral files and heavy usage, this may be a more significant concern for her organization. Logs will show the relevant information, and there is nothing to indicate they would not be accurate. Logging is enabled or disabled by the account holder or owner, and creation and deletion event logging is supported by object-based filesystems.
Diana is outlining the labeling scheme her organization will use for their data. Which of the following is not a common data label?
A. Creation date
B. Data monetary value
C. Date of scheduled destruction
D. Confidentiality level
Answer: B. Data monetary value
Data’s confidentiality level is often contained in a label, but the monetary value is not a common data label. Creation and scheduled destruction date are also common data labels.
Susan wants to be prepared for legal holds. What organizational policy often accounts for legal holds?
A. Data classification policy
B. Retention policy
C. Acceptable use policy
D. Data breach response policy
Answer: B. Retention policy
Retention policies often include language that addresses legal holds because holds can impact retention practices and requirements. Data classification, acceptable use, and data breach response policies typically do not include legal hold language.