Chapter 2: Domain 2: Architecture and Design Flashcards

Question

Henry wants to follow the OWASP guidelines for key storage. Which of the following is not a best practice for key storage? A. Keys must be stored in plaintext to allow for access. B. Keys must be protected in both volatile and persistent memory. C. Keys stored in databases should be encrypted using key encryption keys. D. Keys should be protected in storage to ensure that they are not modified or changed inadvertently.

Answer 1

**Answer: A. Keys must be stored in plaintext to allow for access** Keys should never be stored in plaintext format and should instead be stored in a secure manner—typically encrypted in a hardware security module or other key vault.

Answer 2

**Answer: C. Share** While IRMs are useful through many of the phases of the cloud data lifecycle, Marco knows that sharing data is when an IRM is most heavily used to ensure that data is not inadvertently exposed or misused. Information Rights Management (IRM) refers to the design and implementation of technologies and strategies to control access to and usage of digital information, ensuring its confidentiality and security.

Answer 3

**Answer: C. The discovery process cannot be run against archival storage because it is not online under normal circumstances.**

Answer 4

**Answer: B. Semi-structured data** JSON is an example of semi-structured data. Other examples include CSV files, XML, NoSQL databases, and HTML files.
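As a small illustration of why JSON counts as semi-structured, the sketch below parses two records that carry field names but do not share a fixed schema. The records and field names are made up for this example.

```python
import json

# Two hypothetical records: each is self-describing, but the fields differ.
raw = """
[
  {"name": "Ana", "email": "ana@example.com"},
  {"name": "Raj", "phone": "555-0101", "tags": ["vip"]}
]
"""

records = json.loads(raw)

# Collect every field name seen across the records.
all_fields = sorted({key for record in records for key in record})
print(all_fields)
```

Each record describes its own fields, which is what separates it from unstructured data, yet no table definition forces the records to match.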

Answer 5

**Answer: B. Matching fields in one database to fields in another database** Data mapping is the process of matching fields in databases to allow them to be integrated or for purposes such as data migration.
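A minimal sketch of data mapping, assuming a hypothetical legacy table and target table whose column names differ. The mapping dictionary and field names are illustrative only.

```python
# Hypothetical field mapping: source column name -> target column name.
FIELD_MAP = {
    "cust_name": "full_name",
    "phone_no": "phone",
    "email_addr": "email",
}


def map_record(source_record: dict, field_map: dict) -> dict:
    """Return a new record keyed by target field names.

    Source fields without a mapping are dropped rather than guessed at.
    """
    return {
        target: source_record[source]
        for source, target in field_map.items()
        if source in source_record
    }


legacy_row = {
    "cust_name": "A. Example",
    "phone_no": "555-0100",
    "email_addr": "a@example.com",
    "fax": "n/a",  # no mapping defined, so it is not migrated
}
migrated_row = map_record(legacy_row, FIELD_MAP)
print(migrated_row)
```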

Answer 6

**Answer: C. Add labels to the file metadata.** Adding labels as part of the file’s metadata is a common practice, and metadata labels are less likely to be changed than labels included in the filename. Modifying the files themselves by adding data at the beginning or end of the file can cause issues with processing tools that may not be prepared to handle labeled data.

Answer 7

**Answer: D. The data is lost and Nina cannot recover it.** Key escrow and backup are incredibly important: once a key is lost, it cannot be recovered and the data should be considered lost. Generating new keys will not decrypt the data, the passphrase isn’t sufficient to recover keys, and hashing is not a form of encryption and cannot be reversed.

Answer 8

**Answer: D. Structured data** Madani knows that structured data is well-defined and organized, and will be the easiest to perform discovery actions on. It isn’t possible to perform discovery against encrypted data unless it is decrypted for the discovery process. Discovery against unstructured data is harder than discovery against semi-structured data in most cases.

Answer 9

**Answer: A. Documenting chain of custody** **Ashley is documenting the chain of custody for the image to ensure it can be used in court**. Repudiation might occur if the chain of custody was not properly documented. A legal hold does not necessarily require chain-of-custody documentation, although organizations may choose, or be required, to maintain it. Forensic accounting is a term used to describe financial investigations.

Answer 10

**Answer: D. Mapping data** Data mapping is a term used to describe matching fields in databases to allow data migration or integration. Data classification policies do identify classification levels, assign responsibilities, and define roles.

Answer 11

**Answer: B. Automated labeling at the data creation stage** Labeling data at creation ensures that it can be properly handled through the rest of its lifecycle. Automated labeling is preferable where possible to avoid human error and to accommodate the volume of data that most organizations create.

Answer 12

**Answer: A. Anonymize access using the key** Anonymizing access using a key removes the ability to provide accountability and works against organizational best practices. Identifying the key, the user, and when and how it is used supports accountability for usage.

Answer 13

**Answer: B. Crypto-shredding** Crypto-shredding is the only viable option in environments controlled or managed by a third-party organization in most circumstances. Physical destruction is not permitted or supported by third-party providers, nor is degaussing, which will also not work on many modern drives. Overwriting may be appealing, but remnant data is an issue with SSDs and volumes that are dynamically allocated.

Answer 14

**Answer: B. He will need two distinct databases.** **Tokenization relies on two distinct databases, one with actual data and one with tokenized data**. Token servers then pull the data the token represents from the real data database when needed. This process does not require encryption, specify FIPS 140 requirements, or involve de-identification practices.
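The two-store idea can be sketched as follows. This is a toy illustration, not a production token server: the `TokenVault` class, the `orders` list, and the use of random tokens are assumptions made for the example.

```python
import secrets


class TokenVault:
    """Toy token server backed by the store that holds the real data."""

    def __init__(self) -> None:
        self._vault = {}  # token -> real value

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)  # random, not derived from the value
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Pull the data the token represents from the real-data store.
        return self._vault[token]


vault = TokenVault()

# Second store: the application keeps only the token, never the card number.
orders = [{"order_id": 1, "card": vault.tokenize("4111111111111111")}]

print(orders[0]["card"])                    # an opaque token
print(vault.detokenize(orders[0]["card"]))  # the original value
```

Because the token is random rather than computed from the card number, the `orders` store alone reveals nothing about the real value.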

Answer 15

**Answer: A. Data discovery to identify sensitive data** With the source database ready and a tokenization database prepared to be populated, the next step is to identify which data should be tokenized. Not all data is sensitive, and thus not all data needs to be tokenized. Once you know what data will be tokenized, you can tokenize it, sometimes by hashing the data. Randomization is not part of this process.

Answer 16

**Answer: C. Dataflow diagrams** Dataflow diagrams are a critical part of organizational understanding of how data is created, moves, and is used throughout an organization. They often include details like ports, protocols, data elements and classification, and other details that can help you understand not only where data is, but how it gets there and what data is in use.

Answer 17

**Answer: A. Data Mapping** Annie should map the data between the database fields and then use the mappings to help her sort and manage data as needed. Data labeling is used to tag data with important information like classification, creation date or time, or other elements. Column consolidation and columnar aggregation were made up for this question.

Answer 18

**Answer: D. Data Lifespan Information** Data lifespan is not a typical entry in a dataflow diagram since they focus on flows, not policies. Data types, fields or names, services, systems, ports, protocols, and security details are all commonly included in dataflow diagrams.

Answer 19

**Answer: C. Encrypt the drive or volume at creation** Encrypting the drive or volume at creation ensures that any data written to the drive or volume through its lifespan will be encrypted and that destruction of the encryption key will result in secure destruction of the data.

Answer 20

**Answer: C. Unstructured Data** **Unstructured data does not have the structure required to be easily stored in a database.** Semi-structured and structured data is easier to store in a relational database since it has descriptors and elements that make it easier to map into fields.

Answer 21

**Answer: A. Retention periods, regulatory compliance requirements, data classification, data deletion and lifespan, and archiving and retrieval procedures.** Retention policies often include retention periods, regulatory and compliance requirements, data classification impacts on retention, how and when data should be deleted, and archiving and retrieval processes. In addition, Olivia may want to include monitoring, maintenance, and enforcement.

Answer 22

**Answer: C. Check the IRM's certificate revocation list** An IRM needs to maintain a certificate revocation list that allows users and applications to validate certificates, just like any other service that uses certificates. That means that Hui should be able to check to see if the revoked certificate is in the list to validate her actions. Issuing a new certificate using the same information, accessing the data using her own certificate, or deleting the private keys will not invalidate the existing certificate.

Answer 23

**Answer: C. A dataflow Diagram** Susan should prepare a dataflow diagram to share with her application developers and cloud architects to make sure that the application and service environment is correctly documented. Data mapping matches fields in databases, business impact analysis work assesses the importance of data to an organization’s work, and data classification describes data based on things like sensitivity, jurisdiction, or criticality.

Answer 24

**Answer: D. Provisioning** **The provisioning capability of IRM systems focuses on providing rights to individuals based on roles and job functions**. Tagging and data labeling are used to ensure that data is handled appropriately based on rules. Encryption secures data at rest and in transit.

Answer 25

**Answer: A. Criticality** Business impact analysis helps to determine what data is needed to continue the operations of the business. This is an assessment of the criticality of the data, and Randy knows that the data he flags may be necessary in the event of a disaster or other business continuity issue.

Answer 26

**Answer: C. An outage at a provider may result in data not being available.** The most common risk, and thus the most impactful risk in this list, is the potential for a provider outage to result in the inability to access dispersed data.

Answer 27

**Answer: B. The need to install an agent on endpoint devices** Installation of a local agent is typically required by IRM systems to ensure data is properly handled on endpoint systems. The question doesn’t specify the use of a server either in the cloud or locally, and the endpoints mentioned are not cloud-based.

Answer 28

**Answer: A. Unstructured data** Unstructured data includes data like images, audio, video, word processing files, and other data that does not have a formally defined structure like structured data does. There’s no information that indicates that this is sensitive information, and there isn’t any mention of labeling in the question.

Answer 29

**Answer: B. Ingress and egress fees for moving the data between cloud services.** Ingress and egress fees are often some of the highest expenses involved in this effort. Since the data is already contained in a storage tier, no new expenses should arise. Logs are text-based and can be stored efficiently, so it is unlikely that log files will be a major cost driver.

Answer 30

**Answer: C. Make a copy of the original image, validate it, and then analyze the copy.** Amanda knows that the original drive should be preserved, and actions should only be taken on a forensic copy. Working with the original, regardless of the process, is not a forensic best practice.

Answer 31

**Answer: A. Preventing taking a screenshot or photo of the displayed text.** While IRM systems can control actions on the system, taking a photo (or even a screenshot) of displayed text is typically outside of their capabilities. Preventing copying, printing, and making copies are all common IRM features.

Answer 32

**Answer: B. Regulatory requirements will vary across different locations and may make compliance difficult during discovery.** **Regulatory compliance is important for organizations, and Felix needs to point out that discovery across multiple countries and regions may involve a complex set of regulations to comply with.** Costs may vary in different regions, but that is not the primary concern Felix needs to raise. There isn’t any mention in the question of whether data is structured or not, and while encryption import and export may be covered by law, discovery can be conducted using local tools in each region if necessary.

Answer 33

**Answer: B. Archive the data to a lower-cost storage tier** Archiving data to a lower-cost, and typically lower-performance, storage tier is a common strategy for cloud data retention. Examples like Amazon’s Glacier allow for long-term storage with infrequent or slow access and may come with additional costs for retrieval, but optimize costs when data is unlikely to be accessed. Deleting the data does not allow it to be used, moving to a third-party service provider is more complex, and moving to a higher-performance storage tier does not meet his cost needs.

Answer 34

**Answer: Use** The Use phase of the data lifecycle often includes modification of data, and thus will require labels to change or be added.

Answer 35

**Answer: D. TLS** TLS, or Transport Layer Security, is the encryption protocol of choice for web application traffic. It replaced SSL. Both MD5 and SHA-1 are hashing algorithms, not encryption.
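For illustration, a client-side TLS configuration might look like the sketch below, which uses Python's standard `ssl` module. Setting TLS 1.2 as the floor is an assumption made for this example, not a requirement stated in the answer.

```python
import ssl

# Default client context: certificate verification and hostname checks are on.
context = ssl.create_default_context()

# Refuse legacy protocol versions such as SSLv3 and TLS 1.0/1.1.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```

A context like this could then be passed to a client such as `http.client.HTTPSConnection` through its `context` argument.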

Answer 36

**Answer: A. Data Dispersion** Data Dispersion best fits this description, and bit splitting is a form of data dispersion, although it is one that is more frequently associated with malicious use to avoid forensic analysis and investigations.

Answer 37

**Answer: C. Continue to preserve the data to meet the legal hold requirements.** Legal holds normally take precedence over other deletion requirements. While Gurleen should check with her organization’s legal counsel, in general, she should continue to preserve the data.

Answer 38

**Answer: D. The storage will be wiped and reclaimed by the provider to be allocated elsewhere.** Ephemeral storage that is associated with instances in most cloud providers’ infrastructure is wiped and reclaimed for reallocation when instances are terminated. If the instance were merely shut down, the storage would be retained for when the system was reactivated.

Answer 39

**Answer: A. 45 days** Ephemeral data is often kept for shorter time periods like 45 days, a time period sufficient to allow investigations without building up large volumes of data that will not be used and which can be expensive to store. Longer-term storage may be required by law or by specific contractual requirements.

Answer 40

**Answer: A. Destroy all copies of the encryption key.** While it may seem quite simple, securely erasing all copies of the encryption key is all that it takes to complete the destruction process for crypto-shredding.
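The idea can be sketched with a toy example. The XOR one-time-pad cipher below is only a stand-in so the example runs with the standard library; a real system would use a vetted cipher and a key management service, and `key_store` is a hypothetical name.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (toy one-time-pad cipher)."""
    if len(data) != len(key):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"customer record 42"
key_store = {"volume-1": secrets.token_bytes(len(plaintext))}  # hypothetical key store
ciphertext = xor_bytes(plaintext, key_store["volume-1"])

# While the key exists, the data is recoverable.
recovered = xor_bytes(ciphertext, key_store["volume-1"])

# Crypto-shredding: destroy every copy of the key and leave the ciphertext in place.
# Note: `del` only drops the reference; real key destruction needs secure erasure.
del key_store["volume-1"]
```

After the key is gone, the ciphertext remains on disk but can no longer be decrypted, which is why the approach works on storage the customer cannot physically destroy.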

Answer 41

**Answer: A. Service outages** Service outages are the only direct threat to availability on this list. Lincoln should review the service provider’s history of service outages and issues as he makes decisions about adopting the service.

Answer 42

**Answer: D. A certificate** Information Rights Management Systems typically rely on certificates to identify systems. They can be issued centrally and managed, as well as used for digital signatures and encryption.

Answer 43

**Answer: C. Hashing** Derek’s organization is using hashing, which uses a one-way cryptographic function to replace data with values that can be referenced without exposing the actual data. Anonymization focuses on removing data that can be associated with specific users or individuals, masking uses alternate characters to conceal data, and shuffling switches data around while retaining actual data for testing.
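The differences between hashing, masking, and shuffling can be sketched as follows. The helper names, the sample card number, and the salary column are all made up for illustration, and the unsalted SHA-256 call is a simplification.

```python
import hashlib
import random


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Replace all but the last `visible` characters with `char`."""
    if visible <= 0:
        return char * len(value)
    return char * max(len(value) - visible, 0) + value[-visible:]


def shuffle_column(values: list, seed: int = 0) -> list:
    """Return the same values randomly reordered (seeded for repeatability)."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


card = "4111111111111111"

# Hashing: a one-way reference value that stands in for the data.
hashed_card = hashlib.sha256(card.encode()).hexdigest()

# Masking: alternate characters conceal most of the value.
masked_card = mask(card)  # "************1111"

# Shuffling: real values are kept but detached from their original rows.
salaries = [52000, 61000, 75000, 88000]
shuffled_salaries = shuffle_column(salaries)
```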

Answer 44

**Answer: A. Long-term storage** **Long-term storage is storage that is intended to continue to exist and is often used for logs or data storage.** Storage that is associated with an instance that will be destroyed when the instance is shut down is ephemeral storage. Raw storage is storage that you have direct access to like a hard drive or an SSD that has access to the underlying device. Volume-based storage is the storage allocated as a virtual drive or device within the cloud.

Answer 45

**Answer: C. Use a second instance in the original provider’s cloud for the backup system** OWASP’s Secrets Management Cheat Sheet describes three main requirements for “break-glass” secrets backup environments: ensuring automated backups are in place and executed regularly based on the number of secrets and their lifecycle; frequently testing the restore procedures; and encrypting backups and placing them on secure, monitored storage.

Answer 46

**Answer: B. Use standard test secrets and search for them.** Using standard test secrets makes them easier to detect if they are in an exposed location. Since Alaina specifically has an internal test environment, this is the right location to use standard secrets. Secrets managers should be used in all locations, but they won’t help with identifying issues with secrets exposure. Multiple utilities may be tempting, but they add overhead and additional work where a properly configured and tested utility will provide sufficient detection in most cases. High entropy secrets are useful, but secrets with a known and consistent format can more easily be detected and thus problems with exposure can be detected.
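A minimal sketch of why a known, consistent format helps detection. The `TESTSECRET_` prefix and the scanned text are hypothetical; a real environment would define its own test-secret format.

```python
import re

# Hypothetical standard format for test secrets: a fixed prefix plus 8 characters.
TEST_SECRET_PATTERN = re.compile(r"TESTSECRET_[A-Z0-9]{8}")


def find_test_secrets(text: str) -> list:
    """Return every string in `text` that matches the standard test-secret format."""
    return TEST_SECRET_PATTERN.findall(text)


config_dump = """
db_password = TESTSECRET_AB12CD34
api_url = https://internal.example.test/api
token = TESTSECRET_ZZ99YY88
"""

leaks = find_test_secrets(config_dump)
print(leaks)
```

A single simple pattern finds every exposed test secret, whereas high-entropy secrets with no common format would need heuristic detection.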

Answer 47

**Answer: D. Retrieval time** Retrieval time, the amount of time before data can be accessed, is a primary driver for the cost of archival storage in cloud environments. The size of the data in the question is fixed, so Angie needs to figure out what other options she can control. The type and sensitivity of the data do not impact costs but may impact practices that Angie will put in place to secure and manage it.

Answer 48

**Answer: D. Data mapping** The figure shows a data mapping process between two database tables matching names, phone numbers, and email addresses. String comparison compares two strings to determine if they are the same. Field hashing and table matching were made up for this question.

Answer 49

**Answer: B. Provisioning** Provisioning for IRM systems gives users the rights and permissions that they need to access files they have rights to. Tagging and labeling help to mark files based on data classification guidelines, allowing the IRM to apply rules appropriately, and data mapping maps fields in databases to each other.

Answer 50

**Answer: B. Data access patterns** Data access patterns are one of the most important things to understand before selecting archival storage. Archival storage classes and services often focus on inexpensive, low-frequency, slow access, but other options can include more dynamic access capabilities, often at a higher cost since the storage needs to be capable of higher performance. The cost of the storage is just that: cost, not performance. The volume of the data, and the amount of time that the data will be stored for, also influence cost, not performance.

Answer 51

**Answer: B. The data owner** Data owners are defined by data classification policies and hold overall responsibility for the data that they own. Data custodians are responsible for the data, ensuring access control, proper storage, and other operational controls. Data processors are often third parties who process the data as part of a business process, and data users are end users who use the data for their jobs.

Answer 52

**Answer: C. Tokenized data is only a concern if the database it is matched with is also exposed.** Tokenized data is only a concern if the database with the original data that it references is also exposed. Tokens are not encrypted, and while hashing is used in many tokenization processes, hashes are a one-way function and should have additional transformations performed to ensure that simply hashing data to get matches will not succeed.

Answer 53

**Answer: A. Enable versioning.** Versioning tools like those found in Amazon’s S3 environment allow you to revert to a previous version if an inadvertent change occurs, if data is corrupted, or if the file is deleted. This can take up significant amounts of additional space in some circumstances, so Asha should carefully consider where and when she enables versioning in her cloud environment. Daily backups, recurring snapshots, or archiving processes will take longer to restore from in most cases and may allow for multiple changes before backup, snapshot, or archiving occurs.

Answer 54

**Answer: C. Maximize the rights of important secrets to reduce the total number of secrets required.** The concept of least privilege is used for secrets too, and maximizing the number of rights a secret provides instead of minimizing it is a bad idea. Brian should instead seek to limit the rights a given secret provides to constrain the impact of a potential breach or misuse.

Answer 55

**Answer: B. Enable access logging.** Access logging will allow Chris to monitor for access to specific buckets by individuals, including privileged accounts. Authentication logging won’t show access, nor will bandwidth logging. Timestamps should be on by default for any log event.

Answer 56

**Answer: C. Content-based discovery** Jaime is performing content-based discovery by searching for specific terms in unstructured data. Label-based discovery would rely on data labels, metadata-based discovery uses metadata information, and structure-based discovery was made up for this question.
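Content-based discovery over unstructured text can be sketched as a term search. The search terms and sample documents below are invented for illustration.

```python
import re

# Hypothetical search terms; a real deployment would use its own dictionary.
TERMS = re.compile(r"\b(confidential|salary|passport)\b", re.IGNORECASE)

documents = {
    "memo.txt": "Please treat the attached salary review as CONFIDENTIAL.",
    "lunch.txt": "Team lunch is moved to Thursday.",
}


def discover(docs: dict) -> dict:
    """Return {document name: sorted matching terms} for documents with hits."""
    hits = {}
    for name, text in docs.items():
        found = sorted({match.lower() for match in TERMS.findall(text)})
        if found:
            hits[name] = found
    return hits


results = discover(documents)
print(results)
```

The search inspects the content itself and ignores labels and metadata, which is what distinguishes this approach from the other discovery types.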

Answer 57

**Answer: B. Immediately revoke the certificate and add it to the certificate revocation list.** Mike should immediately revoke the certificate so that any malicious use will be limited. He may then want to determine why the certificate was exposed and issue a new certificate to ensure that the user or owner can continue to perform their job.

Answer 58

**Answer: A. Tag data with its sensitivity level.** Tagging data will help the DLP manage it appropriately without having to rely on pattern matching. Reducing the overall amount of data may help, but Dan’s first priority should be using more effective methods. Classifying data helps, but only if you tag it or otherwise help the DLP to easily manage it. Regular expressions are part of pattern matching. Refining them may help, but the underlying issue of relying on the most challenging matching mode will remain.

Answer 59

**Answer: C. A unique tag for each system instance** Tags are an important tool when working with ephemeral systems. While IP addresses may be reused and administrative accounts are likely to be the same across systems, tags can be unique, allowing events to be tracked to an instance. The system’s deletion time should be logged, as should the time it is instantiated, but this obviously wouldn’t be in every log event created by the machine.
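A small sketch of the tagging idea, assuming a hypothetical `Instance` class that stamps a generated tag on every log event it emits.

```python
import uuid


class Instance:
    """Hypothetical ephemeral instance that stamps its own tag on every event."""

    def __init__(self, ip_address: str) -> None:
        self.ip_address = ip_address
        self.tag = f"inst-{uuid.uuid4().hex}"  # unique per instance

    def log_event(self, message: str) -> dict:
        return {"instance_tag": self.tag, "ip": self.ip_address, "message": message}


first = Instance("10.0.0.5")
second = Instance("10.0.0.5")  # the same IP address reused by a later instance

events = [first.log_event("boot"), second.log_event("boot")]
for event in events:
    print(event)
```

Both events carry the same IP address, so only the tag tells the two instances apart.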

Answer 60

**Answer: B. Preserve the data requested until the legal hold is over.** Legal holds typically override other agreements and lifecycles. Valerie should preserve only the data requested or covered by the legal hold, and she should continue with normal practices for the remaining data.

Answer 61

**Answer: A. Establish DLP policies.** Heikki’s next step will be to set up policies that apply appropriate controls to the data that he has categorized and labeled. Once those policies are set, he can train users and run the DLP.

Answer 62

**Answer: A. Metadata-based discovery** Kara is performing metadata-based discovery using the metadata that already exists in most digital photos. Content-based discovery might compare the photos looking for specific subjects, and label- or tag-based discovery would leverage data labels.

Answer 63

**Answer: D. The tokenized data can be looked up against a database that contains customer data.** Rick knows that the tokenized data can be looked up against a database that contains the original customer data, allowing it to be accessed as needed. The actual customer data is stored in a separate database, meaning that a breach of the token database would not result in a customer data breach. The customer data is more secure but could still be breached if attackers leveraged the tokens and their lookup capability.

Answer 64

**Answer: B. Check hashes of machine instances against a hash of the original.** Hashes can be used to validate whether an image has been modified, but it is important to note that a running machine will have a different hash than the original. Jack may need to check specific file hashes instead. Cloud provider logs may not show changes to an instance, timestamps are time-consuming to check and may be modified when files are copied, and rebuilding the machine will typically result in differences.
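Checking specific file hashes against a baseline can be sketched as below. The file paths and contents are placeholders, and SHA-256 is an assumption; the answer does not name a hash algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents captured from the original image.
original_files = {"/etc/app.conf": b"mode=prod\n", "/usr/bin/app": b"binary-v1"}
baseline = {path: sha256_hex(content) for path, content in original_files.items()}


def changed_files(current: dict, baseline_hashes: dict) -> list:
    """List paths whose current hash differs from, or is missing in, the baseline."""
    return sorted(
        path
        for path, content in current.items()
        if baseline_hashes.get(path) != sha256_hex(content)
    )


running_instance = {"/etc/app.conf": b"mode=debug\n", "/usr/bin/app": b"binary-v1"}
print(changed_files(running_instance, baseline))
```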

Answer 65

**Answer: B. Dispersion** Dispersion is the concept of ensuring that data is in multiple locations so that a single failure, event, or loss cannot result in the destruction or loss of the data. Deduplication involves removing duplicates from a data set. Protecting the supply chain and lifecycle management, which is how data is managed throughout its life, do not specifically describe where data is stored for redundancy.

Answer 66

**Answer: A. The need to ensure regulatory compliance** When data is created and used in other countries, it may be subject to regulations specific to those countries. Christina should point out that regulatory compliance is critical and could prove complex in this scenario. Exposure issues should be accounted for through best practices like using encryption for all transfers. Transfer speeds for archival data are typically not as critical as for operational data, and permissions should be set to be appropriate for the data regardless of location.

Answer 67

**Answer: D. Use** The Use stage occurs after Classification and before Transfer or Archiving.

Answer 68

**Answer: D. Two different files can be hashed to the same output.** Ben knows that hashes are one-way functions and cannot be reversed. They generate fixed-length output from variable-length input, and identical files will generate identical output. Two different files should never generate the same output; this is known as a collision and is not acceptable in a hashing algorithm.
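These properties are easy to observe with Python's `hashlib`. SHA-256 is used here only as an example algorithm.

```python
import hashlib

short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 10_000).hexdigest()
repeat_digest = hashlib.sha256(b"a").hexdigest()

# Fixed-length output from variable-length input: both are 64 hex characters.
print(len(short_digest), len(long_digest))

# Identical input always produces identical output.
print(short_digest == repeat_digest)

# Different inputs are expected to produce different output.
print(short_digest != long_digest)
```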

Answer 69

**Answer: C. Share** IRM can be particularly useful during the sharing stage of the cloud data lifecycle since information rights management tools can ensure privileges and access to data are appropriately managed as data moves around the organization and potentially leaves it.

Answer 70

**Answer: A. Storage** Storage occurs after creation in the cloud data lifecycle. Labeling is typically done as part of the creation process. Provisioning may be done at any time but is often associated with use. Encryption is not a stage in the cycle.

Answer 71

**Answer: A. Anonymization** Naomi has attempted to anonymize the data by removing personally identifiable data. Hashes use a one-way cryptographic function to reference data without having the data itself exposed or in use. Shuffling moves data around so it is not associated with its original entries, but allows actual data to be used for testing. Tokenization uses two databases with data mapped to reference items known as tokens, allowing tokens to be referenced and then the data that those tokens refer to to be accessed.

Answer 72

**Answer: B. Data value** Data value is typically not included in a data retention policy. Instead, retention periods, compliance requirements, and data classification are included as well as lifecycle requirements and archiving and retrieval procedures.

Answer 73

**Answer: C. Raw storage** Raw storage is storage that you have direct access to like a hard drive or an SSD that has access to the underlying device. Long-term storage is storage that is intended to continue to exist and is often used for logs or data storage. Storage that is associated with an instance that will be destroyed when the instance is shut down is ephemeral storage. Volume-based storage is storage allocated as a virtual drive or device within the cloud.

Answer 74

**Answer: C. They will need to open the files using tools that support IRM.** IRM tools require that client devices and applications support IRM or that they access files via web applications that can provide IRM controls. Fortunately, SharePoint users tend to use Microsoft Office heavily, meaning that many, if not most, of the files will be opened using tools that natively support IRM.

Answer 75

**Answer: B. Extract and catalog metadata.** Using existing metadata will help Angelo with his data discovery process, so he should start with that. Once he has the metadata processed, he can use it to speed up other cataloging efforts like scanning for sensitive data, classifying data, and mapping the data to compliance requirements.

Answer 76

**Answer: B. Raw storage** This is an example of raw storage, which directly presents underlying hardware to users.


(100 cards)