14. Related Technologies - DONE Flashcards

1
Q

What is “Big Data”?

A

The term “big data” refers to extremely large data sets from which you can derive valuable information. Big data technologies can handle volumes of data that traditional data-processing tools are simply unable to manage. You can’t go to a store and buy a big data solution, and big data isn’t a single technology. It refers to a set of distributed collection, storage, and data-processing frameworks.

According to Gartner, “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.”

2
Q

The CSA refers to the qualities from the Gartner quote as the “Three Vs.” Let’s define those now:

A

*High Volume - A large amount of data in terms of the number of records or attributes
*High Velocity - Fast generation and processing of data (such as real-time or data stream)
*High Variety - Structured, semistructured, or unstructured data

The Three Vs of big data make it very practical for cloud deployments, because of the attributes of elasticity and massive storage capabilities available in Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) deployment models. Additionally, big data technologies can be integrated into cloud-computing applications.

3
Q

Big data systems are typically associated with three common components, which together cover how data gets collected, stored, and processed:

A

*Distributed data collection = This component refers to the system’s ability to ingest large volumes of data, often as streamed data. Ingested data could range from simple web clickstream analytics to scientific and sensor data. Not all big data relies on distributed or streaming data collection, but it is a core big data technology.

*Distributed storage = This refers to the system’s ability to store large data sets in distributed file systems (such as Google File System, Hadoop Distributed File System, and so on) or databases (such as NoSQL). NoSQL (Not only SQL) is a nonrelational distributed and scalable database system that works well in big data scenarios and is often required because of the limitations of nondistributed storage technologies.

*Distributed processing = Tools and techniques can distribute processing jobs (such as MapReduce, Spark, and so on) for the effective analysis of data sets that are so massive and rapidly changing that single-origin processing can’t effectively handle them.
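
As a study aid, the MapReduce idea can be sketched in a few lines of Python. This is only an illustration of the map, shuffle, and reduce steps on one machine, with threads standing in for cluster nodes. The function names and sample chunks are made up for this card and are not taken from Hadoop or Spark.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=2):
    # Each chunk is mapped independently, which is what lets a real
    # framework spread the work across many nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


# Hypothetical input: two chunks that could live on two different nodes.
chunks = [["big data big"], ["data velocity"]]
counts = word_count(chunks)
```

The point to remember is that the map step needs no shared state, so it scales out, while the shuffle and reduce steps bring the partial results back together.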

4
Q

You know that big data is a framework that uses multiple modules across multiple nodes to process high volumes of data with a high velocity and high variety of sources. This makes security and privacy challenging when you’re using a patchwork of different tools and platforms.

This is a great opportunity to discuss how security basics can be applied to technologies with which you may be unfamiliar, such as big data.

A

At its most basic level, you need to authenticate, authorize, and audit (AAA) least-privilege access to all components and modules in the Hadoop environment. This, of course, includes everything from the physical layer all the way up to the modules themselves.

For application-level components, your vendor should have their best practices documented (for example, Cloudera’s security document is roughly 500 pages long) and should quickly address any vulnerabilities with patches. Only after these AAA basics are addressed should you consider encryption requirements, both in-transit and at-rest as required.
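
To make the AAA idea concrete, here is a minimal sketch of authenticate, authorize, and audit with deny-by-default permissions. Everything in it is an assumption for illustration: the user name, the password, the resource string, and the in-memory stores. A real deployment would rely on the platform's own identity and logging services.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2 (never store plain passwords)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical in-memory stores, for illustration only.
SALT = os.urandom(16)
CREDENTIALS = {"analyst": hash_password("example-only", SALT)}
PERMISSIONS = {"analyst": {("hdfs:/reports", "read")}}  # least privilege
AUDIT_LOG = []


def authenticate(user: str, password: str) -> bool:
    expected = CREDENTIALS.get(user)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the hash matched.
    return hmac.compare_digest(expected, hash_password(password, SALT))


def authorize(user: str, resource: str, action: str) -> bool:
    # Anything not explicitly granted is denied.
    return (resource, action) in PERMISSIONS.get(user, set())


def access(user: str, password: str, resource: str, action: str) -> bool:
    allowed = authenticate(user, password) and authorize(user, resource, action)
    # Audit every attempt, including the denied ones.
    AUDIT_LOG.append(
        {"user": user, "resource": resource, "action": action, "allowed": allowed}
    )
    return allowed
```

Note that denied attempts are logged too. The audit trail is only useful for detection if failures show up in it.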

5
Q

Data Collection

A

When data is collected, it will likely go through some form of intermediary storage device before it is stored in the big data analytics system. Data in this device (virtual machine, instance, container, and so on) will also need to be secured, as discussed in the previous section. Intermediary storage could be swap space (held in memory).

Your provider should have documentation available for customers to address their own security requirements.

6
Q

EXAM TIP

A

All components and workloads required of any technology must have secure AAA in place. This remains true when underlying cloud services are consumed to deliver big data analytics for your organization. An example of a cloud-based big data system could consist of processing nodes running in instances that collect data in volume storage.

7
Q

Key Management

A

If encryption at rest is required as part of a big data implementation (everything is risk-based, after all), implementation may be complicated by the distributed nature of nodes. For data at rest, encryption capabilities in a cloud environment will likely be defined by a provider’s ability to expose appropriate controls to secure data, and this includes key management. Key management systems need to be able to support distribution of keys to multiple storage and analysis tools.

8
Q

Security Capabilities

A

CSP controls can be used to address your security requirements for the services (such as object storage) that you consume as part of your big data implementation. If you need your data to be encrypted, see if your cloud provider can do that for you. If you need very granular access control, see if the provider’s service includes it.

The details of the security configuration of these services and controls should be included in your security architecture.

9
Q

Identity and Access Management

A

As mentioned, authorization and authentication are the most important controls. You must ensure that they are done correctly. In your cloud environment, this means starting with ensuring that every entity that has access to the management plane is restricted based on least-privilege principles.

Moving from there, you need to address access to the services that are used as part of your big data architecture.

Finally, all application components of the big data system itself need to have appropriate access controls established.

Considering the number of areas where identity and access management (IAM) must be implemented (cloud platform, services, and big data tool level), entitlement matrices can be complicated.
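
One way to picture an entitlement matrix is as a nested mapping of role, then layer, then allowed actions. The sketch below is an assumption-laden illustration: the roles, layer names, and actions are invented for this card and do not come from the CSA Guidance or any provider.

```python
# Hypothetical roles and layers. The three layers mirror the card:
# cloud platform (management plane), services, and the big data tool.
ENTITLEMENTS = {
    "data_engineer": {
        "management_plane": {"read"},
        "object_storage": {"read", "write"},
        "processing_cluster": {"submit_job"},
    },
    "analyst": {
        "management_plane": set(),
        "object_storage": {"read"},
        "processing_cluster": {"submit_job"},
    },
}


def is_entitled(role: str, layer: str, action: str) -> bool:
    """Deny by default: unknown roles, layers, or actions are refused."""
    return action in ENTITLEMENTS.get(role, {}).get(layer, set())


def matrix_rows(entitlements: dict) -> list:
    """Flatten the matrix into (role, layer, actions) rows for review."""
    rows = []
    for role in sorted(entitlements):
        for layer in sorted(entitlements[role]):
            rows.append((role, layer, sorted(entitlements[role][layer])))
    return rows
```

Even with two roles and three layers there are six rows to review, which hints at why the matrix grows complicated as components and cloud services are added.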

10
Q

PaaS benefits

A

Cloud providers may offer big data services as a PaaS. Numerous benefits can be associated with consuming a big data platform instead of building your own. Cloud providers may implement advanced technologies, such as machine learning, as part of their offerings.

11
Q

PaaS risks

A

You need to have an adequate understanding of potential data exposure, compliance, and privacy implications. Is there a compliance exposure if the PaaS vendor employees can technically access enterprise data? How does the vendor address this insider threat? These are the types of questions that must be addressed before you embrace a big data PaaS service.

Risk-based decisions must be made and appropriate security controls implemented to satisfy your organizational requirements.

12
Q

Internet of Things (IoT)

A

The Internet of Things includes everything in the physical world, ranging from power and water systems to fitness trackers, home assistants, medical devices, and other industrial and retail technologies.

Beyond these products, enterprises are adopting IoT for applications such as the following:

*Supply chain management
*Physical logistics management
*Marketing, retail, and customer relationship management
*Connected healthcare and lifestyle applications for employees and consumers

13
Q

The following cloud-specific IoT security elements are identified in the CSA Guidance:

A

*Secure data collection and sanitization = This could include, for example, stripping code of sensitive and/or malicious data.

*Device registration, authentication, and authorization = One common issue encountered today is the use of stored credentials to make direct API calls to the backend cloud provider. There are known cases of attackers decompiling applications or device software and then using those credentials for malicious purposes.

*API security for connections from devices back to the cloud infrastructure = In addition to the stored credentials issue just mentioned, the APIs themselves could be decoded and used for attacks on the cloud infrastructure.

*Encrypted communications = Many current devices use weak, outdated, or nonexistent encryption, which places data and the devices at risk.

*Ability to patch and update devices so they don’t become a point of compromise = Currently, it is common for devices to be shipped as-is, and they never receive security updates for operating systems or applications. This has already caused multiple significant and highly publicized security incidents, such as massive botnet attacks based on compromised IoT devices.
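
The "secure data collection and sanitization" element can be illustrated with a small allow-list validator for incoming device readings. This is a sketch under stated assumptions: the field names (`device_id`, `temperature_c`), the ID format, and the temperature range are hypothetical and chosen only for the example.

```python
import re

# Hypothetical rule: device IDs are short and alphanumeric
# (plus dash and underscore).
DEVICE_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")


def sanitize_reading(payload: dict) -> dict:
    """Return a cleaned copy of a device reading, or raise ValueError."""
    device_id = payload.get("device_id")
    if not isinstance(device_id, str) or not DEVICE_ID_PATTERN.fullmatch(device_id):
        raise ValueError("invalid device_id")

    temperature = payload.get("temperature_c")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        raise ValueError("temperature_c must be a number")
    if not -50.0 <= temperature <= 150.0:
        raise ValueError("temperature_c out of range")

    # Only allow-listed fields survive; everything else is dropped.
    return {"device_id": device_id, "temperature_c": float(temperature)}
```

The design choice worth remembering is allow-listing: the function copies only the fields it has validated instead of trying to strip out known-bad content.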

14
Q

Mobile Computing

A

Companies don’t require cloud services to support mobile applications, but still, many mobile applications are dependent on cloud services for backend processing. Mobile applications leverage the cloud not only because of its processing power capabilities for highly dynamic workloads but also because of its geographic distribution.

15
Q

The CSA Guidance identifies the following security issues for mobile computing in a cloud environment:

A

*Device registration, authentication, and authorization are issues for mobile applications, as they are for IoT devices, especially when stored credentials are used to connect directly to provider infrastructure and resources via an API. If an attacker can decompile the application and obtain these stored credentials, they will be able to manipulate or attack the cloud infrastructure.

*Any application APIs that are run within the cloud environment are also listed as a potential source of compromise. If an attacker can run local proxies that intercept these API calls, they may be able to inspect the likely unencrypted API traffic and explore it for security weaknesses. Certificate pinning/validation inside the application may help mitigate this risk.

16
Q
A

The Open Web Application Security Project (OWASP) defines pinning as “the process of associating a host with their expected X509 certificate or public key. Once a certificate or public key is known or seen for a host, the certificate or public key is associated or ‘pinned’ to the host.”
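
The pinning check itself can be sketched as a fingerprint comparison. The example below pins the SHA-256 hash of a whole DER-encoded certificate. That is one possible approach, and pinning the public key instead is equally valid under the OWASP definition. The helper names are hypothetical. In a real Python client the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`, which is left out here so the sketch runs without a network connection.

```python
import hashlib
import hmac


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint (hex) of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def is_pinned(der_cert: bytes, pinned_fingerprints: set) -> bool:
    """True only if the presented certificate matches a stored pin."""
    presented = fingerprint(der_cert)
    # compare_digest keeps each comparison constant-time.
    return any(hmac.compare_digest(presented, pin) for pin in pinned_fingerprints)


# Placeholder bytes stand in for a real certificate in this illustration.
expected_cert = b"placeholder-der-bytes-for-api.example.com"
PINS = {fingerprint(expected_cert)}
```

If an intercepting proxy presents its own certificate, the fingerprint will not match a pin and the application should refuse the connection.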

17
Q

Serverless Computing

A

Serverless computing can be considered an environment in which the customer is not responsible for managing the server. In this model, the provider takes care of the servers upon which customers run workloads. The CSA defines serverless computing as “the extensive use of certain PaaS capabilities to such a degree that all or some of an application stack runs in a cloud provider’s environment without any customer-managed operating systems or even containers.”

18
Q
A

If your organization is planning on using the cloud, you will likely be using serverless offerings. There’s nothing inherently wrong with this, because these services can be highly orchestrated (aka event-driven) and have deep integration with IAM services supplied by the provider.

Just be aware that the more you leverage services supplied by the provider, the more dependent (locked in) your organization becomes, because you would have to re-create the environment with a new provider if you moved.

18
Q

The following serverless computing examples are provided by the CSA:

A

*Object storage
*Cloud load balancers
*Cloud databases
*Machine learning
*Message queues
*Notification services
*API gateways
*Web servers

19
Q

From a security perspective, the CSA Guidance calls out the following issues that you should be aware of before taking your CCSK exam:

A

*There will be high levels of access to the cloud provider’s management plane because that is the only way to integrate and use the serverless capabilities.

*Serverless can dramatically reduce attack surfaces and pathways, and integrating serverless components may be an excellent way to break links in an attack chain, even if the entire application stack is not serverless.

*Any vulnerability assessment or other security testing must comply with the provider’s terms of service. Cloud users may no longer have the ability to test applications directly, or they may test with a reduced scope, since the provider’s infrastructure is now hosting everything and can’t distinguish between legitimate tests and attacks.

*Incident response may also be complicated and will definitely require changes in process and tooling to manage a serverless-based incident.

19
Q

From a security perspective, the CSA Guidance calls out the following issues that you should be aware of before taking your CCSK exam:

A

*Serverless places a much higher security burden on the cloud provider. Choosing your provider and understanding security SLAs and capabilities is absolutely critical.

*Using serverless, the cloud user will not have access to commonly used monitoring and logging levels, such as server or network logs. Applications will need to integrate more logging, and cloud providers should provide necessary logging to meet core security and compliance requirements.

*Although the provider’s services may be certified or attested for various compliance requirements, not necessarily every service will match every potential regulation. Providers need to keep compliance mappings up-to-date, and customers need to ensure that they use only the services within their compliance scope.

20
Q
A

Trust but verify your provider by performing due diligence, and remember that in addition to preventative controls, you need detection. Since the services are built and managed by the provider with all serverless offerings, you may need to build logging into applications that are run in a serverless environment.
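
Building logging into a serverless application might look like the sketch below. The `handler(event, context)` shape and the event fields (`request_id`, `quantity`) are assumptions for illustration and are not tied to any specific provider. The idea is that the function emits its own structured log lines because server and network logs are not available to the customer.

```python
import json
import logging

logger = logging.getLogger("orders")


def log_event(level: int, **fields) -> None:
    """Emit one structured (JSON) log line that a log service could index."""
    logger.log(level, json.dumps(fields, sort_keys=True))


def handler(event: dict, context=None) -> dict:
    """Hypothetical function entry point with application-level logging."""
    request_id = event.get("request_id", "unknown")
    log_event(logging.INFO, event="request_received", request_id=request_id)
    try:
        quantity = int(event["quantity"])
        if quantity <= 0:
            raise ValueError("quantity must be positive")
    except (KeyError, TypeError, ValueError):
        # Rejections are logged too, since detection depends on them.
        log_event(logging.WARNING, event="request_rejected", request_id=request_id)
        return {"status": 400}
    log_event(logging.INFO, event="request_processed", request_id=request_id)
    return {"status": 200, "quantity": quantity}
```

Logging the rejected requests as well as the successful ones gives the detection side something to work with.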

21
Q

Big Data Recommendations:

A

*Authorization and authentication for all services and application components need to be locked down on a least-privilege basis.

*Access to the management plane and big data components will be required. Entitlement matrices are required and may be complicated by addressing these various components.

*Follow vendor recommendations for securing big data components.

*Big data services from a provider should be leveraged wherever possible. When using provider services as part of a big data solution, you should understand the advantages and security risks of adopting such services.

22
Q

Big Data Recommendations:

A

*If encryption of data at rest is required, be sure to address encryption in all locations. Remember that in addition to the primary storage, you must address intermediary and backup storage locations.

*Do not forget to address both security and privacy requirements.

*Ensure that the cloud provider doesn’t expose data to employees or administrators by reviewing the provider’s technical and process controls.

*Providers should clearly publish any compliance standards that their big data solutions meet. Customers need to ensure that they understand their compliance requirements.

*If security, privacy, or compliance is an issue, customers should consider using some form of data masking or obfuscation.
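
Data masking and obfuscation can take several forms. The sketch below shows two common ones as an illustration only: keyed pseudonymization of an identifier and partial masking of a card number. The record fields and the key are hypothetical, and in practice the key would come from a key management service, not from source code.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (same input, same token)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card(number: str) -> str:
    """Keep only the last four digits of a card number visible."""
    digits = [c for c in number if c.isdigit()]
    if len(digits) <= 4:
        # Too short to reveal anything safely, so mask it all.
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy that is safer to load into an analytics platform."""
    return {
        "customer": pseudonymize(record["customer"], key),
        "card": mask_card(record["card"]),
        "amount": record["amount"],
    }
```

Because the pseudonym is deterministic for a given key, records for the same customer can still be joined and counted without exposing the original identifier.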

23
Q

Internet of Things Recommendations:

A

*IoT devices must be able to be patched and updated.
*Static credentials should never be used on devices. This may lead to compromise of the cloud infrastructure or components.
*Best practices for device registration and authentication to the cloud should always be followed. Federated identity systems can be used for such purposes.
*Communications should always be encrypted.
*Data collected from devices should be sanitized (input validation best practice).
*Always assume API requests are hostile and build security from that.

24
Q

Mobile Computing Recommendations:

A

*When designing mobile applications, follow CSP recommendations regarding authentication and authorization.
*As with IoT, federated identity can be used to connect mobile applications to cloud-hosted applications and services.
*Never transfer any keys or credentials in an unencrypted fashion.
*When testing APIs, assume all connections are hostile and that attackers will have authenticated unencrypted access.
*Mobile applications should use certificate pinning and validation to mitigate the risk of attackers using proxies to analyze API traffic that may be used to compromise security.
*Perform input validation on data and monitor all incoming data from a security perspective. Trust no one!
*Attackers will have access to your application. Ensure that any data stored on the mobile device is secured and properly encrypted. No data that may lead to a compromise of the cloud side (such as credentials) should be stored in the device.

25
Q

Serverless Computing Recommendations:

A

*Serverless platforms must meet compliance requirements. Cloud providers should be able to clearly state to customers what certifications have been obtained for every platform.

*Customers should use only platforms that meet compliance requirements.

*Serverless computing can be leveraged to enhance the overall security architecture. By injecting a provider service into your architectures (such as a message queuing service), attackers would need to compromise both the customer and provider services, which will likely be a significant hurdle for them, especially if a service removes any direct network connectivity between components or the cloud and the customer data center.

*Security monitoring will change as a result of serverless, because the provider assumes more responsibility for security and may not expose log data to customers. This may require that more logging be built into applications created for serverless environments.

*Security assessments and penetration testing of applications leveraging provider platforms will change. Use only assessors and testers who are knowledgeable about the provider’s environment.

*Incident response will likely change even more dramatically in PaaS platforms than in IaaS. Communication with your provider regarding incident response roles is critical.

26
Q
A

The only listed attribute in the CSA Guidance regarding mobile application suitability for the cloud is the geographical nature of the cloud. Yes, a cloud environment may be more secure, but this is, of course, a shared responsibility. You are never guaranteed that running in the cloud will be cheaper than running systems in your own data center.

27
Q

Why may entitlement matrices be complicated when used for big data systems?

A

CSA states that entitlement matrices can be complicated by both the number of components in a big data system and the cloud resources that may be leveraged as part of a big data implementation.

28
Q

According to the CSA, which attribute(s) of the cloud make it ideal for supporting mobile applications?

A

The only attribute listed in the CSA Guidance regarding mobile application suitability for the cloud is the geographic distribution of the cloud.