Big data Flashcards
(8 cards)
Characteristics of big data:
Technology has allowed both the public and private sectors to collect and analyse
very large data sets.
- Big data can include personal data (e.g. data from social media or loyalty cards), but can
also include other data, such as climate change data
- Characteristics of big data:
▪ Very large data sets
▪ Data brought together from different sources
▪ Data which can be analysed very quickly (e.g. real time)
Data protection considering big data
Considers the situation where the underlying data set includes personal data
- If personal data is held by a company, then the company must comply with data
protection rules
- Anonymisation can potentially aid big data analytics, since privacy is
likely to be a concern for individuals whose data is held in large amounts (big data)
- Anonymisation can assist organisations to carry out research or develop products and
services
- Anonymisation also helps organisations to give an assurance to the people whose
data was collected that the organisation is not using data that identifies them for
analytics
- Big data analytics tends to use all available data, which raises questions about whether
the data used is excessive, while the sources used also raise questions about whether the
personal information being used is relevant
- Consent is required for using personal information
- Regulators expect the organisations that hold big data to be proactive (take action)
in considering any information security risks
- Data governance is essential for holders of big data
- Organisations using big data need to do the following to show compliance:
* Be transparent when they collect data
* Explain how the data will be used
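The anonymisation point above can be illustrated with a minimal sketch. It assumes a customer record held as a dictionary; the field names (name, email, age, spend) and the 10-year age band are hypothetical choices made only for illustration, and whether this level of generalisation is sufficient depends on the data protection rules that apply.

```python
# Hypothetical direct identifiers; a real data set would have its own list.
DIRECT_IDENTIFIERS = {"name", "email"}


def anonymise(record):
    """Return a copy of the record without direct identifiers and with
    the exact age replaced by a 10-year age band."""
    # Drop fields that identify the individual directly.
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "age"}
    # Generalise the exact age into a band, e.g. 34 becomes "30-39".
    low = (record["age"] // 10) * 10
    out["age_band"] = f"{low}-{low + 9}"
    return out


record = {"name": "A. Example", "email": "a@example.com", "age": 34, "spend": 120.5}
anonymised = anonymise(record)
```

The analytics team would then work only with the anonymised copy, which still supports analysis of spend by age band.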
Big data analytics (processing)
- Uncover patterns
- Uncover trends
- Uncover correlations
- Uncover other details that can be used to inform decision making in
organisations
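As a rough illustration of "uncovering correlations", the sketch below computes a Pearson correlation coefficient between two columns of data. The figures (store visits and spend) are made up for the example, and a real big data pipeline would normally use a dedicated analytics library rather than hand-written code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Illustrative (made-up) data: monthly store visits and monthly spend.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.5, 31.0, 42.5, 50.0]
r = pearson(visits, spend)  # close to 1: strong positive correlation
```

A value of r close to +1 or -1 suggests a strong linear relationship that may be worth investigating further; it does not by itself show that one variable causes the other.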
Data governance
- Governance focuses on overall management of the
▪ availability of data employed in an organisation
▪ usability of data employed in an organisation
o ability of a user to derive useful information from data
▪ integrity of data employed in an organisation
o looks at the overall accuracy, completeness and consistency (validity)
of data
▪ security of data employed in an organisation
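The integrity aspect above (accuracy, completeness and consistency) can be monitored with simple checks. The sketch below is one possible approach under stated assumptions: the record fields (policy_id, age, premium) and the validity rules are hypothetical and would be set by the organisation's own data standards.

```python
def integrity_report(records, required_fields, validators):
    """Return the share of records that are complete and the share that are valid."""
    total = len(records)
    # A record is complete if every required field is present and non-empty.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    # A record is valid if it passes every validation rule.
    valid = sum(all(check(r) for check in validators) for r in records)
    return {"completeness": complete / total, "validity": valid / total}


# Illustrative (made-up) policy records.
records = [
    {"policy_id": "P1", "age": 42, "premium": 300.0},
    {"policy_id": "P2", "age": None, "premium": 250.0},   # incomplete
    {"policy_id": "P3", "age": 130, "premium": -10.0},    # invalid values
    {"policy_id": "P4", "age": 57, "premium": 410.0},
]

# Hypothetical validity rules; missing ages are left to the completeness check.
validators = [
    lambda r: r["age"] is None or 0 <= r["age"] <= 120,
    lambda r: r["premium"] >= 0,
]

report = integrity_report(records, ["policy_id", "age", "premium"], validators)
```

Tracking these ratios over time is one way to see whether data integrity is improving or deteriorating.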
Data governance policy
- documented set of guidelines for ensuring the proper management of the
organisation's data
- normally sets out guidelines with regard to:
▪ the specific roles and responsibilities of individuals in the organisation with
regard to data
▪ mechanisms for ensuring that the relevant legal and regulatory requirements
with regard to data management are met
▪ how an organisation will capture, analyse, and process data
▪ issues with respect to data security and privacy
▪ controls that will be put in place to ensure that the required data standards are
applied
▪ how the adequacy of the controls will be monitored on an ongoing basis with
respect to: data usability, data accessibility, data integrity, and data security
Data governance risks
- sound data governance should provide the organisation's stakeholders (staff,
management, shareholders and policyholders) with confidence that the
organisation is dealing appropriately with the data it holds
- organisations that do not have adequate data governance procedures can be
exposed to risks relating to:
▪ legal and regulatory non-compliance
▪ inability to rely on data for decision making
▪ reputational issues, which can lead to loss of existing business and reduced ability to
attract new customers
▪ incurring additional costs (for example, fines and legal costs)
Summary of data risks
➢ Data is inaccurate (erroneous) or incomplete, leading to erroneous
results/conclusions
➢ There may be insufficient historical data available to estimate credibly the extent of
the risk and its likelihood of occurrence
➢ Risk that there is insufficient data on extreme events (very adverse circumstances)
to provide credible estimates of that risk, which may be necessary for some
purposes
➢ Risk that data from other sources (industry, other countries, competitors) may not
be a sufficiently good proxy for the risk being assessed
➢ Historical data may not reflect what will happen in the future
➢ Risks with homogeneous groups
- Risk that individual groups are too small for credible analysis
- If data groups are combined to reach a sufficient size, there is a risk that the data
may not be sufficiently homogeneous
➢ Risk that data is not available in the required format for the intended purpose
➢ Risk that data collected for one purpose is not appropriate for a different purpose
(not relevant for the intended purpose)
➢ Risk of lack of confidence in available data
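The risk that individual groups are too small for credible analysis can be screened for with a simple count per group. This is only a sketch: the grouping key and the minimum group size are hypothetical parameters, and the appropriate threshold depends on the analysis being carried out.

```python
from collections import Counter


def small_groups(records, key, min_size):
    """Return the groups (and their sizes) that have fewer than min_size records."""
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n < min_size}


# Illustrative (made-up) records grouped by a hypothetical risk class.
records = [{"risk_class": "A"}] * 5 + [{"risk_class": "B"}] * 2
flagged = small_groups(records, "risk_class", min_size=3)
```

Flagged groups could be combined with similar groups, at the cost of reduced homogeneity as noted in the list above.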
Historical data may not be a good reflection of future experience because of:
➢ past abnormal events
➢ significant random fluctuations
➢ future trends not being reflected sufficiently in past data
➢ changes in the way past data was recorded
➢ heterogeneity within the group to which the assumptions are to relate
➢ past data not being up to date
➢ other changes:
o medical changes
o social and economic changes