Chapter 9 Flashcards
(23 cards)
What is Big Data?
Data sets that are too large and complex for traditional data-processing tools.
Where does Big Data come from?
Social media,
Smartphones,
Sensors.
What is primary data in Big Data?
Data collected specifically for a research purpose.
What is secondary data in Big Data?
Data not specifically collected for research.
Which three characteristics of Big Data are good for research?
1: Big
2: Always on
3: Nonreactive
Which 7 characteristics of Big Data are bad for research?
1: Incomplete
2: Inaccessible
3: Nonrepresentative
4: Drifting
5: Algorithmically confounded
6: Dirty
7: Sensitive
Why is Big an advantage?
The volume gives enough observations to study rare events and heterogeneity (differences between subgroups).
Why is always on an advantage?
You have real-time measurements. (Spotting trends)
Why is nonreactive an advantage?
Measuring with Big Data sources is less likely to change behavior, because people are usually unaware that they are being observed.
Why is incomplete a disadvantage?
The data often leave out information that is important for the research.
Why is inaccessible a disadvantage?
Legal and compliance barriers restrict access to the data.
Why is nonrepresentative a disadvantage?
The people in the data may not represent the population of interest. E.g. consumers say reviews are important, but the review data are not always valid.
Why is drifting a disadvantage?
A Big Data source can change over time: its users, how it is used, or the platform itself.
Why is algorithmically confounded a disadvantage?
The design of the platform can change behavior. (Facebook encourages at least 20 friends, so the minimum number of friends is not easy to study.)
Why is dirty a disadvantage?
Big Data can be loaded with junk.
Why is sensitive a disadvantage?
Big Data can contain sensitive information.
How can you learn from Big Data?
Through measurement, prediction, and experiments.
What can you do to fight against overfitting?
Cross validation
Regularization
What is cross validation?
Validate the model on another data set (data that was not used to fit it).
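A minimal sketch of k-fold cross validation, one common way to validate on held-out data. The straight-line model and the helper names (k_fold_indices, fit_line, cross_validate) are illustrative assumptions, not part of the course material.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def cross_validate(xs, ys, k=5):
    """Fit on k-1 folds, then return the mean squared error on each held-out fold."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx) / len(test_idx)
        errors.append(mse)
    return errors


# Made-up example data: roughly linear with some noise.
xs = list(range(10))
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.3, 12.9, 15.2, 16.8, 19.1]
print(cross_validate(xs, ys, k=5))
```

A model that overfits tends to show a much larger error on the held-out folds than on the data it was fitted to.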
What is regularization?
Raise the bar for significance by penalizing model complexity.
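A minimal sketch of one common form of regularization, a ridge (L2) penalty on a one-variable regression. The card does not name a specific method, so the choice of ridge and the function name ridge_slope are assumptions for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Slope of a one-variable ridge regression with an unpenalized intercept.

    Minimizes sum((y_c - b * x_c)^2) + penalty * b^2 on centered data,
    which gives b = sxy / (sxx + penalty).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx + penalty)


# Made-up example data: a larger penalty shrinks the slope toward zero.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
for penalty in (0, 10, 100):
    print(penalty, ridge_slope(xs, ys, penalty))
```

With penalty 0 this is ordinary least squares. As the penalty grows, a variable needs a stronger relationship with the outcome to keep a large coefficient.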
What is Map Reduce?
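A way to process data in parallel: a map step handles each chunk of data independently, and a reduce step combines the results.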
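A minimal sketch of the MapReduce idea as a word count on one machine. The function names are illustrative assumptions; a thread pool stands in for the cluster of machines a real MapReduce framework would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(text):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: combine the values of each key into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run the map step on all chunks in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


# Made-up example: each string plays the role of one chunk of a large file.
print(word_count(["big data big", "data sets"]))
```

Because each chunk is mapped independently, the work can be spread over as many workers as there are chunks.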
Data can be big in which two ways?
Tall (many observations)
Wide (many variables)
What is a lift?
How often one variable occurs relative to another variable.
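A minimal sketch of how lift is commonly computed in association analysis, assuming the usual definition lift(A, B) = P(A and B) / (P(A) * P(B)). The function name and the basket data are made up for illustration.

```python
def lift(transactions, a, b):
    """Lift of items a and b: observed co-occurrence relative to independence."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if a in t) / n
    p_b = sum(1 for t in transactions if b in t) / n
    p_ab = sum(1 for t in transactions if a in t and b in t) / n
    return p_ab / (p_a * p_b)


# Made-up shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"milk"},
]
print(lift(baskets, "bread", "butter"))
```

A lift above 1 means the two items occur together more often than expected if they were independent, a lift of 1 means no association, and a lift below 1 means they occur together less often than expected.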