Big Data Flashcards

1
Q

Big Data

A

Big data is generally defined by a set of specific attributes, commonly called the four V’s: volume, variety, velocity, and veracity.

In short, it is data too large to move or analyze with conventional tools.

2
Q

Volume

A

The main characteristic that makes data “big” is its sheer volume. It makes little sense to define a minimum size threshold, because the total amount of information grows exponentially every year. In its 2010 annual report, Thomson Reuters estimated that the world was “awash with over 800 exabytes of data and growing.”
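
To put the 800-exabyte figure in more familiar units, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, where each step up is a factor of 1000; the helper names `to_bytes` and `convert` are made up for this illustration.

```python
# Decimal (SI) storage units: each step up is a factor of 1000 (an assumption;
# binary units such as EiB would use 1024 instead).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value from one storage unit to another."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)


print(to_bytes(800, "EB"))       # 800 followed by 18 zeros
print(convert(800, "EB", "TB"))  # 800000000.0, i.e. 800 million terabytes
```

Under these assumptions, 800 exabytes is 800 million terabytes.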

3
Q

Variety

A
o	Variety is one of the most interesting developments in technology as more and more information is digitized. Traditional data types (structured data) include the items on a bank statement, such as date, amount, and time. These fit neatly in a relational database.
o	Structured data is augmented by unstructured data: Twitter feeds, audio files, MRI images, web pages, web logs, and anything else that can be captured and stored but has no meta model that neatly defines it. (A meta model is a set of rules that frames a concept or idea; it defines a class of information and how to express it.)
o	Unstructured data is a fundamental concept in big data. The best way to understand it is by comparison with structured data. Think of structured data as data that is well defined by a set of rules. For example, money is always numeric with two decimal places; names are expressed as text; and dates follow a specific pattern.
o	With unstructured data, on the other hand, there are no rules. A picture, a voice recording, and a tweet can all differ in form, yet each expresses ideas and thoughts based on human understanding. One of the goals of big data is to use technology to make sense of this unstructured data.
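
The idea that structured data follows a set of rules can be sketched in code. The example below is only an illustration: the field names (`name`, `amount`, `date`), the exactly-two-decimal-places money rule, and the YYYY-MM-DD date pattern are all assumptions chosen for the sketch, not rules from any particular system.

```python
import re
from datetime import datetime


def is_money(value):
    # Assumed rule: digits, a decimal point, then exactly two decimal places.
    return isinstance(value, str) and re.fullmatch(r"\d+\.\d{2}", value) is not None


def is_date(value):
    # Assumed rule: dates follow the pattern YYYY-MM-DD.
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_structured(record):
    """Return True if the record fits the hypothetical bank-statement rules."""
    return (
        isinstance(record, dict)
        and set(record) == {"name", "amount", "date"}
        and isinstance(record["name"], str)
        and is_money(record["amount"])
        and is_date(record["date"])
    )


bank_row = {"name": "Ada", "amount": "19.99", "date": "2024-01-05"}
tweet = "just landed in Lisbon, the light here is unreal"

print(is_structured(bank_row))  # True: every field obeys its rule
print(is_structured(tweet))     # False: free text has no fields to check
```

The bank-statement row passes because each field obeys a rule; the tweet fails because nothing defines what it should contain, which is what makes it unstructured.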
4
Q

Veracity

A

Veracity refers to the trustworthiness of the data. Can a manager rely on the data being representative? Every good manager knows that all collected data contains inherent discrepancies.

5
Q

Velocity

A

Velocity is the rate at which incoming data arrives and must be processed. Think about how many SMS messages, Facebook status updates, or credit card swipes pass through a single telecom carrier’s network every minute of every day, and you’ll have a good appreciation of velocity. A streaming service such as Amazon Kinesis (from Amazon Web Services) is an example of a tool designed to handle the velocity of data.
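
Velocity can be measured as events per unit of time. The sketch below counts events in one-minute buckets from a list of timestamps. It is a small batch illustration with made-up sample data, not a representation of how Kinesis or any other streaming service works; the function name `events_per_minute` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def events_per_minute(timestamps):
    """Count how many events fall into each one-minute bucket."""
    # Truncate each timestamp to the start of its minute, then tally.
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return dict(sorted(buckets.items()))


# Made-up sample: three card swipes, two in the first minute and one in the next.
swipes = [
    datetime(2024, 1, 1, 12, 0, 5),
    datetime(2024, 1, 1, 12, 0, 40),
    datetime(2024, 1, 1, 12, 1, 10),
]

rates = events_per_minute(swipes)
for minute, count in rates.items():
    print(minute.strftime("%H:%M"), count)
```

A real system would compute these counts continuously as events arrive rather than over a finished list.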
