Book - Chapter 1: Intro to Big Data Analytics Flashcards

1
Q

What are the Vs of big data

A

Volume. Velocity. Variety.

2
Q

What is metadata

A

The minimum you should know about the data

3
Q

What is paradata

A

How the data has been processed, and what artefacts that processing has left in the data

4
Q

What is velocity

A

The speed at which new data is created and grows

5
Q

What three attributes stand out as defining big data characteristics

A

Huge volume of data
Complexity of data types and structures
Speed of new data creation and growth

6
Q

What is huge volume of data

A

Rather than thousands of rows, big data can be billions of rows and millions of columns

7
Q

What is complexity of data types and structures

A

It reflects the variety of new data sources, formats and structures, including digital traces being left on the web and other digital repositories for subsequent analysis

8
Q

What is speed of new data creation and growth

A

It describes high-velocity data, with rapid data ingestion and near real-time analysis

9
Q

What is big data sometimes described as having

A

The big three Vs

10
Q

What are the big three Vs

A

Volume, variety and velocity

11
Q

Can big data be efficiently analysed using only traditional databases or methods

A

No, it requires new tools and technologies to store and manage it and to realise the business benefits

12
Q

What two main forms can big data come in

A

Structured and non-structured

13
Q

What form does most big data take

A

Usually unstructured or semistructured in nature, which requires different techniques and tools to process and analyse

14
Q

Where does 80 to 90% of future data growth come from

A

Non-structured data types

15
Q

What sort of data could the RDBMS hold in addition

A

Quasi- or semistructured data, such as free-form call log information taken from an email ticket of the problem, or customer chat history

16
Q

What are the four data structures that characterise big data

A

Bottom: unstructured
Third: quasi-structured
Second: semistructured
Top: structured

17
Q

What is quasi structured

A

Erratic structure, e.g. web clickstream data

18
Q

What is semistructured

A

Structure definition is embedded in the data

19
Q

What is structured

A

External definition of structure

20
Q

What does structured data consist of

A

A defined data type, format, and structure (for example, transaction data, online analytical processing (OLAP) data cubes, traditional RDBMS, CSV files, and even simple spreadsheets such as Excel)
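
Example (a minimal sketch, not from the book): structured data has its types and format defined outside the data itself. Here a CSV file is loaded against an externally held schema. The column names, the `SCHEMA` mapping and the `load_rows` helper are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical schema kept outside the data: column name -> type converter.
SCHEMA = {"order_id": int, "amount": float, "region": str}

# Hypothetical CSV content; a real file would be opened from disk instead.
CSV_TEXT = "order_id,amount,region\n1,19.99,north\n2,5.00,south\n"

def load_rows(text, schema):
    """Parse CSV text and cast every column using the external schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in reader]

rows = load_rows(CSV_TEXT, SCHEMA)
print(rows)
```

The CSV itself only carries text; the types come from the schema, which is what makes the structure "external".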

21
Q

What does semistructured data consist of

A

Textual data files with a discernible pattern that enables parsing (such as Extensible Markup Language (XML) data files that are self-describing and defined by an XML schema)

Scripts
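
Example (a minimal sketch, not from the book): in semistructured data the tags inside the file describe the fields, so a parser can recover records without an external column list. The ticket records and tag names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are assumptions.
XML_TEXT = """
<tickets>
  <ticket id="101">
    <customer>Ada</customer>
    <problem>Login fails</problem>
  </ticket>
  <ticket id="102">
    <customer>Grace</customer>
    <problem>Slow report</problem>
  </ticket>
</tickets>
"""

def parse_tickets(xml_text):
    """Return one dict per <ticket>, using the tag names found in the data."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for ticket in root.findall("ticket"):
        row = {"id": ticket.get("id")}
        for child in ticket:
            # The field name comes from the data itself, not from our code.
            row[child.tag] = child.text
        rows.append(row)
    return rows

tickets = parse_tickets(XML_TEXT)
print(tickets)
```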

22
Q

What does quasi-structured data consist of

A

Textual data with erratic data formats that can be formatted with effort, time, and tools (for instance, web clickstream data that may contain inconsistencies in data values and formats)
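
Example (a minimal sketch, not from the book): quasi-structured data needs effort before it can be used. The raw lines, the list of timestamp formats and the `parse_click` helper below are hypothetical; real logs would need their own rules.

```python
from datetime import datetime

# Hypothetical raw clickstream lines with inconsistent formats.
RAW_CLICKS = [
    "2024-01-05 10:15:00, /Home",
    "05/01/2024 10:16, /products/ ",
    "not a click record",
    "2024-01-05T10:17:30,/cart",
]

# Timestamp formats assumed to appear in these logs.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S"]

def parse_click(line):
    """Return (datetime, path) for a usable line, or None if it cannot be parsed."""
    if "," not in line:
        return None
    stamp, path = (part.strip() for part in line.split(",", 1))
    for fmt in FORMATS:
        try:
            when = datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # try the next known format
        # Normalise the path: lower case, no trailing slash (except the root).
        path = path.lower().rstrip("/") or "/"
        return when, path
    return None

# Keep only the lines that could be parsed.
clean = [c for c in map(parse_click, RAW_CLICKS) if c is not None]
print(clean)
```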

23
Q

What does unstructured data consist of

A

Text documents, PDFs, images and video, i.e. data that has no inherent structure

24
Q

How can a clickstream be used

A

It can be parsed and mined by data scientists to discover usage patterns and uncover relationships among clicks and areas of interest on a website or group of sites
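
Example (a minimal sketch, not from the book): one simple way to look for usage patterns is to count page views and page-to-page clicks across sessions. The session data and helper names below are hypothetical, and real clickstream mining would usually go well beyond counting.

```python
from collections import Counter

# Hypothetical sessions: each list is the ordered pages one visitor clicked.
SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/cart"],
]

def count_transitions(sessions):
    """Count how often each page-to-page click (from, to) occurs."""
    transitions = Counter()
    for pages in sessions:
        # Pair each page with the page clicked next in the same session.
        transitions.update(zip(pages, pages[1:]))
    return transitions

def count_page_views(sessions):
    """Count how often each page appears across all sessions."""
    views = Counter()
    for pages in sessions:
        views.update(pages)
    return views

transitions = count_transitions(SESSIONS)
page_views = count_page_views(SESSIONS)
print(transitions.most_common(1))
print(page_views.most_common(1))
```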

25
How does big data describe data
It describes new kinds of data with which most organisations may not be used to working
26
Is database administration training required to create spreadsheets
No
27
What is an EDW
Enterprise data warehouse
28
What are enterprise data warehouses critical for
Reporting and BI tasks; they also solve many of the problems that proliferating spreadsheets introduce, such as which of multiple versions of a spreadsheet is correct
29
Despite the benefits of EDW and BI, what do these systems tend to restrict
The flexibility needed to perform robust or exploratory data analysis
30
With the EDW model who is the data managed and controlled by
IT groups and database administrators (DBAs); data analysts must depend on IT for access and changes to the data schemas
31
What new problems do EDW and BI introduce
Problems of flexibility and agility, which were less pronounced when dealing with spreadsheets
32
What is the solution to the problems faced with EDW and BI
The analytic sandbox
33
What does the analytic sandbox attempt to resolve
The conflict for analysts and data scientists with the EDW and more formally managed corporate data
34
How are analytic sandboxes purposely designed
To enable robust analytics while being centrally managed and secured
35
What are analytic sandboxes often referred to as
Workspaces, as they are designed to enable teams to explore many datasets in a controlled fashion and are not typically used for enterprise-level financial reporting and sales dashboards
36
What do Analytic sandboxes enable
High-performance computing using in-database processing
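
Example (a minimal sketch, not from the book): in-database processing means letting the database engine do the computation instead of pulling every row out to the client. The small in-memory SQLite table below is a hypothetical stand-in for a real analytic sandbox, which would run on a far larger platform.

```python
import sqlite3

# Hypothetical sales table, kept in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

# In-database: the engine aggregates and returns one row per region.
in_db = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Client-side: every row is pulled out and aggregated in Python.
client_side = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    client_side[region] = client_side.get(region, 0.0) + amount

conn.close()
print(in_db, client_side)
```

Both approaches give the same totals; the in-database version moves far less data, which matters once the table holds billions of rows.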