Chap 2 Flashcards

(32 cards)

1
Q

What is data science?

A

It is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.

2
Q

What is a data scientist?

A

A person who engages in systematic activity to acquire knowledge from data.

3
Q

What is the role of data scientists?

A

They perform research toward a more comprehensive understanding of products, systems, or nature, including the physical, mathematical, and social realms.

4
Q

What is the skill set of a data scientist?

A

A strong background in:
1. statistics and linear algebra
2. programming
3. data warehousing, mining, and modeling, used to build and analyze algorithms

5
Q

What is an algorithm?

A

It is a set of instructions designed to perform a specific task.
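
As an illustration only, here is a short sketch of one algorithm: a fixed set of instructions for finding the largest number in a list. The function name `largest` is made up for this example.

```python
def largest(numbers):
    """Return the largest value in a non-empty list."""
    biggest = numbers[0]          # step 1: start with the first value
    for n in numbers[1:]:         # step 2: look at every remaining value
        if n > biggest:           # step 3: keep whichever is bigger
            biggest = n
    return biggest                # step 4: report the result


print(largest([3, 9, 4]))  # prints 9
```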

6
Q

What is data?

A

Data can be described as unprocessed facts and figures. It can exist in any form.

7
Q

What is information?

A

It is data that has been given meaning: the processed data on which decisions and actions are based.

8
Q

What is data processing?

A

It is the restructuring of data by people or machines to increase its usefulness and add value for a particular purpose.

9
Q

What are the basic steps of data processing?

A

input
processing
output

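A minimal sketch of the three steps, assuming a made-up list of scores as the input. The names `raw_scores` and `process` are illustrative only.

```python
def process(raw_scores):
    # Processing: turn the raw numbers into a single summary value.
    return sum(raw_scores) / len(raw_scores)


raw_scores = [70, 85, 91]               # input: unprocessed figures
average = process(raw_scores)           # processing
print(f"Average score: {average:.1f}")  # output: prints "Average score: 82.0"
```
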
10
Q

What are some material forms of data?

A

numbers
text
symbols
images
sound

11
Q

What are the two categories of data forms?

A

qualitative = descriptions
quantitative = numeric records

12
Q

What is a data type?

A

It is an attribute of data that tells the compiler or interpreter how the programmer intends to use the data.

13
Q

What are the common data types from a computer programming perspective?

A

integers
booleans
characters
strings
floating-point numbers
alphanumeric strings

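A small sketch of how these types might look in Python. The variable names and values are made up, and Python has no separate character type, so a one-character string stands in for it.

```python
count = 42          # integer: a whole number
is_valid = True     # boolean: True or False
grade = "A"         # character: a one-character string in Python
name = "Data2024"   # string made of letters and digits
price = 19.99       # floating-point number: has a fractional part

# Ask the interpreter which type it assigned to each value.
for value in (count, is_valid, grade, name, price):
    print(type(value).__name__, value)
```
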
14
Q

What are the three common types of data?

A

structured
semi-structured
unstructured

15
Q

What is structured data?

A

It is data that follows a defined data model, so it can be easily organized, stored, and transferred.

16
Q

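What is semi-structured data?

A

It sits between structured and unstructured data: it does not follow a rigid data model such as relational tables, but it contains tags or other markers that separate its elements. JSON and XML are common examples.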

17
Q

What is unstructured data?

A

It is information that either does not have a predefined data model or is not organized in a predefined manner.
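
To contrast the three types side by side, here is a small sketch with invented sample values: a fixed-column record, a JSON document, and a free-text sentence.

```python
import json

# Structured: every record has the same fixed fields (a defined data model).
structured_row = {"id": 1, "name": "Abebe", "age": 21}

# Semi-structured: no rigid table, but keys mark the elements (JSON here).
semi_structured = '{"id": 2, "name": "Sara", "contacts": {"email": "sara@example.com"}}'
record = json.loads(semi_structured)

# Unstructured: free text with no predefined model.
unstructured = "Sara called yesterday and said she will send the report by email."

print(structured_row["age"])          # a field can be read directly by name
print(record["contacts"]["email"])    # nested keys guide the lookup
print(len(unstructured.split()))      # only generic operations, such as counting words
```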

18
Q

What is metadata?

A

It is data about data.
It provides additional information about a specific set of data.
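
One everyday case, sketched under the assumption that a small temporary text file stands in for the data set: the file's contents are the data, while its size and modification time are metadata.

```python
import os
import tempfile
import time

# Create a throwaway file whose contents play the role of "the data".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello data")
    path = f.name

# os.stat returns information about the file rather than its contents.
info = os.stat(path)
metadata = {
    "size_bytes": info.st_size,
    "modified": time.ctime(info.st_mtime),
}
print(metadata)

os.remove(path)  # clean up the temporary file
```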

19
Q

What is the data value chain?

A

It describes the process of data creation and reuse, from acquisition through analysis, curation, and storage to usage.

20
Q

What is data acquisition?

A

It is the process of gathering, filtering, and cleaning data.
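
A rough sketch of gathering, filtering, and cleaning, assuming two made-up lists of raw temperature readings as the sources. The helper `is_number` is hypothetical.

```python
# Gather: collect raw readings from two hypothetical sources.
source_a = [" 21.5", "22.0", "n/a"]
source_b = ["19.8 ", "", "23.1"]
gathered = source_a + source_b


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Filter: drop entries that are not usable numbers.
filtered = [r for r in gathered if is_number(r)]

# Clean: remove stray whitespace and convert to a consistent numeric type.
cleaned = [float(r.strip()) for r in filtered]
print(cleaned)  # prints [21.5, 22.0, 19.8, 23.1]
```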

21
Q

What is data analysis?

A

It is the process of evaluating data using analytical and statistical tools to discover useful information.
It involves:
exploring
transforming
modeling data
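
The three activities can be sketched on a tiny invented data set of study hours and test scores. The least-squares line used for the modeling step is just one simple choice of model, not the only one.

```python
from statistics import mean

hours = [1, 2, 3, 4]        # made-up study hours
scores = [52, 54, 56, 58]   # made-up test scores

# Explore: look at simple summaries of the data.
print(min(scores), max(scores), mean(scores))

# Transform: center both variables on their means.
mean_hours, mean_scores = mean(hours), mean(scores)
dx = [h - mean_hours for h in hours]
dy = [s - mean_scores for s in scores]

# Model: fit a straight line by least squares.
slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
intercept = mean_scores - slope * mean_hours
print(f"score = {slope} * hours + {intercept}")
```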

22
Q

What is data curation?

A

It is the active management of data over its life cycle to ensure that it meets the data quality requirements for its effective usage.

23
Q

What are the activities of the data curation process?

A

content creation
selection
classification
transformation
validation
preservation

24
Q

What is data storage?

A

It is the persistence and management of data.

25
Q

What is data usage?

A

It is the use of data for the required purpose.

26
Q

What is big data?

A

It is a collection of data sets so large and complex that they are hard to process using database management tools or traditional data processing applications.

27
Q

What three things is the definition of big data based on?

A

volume
velocity
variety

28
Q

What is resource pooling?

A

It is combining the available storage space to hold data.

29
Q

What is high availability?

A

It prevents hardware or software failures from affecting access to data and processing.

30
Q

What is easy scalability?

A

It is the ability to scale horizontally by adding machines to the group.

31
Q

What is Hadoop?

A

It is an open-source software framework that stores and processes large volumes of non-relational data.

32
Q

What are the four characteristics of Hadoop?

A

1. economical
2. reliable
3. scalable
4. flexible