unit 5-data Flashcards

1
Q

citizen science

A

scientific research conducted in whole or in part by distributed individuals, many of whom may not be scientists, who contribute relevant data to research using their own computing devices.

2
Q

cleaning data

A

a process that makes data uniform without changing its meaning (for example, converting numbers spelled out as words into their numerical values)
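
As a rough illustration of this definition, a minimal Python sketch might convert spelled-out numbers into numerals so every entry has the same form. The `WORD_TO_NUMBER` table, the `clean_value` helper, and the sample answers are all made up for this sketch and are not part of the original card.

```python
# Hypothetical lookup table: spelled-out numbers and their numeric values.
WORD_TO_NUMBER = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def clean_value(raw):
    """Return a uniform integer for a value like ' Three ' or '3'."""
    text = raw.strip().lower()
    if text in WORD_TO_NUMBER:
        return WORD_TO_NUMBER[text]
    return int(text)  # raises ValueError for anything unrecognized


# Made-up survey answers with inconsistent formatting.
raw_answers = ["3", "three", " Three ", "5", "five"]
cleaned = [clean_value(answer) for answer in raw_answers]
print(cleaned)
```

The meaning of each answer is unchanged; only its form is made consistent.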

3
Q

correlation

A

A relationship between two pieces of data, typically referring to how much one changes in relation to the other.

4
Q

Crowdsourcing

A

Crowdsourcing: the practice of obtaining input or information from a large number of people via the Internet.

5
Q

information

A

a collection of facts or patterns extracted from data

6
Q

data bias

A

data that does not accurately reflect the full population or phenomenon being studied

7
Q

what is data filtering and how is it done?

A

choosing a smaller subset of a data set to use for analysis, for example by eliminating / keeping only certain rows in a table
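
One way to picture keeping only certain rows is the short Python sketch below. The table, its column names, and the `filter_rows` helper are invented for illustration only.

```python
# Made-up table: each row is a dictionary of column name -> value.
rows = [
    {"name": "Ada", "state": "CA", "age": 17},
    {"name": "Ben", "state": "NY", "age": 15},
    {"name": "Cy", "state": "CA", "age": 16},
]


def filter_rows(table, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in table if row[column] == value]


# The original table is left untouched; the filter builds a smaller subset.
california = filter_rows(rows, "state", "CA")
print([row["name"] for row in california])
```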

8
Q

bar chart

A

Graph of bars that shows the number of times each value in a column of data appears
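
The counting behind a bar chart can be sketched in a few lines of Python. The `pets` column is made-up sample data, and the text bars are only a stand-in for a real chart.

```python
from collections import Counter

# Made-up column of data: one value per row.
pets = ["dog", "cat", "dog", "fish", "cat", "dog"]

# Count how many times each value appears in the column.
counts = Counter(pets)

# Draw one text bar per value, longest first.
for value, count in counts.most_common():
    print(f"{value:<5} {'#' * count} ({count})")
```

Because it only counts occurrences, the same approach works for text values as well as numbers.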

9
Q

Histogram

A

Histogram: similar to a bar chart, but all numbers within a range (bucket) are grouped together.
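
Grouping numbers into buckets could look something like the Python sketch below. The `bucket_counts` helper, the bucket width of 10, and the height values are assumptions chosen for illustration.

```python
from collections import Counter


def bucket_counts(values, width):
    """Count how many values fall in each bucket of the given width."""
    counts = Counter((value // width) * width for value in values)
    return dict(sorted(counts.items()))


# Made-up numeric data: every value is unique, so a bar chart would be flat.
heights_cm = [151, 158, 162, 163, 167, 171, 174, 179, 183]
width = 10
histogram = bucket_counts(heights_cm, width)

# Draw one text bar per bucket, labeled with the bucket's range.
for start, count in histogram.items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

A wider bucket (for example `width = 20`) produces fewer, taller bars.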

10
Q

Crosstab chart

A

Crosstab Chart: counts the number of times combinations of values appear (similar to a frequency table)
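
Counting combinations of values across two columns might be sketched in Python as follows. The grade and pet columns are invented sample data.

```python
from collections import Counter

# Made-up two-column data: (grade, favorite pet) for each row.
responses = [
    ("9th", "dog"), ("9th", "cat"), ("10th", "dog"),
    ("9th", "dog"), ("10th", "cat"), ("10th", "dog"),
]

# Count how many times each (grade, pet) combination appears.
combo_counts = Counter(responses)

# Print the counts as a small grid: one row per grade, one column per pet.
grades = sorted({grade for grade, _ in responses})
pets = sorted({pet for _, pet in responses})
print("grade " + " ".join(f"{pet:>4}" for pet in pets))
for grade in grades:
    cells = " ".join(f"{combo_counts[(grade, pet)]:>4}" for pet in pets)
    print(f"{grade:<5} {cells}")
```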

11
Q

scatterplot

A

Scatterplot: graph that shows the relationship between 2 sets of data

12
Q

Open Data:

A

publicly available data shared by governments, organizations, and others so that anyone can analyze it.

13
Q

big Data

A

a collection of huge amounts of data that we can learn from, often requiring cloud computing or parallel processing systems to handle

14
Q

what is Metadata, what is it used for, and why is it important?

A

Data about data that is used to organize, find, and manage information. It is important because it increases the effectiveness of data by providing extra information about it.

15
Q

What is crowdsourcing?

A

The practice of obtaining input or information from a large number of people via the internet.

16
Q

What are the advantages of a histogram? What are the advantages of a bar chart?

A

Histogram advantages
- useful when many unique values must be grouped (ONLY NUMBERS)
- easier to read with wider buckets

Bar chart advantages
- can work with both numeric and qualitative data
- good at finding frequency of a value

17
Q

what are the disadvantages of a histogram? what are the disadvantages of a bar chart?

A

histogram
- only works with numerical values

bar chart
- not useful when the data has too many unique values (especially if the data has small increments and each input has the same or similar output)

18
Q

What is two-column data?

A

data that uses 2 variables

for ex. height and max lifespan of dogs

19
Q

What is one-column data?

A

data that uses 1 variable (for ex., the population across states, where states is the variable)

20
Q

What are the pros and cons of cross-tabs?

A

Pros/Useful for:
- Finding the most / least common combinations of values in two columns
- Finding patterns across two columns
- Exploring two columns when one or both are strings

Cons/Not useful:
- If either column has too many values (the chart would be enormous)

21
Q

When are scatter plots useful? When are they not useful?

A

Scatter plots are useful for:
- Seeing patterns and trends between two values
- Numeric data with lots of unique, different values

Scatter plots are not useful when a specific combination of values occurs many times, because the overlapping points are not easy to visualize. In that situation, a cross-tab is more helpful because it counts the frequency of each combination.