Unit 9 Flashcards
what is a fact when looking at visualizations?
what the data shows
what is an opinion when looking at visualizations?
why the fact might be the case
what assumption should we be careful not to make?
that correlation implies causation (correlation does not equal causation)
what is metadata?
data about data
what happens to the primary data when metadata is changed?
nothing; metadata can be changed without impacting the primary data
what can the metadata be used for?
finding, organizing, and managing information
what does metadata increase?
it increases the effective use of data by providing extra information about it
what does metadata allow the data to be?
structured and organized
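A minimal sketch of metadata sitting beside the primary data, assuming a made-up `Dataset` class that keeps the rows and a metadata dictionary as separate fields. It shows that editing the metadata leaves the rows untouched, and that metadata can be used to find data without reading the rows. The class, field names, and example datasets are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    rows: list                                    # the primary data
    metadata: dict = field(default_factory=dict)  # data about the data (hypothetical layout)

# Hypothetical example datasets
pets = Dataset(rows=[("Ana", "cat"), ("Ben", "dog")], metadata={"title": "Pet survey", "topic": "animals"})
ages = Dataset(rows=[("Ana", 15), ("Ben", 22)], metadata={"title": "Age survey", "topic": "people"})

rows_before = list(pets.rows)
pets.metadata["title"] = "Favorite pet survey"  # change only the metadata
assert pets.rows == rows_before                 # the primary data is unchanged

# Metadata lets us find and organize datasets without opening their rows
animal_datasets = [d for d in (pets, ages) if d.metadata.get("topic") == "animals"]
print([d.metadata["title"] for d in animal_datasets])  # ['Favorite pet survey']
```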
how to create a bar chart
count how many times each value in the column appears and make a bar at that height
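The counting step can be sketched in Python with `collections.Counter`, drawing each bar as a row of `#` characters instead of a real chart. The function name `bar_chart` and the `seasons` column are hypothetical.

```python
from collections import Counter

def bar_chart(column):
    """Count how often each value appears and draw one text bar per value."""
    counts = Counter(column)
    # Each bar's length equals the number of times that value appears
    lines = [f"{value:<8} {'#' * count}" for value, count in counts.items()]
    return counts, "\n".join(lines)

seasons = ["spring", "summer", "spring", "fall", "spring", "summer"]  # hypothetical column
counts, chart = bar_chart(seasons)
print(chart)
```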
information we can get out of bar charts
what values are the most common in this column
what values are the least common in this column
what the unique values in this column are
what happens when all the values in a column are unique?
the bar chart is not useful, since every bar has the same height of 1
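A minimal sketch of reading those answers off the counts behind a bar chart, assuming a non-empty column. Ties are kept, so several values can share "most common". The function name `bar_chart_summary` and the example columns are hypothetical.

```python
from collections import Counter

def bar_chart_summary(column):
    """Summarize a bar chart's counts (assumes the column is not empty)."""
    counts = Counter(column)
    highest, lowest = max(counts.values()), min(counts.values())
    return {
        # values with the tallest and shortest bars (ties are all kept)
        "most_common": [v for v, c in counts.items() if c == highest],
        "least_common": [v for v, c in counts.items() if c == lowest],
        # the unique values are the labels of the bars
        "unique_values": list(counts),
        # if the tallest bar has height 1, every bar has height 1
        "all_unique": highest == 1,
    }

print(bar_chart_summary(["spring", "summer", "spring", "fall", "spring", "summer"]))
print(bar_chart_summary(["A17", "B02", "C33"])["all_unique"])  # True
```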
how to create a histogram
similar to a bar chart, but numbers are grouped into buckets (ranges) and each bar shows how many values fall in that bucket
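A minimal sketch of bucketing, assuming equal-width buckets that start at multiples of `bucket_width`. Each number is rounded down to the start of its bucket and the buckets are counted like bar chart values. Passing text instead of numbers raises a `TypeError`, which matches the rule that histograms need numeric data. The function name and the `ages` column are hypothetical.

```python
from collections import Counter

def histogram(numbers, bucket_width):
    """Group numbers into equal-width buckets and count each bucket."""
    # n // bucket_width * bucket_width is the start of the bucket n falls in
    counts = Counter((n // bucket_width) * bucket_width for n in numbers)
    # Return (start, end, count) in increasing order; end is excluded
    return [(start, start + bucket_width, counts[start]) for start in sorted(counts)]

ages = [12, 15, 17, 21, 22, 23, 24, 38]  # hypothetical numeric column
for start, end, count in histogram(ages, 10):
    print(f"{start}-{end - 1}: {'#' * count}")
```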
when can histograms be created?
only with numeric data
when is a histogram useful?
when a numeric column has so many different values that a normal bar chart would be difficult to read
information we can get out of histograms
which ranges of values are the most common in this column
which ranges of values are the least common in this column
which ranges of values do or do not appear in this column
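A minimal sketch of answering those questions from bucket counts, assuming whole-number data and a whole-number bucket width. Ranges are `(start, end)` pairs with `end` excluded; "least common" only looks at ranges that appear at least once, while empty ranges between the lowest and highest buckets are listed as missing. The function name and example data are hypothetical.

```python
from collections import Counter

def histogram_summary(numbers, bucket_width):
    """Find common, rare, and absent ranges (assumes whole numbers)."""
    counts = Counter((n // bucket_width) * bucket_width for n in numbers)
    most, least = max(counts.values()), min(counts.values())
    # every bucket from the lowest to the highest, including empty ones
    all_starts = range(min(counts), max(counts) + bucket_width, bucket_width)
    return {
        "most_common": [(s, s + bucket_width) for s in sorted(counts) if counts[s] == most],
        "least_common": [(s, s + bucket_width) for s in sorted(counts) if counts[s] == least],
        "missing": [(s, s + bucket_width) for s in all_starts if s not in counts],
    }

ages = [12, 15, 17, 21, 22, 23, 24, 38, 61]  # hypothetical numeric column
summary = histogram_summary(ages, 10)
print(summary["missing"])  # [(40, 50), (50, 60)]
```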
when does data need to be cleaned?
data is incomplete
data is invalid
multiple tables are combined into one
what leads to messy data?
users enter different types of data - “two” or 2
users use different abbreviations for some info - “February” or “Feb”
data may have different spellings - “colour” or “color”
data has inconsistent capitalization - “spring” or “Spring”
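A minimal cleaning sketch for the messy-data cases above, assuming small hand-written lookup tables. A real dataset would need its own rules, so `NUMBER_WORDS`, `MONTH_NAMES`, `SPELLINGS`, and `clean_value` are all hypothetical. Blank entries become `None` so incomplete data is easy to spot.

```python
# Hypothetical lookup tables; a real dataset would need its own
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}
MONTH_NAMES = {"jan": "january", "feb": "february", "mar": "march"}
SPELLINGS = {"colour": "color"}

def clean_value(raw):
    """Return one consistent form of a value, or None if it is missing."""
    if raw is None or str(raw).strip() == "":
        return None  # incomplete: nothing was entered
    text = str(raw).strip().lower()  # consistent capitalization
    if text.isdigit():
        return int(text)  # "2" and 2 both become 2
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]  # "two" becomes 2
    text = MONTH_NAMES.get(text, text)  # "feb" becomes "february"
    return SPELLINGS.get(text, text)  # "colour" becomes "color"

raw_column = ["two", 2, "Feb", "February", "colour", "Spring", ""]
print([clean_value(v) for v in raw_column])
```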
what does filtering data allow for?
allows the user to look at a subset of the data
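A minimal filtering sketch, assuming each row is a dictionary and the condition is passed in as a function. It returns a new list, so the full dataset is still there afterward. The rows and the `filter_rows` name are hypothetical.

```python
# Hypothetical rows; each dict is one row of a table
rows = [
    {"name": "Ana", "season": "spring", "age": 15},
    {"name": "Ben", "season": "summer", "age": 22},
    {"name": "Cy", "season": "spring", "age": 38},
    {"name": "Di", "season": "fall", "age": 21},
]

def filter_rows(rows, keep):
    """Return only the rows where keep(row) is True; rows itself is not changed."""
    return [row for row in rows if keep(row)]

spring_rows = filter_rows(rows, lambda row: row["season"] == "spring")
adults = filter_rows(rows, lambda row: row["age"] >= 18)
print([row["name"] for row in spring_rows])  # ['Ana', 'Cy']
```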
when are bar charts and histograms useful?
when looking at one column of data
ways to visualize data that look at two columns of data at the same time
crosstab chart
scatterplot
crosstab chart
counts how many times each combination of values from the two columns appears
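A minimal crosstab sketch, assuming two columns of equal length where position `i` in each column belongs to the same row. The pairs are counted with `collections.Counter`, and combinations that never occur show up as 0. The function name and example columns are hypothetical.

```python
from collections import Counter

def crosstab(col_a, col_b):
    """Count each (value_a, value_b) combination in two equal-length columns."""
    counts = Counter(zip(col_a, col_b))
    # One row per value of col_a, one column per value of col_b; unseen pairs count 0
    return {a: {b: counts[(a, b)] for b in sorted(set(col_b))} for a in sorted(set(col_a))}

season = ["spring", "summer", "spring", "fall", "spring"]  # hypothetical columns
pet = ["cat", "dog", "dog", "cat", "cat"]
for row_value, row in crosstab(season, pet).items():
    print(row_value, row)
```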
scatterplot
useful for seeing patterns and trends between two columns of numeric data with lots of different values
not useful when there are lots of repeated values, since the dots land on top of each other
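A scatterplot draws one dot per row at `(x, y)`. This minimal sketch, using a hypothetical `overlap_report` helper, counts how many rows there are versus how many distinct dots would actually be visible, which shows why repeated values make a scatterplot hard to read.

```python
def overlap_report(xs, ys):
    """Return (rows plotted, distinct dots visible) for a scatterplot of xs vs ys."""
    points = list(zip(xs, ys))  # one (x, y) dot per row
    return len(points), len(set(points))

# Many different numeric values: every dot is visible
print(overlap_report([1.2, 2.5, 3.1, 4.8], [10.4, 19.9, 31.0, 47.5]))  # (4, 4)
# Lots of repeated values: 6 rows collapse into 2 visible dots
print(overlap_report([1, 1, 1, 2, 2, 2], [5, 5, 5, 7, 7, 7]))  # (6, 2)
```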
what can we take away from manipulating and visualizing data?
develop insights and knowledge about our world by finding patterns
what can we see when investigating two columns of data?
we can observe patterns in how different values move together (how they are correlated), but we cannot know the cause of the correlation
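One common way to put a number on "how values move together" is the Pearson correlation coefficient, which is a technique named here rather than taken from these cards. It ranges from -1 (one goes up as the other goes down) to 1 (both go up together). This is a minimal sketch assuming two equal-length numeric columns where neither column is constant; Python 3.10+ also ships `statistics.correlation` for the same purpose. The example columns are hypothetical, and the coefficient only describes the pattern: it says nothing about cause.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Pearson correlation of two equal-length numeric columns (neither constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # How far each pair sits from the means, multiplied together and summed
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours_studied = [1, 2, 3, 4, 5]      # hypothetical column
test_scores = [52, 55, 61, 64, 70]   # hypothetical column
print(round(pearson_correlation(hours_studied, test_scores), 3))
```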