Midterm Flashcards
(79 cards)
ACQUIRE
Locate and download the data from a source
Primary Data
information collected for the specific purpose at hand
Secondary Data
information that already exists somewhere, having been collected for another purpose
PARSE
Look through the data columns, identify each column's type, and check that its values are correct
Modify columns by splitting if needed
Each piece of data needs to be converted to a useful format
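As a minimal sketch of the parse step, assuming the raw data arrives as text, the example below splits one combined column and converts each field to a more useful type. The rows, field names, and the `parse_row` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical raw rows: every value arrives as text.
raw_rows = [
    {"name": "Oat Bar", "serving": "40 g", "calories": "190", "organic": "true"},
    {"name": "Trail Mix", "serving": "28 g", "calories": "n/a", "organic": "false"},
]

def parse_row(row):
    """Split the combined 'serving' column and convert each field to a useful type."""
    amount, unit = row["serving"].split(" ", 1)  # split one column into two
    try:
        calories = int(row["calories"])          # integer: no fractional part
    except ValueError:
        calories = None                          # value failed the correctness check
    return {
        "name": row["name"],                          # string
        "serving_amount": float(amount),              # float
        "serving_unit": unit,                         # string
        "calories": calories,                         # integer or None
        "organic": row["organic"].lower() == "true",  # boolean
    }

parsed = [parse_row(r) for r in raw_rows]
```

Values that cannot be converted are kept as `None` rather than dropped, so they can still be counted later during the mine step.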
String
a sequence of characters that forms a word or sentence
Float
a number with a decimal point
Character
a single letter or other symbol
Integer
a number with no fractional part
Alphanumeric
consists of both letters and numbers
Boolean
True or False
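The data types above can be illustrated in Python as a rough sketch. The variable names are hypothetical. Python has no separate character type, so a character is shown here as a one-character string.

```python
word = "data"     # string: a sequence of characters
ratio = 3.14      # float: a number with a decimal point
letter = "d"      # character: a one-character string in Python
count = 42        # integer: no fractional part
code = "A1B2"     # alphanumeric: both letters and numbers
flag = True       # boolean: True or False

# Collect the Python type name of each value.
type_names = {
    "word": type(word).__name__,
    "ratio": type(ratio).__name__,
    "letter": type(letter).__name__,
    "count": type(count).__name__,
    "code": type(code).__name__,
    "flag": type(flag).__name__,
}

# An alphanumeric value passes isalnum() but is neither all letters nor all digits.
is_mixed = code.isalnum() and not code.isalpha() and not code.isdigit()
```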
MINE
Determine basic descriptors and statistics for your data, categorize it, and figure out the range and spread, as well as patterns
Categorize your data into groups, such as nutrient facts
Should also start asking questions
Figure out if temporal data needs to be reorganized
A range check is important for spotting null / NA values or negative numbers
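A minimal sketch of the mine step, assuming one numeric column has already been parsed: it runs a range check for null and negative values, then computes basic descriptors. The sample values are hypothetical, and treating negative numbers as invalid is an assumption that depends on the data.

```python
import statistics

# Hypothetical parsed values; None stands for a null / NA entry.
calories = [190, 150, None, 220, -5, 180]

# Range check: separate out nulls and negative numbers.
missing = [v for v in calories if v is None]
present = [v for v in calories if v is not None]
negative = [v for v in present if v < 0]
valid = [v for v in present if v >= 0]

# Basic descriptors for the values that passed the check.
summary = {
    "count": len(valid),
    "min": min(valid),
    "max": max(valid),
    "range": max(valid) - min(valid),
    "mean": statistics.mean(valid),
    "stdev": statistics.stdev(valid),  # sample standard deviation as a measure of spread
}
```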
FILTER + REPRESENT
Reorganize your data and take only what you need
The pro of mining before filtering is that you know exactly what you want to filter. The con is that you don't know whether there is enough data to answer your question
Filter and Represent have an iterative nature. How you represent data can influence what you acquire
This stage could lead you back to acquire
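A minimal sketch of filtering, assuming the data is a list of records: keep only the rows and fields a chart needs, then reorganize them. The records, field names, and the choice of the "snack" category are hypothetical.

```python
# Hypothetical records after parsing and mining.
rows = [
    {"name": "Oat Bar", "category": "snack", "calories": 190, "sugar_g": 12.0},
    {"name": "Cola", "category": "drink", "calories": 140, "sugar_g": 39.0},
    {"name": "Trail Mix", "category": "snack", "calories": 160, "sugar_g": 9.5},
]

# Take only what you need: one category and two fields.
needed_fields = ("name", "sugar_g")
snacks = [
    {field: row[field] for field in needed_fields}
    for row in rows
    if row["category"] == "snack"
]

# Reorganize: sort from most to least sugar, ready to be represented as a chart.
by_sugar = sorted(snacks, key=lambda r: r["sugar_g"], reverse=True)
```

If the filtered result turns out to be too small to answer the question, that is the point where this stage can send you back to acquire.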
CHRTS
categorical, hierarchical, relational, temporal, spatial
Categorical
compare categories of quantitative data
Hierarchical
visualize relationships and hierarchies
Relational
chart relationships to explore correlations
Temporal
data that happens over time
Spatial
data pertaining to a location
CRITIQUE + REFINE
Get feedback on your charts and refine them based on that feedback
This stage could lead you back to acquire, mine, or filter & represent
Data Product
translate the records of a data source into an easily understandable format
ex:
Raw vs Processed
Granular vs Summarized
Textual vs Quantitative
Static vs Dynamic
Small vs Massive
Structured Data
easily searchable
Unstructured Data
not easily searchable
ex:
audio, video, reviews
Quantitative
numerical data that is either discrete or continuous