Lectures 1 and 2 Flashcards
Data deluge
An overwhelming amount of data, generated and collected so rapidly that it becomes difficult to process, manage, and make sense of efficiently.
Storage capacity
Amount of data a system can store (measured in MB, GB, etc.).
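As a small illustration of these units, the sketch below converts a raw byte count into the largest unit that keeps the value at 1 or more. It assumes decimal (SI) prefixes, where 1 KB = 1000 bytes; some systems report binary units instead (1 KiB = 1024 bytes), which give slightly different figures. The function name `human_readable` is hypothetical.

```python
# Units in increasing order of size.
# Assumption: decimal (SI) prefixes, so each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    size = float(num_bytes)
    unit_index = 0
    # Divide by 1000 until the value drops below 1000 or we run out of units.
    while size >= 1000 and unit_index < len(UNITS) - 1:
        size /= 1000
        unit_index += 1
    return f"{size:.1f} {UNITS[unit_index]}"


print(human_readable(2_500_000_000))  # a 2.5 billion byte file
```

With these assumptions, 2,500,000,000 bytes is reported as 2.5 GB.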
Human resources
Workforce or personnel
Digital assets
Files, documents, software, and data with value
Technological and analysis development
Advances in tools, methods, and technologies that enhance data processing, analysis, and decision-making capabilities.
Small Data (micro-data)
Elementary data, easy to analyze, collected for specific objectives. The result of a census or sample survey. Used to measure the scale of a phenomenon. Not available in real time.
Open Data (administrative data)
Publicly accessible data that anyone can use (ex. statistics released by the government); certified and free. They do not cover the entire territory, so they are not suitable for comparisons, although they can support the decision-making process. Generally not available in real time.
Big Data
Huge, constantly growing data sets generated unconsciously (ex. Google analyzing searches to improve its algorithms). Characterized by variety and lack of structure. Unofficial, with no certification model. Accessible in real time.
7V model
Volume
Velocity
Variety
Value
Veracity
Validity
Visualization
Volume
How much data we have.
Velocity
The speed at which the data is processed and redistributed.
Variety
Types and formats of data (ex. structured, unstructured).
Value
The usefulness of the data: the insight or benefit it can deliver.
Veracity
Accuracy and trustworthiness of the data.
Validity
Whether the data is correct and relevant for its intended use (its importance).
Visualization
Way of representing data (ex. charts).
Social Data
Data obtained through social networking and online interactions (ex. likes, shares, comments, topics). It helps to understand social behavior and to identify key influencers and trending topics. It changes quickly and frequently.
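One simple way social data can reveal trending topics is by counting hashtags across posts. The sketch below does this with Python's standard library; the posts are made-up sample data, and the function name `trending_hashtags` is hypothetical. Real analyses would draw posts from a platform's data export or API and usually weight by time as well.

```python
import re
from collections import Counter

# Hypothetical sample posts (made-up data for illustration only).
posts = [
    "Loving the new #OpenData portal! #data",
    "Census results are out #census #data",
    "Who else is following #BigData trends? #data #BigData",
]


def trending_hashtags(texts, top_n=3):
    """Count hashtags case-insensitively and return the top_n most frequent."""
    counts = Counter(
        tag.lower()
        for text in texts
        for tag in re.findall(r"#(\w+)", text)
    )
    return counts.most_common(top_n)


print(trending_hashtags(posts))
```

Here "data" comes out on top because it appears in every post.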
Features of Data (new data, new sources)
Relevance
Accessibility
Reliability
Comparability
Completeness
Timeliness
Periodicity
Quality
Relevance
Data should meet user needs and include meaningful measurements.
Accessibility
Information should be clear and easy to obtain.
Reliability
Data should accurately reflect reality.
Comparability
Data must be compatible with other sources to allow side-by-side comparisons.
Completeness
Data should cover all regions and groups; in practice, data are often not available for some of them.
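To make completeness measurable, one common approach is to compute the share of non-missing observations for each region or group. The sketch below assumes a hypothetical data set in which `None` marks a missing value; the region names, the numbers, and the function name `completeness` are illustrative only.

```python
# Hypothetical observations per region; None marks a missing value.
records = {
    "North": [12.1, 11.8, None, 12.4],
    "South": [None, None, 9.7, 10.1],
    "Islands": [None, None, None, None],
}


def completeness(values):
    """Share of non-missing observations (1.0 = fully complete)."""
    if not values:
        return 0.0
    present = sum(v is not None for v in values)
    return present / len(values)


for region, values in records.items():
    print(f"{region}: {completeness(values):.0%} complete")
```

A region with 0% completeness, like "Islands" here, shows the kind of gap that makes comparisons between regions unreliable.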
Timeliness
Data should be released soon after collection.