Data Life Cycle & Environment Flashcards
Week 2.1 (39 cards)
5 stages of the data life cycle
- data collection
- data storage
- data processing
- data analysis
- data disposal
describe data collection stage
- gathering raw data from various sources
- importance of accurate and relevant data collection
- potential sources of data: primary and secondary
challenges of data collection
quality, accuracy, completeness, and ethical/regulatory considerations
describe data storage
- securely holding active data in physical or cloud storage
- types
- security measures: encryption, access controls
- compliance considerations for sensitive data: GDPR, DPA 2018
describe data processing
- transforming raw data into usable formats
- processing techniques
- ensuring data quality and standardisation
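A minimal sketch of the processing stage, assuming a few made-up raw records with inconsistent names, date formats, and missing scores. The field names and the `parse_date` and `clean` helpers are hypothetical and only illustrate turning raw data into a standard, usable format.

```python
from datetime import datetime

# Hypothetical raw records; values are invented for illustration.
raw_records = [
    {"name": "  Alice ", "joined": "2024-01-05", "score": "42"},
    {"name": "BOB", "joined": "05/02/2024", "score": "37.5"},
    {"name": "carol", "joined": "2024-03-10", "score": ""},
]

def parse_date(text):
    """Accept ISO (YYYY-MM-DD) or day-first (DD/MM/YYYY) dates; return ISO text."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unrecognised format is flagged rather than guessed

def clean(record):
    """Standardise one record: trimmed title-case name, ISO date, numeric score."""
    return {
        "name": record["name"].strip().title(),
        "joined": parse_date(record["joined"]),
        "score": float(record["score"]) if record["score"] else None,
    }

cleaned = [clean(r) for r in raw_records]
print(cleaned)
```

Missing scores are kept as `None` rather than replaced with 0, so later analysis can tell "no value" apart from a real zero.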
describe data analysis
- applying statistical, machine learning, or visualisation techniques to gain insights
- types of analysis: descriptive, diagnostic, predictive, prescriptive
- insights from analysis support decision-making
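As a small illustration of descriptive analysis (summarising what happened), the sketch below uses Python's standard `statistics` module. The `monthly_sales` figures are hypothetical.

```python
import statistics

# Hypothetical monthly sales figures, invented for illustration.
monthly_sales = [120, 135, 150, 110, 160, 155]

# Descriptive analysis: summarise the data that has already been collected.
summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
}
print(summary)
```

Diagnostic, predictive, and prescriptive analysis build on summaries like this to ask why it happened, what is likely to happen next, and what to do about it.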
describe data disposal stage
- securely deleting or archiving data that is no longer needed
- methods: deletion, anonymisation, archiving
- archiving: data kept for historical, legal, or regulatory reasons is moved to long-term storage
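A rough sketch of one disposal method, anonymisation: direct identifiers are dropped and a precise value is coarsened into a band. The record fields and the `anonymise` helper are hypothetical, and real anonymisation usually needs more than this (for example, checking that combinations of the remaining fields cannot re-identify a person).

```python
def anonymise(record):
    """Drop direct identifiers (name, email) and coarsen age into a 10-year band."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "city": record["city"],
    }

# Hypothetical customer record, invented for illustration.
customer = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "age": 34,
    "city": "Leeds",
}

anonymised = anonymise(customer)
print(anonymised)
```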
challenges of data disposal
ensuring data is permanently removed to prevent unauthorised access
describe a file-based system
- a system where data is stored in files on a computer and managed through specific application programs
- organised into separate files
- each file is independent and accessed by specific programs
- data is stored in a logical format, such as sequential
- sequential files are read in order, from the first record onwards
- simple to design and implement
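A minimal sketch of a file-based system with sequential access, assuming a hypothetical `students.txt` file of `id,name` records. The application program owns the file format and must scan from the first record until it finds a match.

```python
import os
import tempfile

# Create a small hypothetical data file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "students.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("1001,Alice\n1002,Bob\n1003,Carol\n")

def find_student(path, student_id):
    """Read records in sequential order until the ID matches."""
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            sid, name = line.rstrip("\n").split(",")
            if sid == student_id:
                return name, line_number
    return None, None

result = find_student(path, "1003")
print(result)
```

Because the record layout is hard-coded in `find_student`, changing the file format means changing the program, which is the data dependence limitation of file-based systems.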
use of file-based system
small, straightforward tasks
5 limitations of file-based systems
- separation and isolation of data
- duplication of data
- data dependence
- incompatible file formats
- fixed queries/proliferation of application programs
2 advantages of CSV
- human readability
- broad compatibility
2 disadvantages of CSV
- no hierarchical structure
- no native data types (every value is read as text)
2 advantages of JSON
- hierarchical structure
- native data types (string, number, boolean, null)
2 disadvantages of JSON
- larger file size
- parsing overhead
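The CSV and JSON trade-offs can be seen with Python's standard `csv` and `json` modules. The sample record below is hypothetical; the sketch only shows that CSV values come back as text, while JSON keeps data types and nesting.

```python
import csv
import io
import json

# Hypothetical CSV data: flat, human-readable, but every value is text.
csv_text = "id,name,active\n1,Alice,true\n"
row = next(csv.DictReader(io.StringIO(csv_text)))
print(row)  # all values are str; the program must convert them itself

# Hypothetical JSON data: typed values and a nested (hierarchical) list.
json_text = (
    '{"id": 1, "name": "Alice", "active": true, '
    '"modules": [{"code": "DB101", "mark": 68}]}'
)
obj = json.loads(json_text)
print(type(obj["id"]).__name__, obj["active"], obj["modules"][0]["mark"])
```

The JSON text is also longer than the CSV text for the same basic fields, because the key names are repeated in every object.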
define databases
- a shared collection of logically related data and its description, designed to meet the information needs of an organisation
- all data items are integrated into a larger repository
define database management system (DBMS)
- software that manages databases, providing tools for storage, retrieval and data management
- DBMS interacts with the application programs and the database
describe database application programs
- an application that interacts with the database by issuing requests (typically SQL queries) to the DBMS
- users interact with the database through database application programs
- action = transaction
- to prevent interference between operations on the database, all transactions possess the ACID properties
atomicity
a transaction must be performed in its entirety or not performed at all
consistency
a transaction must transform the database from one consistent state to another consistent state
isolation
transactions execute independently of one another
durability
the effects of a successful transaction are permanently recorded in the database
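A minimal sketch of atomicity and consistency, using Python's built-in `sqlite3` module as the DBMS. The `accounts` table, the `transfer` function, and the balances are hypothetical. The `CHECK` constraint stands in for a consistency rule, and `with conn:` commits the transaction on success or rolls it back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst has been rolled back as well

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards: the transaction was not performed at all, and the database stays in a consistent state.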
advantages of database application programs
- control of data redundancy
- data consistency
- sharing of data becomes easier across entire organisation
- data integrity
- improved security
- enforcement of standards
- data accessibility and responsiveness
- increased productivity
- improved maintenance through data independence
- increased concurrency
- improved backup and recovery services
disadvantages of database application programs
- requires specialised knowledge
- dependency on centralised systems
- indexing overhead
- space overhead