U1T3.2 - Applications of DT Flashcards Preview

CCEA A2 Digital Technology > U1T3.2 - Applications of DT > Flashcards

Flashcards in U1T3.2 - Applications of DT Deck (45)

What is data mining?

Process of analysing large data sets (big data) with a view to discovering patterns + trends that go beyond simple analysis. Combines AI, stats + database systems in analysis of groups of (un)structured data sets which are difficult to analyse using traditional methods. Extracts info from data set + transforms into appropriate format for use (summary of input data for analysis). Stops at process of pattern extraction.


What is big data?

Data sets so complex that traditional databases + other processing applications can't capture, curate, manage + process them in acceptable time frame.


What does curate mean?

Process of organising data from range of data sources.


What are the 3 big data challenges?

Volume (amount of data to be processed), variety (num of types of data to be analysed) + velocity (speed of data processing)


How does DT allow us to collect data for analysis?

Online forms, mobile phone data transmissions, email data, stock market data, market research, PDAs, smartphones, tablets + netbooks etc.


How can data sources be categorised and what are the differences?

Internal + external. Internal = customer details, product details, sales data. External = business partners, data suppliers, internet, govt + market research companies.


What are the most commonly used data sources?

Social media, machine data (generated from devices like RFID chip readers, GPS results) + transactional data (data from companies like eBay, Amazon, Tesco)


What are the key requirements of big data storage?

Handle large amounts of data + keep scaling up to handle growth of data sets. High speed input/output operations to support delivery of data analytics as they're carried out. Big data practitioners run hyperscale computing environments.


What are hyperscale computing environments?

Consists of many servers with DAS, each unit has PCIe flash storage devices to support data storage + high speed access to data sets.


What is DAS?

Direct Attached Storage.


How can smaller organisations support the storage of big data?

Use of NAS devices. These scale outward, but can be difficult to manage as they span out in hierarchical manner (many devices, many folders within folders).


What is NAS?

Network Attached Storage. File access shared storage, easily scaled out to meet increased capacity/computing requirements for big data analysis.


What are object-based storage systems?

Alt to NAS devices + their issues. Each stored file given unique identifier + index to support high speed access to particular data file/set.
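A minimal sketch of the idea in Python, assuming a toy in-memory store. The class `ObjectStore` and its methods are hypothetical names for illustration, not a real storage API. It shows a flat store where each object gets a unique identifier + an index of metadata, instead of folders within folders.

```python
import uuid


class ObjectStore:
    """Toy flat object store (hypothetical): unique ID + metadata index per object."""

    def __init__(self):
        self._objects = {}  # object ID -> raw bytes
        self._index = {}    # object ID -> metadata used for lookups

    def put(self, data: bytes, **metadata) -> str:
        object_id = uuid.uuid4().hex  # unique identifier for this object
        self._objects[object_id] = data
        self._index[object_id] = metadata
        return object_id

    def get(self, object_id: str) -> bytes:
        # Direct lookup by ID, no folder hierarchy to walk
        return self._objects[object_id]

    def find(self, **criteria):
        # Return IDs of objects whose metadata matches every criterion
        return [
            oid for oid, meta in self._index.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
sales_id = store.put(b"2023 sales figures", kind="sales", year=2023)
store.put(b"2023 customer list", kind="customers", year=2023)
print(store.get(sales_id))
print(store.find(kind="sales") == [sales_id])
```

Retrieval by ID is a single dictionary lookup, which is the point of the flat layout: no path through nested folders has to be resolved.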


What do big data processing techniques do?

Analyse data sets at terabyte/petabyte scale. Some methods include cluster analysis, classification, anomaly detection, association rule mining + sequential pattern mining, regression + summarisation.


What is cluster analysis?

Groups of similar data records identified.
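One way to picture this is a small k-means sketch in Python. k-means is only one of several clustering methods and is chosen here as an assumption. The function name `k_means_1d`, the spend figures and the starting centres are all made up for illustration.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small 1-D k-means: assign each value to nearest centre, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Index of the centre closest to this value
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its group (keep old centre if group is empty)
        centroids = [
            sum(group) / len(group) if group else centroids[i]
            for i, group in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical weekly spend per customer (in pounds)
spend = [5, 7, 6, 48, 52, 50]
centres, groups = k_means_1d(spend, centroids=[5, 52])
print(centres)
print(groups)
```

With this data the records fall into a low-spend group and a high-spend group without anyone labelling them in advance.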


What is classification?

Data mining process used to assign appropriate structure (category) to new data, e.g. way email application classifies some emails as spam.
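The spam example can be sketched in Python as below. This is a deliberately simple word-count classifier written for illustration, not how any real email application works. The training messages and the function names `train` + `classify` are assumptions.

```python
from collections import Counter


def train(labelled_messages):
    """Count how often each word appears in spam and in non-spam ('ham') messages."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in labelled_messages:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Label new text by which class its words were seen in more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


# Hypothetical labelled training data
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
print(classify(model, "claim your free prize"))
print(classify(model, "agenda for monday lunch"))
```

The key difference from cluster analysis is that the categories are known up front and learned from labelled examples.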


What is anomaly detection?

Unusual records identified. Some anomalies merit investigation as points of interest to organisation or may be representative of errors.
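A minimal sketch in Python, assuming a z-score rule (flag any value more than 2 standard deviations from the mean). The threshold, the function name `find_anomalies` and the transaction amounts are illustrative assumptions. Real systems use a range of techniques.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical card transaction amounts (in pounds)
amounts = [20, 22, 19, 21, 20, 23, 500]
anomalies = find_anomalies(amounts)
print(anomalies)
```

Whether the flagged record is fraud worth investigating or just a data entry error still has to be decided by the organisation.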


What is association rule mining + sequential pattern mining?

Dependencies between data items identified, e.g. use of data sets by supermarket to determine which products are bought together.
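The supermarket example can be sketched in Python using support + confidence, two standard measures for association rules. The baskets and the helper names `support` + `confidence` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count every pair of items bought together (sorted so each pair has one key)
    pair_counts.update(combinations(sorted(basket), 2))


def support(a, b):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)


def confidence(a, b):
    """Of the baskets containing a, the fraction that also contain b (rule a -> b)."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]


print(support("bread", "butter"))
print(confidence("butter", "bread"))
print(confidence("bread", "butter"))
```

Confidence is directional: every basket with butter also has bread, but not every basket with bread has butter.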


What is regression?

Relationships between data variables investigated to help see how change in independent variable impacts on dependent data variable.
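A minimal sketch in Python, assuming simple linear regression fitted by least squares (one independent variable, one dependent variable). The advertising + sales figures and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (independent) vs sales (dependent)
advertising = [1, 2, 3, 4]
sales = [12, 14, 16, 18]
slope, intercept = fit_line(advertising, sales)

# Predict sales for a spend not in the data
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

The slope is the useful part for analysis: it estimates how much the dependent variable changes for each unit change in the independent variable.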


What is summarisation?

Data summarised in visual format.


What are some of the key objectives of collecting and using big data by the financial services sector?

Ensure they comply with regulations (using fuzzy matching to check customer names + aliases against customer blacklist, lower cost), improve risk analysis (algorithms run on transaction data to identify fraudulent activity/perform risk analysis, support trading decisions), understand customer behaviour/transaction patterns + improve services (identify what leads to dissatisfaction)
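The fuzzy matching step can be sketched with Python's standard `difflib` module. The blacklist names, the 0.8 similarity cutoff and the function name `check_customer` are assumptions for illustration. A real compliance system would be far more thorough.

```python
from difflib import get_close_matches

# Hypothetical blacklist, stored in lower case for comparison
blacklist = ["jonathan smith", "maria garcia"]


def check_customer(name, cutoff=0.8):
    """Return blacklist entries that closely resemble the given name."""
    return get_close_matches(name.lower(), blacklist, n=3, cutoff=cutoff)


print(check_customer("Jonathon Smyth"))  # misspelt alias of a blacklisted name
print(check_customer("Alice Brown"))     # unrelated name
```

An exact string comparison would miss the misspelt alias, which is why approximate matching is used for this kind of check.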


How does the health sector use big data?

Predict epidemics, cure disease, improve life quality + avoid preventable deaths. Smartphones measure steps, diet + sleep patterns which in future could be shared with GP to help diagnosis. Supports clinical trials to select best subjects. Phone location can track population movement and predict spread of Ebola virus.


How does the retail sector use big data?

Predict trends + forecast demand, price optimisation (spending habits + demand) + identify potential customers (data collected through transactional records + loyalty programs allows demand to be forecast on basis of geographical areas)


What is cloud computing?

Use of internet by large computing companies to provide services normally provided by LAN. Use server farms to host services they provide for other organisations who can access these services from any computer w/ internet connection. Users don't know where data stored. Virtual servers form foundation of cloud servers. Capitalises on principle of virtual clusters.


What are server farms?

Central computer centre consisting of large num of linked file servers. Each location could have many servers, comp storage devices + other components used to support services provided by Cloud Service Provider.


What is virtualisation?

Allows virtual servers to run on physical server platform. Separates physical infrastructures to create dedicated resources. Possible to run multiple OS's + applications on same server at same time by creating virtual servers. Manipulates hardware used to provide cloud computing as service to client users. Virtual version of physical device/resource where users can use resource as if real single resource.


What is a cloud instance?

Location of physical memory on cloud server which has been allocated to particular client. Acts as virtual server for client + has own allocation of processing power, storage + other components. Each server has multiple clients. Can be used by client for processing of cloud based task/application. End user doesn't need to consider how many servers/resources applied to application. Location is immaterial + dynamic nature means resources reassigned as needed without downtime. If end user wants access to app/service, create cloud instance for time they use app.


What does each server in cloud computing having multiple clients do?

Possible to allocate additional capacity to clients when usage spikes + resource demands increase. Each of the elements in instance is dynamic + can be changed.


What is a hosted solution?

Like cloud instance but hardware + software made available to client is reserved for servicing of their needs + no one else's. They pay for all resources whether used or not. Where usage exceeds capacity, additional investment in resources by client is made.


What are virtual clusters?

Formed when virtual machine, established to meet demands of cloud instance, configures available resources on network to meet client demands.