1. Database Management - Segment 1 [Week 1 & 2 - Data and Data Sources] Flashcards

1
Q

What does data analysis rely on?

A

Data analysis relies on data and data sources.

2
Q

In what form do the various data sources collect data?

A

Data is collected in raw format.

3
Q

What are the different types of data?

A
  1. Scientific data
  2. Multimedia data
  3. Transactional data or structured data
  4. Relational data
  5. Web data
  6. Flat-file data
4
Q

What is scientific data?

A

Data that comes from various sensors and scientific equipment.

5
Q

What is multimedia data?

A

Data that comes from cameras, satellite imagery, videos, and CCTV footage is referred to as multimedia data. It typically contains audio and video content recorded over a period of time.

6
Q

What is transactional data or structured data?

A

Data with a predefined structure, recorded at different timestamps.

7
Q

What is relational data?

A

Data organized in rows and columns (tables).
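
The row-and-column idea can be sketched with Python's built-in `sqlite3` module. The table name, column names, and values below are made up for illustration; this is a minimal sketch, not a prescribed schema.

```python
import sqlite3

# In-memory database; the table and its contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (id, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 74)],
)

# Each result row is a tuple holding one value per column.
rows = conn.execute("SELECT id, name, marks FROM students ORDER BY id").fetchall()
conn.close()
print(rows)
```

Each tuple in `rows` corresponds to one row of the table, and each position in the tuple corresponds to one column.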

8
Q

What is web data?

A

Data collected from websites, typically by web scraping.

9
Q

What is flat-file data?

A

Data stored in files such as CSV or Excel files on a local system.
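
As a rough illustration, a CSV flat file can be read with Python's standard `csv` module. The file contents here are invented, and an in-memory string stands in for a file on a local disk.

```python
import csv
import io

# Stand-in for a CSV file on a local disk (contents are made up).
csv_text = "city,temperature_c\nPune,31\nDelhi,38\n"

# csv.DictReader treats the first line as the header row.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records)
```

For a real file, `io.StringIO(csv_text)` would be replaced by `open(path, newline="")`. Note that every value is read as a string, so numeric columns still need converting.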

10
Q

What is big data?

A

Non-relational and unstructured data are generally referred to as big data. More broadly, the term covers data sets too large or too varied for traditional tools to handle.

11
Q

What are the types of storage based on connectivity?

A
  1. Direct-Attached Storage (DAS)
  2. Network-Attached Storage (NAS)
  3. Storage Area Network (SAN)
12
Q

What is DAS?

A

Direct-Attached Storage.
The file system and disk storage are directly connected and are available in the same physical location.

13
Q

What is NAS?

A

In Network-Attached Storage (NAS), both the file system and the disk storage are remote and are accessed over the network.

14
Q

What is SAN?

A

In a Storage Area Network (SAN), only the disk storage is remote. The file system resides on the host system itself and accesses the storage over the network.

15
Q

What is the basic difference between SAN and NAS?

A

The basic difference between SAN and NAS is this:

In NAS, both the file system and the storage are on the remote side.
In SAN, the storage is on the remote side and the file system is on the host system itself.

NAS is a single storage device, while a SAN is a tightly coupled network of multiple devices.
NAS devices deliver shared storage as network-mounted volumes and use protocols such as NFS and SMB/CIFS,
while SAN-connected disks appear to the user as local drives.

16
Q

What are the types of storage based on the location of nodes?

A

There are two types:
  1. Warehouse storage / on-premises storage
  2. Cloud storage

17
Q

Define the types of storage based on the location of nodes.

A
  1. Warehouse storage / on-premises storage - Nodes are present in the same physical location. This ensures that data access is quick and that network delays do not affect applications.
  2. Cloud storage - Data is stored on cloud nodes. Cloud storage is often less expensive than physical storage. Real-time data can be ingested and stored directly in cloud storage, which scales both in and out in response to data volume.
18
Q

What is the Hadoop model?

A

Hadoop is an open-source framework for processing large datasets. Hadoop uses its own file system, the Hadoop Distributed File System (HDFS). Internally, this file system can be connected to any type of storage model: DAS, NAS, or SAN.
HDFS provides an abstraction. As a result, the storage appears as a locally attached disk.

19
Q

What is HDFS?

A

HDFS stands for Hadoop Distributed File System.
It is the file system used in the Hadoop model.

20
Q

What is a good solution for handling big data?

A

Hadoop.
It allows storage to scale as the data continues to grow.

21
Q

What are the basic requirements of big data?

A
  1. The type of storage used
  2. The ability to handle large amounts of data
  3. The ability to keep scaling as the data continues to grow
22
Q

What is processing of data?

A

Processing of data means transforming raw data into the required, usable format.

23
Q

Why is data important?

A

The importance of data is defined by how the data is going to be used:
1. **Improve people’s lives** - smart watches, tracking applications
2. Making decisions - organisations use data to make decisions
3. Strategies - data is used for making strategies
4. Anticipating outcomes - data is used to anticipate the outcomes of strategies and decisions
5. Monitoring - data is used for real-time monitoring
6. Accessing resources - data can be reused, as in search engines and apps

24
Q

What are information and knowledge?

A

A piece of data gives information, and information gives knowledge about something.

25
Q

What is wisdom?

A

Wisdom is the ability to make judgements or decisions based on knowledge acquired from information.

26
Q

What is an example of data?

A
27
Q

What is information?

A

Streamlining data into patterns gives information.

28
Q

What is knowledge?

A

A well-organised body of information.

29
Q

Describe data, information, knowledge, and wisdom with an example.

A

Data - 100
Information - 100 miles
Knowledge - 100 miles is quite far
Wisdom - It is difficult to walk 100 miles, but commuting by vehicle would be easier

30
Q

What are the data collection methods?

A
  1. Oral History
  2. Online marketing, social media marketing
  3. Interviews
  4. Questionnaires
  5. Focus Group
  6. Observations
  7. Documents and records
  8. Logs Stored on Servers
31
Q

What is data processing?

A

Data processing is the transformation of raw data. This process includes:
1. Filtering of data
2. Segregation of data
3. Normalisation of data
4. Cleaning of data
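
The four steps above can be sketched in plain Python. The sensor records, the valid range, and the choice of min-max scaling for normalisation are all assumptions made for illustration; the order of the steps is also a choice, and real pipelines may differ.

```python
# Made-up raw sensor readings; None marks a missing value.
raw = [
    {"sensor": "a", "value": 10.0},
    {"sensor": "b", "value": None},
    {"sensor": "a", "value": 30.0},
    {"sensor": "b", "value": 20.0},
    {"sensor": "a", "value": 10.0},  # exact duplicate of the first record
]

# Cleaning: drop records with missing values and exact duplicates.
cleaned = []
for rec in raw:
    if rec["value"] is not None and rec not in cleaned:
        cleaned.append(rec)

# Filtering: keep only readings inside an assumed valid range.
filtered = [r for r in cleaned if 0.0 <= r["value"] <= 100.0]

# Normalisation: min-max scale values into [0, 1] (assumes hi > lo).
lo = min(r["value"] for r in filtered)
hi = max(r["value"] for r in filtered)
normalised = [{**r, "value": (r["value"] - lo) / (hi - lo)} for r in filtered]

# Segregation: group the normalised readings by sensor.
by_sensor = {}
for r in normalised:
    by_sensor.setdefault(r["sensor"], []).append(r["value"])
print(by_sensor)
```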

32
Q

What is a data bucket?

A

In data science, a bucket is a data warehouse that holds all the processed data.

33
Q

What is data curation?

A

Data curation is the process of creating, organizing and maintaining data sets so they can be accessed and used by people looking for information.

34
Q

How is knowledge discovered from processed data?

A

Knowledge is discovered from processed data by identifying patterns in the streamlined data.

35
Q

How is discovered knowledge represented?

A

Discovered knowledge is represented in the form of reports, tables, and characterization rules.

36
Q

What is a database?

A

A database is an organized collection of interrelated data.

37
Q

What is data science?

A

Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data.

38
Q

What is an information system?

A

An information system (IS) is a formal, sociotechnical, organizational system designed to collect, process, store, and distribute information.

39
Q

What is curated data?

A

Transformed and processed data is called curated data.

40
Q

What is a data warehouse?

A

A data warehouse is a storage repository for structured, filtered data that has already been processed for a specific purpose.

41
Q

What is a data lake?

A

A vast pool of raw data whose purpose is not yet defined. It is a superset of a data warehouse.

42
Q

What is a data mart?

A

A subset of a data warehouse that contains repositories of summarised data collected for analysis of a specific section or unit. A data warehouse can contain any number of data marts.
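
A rough sketch of the "summarised subset" idea follows. The warehouse rows and the `build_data_mart` helper are hypothetical; real data marts are usually built with SQL or ETL tooling rather than in-memory lists.

```python
# Hypothetical warehouse table: processed sales rows for all departments.
warehouse = [
    {"dept": "electronics", "month": "Jan", "sales": 120},
    {"dept": "grocery", "month": "Jan", "sales": 80},
    {"dept": "electronics", "month": "Feb", "sales": 150},
]

def build_data_mart(rows, dept):
    """Summarise one department's rows into monthly sales totals."""
    totals = {}
    for row in rows:
        if row["dept"] == dept:
            totals[row["month"]] = totals.get(row["month"], 0) + row["sales"]
    return totals

electronics_mart = build_data_mart(warehouse, "electronics")
print(electronics_mart)
```

Calling `build_data_mart` once per department would give one mart per unit, all drawn from the same warehouse.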

43
Q

Compare a data lake, a data warehouse, and a data mart.

A
44
Q

What is KDD?

A

Knowledge Discovery in Databases.
KDD is the process of discovering knowledge from a collection of data. It is a powerful and systematic technique for deriving value from raw data.

45
Q

When was KDD formalised?

A

KDD was formalised in 1989.

46
Q

What are the steps in KDD?

A
  1. Data Selection / Segmentation
  2. Data Pre-processing
  3. Data Transformation
  4. Data Mining
  5. Interpretation of Discovered Patterns
47
Q

What is the importance of knowledge discovery for decision support?

A

Very large amounts of data are available today. Effective decision making requires correct information extracted from these large data sets, and KDD was introduced for this purpose. KDD is a high-level technique used to present and analyse data for decision makers. It is used to develop an optimal representation of the structure of the data.

48
Q

What is data selection / segmentation?

A

It is the first stage in knowledge discovery. In this stage, the data to be used is selected based on the criteria or purpose of the analysis.

49
Q

What is data pre-processing / cleaning?

A

Data pre-processing and cleaning depend on the type of data available. Noisy and inconsistent data is removed;
only the required data is retained, and redundant data is discarded.

50
Q

What is data ingestion?

A

Data ingestion is the process of importing large, assorted data files from multiple sources into a single storage medium, often cloud-based (a data warehouse, data mart, or database), where it can be accessed and analyzed.

51
Q

What is data transformation?

A

Data transformation is the stage where data is transformed to make it suitable for knowledge discovery.
For example, columns may be removed, or new columns may be derived from existing ones.
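
A minimal sketch of both kinds of change, dropping a column and deriving a new one from existing columns. The records, the `notes` column, and the choice of BMI as the derived column are illustrative assumptions.

```python
# Made-up records, assumed to be already cleaned.
rows = [
    {"name": "Asha", "height_cm": 160, "weight_kg": 64.0, "notes": "n/a"},
    {"name": "Ravi", "height_cm": 180, "weight_kg": 81.0, "notes": "n/a"},
]

def transform(row):
    """Drop the unused 'notes' column and derive a 'bmi' column."""
    height_m = row["height_cm"] / 100
    out = {k: v for k, v in row.items() if k != "notes"}
    out["bmi"] = round(row["weight_kg"] / height_m ** 2, 1)
    return out

transformed = [transform(r) for r in rows]
print(transformed)
```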

52
Q

What is data mining?

A

Data mining is the process of extracting patterns from data using various algorithms and methods.
Data analysts, data engineers, data scientists, and others use these methods and algorithms to extract patterns.
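
One simple pattern-extraction method is counting which items frequently occur together. The sketch below uses invented shopping baskets and an assumed support threshold; it illustrates the idea only and is not a specific algorithm named by the course.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets (transactions).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs seen in at least min_support baskets (threshold is an assumption).
min_support = 2
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)
```

Here the extracted pattern is that bread and milk are bought together in three of the four baskets.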

53
Q

What is interpretation and evaluation?

A

At the interpretation and evaluation stage, the extracted patterns are converted into knowledge. This knowledge, in turn, is used by data scientists, data analysts, or data engineers to support decision making.

54
Q

What is data collection?

A

The systematic process of obtaining observations or measurements is known as data collection.