G.1 General Flashcards

1
Q
Define the notion of “datafication”. In what way is it a revolution with respect to smart environments?
A

Data is everything and everything is data. Datafication means rendering into data all aspects of the world, even those that have never been quantified. It is revolutionary because, for the first time, it is open to individuals as well as companies, making most of the crucial algorithms and evidence-based decision making available to them.

2
Q
Define the characteristics of data-centric sciences. What is the role of data for them? What are the two components that make them a new generation of experimental sciences?
A

Characteristics: they weave together data management, greedy algorithms, and programming models that are tuned to be deployed on different target computer architectures.

Role of data: data collections are the backbone for conducting experiments, driving hypotheses, and reaching “valid” conclusions, models, simulations, and understanding.

The two components: Big Data handling and architecture.

3
Q
Define the notion of Big Data. In your opinion, how does this notion open new challenges for data management?
A

“Big Data” refers to datasets that are so large, complex, and dynamic that traditional data processing systems have difficulty handling and analyzing them efficiently. That is why we refer to the V’s. There are three aspects to consider when talking about Big Data:
* Data collections with characteristics that make them difficult to process on single machines or traditional databases.
* New generations of tools, methods, and technologies to collect, process, and analyze massive data collections.
* Tools imposing the use of parallel processing and distributed storage.
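
As a rough illustration of parallel processing over partitioned data, the sketch below splits a collection into partitions, processes each one independently, and merges the partial results. The sample records, the partition count, and all function names are assumptions made for this example; threads on one machine stand in for the separate nodes a real platform would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Split a list into n_parts partitions (a stand-in for distributed storage)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_words(partition):
    """Map step: count words inside a single partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def word_count(records, n_parts=3):
    """Run the map step on every partition concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_words, split(records, n_parts)))
    total = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: add partial counts together
    return total


# Hypothetical log lines used only for illustration.
logs = ["sensor ok", "sensor fail", "pump ok", "sensor ok"]
totals = word_count(logs)
print(totals["sensor"], totals["ok"], totals["fail"])
```

The result does not depend on how the records are partitioned, which is what allows this kind of work to be spread across machines.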

4
Q

Give 5 properties that characterise Big Data. Explain in which way they are challenging for managing data.

A

Properties - Challenges:
1. Volume – the quantity of collected and generated data
2. Veracity – the quality and reliability of the data, i.e. whether they really reflect the natural state of things
3. Variety – diverse types of data and formats
4. Variability – changes in the structure and format of data
5. Value – whether the data can be turned into valuable insights
6. Velocity – the data production rate
These challenges require a combination of technological solutions, organizational strategies, and data management best practices.

5
Q

In the case of your domain of expertise, how does Big Data open novel possibilities or problems/challenges?

A

Big Data opens exciting possibilities for advancing natural language processing and machine learning capabilities, but it also brings challenges related to data quality, bias, computational resources, privacy, interpretability, and ethical considerations.

6
Q

What types of questions can data analytics answer that classic factual querying cannot?

A

Questions of a predictive or classificatory nature, i.e. prediction and classification.
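
To make the contrast concrete, here is a minimal sketch under invented data: a factual query only returns rows that are already stored, whereas a classifier produces a label for a case that was never recorded. The machine records and the choice of a 1-nearest-neighbour rule are assumptions for illustration, not a method taken from the course.

```python
# Hypothetical records: (hours_of_use, temperature, status).
machines = [
    (10, 40.0, "ok"),
    (12, 42.0, "ok"),
    (55, 78.0, "fail"),
    (60, 81.0, "fail"),
]


def factual_query(rows, status):
    """Classic querying: return the stored rows that match a condition."""
    return [row for row in rows if row[2] == status]


def classify(rows, hours, temperature):
    """Analytics: predict the status of an unseen machine from its nearest stored neighbour."""
    def squared_distance(row):
        return (row[0] - hours) ** 2 + (row[1] - temperature) ** 2
    return min(rows, key=squared_distance)[2]


print(factual_query(machines, "fail"))  # only facts already in the data
print(classify(machines, 50, 75.0))     # a label for a machine that is not in the data
```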

7
Q

Programming language used to manage and manipulate data stored in Relational Database Management Systems (RDBMS). It allows operations such as creating and modifying tables; inserting, updating, and deleting data; and retrieving information through queries.

A

SQL (Structured Query Language)
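
The operations named in the question can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `sensor` table and its columns are hypothetical, chosen only to show each kind of statement once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Create and then modify a table.
cur.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, name TEXT, reading REAL)")
cur.execute("ALTER TABLE sensor ADD COLUMN unit TEXT")

# Insert, update, and delete data.
cur.executemany(
    "INSERT INTO sensor (name, reading) VALUES (?, ?)",
    [("t1", 20.5), ("t2", 31.0), ("t3", 18.2)],
)
cur.execute("UPDATE sensor SET reading = ? WHERE name = ?", (30.0, "t2"))
cur.execute("DELETE FROM sensor WHERE name = ?", ("t3",))
conn.commit()

# Retrieve information through a query.
rows = cur.execute("SELECT name, reading FROM sensor ORDER BY name").fetchall()
print(rows)
conn.close()
```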

8
Q

Refers to a set of database technologies that depart from the relational model. …. These databases are designed to handle large volumes of unstructured or semi-structured data and provide flexibility in the data schema.

A

NoSQL (Not Only SQL)
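
Schema flexibility can be illustrated with a toy key-value store holding JSON documents that do not share the same fields. This is only a sketch: the `store`, `put`, and `find` names and the sample documents are invented here and do not correspond to the API of any real NoSQL product.

```python
import json

store = {}  # key -> JSON text; a toy stand-in for a document store


def put(key, document):
    """Store a document under a key without declaring any schema first."""
    store[key] = json.dumps(document)


def find(field, value):
    """Return documents whose `field` equals `value`; documents lacking the field are skipped."""
    matches = []
    for raw in store.values():
        doc = json.loads(raw)
        if doc.get(field) == value:
            matches.append(doc)
    return matches


# Each document has a different shape.
put("u1", {"name": "Ana", "city": "Lyon"})
put("u2", {"name": "Luis", "city": "Lyon", "tags": ["admin"]})
put("u3", {"name": "Mei", "email": "mei@example.org"})

print([doc["name"] for doc in find("city", "Lyon")])
```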

9
Q

In terms of Big Data Volume, which is the basic unit of measure?

A

1B = 1 Byte

10
Q

In terms of Big Data Volume, what does a Gigabyte represent?

A

1 × 10^9 bytes

11
Q

In terms of Big Data Volume, what is the largest capacity reached lately?

A

1 × 10^24 bytes = 1 yottabyte. Brontobyte and geopbyte come next, each a factor of 1,000 larger than the previous unit.
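
The decimal units up to the yottabyte each differ by a factor of 1,000, which a short conversion sketch makes explicit. The helper names `to_bytes` and `humanize` are hypothetical, and the list stops at the yottabyte.

```python
# Decimal units, each 1000 times the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value expressed in a decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value at or above 1."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(to_bytes(1, "GB"))         # one gigabyte in bytes
print(humanize(2_500_000_000))   # a byte count in a readable unit
```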

12
Q

Social Data, Network Science, Digital Humanities and Computational Science are…

A

Data-centric sciences that emerged and developed new methodologies to exploit data.

13
Q

They develop methodologies weaving together data management, greedy algorithms, and programming models that must be tuned to be deployed on different target computer architectures…

A

Data-centric science

14
Q

Its hypothesis is that any complex system can be represented as a network, so that operations can be applied to it and, hence, the system can be modeled. Graph theory is applied.

A

Network science
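
As a small sketch of that idea, a system can be stored as a graph (an adjacency list) and then analyzed with standard graph operations such as node degree and shortest path length. The five-node network below is invented for illustration.

```python
from collections import deque

# Hypothetical undirected network: each pair is a link between two components.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_graph(edge_list):
    """Build an adjacency list (node -> set of neighbours) from undirected edges."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    """Number of links attached to each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path_length(graph, source, target):
    """Breadth-first search: fewest links between two nodes, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = build_graph(edges)
print(degree(graph))
print(shortest_path_length(graph, "A", "E"))
```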

15
Q

Nowadays we have many algorithms and mathematical models, but we need new ones. Numerical models are less costly than the newer solutions (AI), but when they offer no solution we should turn to the second option. Here we are talking about…

A

Computation

16
Q

When it comes to ………… we do not work in a lab; we work on computing infrastructure. That is why we need to clearly define the architecture and computing environment (cycles) needed to carry it out.

A

Experiment setting

17
Q

What are the Big Data properties?

A

Volume, Velocity, Variety, Variability, Veracity, Value

18
Q

The Big Data property that refers to QUANTITY / SIZE

A

Volume

19
Q

The Big Data property that refers to PRODUCTION RATE

A

Velocity

20
Q

The Big Data property that refers to DIVERSITY OF DATA TYPE AND FORMAT (heterogeneity)

A

Variety

21
Q

The Big Data property that refers to the QUALITY OF THE DATA

A

Veracity

22
Q

The Big Data property that refers to CHANGES IN THE STRUCTURE AND FORMAT OF DATA

A

Variability

23
Q

The Big Data property that refers to the ABILITY TO TURN DATA INTO USEFUL INSIGHTS

A

Value

24
Q

What three aspects can we state about Big Data?

A
  1. Data collections with characteristics difficult to process on single machines or traditional databases.
  2. A new generation of tools, methods, and technologies to collect, process and analyze massive data collections.
  3. Tools imposing the use of parallel processing and distributed storage.
25
Q

What can we say about the “Hadoop of industry”?

A

It processes CNC machine logs, aggregates logs from multiple machines and industries, builds failure predictions with predetermined confidence thresholds, and is SaaS-based in order to aggregate data.

26
Q

What does CNC mean?

A

Computer Numerical Control

27
Q

What is SaaS-based?

A

Software as a Service. Instead of downloading and installing the software on local devices, users access the applications and their features through a web browser.

28
Q

Which queries can NoSQL stores not answer? (Classic solutions / factual queries)

A

Prediction and classification

29
Q

What does the inability of NoSQL to answer prediction and classification queries lead to?

A

Experimental data processing approaches: data science.

30
Q

It is the representation of complex environments by rich data, opening up the possibility of applying all the scientific knowledge about how to infer knowledge from data.

A

Data science

31
Q

What is the main hypothesis of the methodology by which actionable insights can be inferred from data?

A

That we have enough data to reach a conclusion or make a decision.

32
Q

What is the objective of data science?

A

To produce beliefs informed by data and to be able to explain decision making.

33
Q

“Data is everything and everything is data” Pythian

A

Datafication

34
Q

What is datafication?

A

It is the rendering into data of aspects of the world that have never been quantified.

35
Q

What are the characteristics and importance of datafication?

A

The magnitude of analytical knowledge: most of the crucial algorithms are accessible. It uses rich data to make evidence-based decisions, and it is open to “individuals” as well as companies.

36
Q

Any process in which measurements and tests are used within a typically controlled environment to support or refute a hypothesis.

A

Scientific experiment

37
Q

Hypotheses

A

Unsupported propositions, usually based on insights or prior research, from which an investigation can be developed.

38
Q

Experiments are generally designed to identify a causal relationship

A

True

39
Q

Without a good experiment design, no causality can be inferred from your results

A

True

40
Q

Experiments in data science: Initial questions

A

Why support H0?
What is being measured, and how?
What data, and how is it collected?
Which tools / libraries?

41
Q

Experiments in data science: design

A

Experimental workflow
Formulate H0
Data gathering definition
Possible problems identification
Collecting data
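
One way to picture the "formulate H0, collect data, test" workflow is a permutation test, sketched here with invented measurements. The null hypothesis H0 is that both groups come from the same distribution; the test, the group values, and the number of permutations are all assumptions chosen for illustration rather than the procedure prescribed by the course.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a p-value for H0: the two groups share the same distribution.

    The group labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical measurements gathered for a control and a treated group.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treated = [13.0, 13.4, 12.9, 13.1, 13.3, 12.8]

p_value = permutation_test(control, treated)
print("p-value:", p_value)
print("reject H0 at the 5% level:", p_value < 0.05)
```

A small p-value only says the observed difference is unlikely under H0; whether the difference is causal still depends on how the experiment was designed.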

42
Q

When we want to explain what happened, we use statistics; the output is descriptive analytics, which gives us information and hindsight. If we then analyze the descriptive analytics, we learn why it happened: diagnostic analytics, which adds information and generates insight. If we go further and try to understand what will happen, we can use predictive analytics, where machine learning plays a role, and we obtain foresight. Finally, at the optimization level, we can describe how to make it happen: prescriptive analytics.

A

Experiments objectives
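
The four levels can be walked through on a tiny invented dataset. In this sketch the diagnostic step is reduced to fitting a straight line, a crude stand-in for real diagnosis, and the spend and sales figures, the target of 30, and the variable names are all assumptions.

```python
# Hypothetical weekly data: advertising spend and the resulting sales.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Descriptive (hindsight): what happened?
average_sales = mean(sales)

# Diagnostic (insight): how do sales move with spend?
slope, intercept = fit_line(spend, sales)

# Predictive (foresight): what will happen at a spend of 6?
predicted_sales = slope * 6.0 + intercept

# Prescriptive: which spend would make sales reach a target of 30?
required_spend = (30.0 - intercept) / slope

print(average_sales, slope, predicted_sales, required_spend)
```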

43
Q

I have raw data, on which I do data exploration and preparation: quantitative profiling (descriptive analytics), cleaning, normalization, attribute engineering, and sample selection, with univariate, bivariate, and multivariate observation and interactive graphs in between. Then we move on to the search for insights/foresight: sample fragmentation, generative or discriminative model training, and validation, with error analysis and ablative analysis in between.

A

Data science pipeline
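
A stripped-down version of such a pipeline might look as follows. Everything here is an assumption made for illustration: the raw rows, the single feature, the min-max normalization, and the very simple threshold model standing in for a discriminative model.

```python
import random


def clean(rows):
    """Cleaning: drop rows whose feature value is missing."""
    return [(x, y) for x, y in rows if x is not None]


def normalize(rows):
    """Normalization: rescale the feature to the range [0, 1]."""
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]


def split(rows, validation_fraction=0.25, seed=0):
    """Sample fragmentation: shuffle, then cut into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(rows):
    """Training: place a threshold halfway between the two class means."""
    class0 = [x for x, y in rows if y == 0]
    class1 = [x for x, y in rows if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def validate(threshold, rows):
    """Validation and error analysis: accuracy plus the misclassified rows."""
    errors = [(x, y) for x, y in rows if (1 if x > threshold else 0) != y]
    return 1 - len(errors) / len(rows), errors


# Hypothetical raw data: (feature, label), with some missing feature values.
raw = [(1.0, 0), (2.0, 0), (None, 0), (3.0, 0), (2.5, 0),
       (8.0, 1), (9.0, 1), (7.5, 1), (None, 1), (10.0, 1)]

prepared = normalize(clean(raw))
training, validation = split(prepared)
threshold = train(training)
accuracy, errors = validate(threshold, validation)
print("accuracy:", accuracy, "misclassified:", errors)
```

Inspecting the `errors` list is where error analysis would start; an ablative analysis would rerun the pipeline with one step removed and compare the scores.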

44
Q

Observations of phenomena often described as series of features/attributes

A

Data

45
Q

Analytics objective (looks for insights or foresights) expressed as pipeline of operations guided by the conditions and characteristics of the data.

A

Query

46
Q

A model or prediction with associated assessment indexes; it is not definitive, it carries an associated error margin, and it is accepted by comparison.

A

Result

47
Q

How can we define the data processing spectrum, from raw to curated?

A

Querying objective, data exploration, hindsight, insight, foresight, relational and aggregation querying, and information retrieval

48
Q

When talking about Querying approaches…

A

There are two categories: database information retrieval and data science. For each of them we can define the query type, execution model, result properties, and data content.

49
Q

Math and statistics + Computer science / IT

A

Machine Learning

50
Q

Math and statistics + Domain / business knowledge

A

Traditional research

51
Q

Domain / business knowledge + Computer science / IT

A

Software development

52
Q

Math and statistics + Domain / business knowledge + Computer science / IT

A

Data Science

53
Q

Depends on the “expertise” of data scientists and engineers.

A

Artisanal design

54
Q

Uses many different libraries, stacks, and tools that are difficult to integrate

A

In-house programming

55
Q

A space dedicated to practice and experimentation in the field of data science (Kaggle, Google Colab, Azure Notebooks)

A

Data Science Lab

56
Q

The combination of tools, technologies, and programming languages used in the field of data science.

A

Data Science stack

57
Q

FaaS, DaaS, PaaS, STaaS, IaaS

A

Big Data service platforms

58
Q

What challenges does a data science pipeline face?

A

Execution on distributed architectures and tuning to improve data management across nodes.

59
Q

Data science toolboxes enactment environments:

A

Big Data Platforms & Stacks
WIDE environments
Machine Learning Services

60
Q

Essential tool designed to maximize programmer productivity

A

IDE (Integrated Development Environment)

61
Q

Basic components of an IDE

A

Editor, compiler, debugger