G.1 General Flashcards

(61 cards)

1
Q
Define the notion of “datafication”. In what way is it a revolution with respect to smart environments?
A

Data is everything and everything is data. Datafication means rendering into data all aspects of the world, even those that have never been quantified. It is revolutionary because, for the first time, it is open to individuals as well as companies, making most of the crucial algorithms and evidence-based decision making available to them.

2
Q
Define the characteristics of data-centric sciences. What is the role of data for them? What are the two components that make them a new generation of experimental sciences?
A

Characteristics: methodologies weaving data management, greedy algorithms and programming models, tuned to be deployed on different target computer architectures.

Role of data: data collections are the backbone for conducting experiments, driving hypotheses and leading to “valid” conclusions, models, simulations and understanding.

The two components: Big Data handling and architecture

3
Q
Define the notion of Big Data. In your opinion, how does this notion open new challenges for data management?
A

“Big Data” refers to datasets that are so large, complex and dynamic that traditional data processing systems have difficulty handling and analyzing them efficiently. That is why we refer to the V’s. There are three aspects to consider when talking about Big Data:
* Data collections with characteristics that make them difficult to process on single machines or traditional databases.
* New generations of tools, methods and technologies to collect, process and analyze massive data collections.
* Tools that impose the use of parallel processing and distributed storage.

4
Q

Give 5 properties that characterise Big Data. In what way do they make managing data challenging?

A

Properties - Challenges:
1. Volume – quantity of collected and generated data
2. Veracity – quality and reliability of the data, i.e. whether they really reflect the natural state
3. Variety – diverse types of data and formats
4. Variability – changes in the structure and format of data
5. Value – the ability to turn the data into valuable insights
6. Velocity – data production rate
These challenges require a combination of technological solutions, organizational strategies, and data management best practices.

5
Q

In the case of your domain of expertise, how does Big Data open novel possibilities or problems/challenges?

A

Big Data opens exciting possibilities for advancing natural language processing and machine learning capabilities, but it also brings challenges related to data quality, bias, computational resources, privacy, interpretability and ethical considerations.

6
Q

What types of questions can data analytics answer that classic factual querying cannot?

A

Questions of a predictive or classificatory nature: data analytics can answer prediction and classification questions, which classic factual querying cannot.

7
Q

Programming language used to manage and manipulate data stored in Relational Database Management Systems (RDBMS). It allows operations such as creating and modifying tables, inserting, updating and deleting data, and retrieving information through queries.

A

SQL (Structured Query Language)
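
The operations named in this card can be tried out with a short sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the table `machines`, its columns and the sample rows are assumptions made up for illustration and do not come from the course material.

```python
import sqlite3


def run_sql_demo():
    """Create, modify, fill, update, prune and query a small table."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Create a table, then modify it by adding a column.
    cur.execute("CREATE TABLE machines (id INTEGER PRIMARY KEY, name TEXT, hours REAL)")
    cur.execute("ALTER TABLE machines ADD COLUMN site TEXT")
    # Insert hypothetical rows.
    cur.executemany(
        "INSERT INTO machines (name, hours, site) VALUES (?, ?, ?)",
        [("lathe", 120.0, "A"), ("mill", 80.5, "A"), ("press", 40.0, "B")],
    )
    # Update one row and delete another.
    cur.execute("UPDATE machines SET hours = hours + 10 WHERE name = ?", ("mill",))
    cur.execute("DELETE FROM machines WHERE name = ?", ("press",))
    # Retrieve information through a query.
    cur.execute("SELECT name, hours FROM machines ORDER BY name")
    rows = cur.fetchall()
    conn.close()
    return rows


print(run_sql_demo())
```

Each statement corresponds to one of the operations in the question: `CREATE TABLE` / `ALTER TABLE`, `INSERT`, `UPDATE`, `DELETE` and `SELECT`.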

8
Q

Refers to a set of database technologies that depart from the relational model. …. These databases are designed to handle large volumes of unstructured or semi-structured data and provide flexibility in the data schema.

A

NoSQL (Not Only SQL)

9
Q

In terms of Big Data Volume, what is the basic unit of measure?

A

1B = 1 Byte

10
Q

In terms of Big Data Volume, what does a Gigabyte represent?

A

1 × 10^9 bytes

11
Q

In terms of Big Data Volume, what is the largest capacity reached lately?

A

1 × 10^24 bytes = 1 yottabyte. Brontobyte and geopbyte come next, each a factor of 1000 larger than the one before.
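
The unit ladder from the last three cards can be checked with a small sketch. It assumes decimal prefixes (each step is a factor of 1000) and stops at the yottabyte; the helper names `to_bytes` and `humanize` are hypothetical, chosen only for this example.

```python
# Decimal byte units, each 1000 times the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to a number of bytes."""
    exponent = 3 * UNITS.index(unit)
    return value * 10 ** exponent


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    # Integer comparison avoids floating-point drift on very large counts.
    while n_bytes >= 1000 ** (index + 1) and index < len(UNITS) - 1:
        index += 1
    return f"{n_bytes / 1000 ** index:g} {UNITS[index]}"


print(to_bytes(1, "GB"))   # bytes in one gigabyte
print(humanize(10 ** 24))  # a yottabyte-sized count
```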

12
Q

Social Data, Network Science, Digital Humanities and Computational Science are…

A

Data-centric sciences that emerged and developed new methodologies to exploit data.

13
Q

Develops methodologies weaving data management, greedy algorithms and programming models that must be tuned to be deployed on different target computer architectures…

A

Data-centric science

14
Q

Its hypothesis is that any complex system can be represented as a network, so that operations can be applied to it and, as a result, the system can be modeled. Graph theory is applied.

A

Network science
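
As a rough illustration of this card, a system can be encoded as a graph and graph-theory operations applied to it. The sketch below uses only the standard library; the edge list of people is invented for the example, and degree and shortest-path length are just two of many possible operations.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency-set graph from (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees(graph):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical social network.
edges = [("ana", "bob"), ("bob", "cy"), ("cy", "dee"), ("ana", "cy")]
graph = build_graph(edges)
print(degrees(graph))
print(shortest_path_length(graph, "ana", "dee"))
```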

15
Q

Nowadays we have many algorithms and mathematical models, but we still need new ones. Numerical models are less costly than the newer solutions (AI), but when they offer no solution we should turn to the second option. Here we are talking about…

A

Computation

16
Q

When it comes to ………… we don’t work in a lab; we work on computers. That’s why we need to clearly define the computing architecture and environment (cycles) needed to carry it out.

A

Experiment setting

17
Q

What are the Big Data properties?

A

Volume, Velocity, Variety, Variability, Veracity, Value

18
Q

The Big Data property that refers to QUANTITY / SIZE

A

Volume

19
Q

The Big Data property that refers to PRODUCTION RATE

A

Velocity

20
Q

The Big Data property that refers to DIVERSITY OF DATA TYPE AND FORMAT (heterogeneity)

A

Variety

21
Q

The Big Data property that refers to the QUALITY OF THE DATA

A

Veracity

22
Q

The Big Data property that refers to CHANGES IN THE STRUCTURE AND FORMAT OF DATA

A

Variability

23
Q

The Big Data property that refers to the ABILITY TO TURN DATA INTO USEFUL INSIGHTS

A

Value

24
Q

What three aspects can we state about Big Data?

A
  1. Data collections with characteristics difficult to process on single machines or traditional databases.
  2. A new generation of tools, methods, and technologies to collect, process and analyze massive data collections.
  3. Tools imposing the use of parallel processing and distributed storage.
25
What can we say about the “Hadoop of industry”?
It processes CNC machine logs, aggregates logs from multiple machines and industries, builds failure predictions with predetermined confidence thresholds, and is SaaS-based in order to aggregate data.
26
What does CNC mean?
Computer Numerical Control
27
What does SaaS-based mean?
Software as a Service. Instead of downloading and installing the software on local devices, users access the applications and their functions through a web browser.
28
What kinds of queries can NoSQL stores not answer? (Classic solutions / factual queries)
Prediction and classification
29
NoSQL stores cannot answer prediction and classification queries. What does this limitation drive?
Experimental data processing approaches: data science.
30
It is the representation of complex environments by rich data, opening up the possibility of applying all the scientific knowledge about how to infer knowledge from data.
Data science
31
What is the main hypothesis of the methodology by which actionable insights can be inferred from data?
We have enough data to come up with something / to decide.
32
What is the objective of data science?
To produce beliefs informed by data and to be able to explain decision making.
33
“Data is everything and everything is data” Pythian
Datafication
34
What is datafication?
It is rendering into data aspects of the world that have never been quantified.
35
What are the characteristics and importance of datafication?
The magnitude of analytical knowledge: most of the crucial algorithms are accessible. It uses rich data to make evidence-based decisions and is OPEN TO “INDIVIDUALS” AND COMPANIES.
36
Any process in which measurements and tests are used within a typically controlled environment to support or refute a hypothesis.
Scientific experiment
37
Hypotheses
Unsupported propositions, usually based on insights or prior research, from which an investigation can be developed.
38
Experiments are generally designed to identify a causal relationship.
True
39
Without a good experimental design, no causality can be inferred from your results.
True
40
Experiments in data science: Initial questions
Why (to support H0); what is being measured and how; what data and how they are collected; which tools / libraries.
41
Experiments in data science: design
Experimental workflow: formulate H0; define the data gathering; identify possible problems; collect the data.
42
To explain what happened we use statistics; the output is descriptive analytics, which gives us information and hindsight. Analysing the descriptive analytics tells us why it happened (diagnostic analytics), adding information and generating insight. Going further, to understand what will happen we can use predictive analytics, where machine learning plays a role, and obtain foresight. Finally, the optimization level describes how we can make it happen (prescriptive analytics).
Experiment objectives
43
I start from raw data, on which I perform data exploration and preparation: quantitative profiling (descriptive analytics), cleaning, normalization, attribute engineering and sample selection, interleaved with uni-, bi- and multivariate observation and interactive graphs. Then comes the search for insights / foresight: sample fragmentation, generative or discriminative model training, and validation, interleaved with error analysis and ablative analysis.
Data science pipeline
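
A toy version of the stages named in this card, sketched with the standard library only. Everything here is an assumption made for illustration: the synthetic raw data, the choice of min-max normalization, the 75/25 split and the one-variable least-squares line standing in for the model. Attribute engineering, interactive graphs and ablative analysis are left out.

```python
import random


def clean(rows):
    """Cleaning: drop rows with a missing feature or target."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def min_max(values):
    """Normalization: rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def split(rows, ratio=0.75, seed=0):
    """Sample fragmentation: shuffled train / validation split."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * ratio)
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Training: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(model, rows):
    """Validation / error analysis: mean absolute error of the model."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Synthetic raw data following y = 2x + 1, plus one incomplete row.
raw = [(x, 2 * x + 1) for x in range(10)] + [(None, 5)]
cleaned = clean(raw)
xs = min_max([x for x, _ in cleaned])
prepared = list(zip(xs, [y for _, y in cleaned]))
train, validation = split(prepared)
model = fit_line(train)
error = mean_abs_error(model, validation)
print(model, error)
```

Because the synthetic data are exactly linear, the validation error is close to zero; with real data this is the point where error analysis would start.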
44
Observations of phenomena often described as series of features/attributes
Data
45
Analytics objective (looking for insights or foresight) expressed as a pipeline of operations guided by the conditions and characteristics of the data.
Query
46
Model or prediction with associated assessment indexes; not definitive, accepted with an associated error margin and by comparison.
Result
47
How can we define the raw-to-curated data processing spectrum?
Querying objective, data exploration, hindsight, insight, foresight, relational & aggregation querying and information retrieval
48
When talking about Querying approaches…
There are two categories: database information retrieval and data science. For each we can define the query type, execution model, result properties and data content.
49
Math and statistics + Computer science / IT
Machine Learning
50
Math and statistics + Domains / Business knowledge
Traditional research
51
Domains / Business knowledge + Computer science / IT
Software development
52
Math and statistics + Domains / Business knowledge + Computer science / IT
Data Science
53
Depends on the data scientists’ / engineers’ “expertise”.
Artisanal design
54
Using many different libraries, stacks and tools that are difficult to integrate
In-house programming
55
A space dedicated to practice and experimentation in the field of data science (Kaggle, Google Colab, Azure Notebooks)
Data Science Lab
56
The combination of tools, technologies and programming languages used in the field of data science.
Data science stack
57
FaaS, DaaS, PaaS, STaaS, IaaS
Big Data services platforms
58
When talking about the data science pipeline, what challenges are faced?
Execution on distributed architectures and tuning to improve data management across nodes.
59
Data science toolboxes enactment environments:
Big Data platforms & stacks, WIDE environments, machine learning services
60
Essential tool designed to maximize programmer productivity
IDE (Integrated Development Environment)
61
Basic components of an IDE
Editor, compiler, debugger