Terminology Flashcards

1
Q

what is discrete data? (add example)

A

Discrete data can only take particular values. There may potentially be an infinite number of those values, but each is distinct and there’s no grey area in between. Discrete data can be numeric – like numbers of apples – but it can also be categorical – like red or blue, or male or female, or good or bad.

e.g. number of students, number of pips on a die

2
Q

what is continuous data? (add example)

A

Continuous data are not restricted to defined separate values, but can occupy any value over a continuous range. Between any two continuous data values there may be an infinite number of others. Continuous data are always essentially numeric.

It sometimes makes sense to treat numeric data that is properly of one type as being of the other. For example, something like height is continuous, but often we don’t really care too much about tiny differences and instead group heights into a number of discrete bins.

e.g. heights and weights
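The binning idea above can be sketched in a few lines of Python (the cut-offs and labels are illustrative assumptions, not from the card):

```python
# Treating continuous data as discrete: group heights (cm) into labelled bins.
heights = [158.2, 171.5, 163.0, 180.9, 167.4]

def bin_height(h):
    if h < 160:
        return "short"
    if h < 175:
        return "medium"
    return "tall"

binned = [bin_height(h) for h in heights]
print(binned)  # ['short', 'medium', 'medium', 'tall', 'medium']
```

Once binned, the data can be treated as categorical (discrete) even though the underlying measurement is continuous.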

3
Q

weighted goal programming?

A

Business problems usually have more than one overall measure of performance (multiple objectives), and goal programming aims to strike the right balance between them.

Weighted goal programming allows a weight to be assigned to each goal, so that differences in their importance can be modelled.

The weights are usually assigned equally at first, and the model is then optimised; the weights can then be adjusted to reflect the goals' relative importance.
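A toy single-variable sketch of the idea (the goals, numbers, and brute-force search below are illustrative assumptions, not from the card):

```python
# One decision variable x = units produced.
# Goal 1: profit 5*x should reach at least 100 (shortfall d1).
# Goal 2: labour 3*x should not exceed 48 hours (excess d2).
# Weighted goal programming minimises w1*d1 + w2*d2.

def weighted_gp(w1, w2, x_values):
    best_x, best_cost = None, float("inf")
    for x in x_values:
        d1 = max(0, 100 - 5 * x)   # under-achievement of the profit goal
        d2 = max(0, 3 * x - 48)    # over-achievement of the labour goal
        cost = w1 * d1 + w2 * d2   # weighted sum of deviations
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

# Equal weights first, as the card suggests, then re-weight:
print(weighted_gp(1, 1, range(0, 31)))   # → (20, 12): balances both goals
print(weighted_gp(1, 10, range(0, 31)))  # → (16, 20): labour goal dominates
```

Changing the weights changes which goal the solution favours, which is exactly how differences in importance are modelled.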

4
Q

lexicographic goal programming?

A

Lexicographic goal programming is used when there are major differences in the importance of the goals.
It works by putting the goals into an order of importance and then ensuring that the most important goals are met first.
Each goal is given a priority level, and the goals are optimised one priority at a time in that order.

5
Q

what are the differences between weighted and lexicographic goal programming? when are they used?

A

The key difference is how priority is handled. Weighted goal programming combines all goals into a single objective using weights, trading the goals off against one another; it is used when the goals are of broadly comparable importance. Lexicographic goal programming ranks the goals in strict order and satisfies the most important goals first; it is used when there are major differences in the importance of the goals.

6
Q

the term cloud computing?

A

Cloud computing allows remote servers to be used for data storage and analysis. These servers can be scaled vertically (installing more processors, memory, and better/faster hardware in a single machine/server) or horizontally (spreading the workload across many machines).

7
Q

the limitations of AHP

A
  1. AHP scale – the fixed 1–9 judgement scale limits how finely preferences can be expressed.
  2. Rank reversal – the ranking of alternatives can change when an alternative is added or removed.
  3. Pairwise consistency – judgements must stay consistent across all pairwise comparisons, which is hard to achieve.
  4. Time consumption – the number of pairwise comparisons grows quickly with the number of criteria and alternatives.
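The pairwise-consistency limitation is usually checked with Saaty's consistency ratio, sketched below (the 3×3 judgement matrix is an illustrative assumption, not from the card):

```python
# Consistency check for an AHP pairwise comparison matrix.
# CR = CI / RI, where CI = (lambda_max - n) / (n - 1) and
# RI is Saaty's random index (0.58 for n = 3).
A = [[1,     3,   5],
     [1 / 3, 1,   2],
     [1 / 5, 1 / 2, 1]]
n = 3

def principal_eigenvalue(A, iters=100):
    """Estimate lambda_max by power iteration (pure Python)."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        s = sum(w)
        v = [x / s for x in w]          # renormalise each step
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sum(Av) / sum(v)             # Rayleigh-style estimate

lam = principal_eigenvalue(A)
CI = (lam - n) / (n - 1)
CR = CI / 0.58
print(round(CR, 3))  # CR < 0.1 is conventionally "acceptably consistent"
```

For a perfectly consistent matrix lambda_max equals n, so CI (and CR) would be zero; the further the judgements drift, the larger CR grows.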
8
Q

what are primary and foreign keys?

A

A primary key is an attribute (column) whose value is unique for every record – for example a student ID number – and it is often generated by the DBMS. A generated key is often referred to as a Globally Unique Identifier (GUID).

A foreign (referring) key is a column/field in a table that matches the primary key of another table. These keys are used to create relationships between tables: by following a foreign key we can refer to another table and look up the details of the matching record (e.g. that student's details).
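A minimal sketch using Python's built-in SQLite driver (the table and column names are illustrative assumptions, not from the card):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""CREATE TABLE student (
                    student_id INTEGER PRIMARY KEY,  -- unique for every row
                    name       TEXT)""")
conn.execute("""CREATE TABLE enrolment (
                    enrolment_id INTEGER PRIMARY KEY,
                    student_id   INTEGER REFERENCES student(student_id),  -- foreign key
                    course       TEXT)""")
conn.execute("INSERT INTO student VALUES (1, 'Alice')")
conn.execute("INSERT INTO enrolment VALUES (10, 1, 'Statistics')")

# Follow the foreign key back to the referenced table:
row = conn.execute("""SELECT s.name, e.course
                      FROM enrolment e JOIN student s
                        ON s.student_id = e.student_id""").fetchone()
print(row)  # ('Alice', 'Statistics')
```

The join works precisely because `enrolment.student_id` refers to the primary key of `student`.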

9
Q

the procedure for K-means Clustering

A

K-means clustering is the most commonly used clustering algorithm. K refers to the number of clusters you want to classify your data into.
The procedure for K-means clustering is:
1. Choose a value for K, the number of clusters.
2. Randomly choose K points as centroids.
3. Assign each item to the cluster with the nearest centroid (mean).
4. Recalculate each centroid as the average of all data points in its cluster.
5. Repeat steps 3 and 4 until there are no more reassignments or the maximum number of iterations is reached.
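The steps above can be sketched in pure Python (toy data; a real analysis would use a library implementation):

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means following the steps on the card."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 2: random initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):                     # step 5: iterate
        clusters = [[] for _ in range(k)]
        for p in points:                           # step 3: nearest centroid
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new_centroids = [                          # step 4: recompute as the mean
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:             # stop: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids, clusters = kmeans(pts, k=2)
```

On this toy data the two low points and the two high points end up in separate clusters regardless of which pair of points is drawn as the initial centroids.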

10
Q

what is Spark? and the advantages of it over other big data systems

A

Spark is a newer tool (2010) that can run directly on HDFS, inside MapReduce, and alongside MapReduce on the same cluster.
It can perform in-memory computations and allows data to be cached in memory, eliminating Hadoop's hard-disk overhead for iterative tasks.

In tests it has been up to 100x faster than MapReduce when the data fits in memory, and up to 10x faster when the data resides on the hard disk.
