Topological Data Analysis Flashcards

1
Q

What is the definition of a topological space and what are the conditions for a collection of subsets to be a topology on a set?

A

A topological space is a set X endowed with a structure, called a topology, that makes it possible to define continuity, convergence, and connectedness, and hence continuous deformation of shapes. A topology on X is a collection of subsets of X, called open sets, that satisfies three conditions: 1) The empty set and X itself are in the collection. 2) Any union (finite or infinite) of sets in the collection is also in the collection. 3) Any finite intersection of sets in the collection is also in the collection.
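For a finite set, the three conditions can be checked mechanically. The sketch below uses a hypothetical helper, is_topology, and assumes a finite set, where closure under arbitrary unions reduces to closure under pairwise unions:

```python
from itertools import combinations

def is_topology(X, T):
    """Check the open-set axioms for a collection T of subsets of a finite set X.

    For a finite collection, closure under arbitrary unions and finite
    intersections reduces to closure under pairwise unions and intersections.
    """
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if not all(U <= X for U in T):
        return False
    # Condition 1: the empty set and X itself are open.
    if frozenset() not in T or X not in T:
        return False
    # Conditions 2 and 3: pairwise unions and intersections stay in T.
    for U, V in combinations(T, 2):
        if U | V not in T or U & V not in T:
            return False
    return True
```

For example, {∅, {1}, {1, 2}, {1, 2, 3}} is a topology on {1, 2, 3}, while {∅, {1}, {2}, {1, 2, 3}} is not, because the union {1, 2} is missing.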

2
Q

What is the difference between homotopy and homeomorphism and how do they relate to the concept of topological equivalence?

A

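Homotopy and homeomorphism both define notions of equivalence between topological spaces. A homeomorphism is a continuous bijection between two topological spaces whose inverse is also continuous. Two spaces with a homeomorphism between them are called homeomorphic, or topologically equivalent: from a topological viewpoint they are essentially identical. A homotopy, by contrast, is a relation between continuous maps: two maps f, g: X → Y are homotopic if one can be continuously deformed into the other, that is, if there is a continuous map H: X × [0, 1] → Y with H(x, 0) = f(x) and H(x, 1) = g(x). Two spaces X and Y are homotopy equivalent if there are maps f: X → Y and g: Y → X such that g ∘ f is homotopic to the identity on X and f ∘ g is homotopic to the identity on Y. Every homeomorphism is a homotopy equivalence, but not conversely (a disk is homotopy equivalent to a point but not homeomorphic to it), so homotopy equivalence is a coarser classification than topological equivalence. Homology is invariant under homotopy equivalence.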

3
Q

What is a simplicial complex and what are the two ways of representing it (abstract and geometric)?

A

A simplicial complex is a collection of simplices: points, line segments, triangles, and their n-dimensional counterparts. It must be closed under taking faces, meaning that every face of a simplex in the complex is also in the complex. It can be represented in two ways: 1) Geometrically, where each simplex is a true geometric object (the convex hull of affinely independent points in Euclidean space) and the intersection of any two simplices is either empty or a face of both. 2) Abstractly, where each simplex is just a finite set of vertices, the collection is closed under taking nonempty subsets, and no geometric realization is specified.
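In the abstract representation a complex is just a collection of vertex sets, so closure under faces is easy to enforce or check. A small sketch with hypothetical helper names closure and is_simplicial_complex:

```python
from itertools import combinations

def closure(maximal_simplices):
    """Abstract simplicial complex generated by the given simplices:
    every nonempty subset (face) of each simplex, stored as a frozenset."""
    complex_ = set()
    for simplex in maximal_simplices:
        simplex = tuple(simplex)
        for k in range(1, len(simplex) + 1):
            for face in combinations(simplex, k):
                complex_.add(frozenset(face))
    return complex_

def is_simplicial_complex(simplices):
    """True if every proper nonempty face of every simplex is also present."""
    S = {frozenset(s) for s in simplices}
    return all(frozenset(face) in S
               for s in S
               for k in range(1, len(s))
               for face in combinations(s, k))
```

A filled triangle {0, 1, 2} glued to an edge {2, 3} generates 9 simplices (4 vertices, 4 edges, 1 triangle), whereas the single set {0, 1, 2} on its own is not a complex because its edges and vertices are missing.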

4
Q

What is a filtration and how does it help us study the evolution of topological features over different scales?

A

A filtration is a nested sequence of simplicial complexes, each one contained in the next, which results in a final complex. Filtrations allow us to study the ‘birth’ and ‘death’ of topological features as we add simplices according to their filtration values. This is the basis of persistent homology, a method in topological data analysis that can quantify the multi-scale shape of a data set.
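A common way to specify a filtration is to give each simplex a value that is never smaller than the values of its faces; the sublevel complexes K_t = {σ : f(σ) ≤ t} are then nested. The sketch below (helper names are hypothetical) checks this monotonicity and produces an insertion order in which every face enters before the simplices that contain it:

```python
from itertools import combinations

def is_filtration(values):
    """values maps each simplex (a frozenset of vertices) to its filtration value.
    Monotone values (face value <= simplex value, all faces present) make the
    sublevel complexes K_t = {s : values[s] <= t} nested simplicial complexes."""
    for s, v in values.items():
        for k in range(1, len(s)):
            for face in combinations(s, k):
                face = frozenset(face)
                if face not in values or values[face] > v:
                    return False
    return True

def insertion_order(values):
    """Sort by (filtration value, dimension) so faces come before cofaces."""
    return sorted(values, key=lambda s: (values[s], len(s)))

def sublevel_complex(values, t):
    """The complex K_t of all simplices that have appeared by time t."""
    return {s for s, v in values.items() if v <= t}
```

In a triangle whose edges appear at times 1, 1, and 2 and whose interior appears at time 2, a loop is born at time 2 when the last edge closes it, and it dies at the same time when the triangle fills it in.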

5
Q

What is persistent homology and how can we visualize it using barcodes and persistence diagrams?

A

Persistent homology is a method for computing topological features of a space at different spatial resolutions. Features that persist over a wide range of spatial scales are deemed more likely to reflect the underlying space rather than artifacts of sampling, noise, or a particular choice of parameters. Barcodes and persistence diagrams are two ways of visualizing these persistent topological features. A barcode is a collection of intervals on the real line, one per topological feature, each running from the feature's birth time to its death time. A persistence diagram shows the same information as a scatter plot in the plane, where the x-coordinate of a point is the birth time of a feature and the y-coordinate is its death time.
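Persistence in all dimensions requires reducing a boundary matrix, but dimension 0 (connected components) has a short special case. The sketch below (h0_barcode is a hypothetical name) computes the 0-dimensional barcode of the Vietoris–Rips filtration of a small point cloud using union-find, in the manner of Kruskal's algorithm. Each returned (birth, death) pair is both a bar of the barcode and a point of the persistence diagram:

```python
import math
from itertools import combinations

def h0_barcode(points):
    """0-dimensional persistence of the Vietoris-Rips filtration of a point cloud.
    Every point is born at scale 0; when an edge of length r first joins two
    connected components, one of them dies at r. One component never dies."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Edges enter the filtration in order of length.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # the edge merges two components
            parent[ri] = rj
            bars.append((0.0, r))
    bars.append((0.0, math.inf))  # the component that survives to the end
    return bars
```

For the points (0, 0), (1, 0), (5, 0) this yields bars (0, 1), (0, 4), and (0, ∞). The bar ending at 4 reflects the gap between the pair on the left and the isolated point on the right.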

6
Q

What are the most common distance metrics for comparing persistence diagrams and how are they defined?

A

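The most common distance metrics for comparing persistence diagrams are the bottleneck distance and the Wasserstein distance. Both are minimized over matchings (bijections) between the points of the two diagrams. Each diagram is augmented with points on the diagonal, so a point may be matched to the diagonal instead of to a point of the other diagram. Distances between points are commonly measured in the L∞ norm. The bottleneck distance is the minimum, over all such matchings, of the largest distance between any matched pair: d_B(X, Y) = inf_η sup_x ||x − η(x)||_∞. The p-Wasserstein distance is the minimum, over all matchings, of the p-th root of the sum of the p-th powers of the distances between matched pairs: W_p(X, Y) = inf_η (Σ_x ||x − η(x)||_∞^p)^(1/p). Because it sums over all pairs instead of keeping only the worst one, the Wasserstein distance is more sensitive to small differences spread across the whole diagram, while the bottleneck distance reflects only the largest single mismatch.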
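For very small diagrams, both distances can be computed by brute force over all matchings. The sketch below (all function names are hypothetical) adds one diagonal slot per point of the other diagram. It uses the L∞ norm between points, and it charges a point matched to the diagonal its L∞ distance to the diagonal, (death − birth)/2. It assumes all death times are finite. Enumeration is factorial in the number of points, so practical implementations use assignment algorithms such as the Hungarian algorithm instead:

```python
from itertools import permutations

def _diag_dist(p):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    return (p[1] - p[0]) / 2

def _cost_matrix(A, B):
    """Square cost matrix: rows are A plus len(B) diagonal slots,
    columns are B plus len(A) diagonal slots."""
    n, m = len(A), len(B)
    size = n + m
    C = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:   # point of A matched to point of B
                C[i][j] = max(abs(A[i][0] - B[j][0]), abs(A[i][1] - B[j][1]))
            elif i < n:           # point of A sent to the diagonal
                C[i][j] = _diag_dist(A[i])
            elif j < m:           # point of B sent to the diagonal
                C[i][j] = _diag_dist(B[j])
            # diagonal slot matched to diagonal slot costs 0
    return C

def bottleneck(A, B):
    """Minimum over matchings of the largest matched cost."""
    C = _cost_matrix(A, B)
    size = len(C)
    return min(max((C[i][perm[i]] for i in range(size)), default=0.0)
               for perm in permutations(range(size)))

def wasserstein(A, B, p=1):
    """Minimum over matchings of the p-th root of the summed p-th powers."""
    C = _cost_matrix(A, B)
    size = len(C)
    return min(sum(C[i][perm[i]] ** p for i in range(size))
               for perm in permutations(range(size))) ** (1 / p)
```

For A = [(0, 2)] and B = [(0, 3)], matching the two points directly costs 1, while sending both to the diagonal costs 1 and 1.5. The direct match is therefore optimal for both distances, which both evaluate to 1.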

7
Q

What is a Vietoris–Rips complex and how can we construct it from a metric space?

A

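The Vietoris–Rips complex at scale ε is a type of simplicial complex that can be defined from any metric space by forming a simplex for every finite set of points that has diameter less than ε (some authors use "at most ε"). Equivalently, it is the clique complex of the graph that joins two points whenever their distance is less than ε: a set of points spans a simplex exactly when every pair among them is joined. Because the complexes grow as ε increases, they form a filtration. The construction is a way to encode the topological information of a metric space and is commonly used in topological data analysis.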
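A direct construction as a clique complex, truncated at a chosen dimension, might look like the sketch below. The function name vietoris_rips and the max_dim cutoff are illustrative assumptions, and it follows the strict "less than ε" convention used above. It enumerates all vertex subsets, so it is only suitable for small point clouds:

```python
import math
from itertools import combinations

def vietoris_rips(points, eps, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim: a set of points spans a
    simplex when all pairwise distances are < eps (its diameter is < eps)."""
    n = len(points)
    # Edges of the eps-neighbourhood graph, stored as sorted index pairs.
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) < eps}
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):          # k vertices form a (k-1)-simplex
        for verts in combinations(range(n), k):
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)      # a clique in the graph
    return simplices
```

With the points (0, 0), (1, 0), (0, 1), (5, 5) and ε = 1.5, the first three points are pairwise closer than 1.5 (the longest side is √2 ≈ 1.414), so they span a filled triangle. With ε = 1.2, the edge between (1, 0) and (0, 1) is missing, and so is the triangle.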

8
Q

What is a group and what are the properties of a group operation?

A

A group is a set G equipped with an operation * that combines any two of its elements to form a third element in such a way that four conditions called group axioms are satisfied: 1) Closure: for all a, b in G, the result of the operation, or "product", a * b is also in G. 2) Associativity: for all a, b, and c in G, (a * b) * c = a * (b * c). 3) Identity element: there is an element e in G such that e * a = a * e = a for every element a in G. 4) Inverse element: for each a in G there exists an element b in G, commonly denoted a⁻¹ (or −a when the operation is written additively), such that a * b = b * a = e.
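For a small finite set, the four axioms can be checked by brute force. In the sketch below (is_group is a hypothetical name), the integers modulo 5 form a group under addition but not under multiplication, since 0 has no multiplicative inverse. The nonzero residues {1, 2, 3, 4} do form a group under multiplication modulo 5:

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the four group axioms for a finite set and operation."""
    G = list(elements)
    S = set(G)
    # 1) Closure
    if any(op(a, b) not in S for a, b in product(G, repeat=2)):
        return False
    # 2) Associativity
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(G, repeat=3)):
        return False
    # 3) Identity element
    identities = [e for e in G if all(op(e, a) == a == op(a, e) for a in G)]
    if not identities:
        return False
    e = identities[0]
    # 4) Inverse element for every a
    return all(any(op(a, b) == e == op(b, a) for b in G) for a in G)
```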

9
Q

What is a field and how does it differ from a group?

A

A field is a set on which addition, subtraction, multiplication, and division (except by zero) are defined and behave as the corresponding operations on rational and real numbers do. A field is thus a fundamental algebraic structure, widely used in algebra, number theory, and many other areas of mathematics. The main difference from a group is that a group has a single operation satisfying the group axioms, whereas a field has two operations, addition and multiplication, linked by distributivity. The elements form an abelian group under addition, the nonzero elements form an abelian group under multiplication, and a * (b + c) = a * b + a * c. The rationals, the reals, and the integers modulo a prime p are examples of fields.
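A quick way to see the extra requirement a field imposes is to test the integers modulo n. Addition modulo n always forms an abelian group, and commutativity and distributivity are inherited from the integers, so what remains to check is that every nonzero element has a multiplicative inverse. That holds exactly when n is prime. The helper name is_field_Zn is hypothetical:

```python
def is_field_Zn(n):
    """Z/nZ is a field exactly when every nonzero residue has a multiplicative
    inverse mod n. The additive group, commutativity, and distributivity hold
    for every n, so only inverses need checking."""
    if n < 2:
        return False  # a field needs 0 != 1
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))
```

For example, Z/5Z is a field, but Z/4Z is not, because 2 · b mod 4 is never 1.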

10
Q

What is an invariant and what are some examples of topological invariants?

A

An invariant is a property of a mathematical object (or a class of mathematical objects) that remains unchanged under some class of transformations. A topological invariant is one preserved by homeomorphisms, so two spaces that differ in a topological invariant cannot be homeomorphic. Examples of topological invariants include the number of connected components, the number of holes in different dimensions (as captured by the Betti numbers), the Euler characteristic, and the fundamental group.
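The Euler characteristic of a simplicial complex is the alternating count V − E + F − … of its simplices by dimension. The sketch below (euler_characteristic is a hypothetical name) computes it from a list of simplices given as vertex tuples. A hollow triangle, which is a circle, gives 0. A filled triangle and a single point both give 1, consistent with the Euler characteristic being unchanged under homotopy equivalence:

```python
from collections import Counter

def euler_characteristic(simplices):
    """Alternating sum of simplex counts by dimension.
    A simplex with k vertices has dimension k - 1."""
    counts = Counter(len(s) - 1 for s in simplices)
    return sum((-1) ** d * c for d, c in counts.items())
```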

11
Q

What is a Sybil attack and how does it exploit trust relationships in distributed networks?

A

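A Sybil attack is an attack wherein a reputation or trust system, for example in a peer-to-peer network, is subverted by an attacker who forges many identities (Sybils). It exploits systems that implicitly assume each identity corresponds to a distinct participant. By controlling many identities, the attacker gains a disproportionately large influence, for instance outvoting honest participants or inflating reputation scores. The attack is named after the subject of the book Sybil, a case study of a woman diagnosed with dissociative identity disorder. The name was suggested in or before 2002 by Brian Zill at Microsoft Research, and the term has been widely used in the computer science community since then.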

12
Q

What are the advantages and disadvantages of using a global vs. a local view for Sybil detection in online social networks?

A

A global view for Sybil detection has the advantage of being able to detect Sybil nodes that are far away from the honest nodes in the network. However, it requires knowledge of the entire network, which may not be feasible in large-scale or dynamic networks. A local view, on the other hand, only requires knowledge of a small portion of the network around a particular node. This makes it more scalable and adaptable to changes in the network. However, it may not be able to detect Sybil nodes that are far away from the node under consideration.

13
Q

What is an ego network and how can we use it to obtain a local view of a network?

A

An ego network is a subgraph of a network that is centered on a single node, known as the ego. The ego network includes the ego, the nodes to which it is directly connected (called alters), and all the links among those nodes. Ego networks provide a local view of a network, which can be used to study the properties of individual nodes and their immediate neighborhoods.
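Assuming an undirected graph stored as an adjacency dictionary (a representation chosen here only for illustration), the ego network of radius 1 can be extracted as follows. ego_network is a hypothetical helper:

```python
def ego_network(adj, ego):
    """Radius-1 ego network: the ego, its neighbours (alters), and every edge
    among those nodes. adj maps each node to a set of neighbours (undirected)."""
    nodes = {ego} | set(adj.get(ego, ()))
    # Keep only edges whose endpoints both lie inside the ego network.
    edges = {frozenset((u, v))
             for u in nodes for v in adj.get(u, ())
             if v in nodes and u != v}
    return nodes, edges
```

In the graph a–b, a–c, b–c, b–d, the ego network of a contains a, b, c and all three edges among them, including the alter–alter edge b–c, but not d.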

14
Q

How can we use topological data analysis to detect Sybil attacks in ego networks?

A

Topological data analysis can be used to detect Sybil attacks in ego networks by identifying topological features that are characteristic of Sybil nodes. For example, Sybil nodes may form tightly knit communities that are only loosely connected to the rest of the network. Such communities may show up as distinctive topological features, such as separate clusters or "holes" in the network. In addition, the persistence of these features across different scales can be used to distinguish real communities from Sybil communities.

15
Q

What are the steps of the topological pipeline for Sybil detection and what are the inputs and outputs of each step?

A

The topological pipeline for Sybil detection consists of four steps, each consuming the output of the previous one: 1) Construct a simplicial complex from the network data (input: network data; output: simplicial complex). 2) Compute the persistent homology of the simplicial complex (input: simplicial complex; output: persistence diagram). 3) Extract features from the persistence diagram (input: persistence diagram; output: feature vector). 4) Classify the nodes based on these features (input: feature vector; output: classification of nodes).

16
Q

How can we embed ego networks into a low-dimensional space based on the Wasserstein distance between their persistence diagrams?

A

Embedding ego networks into a low-dimensional space based on the Wasserstein distance between their persistence diagrams can be done using multidimensional scaling (MDS) or other dimensionality reduction techniques. The idea is to construct a distance matrix where the entry at the i-th row and j-th column is the Wasserstein distance between the persistence diagrams of the i-th and j-th ego networks. This distance matrix can then be input to MDS or another dimensionality reduction technique to obtain a low-dimensional embedding of the ego networks.
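The embedding step can be sketched with classical (Torgerson) MDS, one common variant of MDS. This pure-Python version uses a hypothetical function name, classical_mds. It double-centres the squared distance matrix and extracts the top eigenvectors by power iteration with deflation. It assumes the leading eigenvalues are positive. Wasserstein distance matrices are generally not exactly Euclidean, so the double-centred matrix may have negative eigenvalues; this sketch stops at the first axis whose dominant eigenvalue is not positive. A library eigensolver would be preferable for real data:

```python
import math

def classical_mds(D, k=2, iters=500):
    """Embed n items in k dimensions from an n x n symmetric distance matrix D."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J (row and column means agree by symmetry).
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-constant start
        for _ in range(iters):                  # power iteration
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 0.0:
            break  # no further positive-variance direction
        for i in range(n):
            coords[i][axis] = v[i] * math.sqrt(lam)
        # Deflation: remove this component so the next axis finds the next one.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

When D holds Euclidean distances between points on a line, a one-dimensional embedding recovers those points up to translation and reflection, so all pairwise distances are reproduced.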

17
Q

How can we use the persistence codebooks to cluster ego networks based on their topological features?

A

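Persistence codebooks can be used to cluster ego networks based on their topological features by treating each persistence diagram as a ‘document’ and its points as ‘words’, in the style of a bag-of-words model. A codebook of representative points is learned from the points pooled across many diagrams (for example, by k-means clustering). Each diagram is then represented as a histogram that counts how many of its points lie nearest to each codeword. The resulting fixed-length vectors can be clustered with a standard clustering algorithm.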
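The sketch below illustrates the bag-of-words idea behind the "document" analogy. It assumes k-means for learning the codebook and raw (birth, death) coordinates for diagram points. The helper names kmeans and bag_of_words are hypothetical, and published persistence-codebook methods may differ in details such as point weighting and coordinate choice:

```python
import math

def kmeans(points, centers, iters=20):
    """Plain Lloyd's k-means with caller-supplied initial centers (deterministic)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep empty groups in place.
        centers = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[c]
                   for c, g in enumerate(groups)]
    return centers

def bag_of_words(diagram, codebook):
    """Histogram counting how many diagram points are nearest to each codeword."""
    hist = [0] * len(codebook)
    for p in diagram:
        hist[min(range(len(codebook)), key=lambda c: math.dist(p, codebook[c]))] += 1
    return hist
```

A typical use pools the points of all diagrams, learns the codebook once, and then clusters the histograms of the individual diagrams with any standard method.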

18
Q

What is the concept of topological data analysis (TDA)?

A

Topological data analysis (TDA) is a method of applying topology, a branch of mathematics, to datasets that exist in high-dimensional spaces. The goal of TDA is to describe the shape, or topology, of the data.

19
Q

What is the role of simplicial complexes in TDA?

A

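Simplicial complexes are built from the data (for example, from a point cloud via the Vietoris–Rips construction) to give a combinatorial model of its underlying shape, whose homology can then be computed. They are a generalization of graphs that allows higher-dimensional simplices (triangles, tetrahedra, and so on) in addition to vertices and edges.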

20
Q

What is homology in the context of TDA?

A

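Homology is a concept in algebraic topology that can be used to count the holes of each dimension in a topological space: the k-th Betti number, the rank of the k-th homology group, counts the k-dimensional holes. In the context of TDA, it is used to identify features in the data such as clusters (dimension 0), loops (dimension 1), and voids (dimension 2).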
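Counting holes reduces to linear algebra. Assume coefficients in the two-element field Z/2 (other coefficient choices are possible). Then the k-th Betti number is dim C_k − rank ∂_k − rank ∂_{k+1}, where C_k is spanned by the k-simplices and ∂_k is the boundary map. The sketch below uses hypothetical helper names. It stores each boundary row as an integer bitmask and assumes the input lists every face of every simplex:

```python
from itertools import combinations

def rank_mod2(rows):
    """Rank over Z/2 of a matrix given as a list of integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices):
    """beta_k = dim C_k - rank d_k - rank d_{k+1}, with Z/2 coefficients."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    index = {d: {s: i for i, s in enumerate(ss)} for d, ss in by_dim.items()}

    def boundary_rank(k):
        if k <= 0 or k not in by_dim:
            return 0
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the (k-1)-faces of a k-simplex
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        return rank_mod2(rows)

    return [len(by_dim.get(k, [])) - boundary_rank(k) - boundary_rank(k + 1)
            for k in range(max(by_dim) + 1)]
```

A hollow triangle has one component and one loop, giving [1, 1]. Filling it in kills the loop, giving [1, 0, 0].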

21
Q

How does persistent homology extend the concept of homology?

A

Persistent homology is an extension of homology that provides a tool for measuring the ‘persistence’ of a feature across different scales. It is used in TDA to identify features that persist across multiple scales of analysis.

22
Q

What are barcodes in the context of TDA?

A

Barcodes in TDA are a way of visualizing the persistence of topological features. Each bar in the barcode represents one feature and spans the interval from the scale at which the feature is born to the scale at which it dies, so the length of the bar is the feature's persistence.

23
Q

What is a Sybil attack in the context of online social networks?

A

A Sybil attack in online social networks is a type of security threat where an attacker creates multiple fake identities, or Sybils, to gain a disproportionately large influence in the network.

24
Q

How can TDA be used to detect Sybil attacks?

A

TDA can be used to detect Sybil attacks by identifying topological features that are characteristic of Sybil nodes. For example, Sybil nodes may form tightly-knit communities that are loosely connected to the rest of the network.

25
Q

What is the role of ego networks in detecting Sybil attacks?

A

Ego networks, which are subgraphs centered on a single node, can be used to obtain a local view of the network. This local view can then be analyzed using TDA to detect potential Sybil nodes.

26
Q

What is the Wasserstein distance in the context of TDA?

A

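The Wasserstein distance is a measure of the distance between two persistence diagrams. It is the minimum, over all matchings of their points (where a point may also be matched to the diagonal), of the p-th root of the sum of the p-th powers of the distances between matched points. It is used in TDA to compare the topological features of different datasets or different parts of the same dataset.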

27
Q

How can the Wasserstein distance be used to embed ego networks into a low-dimensional space?

A

The Wasserstein distance can be used to create a distance matrix between the persistence diagrams of different ego networks. This distance matrix can then be input to a dimensionality reduction technique, such as multidimensional scaling, to obtain a low-dimensional embedding of the ego networks.

28
Q

What is the role of persistence codebooks in clustering ego networks?

A

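Persistence codebooks turn persistence diagrams, which are unordered sets of points of varying size, into fixed-length vectors that quantify the topological features of ego networks. Because these vectors share a common length, standard clustering algorithms can group ego networks by their topological features, which can help in identifying Sybil nodes.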

29
Q

What are the limitations of using TDA for Sybil detection?

A

The limitations of using TDA for Sybil detection include the computational complexity of computing persistent homology, the difficulty of choosing appropriate scale parameters, and the challenge of interpreting the results of the topological analysis.

30
Q

What is the concept of a filtration in TDA?

A

A filtration in TDA is a nested sequence of simplicial complexes, each one contained in the next, which results in a final complex. Filtrations allow us to study the ‘birth’ and ‘death’ of topological features as we add simplices according to their filtration values.

31
Q

What is the role of a field in the context of TDA?

A

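A field is a set on which addition, subtraction, multiplication, and division (except by zero) are defined and behave as the corresponding operations on rational and real numbers do. In TDA, homology is usually computed with coefficients in a field, often the two-element field Z/2Z. With field coefficients the homology groups are vector spaces, which makes them computable by linear algebra and ensures that persistent homology decomposes into intervals, the bars of a barcode.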

32
Q

What is an invariant in the context of TDA?

A

An invariant is a property of a mathematical object (or a class of mathematical objects) that remains unchanged under some transformation. In the context of TDA, topological invariants such as Betti numbers are used to identify and quantify features in the data.

33
Q

What is the concept of a group in the context of TDA?

A

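A group is a set equipped with an operation that combines any two of its elements to form a third element in such a way that four conditions called group axioms are satisfied: closure, associativity, existence of an identity element, and existence of inverses. In the context of TDA, groups appear in the computation of homology: the k-chains of a simplicial complex form a group, and the k-th homology group is the quotient of the group of k-cycles by the group of k-boundaries.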

34
Q

What is the concept of a Vietoris–Rips complex?

A

The Vietoris–Rips complex is a type of simplicial complex that can be defined from any metric space by forming a simplex for every finite set of points that has diameter less than ε. It is a way to encode the topological information of a metric space and is commonly used in TDA.

35
Q

What is the role of multidimensional scaling in TDA?

A

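Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual items in a dataset. Given a matrix of pairwise distances, it places each item in a low-dimensional space so that the distances between the placed points approximate the given distances as closely as possible. In the context of TDA, MDS can be used to embed objects that are compared by a distance, such as persistence diagrams under the bottleneck or Wasserstein distance, into a lower-dimensional space while approximately preserving their pairwise distances.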

36
Q

What is the concept of a metric space in the context of TDA?

A

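A metric space is a set together with a distance function d, defined for every pair of members, that satisfies three conditions: d(x, y) ≥ 0, with d(x, y) = 0 exactly when x = y; symmetry, d(x, y) = d(y, x); and the triangle inequality, d(x, z) ≤ d(x, y) + d(y, z). In the context of TDA, metric spaces (for example, point clouds with Euclidean distance, or graphs with shortest-path distance) are used as the input for the construction of simplicial complexes such as the Vietoris–Rips complex.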
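For a finite metric space given as a distance matrix, the axioms can be verified directly. Such a check can be useful before building a Vietoris–Rips complex from a dissimilarity that may not be a true metric. The helper name is_metric and the tolerance parameter are assumptions for illustration:

```python
from itertools import product

def is_metric(D, tol=1e-12):
    """Check the metric axioms on a finite distance matrix D (list of lists)."""
    n = len(D)
    for i, j in product(range(n), repeat=2):
        if D[i][j] < -tol or abs(D[i][j] - D[j][i]) > tol:
            return False  # non-negativity or symmetry fails
        if (i == j) != (abs(D[i][j]) <= tol):
            return False  # d(x, y) = 0 must hold exactly when x = y
    # Triangle inequality over all triples.
    return all(D[i][k] <= D[i][j] + D[j][k] + tol
               for i, j, k in product(range(n), repeat=3))
```

The matrix [[0, 1, 3], [1, 0, 1], [3, 1, 0]] fails the triangle inequality, because 3 > 1 + 1.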

37
Q

What is the role of the bottleneck distance in TDA?

A

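The bottleneck distance is a measure of the distance between two persistence diagrams. It is the minimum, over all matchings of their points (where a point may also be matched to the diagonal), of the largest distance between any matched pair. Because it depends only on the worst mismatch in the best matching, it captures the single largest topological difference. It is used in TDA to compare the topological features of different datasets or different parts of the same dataset.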