Artificial Intelligence and Natural Language Processing (CS404) Flashcards

1
Q

What is subject-verb agreement and how is it enforced on a parse tree? How
does subject-verb agreement contribute to the process of parsing the
following sentences?
(i) I am a man
(ii) *I are a man

A

Agreement in number and person between the subject and verb of a sentence.
The parse tree identifies the subject and verb of a sentence as separate nodes, so the parser can intersect their feature sets to check that number (singular or plural) and person (first, second or third) are consistent.
(i) {1S} ∩ {1S} = {1S}, so the parse succeeds.
(ii) {1S} ∩ {2S, 1P, 2P, 3P} = Ø (the features of "are" do not include first person singular), so the sentence is rejected.
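
A minimal sketch of this feature-intersection check in Python; the small feature dictionaries are illustrative assumptions, not a full lexicon from the course.

```python
# Minimal sketch: enforce subject-verb agreement by intersecting feature sets.
# The feature dictionaries below are illustrative, not a complete lexicon.

PRONOUN_FEATURES = {"I": {"1S"}, "he": {"3S"}, "they": {"3P"}}
VERB_FEATURES = {"am": {"1S"}, "is": {"3S"}, "are": {"2S", "1P", "2P", "3P"}}

def agree(subject: str, verb: str) -> bool:
    """Agreement holds when the intersection of the feature sets is non-empty."""
    return bool(PRONOUN_FEATURES[subject] & VERB_FEATURES[verb])

print(agree("I", "am"))   # True  -> {1S} ∩ {1S} = {1S}
print(agree("I", "are"))  # False -> {1S} ∩ {2S,1P,2P,3P} = Ø
```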

2
Q

What is a synset – such as those found in WordNet? Describe how WordNet
makes use of any two relations to structure its knowledge-base. Make use
of examples to illustrate your answer.

A

A synset is a set of synonymous words that together express one distinct concept (sense); each sense of a word belongs to its own synset.
The hyponymy relation means "is a kind of": a polar bear is a kind of bear. The hypernym is the more general term (bear) and the hyponym the more specific term (polar bear).
The instance hyponymy relation means "is an instance of": Einstein is an instance of a physicist. It differs from ordinary hyponymy because the hyponym is a specific individual (an example), not a subcategory.
The meronymy relation is the part-whole ("is a part/member of") relation: the meronym is a part or member of the holonym, e.g. a tree is a member of a forest.
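
If NLTK and its WordNet corpus are available (an assumption), these relations can be inspected directly; the synsets chosen below are illustrative and the exact output depends on the WordNet version.

```python
# Sketch using NLTK's WordNet interface (assumes the nltk package and the
# 'wordnet' corpus are installed; output varies with the WordNet version).
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
print(dog.hypernyms())         # more general synsets ("a dog is a kind of ...")
print(dog.hyponyms()[:3])      # a few more specific kinds of dog

tree = wn.synset('tree.n.01')
print(tree.member_holonyms())  # the whole a tree is a member of (a forest)
```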

3
Q

Describe any two similarity metrics used to compare WordNet synsets.

A

Path similarity is based on the number of links between the two synsets in the hypernym/hyponym hierarchy. It is calculated as 1 / (shortest path length + 1), so a shorter path gives a higher score.

The Wu & Palmer (WuP) method is based on the least common subsumer (LCS), the closest common ancestor of the two synsets. It is calculated as 2 * depth(LCS) / (depth(synset1) + depth(synset2)).

The Lin method uses information content (IC), which is based on the probability of encountering an instance of a synset in a given corpus. It is calculated as 2 * IC(LCS) / (IC(synset1) + IC(synset2)).
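
A sketch of the three metrics using NLTK, assuming the 'wordnet' and 'wordnet_ic' corpora are installed; the dog/cat synsets are just an example pair.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')

print(dog.path_similarity(cat))       # 1 / (shortest path length + 1)
print(dog.wup_similarity(cat))        # 2*depth(LCS) / (depth(s1) + depth(s2))

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content from the Brown corpus
print(dog.lin_similarity(cat, brown_ic))   # 2*IC(LCS) / (IC(s1) + IC(s2))
```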

4
Q

Write down Google’s PageRank formula. What do each of the terms in this
formula signify?

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.
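
A minimal sketch of the iterative calculation this formula implies; the three-page graph is a made-up toy example, and real implementations handle dangling pages and convergence more carefully.

```python
def pagerank(outlinks, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) until roughly stable.

    outlinks: dict mapping each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in outlinks}            # initial guess
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[t] / len(outlinks[t])
                                    for t in outlinks if page in outlinks[t])
            for page in outlinks
        }
    return pr

# Hypothetical three-page web: A -> B, B -> A and C, C -> A
print(pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]}))
```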

5
Q

What is the relevance of a random walk to Google’s PageRank function?
Make use of an appropriate diagram to illustrate your answer.

A

The relevance of a random walk to PageRank is that it represents the likelihood that a person randomly clicking on links will arrive at a particular page. The pages that are more ‘reachable’ in this random walk (i.e., more frequently encountered) are deemed to be more important, and thus receive a higher PageRank score.
(add diagram)

6
Q

Consider the following sentence
“John is a pig.” Compare a literal interpretation of this sentence to a metaphoric
interpretation. What are the main differences between these two
interpretations?

A

A literal interpretation of “John is a pig” means John is actually the animal, a pig. A metaphoric interpretation suggests John has characteristics associated with pigs, like messiness or greed. The main difference is that the literal interpretation takes the words at face value, while the metaphoric interpretation uses “pig” symbolically to describe John’s traits.

7
Q

What is meant by the “features” of a word, and what is the relation
between features and “agreement”? Briefly describe three different
categories of agreement and how they may be enforced on a parse tree.
Illustrate your answer using appropriate examples in English, Irish,
French or other natural language.

A

The "features" of a word are its grammatical attributes, such as part of speech, tense, number, gender and person, which define its role and meaning in a sentence.
Agreement is the requirement that words in a sentence match on certain features, such as number, gender or case; a parse is only accepted when the relevant feature sets are compatible.

Subject-verb agreement: agreement in number and person between the subject and verb of a sentence.
The parse tree identifies the subject and verb as separate nodes, so the parser can intersect their feature sets to check that number (singular or plural) and person (first, second or third) are consistent.
I am tired
{1S} ∩ {1S} = {1S}
He is tired
{3S} ∩ {3S} = {3S}
*He am tired
{3S} ∩ {1S} = Ø
*I is tired
{1S} ∩ {3S} = Ø

Quantifier Noun agreement
Agreement in number between Noun and quantifier
– This man
*This men
– These men
*These man

Determiner Noun agreement (in French etc.)
Gender agreement between Determiner and Noun
Le stylo
*La stylo
– La femme
*Le femme

Cardinal-number Noun agreement (in Irish)
Agreement in number between the cardinal term and the Noun
Beirt {daoine}
* dha daoine
– Dha sciain
*beirt {sciain}

8
Q

Discuss the use of WordNet to represent semantic information. Describe
any two relationships used by WordNet to organize its taxonomy.

A

WordNet is a widely-used lexical database that represents semantic information of English words. It organizes words into sets of synonyms (synsets), each representing a unique concept. These synsets are linked by various types of semantic relationships, creating a rich network that reflects the structure of human language.

The hyponymy relation means "is a kind of": a polar bear is a kind of bear. The hypernym is the more general term (bear) and the hyponym the more specific term (polar bear).
The instance hyponymy relation means "is an instance of": Einstein is an instance of a physicist. It differs from ordinary hyponymy because the hyponym is a specific individual (an example), not a subcategory.
The meronymy relation is the part-whole ("is a part/member of") relation: the meronym is a part or member of the holonym, e.g. a tree is a member of a forest.

9
Q

Describe in detail any one similarity metric for comparing two terms,
which is based on the WordNet lexical database.

A

Path similarity is based on the number of links between the two synsets in the hypernym/hyponym hierarchy. It is calculated as 1 / (shortest path length + 1), so a shorter path gives a higher score.

The Wu & Palmer (WuP) method is based on the least common subsumer (LCS), the closest common ancestor of the two synsets. It is calculated as 2 * depth(LCS) / (depth(synset1) + depth(synset2)).

The Lin method uses information content (IC), which is based on the probability of encountering an instance of a synset in a given corpus. It is calculated as 2 * IC(LCS) / (IC(synset1) + IC(synset2)).

10
Q

What is meant by metaphoric language and how does metaphoric
language differ from more standard (literal) language? Illustrate your
answer with suitable examples.

A

Metaphoric language uses words symbolically, unlike literal language which uses words in their standard, direct sense. For example, saying “He has a heart of stone” metaphorically suggests he is unemotional or unsympathetic, while literally it would mean his heart is made of stone. In contrast, a literal statement like “He is unkind” uses words in their usual, direct meaning.

11
Q

Write down the PageRank function, explaining each of the terms.
Describe the distinction between inlinks and outlink, using an appropriate
diagram to illustrate your answer.

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

In PageRank, inlinks are links from other pages to a given page, contributing to its rank. Outlinks are links from that page to others, distributing its PageRank among them. Inlinks boost a page’s rank, while outlinks share it.

(add a diagram)

12
Q

What is heuristic search? Describe a simple heuristic that might be useful in
solving the Farmer-Fox-Goose problem (or the 8-tile puzzle). Briefly
describe what problem is presented by a local minimum and a plateau.

A

Heuristic search is generate-and-test search guided by a heuristic function: a rule of thumb that estimates how close a given state is to the goal, so the most promising states are explored first rather than blindly.

To apply it to problems like the 8-tile puzzle or the Farmer-Fox-Goose problem, a state space is defined, where each state represents a possible configuration of the problem. The search moves from state to state via legal moves, guided by the heuristic (as in A* or greedy best-first search), until it reaches a goal state. A simple heuristic for the 8-tile puzzle is the number of tiles out of place (or the total distance of tiles from their goal positions); for the Farmer-Fox-Goose problem, the number of characters not yet safely on the far bank.

A local minimum is a state that scores better than all of its neighbours without being the goal, so the search gets stuck because every move looks like a step backwards. A plateau is a region where all neighbouring states have the same heuristic value, so the search has no gradient to follow and cannot tell which direction makes progress.

13
Q

What is meant by a zero-sum game? Describe how the MiniMax algorithm
can be used to solve adversary-search problems, such as chess or the OXO
game.

A

A zero-sum game is a situation in which one participant’s gain or loss is exactly balanced by the losses or gains of the other participants. The MiniMax algorithm is used in zero-sum games like chess or tic-tac-toe to find the optimal move: it evaluates the possible moves and their outcomes to maximize the player’s advantage while minimizing the opponent’s, assuming both players play optimally.
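
A compact MiniMax sketch over a toy game tree represented as nested lists (leaves are static evaluation scores); this is illustrative and not tied to any particular game.

```python
def minimax(node, maximizing=True):
    """MiniMax over a toy game tree given as nested lists; leaves are scores."""
    if isinstance(node, (int, float)):           # leaf: return its evaluation
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Hypothetical depth-2 tree: MIN forces 3 or 2 in the two branches, MAX picks 3.
print(minimax([[3, 5], [2, 9]]))   # 3
```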

14
Q
A

D value of 1.

15
Q

Describe each of the steps in the A* search algorithm.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is f(n) = g(n) + h(n), where g(n) is the cost to reach the node and h(n) is the estimated (heuristic) cost from the node to the goal.

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
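
A compact sketch of these steps on a small weighted graph, using a priority queue as the open list; the graph and heuristic values below are made up for illustration.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* sketch: graph maps node -> {neighbour: step cost}, h maps node -> heuristic."""
    open_list = [(h[start], 0, start, [start])]        # entries are (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)    # node with lowest f = g + h
        if node == goal:
            return path                                # goal check: return the traced path
        if node in closed:
            continue
        closed.add(node)
        for neighbour, cost in graph[node].items():    # generate successors
            if neighbour not in closed:
                heapq.heappush(open_list, (g + cost + h[neighbour], g + cost,
                                           neighbour, path + [neighbour]))
    return None                                        # open list empty: no path exists

# Hypothetical graph with an admissible heuristic
graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 5}, "B": {"G": 1}, "G": {}}
h = {"S": 3, "A": 2, "B": 1, "G": 0}
print(a_star(graph, h, "S", "G"))   # ['S', 'A', 'B', 'G'], total cost 4
```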

16
Q

What is WordNet and how does it represent information? Make reference to the relationships that are used by WordNet in your answer.

A

WordNet is a large lexical database of English, linking words into semantic relations including synonyms, antonyms, hypernyms, hyponyms, meronyms, and holonyms. It groups English words into sets of synonyms called synsets, providing short definitions and usage examples, and records the various semantic relations between these synonym sets. The most common relations are:

Synonyms: Words that have the same or similar meanings are grouped into synsets.
Hypernyms: Terms that are more generic or abstract than the given word.
Hyponyms: More specific terms derived from a generic term.
Meronyms: Words that denote a part of something.
Holonyms: Words that denote a whole of which the given word is a part.
Antonyms: Words that have opposite meanings.
WordNet’s structure makes it a useful tool for computational linguistics and natural language processing, providing a rich network of meaningfully related words.

17
Q

Describe any two similarity metrics used to compare WordNet synsets.
Why would one attempt to quantify the similarity (or difference) between
any two concepts?

A

Two common similarity metrics used to compare WordNet synsets are:

Path Similarity: Measures the similarity based on the shortest path that connects the synsets in the hypernym/hyponym taxonomy. A shorter path implies higher similarity.

Wu-Palmer Similarity (WUP): Measures the similarity by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of their least common subsumer (most specific ancestor node).

Quantifying the similarity (or difference) between two concepts is important for various natural language processing tasks, such as word sense disambiguation, semantic similarity assessment, information retrieval, and machine translation. It helps in understanding the semantic and contextual relationships between words, thereby improving the performance of these tasks.

18
Q

Define the terms precision and recall, explaining their relevance to
information retrieval.

A

In information retrieval, precision and recall are two critical metrics:

Precision: Measures the fraction of retrieved documents that are relevant to the query. It answers the question, “Of all the documents the system retrieved, how many were actually relevant?”

Recall: Measures the fraction of relevant documents that were successfully retrieved. It answers the question, “Of all the relevant documents that exist, how many did the system successfully retrieve?”

These metrics are crucial in evaluating the effectiveness of information retrieval systems, balancing the need for retrieving all relevant documents (recall) while minimizing the retrieval of irrelevant ones (precision).
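
A tiny sketch of both calculations on made-up document IDs, just to make the two fractions concrete.

```python
# Made-up example: which documents a system returned vs. which were truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}

true_positives = retrieved & relevant
precision = len(true_positives) / len(retrieved)   # relevant share of what was returned
recall = len(true_positives) / len(relevant)       # returned share of what is relevant

print(precision, recall)   # 0.5 and roughly 0.67
```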

19
Q

Write down the PageRank function, explaining each of the terms. Describe
the distinction between inlinks and outlinks, using an appropriate diagram to
illustrate your answer.

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

Inlinks: Links from other pages pointing to page pi. They contribute to the PageRank of pi, as the linking pages 'vote' for or 'endorse' it.

Outlinks: Links from page pj pointing to other pages. The PageRank value of pj is shared among these outlinks.

20
Q

What problem is presented by encountering a novel word-pair when
calculating the joint probability of a word sequence? How can these
problems be overcome?

A

Encountering a novel word-pair when calculating the joint probability of a word sequence presents the problem of data sparsity, where the word-pair has not been seen in the training data, leading to a probability estimate of zero. This makes the model unable to handle unseen word combinations effectively.

To overcome these problems, techniques like smoothing and back-off models are used. Smoothing assigns a small probability to unseen word-pairs by redistributing some probability mass from more frequent pairs. Back-off models, on the other hand, fall back to calculating the probability of smaller units (like single words) when the data for larger units (like word pairs) is insufficient or unavailable. These methods help in dealing with the zero-probability issue in sparse data scenarios.
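
A minimal sketch of add-one (Laplace) smoothing for bigram probabilities; the tiny corpus is made up, and real systems use much larger corpora and more refined smoothing or back-off schemes.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()   # tiny made-up corpus

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                                        # vocabulary size

def p_laplace(w1, w2):
    """Add-one smoothed P(w2 | w1): unseen pairs get a small non-zero probability."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_laplace("the", "cat"))   # seen pair: relatively high probability
print(p_laplace("cat", "the"))   # unseen pair: small but non-zero
```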

21
Q

Describe how a search mechanism can be adapted to solve problems like the
8-tile puzzle or the Farmer-Fox-Goose problem.

A

To solve problems like the 8-tile puzzle or the Farmer-Fox-Goose problem, a search mechanism typically involves defining a state space, where each state represents a possible configuration of the problem. The search algorithm then systematically explores this space, moving from one state to another via legal moves or actions, until it reaches a goal state.

For example, in the 8-tile puzzle, states represent different tile arrangements, and actions are tile moves. In the Farmer-Fox-Goose problem, states are different distributions of the characters across the river, and actions are the farmer’s choices of who to ferry across the river.

Common search algorithms used include depth-first search, breadth-first search, or more advanced strategies like A* search, which can be guided by heuristics to find the optimal solution efficiently.

22
Q

Describe how the MiniMax algorithm can be used to solve adversary-search
problems, such as chess or the OXO game. What are the main limitations of
the MiniMax algorithm?

A

The MiniMax algorithm solves adversarial search problems like chess or OXO by simulating all possible moves and their outcomes for both players. It assumes that both players play optimally and alternates between maximizing the player’s advantage and minimizing the opponent’s advantage, thereby choosing the best possible move.

Main limitations of the MiniMax algorithm include:

Computational Complexity: The number of possible moves in games like chess is enormous, leading to a vast search tree that is computationally expensive to explore fully.
Requirement for Effective Heuristics: For deeper search trees, effective heuristics are needed to evaluate game positions and make practical decisions without exploring the entire tree.

23
Q
A

D with a value of 3.

24
Q

Describe how α−β pruning improves the performance of adversary search
algorithms. You may use an appropriate example to illustrate your answer.

A

Alpha-Beta (α-β) pruning improves the performance of adversarial search algorithms by reducing the number of nodes evaluated in the search tree. It works by eliminating branches that don’t influence the final decision.

In this process, two values, alpha (α) and beta (β), are maintained. Alpha is the best value that the maximizer currently guarantees at that level or above, and beta is the best value that the minimizer currently guarantees. If at any point alpha becomes greater than or equal to beta (α ≥ β), the branch is pruned, as it won’t affect the final outcome.

For example, in a game like chess, if the maximizer has already found a branch guaranteeing a value of +3, and on another branch the minimizer can already force a value of -2 (or anything below +3), the maximizer will never choose that branch, so exploring its remaining moves is unnecessary. This way, α-β pruning skips evaluating many nodes, significantly speeding up the search process.
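
A sketch of α-β pruning over the same toy nested-list tree used for the MiniMax example earlier in this deck; note how one leaf is never evaluated.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Alpha-beta sketch over a toy tree of nested lists (leaves are scores).
    A branch is cut as soon as alpha >= beta, because it cannot change the result."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # prune: MIN already has a better option elsewhere
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:            # prune: MAX already has a better option elsewhere
            break
    return value

print(alphabeta([[3, 5], [2, 9]]))   # 3; the leaf 9 is pruned after MIN sees 2
```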

25
Q

Discuss how the WordNet represents information. Make reference to the
relationships that are used by WordNet in your answer.

A

WordNet represents information by organizing English words into sets of synonyms called synsets, each expressing a distinct concept. Synsets are interlinked through various semantic relationships, including:

Synonyms: Words with similar meanings grouped in the same synset.
Hypernyms: Words representing broader concepts, linking more specific words (hyponyms) to general ones.
Hyponyms: More specific words linked to broader ones (hypernyms).
Meronyms: Words denoting a part of something (the part-to-whole relationship).
Holonyms: Words representing the whole to which a part (meronym) belongs.
Antonyms: Words that have opposite meanings.
These relationships form a rich network, facilitating an understanding of word meanings and connections in the English language.

26
Q

Describe any two similarity metrics used to compare WordNet synsets.

A

Two similarity metrics commonly used to compare WordNet synsets are:

Path Similarity: This metric measures the similarity based on the shortest path in the hypernym/hyponym hierarchy between two synsets. A shorter path indicates greater similarity. It is computed as the inverse of the path length between the synsets.

Wu-Palmer Similarity (WUP): This metric evaluates the similarity by considering the depths of the two synsets in the WordNet taxonomy, as well as the depth of their most specific common ancestor (the least common subsumer). The similarity is calculated based on the depth of the synsets and their least common subsumer relative to the overall taxonomy depth.

27
Q

What is meant by a “random walk” of the internet? How does the
random walk help identify important documents?

A

A “random walk” of the internet refers to a process where a hypothetical “surfer” randomly follows links from one web page to another. This concept is a key part of algorithms like Google’s PageRank. In a random walk, the surfer randomly chooses one of the outbound links on each page to navigate to the next page, continuing this process indefinitely.

The random walk helps identify important documents through the principle that more important or relevant pages are more likely to be visited frequently over many random walks. The idea is that pages with many inlinks (links from other pages) or links from other important pages are more likely to be reached in a random walk. Thus, the frequency of visiting a page in random walks reflects its importance or authority.

PageRank, for example, uses this concept to rank web pages. Pages frequently visited in random walks have higher PageRank scores, implying they are more important or authoritative. This approach helps search engines prioritize relevant and authoritative pages in their search results.

28
Q

Write down the PageRank function, explaining each of its terms.
What is the difference between inlinks and outlinks? Illustrate your
answer with an appropriate example.

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

Inlinks: Links from other pages pointing to page pi. They contribute to the PageRank of pi, as the linking pages 'vote' for or 'endorse' it.

Outlinks: Links from page pj pointing to other pages. The PageRank value of pj is shared among these outlinks.

29
Q

Briefly explain what is meant by the “generate and test” approach to
problem solving. Briefly describe how an heuristic search might
differ from blind search when solving problems such as the 8-tile
puzzle or the 15-tile puzzle. What are the main problems with
developing an heuristic search for these games?

A

The “generate and test” approach to problem-solving involves generating potential solutions and then testing them to see if they meet the criteria for a solution. This method is straightforward but can be inefficient if there are many possible solutions.

In the context of solving puzzles like the 8-tile or 15-tile puzzle, heuristic search differs from blind search in the following ways:

Heuristic Search: Uses heuristics (smart guesses or rules of thumb) to guide the search towards the goal more efficiently. For example, a common heuristic for tile puzzles is the number of tiles out of place or the total distance of tiles from their goal positions. It prioritizes moves that seem to lead more directly to the solution.

Blind Search: Also known as uninformed search, it explores the solution space without any information about the likelihood of a path leading to the goal. It methodically checks all possible moves, which can be highly inefficient for complex puzzles.

The main problems with developing a heuristic search for these puzzles are:

Finding Effective Heuristics: Identifying heuristics that consistently lead to the solution quickly can be challenging. Effective heuristics should estimate the “closeness” to the goal accurately.

Balance Between Efficiency and Accuracy: A heuristic must strike a balance between guiding the search efficiently and not misleading it away from the goal.

Computational Overhead: Complex heuristics can add computational overhead, potentially negating the benefits of a more directed search.

30
Q

Describe the problem that is presented by a local maximum or a
plateau of the heuristic function. You may use a diagram to illustrate
your answer.

A

Local Maximum: This is a point where the heuristic function reaches a value higher than its immediate neighbors but not necessarily the highest possible value (the global maximum). The problem arises when a search algorithm, aiming to maximize the heuristic, reaches a local maximum and erroneously concludes that the best solution has been found, even though a better solution might exist elsewhere in the search space.

Plateau: A plateau is an area of the search space where the heuristic function has the same value across multiple states. This flatness in the landscape makes it difficult for the search algorithm to determine which direction to move in, as all neighboring states appear equally promising (or unpromising). The algorithm can get “stuck” on a plateau, unable to find a path that leads to a better solution.

31
Q

Describe in detail how you would construct an heuristic for one of the
following board games: chess, checkers (draughts), connect-4.

A

For chess, an evaluation heuristic could be constructed as follows:

Piece Value: Assign values to each piece based on its importance (e.g., Pawn = 1, Knight/Bishop = 3, Rook = 5, Queen = 9).

Board Evaluation: Calculate the total value of pieces for each player. The board’s score could be the difference in total values, favoring the player with the higher total.

Positional Play: Include bonuses or penalties for specific strategic elements, such as pawn structure, control of the center, king safety, and piece mobility.

Endgame Adjustments: Modify evaluations for the endgame phase, where the king’s activity and pawn structure may become more crucial.

This heuristic helps a chess AI to evaluate board positions, guiding move decisions to maximize its advantage and minimize the opponent’s. The simplicity or complexity of the heuristic can vary based on the desired balance between accuracy and computational efficiency.
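
A sketch of just the material-count part of such a heuristic; the board encoding (a list of piece/colour pairs) is a made-up simplification, and positional bonuses would be layered on top.

```python
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}   # standard material values

def material_score(pieces, player="white"):
    """Positive scores favour `player`; pieces is a made-up list of (piece, colour) pairs."""
    score = 0
    for piece, colour in pieces:
        value = PIECE_VALUES.get(piece, 0)                # kings are not counted
        score += value if colour == player else -value
    return score

# White has queen + pawn (10), black has rook + bishop (8): +2 for white.
print(material_score([("Q", "white"), ("P", "white"), ("R", "black"), ("B", "black")]))
```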

32
Q

Describe the A* search algorithm in detail.

A

The A* search algorithm is a pathfinding and graph traversal method. It finds the shortest path from a start node to a goal node in a weighted graph. A* uses a best-first search approach, prioritizing paths that seem promising. It combines the actual cost from the start node to a given node (G-cost) with a heuristic estimate of the cost from that node to the goal (H-cost). The sum of these costs (F-cost = G-cost + H-cost) guides the search, with lower F-cost paths explored first. A* terminates when the goal node is reached, ensuring an optimal path if the heuristic is admissible (never overestimates the true cost).

33
Q

How does WordNet represent lexico-semantic information? Briefly
describe any two metrics for estimating the similarity between terms,
based on WordNet. Refer to specific WordNet relations in your
answer.

A

WordNet represents lexico-semantic information by organizing words into sets of cognitive synonyms (synsets), each expressing a unique concept. Synsets are interconnected through semantic relations like hypernyms (more general terms), hyponyms (more specific terms), meronyms (part-whole relations), and antonyms (opposites).

Two metrics for estimating similarity between terms in WordNet are:

Path Similarity: Measures similarity based on the shortest path in the hypernym/hyponym hierarchy between synsets. A shorter path indicates greater similarity.

Wu-Palmer Similarity (WUP): Measures similarity by the depths of the synsets and their least common subsumer (most specific common ancestor) in the hypernym/hyponym hierarchy. Higher similarity is indicated by a closer common ancestor.

34
Q

Discuss in general, how the use of heuristic functions improve
on the efficiency of the blind search process.

A

Heuristic functions improve the efficiency of blind search processes by providing guidance on which paths to follow, thus reducing the search space. While blind search methods like Depth-First or Breadth-First Search explore paths indiscriminately, heuristic functions estimate the “closeness” to the goal, allowing the search to prioritize more promising paths. This targeted approach often leads to faster discovery of the goal state without exhaustively searching all possibilities, significantly improving computational efficiency and reducing time complexity.

35
Q

Describe the difference between the ordinary search process
used to solve problems and adversary search. Why are these
differences relevant and under what circumstances?

A

Ordinary search processes, used in solving standard problems, involve finding a path or solution in a defined space, where the outcome is determined solely by the searcher’s actions. Adversary search, on the other hand, is used in competitive settings like games, where the outcome depends on the actions of multiple agents or players, often with opposing objectives.

These differences are relevant in contexts where an “opponent” or adversarial element affects the decision-making process. In ordinary search, the focus is on overcoming environmental challenges, while in adversary search, the focus is on outmaneuvering an opponent who actively counters the searcher’s moves. Adversary search is crucial in games like chess or checkers, where each player must anticipate and react to the other’s strategies.

36
Q

Describe in detail one specific heuristic function and its use in
some relevant problem domain. You may choose any problem,
from either ordinary search or adversary search problems.
Discuss any good and bad points on the heuristic function you
have described. (Topics may be selected from, but are not
restricted to, any one of the following: Farmer-Fox-Goose
problem, 8 or 16 Tile puzzle, Blast, OXO, Chess, Draughts etc.)

A

Heuristic Function: Manhattan Distance in the context of the 8-tile puzzle.

Description:

The Manhattan Distance heuristic calculates the total distance each tile is from its goal position, using only vertical and horizontal moves. For each tile, the distance is the sum of the horizontal and vertical distances to its target position.
In the 8-tile puzzle, where the goal is to move tiles within a grid to reach a specific end configuration, this heuristic estimates the minimum number of moves required to get each tile to its correct position.
Good Points:

Admissibility: The Manhattan Distance never overestimates the cost to reach the goal, making it an admissible heuristic for A* search.
Efficiency: It provides a good balance between accuracy and computational simplicity, guiding the search effectively without significant processing overhead.
Bad Points:

Not Always Perfectly Accurate: It assumes tiles can move independently without considering the interactions between them, which can sometimes lead to underestimating the true number of moves required.
Does Not Account for Sequential Constraints: In scenarios where tiles block each other, the Manhattan Distance might not accurately reflect the complexity of rearranging the tiles.
In summary, while the Manhattan Distance heuristic is efficient and generally effective for solving the 8-tile puzzle, its simplifications can sometimes lead to suboptimal performance in more complex scenarios.

37
Q

Briefly describe how MiniMax search improves the efficiency of
adversary search.

A

MiniMax search improves the efficiency of adversary search by systematically exploring the game tree to identify the optimal move, assuming both players act optimally. It does this by alternating between maximizing the player’s advantage and minimizing the opponent’s advantage at each level of the tree. This approach narrows down the choices to the most strategically sound moves, thus reducing the need to evaluate every possible move in games like chess or tic-tac-toe. It allows for a focused and effective search strategy in competitive, turn-based games.

38
Q

How is the Nearest Neighbours approach used to solve
problems? How does k-Nearest Neighbours approach improve
upon the 1-Nearest Neighbours approach? You may make use
of an example to illustrate your answer.

A

The Nearest Neighbours approach is used to solve problems, particularly in classification and regression, by finding the closest data points (neighbours) to a query point and making predictions based on these neighbours. In the 1-Nearest Neighbour approach, the prediction is based solely on the single closest data point.

The k-Nearest Neighbours (k-NN) approach improves upon this by considering the ‘k’ closest data points, where ‘k’ is a predefined number greater than 1. This allows for a more robust prediction as it reduces the impact of noise and outliers present in the dataset. For example, in a classification task, k-NN classifies a new point based on the majority class among its ‘k’ nearest neighbours, providing a more representative and reliable decision than basing the classification on a single neighbour.

39
Q

Discuss the use of categories in WordNet. What relationships are
used to impose a structure on the WordNet hierarchy? Make
use of specific examples in your answer.

A

In WordNet, categories are represented by sets of synonyms called synsets, each expressing a distinct concept. The relationships used to impose structure on the WordNet hierarchy include:

Hypernyms: These indicate a general category or class to which a term belongs. For example, the word “canine” is a hypernym of “dog.”

Hyponyms: These signify more specific instances of a general category. For example, “dog” is a hyponym of “canine.”

Meronyms: These denote parts of a whole. For instance, “wheel” is a meronym of “car.”

Holonyms: These represent the whole of which a term is a part. For example, “car” is a holonym of “wheel.”

These relationships create a hierarchical structure in WordNet, allowing navigation from more general to more specific terms and vice versa, and understanding the part-whole relationships among terms.

40
Q

How is the nearest neighbours approach used to solve problems?
How does K nearest neighbours improve upon the 1 nearest
neighbours approach?

A

The nearest neighbours approach solves problems, especially in classification and regression, by identifying the closest data point(s) to a query point and making predictions based on these neighbouring points.

The K nearest neighbours (K-NN) approach improves upon the 1 nearest neighbour method by considering ‘k’ closest data points instead of just one. This makes the prediction more robust, as it reduces the influence of noise and outliers. For example, in classification, K-NN classifies a query point based on the majority vote of its ‘k’ nearest neighbours, providing a more reliable decision than relying on a single neighbour.

41
Q

Describe in detail how the nearest neighbours approach can be
used to estimate the sale value of a house. You may assume you
have access to a large database of recent house sale records –
detailing all relevant details of each sales transaction.

A

The nearest neighbours approach can estimate the sale value of a house by analyzing a database of recent house sale records. Here’s how it can be used:

Feature Selection: Choose relevant features from the database that influence house prices, such as location, size, number of bedrooms, age of the house, and amenities.

Find Nearest Neighbours: For a house to be valued, find ‘k’ houses in the database that are most similar to it based on the chosen features. Similarity is typically measured using a distance metric like Euclidean distance.

Average Sale Values: Calculate the average sale value of these ‘k’ nearest houses. This average is used as the estimated sale value of the house in question.

This method assumes that houses similar in key characteristics will have similar sale values, making it a practical tool for real estate valuation.
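
A sketch of those three steps with a made-up set of sale records and plain Euclidean distance; a real system would normalise and weight the features.

```python
import math

# Made-up records: (size_m2, bedrooms, age_years, sale_price)
sales = [
    (120, 3, 10, 320_000),
    (100, 3, 15, 280_000),
    (150, 4, 5, 410_000),
    (90, 2, 20, 230_000),
]

def estimate_price(query, k=3):
    """k-NN sketch: average the sale prices of the k most similar houses."""
    def distance(record):
        return math.dist(query, record[:3])        # Euclidean distance over the features
    nearest = sorted(sales, key=distance)[:k]
    return sum(price for *_, price in nearest) / k

print(estimate_price((110, 3, 12)))   # average of the 3 closest recorded sales
```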

42
Q

Describe in detail how you would go about the task of building up a
large index of documents, such as that used by internet search
engines. Describe how anchor text can be used to index pages
under terms that do not even appear on the listed page. Make use
of appropriate examples to illustrate your answer.

A

Crawling: Use web crawlers or spiders to systematically browse the web and gather documents or web pages.

Parsing and Processing: Extract relevant information from these documents, including text, links, and metadata. This step involves tasks like removing HTML tags and dealing with different formats and languages.

Indexing: Create an index to store and retrieve information efficiently. This usually involves creating a database of words and their occurrences in documents. Data structures like inverted indices are commonly used, mapping terms to their locations in documents.

Use of Anchor Text: Anchor text, the clickable text in a hyperlink, is crucial for indexing. Search engines use anchor text to index pages under terms that might not appear on the page itself. For example, if many sites link to a certain webpage with the anchor text “best coffee recipes,” the search engine might index that page under “coffee recipes” even if those exact words don’t appear on the page.

Ranking Algorithm: Develop algorithms to rank the indexed pages based on relevance and authority. This often involves complex algorithms like PageRank, which considers factors such as the number and quality of inbound links to a page.

Updating: Regularly update the index to include new pages, remove obsolete ones, and refresh existing ones to reflect content changes.
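
A minimal inverted-index sketch showing how anchor text lets a page be indexed under terms it does not itself contain; the pages and anchor texts are made up.

```python
from collections import defaultdict

pages = {                                   # made-up page contents
    "page1": "how to brew espresso at home",
    "page2": "a guide to pour over brewing",
}
anchor_texts = {                            # text of inbound links, keyed by target page
    "page2": ["best coffee recipes"],
}

index = defaultdict(set)                    # inverted index: term -> set of page ids
for page_id, text in pages.items():
    for term in text.lower().split():
        index[term].add(page_id)
for page_id, anchors in anchor_texts.items():
    for anchor in anchors:
        for term in anchor.lower().split():
            index[term].add(page_id)        # indexed under anchor terms as well

print(index["coffee"])   # {'page2'}, even though 'coffee' never appears on page2
```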

43
Q

Briefly characterize the difference between ordinary heuristic
search and adversary search. In what way does the MiniMax
search algorithm cater for this difference in adversary search
problems?

A

Ordinary heuristic search is used in problems where the outcome depends solely on the choices of a single agent in a static environment. In contrast, adversary search is used in competitive scenarios where multiple agents (like players in a game) make decisions that affect each other, often with opposing goals.

The MiniMax search algorithm caters to adversary search by considering the strategy of both players. It alternates between maximizing the advantage for the player whose turn it is (the maximizer) and minimizing it for the opponent (the minimizer). This approach allows for predicting the opponent’s best possible moves and choosing the best counter-strategy, making it well-suited for two-player games like chess or tic-tac-toe.

44
Q
A

D value of 2.

45
Q

Briefly outline how α-β pruning further improves the efficiency of
the MiniMax algorithm. Use a diagram to illustrate your answer.

A

Alpha-Beta (α-β) pruning improves the efficiency of the MiniMax algorithm by reducing the number of nodes it needs to evaluate in the game tree. It does this by pruning (cutting off) branches that cannot possibly influence the final decision.

The process involves two parameters, alpha (α) and beta (β), which represent the minimum score that the maximizing player is assured of and the maximum score that the minimizing player is assured of, respectively. When the algorithm determines that a certain branch cannot improve the outcome for the current player (when α ≥ β), it stops exploring that branch, as it will not affect the final decision. This pruning significantly reduces the number of nodes visited, leading to faster decision-making in games.

46
Q

Briefly describe how N-grams are used in the parsing process.
How do they distinguish between valid and invalid syntactic forms?

A

N-grams are used in the parsing process as a statistical tool to predict the likelihood of a sequence of words or characters. An N-gram model uses the probability of a word or character occurring based on the occurrence of its preceding N-1 words or characters.

In parsing, N-grams help distinguish between valid and invalid syntactic forms by analyzing the probability of word sequences. Sequences that occur more frequently in the training corpus are considered more likely to be syntactically valid. Conversely, sequences with low probability, which rarely or never appear in the training data, are more likely to be syntactically invalid.

For example, in a bigram (2-gram) model, the model checks the probability of each word occurring after its immediate predecessor. If a particular word sequence like “the cat” appears often in the training data, but a sequence like “cat the” rarely appears, the model will favor “the cat” as a valid syntactic form over “cat the.”

This statistical approach does not guarantee perfect syntactic validity but provides a practical method for approximating language patterns based on observed data.
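
A sketch of scoring word orderings with bigram probabilities estimated from a tiny made-up corpus (maximum likelihood, no smoothing), just to show how a frequent ordering outscores a rare one.

```python
from collections import Counter

tokens = "the cat sat on the mat and the dog sat on the rug".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

def sequence_score(words):
    """Product of P(w_i | w_{i-1}); a higher score suggests a more plausible ordering."""
    score = 1.0
    for w1, w2 in zip(words, words[1:]):
        score *= bigrams[(w1, w2)] / unigrams[w1]
    return score

print(sequence_score("the cat sat".split()))   # seen ordering: non-zero score
print(sequence_score("cat the sat".split()))   # unseen ordering: score of 0
```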

47
Q

Write down the PageRank formula, explaining each of its terms.
Describe the relevance of a random walk to this formula

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

48
Q

Describe how you would use a heuristic function to identify a solution
to the 16-tile puzzle. Describe your heuristic function in detail.

A

For the 16-tile puzzle, a heuristic function can be used to estimate the cost (or steps) remaining to reach the goal state from the current state. A popular and effective heuristic is the “Manhattan Distance” heuristic:

Manhattan Distance: This heuristic calculates the total distance each tile is from its goal position, considering only vertical and horizontal moves. For each tile, the distance is the sum of the horizontal and vertical distances to its correct location.

Implementation: For each tile in the current state, compute how many rows and columns it is away from its target position in the goal state. Sum these distances for all tiles (except the blank or zero tile).

Advantages: The Manhattan Distance heuristic is admissible (never overestimates the true cost to reach the goal) and consistent, making it suitable for use in A* or similar search algorithms. It’s effective because it gives a clear idea of the minimum number of moves needed to get each tile into its correct position.

This heuristic helps guide the search algorithm by favoring moves that bring tiles closer to their correct positions, thereby leading to a more efficient search for the solution.

49
Q
A

D value of 6.

50
Q

Briefly describe α-β pruning. How does it improve upon MiniMax
search? Make use of an appropriate example to illustrate your
answer.

A

Alpha-Beta (α-β) pruning improves MiniMax search by eliminating branches in the game tree that don’t affect the final decision. It uses two parameters, alpha (α) and beta (β), to track the minimum score the maximizer is assured and the maximum score the minimizer can achieve. If a move is found to be worse than a previously evaluated move, further exploration of that branch is unnecessary and is pruned. This reduces the number of nodes the algorithm needs to evaluate, significantly enhancing efficiency without compromising the accuracy of the result.

51
Q

What is meant by a random walk of a graph? In what way does
this model the behaviour of some “random surfer” and why is
this useful?

A

A random walk of a graph is a process where steps are taken from one vertex to another randomly, following the edges of the graph. This models the behavior of a “random surfer” on the internet, who clicks on links and moves from one webpage to another without a specific goal.

This concept is useful, particularly in algorithms like Google’s PageRank, as it simulates how an average user might navigate the web. It helps in determining the importance or ranking of webpages based on the probability of landing on them through random navigation. Pages frequently visited in random walks are deemed more important, thus are ranked higher in search engine results.

52
Q

Write down the PageRank formula, explaining each of its terms.
Outline the significance of the damping factor (d) to the random
walk.

A

The PageRank of a page is the probability that a random surfer will land on that page:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the 'importance' or 'authority' of page A on the web.

(1-d): The probability (about 0.15 when d = 0.85) that the random surfer abandons the chain of links and jumps to a random page instead. This term guarantees every page a small baseline PageRank.

d: The damping factor, a constant usually set to around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models the behaviour of a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

53
Q

Briefly outline the generate-and-test approach to problem
solving. How does heuristic search compare with blind search,
when finding solutions to problems?

A

The generate-and-test approach involves creating potential solutions and testing them until a satisfactory one is found. Heuristic search uses rule-of-thumb strategies to guide the search for solutions efficiently, while blind search explores all possibilities systematically without guidance, potentially taking more time and resources.

54
Q

How would you adopt the generate-and-test approach to finding
solution(s) to the 16-tile puzzle? Describe your heuristic function
in detail. Make use of an appropriate example to illustrate your
answer.

A

To solve the 16-tile puzzle using the generate-and-test approach:

1. Generate possible moves.
2. Use the Manhattan distance heuristic to estimate how close each resulting state is to the goal.
3. Select the state with the lowest heuristic value.
4. Repeat steps 1-3 until the goal state is reached.
For example, if tile 1 is moved up, we calculate the Manhattan distance for each move and choose the one with the lowest value to make the next move.

55
Q
A

D value of -2.

56
Q

Using an example or otherwise, illustrate the difference
between the surface structure and deep structure of a
sentence. What is the relevance of phrase structured grammars
to deep structure?

A

Surface structure pertains to the grammatical form of a sentence, while deep structure relates to its underlying meaning. Phrase-structured grammars are important for understanding deep structure by dissecting the sentence’s syntactic structure to uncover its core semantic content.

57
Q

Describe the operation of an heuristic search process. In what way
does the heuristic function improve the search process?

A

A heuristic search process is a problem-solving technique that uses heuristic functions to guide the search for solutions efficiently. It works by evaluating each potential solution based on the heuristic function’s estimate of its desirability or proximity to the goal. The heuristic function improves the search process by providing a “rule of thumb” to prioritize the most promising solutions, reducing the search space and making it more efficient. This helps focus the search on paths that are more likely to lead to the desired solution, saving time and resources compared to blind search methods.

58
Q

Describe in detail the specific heuristic you would use to help solve
the 15/16 tile puzzle. Make reference to the specific features of the
board-state that you would look for and how you might identify them.
(You do not need to write code, but code should be very easy to write
from your description of the heuristic). Justify your heuristic.

A

Manhattan Distance: For each tile, calculate the distance from its current position to its goal position in terms of the number of moves (up, down, left, right). The sum of these distances for all tiles provides a heuristic estimate of the minimum number of moves needed to solve the puzzle.

Linear Conflict: This adds to the Manhattan Distance. If two tiles are in their goal row or column but are in the wrong order, they are in a linear conflict. This conflict means additional moves are required. For each pair of conflicting tiles, add 2 to the heuristic (since at least two moves are needed to resolve the conflict).

Identify these features as follows:

For Manhattan Distance, calculate the absolute difference in the row and column indices of a tile’s current and goal positions.
For Linear Conflict, check each row and column. If two tiles are in their goal row or column but out of order, count this as a conflict.
Justification:

The Manhattan Distance is a lower bound of the moves needed, as it assumes no other tiles are in the way.
The Linear Conflict accounts for the fact that tiles can block each other, making it a more accurate and often more efficient heuristic.
This heuristic is admissible (never overestimates the cost to reach the goal) and takes into account both the individual tile positions and their relative positions, leading to efficient puzzle solving.
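A hedged sketch of the heuristic described above, assuming the same 4x4 tuple representation (0 for the blank, goal 1-15 then the blank); helper names such as _pairs_in_conflict are illustrative.

# Manhattan distance plus linear-conflict penalty for a 4x4 board.
def manhattan(board):
    total = 0
    for idx, tile in enumerate(board):
        if tile:
            g = tile - 1
            total += abs(idx // 4 - g // 4) + abs(idx % 4 - g % 4)
    return total

def _pairs_in_conflict(line_tiles, line_index, is_row):
    """Count reversed pairs among tiles that are already in their goal row/column."""
    goal_positions = []
    for tile in line_tiles:
        if tile == 0:
            continue
        goal_row, goal_col = (tile - 1) // 4, (tile - 1) % 4
        if (goal_row if is_row else goal_col) == line_index:
            goal_positions.append(goal_col if is_row else goal_row)
    return sum(1 for i in range(len(goal_positions))
                 for j in range(i + 1, len(goal_positions))
                 if goal_positions[i] > goal_positions[j])

def linear_conflict(board):
    """Each conflicting pair needs at least two extra moves to untangle."""
    penalty = 0
    for line in range(4):
        row = [board[line * 4 + c] for c in range(4)]
        col = [board[r * 4 + line] for r in range(4)]
        penalty += 2 * _pairs_in_conflict(row, line, True)
        penalty += 2 * _pairs_in_conflict(col, line, False)
    return penalty

def heuristic(board):
    return manhattan(board) + linear_conflict(board)

print(heuristic((2, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0)))  # 2 + 2 = 4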

59
Q

Describe how MiniMax differs from ordinary heuristic-guided search.

A

MiniMax is a decision-making algorithm commonly used in two-player, zero-sum games like chess and tic-tac-toe. It differs from ordinary heuristic-guided search by considering both players’ strategies. MiniMax evaluates possible moves by assuming that one player aims to maximize their outcome while the other aims to minimize it. It recursively explores game states by alternating between maximizing and minimizing players, selecting the move that leads to the best possible outcome for the maximizing player. In contrast, ordinary heuristic-guided search uses heuristics to evaluate positions without considering the opponent’s counter-strategy and doesn’t necessarily guarantee optimal decisions in adversarial settings.

60
Q
A

C value of 2.

61
Q

What is a term-document index? Describe the structure of a term
document index such as might be used for the WWW. In particular,
discuss some of the information that you might like this index to
contain.

A

A term-document index is a data structure used in information retrieval systems to efficiently store and retrieve information about the relationships between terms (words or phrases) and documents (web pages, articles, etc.). In the context of the World Wide Web (WWW), a term-document index would contain information like:

Term Frequencies: For each term, it would store the number of times that term appears in each document, helping to assess the relevance of documents to a search query.

Inverse Document Frequencies (IDF): It would include IDF scores for each term, which represent how unique or common a term is across all documents. This helps in ranking the importance of terms in a document collection.

Document Metadata: Information about each document, such as its title, URL, publication date, and author, to provide context and facilitate document retrieval.

Term Positions: Optionally, it might store the positions of terms within documents, which can be useful for proximity-based searches.

Links: Information about hyperlinks between documents, allowing for the analysis of link structures and the development of algorithms like PageRank.

Query History: It may also track user search queries and click-through patterns to improve search relevance and user experience.

A well-structured term-document index enhances the efficiency and accuracy of information retrieval on the web, enabling search engines to quickly identify and present relevant documents to users based on their queries
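One possible layout for such an index, sketched as in-memory Python dictionaries; the field names and the example document are assumptions made for the sake of the sketch, not a prescribed schema.

# Illustrative in-memory term-document (inverted) index.
from collections import defaultdict

index = {
    "postings": defaultdict(dict),   # term -> {doc_id: term frequency}
    "positions": defaultdict(dict),  # term -> {doc_id: [token positions]}
    "doc_meta": {},                  # doc_id -> metadata (url, title, length)
    "doc_freq": defaultdict(int),    # term -> number of documents containing it
}

def add_document(doc_id, url, title, tokens):
    index["doc_meta"][doc_id] = {"url": url, "title": title, "length": len(tokens)}
    seen = set()
    for pos, term in enumerate(tokens):
        index["postings"][term][doc_id] = index["postings"][term].get(doc_id, 0) + 1
        index["positions"][term].setdefault(doc_id, []).append(pos)
        if term not in seen:
            index["doc_freq"][term] += 1
            seen.add(term)

add_document(1, "https://example.org/ai", "AI overview",
             ["machine", "learning", "and", "search"])
print(index["postings"]["machine"])  # {1: 1}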

62
Q

What is meant by the joint probability of a word sequence? Briefly
state the problem that novel words present when calculating the joint
probability of a word sequence? Briefly outline one technique that can
overcome this problem with novel words.

A

The joint probability of a word sequence refers to the likelihood of a particular sequence of words occurring together in a given context or document. The problem with novel words is that they are words not previously encountered in the training data, making it challenging to estimate their probabilities accurately using traditional language models. One technique to address this issue is to use smoothing methods like Laplace smoothing or add-one smoothing, which assign a small, non-zero probability to unseen words. This helps prevent zero probabilities for novel words and improves the overall robustness of language models when calculating joint probabilities.
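A brief sketch of add-one (Laplace) smoothing for bigram probabilities, assuming simple count dictionaries built from a toy corpus; the corpus and function name are illustrative.

# Add-one (Laplace) smoothed bigram probability: every bigram, including
# unseen ones, receives a small non-zero probability.
from collections import Counter

corpus = "the quick brown fox jumps over the lazy dog".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_laplace(prev, word):
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(p_laplace("the", "quick"))     # seen bigram: 0.2
print(p_laplace("the", "aardvark"))  # unseen word still gets probability > 0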

63
Q

Detail each of the steps of the A* search algorithm. Briefly state
what is the objective behind the A* algorithm?

A

Objective: A* aims to find a lowest-cost path from the start node to the goal by always expanding the node with the lowest combined score f(n) = g(n) + h(n), balancing the cost already incurred against the estimated cost remaining.

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
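A compact sketch of these steps, assuming the search space is given as an adjacency dictionary of edge costs and the caller supplies an admissible heuristic h; it is an illustrative outline rather than a definitive implementation. With h fixed at zero it degenerates to Dijkstra's algorithm, which the example at the end uses.

# Compact A* sketch. Assumptions: `graph` maps node -> {neighbour: edge_cost},
# `h(node)` is an admissible estimate of the cost from node to the goal.
import heapq

def a_star(graph, start, goal, h):
    open_heap = [(h(start), start)]          # priority queue ordered by f = g + h
    g = {start: 0}                           # best known cost to reach each node
    parent = {start: None}
    closed = set()
    while open_heap:
        _, node = heapq.heappop(open_heap)   # choose node with lowest f-score
        if node == goal:                     # goal check: backtrace the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return list(reversed(path))
        if node in closed:
            continue
        closed.add(node)                     # move node to the closed list
        for nbr, cost in graph[node].items():        # generate successors
            new_g = g[node] + cost
            if new_g < g.get(nbr, float("inf")):     # update with the lower score
                g[nbr] = new_g
                parent[nbr] = node
                heapq.heappush(open_heap, (new_g + h(nbr), nbr))
    return None                              # open list empty: no path exists

graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
print(a_star(graph, "A", "D", h=lambda n: 0))  # ['A', 'B', 'C', 'D']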

64
Q

What is adversary search and how does it differ from (ordinary)
heuristic search?

A

Adversary search is a problem-solving technique commonly used in two-player, zero-sum games like chess or tic-tac-toe. It involves considering the strategies of both players, one trying to maximize their outcome while the other aims to minimize it. Adversary search algorithms, such as MiniMax, explore potential game states by alternating between maximizing and minimizing players to find the optimal move for one player while taking into account the counter-strategy of the opponent.

In contrast, ordinary heuristic search focuses on finding optimal solutions in domains without a competing adversary. It uses heuristic functions to guide the search by estimating the desirability of each state, aiming to reach a goal state efficiently. Unlike adversary search, ordinary heuristic search does not consider an opponent’s strategy and is typically used in single-player problem-solving scenarios.

65
Q
A

C with a value of 1.

66
Q

Briefly state what is the Turing test? What is the significance of
the Chinese Room argument in relation to artificial intelligence?
In what way if any, do CAPTCHA relate to the Turing test?

A

The Turing test is a test of a machine’s ability to exhibit human-like intelligence in natural language conversation, where a human evaluator interacts with both a machine and a human without knowing which is which.
The Chinese Room argument, proposed by John Searle, challenges the notion of true understanding in artificial intelligence systems by depicting a scenario where a person inside a room follows instructions to manipulate symbols in a foreign language without actually comprehending it.
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) relates to the Turing test as it serves as a practical application of the Turing test’s principles in online security by distinguishing between humans and automated programs through challenges that are easy for humans but difficult for machines.

67
Q

What is meant by the joint probability of a word sequence? Why
does approximating the joint probability of a sequence often
produce better results than calculating the actual probability?
Make use of an appropriate example to illustrate your answer.

A

Joint probability of a word sequence refers to the likelihood of a specific sequence of words occurring together in a given context or document.

Approximating the joint probability (for example with a bigram or trigram Markov assumption) often produces better results because the exact calculation, via the chain rule P(w1..wn) = P(w1) P(w2|w1) ... P(wn|w1..wn-1), requires reliable estimates of conditional probabilities over very long histories. Most long histories never occur in any realistic training corpus, so the exact estimates are sparse, unreliable, or simply zero.

For example, consider the sentence “The quick brown fox jumps over the lazy dog.” Estimating P(dog | the quick brown fox jumps over the lazy) exactly would require that entire prefix to appear in the training data. Approximating it with the bigram P(dog | lazy) needs only counts of two-word sequences, which are far more plentiful, so the approximated probabilities are more robust and more useful for comparing sequences, improving the effectiveness of language models and natural language processing tasks.

68
Q

What is tf-idf? Describe how this term can influence the results
returned by a search engine. Illustrate your answer with a brief
example.

A

TF-IDF stands for Term Frequency-Inverse Document Frequency.
It is a numerical statistic used in information retrieval and text mining to evaluate the importance of a term within a document relative to a collection of documents.
TF measures how frequently a term appears in a document, while IDF assesses how unique or common a term is across the entire document collection.
Higher TF-IDF scores indicate that a term is both frequent in a document and relatively rare across the collection, making it more important.
For example, in a search engine, if a user queries “machine learning,” the search engine calculates the TF-IDF scores for documents containing these terms. A document that uses these terms heavily (high TF), where the terms themselves appear in relatively few other documents (high IDF), will have a high TF-IDF score, indicating it is highly relevant and should be prioritized in search results.
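A short sketch of one way to compute and use these scores for ranking, assuming a tiny in-memory collection; the exact tf and idf variants (raw counts, log scaling, smoothing) differ between systems, so the formulas here are one illustrative choice.

# Illustrative tf-idf ranking over a toy collection.
import math
from collections import Counter

docs = {
    "d1": "machine learning is a field of machine intelligence".split(),
    "d2": "cooking recipes and kitchen tips".split(),
    "d3": "learning to cook with machine assistance".split(),
}
N = len(docs)
df = Counter(term for tokens in docs.values() for term in set(tokens))  # document frequency

def tf_idf(term, tokens):
    tf = tokens.count(term) / len(tokens)
    idf = math.log(N / (1 + df[term])) + 1   # smoothed so idf stays positive
    return tf * idf

def rank(query):
    scores = {d: sum(tf_idf(t, tokens) for t in query.split())
              for d, tokens in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank("machine learning"))  # d1 scores highest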

69
Q

Outline an algorithm to interpret the analogy between two given
concepts. Make reference to any specific representation
requirements of your algorithm. Illustrate your answer by use of
an appropriate example.

A

Algorithm to Interpret Analogy:
Input: The algorithm takes two pairs of concepts, A (A1, A2) and B (B1, B2), where A1:A2 :: B1:?, and it aims to find the missing concept B2.

Representation Requirements: Concepts must be represented in a way that allows for comparison. This could be through vector embeddings or symbolic representations.

Step 1: Find Concept Vectors: If using vector embeddings, represent A1, A2, and B1 as vectors in a high-dimensional space.

Step 2: Calculate Analogy Vector: Calculate the analogy vector as (A2 - A1) + B1.

Step 3: Find Nearest Neighbor: Search for the concept in the dataset that is closest to the analogy vector using cosine similarity or another similarity measure. This concept represents B2.

Example:

A1: Man
A2: Woman
B1: King
We want to find B2, which corresponds to “Queen.”

Represent “Man” and “Woman” as vectors in a word embedding space.
Calculate the analogy vector: vector(Woman) - vector(Man) + vector(King).
Search for the concept whose embedding is nearest to this analogy vector; with well-trained embeddings the nearest neighbour is “Queen”, which is returned as B2.
This algorithm interprets analogies by leveraging vector representations to identify the missing concept in the analogy, demonstrating how word embeddings can capture semantic relationships.
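A small sketch of the vector-offset step, assuming toy 3-dimensional embeddings as stand-ins for real trained word vectors (which would normally come from a model such as word2vec).

# Vector-offset analogy (A1:A2 :: B1:?) with toy embeddings.
# The vectors below are made-up, low-dimensional stand-ins for real embeddings.
import math

embeddings = {
    "man":   [0.9, 0.1, 0.2],
    "woman": [0.9, 0.9, 0.2],
    "king":  [0.2, 0.1, 0.9],
    "queen": [0.2, 0.9, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a1, a2, b1):
    target = [x2 - x1 + y1 for x1, x2, y1
              in zip(embeddings[a1], embeddings[a2], embeddings[b1])]  # (A2 - A1) + B1
    candidates = (w for w in embeddings if w not in {a1, a2, b1})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))  # nearest neighbour

print(analogy("man", "woman", "king"))  # 'queen' with these toy vectors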

70
Q

What is the Turing test? Discuss the relevance of the Chinese
Room argument in relation to the Turing test? What is the
relevance, if any, of CAPTCHA to the Turing test?

A

Turing Test:

Proposed by Alan Turing in 1950.
A test of machine intelligence.
Involves a human evaluator interacting with both a machine and a human, trying to determine which is which based on their responses.
If the evaluator cannot reliably distinguish the machine from the human, the machine is said to have passed the Turing test, demonstrating human-like intelligence.
Relevance of the Chinese Room Argument:

The Chinese Room argument, proposed by John Searle, challenges the idea that passing the Turing test equates to true understanding or consciousness.
In the Chinese Room scenario, someone inside a room follows instructions to manipulate Chinese symbols, producing coherent responses without understanding Chinese.
This argument suggests that purely symbolic or syntactic manipulation, as in traditional AI, may not imply true comprehension or consciousness.
It raises questions about whether the Turing test truly measures machine intelligence or merely surface-level behavior.

71
Q

Briefly state the benefits offered by the A* algorithm, when
compared to simpler search processes? List and describe each
step of the A* search algorithm.

A

Benefits: compared to simpler search processes (such as blind breadth-first or depth-first search), A* uses a heuristic to focus expansion on the most promising nodes, so it typically examines far fewer states; with an admissible heuristic it still guarantees a lowest-cost path.

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).

72
Q

How does adversary search differ from standard (ordinary)
heuristic search? What impact does this difference on the
search mechanism used to solve adversary search problems?

A

Adversary Search vs. Standard Heuristic Search:

Adversary Search:

Used in two-player, zero-sum games.
Considers strategies of both players.
One player maximizes while the other minimizes.
Examples: Chess, Tic-Tac-Toe.
Standard Heuristic Search:

Used in single-player problem-solving.
Focuses on finding optimal solutions.
Uses heuristics to estimate desirability.
Examples: Route planning, puzzle solving.
Impact on Search Mechanism:

Adversary Search:

Alternates between maximizing and minimizing players.
Typically employs the MiniMax algorithm.
Requires evaluating possible outcomes of both players’ moves.
Aims to find the best move for the maximizing player while considering the opponent’s counter-strategy.
Standard Heuristic Search:

Focuses on finding the optimal solution.
Utilizes heuristics to guide the search.
Typically uses algorithms like A* or IDA*.
Doesn’t involve an adversary; its goal is to reach a desired state efficiently without considering an opponent’s strategy.

73
Q
A

D with a value of 3.

74
Q

What is a random walk of the World Wide Web (WWW)? What
is the relevance of the random walk algorithm to the ranking of
web documents?

A

Random Walk of the WWW:

A random walk of the World Wide Web involves navigating the web by following hyperlinks from one web page to another in a random or probabilistic manner.
It simulates a user browsing the web by making random decisions to click on links and visit different web pages.
Relevance of the Random Walk Algorithm to Ranking Web Documents:

The random walk algorithm, particularly the PageRank algorithm developed by Larry Page and Sergey Brin, plays a crucial role in ranking web documents.
PageRank assigns importance scores to web pages based on their connectivity and the links they receive from other authoritative pages.
Pages with higher PageRank scores are considered more relevant and authoritative, impacting their position in search engine rankings.
It helps search engines like Google provide more accurate and valuable search results by prioritizing pages with higher PageRank scores.

75
Q

What is tf-idf? How does it help identify those web-pages that
are most relevant to a given web query.

A

TF-IDF (Term Frequency-Inverse Document Frequency):

A numerical statistic used in information retrieval and text mining.
Measures the importance of a term within a document relative to a collection of documents.
Combines two components: TF (Term Frequency) and IDF (Inverse Document Frequency).
How TF-IDF Helps Identify Relevant Web Pages:

Term Frequency (TF):
Measures how frequently a term appears in a document.
High TF indicates the term’s significance in the document.
Inverse Document Frequency (IDF):
Measures how unique or common a term is across the entire document collection.
High IDF indicates a term’s rarity and importance.
Combining TF and IDF:
TF-IDF calculates a score for each term in a document.
Terms with high TF and high IDF are considered more relevant.
Relevance to Web Queries:
When a user enters a web query, search engines calculate the TF-IDF scores of terms in both the query and web pages.
Web pages with higher TF-IDF scores for query terms are considered more relevant.
This helps search engines identify and rank web pages that best match the user’s query by considering both term frequency within documents and term rarity across the entire web collection.

76
Q

Describe what is meant by heuristic search. Describe in detail a
specific heuristic to efficiently solve one of the following
problems: 15-tile puzzle, Farmer-Fox-Goose problem, Sudoku
puzzle or other relevant puzzle. Make use of specific examples
of relevant search states in your answer.

A

Heuristic Search:

Problem-solving technique using heuristic functions to guide search.
Heuristics estimate the desirability of states to reach a goal efficiently.
15-Tile Puzzle Heuristic: Manhattan Distance:

Heuristic function: Sum of distances each tile is from its goal position horizontally and vertically.

77
Q

Outline how the A* search algorithm operates. In your answer,
address how (and why) A* imposes a tree structure on its
solution space.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).

A* imposes a tree structure on the solution space because every node it expands records a single parent: the cheapest route by which that node was reached. These parent pointers form a search tree over the underlying state graph, which avoids re-exploring worse routes to the same state and makes it possible to read the final path off by backtracking from the goal.

78
Q

Describe how the A* algorithm might be adapted to help in pathfinding (path planning) for a real or simulated mobile robot. Make
reference to how the A* algorithm performs optimally

A

Adapting A* for Pathfinding in Mobile Robots:

A* is a popular choice for path planning in mobile robotics as it combines the advantages of uniform-cost (Dijkstra-style) search and greedy best-first search.

Adaptations for mobile robots:

Define a grid-based or graph-based representation of the robot’s environment where nodes represent discrete locations or states.
Implement a heuristic function that estimates the cost from the current robot position to the goal.
Incorporate movement constraints, such as obstacles and permissible directions.
Utilize a priority queue to explore states with lower estimated costs first.
Optimality of A*:

A* guarantees optimality when:

The heuristic is admissible (never overestimates true costs).
The robot moves in discrete steps with a finite number of states.
The heuristic is consistent, i.e. it satisfies the triangle inequality h(x) ≤ c(x, y) + h(y), where c is the step cost between neighbouring states and h is the heuristic.
A* maintains two lists: one for open nodes to be explored and one for closed nodes that have been evaluated.

It expands nodes with the lowest cost (g(n) + h(n)) first, where g(n) is the cost to reach the node, and h(n) is the heuristic cost to reach the goal from the node.

By considering both the cost to reach a state and the estimated cost to the goal, A* explores the most promising paths early, ensuring an optimal solution when the conditions are met.

Adapting A* for path planning in mobile robots involves incorporating the robot’s specific movement capabilities and environmental constraints into the search algorithm while maintaining A*’s optimality under appropriate conditions.

79
Q

In what way does adversary search differ from traditional
heuristic search processes? Illustrate your answer with
reference to appropriate examples.

A

Adversary Search vs. Traditional Heuristic Search:

Adversary Search:
Used in two-player games (e.g., chess).
Considers both players’ strategies.
Examples: Chess, Tic-Tac-Toe.
Traditional Heuristic Search:
Used in single-player problem-solving.
Focuses on finding optimal solutions.
Examples: Route planning, puzzle solving.
Illustration: Chess - Adversary Search vs. Traditional Heuristic Search:

Adversary Search: In chess, considers White’s and Black’s moves, optimizing for one player while considering the opponent’s counter-strategy.

Traditional Heuristic Search: In chess, aims to find the best move for one player independently without considering an opponent’s response.

80
Q

Write down the PageRank formula, explaining each of its terms.
Briefly comment on the relevance of iterative convergence to the
calculations of this formula.

A

The PageRank of a page is the chance that a random surfer will land on that page. The formula is:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) )

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.

(1-d): The probability that the random surfer abandons the current chain of links and jumps to a random page (around 0.15 when d = 0.85). This term gives every page a small baseline amount of PageRank.

d: The damping factor, a constant usually set around 0.85. It is the probability that the surfer continues by following one of the outbound links on the current page, modeling a user sticking to a chain of links.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

Iterative convergence: because each page's PageRank depends on the PageRank of the pages that link to it, the formula cannot be evaluated in a single pass. All pages start with an arbitrary value and the formula is applied repeatedly; the values settle towards a stable fixed point after enough iterations, and those converged values are the final PageRanks.
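A minimal sketch of that iterative calculation, assuming a toy link graph given as an adjacency list; the number of iterations is fixed here for simplicity, whereas a real implementation would stop once the values change by less than some tolerance.

# Iterative PageRank on a toy link graph. `links` maps each page to the pages
# it links out to; the loop reapplies the formula until the values converge.
def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    pr = {p: 1.0 for p in pages}             # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            incoming = sum(pr[t] / len(links[t])
                           for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))  # C collects rank from A and B; A inherits C's rank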

81
Q

How can a heuristic function be combined with a simple iterative
search process to generate solutions to seemingly complex
problems? Focus your answer on the role played by the heuristic
function with a search process.

A

Heuristic function estimates state or action desirability.
Guides iterative search by prioritizing states.
Enhances efficiency in complex problems.
Example: In the Traveling Salesman Problem, estimates distances to unvisited cities, guiding the search towards shorter routes.
Crucial for focusing search on promising solutions.
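A tiny illustrative sketch of this idea for the Traveling Salesman Problem, assuming a symmetric distance matrix; the nearest-neighbour rule is one simple heuristic used to guide an iterative search, not the only possible choice.

# Nearest-neighbour heuristic for the TSP: at each step the search visits the
# closest unvisited city. The distance matrix is a made-up example.
def nearest_neighbour_tour(dist, start=0):
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda city: dist[current][city])  # heuristic choice
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(nearest_neighbour_tour(dist))  # [0, 1, 3, 2]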

82
Q

List each of the steps of the A* algorithm. If you wish, you may
use diagrams to illustrate your answer.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).

83
Q

What do you think are the most significant differences between
(ordinary) heuristic search and adversary search? Justify your
answer.

A

Significant Differences Between Heuristic Search and Adversary Search:

Objective:

Heuristic Search: Seeks optimal solutions or states in single-player problems.
Adversary Search: Focuses on competing with an adversary to maximize or minimize outcomes in two-player games.
Players’ Strategies:

Heuristic Search: Assumes no opponent; it’s a single-player search process.
Adversary Search: Considers strategies of both players, one aiming to maximize and the other to minimize outcomes.
Algorithm Types:

Heuristic Search: Uses algorithms like A* or hill climbing.
Adversary Search: Utilizes MiniMax, alpha-beta pruning, or other game-specific algorithms.
Scenarios:

Heuristic Search: Applicable to a wide range of single-player problems like route planning, puzzle solving.
Adversary Search: Specifically designed for two-player, zero-sum games such as chess, checkers, or tic-tac-toe.
Justification:

The primary difference lies in their objectives and the consideration of opponent strategies. Heuristic search aims for optimal solutions in single-player problems, while adversary search deals with competitive scenarios in two-player games. This distinction leads to differences in algorithms, players’ strategies, and problem domains where each is most applicable.

84
Q

Discuss how you would go about the process of measuring the
accuracy of an information retrieval process.

A

To measure the accuracy of an information retrieval process:

Define what constitutes “relevant” documents.
Choose metrics like precision, recall, and F1-score.
Create a test collection with queries and relevance judgments.
Run the retrieval system on the test collection.
Calculate and analyze the chosen metrics.
Iterate to refine the system and report findings.
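A short sketch of computing the chosen metrics for a single query, assuming the retrieved and relevant documents are given as sets of ids; the ids are illustrative.

# Precision, recall and F1 for one query, given the documents the system
# retrieved and the documents judged relevant.
def evaluate(retrieved, relevant):
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
print(evaluate(retrieved, relevant))  # (0.5, 0.666..., 0.571...)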

85
Q

What role does the document-term index play in information
retrieval on the web?

A

Document-Term Index in Information Retrieval:

Role:

The document-term index, often referred to as an inverted index, plays a central role in information retrieval on the web.
It serves as a data structure that efficiently stores and manages the mapping between terms (words or phrases) and the documents in which those terms appear.
Functions:

Efficient Searching: It enables quick and efficient searching for documents containing specific terms, reducing the search time significantly.
Ranking Relevance: The index can be used to calculate relevance scores for documents based on the frequency and location of terms, which aids in ranking search results.
Filtering: It supports the elimination of stop words (common words like “the” or “and”) and stemming (reducing words to their root form) to improve search precision.
Facilitating Retrieval: By maintaining term-document associations, the index helps retrieve relevant documents in response to user queries.
Example:

When you perform a web search, the document-term index is used to quickly identify web pages or documents containing the keywords from your query, enabling the search engine to return relevant results.
In essence, the document-term index is a fundamental component of web search engines, making the retrieval of relevant information from vast amounts of web content fast and efficient.

86
Q

What is meant by tf-idf in the context of information retrieval? Briefly
describe how you would expect the tf-idf value to differ for the
terms “the” and “aardvark” for a collection of English language
web-pages. You may assume that stop words (like “the”) have
not been removed from the document-term index.

A

TF-IDF in Information Retrieval:

TF-IDF stands for “Term Frequency-Inverse Document Frequency.”
It’s a numerical statistic used in information retrieval to assess the importance of a term within a document relative to a collection of documents.
TF-IDF combines two components: “Term Frequency” (TF), measuring how often a term appears in a document, and “Inverse Document Frequency” (IDF), estimating how unique or common a term is across the entire document collection.
Difference in TF-IDF for “the” and “aardvark”:

For a collection of English language web pages:

“the” is an extremely common word appearing frequently in nearly all documents.
“aardvark” is a relatively uncommon word, appearing in very few documents (if any) compared to “the.”
Expected TF-IDF Difference:

The TF-IDF value for “the” will be very low: although its term frequency (TF) is high, it appears in virtually every document, so its inverse document frequency (IDF) is close to zero and drags the overall weight down.
The TF-IDF value for “aardvark” will be much higher than that of “the”: it may occur only a few times (modest TF), but it appears in very few documents, giving it a high IDF and therefore a large weight in the documents that do contain it.
In summary, the TF-IDF value for “the” will be much lower than that for “aardvark” in a collection of English language web pages, reflecting their differing frequencies and document specificities.

87
Q

Why does Google’s PageRank algorithm approximate the
behaviour of a theoretical surfer, rather than focusing on the
behaviour of an actual surfer of the web?

A

Google’s PageRank algorithm approximates the behavior of a theoretical surfer, rather than focusing on the behavior of an actual surfer of the web, for several reasons:

Scalability: The web is vast and constantly changing, with billions of pages and hyperlinks. It’s impractical to track the real-time behavior of every individual surfer due to the sheer volume of web activity.

Privacy: Tracking the actual behavior of web users would raise significant privacy concerns. Collecting and analyzing user data without consent would violate privacy norms and regulations.

Data Availability: Google does not have access to the complete browsing history of individual users. Even if they did, such data would be insufficient to comprehensively analyze and rank all web pages effectively.

Consistency: The behavior of individual surfers varies widely, making it challenging to establish a consistent and reliable ranking system. People’s interests, preferences, and actions online are diverse.

Quality Control: Using the behavior of a theoretical surfer allows for controlled and consistent ranking criteria, making it easier to identify and penalize manipulative practices like link farms and keyword stuffing.

By approximating the behavior of a theoretical surfer, PageRank provides a scalable, privacy-respecting, and consistent method for ranking web pages, which is essential for effective and reliable search engine performance.

88
Q

Briefly outline the generate-and-test approach to problem
solving. How does heuristic search compare with blind search,
when generating solutions to complex problems?

A

Generate-and-Test Approach:

In the generate-and-test approach to problem-solving:
Potential solutions are generated systematically.
Each generated solution is then tested to determine if it meets the problem’s criteria or goals.
The process continues until a satisfactory solution is found or the search space is exhausted.
Heuristic Search vs. Blind Search:

Heuristic Search:

Uses heuristics (rules or guiding principles) to prioritize the generation of solutions.
Tends to be more efficient in finding solutions, especially in complex problems, by focusing on promising paths.
Examples include A* search and hill climbing.
Blind Search (or Uninformed Search):

Searches without specific guidance or heuristics.
Tends to explore the search space exhaustively, which can be inefficient in complex problems.
Examples include breadth-first search and depth-first search.
In summary, heuristic search employs heuristics to guide the generation of solutions, making it more efficient and effective in complex problems compared to blind search, which explores the search space without specific guidance.

89
Q

Describe how you might go about using the A* algorithm to
calculate the navigation path of a NPC (non-player character) in
some ‘first person’ computer game

A

To use the A* algorithm for NPC navigation in a first-person computer game:

Define the game environment, start, and goal.
Create a grid or graph representing the game world.
Design a heuristic function estimating distances to the goal.
Set up data structures (open and closed lists) for exploration.
Implement A* to find the optimal path.
Reconstruct the path from start to goal.
Use the path to guide NPC movement in the game world, considering dynamic changes and optimizing for real-time performance

90
Q

How does changing from “ordinary” heuristic search to adversary
search impact on the search processes used to play these
games? What is the relevance of ‘zero sum games’ to adversary
search?

A

Changing from ordinary heuristic search to adversary search impacts game search processes as follows:

Objective:

Ordinary Heuristic Search: Seeks optimal solutions in single-player problems.
Adversary Search: Focuses on competitive gameplay in two-player, zero-sum games.
Opponent Consideration:

Ordinary Heuristic Search: Assumes no opponent.
Adversary Search: Considers both players’ strategies, one maximizing and the other minimizing outcomes.
Algorithm Type:

Ordinary Heuristic Search: Uses A* or hill climbing.
Adversary Search: Utilizes game-specific algorithms like MiniMax and alpha-beta pruning.
Zero-sum games are relevant to adversary search because they create a balanced framework for competitive gameplay, where one player’s gain corresponds to the opponent’s loss, making it crucial in evaluating strategies and outcomes.

91
Q

Briefly outline the MiniMax search algorithm.

A

The MiniMax search algorithm is a decision-making algorithm used in two-player, zero-sum games. Here’s a brief outline of how it works:

Objective: Find the optimal move for a player while considering the opponent’s best responses.

Key Concepts:

Maximizing Player: The player trying to maximize their outcome.
Minimizing Player: The opponent trying to minimize the maximizing player’s outcome.
Game Tree: Represents all possible moves and their consequences, branching out from the current game state.
Terminal Nodes: End points of the game tree, representing game outcomes (win, lose, or draw).
Algorithm Steps:

Start at the current game state.
Generate all possible legal moves for the maximizing player.
For each move, recursively apply the MiniMax algorithm to determine the opponent’s best response.
The maximizing player selects the move that leads to the maximum possible outcome (highest score) based on the opponent’s best responses.
The algorithm continues to explore and evaluate moves until reaching a terminal node.
At terminal nodes, assign scores to reflect the game outcome (e.g., +1 for a win, -1 for a loss, 0 for a draw).
Backpropagate these scores up the tree, alternating between maximizing and minimizing players, until the root node is reached.
The maximizing player chooses the move at the root node with the highest score as the optimal move.
Outcome: MiniMax provides a strategy for the maximizing player to make decisions that maximize their chances of success, considering the opponent’s best responses. It assumes that both players play optimally, resulting in a best-case scenario for the maximizing player and a worst-case scenario for the minimizing player.
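A compact sketch of MiniMax over an explicit game tree, assuming leaves hold scores from the maximizing player's point of view; the tree itself is a made-up example.

# MiniMax over an explicit game tree. A node is either a numeric leaf score
# (from the maximiser's point of view) or a list of child nodes.
def minimax(node, maximising):
    if isinstance(node, (int, float)):        # terminal node: return its score
        return node
    scores = [minimax(child, not maximising) for child in node]
    return max(scores) if maximising else min(scores)

# Toy tree: the maximiser moves first, then the minimiser, then the leaves.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree, True))   # 3: the max over the minimiser's best replies (3, 2, 1)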

92
Q

Describe any two selection operators. Briefly compare and
contrast their advantages and/or disadvantages.

A

Here are two selection operators used in genetic algorithms, along with a brief comparison of their advantages and disadvantages:

Roulette Wheel Selection:

Advantages:
Simple and computationally efficient.
Maintains diversity by giving a chance to less fit individuals.
Disadvantages:
Biased towards fitter individuals, leading to premature convergence.
Inefficient for highly imbalanced fitness landscapes.
Tournament Selection:

Advantages:
Control over selection pressure with tournament size.
Provides diversity by allowing less fit individuals to be chosen.
Disadvantages:
Introduces randomness due to random tournament selection.
Smaller tournaments may not exploit high-fitness individuals, while larger ones reduce exploration.
In summary, Roulette Wheel Selection is simple but biased, while Tournament Selection offers control and diversity but introduces randomness. Choice depends on problem characteristics and desired trade-offs.
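Hedged sketches of both operators, assuming the population is a list of (individual, fitness) pairs with non-negative fitness values; the names and the example population are illustrative.

# Roulette-wheel and tournament selection over (individual, fitness) pairs.
import random

population = [("a", 1.0), ("b", 4.0), ("c", 2.0), ("d", 8.0)]

def roulette_wheel(pop):
    total = sum(f for _, f in pop)
    pick = random.uniform(0, total)           # spin the wheel
    running = 0.0
    for individual, fitness in pop:
        running += fitness
        if running >= pick:                   # fitter individuals cover larger arcs
            return individual
    return pop[-1][0]

def tournament(pop, k=2):
    contenders = random.sample(pop, k)        # pick k individuals at random
    return max(contenders, key=lambda pair: pair[1])[0]  # the fittest contender wins

print(roulette_wheel(population), tournament(population, k=3))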

93
Q

What is meant by a “random walk” of a graph?

A

A “random walk” of a graph refers to a stochastic process where a walker moves from one vertex (node) to another within the graph.
Each step in the walk is determined randomly based on the edges connecting the current node to its neighbors.
It can model various scenarios, including exploring networks, simulating diffusion, or analyzing Markov chains.
Random walks help analyze graph properties, like connectivity and centrality, and can be used in algorithms such as PageRank for web page ranking.
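A tiny sketch of a random walk over an adjacency-list graph, assuming every node reachable during the walk has at least one neighbour.

# Random walk over an adjacency-list graph: at each step, move to a uniformly
# chosen neighbour of the current node.
import random

graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}

def random_walk(graph, start, steps):
    path = [start]
    node = start
    for _ in range(steps):
        node = random.choice(graph[node])
        path.append(node)
    return path

print(random_walk(graph, "A", 10))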

94
Q

What is tf-idf? What is its relevance to the indexing and retrieval
processes of a document search engine?

A

TF-IDF (Term Frequency-Inverse Document Frequency):

A numerical statistic in information retrieval.
Combines term frequency (TF) and inverse document frequency (IDF) to evaluate a term’s importance in a document within a collection.
Relevance to Indexing and Retrieval:

Indexing: TF-IDF helps create an index by assigning scores to terms in documents.
Retrieval: It aids in ranking documents based on query relevance; documents with higher TF-IDF scores for query terms are considered more relevant.
Balances term frequency (importance within a document) and document frequency (importance across the collection).
Improves search engine precision by identifying relevant documents while reducing the impact of common terms.

95
Q

Describe how you might use the A* algorithm to plan the
navigation path for a real (or a simulated) mobile robot.

A

A* Algorithm for Robot Path Planning:
Define the robot’s current position and the goal location.
Create a grid or graph representing the environment with obstacles.
Assign costs to grid cells or graph nodes based on distance to the goal and obstacle avoidance.
Initialize open and closed lists to manage explored and unexplored positions.
Apply the A* algorithm to iteratively expand nodes, prioritizing lower-cost paths.
Generate a path from the start to the goal by backtracking from the goal node to the start node using parent pointers.
Execute robot movements according to the calculated path while avoiding obstacles.
Periodically update the path in dynamic environments or when re-planning is necessary.

96
Q

Describe, in detail, one specific heuristic function for one of the
following problems: Farmer-Fox-Goose problem, 8 tile puzzle,
15 tile puzzle, Blast-search, OXO, Chess or Draughts. How
would you represent a solution? For what features would your
heuristic return good values? How can these features be
detected from your representation?

A

Specific Heuristic Function for 15 Tile Puzzle:
Problem Description: In the 15 tile puzzle, you have a 4x4 grid with 15 numbered tiles and one empty space. The goal is to rearrange the tiles from an initial configuration to a target configuration by sliding them into the empty space.

Heuristic Function:

One heuristic function is the “Manhattan Distance”:
Calculate the sum of the Manhattan distances (also called L1 distances) between each tile’s current position and its goal position.
The Manhattan distance for a tile is the sum of the horizontal and vertical distances between its current and goal positions.
Representation of Solution:

The solution is represented as a sequence of moves (up, down, left, right) to transform the initial configuration into the target configuration.
Features for Good Heuristic Values:

The heuristic returns good values based on the following features:
The sum of the Manhattan distances of each tile to its goal position.
The number of misplaced tiles (tiles not in their goal positions).
Detecting Features:

To calculate the Manhattan distances, measure the horizontal and vertical distances between each tile’s current and goal positions.
Count the number of tiles that are not in their goal positions to detect the number of misplaced tiles.
Advantages:

The Manhattan Distance heuristic provides admissible and consistent values, ensuring it never overestimates the number of moves required.
It effectively guides the search towards optimal solutions.
This heuristic function for the 15 tile puzzle is based on measuring the displacement of tiles from their goal positions, allowing it to estimate the remaining effort needed to reach the solution efficiently.

97
Q

How does the MiniMax search algorithm produce better
solutions to adversarial search problems, than “ordinary”
heuristic search?

A

MiniMax vs. Ordinary Heuristic Search:
Adversarial Search Focus:

MiniMax specifically designed for two-player, zero-sum games with opponents.
Ordinary heuristic search primarily for single-player optimization problems.
Opponent Consideration:

MiniMax considers the opponent’s actions and minimizes their potential gains.
Ordinary heuristic search assumes no opponent or external factors.
Optimal Strategies:

MiniMax aims to find optimal strategies by minimizing the worst-case scenario for the maximizing player.
Ordinary heuristic search may not focus on game theory concepts like optimal strategies.
Depth of Search:

MiniMax often explores deeper into the game tree, considering various moves and counter-moves.
Ordinary heuristic search may not delve as deep into the search space.
Relevance to Adversarial Games:

MiniMax is specifically tailored for competitive gameplay, making it a better choice for adversarial search problems.
Ordinary heuristic search is better suited for non-competitive, single-player scenarios.
In summary, MiniMax is designed for adversarial games, where considering opponents and finding optimal strategies is crucial. It explores deeper into the game tree, making it more effective in such scenarios compared to ordinary heuristic search.

98
Q

Describe in detail how you would go about the task of building
up a large index of web documents, starting from just one URL.
Describe each of the main processes and structures used in
your answer.

A

Web Crawling:

Start with the initial URL.
Use web crawling algorithms to retrieve web pages linked from the initial URL.
Follow hyperlinks to explore additional web pages.
Document Parsing:

Retrieve the content of each web page.
Parse the HTML or other document formats to extract text and metadata (e.g., title, keywords).
Remove HTML tags, scripts, and other non-text elements.
Text Preprocessing:

Tokenize the extracted text into words or phrases.
Remove stop words (common words like “the” and “and”).
Apply stemming or lemmatization to reduce words to their base form.
Indexing:

Create an inverted index data structure.
Map terms (words or phrases) to the documents they appear in.
Store metadata (e.g., document ID, URL) for each indexed document.
Assign term weights using techniques like TF-IDF.
Storing and Managing Documents:

Store the indexed documents in a database or file system.
Keep track of document metadata, such as last modified date.
Implement a mechanism to handle document updates and deletions.
Web Crawler Control:

Implement policies to control the crawling process (e.g., politeness, rate limiting).
Avoid crawling duplicate or already indexed content.
Respect robots.txt files to adhere to website access guidelines.
Scaling and Distributed Computing:

To handle a large volume of web documents, use distributed computing and storage solutions.
Implement load balancing to distribute crawling tasks efficiently.
Error Handling:

Develop error handling mechanisms to deal with broken links, inaccessible websites, and network errors.
Retry policies may be necessary to ensure thorough coverage.
Continuous Updating:

Implement periodic re-crawling and updating of indexed documents to maintain freshness.
Monitor changes in web content and update the index accordingly.
Search Interface:

Create a user-friendly search interface to query the indexed documents.
Implement ranking algorithms to retrieve relevant results.
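A skeleton of the crawl-and-index pipeline described above, assuming a placeholder fetch_page function and a FAKE_WEB dictionary standing in for real HTTP fetching and HTML parsing; in practice these would be replaced by an HTTP client and parser, together with the politeness, robots.txt and persistence concerns listed earlier.

# Skeleton of a crawler feeding a term-document index, starting from one URL.
from collections import defaultdict, deque

FAKE_WEB = {  # hypothetical stand-in for the real web
    "http://seed": ("search engines index the web", ["http://a", "http://b"]),
    "http://a": ("web pages link to other web pages", ["http://seed"]),
    "http://b": ("index structures support retrieval", []),
}

def fetch_page(url):
    """Placeholder for fetching and parsing a page: returns (text, out-links)."""
    return FAKE_WEB[url]

def crawl_and_index(seed_url, max_pages=100):
    frontier = deque([seed_url])                 # FIFO queue of URLs still to visit
    seen = {seed_url}
    inverted_index = defaultdict(dict)           # term -> {url: term frequency}
    crawled = 0
    while frontier and crawled < max_pages:
        url = frontier.popleft()
        try:
            text, links = fetch_page(url)
        except Exception:
            continue                             # broken link or fetch error
        crawled += 1
        for term in text.lower().split():        # crude tokenization
            inverted_index[term][url] = inverted_index[term].get(url, 0) + 1
        for link in links:
            if link not in seen:                 # avoid re-crawling duplicates
                seen.add(link)
                frontier.append(link)
    return inverted_index

print(dict(crawl_and_index("http://seed"))["web"])  # pages that mention 'web'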

99
Q

Using an example or otherwise, outline how heuristic search
can produce superior results to “blind” search.

A

Heuristic Search vs. Blind Search:

Heuristic Search:

Utilizes heuristics or rules to guide the search towards promising solutions.
Incorporates domain-specific knowledge to estimate the desirability of states.
Can significantly improve search efficiency.
Blind Search (Uninformed Search):

Searches without specific guidance or heuristics.
Explores the search space exhaustively, often leading to inefficiency.
Example: Finding a Path in a Maze:

Heuristic Search (A*):

Utilizes heuristics such as Manhattan distance to estimate the distance from the current state to the goal.
Prioritizes exploring states with lower estimated distances.
Efficiently finds the optimal path while avoiding unnecessary exploration.
Blind Search (Depth-First Search):

Explores paths without guidance, often backtracking and revisiting states.
Inefficient for large mazes, leading to slower convergence and suboptimal paths.
Superior Results of Heuristic Search:

Heuristic search, like A*, uses domain knowledge to make informed decisions.
It focuses on promising paths, avoiding wasted exploration.
This results in faster convergence to optimal or near-optimal solutions compared to blind search in complex problem spaces

100
Q

Describe in detail one heuristic you would use for the 15-tile
puzzle. Make reference to specific features of the solution
space that your solution method would focus upon.

A

Heuristic for the 15-Tile Puzzle:

Heuristic Function: The Manhattan Distance Heuristic

Features of the Solution Space Focused Upon:

Tile Misplacement: It calculates the sum of Manhattan distances (L1 distances) between each tile’s current position and its goal position.
Distance Estimation: The heuristic estimates the total distance to move each tile to its goal location.
Localization: It considers the spatial arrangement of tiles and how far they are from their intended positions.
Operation:

For each tile, calculate the Manhattan distance by summing the horizontal and vertical distances between its current position and its goal position.
Sum the Manhattan distances for all tiles in the puzzle.
This sum represents an estimate of the minimum number of moves required to reach the goal state.
The heuristic guides the search algorithm to prioritize moves that reduce the total Manhattan distance.
Advantages:

It provides an admissible heuristic, never overestimating the required moves.
Effectively guides the search towards optimal solutions.
Focuses on tile displacement and alignment with the goal state, which are critical features of the solution space.
The Manhattan Distance Heuristic for the 15-Tile Puzzle is an effective and widely used heuristic that leverages the spatial relationships between tiles to estimate the distance to the goal state accurately, aiding in the efficient exploration of the solution space.

101
Q
A

D value of 2.

102
Q

Outline the main differences between MinMax search and the
addition of α-β pruning (alpha-beta pruning).

A

MinMax Search:

Algorithm for two-player, zero-sum games.
Evaluates all possible game states to find the best move.
Considers the entire game tree, which can be computationally expensive.
Guarantees optimal play but can be inefficient for large trees.
Alpha-Beta Pruning:

Enhancement to MinMax search.
Reduces the number of evaluated nodes in the game tree.
Introduces alpha (lower bound) and beta (upper bound) values to cut off branches of the tree that cannot influence the final decision.
Significantly reduces search space, leading to faster and more efficient evaluation.
Preserves the same optimal results as MinMax but with fewer node evaluations.
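A short sketch of MiniMax with alpha-beta cut-offs over an explicit tree (leaves are scores for the maximizing player); the tree is illustrative.

# MiniMax with alpha-beta pruning: branches that cannot change the final
# decision are cut off. A node is a numeric leaf or a list of children.
def alphabeta(node, maximising, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):
        return node
    if maximising:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:        # beta cut-off: the minimiser will avoid this branch
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:            # alpha cut-off
            break
    return best

tree = [[3, 12], [2, 4], [14, 1]]
print(alphabeta(tree, True))  # 3, the same answer as plain MiniMax but with fewer evaluations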

103
Q

Compare and contrast the two information retrieval metrics of
Precision and Recall.

A

Precision and Recall Comparison:

Definition:

Precision: Measures the relevancy of retrieved documents. It calculates the ratio of relevant documents retrieved to the total documents retrieved.
Recall: Measures the coverage of relevant documents. It calculates the ratio of relevant documents retrieved to the total relevant documents in the collection.
Focus:

Precision: Focuses on the quality of retrieved results, aiming to minimize false positives.
Recall: Focuses on the quantity of relevant results, aiming to minimize false negatives.
Trade-off:

Precision-Recall Trade-off: There is often an inverse relationship between precision and recall. Increasing precision may decrease recall and vice versa.
Use Cases:

Precision: Important in scenarios where false positives can have severe consequences (e.g., medical diagnoses).
Recall: Important in scenarios where missing relevant results is costly (e.g., information retrieval systems).
Formula:

Precision: (True Positives) / (True Positives + False Positives)
Recall: (True Positives) / (True Positives + False Negatives)
Objective:

Precision: Maximize precision when false positives are undesirable.
Recall: Maximize recall when missing relevant results is unacceptable.
Interplay:

Balancing precision and recall is critical, and the F1-score is often used to find a compromise between the two metrics.
In summary, precision and recall are both essential metrics in information retrieval, with different focuses and trade-offs. Precision emphasizes result quality, while recall emphasizes result quantity, and they are often considered together to evaluate the performance of retrieval systems.

104
Q

What is the significance of differentiating between in-links and
out-links for document ranking on the web? Illustrate your
answer using an appropriate example.

A

Significance of In-Links and Out-Links:

In-Links (Backlinks):

Represent links from other web pages pointing to a specific page.
Indicate the page’s authority and relevance within the web ecosystem.
Used to assess a page’s importance and trustworthiness.
Out-Links (Outbound Links):

Represent links from a specific page to other web pages.
Indicate the content’s context and reference to related topics.
Can influence the ranking of linked pages and contribute to information flow.
Illustration with Example:

Example:

Consider two web pages, Page A and Page B.
Page A has numerous high-quality in-links from authoritative websites in the same domain.
Page B has numerous out-links to relevant, reputable sources on the topic it covers.
Significance:

Page A is likely considered an authoritative source due to its valuable in-links, potentially ranking higher in search results.
Page B, while not an authority itself, contributes to the web’s information flow by referencing authoritative sources. This can indirectly benefit its ranking by enhancing its context and credibility.
In summary, differentiating between in-links and out-links is significant in web document ranking. In-links indicate a page’s authority, while out-links contribute to context and information flow, influencing a page’s relevance and credibility in search engine rankings.

105
Q

What is tf-idf? How does it help identify those web-pages that
are most relevant to a given web query.

A

TF-IDF (Term Frequency-Inverse Document Frequency):

A numerical statistic used in information retrieval.
Combines term frequency (TF) and inverse document frequency (IDF) to evaluate a term’s importance in a document within a collection.
Relevance to Web Page Ranking:

Term Frequency (TF):

Measures how often a term appears in a web page.
High TF for query terms indicates their importance within a page.
Inverse Document Frequency (IDF):

Measures how rare a term is across the entire collection of web pages.
High IDF for query terms indicates their uniqueness.
Ranking Relevance:

TF-IDF assigns higher scores to web pages where query terms have high TF and IDF.
Pages with query terms occurring frequently within them but not too common across the entire collection are considered more relevant.
Example:

For a query containing “artificial intelligence” and “machine learning,” a web page mentioning these terms frequently (high TF) but not overly common across the web (high IDF) is ranked higher, as it is more relevant to the query.
In summary, TF-IDF helps identify the most relevant web pages for a given query by considering both the term’s frequency within a page (TF) and its rarity across the entire collection (IDF), allowing it to prioritize pages with contextually important terms.

106
Q

Why does Google’s PageRank algorithm approximate the
behaviour of a theoretical surfer, rather than focusing on the
behaviour of an actual surfer of the web?

A

PageRank Approximates a Theoretical Surfer:

Scalability: Modeling the behavior of every actual web surfer is impractical due to the immense scale of the web.

Consistency: A theoretical surfer consistently explores links and provides a more predictable framework for ranking web pages.

Algorithmic Efficiency: Computationally, simulating a theoretical surfer is more efficient than tracking the diverse behaviors of real users.

Focus on Link Structure: PageRank’s focus is on analyzing the web’s link structure and identifying authoritative pages, which aligns with a theoretical surfer’s goal.

Relevance to Web Ranking:

PageRank’s simplified model still yields valuable insights into web page importance and relevance based on link analysis, helping rank web pages effectively.
In summary, PageRank approximates a theoretical surfer’s behavior for scalability, consistency, and computational efficiency while still providing valuable insights for web page ranking based on link structure.

107
Q

What is K nearest neighbours (KNN) and how are they used to
solve problems?

A

K Nearest Neighbors (KNN):

A supervised machine learning algorithm.
Used for classification and regression tasks.
Operation:

Classifies or predicts a data point based on the majority class or average value of its K nearest data points in the training dataset.
Utilizes a distance metric (e.g., Euclidean distance) to determine proximity.
Usage:

Classification: Assigns a class label to a data point based on the class labels of its K nearest neighbors.
Regression: Predicts a numerical value based on the average of the values of its K nearest neighbors.
Commonly used for pattern recognition, recommendation systems, and anomaly detection.
Parameters:

K value determines the number of neighbors considered.
Choice of K impacts algorithm performance and robustness.
Pros:

Simple and intuitive.
Works well for small to medium-sized datasets.
Non-parametric and adaptable to different data distributions.
Cons:

Sensitive to noisy data.
May require careful preprocessing and tuning.
Slower for large datasets due to calculating distances for all data points.
In summary, KNN is a versatile algorithm used for classification and regression tasks by considering the class or values of the nearest neighbors of a data point, making it suitable for various machine learning applications.
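
A minimal sketch of KNN classification in plain Python, assuming a toy 2-D dataset and Euclidean distance:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, training_data, k=3):
    """Classify `query` by majority vote among its k nearest labelled points.
    training_data is a list of (feature_vector, label) pairs."""
    neighbours = sorted(training_data, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two 2-D clusters labelled 'A' and 'B'.
training = [((1, 1), 'A'), ((1, 2), 'A'), ((2, 1), 'A'),
            ((8, 8), 'B'), ((8, 9), 'B'), ((9, 8), 'B')]

print(knn_predict((2, 2), training, k=3))   # expected 'A'
print(knn_predict((8, 7), training, k=3))   # expected 'B'
```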

108
Q

The solution space used by KNN can be thought of as a
tessellation of that space, creating a Voronoi diagram. Justify
this claim using a diagram to illustrate your answer.

A

Tessellation of Solution Space:

In KNN, the solution space is divided into regions based on the K nearest neighbors.
Each region is centered around a data point from the training set.
These regions collectively create a tessellation of the solution space.
Voronoi Diagram Illustration:

A Voronoi diagram consists of polygons (Voronoi cells) where each polygon represents an area closest to a specific data point.
These polygons correspond to the regions in KNN, centered around the training data points.
The Voronoi diagram visually represents how KNN classifies or predicts based on the closest neighbors.
Justification:

KNN assigns a data point to the class or value most prevalent among its K nearest neighbors.
For K = 1, this corresponds exactly to selecting the Voronoi cell associated with the closest training data point; for larger K, the prediction draws on the points of the K nearest cells.
The tessellation created by Voronoi cells captures the influence regions of each training data point on the solution space.

In summary, the tessellation of the solution space in KNN resembles a Voronoi diagram, where each region corresponds to a Voronoi cell centered around a training data point. This diagram illustrates how KNN makes classification or regression decisions based on proximity to training data.
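
One way to draw the diagram, assuming SciPy and matplotlib are available: plot the Voronoi tessellation induced by a handful of illustrative training points; with K = 1 each cell is exactly the region KNN would assign to that point.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# Illustrative 2-D training points (each would carry a class label in KNN).
points = np.array([[1, 1], [2, 4], [4, 2], [5, 5], [7, 1], [6, 7], [3, 6]])

vor = Voronoi(points)          # compute the Voronoi cell around each point
voronoi_plot_2d(vor)           # each polygon is the 1-NN "influence region" of a point
plt.title("Voronoi cells = 1-NN decision regions")
plt.show()
```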

109
Q

Outline how the KNN algorithm might be applied to the problem
of retrieving source code, for possible reuse.

A

Applying KNN to Source Code Retrieval:

Data Representation:

Represent source code files as numerical feature vectors.
Features can include code structure, comments, variable names, and more.
Training Data:

Build a training dataset containing labeled source code snippets.
Labels indicate the purpose, functionality, or category of each snippet.
Distance Metric:

Choose an appropriate distance metric (e.g., cosine similarity or Euclidean distance) to measure similarity between code snippets.
KNN Configuration:

Determine the value of K (number of nearest neighbors) based on experimentation and dataset size.
Retrieval Process:

Given a new source code snippet for reuse, convert it into a feature vector.
Calculate distances to all snippets in the training dataset.
Select the K nearest neighbors (most similar code snippets).
Decision Making:

For classification purposes, determine the majority class among the K neighbors to classify the new code snippet.
For retrieval, return the K most similar snippets to the user.
Evaluation:

Measure the algorithm’s performance using evaluation metrics like precision, recall, or F1-score.
Refine the model and parameters as needed for better retrieval results.
Advantages:

KNN can capture code similarity based on various features, facilitating code reuse.
It can accommodate different programming languages and coding styles.
Challenges:

Choosing relevant features and an appropriate distance metric is crucial for accurate retrieval.
Handling large codebases may require efficient data structures and search algorithms.
Handling semantic understanding and contextual code relevance is a more complex problem.
In summary, applying KNN to source code retrieval involves feature representation, training with labeled data, and using a distance metric to find the most similar code snippets in a training dataset for reuse or classification.
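
A hedged sketch of the retrieval step using scikit-learn; the "snippets" and query below are placeholders standing in for a real codebase, with snippets vectorised by tf-idf over their tokens and the K most similar returned by cosine distance:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Placeholder "code snippets" standing in for files from a real repository.
snippets = [
    "def read_csv(path): open file parse rows return list",
    "def write_csv(path, rows): open file write rows",
    "def quicksort(items): partition recurse sort list",
    "class HttpClient: get post request response",
]

vectorizer = TfidfVectorizer()                 # treat code tokens like words
matrix = vectorizer.fit_transform(snippets)    # one tf-idf vector per snippet

knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(matrix)

query = "function that parses a csv file into rows"
distances, indices = knn.kneighbors(vectorizer.transform([query]))

for dist, idx in zip(distances[0], indices[0]):
    print(f"{dist:.3f}  {snippets[idx]}")      # the K most similar snippets for reuse
```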

110
Q

Briefly describe how heuristic search can produce superior
results to “blind” search.

A

Heuristic Search vs. Blind Search:

Guidance vs. No Guidance:

Heuristic search uses heuristics or rules to guide the search towards promising solutions.
Blind search explores without specific guidance, often in a brute-force manner.
Efficiency vs. Inefficiency:

Heuristic search focuses on relevant paths, reducing unnecessary exploration.
Blind search can be inefficient for large or complex search spaces.
Optimal vs. Suboptimal Solutions:

Heuristic search aims for optimal or near-optimal solutions using domain-specific knowledge.
Blind search may find solutions but doesn’t prioritize optimality.
Example: Maze Solving:

In a maze, heuristic search may use distance to the goal as a heuristic, guiding towards the shortest path.
Blind search may explore all possible paths, including longer ones, before finding a solution.
In summary, heuristic search’s guidance and domain-specific knowledge lead to more efficient exploration and the potential for optimal solutions compared to blind search in complex problem spaces.

111
Q

Describe in detail the A* search algorithm. You may find it
useful to use examples or diagrams to present your answer

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
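
A compact sketch of A* on a small invented grid (0 = free cell, 1 = wall), using Manhattan distance as h(n) and a heap as the open list:

```python
import heapq

def astar(grid, start, goal):
    """Return the list of cells on the cheapest path from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda cell: abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])  # heuristic h(n)
    open_list = [(h(start), 0, start, None)]       # entries are (f, g, node, parent)
    parents, g_cost, closed = {}, {start: 0}, set()
    while open_list:
        f, g, node, parent = heapq.heappop(open_list)   # node with lowest f = g + h
        if node in closed:
            continue
        parents[node] = parent
        if node == goal:                                # goal check: backtrace the path
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        closed.add(node)
        r, c = node
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):  # successors
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):   # found a cheaper route
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_list, (ng + h((nr, nc)), ng, (nr, nc), node))
    return None                                         # open list empty: no path

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(astar(grid, (0, 0), (3, 3)))
```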

112
Q
A

B with a value of 3.

113
Q

Briefly outline the MinMax search algorithm

A

MinMax Search Algorithm:

Objective: Used in two-player, zero-sum games (e.g., chess) to find optimal strategies.

Operation:

Recursively explores the game tree, considering both players’ moves.
Maximizes the utility for one player (Max) while minimizing it for the opponent (Min).
Steps:

Begin at the root node, representing the current game state.
Alternate between Max and Min levels in the game tree.
Max levels choose moves that maximize their utility.
Min levels choose moves that minimize Max’s utility.
Continue until a terminal state or a predefined depth is reached.
Backpropagate utility values up the tree to determine the best initial move.
Optimality: MinMax aims to find a strategy that minimizes the worst-case scenario for the maximizing player.

Alpha-Beta Pruning: An enhancement to MinMax that reduces unnecessary evaluations in the game tree.

Limitations: Can be computationally expensive for deep game trees, but it guarantees optimal play in zero-sum games.
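
A minimal recursive sketch of MinMax over a hand-built game tree; the leaf values are arbitrary utilities for the maximizing player:

```python
def minimax(node, maximizing):
    """Return the backed-up minimax value of `node`.
    A node is either a leaf utility (int) or a list of child nodes."""
    if isinstance(node, int):          # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Toy game tree, depth 2: Max chooses a branch, then Min chooses a leaf.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, maximizing=True))   # Min would yield 3, 2, 2 -> Max picks 3
```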

114
Q

What is a random walk of the World Wide Web (WWW)? What
is the relevance of the random walk algorithm to the ranking of
web documents?

A

Random Walk of the WWW:

Definition: A stochastic process where a virtual “surfer” navigates the web by randomly clicking links on web pages.
Objective: Simulates the behavior of a user exploring web pages in a non-directed manner.
Relevance to Ranking of Web Documents:

PageRank Algorithm: Developed by Google, it models the web as a graph and views a random walk as a representation of how web surfers might traverse the web.
Importance: Measures a web page’s importance by analyzing the frequency and quality of visits by a random surfer.
Relevance Score: Pages visited more frequently by the surfer receive higher PageRank scores, which impact their ranking in search results.
Effective Ranking: Random walk algorithms help rank web documents by considering their “popularity” and influence within the web ecosystem.
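
A hedged sketch of the random-surfer idea on a tiny invented link graph: with probability d the surfer follows a random outbound link, otherwise it jumps to a random page, and the long-run visit frequencies approximate PageRank:

```python
import random
from collections import Counter

# Toy link graph: page -> list of pages it links out to (illustrative only).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
d = 0.85                                    # damping factor: chance of following a link

def random_walk(steps=100_000):
    visits = Counter()
    page = random.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        if random.random() < d and links[page]:
            page = random.choice(links[page])   # follow an outbound link
        else:
            page = random.choice(pages)         # jump to a random page
    return {p: visits[p] / steps for p in pages}

print(random_walk())   # A and C should accumulate the largest share of visits
```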

115
Q

What role does the document-term index play in information
retrieval on the web?

A

Role of Document-Term Index:

Organization: It organizes and stores information about terms (words or phrases) and their occurrences in documents on the web.

Efficient Retrieval: Enables rapid retrieval of relevant documents in response to user queries.

Keyword Matching: Facilitates keyword-based searches by mapping terms to their document locations.

Scalability: Allows search engines to handle vast amounts of web content by indexing and searching efficiently.

Ranking: Supports ranking algorithms by providing data on term frequencies, enabling relevance ranking.

Information Retrieval: Serves as the foundation for web search engines, helping users find web pages that match their queries.
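
A minimal sketch of the underlying structure, an inverted index mapping each term to the documents (and positions) where it occurs; the documents are invented placeholders:

```python
from collections import defaultdict

docs = {
    1: "artificial intelligence on the web",
    2: "intelligence and search on the web",
    3: "cooking on a budget",
}

# term -> {doc_id: [positions]}
index = defaultdict(lambda: defaultdict(list))
for doc_id, text in docs.items():
    for pos, term in enumerate(text.lower().split()):
        index[term][doc_id].append(pos)

def search(term):
    """Return the ids of documents containing `term` (keyword matching)."""
    return sorted(index.get(term.lower(), {}))

print(search("intelligence"))   # -> [1, 2]
print(dict(index["web"]))       # -> {1: [4], 2: [5]}
```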

116
Q

Briefly outline the (K nearest neighbours) KNN method of
generating solutions to problems. What is meant by the “curse
of dimensionality”?

A

K Nearest Neighbors (KNN):

Method:

Utilizes labeled data for classification or regression.
Predicts an unknown value based on the majority class or average of K nearest data points.
Steps:

Choose K and a distance metric (e.g., Euclidean distance).
Calculate distances between the unknown point and all data points.
Select the K nearest neighbors.
For classification, predict the majority class among neighbors; for regression, calculate the average.
Curse of Dimensionality:

Definition: Refers to the challenges and issues that arise when dealing with high-dimensional data.

Effects:

Increased data sparsity as dimensions grow.
Increased computational complexity.
Diminished effectiveness of distance-based algorithms like KNN.
Impact on KNN: High-dimensional data can lead to increased distance values and make the nearest neighbors less representative, potentially reducing the accuracy of KNN.

117
Q

How would you use KNN to approximate the value of an
unknown complex function, given a collection of (x, y) coordinates
gathered for that function?

A

Using KNN to Approximate an Unknown Function:

Data Collection:

Gather a collection of (x, y) coordinates from the unknown function.
Data Representation:

Represent the collected data as a dataset with x-values as input features and corresponding y-values as output labels.
KNN Configuration:

Choose an appropriate K value (number of neighbors) and a distance metric (e.g., Euclidean distance).
Prediction:

Given a new x-value for which you want to approximate the function’s y-value:
Calculate the distances to all x-values in the dataset.
Select the K nearest neighbors based on distance.
Predict the y-value by averaging the y-values of the K neighbors (for regression).
Evaluation:

Assess the accuracy of the KNN model using evaluation metrics such as Mean Squared Error (MSE) or Root Mean Squared Error (RMSE).
Advantages:

KNN can approximate complex functions with flexibility.
It doesn’t make strong assumptions about the underlying function.
Considerations:

The choice of K impacts the trade-off between bias and variance.
Data preprocessing and feature scaling may be necessary for optimal performance.
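
A minimal plain-Python sketch of KNN regression over sampled (x, y) points; the sample values below are invented (roughly y = x^2):

```python
def knn_regress(x_query, samples, k=3):
    """Approximate y at x_query as the mean y of the k nearest sampled x values.
    samples is a list of (x, y) pairs gathered from the unknown function."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x_query))[:k]
    return sum(y for _, y in nearest) / k

# Pretend these were measured from an unknown function (here roughly y = x^2).
samples = [(0, 0.1), (1, 1.2), (2, 3.9), (3, 9.1), (4, 15.8), (5, 25.3)]

print(knn_regress(2.5, samples, k=2))   # average of y at x=2 and x=3 -> 6.5
```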

118
Q

Briefly outline how the ‘generate and test’ strategy can be used
to solve problems. What role is played by heuristics in this
process?

A

Generate and Test Strategy:

Method:

Generates potential solutions systematically.
Tests each solution against predefined criteria to determine if it’s a valid solution.
Steps:

Generate a set of potential solutions.
Apply a predefined test or evaluation function to each solution.
Select the solutions that meet the criteria as valid solutions.
Role of Heuristics:

Heuristics: Guiding rules or strategies used to efficiently generate and evaluate potential solutions.

Heuristics’ Role:

Heuristics help prioritize the generation of more promising solutions, reducing the search space.
They assist in quickly discarding solutions that are unlikely to meet the criteria.
Heuristics can optimize the generate-and-test process by focusing on relevant solution spaces.
In summary, the generate-and-test strategy involves systematically creating potential solutions and evaluating them against predefined criteria. Heuristics play a crucial role in guiding the generation and evaluation process, making it more efficient and effective.

119
Q

Discuss some of the problems associated with greedy search
(using a look-ahead of just one move).

A

Problems with Greedy Search (One-Move Look-Ahead):

Local Optimality: Greedy search tends to make decisions based on immediate gains without considering long-term consequences, leading to locally optimal but not necessarily globally optimal solutions.

Ineffective for Complex Problems: In complex problem spaces, a one-move look-ahead may overlook superior paths or solutions that require multiple moves to realize their benefits.

Lack of Exploration: Greedy search can become stuck in suboptimal regions of the search space and may not explore alternative paths or solutions effectively.

Sensitivity to Initial State: The quality of the solution often depends on the initial state or starting point, making the algorithm sensitive to the choice of the initial configuration.

Not Guaranteed to Find the Best Solution: Greedy search is not guaranteed to find the best possible solution in a given problem space, especially when it prematurely commits to suboptimal choices.

In summary, greedy search with a one-move look-ahead can suffer from local optimality, ineffectiveness in complex problems, a lack of exploration, sensitivity to initial conditions, and the absence of optimality guarantees.

120
Q
A

b value of -3.

121
Q

Briefly outline - pruning, describing how it improves upon
MiniMax search.

A

Alpha-Beta Pruning:

Purpose: An optimization technique used in MiniMax search to reduce the number of nodes evaluated in the game tree.

Improvements Over MiniMax:

Efficiency: Alpha-beta pruning eliminates branches that cannot affect the final decision, significantly reducing the number of nodes explored.

Faster Search: It speeds up the search process, making it more efficient and feasible for deeper game trees.

Improved Depth: Alpha-beta pruning allows for deeper exploration of the game tree, potentially leading to better-informed decisions.

Same Optimal Solution: It guarantees the same optimal solution as the standard MiniMax algorithm.

In summary, alpha-beta pruning is an enhancement to MiniMax search that improves efficiency by eliminating unnecessary node evaluations while preserving the same optimal solution. It allows for deeper exploration of the game tree, leading to more informed decisions.
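
A minimal sketch of alpha-beta pruning over a hand-built nested-list game tree (the same toy representation as the MinMax sketch earlier); branches that cannot affect the final decision are cut off:

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """MiniMax with alpha-beta pruning over a nested-list game tree."""
    if isinstance(node, int):                  # leaf utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                  # beta cut-off: Min will avoid this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:                  # alpha cut-off: Max will avoid this branch
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, maximizing=True))        # same answer as plain MiniMax: 3
```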

122
Q

Write down the PageRank formula explaining each of its terms.

A

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + … + PR(Tn)/C(Tn) )

The PageRank of a page is the chance that
a random surfer will land on that page.

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.

d: The damping factor, a constant usually set around 0.85. It represents the probability that the random surfer follows an outbound link from the current page - a way to model a user sticking to a chain of links.

(1-d): The complementary probability to the damping factor. It represents the probability that the user abandons the chain of links and jumps to a random page instead, which is why every page receives this small baseline share of PageRank.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

123
Q

What is meant by tf-idf? How would the tf-idf values differ between
the terms “big” and “tibia” in a collection of English language
web-pages?

A

TF-IDF (Term Frequency-Inverse Document Frequency):

Definition: A numerical statistic used in information retrieval to assess the importance of a term within a document collection.

Calculation: Combines term frequency (TF) and inverse document frequency (IDF) to evaluate a term’s significance in a document:

TF (Term Frequency): Measures how often a term appears in a specific document.

IDF (Inverse Document Frequency): Measures the rarity of a term across the entire document collection.

Differences in TF-IDF for “big” and “tibia” in an English language web page collection:

“big” vs. “tibia”:

“big”:

Likely to have a high TF if it appears frequently within individual web pages.
Will have a low IDF because it is a common word in the English language.
“tibia”:

Likely to have a lower TF as it’s not a common word in most web pages.
Likely to have a high IDF due to its rarity across the entire collection.
In summary, TF-IDF values for “big” are likely to be influenced more by its high TF within documents, while “tibia” would have a higher IDF due to its rarity in the collection, despite potentially lower TF.

124
Q

How can the tf-idf score and the PageRank value be combined
to produce an overall ranking score for some presented query?

A

Combining TF-IDF and PageRank for Overall Ranking:

TF-IDF Score:

Measures term relevance within individual documents for the query.
Reflects the importance of terms within specific content.
PageRank Value:

Measures the importance of web pages based on link structure and popularity.
Reflects the global importance of web pages in the web graph.
Combination Method:

Weighted Sum: Assign weights to both TF-IDF and PageRank scores and calculate a weighted sum.
Formula Example: Overall Score = (α * TF-IDF Score) + (β * PageRank Value), where α and β are weight coefficients.
Customizable Weights:

The choice of weights α and β can be adjusted to prioritize either content relevance (TF-IDF) or global importance (PageRank).
Final Ranking:

Rank documents based on their combined overall scores, with higher scores indicating higher relevance.
Benefits:

Combining TF-IDF and PageRank leverages both content-specific and global factors, producing more comprehensive and relevant search results.
In summary, the TF-IDF score and PageRank value can be combined using a weighted sum, where customizable coefficients determine the emphasis on content relevance and global importance when ranking web pages for a presented query.

125
Q

What is an analogy and how do analogies differ from “literal”
text? Use an example to illustrate your answer.

A

Analogies vs. “Literal” Text:

Analogy:

An analogy is a comparison between two concepts or ideas that highlights their similarities to clarify or explain a point.
Analogies rely on resemblance, inference, or metaphorical connections to convey meaning.
“Literal” Text:

“Literal” text presents information directly, without relying on comparisons or figurative language.
It conveys information straightforwardly and factually.
Example:

Analogy: “Her smile was as bright as the sun, warming everyone’s hearts.”

This analogy compares the brightness of a smile to the sun’s brightness, creating a vivid mental image of the smile’s warmth.
“Literal” Text: “She had a very cheerful and warm smile.”

In this “literal” text, the description directly states that the person had a cheerful and warm smile without using figurative language.
Differences:

Analogies use comparisons to enhance understanding, while “literal” text presents information directly.
Analogies often create vivid mental images or connections, while “literal” text is straightforward and factual.

126
Q

How might an automated process use background knowledge
(such as WordNet or ConceptNet) to identify and complete the
following analogies?
Dog: puppy :: cat:?
Paris : France :: London :?

A

Using Background Knowledge for Analogies:

WordNet or ConceptNet:

Automated processes can leverage these knowledge bases to identify relationships between words and concepts.
Analogies:

For the analogy “Dog: puppy :: cat:?”, the process can:

Find synonyms or hyponyms for “cat” (e.g., feline).
Identify a related word that describes a young cat, like “kitten.”
For the analogy “Paris : France :: London :?”, the process can:

Recognize the relationship between a city and its country.
Retrieve “United Kingdom” as the country associated with London.
Role of Knowledge Bases:

WordNet and ConceptNet provide structured semantic information that helps automated systems find relevant concepts and relationships to complete analogies.
Benefits:

Using background knowledge enhances the ability of automated systems to infer and complete analogies by drawing from a vast network of associations between words and concepts.

127
Q

Briefly outline how a search-based process may be used to
solve a problem. Illustrate your answer with an appropriate
example.

A

Search-Based Problem Solving:

Method:

Involves systematically exploring a problem space to find a solution or an optimal outcome.
Steps:

Define the problem and its constraints.
Represent the problem space and possible states.
Initialize a search process, typically starting from an initial state.
Generate and evaluate successor states based on predefined rules or heuristics.
Continue searching until a goal state is reached or a termination criterion is met.
If a solution is found, return it; otherwise, report failure.
Example: Solving the 8-Puzzle:

Problem: Rearrange a 3x3 grid of numbered tiles into a target configuration.
Representation: The grid state represents the problem space, with each tile’s position defining a state.
Search Process: Use algorithms like A* search with a heuristic (e.g., Manhattan distance) to explore possible moves and reach the goal state efficiently.
Successor Generation: Generate successor states by moving tiles in the grid.
Termination: Stop when the goal state (solved puzzle) is reached or when a predefined time limit is exceeded.
Benefits:

Search-based processes provide systematic approaches to solving complex problems, allowing automation and efficient exploration of solution spaces.

128
Q
A

D value of 6.

129
Q

Minimax does not apply the heuristic function to the internal
nodes of a search tree. Using the example from (b) above or
otherwise, explain why this is the case.

A

Minimax and Heuristic Function:

Minimax Focus: Minimax is primarily designed for decision-making in two-player, zero-sum games like chess.

Terminal Nodes: It evaluates terminal nodes (end game states) directly, not internal nodes.

Reason:

The value of an internal node is not estimated but derived: it is the backed-up minimax value of its children, so applying the heuristic there would be redundant and could conflict with the exact backed-up value.
The heuristic is only an estimate, and it is only needed at the frontier of the search (terminal states or the depth cut-off), where no further look-ahead is possible.
Minimax then propagates these frontier evaluations up through the well-defined structure of the game tree.
Terminal Node Example: In chess, terminal nodes represent positions where the game is over, and the outcome (win, loss, or draw) is known; with a depth limit, the cut-off positions are scored by the heuristic instead.

Heuristic Use: Heuristics estimate the desirability of frontier positions only; internal nodes simply receive the values backed up from their children.

130
Q

Using an appropriate example or otherwise, describe how A*
discovers the optimal path to a solution.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).

131
Q

Describe any two selection operators. Briefly compare and
contrast their advantages and/or disadvantages.

A

Selection Operators:

Tournament Selection:

Advantages:
Simplicity: Easy to implement and understand.
Exploration-Exploitation Balance: Can balance between favoring the best individuals and exploring new ones.
Reduces Noise: Less affected by noisy fitness evaluations.

Disadvantages:
Tournament Size: The choice of tournament size affects performance, and finding the optimal size can be a challenge.
No Rank Information: It doesn’t consider rank information, potentially missing more promising individuals.

Roulette Wheel Selection (Fitness Proportional Selection):

Advantages:
Fitness-Based: Selects individuals with a probability proportional to their fitness, favoring better solutions.
Versatility: Can handle both maximizing and minimizing fitness functions.
Diversity: Maintains diversity by giving lower-fitness individuals a chance.

Disadvantages:
Stochastic: Selection can be highly influenced by randomness.
Premature Convergence: It may favor the best solutions too quickly, leading to premature convergence.
Scaling: Requires fitness scaling to prevent very small fitness values from being overlooked.

Comparison:

Tournament selection is simpler and less sensitive to noisy fitness evaluations, while roulette wheel selection is based on fitness proportionality and maintains diversity.
Tournament selection allows for exploration-exploitation balance, while roulette wheel selection may lead to premature convergence.
The choice between them often depends on the specific problem and desired selection pressure.
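
Minimal sketches of both operators, assuming a population of (individual, fitness) pairs with non-negative fitness values:

```python
import random

population = [("ind1", 4.0), ("ind2", 1.0), ("ind3", 7.5), ("ind4", 2.5)]

def tournament_select(pop, tournament_size=2):
    """Pick `tournament_size` individuals at random and return the fittest of them."""
    contestants = random.sample(pop, tournament_size)
    return max(contestants, key=lambda item: item[1])

def roulette_select(pop):
    """Return an individual with probability proportional to its fitness."""
    total = sum(fitness for _, fitness in pop)
    r = random.uniform(0, total)
    running = 0.0
    for individual, fitness in pop:
        running += fitness
        if running >= r:
            return individual, fitness
    return pop[-1]                      # guard against floating-point round-off

print(tournament_select(population))
print(roulette_select(population))
```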

132
Q

Discuss how you would create an information retrieval system
for use on the web. Describe the two main information structures
(databases) you would need to construct.

A

Creating an Information Retrieval System for the Web:

Main Information Structures (Databases):

Document-Term Index:

Purpose: Stores information about terms (words or phrases) and their occurrences in web documents.
Construction:
Collect and crawl web content.
Tokenize and preprocess documents.
Build an inverted index mapping terms to document IDs and positions.
Benefits: Enables efficient keyword-based searching and relevance ranking of web pages.
Link Graph (Graph Database):

Purpose: Represents the web’s hyperlink structure and relationships between web pages.
Construction:
Crawl the web to discover and record links between pages.
Build a graph database where nodes represent web pages and edges represent links.
Benefits: Supports algorithms like PageRank for ranking web pages based on link popularity and authority.
Functionality:

Users can submit queries to the retrieval system.
The system searches the document-term index to find relevant documents.
It also uses the link graph to apply ranking algorithms like PageRank to order search results.
Users receive ranked and relevant web pages in response to their queries.
Scalability and Maintenance:

Continuous crawling and updating of the document-term index and link graph to reflect the evolving web.
Efficient data storage and retrieval to handle the vast amount of web content.
Regular maintenance and indexing to ensure up-to-date search results.

133
Q

How can a generate and test strategy be used to solve the 8-tile
puzzle? For the 8-tile puzzle, briefly outline one method for the
generation process (looking only at all valid moves from the
current state). Also for the 8-tile puzzle, briefly outline one
method for the testing process.

A

Generate and Test Strategy for the 8-Tile Puzzle:

Generation Process:

Method: Generate all valid moves from the current state (e.g., move blank tile up, down, left, or right).
For each generated move, create a new state by applying the move to the current state.
Keep track of these generated states as potential solutions or next steps in the puzzle.
Testing Process:

Method: Test each generated state to check if it represents a solved puzzle.
Check if the current state matches the goal state (e.g., all tiles in ascending order).
If a generated state matches the goal state, it is considered a valid solution.
If not, continue generating and testing states until a solution is found or the search space is exhausted.
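
A minimal sketch of the two parts for the 8-tile puzzle: the generator yields only the states reachable by sliding a tile into the blank (written as 0), and the tester compares a state with the goal:

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)        # goal state, blank in the bottom-right

def generate_moves(state):
    """Yield every state reachable by one valid slide of a tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # up, down, left, right
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            new_state = list(state)
            new_state[blank], new_state[swap] = new_state[swap], new_state[blank]
            yield tuple(new_state)

def is_goal(state):
    """Test: a generated state is a solution iff it matches the goal state."""
    return state == GOAL

start = (1, 2, 3, 4, 5, 6, 7, 0, 8)       # one slide away from the goal
for candidate in generate_moves(start):   # generate ...
    if is_goal(candidate):                # ... and test
        print("solved:", candidate)
```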

134
Q

What is meant by the term backtracking, as used in relation to
Depth First Search (DFS) to explore large solution spaces? You
may use an example to illustrate your answer.

A

Backtracking in Depth First Search (DFS):

Definition: Backtracking is a technique used in DFS to efficiently explore large solution spaces by undoing incorrect choices and returning to a previous state when a dead-end is encountered.

Example: Solving the N-Queens problem using DFS and backtracking:

Start by placing a queen in the first row of the chessboard.
Proceed to the next row while ensuring no queen conflicts with the previous row.
If a conflict is detected, backtrack to the previous row and explore alternative placements.
Continue this process until a valid solution (no conflicts) is found or all possibilities are exhausted.
Advantage: Backtracking reduces the need to explore every possible solution, making DFS more efficient for problems with large solution spaces.

Efficiency: It avoids unnecessary exploration, improving the efficiency of the search process in complex problems.

135
Q

How might the A* algorithm be used to plan the path along which
a mobile robot may be sent, in order to reach some destination?
Describe any assumptions and limitations.

A

Using A* for Path Planning of a Mobile Robot:

Purpose: A* algorithm can be employed to find an optimal path for a mobile robot to reach a destination while considering obstacles and terrain.

Assumptions:

Assumes a grid-based or graph representation of the environment.
Assumes the robot’s movements are discrete and can move in predefined directions.
Requires an accurate heuristic function to estimate the cost to reach the goal.
Process:

Start from the robot’s current position and expand possible movement directions.
Use a heuristic function to estimate the cost of reaching the destination from each neighboring cell.
Select the cell with the lowest estimated cost as the next step.
Continue this process until the destination is reached or no valid path is found.
Limitations:

A* may not always find a solution if a path is obstructed or unreachable.
The quality of the solution depends on the accuracy of the heuristic function.
Discretization may lead to suboptimal paths in continuous environments.
Computational complexity increases with grid/graph size, affecting real-time performance.
Overall: A* is a valuable algorithm for robot path planning but requires careful setup, assumptions, and consideration of its limitations.

136
Q
A

D value of 6.

137
Q

What is meant by a zero sum game? What is its relevance to
heuristic search?

A

Zero Sum Game:

Definition: A zero-sum game is a type of competitive game in which the total gains and losses among players sum to zero. What one player gains, another loses.

Relevance to Heuristic Search:

Adversary Search: Zero-sum games are commonly used in adversary search problems, such as chess or tic-tac-toe, where one player’s win is the other’s loss.

Minimax Algorithm: Heuristic search algorithms like Minimax are applied to zero-sum games to determine optimal strategies by considering the opposing player’s moves.

Utility Function: In heuristic search for zero-sum games, a utility function assigns values to game outcomes, typically representing wins, losses, or draws. Minimax aims to maximize this utility.

Importance: Understanding zero-sum games is essential for designing search strategies that can handle competitive scenarios, making them relevant to heuristic search in adversarial contexts.

138
Q

What is tf/idf and what is its relevance to information retrieval?
How does tf/idf impact the results returned from information
retrieval systems?

A

TF-IDF in Information Retrieval:

TF (Term Frequency):

Measures how often a term appears in a document.
Helps identify the importance of a term within a document.
IDF (Inverse Document Frequency):

Measures the rarity of a term across a collection of documents.
Identifies terms that are distinctive and relevant.
Relevance to Information Retrieval:

TF-IDF is used to rank and retrieve documents in information retrieval systems.
It helps identify documents where a query term is both frequent within the document (TF) and relatively rare across the document collection (IDF).
Impact on Retrieval Results:

Higher TF-IDF scores indicate greater relevance to a query.
Documents with terms matching the query’s TF-IDF weights more closely are ranked higher in search results.
TF-IDF improves the precision and relevance of search results in information retrieval.

139
Q

What is anchor text? How might anchor text be used to index a
page, when that page doesn’t contain the anchor text?

A

Anchor Text:

Definition: Anchor text is the clickable text in a hyperlink that provides context and describes the content of the linked page.
Indexing with Anchor Text:

Relevance: Anchor text from external pages can provide information about the linked page’s content even when it doesn’t contain the same text.

Link-Based Indexing: Search engines use anchor text from incoming links to index and categorize web pages.

Algorithmic Associations: Anchor text helps search engines associate keywords with pages, contributing to ranking and relevance determination.

Example: If multiple external pages link to a page with anchor text “best chocolate desserts,” search engines may associate that page with chocolate dessert-related queries, even if those exact words aren’t on the page.

140
Q

What is meant by the phrase Computational Creativity (CC)?
What theoretical problems does Computational Creativity
present in relation to the Turing Test?

A

Computational Creativity (CC):

Definition: Computational Creativity refers to the field of AI that focuses on developing systems capable of generating creative and novel outputs, often in art, music, writing, or problem-solving.
Theoretical Problems in Relation to Turing Test:

Subjectivity: Assessing creativity is subjective, making it challenging to define a clear criterion for passing the Turing Test.

Human Benchmark: The Turing Test originally aimed to distinguish between human and machine responses based on linguistic abilities, not creativity.

Creativity Evaluation: Turing Test evaluators may struggle to distinguish between genuinely creative responses and those generated algorithmically, blurring the line between creativity and automation.

141
Q

What is meant by the term search space, as used in relation to
problem solving? How do branching factor (b) and search
limit/depth (l) influence the size of a search space?

A

Search Space in Problem Solving:

Definition: The search space refers to the set of all possible states, configurations, or solutions that can be explored during a problem-solving process.
Influence of Branching Factor (b) and Search Depth (l):

Branching Factor (b):

Impact: Higher branching factor leads to a larger search space as there are more possible actions or choices at each state.
Example: In a chess game, b represents the number of possible moves at a given state.
Search Depth (l):

Impact: Increasing search depth expands the search space exponentially as it represents the maximum number of steps or levels in the search.
Example: In a maze, l represents the maximum number of steps a solver can take to reach a solution.
Overall: The size of the search space is influenced by both the branching factor and the search depth, making it crucial to manage these factors for efficient problem-solving.

142
Q

What is a heuristic? Detail a heuristic for just one of the
following:
(i) Chess
or else
(ii) Checkers/Draughts

A

Heuristic:

Definition: A heuristic is a problem-solving approach that employs practical and experience-based rules to guide decision-making, often providing quick, though not necessarily optimal, solutions.
Example (Chess Heuristic):

Piece Value Heuristic:
Idea: Assign values to chess pieces (e.g., pawns, knights, bishops) and use these values to evaluate board positions.
Implementation: Assign higher values to more powerful pieces (e.g., queen > rook > bishop > knight > pawn).
Evaluation: Evaluate a chess position by summing the values of the pieces for both players, providing an estimate of their relative strength.
Usage: Helps the player make tactical decisions by considering piece values when deciding which moves to make.
Note: This heuristic provides a simplified evaluation of board positions in chess, aiding players in making informed moves, but it doesn’t guarantee optimal gameplay.

143
Q

Briefly describe the A* search algorithm. You may use an
example to illustrate your answer.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).

144
Q

What problem does a local maxima/minima and a plateau
present to a heuristic search process?

A

Local Maxima/Minima and Plateau in Heuristic Search:

Local Maxima/Minima:

Issue: Heuristic search may get stuck at a local maximum or minimum, unable to find the global optimum.
Challenge: The algorithm may terminate prematurely, settling for a suboptimal solution.
Solution: Techniques like simulated annealing or genetic algorithms are used to escape local extrema.
Plateau:

Issue: Plateaus are flat regions in the search space with no improvement in the objective function.
Challenge: Heuristic algorithms may struggle to explore the plateau efficiently, leading to slow convergence.
Solution: Introduce randomness or perturbation to help the algorithm navigate plateaus and continue the search.

145
Q
A

C with a value of 7.

146
Q

Briefly outline how α-β (alpha-beta) pruning improves the performance of
MiniMax search.

A

Alpha-Beta Pruning in MiniMax:

Objective: Alpha-beta pruning improves the performance of the MiniMax algorithm by reducing the number of nodes evaluated in the search tree.

Pruning Rule:

Nodes are assigned alpha (α) and beta (β) values to keep track of the best achievable values for the maximizing and minimizing players, respectively.
Efficiency Improvement:

Pruning occurs when a player discovers that a branch will not lead to a better outcome than the current best-known outcome (α-beta bounds).
Unnecessary branches are pruned, reducing the search space and computation time.
Benefits:

Significant reduction in the number of evaluated nodes.
Faster convergence to the optimal solution.
Improved efficiency in solving adversarial search problems.

147
Q

Define the terms precision and recall. In what way do they help
measure the accuracy of information retrieval?

A

Precision and Recall:

Precision: Precision measures the accuracy of retrieved results by calculating the ratio of relevant documents retrieved to the total number of retrieved documents.

Recall: Recall measures the comprehensiveness of retrieved results by calculating the ratio of relevant documents retrieved to the total number of relevant documents in the collection.

Measurement of Accuracy in Information Retrieval:

Precision: Helps assess how many of the retrieved documents are truly relevant, minimizing false positives.

Recall: Helps assess how many of the relevant documents were successfully retrieved, minimizing false negatives.

Combined Assessment: Precision and recall are often used together to provide a balanced view of retrieval system performance, with metrics like F1-score or the precision-recall curve.
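
A minimal sketch computing both metrics (plus the F1-score) from raw counts; the counts used are illustrative:

```python
def precision(tp, fp):
    """Fraction of retrieved documents that are actually relevant."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of all relevant documents that were retrieved."""
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Illustrative counts, e.g. taken from a confusion matrix of a retrieval run.
tp, fp, fn = 12, 4, 3
print(round(precision(tp, fp), 3),   # 0.75
      round(recall(tp, fn), 3),      # 0.8
      round(f1(tp, fp, fn), 3))
```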

148
Q

Why does the PageRank algorithm require multiple iterations to
determine a quality score for documents on the web? What is
the relevance of the term iterative convergence?

A

Multiple Iterations in PageRank:

Complexity of the Web: The web is vast, interconnected, and dynamic, making a single-pass calculation impractical.

Recursive Nature: PageRank relies on the iterative concept that the quality of a page depends on the quality of pages linking to it.

Iterative Convergence:

Definition: Iterative convergence in PageRank refers to the process of refining the PageRank scores in multiple iterations until they stabilize.

Relevance:

PageRank scores converge to a stable state, indicating the relative importance of web pages.
Iterative convergence ensures accuracy by considering the entire web’s link structure and dependencies.
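
A minimal sketch of the iterative calculation on an invented link graph: the scores are recomputed from the previous iteration’s values until they stop changing (iterative convergence); the damping factor and link structure are assumptions for illustration:

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T)/C(T)) until the scores converge."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}                 # initial guess
    for _ in range(max_iter):
        new_pr = {}
        for page in pages:
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        if max(abs(new_pr[p] - pr[p]) for p in pages) < tol:   # scores have stabilised
            return new_pr
        pr = new_pr
    return pr

# Toy web: page -> pages it links out to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
print(pagerank(links))
```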

149
Q

What is Computational Creativity (CC)? What are the main
characteristics of CC?

A

Computational Creativity (CC):

Definition: Computational Creativity is a field of AI that focuses on developing computer systems capable of generating creative and novel outputs, often in the domains of art, music, writing, or problem-solving.
Main Characteristics of CC:

Creativity Emulation: CC aims to emulate human-like creative processes and outcomes using algorithms and AI.

Novelty and Originality: CC systems generate outputs that are not direct copies but exhibit novelty and originality.

Problem-Solving: CC can extend beyond artistic domains to tackle problems in various domains by producing innovative solutions.

Automation and Assistance: CC systems can automate creative tasks or assist human creators in generating creative content.

Subjectivity: Creativity is subjective, making the evaluation of CC outputs a challenging aspect of the field.

150
Q

Outline a method for identifying the analogy (counterpart)
between two collections of information.

A

Method for Identifying Analogy:

Step 1: Data Representation:

Represent both collections of information as structured data, such as vectors or graphs, where each element has identifiable features.
Step 2: Feature Extraction:

Extract relevant features from the data to characterize each element or item in both collections.
Step 3: Similarity Measurement:

Measure the similarity or dissimilarity between elements in the two collections using appropriate similarity metrics (e.g., cosine similarity, Jaccard index).
Step 4: Pairwise Comparison:

Perform pairwise comparisons between elements in one collection and elements in the other.
Step 5: Analogy Identification:

Identify analogies by identifying pairs of elements that exhibit similar relationships or patterns across collections.
Example: In text analysis, identifying analogies between two sets of documents by comparing their content and relationships, looking for similar themes or patterns.

151
Q

Describe in detail how you would apply the Generate and Test
strategy to solve one of the following problems. In your answer,
clearly distinguish between the generation and testing parts of
your proposed solution.
The 8-tile puzzle OR ELSE Path planning for a mobile robot

A

Applying Generate and Test Strategy to the 8-Tile Puzzle:

Generation:

Method: Generate all possible moves (up, down, left, right) from the current puzzle state.
Implementation: For each move, create a new puzzle state by swapping the empty tile with an adjacent tile.
Result: Obtain a set of potential successor states (child nodes) from the current state.
Testing:

Method: Test each generated state to check if it matches the goal state, representing a solved puzzle.
Implementation: Compare the current state with the goal state to determine if they are identical.
Result: If a generated state matches the goal state, consider it a valid solution; otherwise, continue generating and testing states.
Iteration:

Iterate: Repeatedly apply the generation and testing process for each generated state until a valid solution is found or the search space is exhausted.
Heuristic (Optional): You can incorporate a heuristic function to prioritize the order of state generation, improving search efficiency.
Conclusion:

The Generate and Test strategy involves iteratively generating potential solutions (puzzle states) and testing them against a predefined goal state until a valid solution is identified, effectively solving the 8-tile puzzle.

152
Q

What problems are associated with Greedy Best First Search
(GBFS)? Illustrate your answer with appropriate example(s).

A

Problems with Greedy Best First Search (GBFS):

Heuristic-Dependent: GBFS relies heavily on the quality of the heuristic function used. If the heuristic is not accurate, it may lead to suboptimal solutions.

Lack of Backtracking: GBFS does not backtrack; it always chooses the most promising path. This can lead to getting stuck in local minima/maxima.

Example: In a maze navigation, if the heuristic is distance to the goal, GBFS may choose a path that initially appears shorter but leads to a dead-end, while a longer but viable path exists.

Not Guaranteed Optimal: GBFS doesn’t guarantee finding the optimal solution as it’s often misled by the heuristic’s estimations.

Blind to Obstacles: GBFS can be blind to obstacles or complexities that are not considered by the heuristic, resulting in suboptimal routes.

153
Q

What is a heuristic? Describe in detail how you would implement
a heuristic function for one of the following problems. Clearly
describe how problem states are represented. Use an example to
illustrate your answer.
8-tile puzzle OR ELSE Checkers/Draughts

A

Heuristic:

Definition: A heuristic is a problem-solving approach that employs practical and experience-based rules to guide decision-making, often providing quick, though not necessarily optimal, solutions.
Heuristic for the 8-Tile Puzzle:

Problem Representation:

Represent the puzzle as a 3x3 grid with tiles numbered 1 to 8 and one empty tile.
Example: Initial state (numbers represent tiles, 0 represents an empty space):

Heuristic Function (Misplaced Tiles):

Idea: Count the number of tiles that are not in their goal positions (misplaced tiles).
Implementation: For each tile, compare its current position with the goal position.
Example: In the initial state, tiles 1, 2, 3, 4, 5, 6, 7, and 8 are misplaced, resulting in a heuristic value of 8.
Heuristic Evaluation:

The heuristic function returns a value indicating how far the current state is from the goal state.
It guides the search algorithm to prioritize states with fewer misplaced tiles.
Note: This heuristic provides a simplified evaluation of the 8-tile puzzle, aiding search algorithms in finding a solution efficiently but without guaranteeing optimality.
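
A minimal sketch of the misplaced-tiles heuristic; the start state below is a made-up example in which every numbered tile is out of place, so h = 8 (the blank, written as 0, is not counted):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 3x3 board stored row by row, blank last

def misplaced_tiles(state, goal=GOAL):
    """Count the numbered tiles that are not in their goal position (blank ignored)."""
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)

# Hypothetical initial state with every numbered tile misplaced.
start = (8, 7, 6, 5, 4, 3, 2, 1, 0)
print(misplaced_tiles(start))        # -> 8
```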

154
Q
A

B with a value of 4.

155
Q

Briefly describe how α-β (alpha-beta) pruning improves the
performance of MiniMax search. Using the search tree from part
(d) above, list the nodes that would be pruned by α-β (alpha-beta)
pruning assuming a left-to-right development of the tree.

A

Alpha-Beta Pruning in MiniMax:

Objective: Alpha-beta pruning improves the performance of the MiniMax algorithm by reducing the number of nodes evaluated in the search tree.

Pruning Rule:

Nodes are assigned alpha (α) and beta (β) values to keep track of the best achievable values for the maximizing and minimizing players, respectively.
Pruning Process (Left-to-Right Development):

Assuming left-to-right development of the search tree, nodes that would be pruned are those where:
Beta (β) becomes less than or equal to Alpha (α) for a minimizing player (cut-off).
Alpha (α) becomes greater than or equal to Beta (β) for a maximizing player (cut-off).
Example: In the search tree, if a node’s beta (β) value is less than or equal to its parent’s alpha (α) value (for a minimizing player), or if a node’s alpha (α) value is greater than or equal to its parent’s beta (β) value (for a maximizing player), it would be pruned.

156
Q

What is a confusion matrix? From the following confusion matrix,
calculate the Precision and Recall of an information retrieval
system. State your answer as an expression, you do Not need to
calculate the final numeric value. Briefly state what these values
tell us about this Information Retrieval system.
12 (true positive) 4 (false positive)
3 (false negative) 50 (true negative)

A

Confusion Matrix:

TP (True Positives): 12
FP (False Positives): 4
FN (False Negatives): 3
TN (True Negatives): 50
Precision Expression:

Precision = TP / (TP + FP) = 12 / (12 + 4)
Recall Expression:

Recall = TP / (TP + FN) = 12 / (12 + 3)
Implication:

Precision measures the accuracy of retrieved results, indicating that out of the retrieved results, how many are relevant.
Recall measures the comprehensiveness of retrieved results, showing how many relevant results were successfully retrieved.
In this information retrieval system, Precision and Recall values would provide insights into its trade-off between accuracy and completeness.

157
Q

Write down the PageRank formula explaining each of its terms.

A

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + … + PR(Tn)/C(Tn) )

The PageRank of a page is the chance that
a random surfer will land on that page.

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.

d: The damping factor, a constant usually set around 0.85. It represents the probability that the random surfer follows an outbound link from the current page - a way to model a user sticking to a chain of links.

(1-d): The complementary probability to the damping factor. It represents the probability that the user abandons the chain of links and jumps to a random page instead, which is why every page receives this small baseline share of PageRank.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

158
Q

What is the significance of the Random Walk of a graph to
calculating a retrieval score for a document? What is the
relevance of iterative convergence to the process of calculating
the PageRank of pages on the internet?

A

Significance of Random Walk of a Graph:

Relevance to Retrieval Score: Random Walk models are used to estimate the importance or relevance of nodes (documents) in a graph (network). They help calculate retrieval scores by simulating the behavior of an imaginary user navigating through links, indicating the potential visibility and relevance of documents.
Iterative Convergence in PageRank:

Importance: Iterative convergence is essential in the PageRank algorithm for determining the relative importance of web pages. It ensures that PageRank scores stabilize, indicating the final rank of pages based on their link structure.

Relevance: PageRank’s iterative convergence process is crucial for accurately reflecting the dynamics of the web, where pages continually link to each other. It guarantees that the PageRank scores are representative of the web’s actual structure and importance.

159
Q

Calculate the TF-IDF for the words “this” and “another” given the
following corpus. Show the details of your calculation. You may
assume no stop-word removal or stemming will be performed.
Document1: This is the first example.
Document2: This is the second example in the corpus.

A

Term Frequency (TF) Calculation:

TF(“this”) in Document1 = 1 (occurs once in Document1)
TF(“this”) in Document2 = 1 (occurs once in Document2)
TF(“another”) in Document1 = 0 (does not occur in Document1)
TF(“another”) in Document2 = 0 (does not occur in Document2)
Document Frequency (DF) Calculation:

DF(“this”) = 2 (occurs in both Document1 and Document2)
DF(“another”) = 0 (does not occur in any document)
Inverse Document Frequency (IDF) Calculation:

IDF(“this”) = log(2/2) = 0 (logarithm base 2)
IDF(“another”) = log(2/0) = undefined (since it does not occur in any document)
TF-IDF Calculation:

TF-IDF(“this”) in Document1 = TF(“this”) * IDF(“this”) = 1 * 0 = 0
TF-IDF(“this”) in Document2 = TF(“this”) * IDF(“this”) = 1 * 0 = 0
TF-IDF(“another”) in Document1 = TF(“another”) * IDF(“another”) = 0 * undefined = undefined
TF-IDF(“another”) in Document2 = TF(“another”) * IDF(“another”) = 0 * undefined = undefined
Note:

TF-IDF values for “this” are 0 in both documents because it appears in every document, so its IDF is 0 and it does not distinguish between documents.
TF-IDF values for “another” are undefined as calculated above, since the term does not occur in any document (the IDF calculation would divide by a document frequency of zero); in practice such terms are either ignored or handled with IDF smoothing.

160
Q

What is meant by the term search space as used in relation to
problem solving? How do branching factor (b) and search
depth/limit (l) influence the size of a search space?

A

Search Space in Problem Solving:

Definition: The search space refers to the entire set of possible states, configurations, or solutions that can be explored during a problem-solving process.
Influence of Branching Factor (b) and Search Depth/Limit (l) on Search Space:

Branching Factor (b):

Effect: A higher branching factor (b) increases the size of the search space because it implies more possible actions or choices at each state.
Example: In a game, if a player can make multiple moves in each turn, the branching factor is high, leading to a larger search space.
Search Depth/Limit (l):

Effect: The search depth or limit (l) defines how deep into the search space the algorithm explores. A deeper limit increases the portion of the search space considered.
Example: In a maze, setting a deeper search limit allows the algorithm to explore more paths, enlarging the search space.
Overall:

The search space’s size depends on the interaction between the branching factor (b) and the search depth or limit (l). A high branching factor and deep search depth can result in a significantly larger search space.

161
Q

What is a search heuristic? Describe, in detail, one specific
heuristic function for one of the following problems: Farmer-Fox-Goose problem, 8-tile puzzle, OXO, Chess or Draughts.

A

Search Heuristic:

Definition: A search heuristic is a rule or function that provides guidance or estimates the desirability of different paths or states in a search space, helping search algorithms make informed decisions.
Heuristic for the 8-Tile Puzzle:

Problem: Solve the 8-tile puzzle where a 3x3 grid contains numbered tiles and one empty space, aiming to move tiles to reach the goal state.

Heuristic Function (Manhattan Distance):

Idea: Measure the total “Manhattan distance” of each tile from its current position to its goal position.
Implementation:
For each tile, calculate the sum of the horizontal and vertical distances between its current and goal positions.
Sum these distances for all tiles to obtain the heuristic value.
Purpose: This heuristic guides the search algorithm to prioritize moves that reduce the total Manhattan distance, aiming to reach the goal state efficiently in the 8-tile puzzle.
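A small Python sketch of this heuristic (assuming the state is given as a 3x3 list of rows, with 0 standing for the blank space and the goal placing tiles 1-8 in row order):

# Goal position of each tile in a 3x3 grid (0 is the blank and is ignored).
GOAL = {tile: divmod(i, 3) for i, tile in enumerate([1, 2, 3, 4, 5, 6, 7, 8, 0])}

def manhattan(state):
    # Sum of horizontal + vertical distances of every tile from its goal cell.
    total = 0
    for r, row in enumerate(state):
        for c, tile in enumerate(row):
            if tile != 0:
                gr, gc = GOAL[tile]
                total += abs(r - gr) + abs(c - gc)
    return total

# One move away from the goal: the blank has been swapped with tile 8.
print(manhattan([[1, 2, 3],
                 [4, 5, 6],
                 [7, 0, 8]]))   # -> 1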

162
Q

Briefly describe the A* search algorithm. You may use an
example to illustrate your answer.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
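A compact Python sketch of these steps (assumptions: a 2-D grid given as strings, 4-way movement with unit step cost, '#' marks a wall, Manhattan distance as the heuristic h, and an invented maze for illustration):

import heapq

def astar(grid, start, goal):
    # Return a list of (row, col) cells from start to goal, or None if no path.
    def h(cell):                                    # heuristic: Manhattan distance
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]              # entries are (f, g, node)
    parent = {start: None}
    best_g = {start: 0}
    closed = set()

    while open_heap:
        f, g, node = heapq.heappop(open_heap)       # node with the lowest f = g + h
        if node == goal:                            # goal check: rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if node in closed:
            continue
        closed.add(node)                            # move node to the closed list
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):   # successors
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):   # keep only the lower score
                    best_g[(nr, nc)] = ng
                    parent[(nr, nc)] = node
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                                     # open list exhausted: no path

maze = ["....",
        ".##.",
        "...."]
print(astar(maze, (0, 0), (2, 3)))

Here the best_g check plays the role of the "update lists" step: a neighbour is only (re)queued when a cheaper path to it has been found.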

163
Q

Describe how you might use the A* algorithm to plan the
navigation path for a real (or a simulated) mobile robot.

A

Problem: Plan a path for a mobile robot to navigate from a start location to a goal location in a real or simulated environment.

Implementation:

State Representation: Represent the robot’s current position and orientation as a state.
Search Space: Create a grid or graph representing the environment, with nodes representing possible robot states (positions and orientations).
Cost Function: Define a cost function that assigns costs to moving from one state to another, considering factors like distance, obstacles, terrain, and turns.
Heuristic Function: Use a heuristic function (e.g., Euclidean distance or Manhattan distance) to estimate the remaining cost from a state to the goal state.
A* Algorithm: Apply the A* algorithm to explore the search space, prioritizing states with the lowest total cost f(n) = g(n) + h(n), where g(n) is the cost to reach the current state and h(n) is the estimated cost to reach the goal from the current state.
Open and Closed Lists: Maintain open and closed lists to keep track of explored and pending states.
Path Reconstruction: Once the goal state is reached, reconstruct the path from the start to the goal using parent pointers.
Execution: Implement the planned path for the robot’s navigation, considering real-world factors like sensors, obstacles, and dynamic environments.
Advantages:

A* efficiently finds optimal or near-optimal paths while considering both cost and heuristic.
It can handle complex environments with various obstacles and terrain.
Limitations:

A* requires accurate cost and heuristic functions for optimal results.
In dynamic environments, re-planning may be necessary to adapt to changing conditions.
Applications: A* is widely used in robotics for tasks like autonomous vehicle navigation, path planning for drones, and warehouse automation.

164
Q
A

D with a value of 7.

165
Q

Briefly describe how alpha-beta (α-β) pruning improves the performance of
MiniMax search. Make use of an example to illustrate your
answer.

A

Alpha-Beta Pruning in MiniMax:

Objective: Alpha-beta pruning is a technique that improves the performance of the MiniMax algorithm by reducing the number of nodes evaluated in the search tree.

Pruning Process:

It maintains two values, alpha (α) for the maximizing player and beta (β) for the minimizing player.
During the search, if a branch of the tree can be determined as less promising (for the maximizing player) or more promising (for the minimizing player) than the current best option, it can be pruned without further evaluation.
Illustration:

Consider a simplified example of a game tree where the maximum depth is limited to 2 levels:

        A (Max)
       /       \
   B (Min)    C (Min)
   /    \     /    \
  D      E   F      G

Suppose the subtree under B has been fully evaluated, giving Max a guaranteed value (alpha). When C is examined, if its first child F returns a value less than or equal to alpha, the minimizing player at C can already hold Max to something no better than B offers, so the remaining child G is pruned without evaluation: Max will never prefer C over B. Advantages:

Alpha-beta pruning reduces the number of nodes explored, significantly speeding up the MiniMax algorithm.
It doesn’t affect the final result; it only eliminates the evaluation of unnecessary branches.
Example Use Cases: Alpha-beta pruning is commonly applied in game-playing AI systems, such as chess and checkers, to efficiently search for the best move by minimizing unnecessary evaluations of potential game states.
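A minimal Python sketch of MiniMax with alpha-beta pruning (assuming the game tree is given as nested lists whose leaves are numeric scores; the two-level example tree is invented, and its leaf 8 ends up pruned):

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    # Return the MiniMax value of `node`, pruning branches that cannot matter.
    if isinstance(node, (int, float)):          # leaf: return its score
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # Min already has a better option elsewhere
                break                           # prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:                   # Max already has a better option elsewhere
                break
        return value

# Max root with two Min children whose leaves score 3, 6 and 2, 8.
tree = [[3, 6], [2, 8]]
print(alphabeta(tree, True))   # -> 3 (the leaf 8 is never evaluated)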

166
Q

What is a term-document index? Describe the structure of a term
document index such as might be used for the WWW. In
particular, discuss some of the information that you might like
this index to contain.

A

Term-Document Index:

Definition: A term-document index is a data structure used in information retrieval systems to map terms (words or phrases) to the documents (web pages, articles, etc.) in which they appear.

Structure of a Web Term-Document Index:

Inverted Index: Typically, web search engines use an inverted index structure.
Key Components:
Term Dictionary: A dictionary or hash table storing unique terms as keys.
Posting Lists: For each term, a list of documents (or web pages) where the term appears.
Document Metadata: Information about each document, such as its URL, title, and publication date.
Term Frequencies: Optionally, the frequency of each term in each document.
Term Weights (TF-IDF): Calculated values indicating the importance of terms in documents.
Information in the Index:

Term Occurrences: The index should contain information about where each term occurs in documents to facilitate efficient retrieval.
Document Metadata: Document-related data like URL, title, and publication date for presenting search results.
Document Ranking Metrics: Metrics like TF-IDF scores or PageRank values to rank documents based on relevance.
Term Frequency: Optionally, the number of times a term appears in a document, useful for ranking.
Usefulness:

A comprehensive term-document index is vital for efficient and accurate web search. It allows search engines to quickly identify relevant documents and rank them based on relevance, improving the user’s search experience on the web.
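A toy Python sketch of the core structure, an inverted index mapping each term to a posting list of document ids with term frequencies (document metadata such as URL and title would be stored alongside the ids; the two documents are invented):

from collections import defaultdict

documents = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog chased the cat",
}

# term -> {document id -> term frequency}
index = defaultdict(dict)
for doc_id, text in documents.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

print(index["cat"])   # posting list for "cat": {'doc1': 1, 'doc2': 1}
print(index["dog"])   # posting list for "dog": {'doc2': 1}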

167
Q

How would you go about the challenge of retrieving and indexing
the entire contents of the World Wide Web?

A

Retrieving and Indexing the Entire WWW:

Crawling: To retrieve web content, use web crawlers (spiders or bots) to systematically navigate the web, visiting web pages, and collecting their content.

Indexing:

Document Parsing: Parse the crawled web pages to extract text, links, and metadata (e.g., title, date).
Text Preprocessing: Process the text data, including tokenization, stemming, and removing stop words.
Inverted Index: Create an inverted index that maps terms to documents and stores document-related information.
PageRank: Calculate PageRank scores to assess the importance of web pages.
Storage: Store the indexed data in a distributed database or file system to handle the vast amount of information efficiently.

Deduplication: Detect and remove duplicate or near-duplicate content to ensure the index is concise.

Update Cycle: Regularly re-crawl and update the index to keep it current with the dynamic web.

Scaling: Use distributed computing and parallel processing to scale the system for the enormous amount of data on the web.

Robots.txt: Respect robots.txt files to follow website owners’ guidelines and avoid overloading servers.

Ethical Considerations: Adhere to ethical guidelines and respect privacy, ensuring compliance with legal requirements.

Challenges:

The challenges include dealing with dynamic content, handling diverse languages, avoiding spam, and addressing ethical concerns, among others.

168
Q

Write down the PageRank formula, explaining each of its terms.
Briefly comment on the relevance of iterative convergence to the
calculations of this formula.

A

The PageRank formula is:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )

The PageRank of a page is the chance that a random surfer will land on that page.

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.

d: The damping factor, a constant usually set to around 0.85. It represents the probability that the random surfer follows one of the outbound links on the current page, i.e. sticks to a chain of links.

(1-d): The complementary probability (around 0.15) that the surfer does not follow an outbound link and instead jumps to a random page. This term gives every page a baseline rank.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

Iterative convergence: because each PR value depends on the PR values of the pages that link to it, the formula cannot be solved in a single pass. All pages start from an initial rank (e.g. 1), and the formula is applied repeatedly until the ranks change by less than a small tolerance. The damping factor ensures that this iteration settles on a stable set of scores, which are then used as the final ranks.
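A small Python sketch of this iterative calculation on a toy link graph (the graph, damping factor and tolerance are arbitrary choices for illustration):

# Outbound links: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

d = 0.85                                   # damping factor
pr = {page: 1.0 for page in links}         # initial guess

for _ in range(100):                       # iterate until the scores stabilise
    new_pr = {}
    for page in links:
        incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
        new_pr[page] = (1 - d) + d * incoming
    if all(abs(new_pr[p] - pr[p]) < 1e-6 for p in pr):   # converged
        break
    pr = new_pr

print({p: round(v, 3) for p, v in pr.items()})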

169
Q

Briefly outline the K Nearest Neighbours (KNN) algorithm. How
might this be used to predict the value of a second-hand (i.e. pre-owned) car,
given a large database of recent second-hand car sales?

A

K Nearest Neighbors (KNN) Algorithm:

Algorithm Overview: KNN is a supervised machine learning algorithm used for classification and regression tasks. It works based on the principle that similar data points tend to have similar outcomes.

Prediction Process:

Select a value for K (the number of nearest neighbors to consider).
Calculate the distance between the target data point (used car features) and all data points in the database.
Identify the K data points with the smallest distances.
For regression (predicting numerical values), take the average of the target values of the K neighbors as the prediction.
Using KNN for Car Price Prediction:

Data Preparation: Collect a dataset of recent second-hand car sales containing features like make, model, year, mileage, condition, etc., and the sale price.
Feature Standardization: Normalize or standardize numerical features to ensure they have the same scale.
Choosing K: Experiment with different values of K to find the most suitable value for the problem (e.g., K=5).
Prediction: Given the features of a specific used car, calculate the distance to all cars in the dataset.
K Neighbors: Identify the 5 nearest cars in the dataset based on feature similarity.
Price Prediction: Take the average sale price of these 5 neighbors as the predicted value for the target car’s price.
Evaluation: Evaluate the model’s performance using metrics like mean squared error or root mean squared error.
Advantages:

KNN is simple to implement and doesn’t require model training.
It can provide reasonable predictions when similar instances have similar outcomes.
Limitations:

Performance may degrade with high-dimensional data.
Selection of the appropriate value for K is crucial.
KNN can be sensitive to outliers and noise in the dataset.
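A minimal Python sketch of the regression step described above (the feature vectors are invented and assumed to be already standardised; real features would include make, model, year, mileage and so on, encoded numerically):

import math

# Toy features: [age in years, mileage in 10,000 km], with sale price.
sales = [
    ([2, 3], 18000), ([3, 5], 15500), ([8, 12], 7000),
    ([5, 6], 12000), ([10, 15], 5000), ([1, 1], 21000),
]

def knn_price(features, k=3):
    # Average the prices of the k nearest cars (Euclidean distance).
    dists = sorted((math.dist(features, f), price) for f, price in sales)
    nearest = dists[:k]
    return sum(price for _, price in nearest) / k

print(knn_price([4, 5]))   # estimate for a 4-year-old car with ~50,000 km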

170
Q

How does heuristic search improve upon blind search as a
means of solving complex problems? Make use of an example
to illustrate your answer.

A

Heuristic Search vs. Blind Search:

Definition:

Blind Search: Blind search methods explore the search space without considering problem-specific information.
Heuristic Search: Heuristic search methods use problem-specific knowledge (heuristics) to guide the search towards more promising solutions.
Improvements of Heuristic Search:

Efficiency: Heuristic search is often more efficient as it reduces the search space and focuses on promising regions.

Example - Traveling Salesman Problem (TSP):

Blind Search: In the TSP, a blind search like brute force would explore all possible permutations of cities, resulting in factorial complexity (n! permutations for n cities).
Heuristic Search: Using heuristics like the nearest neighbor algorithm, the search starts from a city and selects the closest unvisited city at each step. This reduces the search space dramatically, providing a good approximation to the optimal solution with much less computation.
Completeness: Blind searches guarantee finding a solution (if one exists), but they might take an impractical amount of time. Heuristic searches might not guarantee optimality but provide solutions faster in practice.

Resource Efficiency: Blind searches may consume significant memory and computational resources, especially in large search spaces. Heuristic searches are more resource-efficient due to the reduction in explored states.

Example - Chess:

Blind Search: In chess, an exhaustive blind search would explore all possible moves and counter-moves to find the optimal move, which is infeasible.
Heuristic Search: Chess engines use heuristics like piece values, board control, and positional evaluation to guide the search towards promising moves, allowing them to make strong moves efficiently.
Summary:

Heuristic search improves problem-solving by focusing on relevant portions of the search space, making it more practical for solving complex problems compared to blind searches.

171
Q

How can heuristic search help solve problems such as the 15-
tile puzzle? Briefly outline the heuristic function you would use in
solving the 15-tile puzzle.

A

Heuristic Search for 15-Tile Puzzle:

Problem: The 15-tile puzzle involves arranging 15 numbered tiles in a 4x4 grid, leaving one empty space, to achieve a target configuration.

Heuristic Function: A commonly used heuristic is the Manhattan distance heuristic.

It calculates the sum of the Manhattan distances of each tile from its current position to its goal position.
The Manhattan distance is the sum of the horizontal and vertical distances between two positions.
Advantages:

The Manhattan distance heuristic provides an estimate of the minimum number of moves required to reach the goal.
It guides the search towards states that are closer to the solution, reducing the search space.
Example: Consider the initial state and goal state for a 15-tile puzzle:

Initial State:        Goal State:
 1  2  3  4            1  2  3  4
 5  6  7  8            5  6  7  8
 9 10 11 12            9 10 11 12
13 14 15  _           13 15 14  _

Using the Manhattan distance heuristic, only tiles 14 and 15 are out of place, each one square away from its goal position, so this state has a heuristic value of 2. The heuristic guides the search towards states that reduce this distance.

172
Q

Consider a mobile robot navigating to rooms and corridors of a
single-story building. What kinds of problem is the repeated
states anomaly likely to produce, when searching for an optimal
path to the exit of the building? How might A* maintain an
optimal search space when seeking optimal solutions to this
problem?

A

Repeated States Anomaly in Mobile Robot Navigation:

Problem: The repeated states anomaly occurs when the same state is revisited during a search, leading to inefficient exploration and potentially infinite loops.

Issues it May Produce:

In mobile robot navigation, the anomaly can lead to the robot revisiting rooms or corridors unnecessarily, wasting time and energy.
It may prevent the robot from finding an optimal path to the exit, as it keeps revisiting states instead of exploring new, potentially more promising paths.
A* Algorithm for Maintaining an Optimal Search Space:

A* combines uniform-cost search with heuristic-based information to maintain an optimal search space efficiently.

Admissibility: A* ensures that it explores nodes in order of increasing cost, guaranteeing optimality when using an admissible heuristic.

Heuristic Function: A* uses a heuristic function to estimate the cost from the current state to the goal. The heuristic is admissible if it never overestimates the true cost.

Maintaining Optimality:

By keeping a closed list of already-expanded states and using an admissible (ideally consistent) heuristic, A* avoids re-expanding states unnecessarily.
It prioritizes exploring nodes that are likely to lead to optimal solutions, reducing the chances of falling into the repeated states anomaly.
A* efficiently maintains an optimal search space by selecting nodes with the lowest estimated cost to the goal while considering the actual cost from the start.
Conclusion: A* with an admissible heuristic helps mobile robots navigate single-story buildings optimally by avoiding the repeated states anomaly and efficiently finding the best path to the exit.

173
Q
A

D with a value of 3.

174
Q

Describe how (alpha-beta) pruning improves upon MiniMax
search? You may use an example to illustrate your answer.

A

Alpha-Beta Pruning in MiniMax:

Alpha-Beta Pruning: Alpha-beta pruning is a technique used to optimize the MiniMax algorithm by reducing the number of nodes evaluated in the search tree.

Improvement Mechanism:

It maintains two values, alpha (the best value found so far for the maximizing player) and beta (the best value found so far for the minimizing player).
During the search, when it is discovered that a branch (subtree) can be pruned because it cannot lead to a better outcome than what has already been found, the algorithm prunes that branch.
This happens when alpha (for the maximizing player) is greater than or equal to beta (for the minimizing player) for a given node, indicating that the opponent has a better move elsewhere.
Illustration:

Consider a simple game tree with MiniMax and alpha-beta pruning:

         Max
        /    \
    A (Min)  B (Min)
    /   \     /   \
   3     6   2     8

In this example, node A is evaluated first: the minimizing player at A chooses min(3, 6) = 3, so the maximizing player at the root is already guaranteed a value of 3 (alpha = 3). When B is examined, its first leaf returns 2. Since the minimizing player at B can already hold the outcome to 2 or less, which is worse than the 3 the root already has at A, the remaining leaf (8) is pruned: the root will never choose B. Advantages:

Alpha-beta pruning significantly reduces the number of nodes examined during the search, leading to faster and more efficient computations.
It does not affect the correctness of the MiniMax algorithm and guarantees the same optimal result when used with appropriate heuristics.
Summary: Alpha-beta pruning improves MiniMax search by eliminating unnecessary node evaluations, making it more efficient while preserving the optimal outcome.

175
Q

What is meant by the term “linear decision boundary”? In what
way can we see simple neurons as learning the decision
boundary for simple functions such as AND and OR. What
problem does the XOR function present for a linear decision
boundary?

A

Linear Decision Boundary:

A “linear decision boundary” is a boundary that separates data points of different classes in a dataset in a linear (straight-line) fashion in a feature space.
It means that you can separate the data with a single straight line or hyperplane.
Neurons and Learning Decision Boundaries:

Simple artificial neurons can learn decision boundaries for functions like AND and OR by adjusting their weights and biases.
For example, in the case of the AND function, a neuron can learn to assign higher weights to inputs that are required to be true, and lower weights to inputs that are not needed.
XOR Function and Linear Decision Boundary:

The XOR function presents a problem for linear decision boundaries.
It is not possible to separate XOR inputs (0,0), (0,1), (1,0), and (1,1) using a single straight line. XOR’s decision boundary is non-linear and requires more complex models, such as multi-layer neural networks, to capture it.
Conclusion:

Simple neurons can learn linear decision boundaries for functions like AND and OR, but they are inadequate for capturing non-linear decision boundaries like the one required for XOR. More complex models are needed for problems with non-linear separations.

176
Q

Write down the PageRank formula explaining each of its terms

A

The PageRank formula is:

PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )

The PageRank of a page is the chance that a random surfer will land on that page.

PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.

d: The damping factor, a constant usually set to around 0.85. It represents the probability that the random surfer follows one of the outbound links on the current page, i.e. sticks to a chain of links.

(1-d): The complementary probability (around 0.15) that the surfer does not follow an outbound link and instead jumps to a random page. This term gives every page a baseline rank.

PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.

C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.

The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.

177
Q

What is WordNet and what kind of information does it contain?
How can it be used to estimate the semantic similarity between
word pairs? Use examples to illustrate your answer.

A

WordNet Overview:

WordNet is a large lexical database of English.
Organizes words into sets of synonyms called synsets.
Provides short, general definitions and usage examples.
Captures word relationships like antonyms, hyponyms, and meronyms.
Information Contained:

Synonyms: Groups words with similar meanings.
Definitions: Offers concise explanations of word meanings.
Relationships: Includes hierarchical (e.g., hyponyms/hypernyms) and lexical relations (e.g., antonyms).
Part-of-speech: Categorizes words into nouns, verbs, adjectives, etc.
Estimating Semantic Similarity:

Methodology:
Utilizes the structure of WordNet, particularly the connections between synsets.
Employs algorithms to measure the ‘distance’ between words in the network.
Path-based Approach:
Example: ‘Dog’ and ‘Canine’.
Measures the shortest path in the WordNet graph between synsets.
Shorter paths imply higher similarity.
Information-content Approach:
Example: ‘Dog’ and ‘Animal’.
Considers how informative a common ancestor is in the WordNet hierarchy.
More abstract common ancestors indicate lower similarity.
Feature-based Approach:
Example: ‘Dog’ and ‘Wolf’.
Compares features or properties inferred from WordNet definitions and relationships.
Similar features suggest higher similarity.
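For example, using NLTK's WordNet interface (assuming nltk and its WordNet corpus are installed locally):

from nltk.corpus import wordnet as wn

dog = wn.synset("dog.n.01")
cat = wn.synset("cat.n.01")

print(dog.path_similarity(cat))   # path-based: shorter path -> higher score
print(dog.wup_similarity(cat))    # Wu-Palmer: based on the depth of the LCS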

178
Q

What is a knowledge graph? Describe one process for creating
knowledge graphs as a means of representing natural language
documents.

A

Knowledge Graph Overview:

A knowledge graph is a structured way to represent relationships between entities (like people, places, things) in a network format.
Consists of nodes (entities) and edges (relationships) with attributes for each.
Facilitates understanding complex relationships and discovering insights from large datasets.
Creating Knowledge Graphs from Natural Language Documents:

Step 1: Entity Recognition:
Identify and categorize key entities in the text (e.g., names, places, organizations).
Use Natural Language Processing (NLP) tools for this task.
Step 2: Relationship Extraction:
Detect and define relationships between identified entities.
Apply NLP techniques like dependency parsing or pattern matching.
Step 3: Graph Construction:
Represent entities as nodes and relationships as edges in the graph.
Assign attributes to nodes and edges based on document information.
Step 4: Integration and Normalization:
Merge data from different sources, resolving duplicates and inconsistencies.
Normalize entities and relationships for uniformity (e.g., different names for the same entity).
Step 5: Continuous Updating:
Regularly update the graph with new data to maintain relevance and accuracy.
Implement machine learning algorithms for automated updates and refinement.
Example Process: Semantic Analysis:

Extract semantic structures from text (like subject-action-object triples).
Use these structures to create a network where subjects and objects are nodes, and actions are edges.
This can reveal patterns and relationships not immediately apparent in the raw text.
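A toy Python sketch of steps 1-3, assuming an earlier NLP stage (e.g. a dependency parser) has already produced subject-relation-object triples; the triples below are invented:

# Triples extracted from text, e.g. "Marie Curie discovered polonium."
triples = [
    ("Marie Curie", "discovered", "polonium"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("polonium", "is_a", "chemical element"),
]

# Graph as an adjacency map: node -> list of (relation, neighbour) edges.
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))
    graph.setdefault(obj, [])          # make sure every entity is a node

for node, edges in graph.items():
    for rel, neighbour in edges:
        print(f"{node} --{rel}--> {neighbour}")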

179
Q

What are the main differences between lazy and eager learning?

A

Lazy Learning:

Data Usage: Utilizes data at the time of prediction.
Model Training: Minimal or no training phase; algorithms make generalizations about the data at query time.
Memory Requirement: High, as it needs to store all data.
Computation Time: Slower during query time due to real-time data processing.
Example Algorithms: k-Nearest Neighbors (k-NN), Case-Based Reasoning.
Adaptability: Easily adaptable to new data as there’s no explicit training phase.
Suitability: Effective for problems with smaller datasets and where the data distribution is unknown.
Eager Learning:

Data Usage: Uses data during the training phase to build a model.
Model Training: Involves a comprehensive training phase to create a final model.
Memory Requirement: Lower, as it discards training data post-model construction.
Computation Time: Faster during query time but requires initial time investment for training.
Example Algorithms: Decision Trees, Neural Networks, Support Vector Machines.
Adaptability: Less flexible to new data; often requires retraining for significant updates.
Suitability: Better for large datasets and scenarios where a model’s predictive accuracy is paramount.

180
Q

What is K Nearest Neighbors (KNN)? How can KNN be applied
to approximate the value of a complex function? What is the
relevance of the “curse of dimensionality” to the use of the KNN?

A

K Nearest Neighbors (KNN) Overview:

A simple, instance-based learning algorithm.
Classifies a new sample based on the majority class of its ‘k’ nearest neighbors in the training set.
Works for both classification (discrete output) and regression (continuous output).
Approximating Complex Functions with KNN:

Step 1: Select ‘k’ Value:
Choose the number of nearest neighbors (‘k’) to consider.
Step 2: Distance Measurement:
Measure the distance (e.g., Euclidean) between the new sample and all points in the training set.
Step 3: Identify Nearest Neighbors:
Select the ‘k’ closest points in the training set to the new sample.
Step 4: Aggregate Neighbors’ Values:
For regression, average the values of these ‘k’ neighbors.
For classification, take the majority class among the ‘k’ neighbors.
Step 5: Prediction:
The aggregated value approximates the output for the new sample.
Relevance of “Curse of Dimensionality”:

Increased Complexity: As the number of dimensions (features) increases, the volume of the space increases, making the data sparse.
Distance Distortion: In high-dimensional spaces, the concept of ‘nearest neighbors’ becomes less meaningful, as all points are far from each other.
Overfitting Risk: Higher dimensions can lead to overfitting, where the model captures noise instead of the underlying pattern.
Computational Demand: Computationally more expensive to calculate distances in higher dimensions.
Solution: Dimensionality reduction techniques are often applied before using KNN in high-dimensional spaces.

181
Q

How does distance weighted KNN improve upon the basic KNN
approach? Use a diagram to help illustrate your answer.

A

Distance Weighted KNN - Improvements Over Basic KNN:

Weighted Voting: Unlike basic KNN where each of the ‘k’ neighbors has equal weight, distance weighted KNN assigns more weight to nearer neighbors.
Enhanced Accuracy: Closer points are likely more similar to the query point, thus providing more relevant information for prediction.
Reduction in Influence of Outliers: Distant points (which could be outliers) have less influence on the outcome.
Adaptive Boundaries: Creates more adaptive decision boundaries, especially in areas where classes overlap.
Weight Calculation: Weights can be inversely proportional to distance, emphasizing closer points more significantly.

182
Q

What is Word2Vec? Briefly outline one process by which
Word2Vec creates vector representations and what those
vectors represent.

A

Word2Vec Overview:

A popular model for word embeddings in natural language processing.
Represents words in a high-dimensional vector space.
Captures semantic and syntactic word relationships.
Word2Vec Vector Creation Process:

Training Algorithm Choice:
Choose between Continuous Bag of Words (CBOW) or Skip-Gram model.
CBOW Model:
Predicts a target word based on context words.
Suitable for smaller datasets and capturing more frequent words.
Skip-Gram Model:
Predicts context words from a target word.
Effective for larger datasets and capturing rare words or phrases.
Neural Network Training:
Use a shallow neural network.
Train the model on text data; adjust weights through backpropagation.
Vector Extraction:
After training, extract word vectors from the neural network’s hidden layer.
What Vectors Represent:

Each word is represented as a dense vector in a multi-dimensional space.
Similar words have similar vectors, showing up closer in the vector space.
Vectors encode various linguistic patterns and word associations.

183
Q

How can we usefully combine the vectors created by
Word2Vec? Make use of an example to illustrate your answer.

A

Combining Word2Vec Vectors:

Vector Arithmetic: Perform mathematical operations like addition, subtraction on word vectors.
Semantic Relationships: Capture semantic relationships and analogies.
Average of Vectors: For phrases or sentences, average the vectors of individual words.
Example: Vector Arithmetic for Analogies:

Problem: “King” is to “Man” as “Queen” is to what?
Vectors: Use Word2Vec vectors for ‘King’, ‘Man’, ‘Queen’.
Operation: Compute ‘Queen’ - ‘King’ + ‘Man’.
Result: The resulting vector is closest to the vector for ‘Woman’.
Interpretation: This demonstrates how relationships (e.g., gender roles) can be captured and manipulated in the vector space.
Example: Averaging for Sentence Meaning:

Sentence: “The quick brown fox”.
Vectors: Retrieve Word2Vec vectors for ‘The’, ‘quick’, ‘brown’, ‘fox’.
Operation: Calculate the average of these vectors.
Result: The average vector represents the overall semantic meaning of the sentence.
Usage: Useful in document classification, sentiment analysis, etc., where sentence-level representation is needed.
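A sketch of both operations in plain NumPy. The tiny 3-dimensional vectors are invented purely to show the arithmetic; real Word2Vec embeddings are learned from a corpus and have hundreds of dimensions:

import numpy as np

# Hypothetical miniature embeddings, just to illustrate the arithmetic.
vectors = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "man":   np.array([0.6, 0.1, 0.1]),
    "woman": np.array([0.7, 0.2, 0.8]),
    "queen": np.array([0.9, 0.8, 0.8]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Analogy: queen - king + man should land near "woman".
target = vectors["queen"] - vectors["king"] + vectors["man"]
best = max(("man", "woman"), key=lambda w: cosine(target, vectors[w]))
print(best)

# Sentence (or phrase) meaning as the average of its word vectors.
phrase = np.mean([vectors[w] for w in ("king", "queen")], axis=0)
print(phrase)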

184
Q

What is a heuristic? How does heuristic search improve upon
ordinary search? Use an appropriate example to illustrate your
answer.

A

Heuristic Definition:

A heuristic is a problem-solving approach that uses practical methods or shortcuts to produce solutions that are not guaranteed to be optimal, but are sufficient for reaching immediate goals.
Involves using experience-based techniques to enhance problem-solving efficiency.
Heuristic Search vs. Ordinary Search:

Ordinary Search:
Explores all possibilities equally without any prioritization.
Can be exhaustive and time-consuming.
Heuristic Search:
Employs heuristics to prioritize exploration of the most promising paths.
Reduces search time by avoiding less promising paths.
Balances between exploration and exploitation.
Example: Pathfinding in Navigation:

Problem: Finding the shortest path from point A to B in a city.
Ordinary Search:
Might evaluate every possible route equally.
Time-consuming and inefficient, especially in large, complex maps.
Heuristic Search (e.g., A* Algorithm):
Uses heuristics like distance to the destination (straight-line distance) to prioritize which routes to explore.
Focuses on routes that appear to lead more directly to the destination, reducing the total number of routes evaluated.
More efficient, reaching the solution faster than the ordinary search.

185
Q

Detail a heuristic for one of the following: Chess or else
Checkers/Draughts. Describe the different qualities you would
look for and how you would identify them. You may use a
diagram to illustrate your answer.

A

Heuristic for Chess:

Material Count:
Assign a value to each piece (e.g., Pawn = 1, Knight/Bishop = 3, Rook = 5, Queen = 9).
Calculate the total material value for each player.
Generally, more material equates to a stronger position.
Board Control:
Evaluate the number of squares controlled by each piece.
Control of the center is particularly important.
Piece Mobility:
Assess how freely pieces can move.
More mobility typically indicates a stronger position.
King Safety:
Evaluate the safety of the king (e.g., is it castled? Are there pawns shielding it?).
A vulnerable king can be a critical weakness.
Pawn Structure:
Look for pawn weaknesses like isolated, doubled, or backward pawns.
Strong pawn formations can provide significant strategic advantages.
Identifying Qualities:

Use Algorithms: Implement algorithms to evaluate these factors during gameplay.
Scoring System: Develop a scoring system for positions based on these criteria.
Dynamic Evaluation: Continuously assess the board for changes in these qualities.
Illustrating Heuristic with a Diagram:

A chessboard setup highlighting these factors: material count, board control, piece mobility, king safety, and pawn structure.

186
Q

In general terms, describe how the A* algorithm ensures that it
only ever considers optimal paths and solutions. Use an
appropriate example to illustrate your answer.

A

A* Algorithm Overview:

A* is a pathfinding and graph traversal algorithm.
Combines features of Dijkstra’s Algorithm (uniform cost search) and Greedy Best-First-Search.
Optimizes for the shortest path while reducing the explored search space.
Working of the A* Algorithm:

Heuristic Function (h):
Estimates the cost from a node to the goal.
Should be admissible (never overestimates the true cost).
Cost Function (g):
Represents the exact cost from the start node to the current node.
F-Score (f = g + h):
Sum of g and h, determining the node to explore next.
Balances between path cost and heuristic estimate.
Priority Queue:
Nodes are stored in a priority queue based on their f-score.
Ensures that the node with the lowest f-score is explored next.
Ensuring Optimal Paths:

Admissible Heuristic: Prevents overlooking the shortest path.
Complete Search: Explores all viable paths within its search space.
Best Path First: Continuously chooses the path that appears to be leading most directly to the goal.
Example: Pathfinding in a Grid:

Problem: Find the shortest path from point A to B in a grid with obstacles.
Implementation:
Use A* where each grid cell is a node.
Heuristic h could be the Euclidean distance to the goal.
g is the cost from the start to the current cell.
f-score guides the search, prioritizing cells closer to the goal and with lower overall path cost.
Result: A* efficiently finds the optimal path by balancing between the shortest distance and the least cost path, avoiding unnecessary exploration.

187
Q
A

D with a value of 2.

188
Q

Consider the task of searching for the shortest path between two
nodes in a graph. How large is the search space, assuming a
branching factor of B (eg 12) and a search depth of D (eg 6).
Briefly explain why bi-directional search is more efficient than
unidirectional search.

A

Search Space Size:

With a branching factor B (e.g., 12) and a search depth D (e.g., 6), the size of the search space is of the order B^D. For B = 12 and D = 6, that is 12^6 = 2,985,984.

Calculation of Search Space Size:

At each level of depth, every node branches into B new nodes, so level k contains B^k nodes.
The total number of nodes is the sum of the nodes at each level (1 + B + B^2 + ... + B^D), which is dominated by the deepest level, B^D.
Bi-directional Search Efficiency:

Two Simultaneous Searches: Conducts two searches concurrently, one from the start node and one from the goal node.
Reduced Search Depth: Each search only needs to go half the distance, effectively halving the search depth.
Intersection Point: The searches meet in the middle, which usually occurs before exploring the full depth.
Search Space Reduction: Drastically reduces the total number of nodes explored compared to unidirectional search.
Example:

In a graph, bi-directional search would initiate two searches, one from each end, and these searches are more likely to intersect quickly than a single search reaching all the way from start to goal. This approach is particularly beneficial in large or complex graphs.
With B = 12 and D = 6, the unidirectional search space is 12^6 = 2,985,984 nodes. This demonstrates the exponential growth of the search space with increasing depth and branching factor, and underscores the efficiency of bi-directional search: two searches of depth 3 explore only on the order of 2 × 12^3 = 3,456 nodes before meeting in the middle.
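A quick check of these figures in Python, under the simplifying assumption that each half of the bi-directional search expands about B^(D/2) nodes:

B, D = 12, 6

unidirectional = B ** D            # 12**6 = 2,985,984 nodes at the deepest level
bidirectional = 2 * B ** (D // 2)  # two searches of half the depth: 2 * 12**3 = 3,456

print(unidirectional, bidirectional)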

189
Q

Consider a single neuron such as a perceptron. How can this
learn simple functions such as AND and OR? Briefly explain why
a single perceptron cannot learn the XOR function.

A

Learning AND and OR with a Single Perceptron:

Structure: A perceptron has inputs, weights, a bias, and an activation function.
AND Function:
Inputs: Two binary values (0 or 1).
Weights and Bias: Set such that the perceptron activates (outputs 1) only when both inputs are 1.
Example: Weights = [1, 1], Bias = -1.5 (only activates if the sum of inputs is greater than 1.5).
OR Function:
Similar setup, but with different weights and bias.
Example: Weights = [1, 1], Bias = -0.5 (activates if at least one input is 1).
Inability to Learn XOR with a Single Perceptron:

XOR Function: Outputs 1 if the inputs are different, 0 if they are the same.
Linear Separability: AND and OR are linearly separable - can be separated with a single line.
XOR’s Non-Linearity: XOR is not linearly separable; it cannot be represented by a straight line in 2D space.
Perceptron’s Limitation: A single perceptron can only learn linearly separable functions. It cannot model the XOR function, which requires a non-linear solution.
Plotting the AND, OR and XOR truth tables in a 2D input space shows that a single straight line can separate the 1-outputs from the 0-outputs for AND and OR, but no single line can do so for XOR.
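A tiny Python sketch using the example weights and biases given above, with a step activation function:

def perceptron(inputs, weights, bias):
    # Fires (returns 1) when the weighted sum plus bias is positive.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = perceptron((x1, x2), (1, 1), -1.5)   # AND: needs both inputs on
        or_out = perceptron((x1, x2), (1, 1), -0.5)    # OR: needs at least one input on
        print(x1, x2, "AND:", and_out, "OR:", or_out)

No single choice of two weights and a bias reproduces the XOR column, which is the linear-separability limitation described above.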

190
Q

How does Deep Learning improve upon the learning capable of
a single neuron? For a deep neural network trained to recognize
faces, what differences might we expect to see in the early
layers (close to the input image), as compared to the later
layers? You may use a diagram to illustrate your answer.

A

Improvements of Deep Learning Over Single Neuron:

Complex Function Modeling: Deep networks can model complex, non-linear functions, unlike single neurons.
Layered Structure: Multiple layers enable hierarchical feature extraction.
Feature Abstraction: Early layers capture basic features, while deeper layers combine them into more abstract representations.
Adaptability: More flexible and capable of learning a wide variety of patterns.
Differences in Layers in a Face Recognition Network:

Early Layers (Close to Input):
Detect simple, low-level features like edges, corners, and colors.
Similar to basic image processing filters.
Intermediate Layers:
Begin to combine basic features into more complex ones like textures and patterns.
Start to recognize parts of faces, like eyes, noses, or mouths.
Later Layers:
Capture high-level features and complex representations.
Recognize entire faces, facial expressions, or unique identifiers.
Perform more abstract tasks like identifying individuals or expressions.
Diagram Illustration:

A diagram would show a deep neural network for face recognition, with arrows tracing the progression from simple feature detection (edges, colours) in the early layers to complex, abstract feature recognition (face parts, whole identities) in the later layers.

191
Q

Describe tf-idf explaining each of its components.
What limitations arise from using tf-idf as the only basis for
ranking documents?

A

TF-IDF (Term Frequency-Inverse Document Frequency) has two components:

Term Frequency (TF): Measures how often a term appears in a document.
Inverse Document Frequency (IDF): Measures the uniqueness of a term across all documents.
Limitations of using TF-IDF alone:

No semantic understanding.
All terms are treated equally.
Ignores word order and structure.
Sensitive to document length.
Handling of stop words.
Scalability challenges.
Limited document context understanding.
More advanced techniques are needed to address these limitations effectively.

192
Q

What is WordNet and what kind of information does it represent?
Using a suitable example, show how it is used to represent the
meaning of words. Briefly describe how it can be used to
quantify the similarity between two words.

A

WordNet is a lexical database for the English language. Here are the key points about it:

Nature of WordNet: It’s a large, English-language database that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms (synsets), each expressing a distinct concept.

Synsets: These are the fundamental units of WordNet, representing a unique concept. Synonyms are grouped into synsets with a short definition and examples.

Word Relations: WordNet links words into semantic relations including synonyms, antonyms, hypernyms (more general), hyponyms (more specific), meronyms (parts), and holonyms (wholes).

Example Usage:

Word: “Bank”
Synsets: It might be part of multiple synsets representing different meanings like ‘financial institution’ and ‘riverbank’.
Each synset includes a definition and examples, showing how the word “bank” is used in different contexts.
Quantifying Similarity:

Path-Based Similarity: Measures the distance between two synsets in the WordNet graph. Shorter paths imply greater similarity.
Information Content Similarity: Uses the information content of the Least Common Subsumer (LCS) - the most specific ancestor node common to both synsets. Higher information content of LCS indicates greater similarity.
Applications: Used in natural language processing tasks to find semantic similarities between words, aiding in text analysis, AI, and machine learning.
WordNet’s structured approach allows it to represent intricate relationships between words, enhancing understanding of word meanings and their interconnections.

193
Q

Describe how the nearest neighbours algorithm uses past
solutions to solve new problems?

A

The Nearest Neighbors algorithm, often used in machine learning, solves new problems by referencing past solutions. Here are the key points:

Based on Past Data: It uses a database of past cases or solutions.

Feature Space Representation: Each case is represented in a feature space based on its characteristics.

New Problem Representation: A new problem is also represented in the same feature space.

Distance Metric: The algorithm calculates the distance between the new problem and each of the past cases in the feature space. Common metrics include Euclidean, Manhattan, or Hamming distances.

Identifying Nearest Neighbors: It identifies the ‘k’ closest past cases to the new problem, where ‘k’ is a predefined number.

Decision Making:

For classification tasks, the most common class among the nearest neighbors is assigned to the new problem.
For regression tasks, an average or weighted average of the neighbors’ values is calculated.
Adaptability: The algorithm can adapt to new data over time as the database of past solutions grows.

Dependence on Relevant Data: Its effectiveness depends on having a comprehensive and relevant set of past data.

Simple yet Effective: The method is straightforward but can be very effective in scenarios where similar past cases are good predictors of new cases.

194
Q

Outline the use of KNN to estimate the sale value of a 2nd hand
(pre-owned) car. You may assume you have access to a
database of recent car sales. List and briefly describe the
attributes you would use to identify the most similar car(s) from a
database of recent sales?

A

Using KNN (K-Nearest Neighbors) to estimate the sale value of a pre-owned car involves comparing the car in question to similar cars in a database of recent sales. Here’s an outline of the process with relevant attributes:

Database of Car Sales: Assume access to a comprehensive database of recent car sales, including various attributes of each car.

Select Relevant Attributes: Key attributes might include:

Make and Model: Identifies the brand and specific model, which heavily influences value.
Year of Manufacture: Older cars typically have lower value.
Mileage: Higher mileage often correlates with lower value.
Engine Size: Can affect performance and value.
Fuel Type: Diesel, petrol, electric, or hybrid, which might influence value based on current trends and efficiency.
Condition: Overall state of the car (e.g., excellent, good, fair).
Previous Owners: Number of previous owners might impact perceived value.
Service History: Regular maintenance can preserve value.
Color: Some colors might be more desirable than others.
Extras: Features like leather seats, navigation systems, etc., can add value.
Feature Space Representation: Each car in the database is represented in a feature space based on these attributes.

Distance Calculation: For the car being valued, calculate the distance to each car in the database within this feature space.

Selecting Nearest Neighbors: Choose a number ‘k’ and identify the ‘k’ closest cars in the database to the car being valued.

Estimate Value: Average or otherwise statistically analyze the sale values of these ‘k’ nearest neighbors to estimate the value of the car in question.

Adjusting ‘k’: The choice of ‘k’ can be adjusted based on the dataset size and diversity; a larger ‘k’ smooths out anomalies but might reduce specificity.

Normalization: Attributes might need to be normalized to ensure that no single attribute disproportionately affects the distance calculation.

This method leverages the principle that cars with similar attributes are likely to have similar market values, providing a data-driven estimate for the value of a pre-owned car.

195
Q

How does distance weighted KNN improve upon the ordinary
KNN technique. Outline how the resulting value can be
calculated.

A

Distance Weighted KNN is an enhancement of the standard KNN (K-Nearest Neighbors) technique. Here’s how it improves upon ordinary KNN and the method of calculating the resulting value:

Weighting by Distance: In distance weighted KNN, closer neighbors are given more importance than farther ones. This is based on the assumption that nearer neighbors are more similar than distant ones.

Improvement Over Standard KNN:

Reduces the impact of irrelevant or noisy data points.
Provides a more nuanced and accurate prediction, especially in cases where neighbors are at varying distances.
Calculating the Resulting Value:

Weight Assignment: Assign a weight to each of the ‘k’ nearest neighbors. Commonly, the weight is inversely proportional to the distance. For example, weight can be 1/distance or 1/(distance^2).
Weighted Sum or Average:
For classification, the class of each neighbor is multiplied by its weight, and the class with the highest weighted sum is chosen.
For regression, multiply each neighbor’s value by its weight, sum these values, and then divide by the sum of the weights for a weighted average.
Normalization of Distance: It’s often important to normalize distances to ensure fair weighting, especially when dealing with variables of different scales.

Handling Zero Distance: Careful handling is required in cases where the distance is zero, to avoid division by zero errors. This can be managed by assigning a very high weight to zero-distance neighbors.

Parameter Tuning: The method of calculating weight and the choice of ‘k’ are crucial parameters that may require tuning for optimal performance.

Complexity: This approach is computationally more complex than simple KNN, but the trade-off is typically more accurate and reliable predictions.

Distance weighted KNN thus offers a more refined approach by factoring in the actual distances to neighbors, leading to potentially more accurate predictions than the standard KNN approach
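A short Python sketch of the weighted-average calculation outlined above (inverse-square-distance weights, with an exact match returned directly to avoid division by zero; the toy data are invented):

import math

# (feature vector, target value) pairs - an invented toy dataset.
data = [([1.0, 2.0], 10.0), ([2.0, 1.0], 12.0), ([6.0, 7.0], 30.0), ([7.0, 8.0], 34.0)]

def weighted_knn(query, k=3):
    neighbours = sorted((math.dist(query, x), y) for x, y in data)[:k]
    for dist, y in neighbours:
        if dist == 0:                     # exact match: return its value directly
            return y
    weights = [1.0 / dist ** 2 for dist, _ in neighbours]            # closer -> heavier
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)

print(weighted_knn([1.5, 1.5]))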

196
Q

How does Word2Vec use a neural network to create vector
representations? What are the main differences between the two
approaches used to create Word2Vec representation?

A

Word2Vec is a technique for producing vector representations of words using a neural network. Here’s an outline of how it works and the differences between its two main approaches, Continuous Bag of Words (CBOW) and Skip-Gram:

Neural Network Foundation: Word2Vec uses a shallow neural network to learn word associations from a large corpus of text.

Purpose: The goal is to produce high-quality, dense vector representations for words, capturing semantic meanings and relationships.

Vector Representations: Each word in the vocabulary is represented as a unique vector in a multidimensional space. Words with similar meanings are positioned closer together in this space.

Two Main Approaches: Word2Vec uses two architectures - CBOW and Skip-Gram.

Continuous Bag of Words (CBOW):

Predicts Target Word: Tries to predict a target word based on context words.
Input and Output: Takes multiple context words as input and predicts a single target word.
Efficiency: Generally faster and has better representations for more frequent words.
Skip-Gram:

Predicts Context Words: Does the opposite of CBOW - predicts context words from a target word.
Input and Output: Takes a single word as input and predicts surrounding context words.
Performance: Tends to perform better with infrequent words or smaller datasets.
Training Process:

Both models involve feeding word pairs (based on their context) into the neural network.
The network is trained using these pairs, adjusting the vector representations to reduce the prediction error.
Optimization Techniques: Techniques like Hierarchical Softmax or Negative Sampling are used to make training more efficient, especially for large vocabularies.

Layer Structures:

Both models have an input layer, a hidden layer, and an output layer.
The vectors of interest are typically extracted from the weights of the hidden layer.
Handling of Word Order: Neither CBOW nor Skip-Gram captures the order of words; they focus on word co-occurrence within a context window.

In summary, Word2Vec through its CBOW and Skip-Gram models, effectively captures word associations from large datasets, with each model offering distinct advantages depending on the nature of the text data and desired outcomes.

197
Q

How does Word2Vec represent the “meaning” of words? How
can we compare and combine the word vectors generated by
Word2Vec?

A

Word2Vec represents the “meaning” of words through vector representations and allows for comparison and combination of these vectors in various ways:

Vector Representations:

Word2Vec encodes words into fixed-size vectors, capturing semantic information.
Each dimension represents a latent feature, potentially capturing different aspects of meaning.
Contextual Meaning:

It captures meaning based on the context in which words appear.
Words appearing in similar contexts have similar vector representations.
Distance Between Vectors:

Semantic similarity is represented as proximity in vector space.
Similar words are closer together, while dissimilar words are farther apart.
Cosine Similarity:

Commonly used to measure similarity between two word vectors.
Cosine of the angle between two vectors indicates how similar they are in terms of context.
Vector Operations:

Addition and subtraction of vectors can combine meanings.
For example, “king” - “man” + “woman” might closely align with the vector for “queen”.
Algebraic Manipulations:

Word vectors support meaningful algebraic operations that can reveal semantic relationships.
This can uncover analogies and associations between different words.
Normalization:

Often, vectors are normalized to unit length for consistent comparison.
Cluster Analysis:

Groups of similar words can be identified using clustering techniques in vector space.
Dimensionality Reduction:

Techniques like PCA or t-SNE are used to visualize high-dimensional vectors in 2D or 3D space, helping to interpret relationships.
Word2Vec’s representation allows for a rich understanding of word meanings based on their use in language, providing a powerful tool for various natural language processing applications.

198
Q

How do heuristics improve upon blind search as a means of
finding solutions to complex problems? Briefly comment on two
problems associated with the use of heuristic search techniques
and particularly associated with greedy best-first search.

A

Heuristics significantly enhance the search process for solving complex problems compared to blind search methods. Here are the key points:

Guided Search: Heuristics provide guidance to the search process, focusing on more promising areas of the search space, thereby reducing the search time.

Efficiency: Heuristics help in quickly finding good enough solutions for complex problems where exhaustive search is impractical due to time or computational constraints.

Practicality: Often, heuristic methods are the only feasible approach for very large or complex problems.

However, there are problems associated with heuristic search techniques, particularly with the greedy best-first search:

Local Optima: Greedy best-first search can get stuck in local optima – it selects the best option at each step without considering the global context, which may not lead to the best overall solution.

Incompleteness: If not carefully designed, a heuristic search might fail to find a solution at all, even if one exists, because it can overlook paths leading to the solution.

Heuristic Accuracy: The effectiveness of a heuristic search heavily depends on the accuracy of the heuristic function. If the heuristic is not a good estimate of the actual cost to the goal, it can lead to inefficient searches.

Balance Between Exploration and Exploitation: Heuristics, particularly greedy ones, tend to exploit current knowledge, which can lead to insufficient exploration of the search space.

In summary, while heuristics improve the search process by focusing on promising areas and increasing efficiency, they can suffer from problems like getting trapped in local optima and depending heavily on heuristic accuracy.

199
Q

How would you go about the challenge of creating just a
heuristic function (only) for the Rubik’s cube? Describe how you
would represent each state of the Rubik’s cube.

A

Creating a heuristic function for solving the Rubik’s Cube involves designing a method to estimate the ‘closeness’ of a given state to the solved state. Here’s an approach to tackle this challenge:

State Representation:

Each state of the Rubik’s Cube can be represented by the configuration of its individual pieces.
Typically, this involves encoding the position and orientation of each of the 54 stickers (9 stickers on each of the 6 faces).
Simplicity and Efficiency:

The heuristic should be simple and efficient to compute, as it will be calculated many times during the search process.
Admissible Heuristic:

Ideally, the heuristic should be admissible – it should never overestimate the number of moves required to reach the solved state.
Common Heuristic Approaches:

Manhattan Distance: Calculate the sum of the distances each piece is from its solved position, measured in the number of moves.
Misplaced Tiles: Count the number of tiles not in their solved position.
Pattern Databases: Precompute and store the minimum number of moves required to solve certain patterns or subsets of the cube.
Combining Heuristics:

Heuristics can be combined to form more powerful estimations. For instance, summing the Manhattan distances for corner and edge pieces separately.
Avoiding Illegal States:

Ensure the heuristic doesn’t consider illegal cube states (which cannot occur in a physically real Rubik’s Cube).
Testing and Refinement:

The heuristic function should be rigorously tested and refined based on its performance in guiding the search algorithm towards the solution efficiently.
Balancing Accuracy and Complexity:

A balance between the accuracy of the heuristic and its computational complexity is crucial for optimal performance.
Handling Rotations and Symmetries:

Account for the rotational symmetry of the cube to avoid redundant calculations.
In summary, creating a heuristic function for the Rubik’s Cube involves choosing a method to estimate the distance from the solved state, ensuring the heuristic is both effective in guiding the search and efficient to compute.

200
Q
A

C with a value of 7.

201
Q

PageRank is a recurrent function whereby the value associated
with nodes are inter-dependent on one another. Explain how
such a system can produce stable answers. Make use of
examples in your answer.

A

PageRank is an algorithm used in web search engines, like Google, to rank web pages based on their importance and relevance.
It’s a recurrent function because it iteratively calculates the importance of web pages based on the importance of pages linking to them.
The inter-dependence of nodes in PageRank ensures stable answers by repeatedly refining page importance scores until they converge.
Initially, all pages are assigned equal importance (e.g., each page gets a score of 1.0).
In each iteration, the importance score of a page is updated based on the scores of pages linking to it. A page’s score increases if it is linked to by more important pages.
The process continues until the scores stabilize (convergence) or a predefined number of iterations is reached.
Example: Page A has links from pages B, C, and D. Initially, all pages have a score of 1.0. In the first iteration, if page B is important and links to page A, then A’s score might increase to, say, 1.5. In the next iteration, if page C is also important and links to A, A’s score might increase to 2.0, and so on.
Over iterations, the importance scores converge, and stable answers are achieved, indicating the relative importance of web pages in the search results.
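As a minimal illustration of this convergence, the iteration can be sketched in a few lines of Python; the three-page link structure and the damping factor of 0.85 below are assumptions made purely for the example.

    # Minimal PageRank iteration sketch; link graph and d = 0.85 are illustrative.
    links = {"A": ["B", "C"],   # page -> pages it links out to
             "B": ["A"],
             "C": ["A"]}
    d = 0.85
    pr = {page: 1.0 for page in links}           # every page starts with equal score

    for _ in range(50):                          # repeat until scores stop changing
        new_pr = {}
        for page in links:
            incoming = [p for p, outs in links.items() if page in outs]
            new_pr[page] = (1 - d) + d * sum(pr[p] / len(links[p]) for p in incoming)
        pr = new_pr

    print(pr)   # after a few iterations the values barely move: they have converged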

202
Q

How does WordNet represent the different meanings of words?
How can we use it to estimate the semantic similarity between
word meanings?

A

WordNet represents different word meanings using synsets, which are sets of synonymous words or phrases that share a common sense or meaning.
Synsets categorize words based on their meanings, and each word can belong to multiple synsets if it has multiple senses.
WordNet also includes hypernyms (superordinate concepts) and hyponyms (subordinate concepts) to establish hierarchical relationships between synsets.
To estimate semantic similarity between word meanings using WordNet, one common method is to calculate the shortest path distance between synsets in the WordNet hierarchy.
The shorter the path between two synsets, the more similar their meanings are considered to be.
Other measures like Wu-Palmer Similarity or Resnik Similarity can also be used to quantify the relatedness between synsets based on their position in the hierarchy and the information content of shared ancestors.
These measures provide numerical values that indicate the degree of similarity between word meanings, helping in tasks like information retrieval, machine translation, and natural language processing.
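As a practical sketch, these measures are available through the NLTK interface to WordNet (assuming NLTK and its WordNet corpus are installed); the word pair is only an example.

    # pip install nltk ; then: import nltk; nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')        # first noun sense of "dog"
    cat = wn.synset('cat.n.01')        # first noun sense of "cat"

    print(dog.definition())            # the gloss stored for this synset
    print(dog.path_similarity(cat))    # 1 / (shortest path length + 1)
    print(dog.wup_similarity(cat))     # Wu & Palmer measure, based on the LCS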

203
Q

What is a knowledge graph and how can they help represent the
contents of documents? Briefly compare two approaches to
creating knowledge graphs from text.

A

A knowledge graph is a structured representation of knowledge, connecting entities, facts, and their relationships in a graph-like structure.
Knowledge graphs help represent the contents of documents by extracting and organizing information in a way that allows for easy retrieval, navigation, and understanding of relationships.
Two approaches to creating knowledge graphs from text:

Manual Knowledge Graph Construction:

In this approach, experts manually curate and create a knowledge graph.
Human experts read and extract information from documents, identifying entities, relationships, and attributes.
The process is time-consuming, but it ensures high accuracy and domain-specific knowledge representation.
Examples include domain-specific knowledge bases like DBpedia or Wikidata.
Automated Knowledge Graph Construction:

This approach uses natural language processing (NLP) and machine learning techniques to automatically extract information and construct knowledge graphs.
NLP algorithms analyze large volumes of text data, identifying entities, relationships, and facts.
Automated approaches are faster and scalable but may have lower precision and may not capture specialized or domain-specific knowledge as accurately as manual methods.
Examples include Google’s Knowledge Graph or Facebook’s Open Graph.
Both approaches have their strengths and weaknesses, and the choice depends on factors like the available resources, accuracy requirements, and the scope of the knowledge graph. Manual construction is ideal for precision and domain-specific graphs, while automated methods are efficient for broader and larger-scale knowledge representation.

204
Q

Briefly outline one process used by Word2Vec to generate
vector representations for words. Describe the role played by
neural network learning as used by Word2Vec.

A

Word2Vec uses a process called Continuous Bag of Words (CBOW) or Skip-gram to generate vector representations for words.
For Skip-gram:

The neural network learns by predicting context words (words surrounding a target word) based on the target word.
Each word in a text corpus is represented as a unique vector (initially random).
The neural network iteratively adjusts these vectors to minimize the prediction error.
During training, the network calculates probabilities of context words given the target word and updates the word vectors accordingly.
The neural network aims to make the probability of context words high for a given target word.
Over multiple iterations, word vectors are adjusted to capture semantic relationships and word co-occurrence patterns in the text.
The learned word vectors (word embeddings) end up encoding semantic similarity, allowing similar words to have similar vector representations.
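A toy training run of this process can be sketched with the gensim library (assuming gensim 4.x is installed); the two-sentence corpus and all parameter values are illustrative only.

    # pip install gensim
    from gensim.models import Word2Vec

    sentences = [["the", "cat", "sat", "on", "the", "mat"],
                 ["the", "dog", "sat", "on", "the", "rug"]]

    # sg=1 selects Skip-gram (sg=0 would select CBOW)
    model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

    print(model.wv["cat"][:5])                 # first few dimensions of the learned vector
    print(model.wv.similarity("cat", "dog"))   # cosine similarity of two embeddings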

205
Q

Using examples or otherwise, outline two distinct uses of
Word2Vec vectors. Answers can involve comparing and/or
combining these vectors.

A

Word Embeddings for NLP Tasks:

Word2Vec vectors can be used to improve various natural language processing (NLP) tasks.
For example, in sentiment analysis, you can use pre-trained Word2Vec vectors to convert words in a sentence to their corresponding vectors.
Then, you can average or concatenate these vectors to represent the entire sentence as a fixed-size vector.
This sentence vector can be used as input to machine learning models, such as a neural network, for sentiment classification.
Example: In a movie review, “The plot was engaging, but the acting was dull,” the Word2Vec vectors for individual words can be combined to represent the overall sentiment of the review.
Semantic Search and Information Retrieval:

Word2Vec vectors enable semantic search and retrieval of documents or sentences.
Documents can be represented as the average or weighted sum of Word2Vec vectors of the words they contain.
When a user enters a query, their query words can also be represented as Word2Vec vectors.
By comparing the similarity (cosine similarity, for example) between the query vector and document vectors, you can rank and retrieve documents that are semantically related to the query.
Example: When searching for “car safety,” documents discussing automobile safety will be ranked higher due to their semantic similarity, even if they don’t contain the exact query terms.
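Both uses can be sketched with plain NumPy, assuming some pre-trained embedding lookup is available; here the lookup vec() is a hypothetical stand-in that returns random vectors so the example stays self-contained.

    import numpy as np

    rng = np.random.default_rng(0)
    _fake = {}                                  # stand-in store for real Word2Vec vectors

    def vec(word, dim=50):
        # Hypothetical lookup of a pre-trained word vector (random here).
        return _fake.setdefault(word, rng.normal(size=dim))

    def sentence_vector(text):
        # Use 1: average word vectors into a single fixed-size sentence vector.
        return np.mean([vec(w) for w in text.lower().split()], axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Use 2: rank documents against a query by cosine similarity of their vectors.
    query = sentence_vector("car safety")
    docs = ["automobile safety ratings", "chocolate cake recipe"]
    print({d: round(cosine(query, sentence_vector(d)), 3) for d in docs})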

206
Q

What is lazy learning and how does it differ from alternative
approaches? What are the main advantages associated with the
use of lazy learning? You may use examples in your answer.

A

Lazy Learning:

Lazy learning, also known as instance-based learning, is a machine learning approach where the model does not generalize from the training data during the learning phase.
Instead, it memorizes the training instances (data points) and makes predictions by comparing new instances to the stored training instances.
It does not build an explicit model or hypothesis but directly uses the training data during prediction.
Differences from Alternative Approaches:

Lazy learning differs from eager learning (or model-based learning), where models are built during training and predictions are made based on those models.
In eager learning, models like decision trees, support vector machines, or neural networks are trained and used to make predictions without considering individual training instances.
Advantages of Lazy Learning:

Flexibility and Adaptability: Lazy learning can adapt quickly to changes in the data since it relies directly on stored instances. Eager models may require retraining when new data arrives.

Robustness: Lazy learning can handle noisy or complex datasets better because it doesn’t make strong assumptions about the data distribution.

Interpretability: Predictions in lazy learning can be more interpretable as they are based on similar instances from the training data.

Example:

Consider a lazy learning algorithm for handwritten digit recognition. It stores all training images of digits and their corresponding labels.
When given a new image of a digit to classify, it compares the new image to the stored training images to find the most similar ones.
The algorithm then predicts the digit label based on the labels of the most similar training instances.
This approach is robust to variations in handwriting styles and can quickly adapt to recognizing new handwriting styles without retraining a model.

207
Q

Describe the application of the K-Nearest Neighbours (KNN)
approach to predicting the sale value of a property (eg a house
or apartment), given an extensive database of recent sales.
What data would you store on each transaction to derive your
solution?

A

K-Nearest Neighbors (KNN) for Predicting Property Sale Value:

KNN is used to predict the sale value of a property (e.g., a house or apartment) based on its similarity to other recent property sales in a database.
Data Stored for Each Transaction:

To derive a KNN-based solution for predicting property sale value, you would store the following data for each transaction:
Property Features: Information about the property itself, including its size (e.g., square footage), number of bedrooms and bathrooms, location (e.g., zip code or coordinates), age, and any special features (e.g., swimming pool, garage).
Sale Price: The actual sale price of the property in the transaction.
Transaction Date: The date when the property was sold, which can help account for time-related trends and seasonality in property values.
Neighborhood Data: Information about the neighborhood or locality where the property is situated, such as crime rates, school quality, proximity to amenities, and average property values in the area.
Property History: Any historical data on the property, such as previous sale prices, renovations or upgrades, and property tax history.
Market Data: Data on the overall real estate market conditions, including interest rates, housing market trends, and economic indicators that may affect property values.
KNN Predictive Process:

Given a property for which you want to predict the sale value, KNN works as follows:
Calculate the similarity (distance metric, often Euclidean distance) between the target property and all other properties in the database.
Select the ‘K’ most similar properties based on the calculated distances (e.g., the K properties with the smallest distances).
Average the sale prices of these K similar properties to estimate the sale value of the target property.
Adjusting for K:

The choice of ‘K’ (the number of nearest neighbors to consider) can impact prediction accuracy. Smaller K values can lead to more noise, while larger K values can lead to more bias. The optimal K can be determined through cross-validation.
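The whole pipeline can be sketched with scikit-learn's KNeighborsRegressor (assuming scikit-learn is installed); the feature columns and prices below are invented for illustration.

    # pip install scikit-learn numpy
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.preprocessing import StandardScaler

    # columns: size (sq ft), bedrooms, bathrooms, age (years) -- illustrative data
    X = np.array([[1200, 3, 2, 20],
                  [1500, 4, 2, 15],
                  [ 900, 2, 1, 40],
                  [2000, 4, 3,  5]])
    y = np.array([250_000, 320_000, 180_000, 450_000])    # recorded sale prices

    scaler = StandardScaler().fit(X)                      # put features on one scale
    knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
    knn.fit(scaler.transform(X), y)

    target = np.array([[1400, 3, 2, 18]])                 # the property to price
    print(knn.predict(scaler.transform(target)))          # estimated sale value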

208
Q

How would you represent the current state of a Rubik’s cube? Focus
your answer on your representation of each of the following:
i) colours
ii) sides and
iii) entire cube
Use suitable examples to illustrate your answer, making reference to
the specific data structures you might use.

A

Representation of the Current State of a Rubik’s Cube:

i) Colors:

Each sticker on a Rubik’s cube can be represented by a color code or a numeric value. Common color codes are ‘W’ (white), ‘R’ (red), ‘G’ (green), ‘B’ (blue), ‘Y’ (yellow), and ‘O’ (orange).
For example, you can represent a white sticker as ‘W’, a red sticker as ‘R’, and so on.
ii) Sides:

A Rubik’s cube has six sides: front, back, left, right, top, and bottom.

You can represent each side as a 3x3 grid or matrix, where each cell contains the color code of the sticker in that position.

For example, the front side of a solved cube might be represented as:

F F F
F F F
F F F

iii) Entire Cube:
- To represent the entire Rubik’s cube, you would combine the representations of all six sides into a data structure.
- You can use a 3D array or a list of lists to represent the cube, where each element corresponds to a side.
- Example (simplified representation with color codes):

cube = [
    [['W', 'W', 'W'], ['W', 'W', 'W'], ['W', 'W', 'W']],  # Top
    [['R', 'R', 'R'], ['R', 'R', 'R'], ['R', 'R', 'R']],  # Front
    [['G', 'G', 'G'], ['G', 'G', 'G'], ['G', 'G', 'G']],  # Right
    [['B', 'B', 'B'], ['B', 'B', 'B'], ['B', 'B', 'B']],  # Back
    [['O', 'O', 'O'], ['O', 'O', 'O'], ['O', 'O', 'O']],  # Left
    [['Y', 'Y', 'Y'], ['Y', 'Y', 'Y'], ['Y', 'Y', 'Y']]   # Bottom
]
 

Each side is a 3x3 grid, and the entire cube is represented as a list of these 3x3 grids, allowing you to manipulate and solve the Rubik’s cube by changing the colors in each cell accordingly.
209
Q

Briefly describe the purpose of a heuristic function. Describe a
function or process that returns a single heuristic score for a Rubik’s
cube. You can assume simple references are available to you, such
as “for each face”, “for each row” and “for each cell”.

A

Purpose of a Heuristic Function:

A heuristic function is used in search algorithms and optimization problems to estimate the cost or distance from a given state to a desired goal state.
It provides a quick, approximate evaluation of how close or far the current state is from the optimal solution without exploring all possible paths.
Heuristic Score for a Rubik’s Cube:

One common heuristic for a Rubik’s cube is the “Misplaced Stickers Heuristic.”
It calculates the number of stickers that are not in their correct position.
To compute the heuristic score for a Rubik’s cube:
For each face of the cube:
For each row on the face:
For each cell in the row:
Compare the color of the sticker in the cell with the expected color for that cell’s position on the solved cube.
If the colors do not match, increment a counter.
Sum up the counters for all faces, rows, and cells.
The total count represents the heuristic score - the number of stickers out of place.
The higher the heuristic score, the farther the Rubik’s cube is from being solved.
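The three nested loops translate directly into code; a minimal sketch, assuming the cube is stored as a dictionary of 3x3 colour grids keyed by face (one possible representation, not the only one).

    def misplaced_stickers(cube, solved):
        # Count stickers whose colour differs from the corresponding solved sticker.
        count = 0
        for face in cube:                 # for each face
            for r in range(3):            # for each row
                for c in range(3):        # for each cell
                    if cube[face][r][c] != solved[face][r][c]:
                        count += 1
        return count

    # Illustrative cubes: a solved cube and one with two stickers swapped.
    solved = {f: [[f] * 3 for _ in range(3)] for f in "UDFBLR"}
    scrambled = {f: [row[:] for row in solved[f]] for f in solved}
    scrambled["F"][0][0], scrambled["U"][2][2] = scrambled["U"][2][2], scrambled["F"][0][0]

    print(misplaced_stickers(scrambled, solved))   # -> 2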

210
Q
A

D with a value of 3.

211
Q

Use an algebraic expression or other means, describe the size of the
search space for the 15-Sliding Tile Puzzle, using “L” to represent the
lookahead and “b” to represent the branching factor. Now consider
the problem of finding the shortest path between two nodes in a
graph. By adapting your equation or otherwise, show how bidirectional search can reduce the search space of finding the shortest
path in a graph - when compared to unidirectional search.

A

Search Space Size for 15-Sliding Tile Puzzle:

For a 15-Sliding Tile Puzzle, the search space size can be represented as O(b^L), where:
“b” is the branching factor, representing the number of possible moves from a given state (typically 4 for the 15-puzzle).
“L” is the lookahead or depth of the search, representing the number of moves required to reach the goal state from the initial state.
Bidirectional Search vs. Unidirectional Search:

In the context of finding the shortest path between two nodes in a graph, unidirectional search explores the entire search space from the start node to the goal node.
Bidirectional search, on the other hand, explores the search space simultaneously from both the start node and the goal node.
To adapt the equation for bidirectional search:
The search space size for bidirectional search can be represented as O(b^(L/2)), where:
“b” is still the branching factor, representing the number of possible moves from a given state.
“L/2” represents the depth of the search from each end (start node and goal node) since both searches progress towards each other.
The key advantage of bidirectional search is that it effectively reduces the search space by exploring from both ends simultaneously, potentially leading to significant time savings compared to unidirectional search.
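A quick numeric illustration of the saving, using assumed values of b = 4 and L = 20.

    b, L = 4, 20                        # assumed branching factor and solution depth
    unidirectional = b ** L             # about 1.1 * 10^12 nodes
    bidirectional = 2 * b ** (L // 2)   # two half-depth searches: about 2.1 * 10^6 nodes
    print(unidirectional, bidirectional, unidirectional // bidirectional)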

212
Q

What is meant by Pareto optimal solutions? Choose three distinct
metrics that are likely to be used by an online route-planning
service (such as google maps), briefly describing each metric.
Contrast the Pareto Optimal solutions that are likely to result from
finding the “best” way to travel between two towns (say, from
Maynooth to Dublin city centre), from the use of just one single
metric. You may refer to the Pareto Optimal Front (POF) in your
answer.

A

Pareto Optimal Solutions:

Pareto optimal solutions (which together make up the Pareto Optimal Front) are solutions in a multi-objective optimization problem where no solution can be improved in one objective without degrading at least one other objective. In other words, they represent a trade-off between conflicting objectives.
Metrics for Online Route-Planning Service:

Travel Time: The time it takes to complete a journey, considering factors like traffic congestion, speed limits, and road conditions.
Distance: The physical distance in kilometers or miles between the starting and ending points of a route.
Fuel Efficiency: The fuel consumption or energy efficiency of a chosen route, particularly relevant for eco-conscious travelers or commercial vehicles.
Contrast Pareto Optimal Solutions:

When finding the “best” way to travel between two towns (e.g., Maynooth to Dublin city center) using a single metric (e.g., Travel Time):
You would optimize solely for travel time, likely resulting in a single optimal route that minimizes time.
The Pareto Optimal Front (POF) in this case consists of variations of the same route, where any deviation from the fastest path increases travel time.
However, when considering all three metrics (Travel Time, Distance, and Fuel Efficiency):
The Pareto Optimal Front expands to represent trade-offs between these objectives.
Some routes may offer shorter distances but longer travel times or better fuel efficiency.
Others may minimize travel time but have greater distances or lower fuel efficiency.
Users can then choose from a range of Pareto optimal routes that align with their preferences and priorities, whether they value time, distance, or fuel efficiency more.

213
Q

Describe one process for creating vector representations
(embeddings) of words, as used by Word2Vec. In your answer
make clear reference to how words are represented and to the
role played by neural network learning. You can focus your answer
on either skip-gram or CBOW.

A

Word2Vec Process - Skip-gram:

Word Representation:

All words in a corpus are assigned unique vector representations (word embeddings) of a fixed dimensionality.
Initially, these vectors are usually set to random values.
Neural Network Architecture:

Word2Vec employs a shallow neural network with a single hidden layer.
In the Skip-gram model, the input is a single target word, and the goal is to predict the context words (words that appear nearby in the same sentence).
Training Objective:

During training, the neural network aims to minimize the difference between the predicted context words and the actual context words surrounding the target word.
This is achieved by updating the word vectors (embeddings) to improve the accuracy of word predictions.
Context Sliding Window:

A sliding window moves through the text corpus, selecting a target word and its surrounding context words.
For example, for the sentence “I love ice cream,” if “love” is the target word, “I,” “ice,” and “cream” may be chosen as context words within a specified window.
Vector Update:

The input is the vector representation of the target word.
The neural network predicts the probability distribution of context words within the window.
Backpropagation and gradient descent update the word vectors based on the prediction error.
The goal is to adjust the target word’s vector so that it becomes more similar to the vectors of its context words.
Learning Word Embeddings:

Over multiple iterations and training examples, word vectors evolve to capture semantic relationships between words.
Similar words end up having similar vectors, allowing for vector arithmetic operations like word analogies (e.g., king - man + woman ≈ queen).
Role of Neural Network Learning:

Neural network learning helps optimize word embeddings by iteratively adjusting them based on the context word predictions.
The network learns to capture semantic similarities and relationships between words as it minimizes prediction errors.
The resulting word embeddings are vector representations that encode semantic information about words based on their context in the training data.

214
Q

Consider a simple artificial neural network (such as a simple
Perceptron). In your own words, describe how the weights are
modified during the process of learning a simple function such as
binary AND, or OR. Compare and contrast the learning that
occurs when the output is correct and when the output is Not
correct.

A

Modifying Weights in a Simple Neural Network:

Correct Output:

When the network’s output matches the desired output (e.g., for a binary AND operation), weight modifications are minimal.
The network calculates the error, which is the difference between the desired output and the actual output.
The error is used to compute the gradient of the loss function with respect to the weights.
Weight updates are proportional to the gradient and the learning rate (a small constant).
If the output is correct, the network nudges the weights only slightly in the direction that minimizes the error.
Incorrect Output:

When the network’s output does not match the desired output (e.g., for a binary AND operation), weight modifications are more substantial.
The larger the error, the larger the gradient, resulting in more significant weight adjustments.
The network corrects its weights to reduce the error, pushing the output closer to the desired outcome.
This process continues iteratively, with weight updates gradually aligning the network’s output with the correct output.
Comparison:

Correct Output:

Weight updates are small.
The network reinforces its current weights, as they contribute to the correct output.
Learning is a gradual process with minor adjustments.
Incorrect Output:

Weight updates are larger.
The network corrects and adapts its weights more aggressively to reduce the error.
Learning involves significant adjustments to steer the output towards the correct result.
In both cases, the goal of weight modifications is to minimize the error and align the network’s output with the desired output. However, the extent and direction of weight changes differ depending on whether the output is correct or not.
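A minimal sketch of a perceptron learning binary AND with the update rule w <- w + lr * (target - output) * x; the learning rate, random seed and initial weights are arbitrary choices.

    import random

    random.seed(1)
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    bias = random.uniform(-0.5, 0.5)
    lr = 0.1

    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # binary AND

    for epoch in range(50):
        for x, target in data:
            output = 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0
            error = target - output
            # Correct output: error is 0 and the weights are left untouched.
            # Incorrect output: weights shift in proportion to the error and input.
            w = [w[i] + lr * error * x[i] for i in range(2)]
            bias += lr * error

    print(w, bias)
    print([(x, 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0) for x, _ in data])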

215
Q

Describe how you would adapt the K-Nearest Neighbours (KNN)
algorithm to estimate the sale value of a second-hand (pre-owned) car, given a database of recent sales of pre-owned cars (such
as discussed in class). Identify 5 attributes recorded for each car sale that you would use as a basis for identifying similar cars from
this database. Marks will be awarded for considering different
attribute types, and for the range of values that can be usefully
compared. Use example values to illustrate your answer.

A

Adapting K-Nearest Neighbors (KNN) for Estimating Car Sale Value:

Data Collection:

Gather a database of recent sales of pre-owned cars, including details of each sale.
Attribute Selection:

Choose five relevant attributes that can be used to identify similar cars:
Make and Model: The car’s brand and model, e.g., “Toyota Camry” or “Ford Mustang.”
Year: The manufacturing year of the car, e.g., “2018.”
Mileage: The total mileage driven by the car, e.g., “45,000 miles.”
Fuel Type: The type of fuel the car uses, e.g., “Gasoline” or “Hybrid.”
Price: The sale price of the car, which is the target variable for estimation, e.g., “$15,000.”
Normalization:

Normalize numeric attributes (Year, Mileage, Price) to a common scale (e.g., 0 to 1) to avoid biases due to differences in units and magnitudes.
Distance Metric:

Choose an appropriate distance metric, such as Euclidean distance, to measure the similarity between data points.
Search for Neighbors:

When estimating the sale value of a given car, identify the K-nearest neighbors in the database based on the chosen attributes.
For example, if we want to estimate the value of a 2019 Toyota Camry with 30,000 miles and using gasoline, we find similar cars with similar attributes in the database.
Weighted Average:

Calculate a weighted average of the sale prices of the K-nearest neighbors to estimate the sale value of the target car.
The weights can be inversely proportional to the distance from the target car.
Example:

Target Car: 2019 Toyota Camry with 30,000 miles, using gasoline.
K-nearest neighbors with similar attributes:
Car 1: 2018 Toyota Camry, 28,000 miles, gasoline, price $14,500.
Car 2: 2020 Toyota Camry, 32,000 miles, gasoline, price $15,200.
Car 3: 2019 Honda Accord, 29,000 miles, gasoline, price $14,800.
Weighted Average Sale Value:
Assuming illustrative (already normalized) distances of 0.1, 0.2 and 0.1 to the three neighbours:
Estimated Sale Value = (14,500/0.1 + 15,200/0.2 + 14,800/0.1) / (1/0.1 + 1/0.2 + 1/0.1) = 369,000 / 25 = $14,760.
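The same weighted average, written out as a small sketch (the distances 0.1, 0.2 and 0.1 are the assumed values from the example above).

    # Distance-weighted average of the K nearest neighbours' sale prices.
    neighbours = [(14_500, 0.1), (15_200, 0.2), (14_800, 0.1)]   # (price, distance)

    weights = [1 / d for _, d in neighbours]                     # closer -> heavier weight
    estimate = sum(p * w for (p, _), w in zip(neighbours, weights)) / sum(weights)
    print(round(estimate, 2))                                    # 14760.0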

216
Q

How would you compare the names of two products, in order to
provide search functionality for an online sales web-site. You may
assume both names are stored as simple text strings. In your
answer discuss the potential similarity between for example, the
names of some books and similar movie names. Briefly evaluate
your proposed solution to highlight situations where it works well
– and situations where it does Not work so well.

A

Comparing Product Names for Search Functionality:

Text Preprocessing:

Perform text preprocessing steps like lowercasing, removing punctuation, and handling whitespace to standardize the format of product names.
Tokenization:

Split the product names into individual words or tokens.
Feature Extraction:

Calculate similarity scores using various techniques, such as:
Cosine Similarity: Measure the cosine of the angle between the vector representations of two product names.
Jaccard Similarity: Compare the sets of tokens in two names by calculating the intersection over union.
Levenshtein Distance: Compute the edit distance (number of operations to transform one string into another) between two names.
N-grams: Compare the presence of common sequences of characters (n-grams) in product names.
Thresholding:

Set a similarity threshold above which two product names are considered similar enough to be included in search results.
Evaluation:

Works Well:

This approach works well when comparing product names that have similar words or phrases, like books and movies with shared titles or themes.
For example, it can correctly identify that “Harry Potter and the Philosopher’s Stone” (book) and “Harry Potter and the Sorcerer’s Stone” (movie) are similar.
Does Not Work Well:

It may not work well when comparing products with entirely different types or contexts, making it challenging to establish meaningful similarity.
For instance, comparing the name of a book with the name of a kitchen appliance may not yield meaningful results.
Extremely short or overly generic product names may not provide sufficient information for reliable similarity assessment.
High similarity scores do not always guarantee that products are identical; they may still have significant differences in attributes and features.
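Two of these measures can be sketched with only the Python standard library; SequenceMatcher gives an edit-style similarity and the Jaccard score compares token sets (the book/film pair is the example from above).

    from difflib import SequenceMatcher

    def normalise(name):
        return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())

    def jaccard(a, b):
        ta, tb = set(normalise(a).split()), set(normalise(b).split())
        return len(ta & tb) / len(ta | tb)

    book  = "Harry Potter and the Philosopher's Stone"
    movie = "Harry Potter and the Sorcerer's Stone"

    print(jaccard(book, movie))                                      # high token overlap
    print(SequenceMatcher(None, normalise(book), normalise(movie)).ratio())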

217
Q

How can we approximate the joint probability of a word sequence
n-gram? Briefly outline the problem that is overcome by
approximating the joint probability of a (novel) word sequence –
as opposed to calculating its actual value?

A

Approximating Joint Probability of an N-gram:

To approximate the joint probability of a word sequence (n-gram), one common method is to use n-gram language models, such as bigrams or trigrams.

These models estimate the probability of a word given its previous n-1 words, allowing for efficient computation of joint probabilities.

The joint probability of a sequence is then approximated by multiplying the conditional probabilities of individual words in the sequence.

The Problem with Calculating Actual Joint Probability:

Calculating the actual joint probability of a novel word sequence is challenging due to the combinatorial explosion of possible sequences.

The number of possible word sequences grows exponentially with the length of the sequence and the size of the vocabulary.

Storing and computing the actual probabilities for all possible sequences in a large corpus would be infeasible in terms of memory and computational resources.

N-gram language models offer a practical solution by providing a way to estimate joint probabilities efficiently while only considering a limited history of previous words.
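A toy sketch of the bigram approximation P(w1..wn) ≈ ∏ P(wi | wi-1), with counts taken from a tiny made-up corpus.

    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def p(w, prev):
        # Maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev).
        return bigrams[(prev, w)] / unigrams[prev]

    def sequence_probability(words):
        # Approximate the joint probability as a chain of bigram probabilities.
        prob = 1.0
        for prev, w in zip(words, words[1:]):
            prob *= p(w, prev)
        return prob

    print(sequence_probability("the cat sat on the rug".split()))   # 0.0625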

218
Q

Consider the following equation, where Wi is word number i in
some word sequence and ti is the (part of speech) tag for that
word. Describe in your own words how this equation overcomes
some of the limitations of simpler approaches. You may include
examples to illustrate your answer.
argmax_t ∏_{i=1..n} P(ti | ti-1) * P(Wi | ti)

A

Equation and Its Purpose:

The equation aims to find the most likely sequence of part-of-speech tags (ti) for a given word sequence (Wi).
Overcoming Limitations:

Transition Probability (P(ti|ti-1)):

This term accounts for the likelihood of transitioning from one part-of-speech tag to another.
It considers the context of previous tags when predicting the next tag.
Example: In the sentence “I saw a bear,” the transition probability helps distinguish between “saw” as a past tense verb or “saw” as a noun based on the preceding tag (“I” suggests a verb, while “a” suggests a noun).
Emission Probability (P(Wi|ti)):

This term represents the probability of observing a specific word given its part-of-speech tag.
It captures the relationship between words and their associated tags.
Example: In the sentence “The cat sleeps,” the emission probability helps distinguish between “cat” as a noun and “cat” as a verb based on the tag (“The” suggests a noun, while “sleeps” suggests a verb).
Sequential Dependencies:

By combining transition and emission probabilities, the equation models sequential dependencies between words and their tags, allowing for more accurate tag predictions.
Example: In “He eats the fish,” the model understands that “eats” is a verb and “fish” is a noun based on the context, even if “fish” could also be a verb in a different context.
Illustration:

Consider the sentence “She plays the piano.” Without considering sequential dependencies, simpler approaches might mistakenly tag “plays” as a noun, whereas the equation considers the context and correctly tags it as a verb based on the surrounding words and their tags.
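A toy sketch of scoring a candidate tag sequence with exactly these two terms; the transition and emission tables are invented for illustration, not trained values.

    # Invented toy probabilities -- illustrative only.
    transition = {("<s>", "PRON"): 0.4, ("PRON", "VERB"): 0.6,
                  ("VERB", "DET"): 0.5, ("DET", "NOUN"): 0.9}
    emission = {("she", "PRON"): 0.3, ("plays", "VERB"): 0.2,
                ("the", "DET"): 0.6, ("piano", "NOUN"): 0.1}

    def score(words, tags):
        # Product over the sequence of P(ti | ti-1) * P(Wi | ti).
        prob, prev = 1.0, "<s>"
        for w, t in zip(words, tags):
            prob *= transition.get((prev, t), 1e-6) * emission.get((w, t), 1e-6)
            prev = t
        return prob

    words = ["she", "plays", "the", "piano"]
    print(score(words, ["PRON", "VERB", "DET", "NOUN"]))   # plausible tagging
    print(score(words, ["PRON", "NOUN", "DET", "NOUN"]))   # "plays" as a noun scores far lower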

219
Q

Consider potentially ambiguous words such as cat, car, or dog.
Illustrate some different meanings for one such term. Make
reference to resources like WordNet or ConceptNet as a means
of comparing related words.

A

Ambiguous Words: “Cat”

Meaning 1 (Animal):

“Cat” refers to a domestic or wild feline animal.
WordNet: Under the synset for “cat,” it includes terms like “feline,” “kitten,” and “tabby” as related words.
Meaning 2 (Machinery):

“Cat” can also be an abbreviation for “caterpillar,” a brand of heavy machinery.
WordNet: Under the synset for “caterpillar,” it includes terms like “bulldozer,” “tractor,” and “construction equipment.”
Meaning 3 (Slang):

In slang, “cat” can refer to a person, often used informally to address someone.
ConceptNet: ConceptNet, a semantic network, might link “cat” to terms like “person,” “guy,” or “dude” in a relevant context.
These examples illustrate how the word “cat” can have multiple meanings and contexts, making it important to consider surrounding words and context to disambiguate its intended sense. Resources like WordNet and ConceptNet provide valuable information for understanding related words and meanings.

220
Q

Describe one process for generating a knowledge graph from the
output of a natural language parser. Use suitable examples to
illustrate your answer.

A

Process for Generating a Knowledge Graph from a Natural Language Parser:

Natural Language Parsing:

Start with the output of a natural language parser, which typically produces syntactic and grammatical structures from a sentence.
Entity Recognition:

Identify entities (e.g., nouns, proper nouns, and named entities) within the parsed sentence.
Example: In the sentence “Barack Obama was born in Hawaii,” entities include “Barack Obama” (person) and “Hawaii” (location).
Predicate Extraction:

Extract predicates or relationships between entities based on syntactic patterns.
Example: In “Barack Obama was born in Hawaii,” the predicate is “was born in” linking “Barack Obama” and “Hawaii.”
Graph Construction:

Create a knowledge graph structure where nodes represent entities and edges represent relationships (predicates) between them.
Example:
Nodes: “Barack Obama” (Person), “Hawaii” (Location)
Edge: “was born in” (BornIn)
Linking to Knowledge Base:

Optionally, link the entities in the graph to a knowledge base like Wikidata or DBpedia to enrich the information.
Example: Link “Barack Obama” to his profile in a knowledge base for additional data.
Triple Generation:

Convert the entity-relationship pairs into triples (subject-predicate-object) to represent the knowledge graph data.
Example: (Barack Obama, was born in, Hawaii)
The result is a knowledge graph that captures structured information from the parsed natural language text, making it easier for machines to understand and query relationships between entities and concepts.
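A rough sketch of the entity and predicate extraction steps using the spaCy parser (assuming spaCy and its small English model are installed); a real pipeline would handle many more dependency patterns than this single subject-verb-object case.

    # pip install spacy ; python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Barack Obama was born in Hawaii.")

    print([(ent.text, ent.label_) for ent in doc.ents])   # entity recognition step

    triples = []
    for token in doc:
        if token.pos_ == "VERB":                          # treat the verb as the predicate
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
            objects += [gc for c in token.children if c.dep_ == "prep"
                        for gc in c.children if gc.dep_ == "pobj"]
            for s in subjects:
                for o in objects:
                    triples.append((s.text, token.lemma_, o.text))

    print(triples)   # roughly: [('Obama', 'bear', 'Hawaii')]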

221
Q

How does the generate and test (G&T) search strategy help solve
many search-based problems? How do heuristic functions help
improve the performance of these search algorithms?

A

Generate and Test (G&T) Search Strategy:

Process:

In G&T, potential solutions are generated systematically or randomly.
Each generated solution is then tested to determine if it satisfies the problem’s constraints or criteria.
The process continues until a satisfactory solution is found or a termination condition is met.
Benefits:

G&T is a general problem-solving strategy that can be applied to a wide range of search-based problems.
It is versatile, as it does not require specific domain knowledge or complex heuristics.
It can explore the search space efficiently, especially when combined with heuristic functions.
Heuristic Functions in G&T:

Role:

Heuristic functions provide an estimate of how close a potential solution is to the optimal solution without evaluating all possible solutions exhaustively.
Improvement:

Heuristics guide the search process by prioritizing the evaluation of solutions that are more likely to be promising.
This reduces the need to test every generated solution, making the search more efficient.
Heuristics help avoid exploring unpromising regions of the search space.
Examples:

In the context of a G&T search for the shortest path in a maze:
A heuristic might estimate the distance from the current position to the goal using methods like Manhattan distance or Euclidean distance.
This allows the search to prioritize paths that appear closer to the goal, leading to faster convergence.
Overall:

G&T, when combined with heuristic functions, strikes a balance between exploring the search space and avoiding unnecessary evaluations, making it a powerful approach for solving search-based problems efficiently and effectively.

222
Q

Describe in detail a heuristic function you would use to help solve
the 15-16 sliding tile puzzle. Using an example or otherwise, clearly
describe how you would represent each individual state of the
sliding tile puzzle in your answer.

A

Heuristic Function for 15-Sliding Tile Puzzle:

Heuristic: The Manhattan Distance Heuristic

Representation of Each State:

Represent each state of the 15-Sliding Tile Puzzle as a 4x4 grid or a 2D array.
Use numbers to represent tiles, and a special symbol (e.g., ‘0’ or ‘X’) to represent the empty tile space.
For example, the initial state might be represented as follows:

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 X

Heuristic Calculation:

The Manhattan Distance Heuristic calculates the estimated distance of each tile from its current position to its goal position, summing these distances for all tiles.
Example Calculation:

For the solved state shown above, every tile is already in its goal cell, so the heuristic value is 0.
For a scrambled state, calculate the distance for each tile 1 to 15 (excluding the empty cell).
Example: a tile 1 that has been displaced to row 1, column 1 (counting rows and columns from 0) is |1 - 0| + |1 - 0| = 2 moves from its goal cell at (0, 0).
Continue calculating distances for all displaced tiles.
Sum all the distances; the total represents the heuristic value.
Advantages:

The Manhattan Distance Heuristic is admissible, meaning it never overestimates the true cost to reach the goal.
It is also consistent, ensuring that the heuristic value of a state is never greater than the cost of reaching the goal from that state.
Application:

The heuristic guides search algorithms like A* by helping prioritize states with lower heuristic values, making it efficient in finding optimal solutions for the 15-Sliding Tile Puzzle.
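A compact sketch of the heuristic on the 4x4 list-of-lists representation described above, with 0 standing in for the empty cell.

    def manhattan_distance(state):
        # Sum each tile's row + column distance from its goal cell (blank excluded).
        total = 0
        for r, row in enumerate(state):
            for c, tile in enumerate(row):
                if tile == 0:
                    continue
                goal_r, goal_c = divmod(tile - 1, 4)
                total += abs(r - goal_r) + abs(c - goal_c)
        return total

    solved    = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 0]]
    scrambled = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 15, 12], [13, 14, 0, 11]]

    print(manhattan_distance(solved))     # 0
    print(manhattan_distance(scrambled))  # 3 (tile 15 is 1 move away, tile 11 is 2 away)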

223
Q

Describe the main steps in the operation of the A* algorithm. If you
wish, you may use a suitable diagram(s) to illustrate your answer.

A

Initialization: Start with an open list containing only the initial node and a closed list that is empty.

Loop: Repeat the following steps until the open list is empty:

Choose Node: Select the node with the lowest score from the open list. The score is calculated as
f(n)=g(n)+h(n), where
g(n) is the cost to reach the node, and
h(n) is the estimated cost from the node to the goal (heuristic).

Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.

Generate Successors: Identify all the neighbors of the current node and calculate their scores.

Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.

Move Node: Move the current node from the open list to the closed list.

Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
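A minimal A* sketch on a toy grid, using Python's heapq as the open list; the grid, the start/goal cells and the Manhattan heuristic are illustrative choices.

    import heapq

    def a_star(grid, start, goal):
        # A* over a grid of 0 (free) / 1 (blocked) cells; returns a path or None.
        def h(cell):                                     # heuristic: Manhattan distance
            return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

        open_list = [(h(start), 0, start, [start])]      # entries: (f, g, node, path)
        closed = set()
        while open_list:
            f, g, node, path = heapq.heappop(open_list)  # lowest f = g + h first
            if node == goal:
                return path                              # goal check: return the path
            if node in closed:
                continue
            closed.add(node)                             # move node to the closed list
            r, c = node
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):  # successors
                if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                    if (nr, nc) not in closed:
                        heapq.heappush(open_list,
                                       (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
        return None                                      # open list empty: no path exists

    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(a_star(grid, (0, 0), (2, 0)))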

224
Q
A

C with a value of 1.

225
Q

Describe some of the limitations of the MiniMax search strategy.
You may make use of suitable examples to illustrate your answer.

A

Limitations of the MiniMax Search Strategy:

Explosion of Game States:

MiniMax explores the entire game tree down to the terminal states, which can lead to a vast number of possible states, especially in complex games like chess.
For example, in chess, the branching factor can be over 30, leading to an exponential growth in the number of nodes to evaluate.
Complete Knowledge Assumption:

MiniMax assumes both players have complete knowledge of the game and can foresee all possible moves and counter-moves.
In practice, players often have limited information, and this assumption may not hold.
Example: In card games, players typically do not know the exact cards held by their opponents.
Computational Complexity:

The computational cost of MiniMax grows rapidly with the depth of the search tree.
In deep trees, it may be impractical to search all the way to terminal states within a reasonable amount of time.
Example: In a complex game like Go, MiniMax with brute-force search is computationally infeasible.
Drawbacks in Non-Zero-Sum Games:

MiniMax is primarily designed for zero-sum games (where one player’s gain is the other’s loss).
In non-zero-sum games or cooperative scenarios, MiniMax may not provide suitable strategies.
Example: In a cooperative board game where players work together, maximizing their individual utility may not lead to the best collective outcome.
Lack of Practical Evaluation Function:

MiniMax requires an accurate evaluation function to estimate the desirability of terminal states that are not reached in the search.
Designing such an evaluation function can be challenging and may lead to suboptimal performance if not done well.
Example: In chess, creating an evaluation function that accurately predicts winning positions is a complex task.
These limitations make MiniMax less practical for certain scenarios and necessitate the use of advanced techniques like pruning, alpha-beta pruning, and heuristic evaluation functions to address some of these challenges.

226
Q

Using an example or otherwise, describe what is meant by Multi-Objective optimization. What is meant by a non-dominated solution?
Feel free to use examples to illustrate your answer.

A

Multi-Objective Optimization (MOO):

Definition:

MOO is a type of optimization where multiple conflicting objectives or criteria are considered simultaneously when searching for the best solutions.
In MOO, the goal is to find a set of solutions that provide a trade-off between these objectives, rather than a single optimal solution.
Example:

Consider a car design problem where two objectives are fuel efficiency (maximize) and speed (maximize).
MOO aims to find car designs that balance these objectives, such as achieving high speed while maintaining reasonable fuel efficiency.
Non-Dominated Solution:

Definition:

A non-dominated solution (also known as Pareto optimal solution) is a solution in MOO where no other solution in the feasible solution space simultaneously improves all objectives while not making any objectives worse.
Example:

In the car design problem:
Car A has high speed (100 mph) but low fuel efficiency (20 mpg).
Car B has lower speed (80 mph) but higher fuel efficiency (30 mpg).
Both Car A and Car B are non-dominated solutions because improving one objective (speed or fuel efficiency) would worsen the other.
A dominated solution, in contrast, would be a car design that is worse in at least one objective compared to another solution without any improvements in other objectives.
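A small sketch that filters candidate designs down to the non-dominated set; the objective values are invented, and both objectives are treated as "higher is better".

    def dominates(a, b):
        # a dominates b if it is at least as good on every objective and better on one.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    # (speed in mph, fuel efficiency in mpg) -- illustrative candidate car designs
    cars = {"A": (100, 20), "B": (80, 30), "C": (75, 25), "D": (90, 28)}

    pareto = {name: obj for name, obj in cars.items()
              if not any(dominates(other, obj) for other in cars.values() if other != obj)}
    print(pareto)   # C is dominated by D (slower and less efficient); A, B and D remain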

227
Q

In your own words describe what is meant by an embedding – such as
used by Word2Vec. How can we compare two words by comparing
their embeddings?

A

Embedding:

An embedding is a vector representation of an object, typically a word or concept, in a continuous multi-dimensional space.
In the context of Word2Vec, word embeddings represent words as dense, real-valued vectors where similar words have similar vector representations.
Comparing Words by Embeddings:

Words are compared by measuring the similarity between their embeddings in the vector space.
Similarity can be computed using various distance metrics, such as cosine similarity or Euclidean distance.
The closer two word embeddings are in the vector space, the more similar the words are in meaning.
Example: If the embeddings of “king” and “queen” are close, it suggests a semantic similarity, indicating they share common attributes related to royalty.

228
Q

Choose either the skip-gram or else the CBOW approach. Describe
how this approach uses a neural network to generate embeddings for
words, using a large corpus of text.

A

Skip-gram Approach:

Input: A large corpus of text containing words and their contexts.

Neural Network Architecture:

Utilizes a shallow neural network with a single hidden layer.
Input layer represents the target word.
Output layer predicts the context words based on the target word.
Training Objective:

The neural network aims to maximize the likelihood of predicting context words given the target word.
It learns to adjust the word embeddings (vectors) during training to improve prediction accuracy.
Context Sliding Window:

A sliding window moves through the text corpus, selecting a target word and its surrounding context words.
For example, in the sentence “I love ice cream,” if “love” is the target word, “I,” “ice,” and “cream” may be chosen as context words within a specified window.
Vector Update:

The neural network predicts the probability distribution of context words within the window.
Backpropagation and gradient descent update the word vectors (embeddings) to minimize the prediction error.
The goal is to adjust the target word’s vector so that it becomes more similar to the vectors of its context words.
Learning Word Embeddings:

Over multiple iterations and training examples, word vectors evolve to capture semantic relationships between words.
Similar words end up having similar vectors, allowing for vector arithmetic operations like word analogies (e.g., king - man + woman ≈ queen).

229
Q

In your own words, describe how Reinforcement Learning (RL) can
learn a fixed policy by repeated trials over a problem space. You may
use a diagram to illustrate your answer.

A

Reinforcement Learning (RL) Learning Process:

Agent and Environment:

RL involves an agent interacting with an environment.
The agent takes actions to achieve specific goals within the environment.
Trial and Error:

The agent learns by repeatedly taking actions in the environment and receiving feedback in the form of rewards or penalties.
Policy Learning:

The agent aims to learn a fixed policy, which is a strategy that maps states to actions.
The policy guides the agent’s decision-making.
Value Function:

The agent also estimates a value function that quantifies the expected cumulative rewards associated with following a particular policy.
The value function helps the agent evaluate and improve its policy.
Diagram:

[Agent] ---- action ----> [Environment]
   ^                            |
   |                            |
   +---- rewards / penalties ---+
Process:

The agent starts with an initial policy.
It interacts with the environment, taking actions based on the policy.
After each action, it receives feedback in the form of rewards or penalties.
Over many trials, the agent adjusts its policy and value function to maximize cumulative rewards.
The learning process continues until the agent converges to an optimal or near-optimal policy.
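One concrete (and very simplified) form of this trial-and-error process is a Q-learning style value table; the corridor environment, rewards and parameters below are all invented for illustration.

    import random

    random.seed(0)
    N_STATES, GOAL = 5, 4          # a corridor of 5 states with the reward at the right end
    actions = [-1, +1]             # move left / move right
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in actions}
    alpha, gamma, epsilon = 0.5, 0.9, 0.2

    for episode in range(200):                     # repeated trials over the problem space
        s = 0
        while s != GOAL:
            a = (random.choice(actions) if random.random() < epsilon
                 else max(actions, key=lambda act: (Q[(s, act)], random.random())))
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == GOAL else 0.0         # feedback from the environment
            # Move Q(s, a) towards the reward plus the discounted best future value.
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
            s = s2

    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
    print(policy)   # states 0-3 converge to "move right", i.e. towards the goal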

230
Q

Select 5 attributes that you would use for a KNN (K Nearest
Neighbours) based application to predict the sale value of a house (or
property). How would you use each of these attributes to find the most
similar property to some given problem property? You may assume
you have access to a large database recording all property sales
during the last 12 months.

A

Attributes for KNN-Based Property Sale Value Prediction:

Property Type:

Categorize properties as single-family homes, apartments, condos, etc.
Use this attribute to filter out properties of the same type when searching for similar ones.
Location:

Use geographic coordinates (latitude and longitude) or postal codes to represent the property’s location.
Calculate distances between the problem property and others to find the closest matches in the same area.
Number of Bedrooms and Bathrooms:

Consider the number of bedrooms and bathrooms as key factors influencing property value.
Search for properties with similar bedroom and bathroom counts to the problem property.
Square Footage:

Include the size of the property’s interior living space (square footage or square meters).
Find properties with similar square footage to provide size-based comparisons.
Recent Sale Price:

Use the sale price of properties from the last 12 months as a historical reference.
When searching for similar properties, consider those with recent sale prices close to the problem property’s estimated value.
Finding Similar Properties:

For a given problem property:
Calculate the Euclidean distance or other suitable distance metric based on the selected attributes (property type, location, bedrooms, bathrooms, square footage).
Rank properties in the database by their distance to the problem property.
Select the K-nearest properties with the smallest distances to the problem property.
Average or weigh the sale prices of the K-nearest properties to estimate the sale value of the problem property.
By using these attributes and the KNN algorithm, you can find properties in the database that are most similar to the problem property and provide a sale value estimate based on their sale prices.

231
Q

Briefly outline how distance weighted KNN improves upon the
unweighted simpler alternative to generating predictions.

A

Distance-Weighted KNN vs. Unweighted KNN:

Unweighted KNN:

In unweighted KNN, all nearest neighbors have equal influence on the prediction.
Each neighbor contributes equally to the final prediction, regardless of its distance from the query point.
Distance-Weighted KNN:

In distance-weighted KNN, neighbors’ contributions are weighted based on their distance to the query point.
Closer neighbors have a stronger influence on the prediction, while more distant neighbors have a weaker influence.
Weighting accounts for the fact that closer neighbors are often more relevant and should have a greater impact on the prediction.
Advantages of Distance-Weighted KNN:

Improved Accuracy:

Distance-weighted KNN gives more importance to neighbors that are closer to the query point.
This typically leads to more accurate predictions as it considers the local density of data points.
Sensitivity to Outliers:

Unweighted KNN can be sensitive to outliers since all neighbors have equal influence.
Distance-weighted KNN reduces the impact of outliers by giving them less weight if they are far away.
Better Handling of Varying Neighborhood Sizes:

In cases where the number of neighbors varies across query points, distance weighting provides a consistent way to balance their influence.
Adaptation to Data Distribution:

Distance weighting adapts to the distribution of data points, allowing it to perform well in both densely and sparsely populated regions of the feature space.
Overall, distance-weighted KNN enhances the predictive power of the algorithm by taking into account the relative importance of neighbors based on their proximity to the query point, making it a more refined and robust version of KNN.

232
Q

What use can be made of calculating the joint probability of a word
sequence when processing natural language? Why might we
approximate the joint probability of a word sequence rather than
calculate the actual value of that joint probability?

A

Use of Calculating Joint Probability in Natural Language Processing (NLP):

Language Modeling: Joint probability helps in building language models that estimate the likelihood of a sequence of words in a given language.

Speech Recognition: Joint probability aids in speech recognition systems to identify the most likely word sequences from audio input.

Machine Translation: It is essential for machine translation systems to generate the most probable translations for input sentences.

Text Generation: In text generation tasks, it helps generate coherent and contextually relevant sentences or paragraphs.

Grammar Checking: Joint probability can be used to evaluate the grammaticality and fluency of sentences in language models and grammar checking tools.

Approximating vs. Calculating Actual Joint Probability:

Computational Complexity: Calculating the actual joint probability of a word sequence is computationally expensive, especially for long sentences and large vocabularies. It involves multiplying probabilities for each word, resulting in exponential growth in complexity.

Data Sparsity: The actual joint probability calculation requires a massive amount of training data to accurately estimate probabilities for all possible word sequences. In practice, this data may not be available.

Practicality: Approximating the joint probability using techniques like n-gram models or neural language models is more practical and scalable. These methods provide reasonably accurate estimates without the need for vast amounts of data.

Efficiency: Approximations enable efficient language modeling and natural language processing, making it feasible for real-time applications and tasks like autocomplete, speech recognition, and machine translation.

In summary, while calculating the actual joint probability is theoretically ideal, it is often infeasible due to computational constraints, data availability, and efficiency considerations. Approximating joint probability provides a practical and effective way to model and process natural language.

233
Q

What are the main similarities and difference between part-of-speech
tagging (PoS) and parsing? Choose either a constituency parser or
else a dependency parser in your comparison. Focus your answer on
the outputs generated by the two processes.

A

Similarities between Part-of-Speech Tagging (PoS) and Parsing:

Text Analysis:

Both PoS tagging and parsing are essential components of natural language processing (NLP) that analyze and structure text.
Syntactic Information:

Both processes provide syntactic information about words in a sentence.
They help identify the grammatical role of words and their relationships within the sentence.
Differences between PoS Tagging and Dependency Parsing:

Part-of-Speech Tagging (PoS):

Output:
PoS tagging assigns a specific grammatical category or tag to each word in a sentence (e.g., noun, verb, adjective).
Granularity:
PoS tagging focuses on individual words and does not capture the hierarchical structure of the sentence.
Example:
In the sentence “She sings beautifully,” PoS tagging assigns tags like “pronoun,” “verb,” and “adverb” to individual words.
Dependency Parsing:

Output:
Dependency parsing provides a tree structure that represents the grammatical relationships between words in a sentence.
Granularity:
It captures the hierarchical structure of the sentence by identifying head words and their dependents.
Example:
In the sentence “She sings beautifully,” dependency parsing generates a tree that shows “sings” as the main verb with “She” and “beautifully” as its dependents.
In summary, both PoS tagging and dependency parsing are NLP techniques that provide syntactic information, but they differ in the granularity and nature of their outputs. PoS tagging assigns tags to individual words, while dependency parsing creates a structured representation of word relationships in the form of a dependency tree.
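A short sketch contrasting the two outputs on the example sentence, assuming spaCy and its small English model are installed (the tags shown in the comments are what the model typically produces, not guaranteed values).

    # pip install spacy ; python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("She sings beautifully")

    # PoS tagging: one tag per word, no structure.
    print([(t.text, t.pos_) for t in doc])
    # e.g. [('She', 'PRON'), ('sings', 'VERB'), ('beautifully', 'ADV')]

    # Dependency parsing: head/dependent relations that form a tree.
    print([(t.text, t.dep_, t.head.text) for t in doc])
    # e.g. [('She', 'nsubj', 'sings'), ('sings', 'ROOT', 'sings'), ('beautifully', 'advmod', 'sings')]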

234
Q

Draw a likely parse tree for the following sentence, including all
relevant parts of speech. You may assume the sentence involves on
simple lexical categories such as: Determiner, Noun, Verb, and
Preposition. You may assume the use of either a constituency parser
or else a dependency parser.
“The rat chewed on the seed.”

A

Here’s a likely constituency parse tree for the sentence “The rat chewed on the seed”:

[Sentence]
  |
  +-- [Noun Phrase]
  |     +-- [Determiner]  "The"
  |     +-- [Noun]        "rat"
  |
  +-- [Verb Phrase]
        +-- [Verb]        "chewed"
        +-- [Preposition Phrase]
              +-- [Preposition]  "on"
              +-- [Noun Phrase]
                    +-- [Determiner]  "the"
                    +-- [Noun]        "seed"

The parse tree breaks down the sentence into its constituent phrases and words, labeling each with its relevant part of speech.
235
Q

How does WordNet represent the distinct meanings of words? What
relationships does it use to specify connections between distinct word meanings? Make use of suitable examples to illustrate your answer
(cat, car, dog etc.)

A

WordNet Representation of Distinct Word Meanings:

Synsets (Synonym Sets):
WordNet represents distinct word meanings or senses using synsets.
A synset is a set of words that are synonyms or have closely related meanings.
Example: In WordNet, “cat” has several synsets representing different senses, such as the feline animal and the Caterpillar tracked vehicle (a piece of heavy construction equipment).
Relationships Specifying Connections:

Hypernym-Hyponym Relationship:

WordNet uses hypernyms (superordinate) and hyponyms (subordinate) to relate broader and more specific concepts.
Example: In WordNet, “car” is a hyponym of “motor vehicle,” indicating that a car is a specific type of motor vehicle.
Antonymy Relationship:

WordNet includes antonyms to specify opposite meanings.
Example: “Hot” and “cold” are antonyms, indicating opposite temperature-related meanings.
Meronymy-Holonymy Relationship:

WordNet represents part-whole relationships using meronyms (part) and holonyms (whole).
Example: “Wheel” is a meronym of “car” because wheels are parts of a car.
Similarity (Similar To) Relationship:

WordNet captures similarity relationships to connect words with related but not identical meanings.
Example: “Dog” and “wolf” are similar to each other in the sense that they share characteristics but are not synonyms.
These relationships and synsets in WordNet help organize and define the distinct meanings of words while providing a rich resource for natural language processing and understanding.

236
Q

Describe the word-window technique for generating graph
representations of natural language. You may include suitable
examples to illustrate your answer. Briefly compare this approach to
an alternative technique.

A

Word-Window Technique for Generating Graph Representations:

Definition:

The word-window technique is used to create graph representations of natural language by considering a fixed-sized window of words in a sentence.
Within this window, words are treated as nodes, and their co-occurrence relationships are represented as edges in the graph.
Process:

Define a window size (e.g., 2, 3, or 5 words) for processing a sentence.
Slide the window through the sentence, focusing on each word one at a time.
Create nodes for the words within the window and connect them with edges to represent co-occurrence relationships.
Repeat this process for the entire sentence, generating a graph structure.
Example:

Sentence: “The quick brown fox jumps over the lazy dog.”
With a window size of 3:
Window 1: (“The”, “quick”, “brown”)
Window 2: (“quick”, “brown”, “fox”)
Window 3: (“brown”, “fox”, “jumps”)

Nodes and edges are created for each window, capturing co-occurrence relationships.
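As a concrete illustration, here is a minimal Python sketch of this windowing step (pure standard library; the window size and sentence are simply the example above):

from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(tokens, window_size=3):
    """Build an undirected co-occurrence graph: nodes are words,
    edge weights count how often two words share a window."""
    edges = defaultdict(int)
    for start in range(len(tokens) - window_size + 1):
        window = tokens[start:start + window_size]
        for w1, w2 in combinations(sorted(set(window)), 2):
            edges[(w1, w2)] += 1
    return edges

tokens = "the quick brown fox jumps over the lazy dog".split()
graph = cooccurrence_graph(tokens, window_size=3)
print(graph[("brown", "quick")])  # 2: "quick" and "brown" appear together in two windows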
Comparison with Alternative Techniques:

Word2Vec:

Word2Vec uses neural networks to generate word embeddings, capturing semantic relationships between words.
Unlike the word-window technique, Word2Vec produces dense vector representations, which are continuous and distributed.
Graph-Based Parsing:

Graph-based parsing techniques, such as dependency parsing, generate graph structures to represent syntactic relationships between words.
These approaches focus on grammatical relationships, whereas the word-window technique emphasizes co-occurrence patterns.
In summary, the word-window technique is a simple method for creating graph representations of natural language based on word co-occurrences. It captures local relationships within a fixed-sized window of words in a sentence. Alternative techniques like Word2Vec and graph-based parsing offer different approaches to represent semantic and syntactic information in language

237
Q

In general terms, describe how the Generate and Test strategy
can help solve some difficult problems.

A

Generate and Test Strategy:

Process:

Generate and Test is a problem-solving strategy where potential solutions are systematically or randomly generated.
Each generated solution is then tested against specific criteria or constraints to determine its validity or quality.
Advantages:

Versatility: It can be applied to a wide range of difficult problems without requiring specialized domain knowledge.
Exploration: It explores the solution space by generating diverse candidates, increasing the chances of finding a suitable solution.
Simplicity: It doesn’t rely on complex heuristics or optimization algorithms, making it accessible and easy to implement.
Examples:

Generate and Test can be used in:
Combinatorial optimization problems like the traveling salesman problem, where different routes are generated and tested for their lengths.
Algorithm design, where various algorithms are generated and tested for their efficiency on a particular task.
Design problems, such as finding the optimal layout for a circuit board by generating different arrangements and testing for connectivity and efficiency.
Overall, Generate and Test is a flexible strategy that systematically explores the solution space and evaluates potential solutions based on predefined criteria, making it valuable for solving difficult problems in various domains.
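A generic Generate and Test loop can be sketched in a few lines of Python; the generator and test below are placeholders filled with a trivial illustrative task (finding a pair of numbers with a given sum and product), not a problem taken from the course:

import itertools

def generate_and_test(generator, test):
    """Return the first generated candidate that passes the test, or None."""
    for candidate in generator:
        if test(candidate):
            return candidate
    return None

# Illustrative use: search pairs (a, b) with a + b == 12 and a * b == 35.
candidates = itertools.product(range(1, 11), repeat=2)
solution = generate_and_test(candidates, lambda p: p[0] + p[1] == 12 and p[0] * p[1] == 35)
print(solution)  # (5, 7)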

238
Q

Tic-Tac-Toe (or OXO) is a 2 player turn-based game played on a
3X3 grid. The first player to get three of their markers (either X or
O) in a row wins. Describe in clear English (or pseudocode) a
simple process that will take the current state of the OXO game
and generate all legal (immediate) next moves only. In addition,
you may assume your generator will make a move for the “X”
player only. Your solution should work for any current state of
the game.

A

Here’s a simple pseudocode process to generate all legal immediate next moves for the “X” player in a Tic-Tac-Toe game:

function generateLegalMoves(currentState):
legalMoves = []
for each row in 3x3 grid:
for each column in 3x3 grid:
if currentState[row][column] == empty:
// If the current cell is empty, it’s a legal move
newMove = copy(currentState) // Create a copy of the current state
newMove[row][column] = “X” // Place “X” in the empty cell
legalMoves.append(newMove) // Add the new state to the list of legal moves
return legalMoves
This pseudocode describes a function that iterates through each cell in the 3x3 grid, checks if it’s empty, and if so, generates a new game state with “X” placed in that cell. It then adds this new state to the list of legal moves.
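A runnable Python version of the same generator, assuming the board is held as a 3x3 list of lists with "X", "O", and None for empty cells:

import copy

def generate_legal_moves(current_state):
    """Return every board that can result from X making one legal move."""
    legal_moves = []
    for row in range(3):
        for col in range(3):
            if current_state[row][col] is None:            # empty cell => legal move
                new_state = copy.deepcopy(current_state)   # do not mutate the input board
                new_state[row][col] = "X"
                legal_moves.append(new_state)
    return legal_moves

# Example: X to move on a partially played board.
board = [["X", "O", None],
         [None, "X", None],
         ["O", None, None]]
print(len(generate_legal_moves(board)))  # 5 empty cells => 5 legal moves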

239
Q

In your own words describe the relevance of the following terms,
as used in relation to search driven Chess playing system:
Quiescent Search and the Null Move. How do they improve the
efficiency of the search process?

A

Quiescent Search:

In a search-driven chess-playing system, quiescent search is a critical concept.
It refers to a search extension applied at the nominal depth limit: rather than evaluating a “noisy” position (one with pending captures, checks, or other tactical possibilities), the search continues until a relatively “quiet” or stable position is reached.
Evaluating the heuristic only on quiet positions makes the static evaluation reliable, so the engine avoids misjudging positions in the middle of a tactical exchange.
Although it extends some lines slightly, it improves overall efficiency because effort is not wasted acting on unstable, misleading evaluations.
This technique helps prevent the “horizon effect,” where the search may terminate at a point where a tactical opportunity or threat arises just beyond the current search depth.
Null Move:

The null move is another technique used in chess engines to enhance search efficiency.
It involves temporarily passing the turn to the opponent, effectively skipping one of the player’s moves (making a “null move”).
The idea is that if the player can make a null move and still have a good position, it suggests a favorable situation, allowing the engine to prune or reduce the search depth for that branch.
Null moves are especially useful in positions where one player has a significant advantage because they allow the engine to quickly reach a winning outcome.
However, null moves are used cautiously because they can potentially miss important tactics if overused.
By using null moves strategically, the search process becomes more efficient by quickly identifying favorable positions and avoiding exhaustive searches in unwinnable situations.

240
Q
A

D value of 3.

241
Q

Consider the problem of finding the shortest path between two
nodes in a network (social network or other graph structure).
What is bi-directional search and how does it improve upon unidirectional search?

A

Bi-Directional Search:

Bi-directional search is a search algorithm used to find the shortest path between two nodes (start and goal) in a graph or network.
It differs from unidirectional search in that it simultaneously explores paths from both the start node and the goal node, meeting in the middle.
Bi-directional search uses two search processes, one starting from the initial node (forward search) and the other from the goal node (backward search).
The goal is to meet in the middle, where the two searches intersect, and find the shortest path.
Advantages of Bi-Directional Search:

Efficiency:
Bi-directional search is often more efficient than unidirectional search, especially in large graphs.
It reduces the search space by exploring paths from both ends: two searches of roughly half the depth expand far fewer nodes in total than one full-depth search (on the order of 2·b^(d/2) versus b^d for branching factor b and depth d).
Faster Convergence:
By searching from both directions, it tends to converge faster and find a solution more quickly.
This is especially beneficial in situations where the search space is extensive.
Improved Scalability:
Bi-directional search scales well for finding paths between distant nodes in the graph.
It is particularly useful when the start and goal nodes are far apart.
Reduced Memory Usage:
Bi-directional search often requires less memory compared to unidirectional search because it explores fewer nodes.
Example:

In a social network, finding the shortest path between two users can be time-consuming with unidirectional search.
Bi-directional search simultaneously explores connections from both users, making it faster and more efficient in finding the shortest path.
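A compact sketch of bi-directional breadth-first search on an unweighted, undirected graph (the adjacency-list example is invented for illustration):

from collections import deque

def bidirectional_search(graph, start, goal):
    """Return the length of the shortest path between start and goal,
    expanding one level of each frontier in turn until they meet."""
    if start == goal:
        return 0
    dist_start, dist_goal = {start: 0}, {goal: 0}
    frontier_start, frontier_goal = deque([start]), deque([goal])
    while frontier_start and frontier_goal:
        for frontier, dist, other in ((frontier_start, dist_start, dist_goal),
                                      (frontier_goal, dist_goal, dist_start)):
            for _ in range(len(frontier)):                 # expand one full level
                node = frontier.popleft()
                for neighbour in graph.get(node, []):
                    if neighbour in other:                 # the two frontiers meet here
                        return dist[node] + 1 + other[neighbour]
                    if neighbour not in dist:
                        dist[neighbour] = dist[node] + 1
                        frontier.append(neighbour)
    return None  # no path exists

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
print(bidirectional_search(graph, "A", "E"))  # 3 (A -> B or C -> D -> E)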

242
Q

Consider a K Nearest Neighbours (KNN) system to predict the
sale value of a house (or property). Select 5 different attributes
(or parts of attributes, such as the house number taken from the
address) and describe how you would compare each attribute,
when searching for the most similar properties. You may
assume you have access to a large database recording all
property sales during the last 5 years. Marks will be awarded for
the range of approaches you use.

A

Here are five different attributes or parts of attributes that can be used to compare properties in a K Nearest Neighbors (KNN) system for predicting property sale value:

Location Attributes:

Geographic coordinates (latitude and longitude) or postal codes.
Calculate distances between properties to find the closest matches in terms of location.
Use methods like Haversine distance to account for the curvature of the Earth when comparing coordinates.
Property Size and Features:

Square footage or square meters of the property.
Number of bedrooms, bathrooms, and other relevant features (e.g., swimming pool, garage).
Compare properties with similar size and feature sets.
Sale History:

Sale price history of the property over the past 5 years.
Analyze trends in price changes and assess how the current asking price compares to historical data.
Neighborhood Demographics:

Demographic data of the neighborhood where the property is located.
Include factors like average income, crime rate, school quality, and population density.
Find properties in neighborhoods with similar demographic characteristics.
Market Trends:

Real estate market trends over the past 5 years.
Examine factors like overall market appreciation or depreciation rates in the area.
Compare the property’s value changes with broader market trends to assess its potential.
By comparing these attributes, the KNN system can identify properties with similar characteristics, locations, sale histories, and market dynamics, making it a comprehensive approach for predicting property sale values.

243
Q

Describe each of the following activities and their relevance to
Natural Language Processing (NLP). Illustrate each with a
suitable example.
i) Tokenization
ii) Chunking
iii) Coreference Resolution
iv) Named Entity Recognition (NER)
v) Sentiment Analysis

A

Tokenization:

Relevance to NLP: Tokenization is a fundamental step in NLP that involves splitting a text into individual words or tokens. It is essential for various NLP tasks.
Example: In the sentence “I love natural language processing,” tokenization results in the tokens: [“I”, “love”, “natural”, “language”, “processing”].
Chunking:

Relevance to NLP: Chunking involves grouping adjacent words or tokens into meaningful chunks, such as noun phrases or verb phrases. It aids in syntactic analysis and information extraction.
Example: In the sentence “The cat chased the mouse,” chunking may identify the chunks: [“The cat”, “chased”, “the mouse”].
Coreference Resolution:

Relevance to NLP: Coreference resolution aims to identify when different words or phrases in a text refer to the same entity. It helps in understanding the relationships between words and maintaining coherence.
Example: In the sentence “John saw a bird. He was amazed,” coreference resolution would determine that “He” refers to “John.”
Named Entity Recognition (NER):

Relevance to NLP: NER identifies and classifies named entities (e.g., names of people, organizations, locations) in a text. It is essential for information extraction and understanding document content.
Example: In the text “Apple Inc. is headquartered in Cupertino, California,” NER would tag “Apple Inc.” as an organization and “Cupertino, California” as a location.
Sentiment Analysis:

Relevance to NLP: Sentiment analysis determines the sentiment or emotional tone expressed in text, such as positive, negative, or neutral. It is used for tasks like social media monitoring and customer feedback analysis.
Example: In the review “The movie was fantastic! I loved it,” sentiment analysis would classify the sentiment as positive based on the positive words “fantastic” and “loved.”

244
Q

How can we consider many English nouns to represent
categories? Consider the WordNet hierarchy and outline two
strategies to quantify the similarity between a given pair of
concepts.

A

Considering English Nouns as Categories:

English nouns often represent categories or concepts that group similar entities or objects.
For example, the noun “bird” represents a category that includes various species like sparrows, eagles, and robins.
WordNet Hierarchy:

WordNet organizes nouns into a hierarchical structure, where hypernyms (superordinate categories) are linked to hyponyms (subordinate categories).
For example, “animal” is a hypernym of “bird,” which is a hyponym of “animal.”
Strategies to Quantify Similarity:

Path Similarity:

Path similarity quantifies the similarity between two concepts by measuring the length of the shortest path connecting them in the WordNet hierarchy.
Shorter paths indicate higher similarity, and longer paths suggest lower similarity.
Example: The path between “bird” and “mammal” in the hierarchy is shorter than the path between “bird” and “insect,” so the path similarity score is higher, indicating that “bird” is more similar to “mammal” than to “insect.”
Wu-Palmer Similarity:

Wu-Palmer similarity takes into account the depth of the concepts in the hierarchy and the depth of their least common subsumer (LCS).
It calculates similarity as 2 * depth(LCS) / (depth(concept1) + depth(concept2)).
Higher Wu-Palmer similarity values represent greater similarity.
Example: The least common subsumer of “bird” and “mammal” (e.g., “vertebrate”) lies deeper in the hierarchy than the least common subsumer of “bird” and “insect” (“animal”), so the Wu-Palmer similarity is higher between “bird” and “mammal” than between “bird” and “insect.”
These strategies use the WordNet hierarchy to quantify the similarity between English nouns representing categories, allowing for semantic comparisons based on their positions and relationships in the hierarchy.
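As a sketch, both measures are exposed directly by NLTK's WordNet interface (this assumes NLTK is installed and the WordNet corpus has been downloaded, e.g. via nltk.download('wordnet')):

from nltk.corpus import wordnet as wn

bird = wn.synset("bird.n.01")
mammal = wn.synset("mammal.n.01")
insect = wn.synset("insect.n.01")

# Path similarity: 1 / (shortest path length + 1)
print(bird.path_similarity(mammal), bird.path_similarity(insect))

# Wu-Palmer similarity: 2 * depth(LCS) / (depth(s1) + depth(s2))
print(bird.wup_similarity(mammal), bird.wup_similarity(insect))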

245
Q

What is meant by the term Embedding as used in relation to
Word2vec (W2V)? How does W2V use embeddings to calculate
a numerical estimate of the semantic similarity between two
words? Use an example to illustrate your answer.

A

Embedding in Word2Vec (W2V):

In Word2Vec, “embedding” refers to a vector representation of words in a continuous vector space.
Each word is embedded as a vector in such a way that similar words have similar vector representations.
Calculating Semantic Similarity in W2V:

W2V calculates semantic similarity between words based on the cosine similarity between their vector embeddings.
The cosine similarity measures the cosine of the angle between two vectors, where a smaller angle corresponds to a higher similarity.
Example:

Let’s consider two words: “king” and “queen,” and their Word2Vec embeddings as follows:
king = [0.4, 0.6, 0.2]
queen = [0.3, 0.7, 0.1]
Cosine Similarity Calculation:

To calculate the semantic similarity between “king” and “queen” using their embeddings:
Compute the dot product of the two vectors: 0.4 * 0.3 + 0.6 * 0.7 + 0.2 * 0.1 = 0.12 + 0.42 + 0.02 = 0.56
Calculate the magnitudes of each vector: |king| = sqrt(0.4^2 + 0.6^2 + 0.2^2) = sqrt(0.56) ≈ 0.75 and |queen| = sqrt(0.3^2 + 0.7^2 + 0.1^2) = sqrt(0.59) ≈ 0.77
Compute the cosine similarity: Cosine Similarity = 0.56 / (0.75 * 0.77) ≈ 0.97
Semantic Similarity Interpretation:

A cosine similarity of approximately 0.97 suggests that “king” and “queen” are highly similar words in this (toy) vector space.
The smaller the angle between their embeddings, the higher the cosine similarity, indicating greater semantic similarity.
Word2Vec uses these embeddings and cosine similarity calculations to estimate the semantic similarity between any pair of words, enabling various NLP tasks like word analogy solving, document clustering, and more.
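The same cosine calculation, sketched in Python using the toy three-dimensional vectors above (real Word2Vec embeddings have hundreds of dimensions, but the arithmetic is identical):

import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|)"""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

king = [0.4, 0.6, 0.2]
queen = [0.3, 0.7, 0.1]
print(round(cosine_similarity(king, queen), 2))  # 0.97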

246
Q

Describe how Sense2Vec overcomes some of the limitations
associated with W2V, using a suitable example to illustrate the
difference. Briefly outline some of the advantages and
disadvantages of using S2V and WordNet as a basis for
estimating similarity between terms.

A

Sense2Vec vs. Word2Vec:

Sense Disambiguation:

Sense2Vec addresses the ambiguity of word meanings by providing multiple embeddings for polysemous words.
In contrast, Word2Vec assigns a single embedding to a word, potentially mixing different senses.
Example:

For the word “bank,” Word2Vec would provide a single vector that may represent both “financial institution” and “riverbank.”
In Sense2Vec, “bank” would have separate embeddings for each sense, making it easier to distinguish between them.
Advantages of Sense2Vec:

Fine-Grained Similarity:
Sense2Vec allows for more fine-grained similarity calculations by considering the specific sense of a word.
Improved NLP Tasks:
Sense2Vec enhances performance in tasks requiring sense disambiguation, such as machine translation and sentiment analysis.
Disadvantages of Sense2Vec:

Complexity:
Managing and using multiple embeddings for polysemous words can be computationally complex.
Data Size:
Sense2Vec models may require larger amounts of training data and storage space compared to Word2Vec.
Using Sense2Vec and WordNet for Similarity Estimation:

Sense2Vec:

Advantage: Provides fine-grained sense-specific similarity measures, improving accuracy for certain NLP tasks.
Disadvantage: Requires more data and computational resources, and may not always outperform Word2Vec for all applications.
WordNet:

Advantage: Leverages a structured lexical database to provide detailed semantic relationships.
Disadvantage: May not capture fine-grained sense differences as effectively as Sense2Vec, and it requires manual curation.
Depending on the specific NLP task and resources available, choosing between Sense2Vec and WordNet as a basis for estimating similarity will depend on the trade-offs between granularity, complexity, and data requirements.

247
Q

Consider the following equation, where Wi is word number i in
some word sequence and ti is the part of speech (PoS) tag for
that word. Describe in your own words how this equation
overcomes some of the limitations of simpler approaches. You
may include examples to illustrate your answer.
arg max over t1…tn of Π (i = 1…n) P(ti | ti-1) * P(Wi | ti)

A

Equation Overview:

The equation arg max over tag sequences t1…tn of Π (i = 1…n) P(ti | ti-1) * P(Wi | ti) is used for part-of-speech (PoS) tagging in natural language processing: it selects the tag sequence that maximizes the product of tag-transition probabilities and word-emission probabilities.
Overcoming Limitations:

Conditional Dependencies:

The equation considers conditional dependencies between adjacent words and their PoS tags. It factors in both the likelihood of a word’s PoS tag given its previous tag (P(ti|ti-1)) and the likelihood of observing the word given its PoS tag (P(Wi|ti)).
Contextual Information:

By incorporating both previous PoS tags and word-PoS tag associations, this approach captures contextual information.
Example: In the sentence “He ate the apple,” the model rates “ate” as a likely verb (V) because the transition probability from the preceding pronoun tag (PRP for “He”) to a verb, P(V | PRP), is high, and the emission probability P(“ate” | V) is also high.
Transition Probabilities:

It models the transition probabilities between PoS tags, helping to distinguish between homonyms or ambiguous words.
Example: In the sentence “She will bank on it,” the model can differentiate between “bank” as a noun (NN) and “bank” as a verb (VB) based on the transition probabilities.
Example:

Consider the word “lead.” Depending on context, it can be a noun (e.g., “a lead”) or a verb (e.g., “to lead”).
The equation helps disambiguate this word by analyzing the context and transition probabilities, leading to a more accurate PoS tagging, such as:
“a lead” → Noun (NN)
“to lead” → Verb (VB)
By considering both context and transition probabilities, this equation improves PoS tagging accuracy compared to simpler approaches that do not account for contextual dependencies.
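A small illustrative calculation of the same arg max for the word "lead", using invented transition and emission probabilities purely to show how the two candidate tags are compared:

# Hypothetical probabilities for tagging "lead" after a determiner (DT)
# versus after the infinitive marker "to" (TO). All numbers are made up.
transition = {("DT", "NN"): 0.5, ("DT", "VB"): 0.05,
              ("TO", "NN"): 0.1, ("TO", "VB"): 0.6}
emission = {("lead", "NN"): 0.002, ("lead", "VB"): 0.003}

def best_tag(prev_tag, word):
    """Pick the tag t maximizing P(t | prev_tag) * P(word | t)."""
    scores = {tag: transition[(prev_tag, tag)] * emission[(word, tag)]
              for tag in ("NN", "VB")}
    return max(scores, key=scores.get), scores

print(best_tag("DT", "lead"))  # "a lead"  -> NN wins
print(best_tag("TO", "lead"))  # "to lead" -> VB wins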

248
Q

What is parsing? Draw a likely (constituency) parse tree for
the following sentence. You may assume all grammatical terms
(S, VP, NP…) have the same meaning as discussed in class.
“I shot an elephant in my pyjamas”.

A

Parsing:

Parsing is the process of analyzing the grammatical structure of a sentence to determine its syntactic components and their relationships.
Likely Constituency Parse Tree:

The provided sentence: “I shot an elephant in my pyjamas.”
Likely constituency parse tree:

(S)
├── (NP)
│   └── (PRP) "I"
├── (VP)
│   ├── (V) "shot"
│   ├── (NP)
│   │   ├── (DT) "an"
│   │   └── (NN) "elephant"
│   └── (PP)
│       ├── (IN) "in"
│       └── (NP)
│           ├── (PRP$) "my"
│           └── (NNS) "pyjamas"
└── (.) "."

In this parse tree, the sentence is divided into its constituent parts, including the pronoun subject (PRP), noun phrases (NP), the verb phrase (VP), a determiner (DT), nouns (NN), a prepositional phrase (PP), a preposition (IN), a possessive pronoun (PRP$), and a plural noun (NNS). Parsing helps identify the grammatical structure of the sentence, making it easier to analyze and understand its syntactic components.
249
Q

What are Transformers and how are they trained? Briefly
describe how they are used to generate seemingly coherent
language.

A

Transformers:

Transformers are a type of deep learning model used in natural language processing (NLP) tasks.
They were introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017 and have since become a foundation for many NLP applications.
Training Transformers:

Transformers are trained using large amounts of text data in an unsupervised or semi-supervised manner.
They employ a self-attention mechanism that allows the model to weigh the importance of different words in a sentence when making predictions.
Pre-training involves learning contextual embeddings for words, sentences, or documents by predicting missing words in sentences (masked language modeling) or determining if two sentences are related (next sentence prediction).
Fine-tuning is done on specific NLP tasks, such as text classification, machine translation, or text generation, using labeled data.
Generating Coherent Language:

Transformers generate coherent language by:
Leveraging learned contextual embeddings to capture the context and meaning of words within a sentence.
Using self-attention mechanisms to consider the relationships between words, ensuring that generated text is contextually relevant.
Employing techniques like beam search or sampling to produce diverse but coherent text outputs.
Combining the capabilities of pre-training and fine-tuning to adapt to specific language generation tasks.
In summary, Transformers are trained on large datasets and use self-attention mechanisms to capture context, allowing them to generate seemingly coherent language in a wide range of NLP applications.

250
Q

Briefly outline how heuristic search improves upon blind search.
Describe some limitations encountered by heuristic search,
focusing on local optima and plateau.

A

Heuristic Search vs. Blind Search:

Heuristic Search:

Utilizes domain-specific knowledge (heuristics) to guide the search process.
Makes informed decisions about which nodes or paths to explore next based on estimated cost-to-goal.
Generally more efficient in terms of time and resources compared to blind search.
Examples include A* search and Greedy Best-First search.
Blind Search:

Searches without any specific knowledge about the problem domain.
Explores nodes or paths in a systematic way, such as breadth-first or depth-first.
Can be inefficient, especially in large state spaces.
May not consider the actual goal or problem structure during the search.
Limitations of Heuristic Search:

Local Optima:

Heuristic search algorithms, especially greedy methods, can get stuck in local optima.
These are solutions that appear optimal within a limited search scope but are not the global best.
Overreliance on heuristics can lead to suboptimal results.
Plateau Problem:

Plateaus are regions in the search space where the heuristic values remain nearly constant.
Heuristic search algorithms can struggle to make progress on plateaus as they lack information to distinguish better paths.
Plateaus can cause stagnation in the search.
Heuristic Accuracy:

The effectiveness of heuristic search depends on the quality and accuracy of the heuristics used.
Inaccurate or poorly chosen heuristics may misguide the search, leading to suboptimal or incorrect solutions.
Computation and Memory:

Some heuristic search algorithms require significant computational resources and memory, especially in large state spaces.
The trade-off between heuristic accuracy and computational cost must be considered.
In summary, heuristic search improves upon blind search by incorporating domain-specific knowledge, but it still has limitations related to local optima, plateaus, heuristic accuracy, and resource requirements. These challenges must be addressed to design effective heuristic search algorithms.

251
Q

Consider a heuristic function designed to evaluate the board state
of a simple 2-player game (like checkers or chess). Describe three
game features that you would look for in some given board state,
briefly commenting on the relevance of each to identifying fruitful
areas of the search space.

A

Game Features for Heuristic Evaluation:
Material Advantage:

Evaluating the difference in the number and value of pieces for each player.
Relevance: A significant material advantage can indicate a favorable position, as it often implies a greater number of options and potential threats for the player with more pieces.
Piece Mobility:

Assessing the ability of each player’s pieces to move and control the board.
Relevance: High piece mobility suggests greater control and flexibility in future moves, increasing the chances of achieving a favorable position.
King Safety:

Analyzing the safety of the players’ kings or key pieces from potential threats.
Relevance: A well-protected king is crucial to avoid immediate threats and maintain long-term strategic opportunities, making it a key aspect of board evaluation.
These game features provide insights into the current state of the game, helping to identify fruitful areas of the search space by highlighting aspects such as material advantage, mobility, and king safety, which are critical for strategic decision-making in games like chess or checkers

252
Q

Describe in your own words, how the A* search algorithm might
be used to plan the route of a mobile robot from its current location
to some given destination.

A

A* Search for Robot Path Planning:

A* search is used to plan the route of a mobile robot from its current location to a specified destination by considering the following steps:
State Representation:
Represent the environment as a graph, where each node corresponds to a possible robot position, and edges represent feasible movements between positions.
Heuristic Function:
Define a heuristic function that estimates the cost or distance from each node to the destination. This function guides the search by providing an optimistic estimate of the remaining effort to reach the goal.
Initialization:
Initialize the search with the robot’s current position as the starting node.
Open and Closed Sets:
Create two sets: an “open set” to store nodes to be evaluated and a “closed set” to keep track of nodes already evaluated.
Cost and Evaluation:
Calculate the cost of reaching each neighboring node from the current node. This cost includes the cost to reach the node from the start and the estimated remaining cost (heuristic).
Add neighboring nodes to the open set for future evaluation.
Search Loop:
Repeatedly select the node with the lowest estimated cost (combined cost and heuristic) from the open set for evaluation.
Evaluate the selected node, considering its neighbors and updating their costs and parents if a shorter path is found.
Move the evaluated node to the closed set to prevent redundant evaluations.
Goal Check:
After evaluating each node, check if the goal (destination) has been reached. If the goal is found, reconstruct the path from the goal node to the start node using the parent references stored during the search.
Path Planning:
The path is planned by following the parent references from the goal node to the start node, which represents the optimal route for the robot to reach its destination.
Execution:
The robot can follow the planned path, moving from one position to the next while avoiding obstacles and reaching the desired destination efficiently.
A* search helps the robot navigate through the environment by considering both the cost incurred so far and an estimate of the remaining cost, ensuring an optimal path is found while efficiently exploring the search space.
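A minimal A* sketch on a 2D occupancy grid (0 = free cell, 1 = obstacle), using Manhattan distance as the admissible heuristic; the grid, start, and goal are illustrative only:

import heapq

def a_star(grid, start, goal):
    """Return a list of grid cells from start to goal, or None if unreachable."""
    def h(cell):                               # admissible heuristic: Manhattan distance
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start)]          # entries are (f = g + h, g, cell)
    parent, g_cost = {start: None}, {start: 0}
    while open_set:
        _, g, current = heapq.heappop(open_set)
        if current == goal:                    # reconstruct path via parent links
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = current[0] + dr, current[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                if (r, c) not in g_cost or g + 1 < g_cost[(r, c)]:
                    g_cost[(r, c)] = g + 1
                    parent[(r, c)] = current
                    heapq.heappush(open_set, (g + 1 + h((r, c)), g + 1, (r, c)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))  # the route around the obstacles, 7 cells long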

253
Q

Describe in your own words the most important differences
between A* search and MiniMax search (in the presence of an
adversary)?

A

A* Search vs. MiniMax Search (Adversarial):
Objective:

A* Search: A* search is used for finding the shortest path or optimal solution in a single-player environment where there is no adversary. It minimizes the cost or distance to a goal state.
MiniMax Search: MiniMax search is used for making decisions in adversarial environments, such as two-player games like chess or tic-tac-toe. It aims to maximize the minimum payoff while considering an opponent’s moves.
Environment:

A* Search: A* operates in a deterministic environment where actions and their outcomes are known with certainty.
MiniMax Search: MiniMax operates in an adversarial environment where the outcome depends on an opponent whose choices the player cannot control or fully predict.
Utility Function:

A* Search: A* uses a heuristic function to estimate the cost or distance to the goal state.
MiniMax Search: MiniMax uses a utility function to assign values to game states, representing the desirability of a state for the current player.
Search Space:

A* Search: A* explores a state space to find an optimal path or solution from the initial state to the goal state.
MiniMax Search: MiniMax explores a game tree representing possible sequences of moves and counter-moves in a two-player game.
Player Roles:

A* Search: A* does not involve player roles as it operates in single-player scenarios.
MiniMax Search: MiniMax involves two players, one maximizing their utility and the other minimizing it, leading to a competitive decision-making process.
Optimality:

A* Search: A* guarantees finding an optimal solution if the heuristic is admissible and consistent.
MiniMax Search: MiniMax aims to find a strategy that maximizes the minimum outcome for the current player, but optimality depends on the depth of search and the quality of the utility function.
Branching Factor:

A* Search: The branching factor in A* depends on the state space, and it explores nodes based on their estimated cost.
MiniMax Search: The branching factor in MiniMax depends on the game’s complexity, as it explores all possible moves and counter-moves.
In summary, the most important differences between A* search and MiniMax search in the presence of an adversary are their objectives, use of utility functions, consideration of an opponent’s actions, and their applicability to single-player and two-player environments. A* aims for optimal solutions in deterministic, single-agent environments, while MiniMax deals with competitive decision-making in adversarial environments.

254
Q

List 5 relevant attributes likely to be available to a system
developed to estimate the sale-value of a building (house,
apartment etc). Describe how you would compare such buildings
to identify the most similar buildings from within a large
national database of recent sales.

A

Relevant Attributes for Building Sale-Value Estimation:

Location:

The geographical location of the building, including factors like neighborhood, city, and proximity to amenities, schools, and transportation hubs.
Property Size:

The size of the building, including the number of bedrooms, bathrooms, total square footage, and the size of the lot or land it occupies.
Condition:

The overall condition of the building, including recent renovations, maintenance, and any structural issues or upgrades.
Age of the Building:

The age of the building, as older buildings may have different characteristics and maintenance needs compared to newer ones.
Comparable Sales:

Data on recent sales of similar buildings in the area, including their sale prices, which can be used for comparative analysis.
Comparing Buildings for Similarity:

To identify the most similar buildings from a large national database of recent sales, you can use various similarity metrics and machine learning techniques:
Feature-Based Comparison:

Calculate a similarity score based on the selected attributes (e.g., location, size, condition).
Use techniques like Euclidean distance, cosine similarity, or Mahalanobis distance to measure similarity between buildings.
Machine Learning Models:

Train a machine learning model, such as a k-nearest neighbors (KNN) algorithm or a regression model, using the attributes as features and the sale price as the target variable.
Use the trained model to predict the sale price of a building and compare it with the prices of other buildings in the database.
Clustering:

Group similar buildings into clusters based on their attributes.
Find the cluster to which the target building belongs and identify the most similar buildings within that cluster.
Recommendation Systems:

Implement a recommendation system that suggests similar buildings based on user preferences and historical sales data.
Collaborative filtering or content-based filtering can be used to make recommendations.
Geospatial Analysis:

Use geospatial analysis to identify buildings in close proximity to the target location.
Consider factors like distance to amenities and neighborhood characteristics to find similar buildings.
By employing these techniques and considering the relevant attributes, you can effectively compare and identify the most similar buildings within a large national database of recent sales, providing valuable insights for estimating sale values.

255
Q

Assuming you have identified the K-Nearest Neighbours (KNN)
for a building, how would you use these K sale values to predict
the sale value of some given building that is new to the market?

A

Using K-Nearest Neighbors (KNN) for Sale Value Prediction:
Select a Value of K:

Determine the value of K, which represents the number of nearest neighbors to consider for prediction. This can be based on a heuristic or determined through cross-validation.
Identify the K Nearest Neighbors:

Calculate the similarity or distance between the new building and all other buildings in the database, using attributes such as location, size, condition, and age.
Select the K buildings with the closest attributes to the new building.
Weighted or Unweighted KNN:

Decide whether to use weighted or unweighted KNN.
Unweighted KNN assigns equal importance to each of the K neighbors.
Weighted KNN assigns different weights to neighbors based on their similarity or distance, giving more weight to closer neighbors.
Calculate the Predicted Sale Value:

For unweighted KNN, calculate the average sale value of the K nearest neighbors.
For weighted KNN, calculate a weighted average of the sale values of the K nearest neighbors, where the weights are based on their similarity or distance to the new building.
Sale Value Prediction:

The calculated average or weighted average is the predicted sale value for the new building.
Evaluation and Fine-Tuning:

Evaluate the prediction accuracy using metrics like mean absolute error (MAE) or root mean square error (RMSE).
Fine-tune the KNN model and the choice of K to optimize prediction performance.
By following these steps, you can use the K-Nearest Neighbors algorithm to predict the sale value of a new building based on the sale values of its nearest neighbors in the database.
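A small sketch of the final prediction step, assuming the K nearest neighbours and their distances have already been found (the neighbour data below is invented):

def knn_predict(neighbours, weighted=True):
    """neighbours: list of (distance, sale_value) pairs for the K nearest buildings."""
    if not weighted:
        return sum(value for _, value in neighbours) / len(neighbours)
    # Weighted KNN: closer neighbours get larger weights (inverse distance).
    weights = [1.0 / (dist + 1e-9) for dist, _ in neighbours]
    return sum(w * value for w, (_, value) in zip(weights, neighbours)) / sum(weights)

k_nearest = [(0.5, 310_000), (0.8, 295_000), (1.2, 340_000)]  # (distance, sale price)
print(round(knn_predict(k_nearest, weighted=False)))  # simple average: 315000
print(round(knn_predict(k_nearest, weighted=True)))   # weighted toward the closest sales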

256
Q

In your own words outline how Reinforcement Learning (RL) can
find good solutions to problems, even when it receives minimal
feedback (reward)? How is a fixed policy related to RL?

A

Reinforcement Learning (RL) and Minimal Feedback:
Exploration and Learning:

RL agents initially explore their environment by taking various actions without precise knowledge of their consequences.
Even with minimal feedback or sparse rewards, the agent learns through trial and error.
Reward Signals:

RL relies on reward signals to provide feedback on the desirability of actions and states.
Minimal feedback means that the agent may receive rewards infrequently, making it challenging to learn an optimal policy.
Policy Iteration:

RL employs a policy that defines the agent’s strategy or behavior in the environment.
The agent iteratively improves its policy by exploring and adapting based on the observed rewards.
Balancing Exploration and Exploitation:

Despite limited feedback, RL agents balance exploration (trying new actions) and exploitation (choosing actions that are known to yield higher rewards) to gradually improve their policies.
Temporal Credit Assignment:

RL agents must attribute rewards to specific actions taken in the past, which can be challenging with sparse feedback.
Techniques like temporal difference learning help in assigning credit to actions that contributed to favorable outcomes.
Fixed Policy in RL:

A fixed policy in RL refers to a strategy that the agent follows without adaptation or learning.
Initially, an RL agent may have a fixed or random policy when it lacks knowledge about the environment.
Over time, the agent’s goal is to learn and improve its policy to maximize cumulative rewards.
In summary, RL can find good solutions with minimal feedback by exploring the environment, iteratively learning from sparse rewards, and gradually improving its policy. A fixed policy is a starting point in RL, but the ultimate objective is to develop an adaptive and optimal policy through learning and experience.
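A minimal sketch of how a sparse reward propagates backwards through temporal-difference learning, here a single tabular Q-learning update rule (states, actions, and numbers are illustrative):

from collections import defaultdict

q = defaultdict(float)          # Q(state, action) table, initialised to 0
alpha, gamma = 0.1, 0.9         # learning rate and discount factor

def q_update(state, action, reward, next_state, actions):
    """One TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# A reward received only at the goal still shifts the value of earlier actions.
q_update("s1", "right", reward=0.0, next_state="s2", actions=["left", "right"])
q_update("s2", "right", reward=1.0, next_state="goal", actions=["left", "right"])
q_update("s1", "right", reward=0.0, next_state="s2", actions=["left", "right"])
print(q[("s1", "right")], q[("s2", "right")])  # the credit has started to flow back to s1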

257
Q

Describe the process of tokenization. Illustrate some of the
different tokens that might be identified from the following
sentence.
I won’t be long; ‘till the 5th of June.

A

Tokenization Process:
Definition:

Tokenization is the process of breaking down a text or sentence into smaller units called tokens. Tokens are typically words or meaningful chunks.
Tokenization Rules:

Tokenization follows certain rules to separate text into tokens, such as splitting on spaces, punctuation, or specific characters.
Illustration:

In the sentence “I won’t be long; ‘till the 5th of June,” various tokens can be identified:

Tokens:

“I”
“won’t” (contraction of “will not”)
“be”
“long”
“;”
“‘till” (a contraction of “until”)
“the”
“5th”
“of”
“June”
The sentence has been tokenized into individual words and some punctuation marks.
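A rough regex-based sketch of this tokenization step (a library tokenizer such as NLTK's word_tokenize handles contractions and punctuation more carefully; the pattern below is only illustrative):

import re

def tokenize(text):
    """Split text into word-like tokens (keeping internal apostrophes) and punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?|[^\w\s]", text)

print(tokenize("I won't be long; 'till the 5th of June."))
# ['I', "won't", 'be', 'long', ';', "'", 'till', 'the', '5th', 'of', 'June', '.']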

258
Q

Explain each of the following as used in relation to Word2Vec:
i) integerize the corpus
ii) word window
iii) one-hot representation
Briefly describe one process used by Word2Vec to generate
vector representations (embeddings) for words.

A

Word2Vec Concepts:
i) Integerize the Corpus:

This involves assigning a unique integer or numerical identifier to each word in a corpus (collection of text).
It’s a preprocessing step that converts text data into a format suitable for mathematical operations.
Each word in the corpus is replaced with its corresponding integer, creating a numerical representation of the text.
ii) Word Window:

Word2Vec uses a “word window” or “context window” to determine the context words surrounding a target word.
It defines a fixed-sized window that slides over the text, and words within this window are considered as context words for the target word.
The choice of window size affects the context used for learning word embeddings.
iii) One-Hot Representation:

One-hot encoding is a method to represent words as binary vectors.

Each word in the vocabulary is represented as a vector where all elements are zero except for one, which is set to one.

This binary vector indicates the presence or absence of a specific word in a given context.

Word2Vec Process:

Generating Word Embeddings:

Word2Vec uses neural network learning to generate vector representations (embeddings) for words.
It trains a neural network, either using the Continuous Bag of Words (CBOW) or Skip-gram architecture.
In CBOW, the model predicts the target word based on its context words.
In Skip-gram, the model predicts context words given a target word.
During training, the model adjusts word embeddings to minimize prediction errors, resulting in word vectors that capture semantic similarities and relationships between words.
These learned embeddings represent words as continuous-valued vectors in a high-dimensional space, where similar words have similar vector representations.
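A tiny sketch of (i) integerizing a corpus and (iii) building one-hot vectors over the resulting vocabulary, using an invented toy corpus:

corpus = "the cat sat on the mat".split()

# (i) Integerize: assign each distinct word a unique integer id.
vocab = {word: idx for idx, word in enumerate(sorted(set(corpus)))}
integerized = [vocab[word] for word in corpus]
print(vocab)         # {'cat': 0, 'mat': 1, 'on': 2, 'sat': 3, 'the': 4}
print(integerized)   # the corpus as a sequence of ids

# (iii) One-hot: a vector of zeros with a single 1 at the word's id.
def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

print(one_hot("cat"))  # [1, 0, 0, 0, 0]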

259
Q

How does Word2Vec estimate the similarity/difference between
two words? Using an example such as the king – man + woman
= ? describe how vector representations can be combined to
generate some multi-word associations.

A

Word2Vec and Word Similarity:

Word2Vec estimates word similarity by measuring the cosine similarity between the vector representations of words.
Cosine similarity quantifies the angle between two vectors in a high-dimensional space, where a smaller angle indicates greater similarity.
Similar words have vectors that point in similar directions, resulting in a higher cosine similarity score.
Vector Arithmetic for Multi-Word Associations:

Vector representations can be combined to generate multi-word associations by adding and subtracting word vectors to capture semantic relationships.
Example: king - man + woman = ?

First, find the vector representation for each word: king, man, woman.

Calculate the vector king - man + woman.

This operation aims to find a word vector that is close in meaning to “king” but has a semantic relationship similar to “woman” instead of “man.”

The resulting vector will be close to the word vector for “queen.”

In this example, by performing vector arithmetic, we obtain a representation that is semantically similar to “queen” based on the relationships encoded in the word vectors. This showcases the ability of Word2Vec to capture semantic associations between words through vector operations.
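With a library such as Gensim and a pre-trained model, this vector arithmetic is a one-liner; the file name below is a hypothetical placeholder for whichever pre-trained embeddings are available:

from gensim.models import KeyedVectors

# Hypothetical path to pre-trained Word2Vec vectors in word2vec binary format.
vectors = KeyedVectors.load_word2vec_format("pretrained-vectors.bin", binary=True)

# king - man + woman: positive terms are added, negative terms subtracted.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically returns something close to ("queen", <similarity score>).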

260
Q

What is WordNet and how does it represent the meanings of
words? Make use of the term sense in your answer. Describe at
least two different senses for a word in your answer.

A

WordNet and Word Meaning Representation:

WordNet is a lexical database and semantic network that organizes words into a structured hierarchy based on their meanings and relationships.

WordNet represents word meanings through the concept of “senses” or “word senses.” A sense is a specific meaning or interpretation associated with a word.

Example: Word “Bank”

Sense 1 (Financial Institution):

In the context of finance, “bank” refers to a financial institution where people deposit money, obtain loans, and conduct financial transactions.
Sense 2 (River Bank):

In a different context, “bank” can also refer to the side of a river or body of water.
WordNet represents these different senses as separate entries or synsets (synonym sets) in its database, each with a unique identifier. It also captures the relationships between these senses, such as hyponyms (subordinate concepts) and hypernyms (superordinate concepts), providing a rich structure for understanding word meanings and their interconnections.

261
Q

What is Part of Speech (PoS) tagging and how does it differ from
parsing? How might we expect approximating an N-gram (using
say 2-grams) to produce better results than calculating the actual
N-gram frequency?

A

Part of Speech (PoS) Tagging:

PoS tagging is the process of assigning a specific grammatical category or part of speech (e.g., noun, verb, adjective) to each word in a sentence.
It involves categorizing words based on their syntactic and grammatical roles within a sentence.
PoS tagging provides information about the word’s function in the sentence and helps in understanding sentence structure.
Parsing:

Parsing is the process of analyzing the grammatical structure of a sentence to identify its constituents (e.g., nouns, verbs, phrases) and the relationships between them.
Parsing results in a tree-like structure known as a parse tree or syntactic tree, which represents the sentence’s grammatical structure.
Approximating N-gram vs. Calculating Actual N-gram Frequency:

Approximating N-gram frequencies using a smaller N (e.g., 2-grams) can produce better results in some cases because it reduces the data sparsity problem.

When calculating the actual N-gram frequency for larger N values, especially in language modeling or text processing, data sparsity can lead to many N-grams having very low or zero frequencies. This makes it challenging to estimate probabilities accurately.

By using smaller N-grams, the frequency estimation task becomes less sparse, and it becomes easier to estimate the probabilities of word sequences. This can result in more robust language models and better performance in tasks like machine translation or text generation.

However, using smaller N-grams may sacrifice some context and may not capture long-range dependencies in language as effectively as larger N-grams or other language modeling techniques. The choice of N depends on the specific application and the trade-off between data sparsity and modeling accuracy.

262
Q

In PoS tagging, a probability of zero (0) has two possible
interpretations. What are these interpretations and what
limitations arise? Briefly outline one way of overcoming a
limitation arising from a zero probability.

A

Interpretations of Zero Probability in PoS Tagging:
Word Never Occurs with that PoS:

A zero probability may indicate that a particular word has never been observed in the training corpus with a specific part of speech (PoS) tag.
For example, if a word has never been seen as a verb in the training data, the PoS tagging system assigns a zero probability to that word-verb pair.
Data Sparsity:

Zero probabilities can arise due to data sparsity issues, especially when dealing with rare or infrequent words or PoS tags.
Data sparsity can limit the effectiveness of PoS tagging systems, as they struggle to provide accurate PoS tags for less common words or combinations.
Overcoming Limitations of Zero Probabilities:

One way to overcome the limitation arising from zero probabilities is by employing smoothing techniques:

Laplace (Add-One) Smoothing:

Laplace smoothing adds a small constant (usually 1) to every word-PoS count, including combinations never observed in training, effectively redistributing some probability mass from frequent pairs to unseen or rare pairs.
This ensures that no probability becomes zero, addressing the issue of zero probabilities.
However, Laplace smoothing can lead to overly smoothed estimates and may not always capture fine-grained nuances in language.
Smoothing parameters other than 1 can also be used to control the degree of smoothing.
Smoothing methods like Laplace smoothing help in providing reasonable probabilities for unseen or infrequent word-PoS tag pairs, making PoS tagging systems more robust and less sensitive to data sparsity issues.
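A small worked example of Laplace (add-one) smoothing applied to an emission probability P(w | t), with invented counts:

def laplace_emission(count_word_tag, count_tag, vocab_size, k=1):
    """P(w | t) = (count(w, t) + k) / (count(t) + k * V)"""
    return (count_word_tag + k) / (count_tag + k * vocab_size)

# Suppose "sprint" was never seen as a noun (count 0) in a corpus with
# 10,000 noun tokens and a vocabulary of 50,000 word types.
print(laplace_emission(0, 10_000, 50_000))    # unsmoothed would be 0; now a small non-zero value
print(laplace_emission(120, 10_000, 50_000))  # a seen pair keeps a non-zero (though reduced) probability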

263
Q

Consider the challenge of generating parse trees for the following
sentence. What problem does ambiguity present and how might
this ambiguity present itself?
I shot an elephant in my pyjamas.

A

Ambiguity in Parsing:

Ambiguity in natural language can pose challenges when generating parse trees for sentences.
The problem of ambiguity arises when a sentence can be interpreted in multiple ways, leading to different syntactic structures or parse trees.
Example Sentence: “I shot an elephant in my pyjamas.”

Ambiguity presents itself in this sentence in multiple ways:
Prepositional Phrase Attachment:

Ambiguity can arise in deciding how the prepositional phrase “in my pyjamas” attaches to the rest of the sentence.
It can be attached to “shot,” suggesting that the speaker was wearing pyjamas when shooting the elephant.
Alternatively, it can be attached to “elephant,” implying that the elephant was in the speaker’s pyjamas.
Part-of-Speech Ambiguity:

The word “shot” can be interpreted as a past tense verb (action) or a noun (a shot).
In this particular sentence, however, the surrounding context makes the verb reading of “shot” the natural one, and words such as “elephant” and “pyjamas” are unambiguously nouns.
Pronoun Reference:

The possessive pronoun “my” clearly refers to the speaker, but the sentence leaves open whether it was the speaker or the elephant that was wearing the pyjamas, which reinforces the attachment ambiguity above.
Semantic Ambiguity:

The sentence carries semantic ambiguity as well. For instance, it’s unlikely that someone would shoot an elephant while wearing pyjamas, but the sentence humorously plays with this idea.
The challenge in parsing such sentences lies in determining the correct interpretation and syntactic structure based on the context and intended meaning. Ambiguity can result in multiple valid parse trees, and disambiguation techniques are often required to resolve these ambiguities.

264
Q

Using a suitable example or otherwise, describe how graphs can
be used to represent the contents (meaning) of documents.
Outline how the word-window technique can be used to generate
graphs, based on word co-occurrence.

A

Graph Representation of Document Contents:

Graphs can be used to represent the contents and meaning of documents by capturing relationships between words or concepts. Each node in the graph represents a word or concept, and edges indicate connections or relationships between them.
Word-Window Technique for Generating Graphs:

The word-window technique is a method to generate graphs based on word co-occurrence, where words that appear together in a certain context are connected in the graph.

Document Preprocessing:

Start by preprocessing the document, including tasks like tokenization and removing stopwords or irrelevant words.
Window-based Co-occurrence:

Define a fixed-size “window” that slides over the text. Words within this window are considered in the context of each other.
For each word in the document, identify the words that appear within the window as its co-occurring words.
Graph Representation:

Create a graph where each word or concept from the document becomes a node in the graph.
Connect nodes (words) with edges based on their co-occurrence within the defined window.
The weight of an edge can represent the strength or frequency of co-occurrence.
Example:

Consider the sentence: “The cat chased the dog.”
With a window size of 2, the word “cat” would have edges connecting it to “the” and “chased.”
Similarly, “dog” would have edges to “the” and “chased.”
Graph Analysis:

Once the graph is constructed, various graph analysis techniques can be applied to uncover relationships, identify central nodes, or extract meaningful information from the document’s contents.

This approach allows for the generation of graphs that represent the semantic relationships and contextual associations between words or concepts in a document, facilitating tasks like document summarization, topic modeling, and information retrieval.

265
Q

How can the Generate and Test (G&T) strategy be used to solve
problems, such as the Rubik’s cube? What are the main strengths
and weaknesses of G&T?

A

Generate and Test (G&T) Strategy:

G&T is a problem-solving strategy that involves generating potential solutions and testing them to find a satisfactory or optimal solution.
Using G&T for Solving the Rubik’s Cube:

In the context of solving the Rubik’s Cube, G&T can be applied as follows:

Generate:

Generate a sequence of moves or algorithms that manipulate the cube’s configuration.
These moves are based on known solving techniques and strategies.
Test:

Apply the generated sequence of moves to the cube’s initial configuration.
Evaluate the result to check if the cube is solved or closer to a solved state.
If the cube is not solved, return to the generation phase and generate a new sequence of moves.
Repeat:

Continue the generate and test process iteratively until the Rubik’s Cube is solved or until a satisfactory solution is found.
Strengths of G&T:

Versatile: G&T can be applied to a wide range of problems and domains.
Simplicity: The strategy is straightforward and easy to implement.
Exploration: It explores potential solutions systematically, making it suitable for problems without known algorithms.
Weaknesses of G&T:

Exhaustive Search: G&T may require an exhaustive search of the solution space, which can be computationally expensive for complex problems like the Rubik’s Cube.
Lack of Optimality: G&T does not guarantee finding the optimal solution; it may settle for a satisfactory solution rather than the best one.
Inefficiency: It may require many iterations and evaluations, leading to inefficiency when alternative, more efficient strategies are available.
G&T is useful for exploring solution spaces, but its efficiency and effectiveness depend on the problem’s complexity and the availability of better-suited problem-solving techniques.

266
Q

What is a heuristic? Describe in detail the specific features your
heuristic function would look for to solve one of the following two
alternate problems. Show how a typical problem state is
represented and the data structure you would use.
15 sliding-tile puzzle or else Rubik’s cube

A

Heuristic Function:

A heuristic is a problem-solving strategy or function that provides an estimate of the cost or distance to reach a goal state from a given state. Heuristics guide search algorithms to prioritize more promising paths.
15-Sliding Tile Puzzle:

Heuristic Features:

In the 15-sliding tile puzzle, a heuristic function may consider the following features:
Misplaced Tiles: Count the number of tiles that are not in their correct position relative to the goal state.
Manhattan Distance: Calculate the sum of the Manhattan distances (horizontal and vertical distances) of each tile from its correct position.
Linear Conflict: Account for tiles that are in the same row or column and need to be moved around each other to reach their goal positions.
Problem State Representation:

Represent each state as a 4x4 grid or matrix, where each cell contains a number from 1 to 15 (representing tiles) and one empty space (usually denoted as 0).
Data Structure:

Use a 2D array or matrix data structure to represent the puzzle state, with each element representing a tile’s position.
Rubik’s Cube:

Heuristic Features:

For a Rubik’s Cube, heuristic features may include:
Cubie Distances: Estimate the distance of each cubie (small cube) from its correct position and orientation.
Edge and Corner Permutations: Evaluate the number of edges and corners in their correct positions.
Facelet Colors: Consider the number of facelet colors that match the target configuration on each face.
Problem State Representation:

Represent the Rubik’s Cube state as a data structure that captures the positions and orientations of all cubies, including edges and corners.
Data Structure:

Use a data structure that stores the current configuration of the Rubik’s Cube, including the arrangement and orientation of cubies. This data structure could involve arrays, matrices, or objects to represent the cube’s different components.
The choice of heuristic features and the data structure for each problem depends on the specific requirements of the heuristic function and the search algorithm used to solve the problem.
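As a concrete illustration of the Manhattan-distance feature and the 4x4 matrix representation for the 15-puzzle, here is a minimal Python sketch (it assumes the usual goal layout of tiles 1–15 in row-major order with the blank last):

```python
def manhattan_distance(state):
    """Sum of horizontal + vertical distances of each tile from its goal cell.

    `state` is a 4x4 list of lists; 0 marks the blank and is not counted.
    The goal places tile t at row (t - 1) // 4, column (t - 1) % 4.
    """
    total = 0
    for r in range(4):
        for c in range(4):
            tile = state[r][c]
            if tile == 0:
                continue
            goal_r, goal_c = (tile - 1) // 4, (tile - 1) % 4
            total += abs(r - goal_r) + abs(c - goal_c)
    return total

# One move away from the goal: only tile 15 is displaced by one cell.
almost_solved = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 0, 15],
]
print(manhattan_distance(almost_solved))  # 1
```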

267
Q

What is adversary search? Briefly describe why MiniMax
evaluates the heuristic function on leaf-nodes only. How does a
minimax search tree differ from a traditional search tree?

A

Adversary Search:

Adversary search is a type of search algorithm used in games or decision-making scenarios where two opposing players or agents take turns making moves or decisions to maximize or minimize a certain objective function.
MiniMax and Heuristic Evaluation:

MiniMax evaluates the heuristic function on leaf nodes only because leaf nodes represent terminal states where the game or decision-making process reaches an end. At these terminal states, there are no further moves, and the heuristic function provides an evaluation of the outcome.
This evaluation is then propagated upwards through the search tree to determine the best move for the current player.
MiniMax Search Tree vs. Traditional Search Tree:

In a traditional search tree, each node represents a state, and the tree expands to explore possible actions and transitions to new states. The goal is to find a solution or reach a goal state.
In a MiniMax search tree, the focus is on adversarial games or decision-making, where there are two players with opposing goals. The tree represents different game states, and each node evaluates the outcome from the perspective of one player, aiming to minimize the opponent’s advantage (minimize the maximum possible loss).
The key difference is that a MiniMax search tree includes alternating layers of nodes representing players (maximizing and minimizing), and the objective is to find the best move for the current player while considering the opponent’s optimal responses.
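A minimal MiniMax sketch in Python, using a hand-built toy game tree (nested lists) in which the leaves carry the heuristic scores; note that evaluate is only ever called on leaf nodes:

```python
def minimax(node, maximizing, evaluate, children):
    """Plain MiniMax: the heuristic `evaluate` is applied at leaf nodes only."""
    kids = children(node)
    if not kids:                      # leaf / terminal node
        return evaluate(node)
    values = [minimax(k, not maximizing, evaluate, children) for k in kids]
    return max(values) if maximizing else min(values)

# Tiny hand-built game tree: internal nodes are lists, leaves are scores.
tree = [[3, 5], [2, 9]]
best = minimax(
    tree,
    maximizing=True,
    evaluate=lambda leaf: leaf,                          # leaf value = heuristic score
    children=lambda n: n if isinstance(n, list) else [],
)
print(best)  # 3, because max(min(3, 5), min(2, 9)) = max(3, 2)
```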

268
Q
A

B with a value of 2.

269
Q

How does Word2Vec generate vector representations of words?
Describe the limitations we might expect from vector-based representations of ambiguous words, such as one of the following:
- “cook” - consider different context words like: cook/chef, Tim Cook, Cook County; or else
- “apple” - with differing contexts like: orchard, Apple corporation, Apple/iPhone; or else
- “Queen” - with differing contexts like: Elizabeth, music.

A

Word2Vec and Vector Representations:

Word2Vec generates vector representations (embeddings) of words using neural network-based models, typically either the Continuous Bag of Words (CBOW) or Skip-gram architecture.
These models learn to map words into high-dimensional vector spaces where words with similar meanings or usage contexts have similar vector representations.
Limitations of Vector-Based Representations for Ambiguous Words:

Ambiguous words like “cook,” “apple,” and “Queen” can pose challenges to vector-based representations due to their multiple meanings and contexts. Limitations include:

Contextual Ambiguity:

Vector representations may struggle to distinguish between different meanings of the same word based solely on context.
For example, the word “cook” can refer to a chef, a cooking action, or a person’s name (Tim Cook).
Polysemy:

Polysemous words (having multiple related meanings) can lead to vector representations that mix these meanings, making it challenging to disambiguate them.
For “apple,” the vector may represent both the fruit and the technology company, making it less useful for specific contexts.
Context Dependency:

The meaning of a word often depends on the surrounding context, and vector representations may not capture all contextual nuances.
In the case of “Queen,” the vector may not fully distinguish between Queen Elizabeth and a musical context, leading to ambiguity.
Rare Words and Contexts:

Words or multi-word names that appear rarely in the training corpus receive few training updates, so their vectors are poorly estimated and less informative.
For example, a rare context like “Cook County” may not be adequately represented in the vector space.
Addressing these limitations may involve using more sophisticated contextual models, incorporating additional contextual information, or employing techniques like sense disambiguation to better capture the diverse meanings and usages of ambiguous words.
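A minimal Word2Vec training sketch with gensim (assumes gensim 4.x; the three-sentence corpus is a toy stand-in, so the resulting vectors are not meaningful, but it shows that “cook” receives a single static vector blending the chef, person-name and place senses):

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "chef", "likes", "to", "cook", "dinner"],
    ["tim", "cook", "runs", "the", "apple", "corporation"],
    ["cook", "county", "is", "in", "illinois"],
]

# sg=1 selects the Skip-gram architecture; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cook"][:5])                  # one vector per word type, senses mixed
print(model.wv.most_similar("cook", topn=3))
```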

270
Q

Consider a neural network with 3 hidden layers learning a
complex spiral pattern (such as found on the TensorFlow
playground). What features would you expect to be identified in
the first hidden layer nearest the input? Compare and contrast
these to the features identified by the final hidden-layer nearest
the output.

A

Neural Network with 3 Hidden Layers and Complex Spiral Pattern:

When training a neural network with 3 hidden layers to learn a complex spiral pattern, there are differences in the features identified in the first hidden layer (nearest the input) compared to the final hidden layer (nearest the output).
First Hidden Layer (Nearest Input):

Features in the first hidden layer are typically more primitive and capture basic patterns or edges in the input data.
Expected features in the first hidden layer may include:
Simple edges or lines.
Gradient changes.
Basic shapes like curves or circles.
These features are generic and serve as foundational elements for higher-level representations.
Final Hidden Layer (Nearest Output):

Features in the final hidden layer become more complex and abstract as they capture higher-level representations of the data.
Expected features in the final hidden layer may include:
Spiral patterns or curves resembling the target pattern.
More sophisticated combinations of edges and shapes.
Features that represent specific twists or turns in the spiral pattern.
These features are specialized and closely related to the task at hand (recognizing the spiral pattern).
Contrast:

The key contrast between the two layers is their level of abstraction and specificity:
The first hidden layer extracts basic, low-level features that are generic and applicable to various patterns.
The final hidden layer captures high-level, task-specific features that directly contribute to recognizing the complex spiral pattern.
The first layer focuses on general data transformations, while the final layer fine-tunes representations for the specific problem domain.
As information flows through the layers, it becomes increasingly specialized and tailored to the target pattern, allowing the network to make accurate predictions or classifications.
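A minimal sketch of the setup (not the TensorFlow Playground itself): a 3-hidden-layer MLP trained on a synthetic two-arm spiral using scikit-learn. The first weight matrix in clf.coefs_ corresponds to the edge-like features of the first hidden layer; the later matrices combine them into the curve- and spiral-like features described above:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_spiral(n_per_class=200, noise=0.2, seed=0):
    """Two interleaved spiral arms, one per class."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.5, 3 * np.pi, n_per_class)
    X, y = [], []
    for label, phase in enumerate([0.0, np.pi]):
        x1 = t * np.cos(t + phase) + rng.normal(0, noise, n_per_class)
        x2 = t * np.sin(t + phase) + rng.normal(0, noise, n_per_class)
        X.append(np.column_stack([x1, x2]))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = make_spiral()
clf = MLPClassifier(hidden_layer_sizes=(8, 8, 8), activation="tanh",
                    max_iter=5000, random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")

# First matrix: each of the 8 units is roughly a linear "edge" over (x1, x2);
# the later matrices recombine those edges into increasingly spiral-specific features.
print([w.shape for w in clf.coefs_])   # [(2, 8), (8, 8), (8, 8), (8, 1)]
```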

271
Q

What is Part of Speech (PoS) tagging?

A

Part of Speech (PoS) tagging is a process in natural language processing (NLP) where each word in a sentence is assigned a Part-of-Speech tag. This tag indicates the word’s role in the sentence, such as whether it is a noun, verb, adjective, adverb, etc. The tagging process helps in understanding the grammatical structure of sentences, which is crucial for various NLP tasks like parsing, text-to-speech conversion, information extraction, and more.
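A minimal PoS-tagging sketch using NLTK (assumes the tokenizer and tagger resources can be downloaded; resource names vary slightly between NLTK versions):

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The cat sat on the comfortable chair")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'),
#       ('the', 'DT'), ('comfortable', 'JJ'), ('chair', 'NN')]
```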

272
Q

How does Reinforcement Learning use a fixed policy to learn
using a sparse reward feedback? You should use a diagram to
illustrate your answer.

A

Reinforcement Learning (RL) and Fixed Policy:

RL uses a fixed policy to explore and learn in environments where the reward feedback is sparse or delayed. The fixed policy defines how the agent selects actions based on the current state.
Key Components:

In RL, the key components include:
Agent: The learner or decision-maker.
Environment: The external system with which the agent interacts.
State (S): Represents the current situation or configuration of the environment.
Action (A): The set of possible actions the agent can take in each state.
Policy (π): A strategy that maps states to actions, defining the agent’s behavior.
Using a Fixed Policy with Sparse Rewards:

In scenarios with sparse rewards, learning a fixed policy becomes essential. The agent follows this policy to interact with the environment and collect experience over time.
Diagram:

The agent starts with a fixed policy (π) that dictates its actions in different states.
The agent interacts with the environment by taking actions based on its policy.
The environment responds, and the agent transitions to new states and receives rewards (or lack thereof) based on its actions.
The agent’s experiences (state, action, reward, new state) are collected over multiple interactions.
These experiences are used to update the agent’s policy (π) through a learning process.
The agent aims to improve its policy over time to maximize cumulative rewards.
Learning and Adaptation:

Despite having a fixed policy, the agent can adapt and improve its strategy by learning from feedback. This learning process involves adjusting the policy to favor actions that lead to better rewards, even in environments with sparse reward signals. Over time, the fixed policy becomes more refined and effective at achieving the agent’s objectives.
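A minimal sketch of learning under a fixed policy with sparse rewards, using my own toy corridor environment (not from the course notes): the agent simply follows the fixed policy, records its experience, and estimates state values by propagating the single terminal reward backwards through each episode:

```python
import random
from collections import defaultdict

N = 6                                    # states 0..5; state 5 is terminal
policy = {s: "right" for s in range(N)}  # the fixed policy: always move right

def step(state, action):
    nxt = min(state + 1, N - 1) if action == "right" else max(state - 1, 0)
    if random.random() < 0.2:            # 20% of moves "slip" the other way
        nxt = max(state - 1, 0) if action == "right" else min(state + 1, N - 1)
    reward = 1.0 if nxt == N - 1 else 0.0        # sparse: only the terminal state pays
    return nxt, reward, nxt == N - 1

values, counts = defaultdict(float), defaultdict(int)
for _ in range(2000):                    # collect experience by following the fixed policy
    state, trajectory, done = 0, [], False
    while not done:
        nxt, reward, done = step(state, policy[state])
        trajectory.append((state, reward))
        state = nxt
    ret = 0.0
    for s, r in reversed(trajectory):    # propagate the sparse terminal reward backwards
        ret = r + 0.9 * ret
        counts[s] += 1
        values[s] += (ret - values[s]) / counts[s]

print({s: round(values[s], 2) for s in sorted(values)})  # estimated value of each state
```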

273
Q

What limitation does model-free Q-learning overcome? Describe
how Q-learning might learn, highlighting any similarities and
differences with fixed policy learning?

A

Model-Free Q-Learning:

Model-free Q-learning is a reinforcement learning approach that overcomes the limitation of requiring a complete model of the environment, which is often impractical or impossible to obtain.
Learning Process:

In Q-learning, the agent learns by interacting with the environment through a trial-and-error process. The learning process involves the following steps:

Initialization: Initialize a Q-table or Q-function that estimates the expected cumulative rewards (Q-values) for taking specific actions in specific states.

Exploration: The agent explores the environment by taking actions based on its current policy (often using ε-greedy exploration).

Observation and Reward: After taking an action, the agent observes the new state and receives a reward from the environment.

Updating Q-Values: The agent updates its Q-values based on the observed reward and the Q-learning update rule: Q(s, a) is moved towards r + γ · max over a′ of Q(s′, a′), i.e. the observed reward plus the discounted best Q-value of the next state (see the sketch at the end of this answer).

Policy Improvement: The agent’s policy may gradually shift to favor actions that lead to higher Q-values.

Similarities and Differences with Fixed Policy Learning:

Similarities:
Both fixed policy learning and Q-learning involve defining a strategy (policy) for the agent to choose actions in different states.
Both approaches aim to maximize cumulative rewards over time.
Differences:
Fixed Policy Learning: In fixed policy learning, the policy remains constant throughout the learning process; what the agent learns is how good that policy is (the value of states under it), rather than changing the policy itself.
Q-Learning: Q-learning starts with an initial policy but continually updates it based on the learned Q-values. The agent can explore and potentially discover a better policy during the learning process.
Fixed policy learning does not involve learning Q-values or estimating the value of state-action pairs, whereas Q-learning focuses on precisely estimating Q-values to guide action selection.
Q-learning is more flexible and adaptive as it allows the agent to learn an optimal policy without initially knowing the best actions to take in each state.
Overall, Q-learning overcomes the limitation of requiring an accurate model of the environment by learning directly from experiences and rewards, making it suitable for environments with unknown dynamics or sparse rewards.
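A minimal tabular Q-learning sketch on the same kind of toy corridor (my own illustrative example), showing ε-greedy exploration and the update towards r + γ·max Q(s′, a′):

```python
import random
from collections import defaultdict

N, ACTIONS = 6, ["left", "right"]        # states 0..5; state 5 is terminal
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = defaultdict(float)                   # Q[(state, action)], initialised to 0

def step(state, action):
    nxt = min(state + 1, N - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == N - 1 else 0.0          # sparse reward at the terminal state
    return nxt, reward, nxt == N - 1

def greedy(state):
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])  # random tie-break

for _ in range(2000):
    state, done, steps = 0, False, 0
    while not done and steps < 200:
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: move Q(s,a) towards r + gamma * max_a' Q(s',a')
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state, steps = nxt, steps + 1

print({s: greedy(s) for s in range(N - 1)})        # learned greedy policy: 'right' everywhere
```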

274
Q

What is coreference resolution? You should use the following
sentence to illustrate your answer.
John was in the car when he said it was time to sell it.

A

Coreference resolution is a process in natural language understanding where the aim is to find all expressions that refer to the same entity in a text. It’s a crucial part of building a coherent understanding of a text, as it involves identifying various phrases that refer to the same thing.

For example, let’s consider the sentence: “John was in the car when he said it was time to sell it.” In this sentence, coreference resolution involves:

Identifying that “he” refers to “John.”
Recognizing that the first “it” refers to “the car.”
Understanding that the second “it” also refers to “the car.”
This process is important for understanding the relationships between different parts of the text and for maintaining the coherence of the narrative or discussion. Without coreference resolution, understanding the full context of a conversation or text can be challenging, especially in complex narratives or discussions where multiple entities are referenced.

Another example: “A man went driving in his car and then parked it.” Here, coreference resolution would identify that “it” refers to “his car”.

275
Q

Compare and contrast Dependency Parsers (DP) with
Constituency Parsers (CP). You should make use of a suitable
example to illustrate your answer.

A

Dependency Parsers (DP) and Constituency Parsers (CP) are two approaches used in Natural Language Processing (NLP) for parsing sentences, each with its unique methodology and representation.

Constituency Parsers:

Constituency parsers break down a sentence into sub-phrases or constituents.
These parsers use a parse tree to match word categories such as sentence (S), noun phrase (NP), verb phrase (VP), prepositional phrase (PP), etc.
The nodes in a constituency parse tree represent lexical categories, and internal nodes represent abstract structures within the sentence.
Example: In the sentence “The cat sat on the comfortable chair,” a constituency parser would identify sub-phrases like “The cat” (NP) and “sat on the comfortable chair” (VP).
Dependency Parsers:

Dependency parsers, on the other hand, focus on the relationships between words in a sentence, particularly the dependencies between words.
The lexical structure in dependency parsing is represented through binary relations. Each word in the sentence becomes a node in the dependency parse tree.
Unlike constituency parsing, there are no abstract structures in dependency parsing; all nodes are words from the sentence itself.
Example: For the same sentence “The cat sat on the comfortable chair,” a dependency parser would identify relationships such as ‘cat’ (subject) - ‘sat’ (verb), ‘sat’ - ‘on’ (preposition), ‘on’ - ‘chair’ (object), etc.
Comparison:

Focus: Constituency parsers focus on the hierarchical structure of sentences, breaking them down into constituent parts, while dependency parsers focus on the relationships and dependencies between individual words.
Representation: Constituency parsing results in a tree structure with abstract nodes representing different types of phrases, whereas dependency parsing results in a tree where each word is a node, and edges represent dependencies between these words.
Use Cases: Constituency parsing is useful for understanding the overall structure of a sentence and is often used in syntax analysis, while dependency parsing is more suited for understanding the relationships between words, which is crucial in tasks like relation extraction and semantic role labeling
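A minimal dependency-parsing sketch with spaCy (assumes the small English model en_core_web_sm has been installed via python -m spacy download en_core_web_sm); each token is a node whose edge points to its syntactic head with a labelled relation:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the comfortable chair")

for token in doc:
    # each word depends on a head word via a labelled relation
    print(f"{token.text:12} --{token.dep_:>6}--> {token.head.text}")
# e.g. 'cat' --nsubj--> 'sat', 'on' --prep--> 'sat', 'chair' --pobj--> 'on'
```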

276
Q

What are the main strengths and limitations of the word-window
technique for generating Knowledge Graphs (KG)?

A

The word-window technique for generating Knowledge Graphs (KGs) involves creating nodes from Named Entity Recognition (NER) names, non-dictionary nouns, and places, and then connecting these nodes with edges based on their co-occurrence within a specific window of words (typically within 15 words of each other in a text).

Strengths of the Word-Window Technique:

Simple Implementation: The word-window technique is relatively straightforward to implement, as it mainly involves identifying entities and their co-occurrences within a defined word range.
Captures Local Context: By focusing on word co-occurrences within a short range, this method can effectively capture local context and immediate relationships between entities.
Useful for Certain Types of Analysis: This approach can be particularly useful for extracting and visualizing immediate relationships or connections in texts, such as in narratives or reports.
Limitations of the Word-Window Technique:

Manual Intervention Needed: Some degree of manual filtering and intervention might be required, for example, to remove irrelevant place names or to disambiguate entities with the same name (e.g., distinguishing between “Jon Snow” and “Jon Arryn” in a text).
Limited by Word Proximity: The technique primarily relies on proximity, so it might miss important relationships between entities that occur beyond the defined word window.
Lacks Edge Labeling and Direction: Edges in the resulting graph are un-directed and un-labeled, which limits the ability to understand the nature of the relationships (like causal, temporal, etc.) between entities.
May Overlook Broader Context and Deeper Relationships: Since the focus is on immediate word co-occurrences, the technique might not effectively capture broader context or deeper, more complex relationships that span larger sections of the text or require more sophisticated semantic understanding.
An example to illustrate this might be a sentence from a text: “A gift from the Magister Illyrio,” Viserys said, smiling. In a word-window KG, nodes would be created for “Magister Illyrio” and “Viserys,” and an edge would connect them based on their proximity in this sentence, capturing their immediate relationship in this context.
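A minimal sketch of the word-window idea using networkx (the entity list, tokenisation and 15-word window are illustrative assumptions, not the exact course pipeline); note the resulting edges are un-directed and un-labelled, as discussed above:

```python
import itertools
import networkx as nx

entities = ["Viserys", "Illyrio", "Daenerys"]            # e.g. from an NER pass
tokens = ("A gift from the Magister Illyrio , Viserys said , smiling . "
          "Daenerys thanked Illyrio for the gift").split()

WINDOW = 15
G = nx.Graph()                                            # un-directed, un-labelled edges
positions = {e: [i for i, t in enumerate(tokens) if t == e] for e in entities}

for a, b in itertools.combinations(entities, 2):
    if any(abs(i - j) <= WINDOW for i in positions[a] for j in positions[b]):
        G.add_edge(a, b)

print(G.edges())   # pairs of entities that co-occur within the 15-word window
```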

277
Q

How does WordNet represent the meaning of words and how can
words be compared using WordNet? How does ConceptNet
extend WordNet’s representations? Illustrate your answer with a
suitable example.

A

WordNet represents the meaning of words by organizing them into sets of synonyms called synsets. Each synset encompasses a concept and includes definitions and examples. Words within a synset are semantically similar. WordNet also provides relationships between these synsets, such as hypernymy (more general terms) and hyponymy (more specific terms), allowing for the comparison of meanings and understanding of word hierarchies.

ConceptNet extends WordNet’s representations by incorporating common sense knowledge and real-world facts. ConceptNet includes relationships not covered in WordNet, making it broader and more suited for understanding context and practical use-cases. It connects words and phrases with labeled edges, representing various types of relations, such as “PartOf,” “UsedFor,” or “CapableOf.”

A general example could be comparing the words “apple” and “fruit” in WordNet, where “apple” is a hyponym of “fruit.” In ConceptNet, the relationship might be extended with additional context, such as “apple” being “UsedFor” eating or being part of a “HealthyDiet.”
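A minimal WordNet comparison sketch with NLTK (assumes the wordnet corpus can be downloaded); ConceptNet’s additional relations such as UsedFor live in its own separate knowledge base and API, which is not shown here:

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

apple = wn.synset("apple.n.01")               # the fruit sense of "apple"
fruit = wn.synset("fruit.n.01")

print(apple.definition())
print([h.name() for h in apple.hypernyms()])  # more general synsets ("is a kind of")
print(apple.path_similarity(fruit))           # closer to 1 = more similar
```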

278
Q

How do the attention layers of GPT-3 help it to more accurately
learn natural language? Use an illustrative diagram in your
answer.

A

Contextual Understanding: Attention mechanisms allow the model to weigh different parts of the input text differently. This means the model doesn’t treat all words or phrases equally; instead, it “pays more attention” to those that are more relevant in a given context. For instance, in a sentence like “The cat sat on the mat,” attention helps the model understand that “cat” is closely related to “sat” and “mat.”

Handling Long-Range Dependencies: In natural language, sometimes the meaning of a word or phrase depends on other words far away in the text. Traditional neural network architectures struggle with this. Attention layers help by enabling the model to create connections between distant words, thereby preserving the meaning across longer texts.

Dynamic Focus: Unlike earlier models that processed input in a fixed manner, attention layers dynamically determine what to focus on based on the current input. This flexibility is crucial for the nuanced understanding of language, where the significance of a word or phrase can change based on context.

Improved Memory and Recall: Attention mechanisms can be thought of as a way for the model to remember and utilize important information from earlier in the text. This is essential for maintaining coherence in longer passages and for remembering key details.

Enhanced Learning: During training, the attention layers help GPT-3 to better learn the relationships and structures in natural language. This leads to more accurate predictions about what word or phrase should come next in a given sentence.

An illustrative diagram would show:
Multiple layers of a neural network, each with attention heads.
The attention heads using three vectors (each smaller than the input embedding) to produce a weighted sum of the inputs.
How these attention heads learn to represent words in context, capturing dependencies between words and their positions.
Forward and backward attention flows indicated in different layers.
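A minimal NumPy sketch of the scaled dot-product attention that each head computes, softmax(Q·Kᵀ/√d_k)·V, with small illustrative shapes (GPT-3 itself stacks many such heads per layer with far larger dimensions, plus causal masking not shown here):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output row is a weighted sum of the V rows."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_k = 6, 8                       # e.g. the 6 tokens of "The cat sat on the mat"
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))

out, weights = attention(Q, K, V)
print(weights.round(2))                   # row i: how much token i attends to each token
print(out.shape)                          # (6, 8)
```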

279
Q

Discuss three limitations associated with GPT-3. Why might
producing novel analogical comparisons represent a significant
challenge for GPT and the Large Language Models (LLM)?

A

Facts vs Nonsense: GPT-3 and similar LLMs often blend facts with random ideas. This reflects the model’s tendency to generate content that, while coherent in structure, may mix accurate information with fabricated or irrelevant details.

Understanding Concepts: GPT-3 faces challenges in generating infinite conceptualizations of a category. For example, when considering the concept of “trees,” the model should ideally be able to conjure different kinds of trees, whether real or imaginary, realistic or cartoonish, concrete or metaphorical, such as natural trees, family trees, or organizational trees. The model, however, struggles with “out of distribution” (OOD) observations and members, which are scenarios or examples that fall outside the patterns it has learned from the training data.

Forming Abstractions and Drawing Analogies: Creating novel, deep comparisons between arbitrary collections of information represents a significant challenge for GPT-3 and other LLMs. These models often struggle to abstractly compare different collections of information to draw meaningful analogies.

Producing novel analogical comparisons is particularly challenging for GPT and LLMs because it requires a deep understanding of the underlying concepts and the ability to draw abstract similarities between seemingly unrelated items or ideas. LLMs like GPT-3 are trained on large datasets and can recognize patterns and similarities based on the data they have been exposed to. However, they lack genuine understanding or the ability to think conceptually and creatively, which are crucial for generating meaningful and novel analogical comparisons. This limitation stems from the fact that these models primarily rely on statistical correlations and pattern recognition, rather than true comprehension or innovative thinking.