PageRank Flashcards
(14 cards)
What is the PageRank algorithm and what is it used for?
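An algorithm developed by Google's founders, Larry Page and Sergey Brin, that scores the importance of a webpage from the links pointing to it. It is used to help order search results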
What is PageRank intended to simulate?
The probability that a web surfer who clicks on links at random will end up on a given page
How does PageRank calculate the popularity of a webpage?
As a sum over every page that links to it: each linking page contributes its own rank divided by its number of outbound links
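A minimal sketch of this sum in Python, assuming a hypothetical three-page graph stored as a dict that maps each page to the pages it links to. The names `links`, `rank` and `rank_of` are illustrative and not part of any library.

```python
# Hypothetical toy graph: each key links to the pages in its list.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}

# Assumed starting ranks: every page begins with an equal share.
rank = {page: 1 / len(links) for page in links}


def rank_of(page, links, rank):
    """Sum rank(q) / outdegree(q) over every page q that links to `page`."""
    return sum(
        rank[q] / len(targets)
        for q, targets in links.items()
        if page in targets
    )


# C is linked from A (which has 2 outbound links) and B (which has 1).
new_rank_c = rank_of("C", links, rank)
```

Here A passes half of its rank to C and B passes all of its rank, so C collects 1/6 + 1/3 in this single step.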
What is the score that PageRank calculates called?
Authority
What kind of webpage will have the highest authority in PageRank?
A page that many high-ranking pages link to
How can we perform PageRank in a step-by-step process?
Initialise a rank vector P, giving every page the same starting value (for example 1/N), and build a transition matrix H from the link structure. At each step, set P to H multiplied by the value of P from the previous step. Repeat until P converges, that is, until further iterations no longer change it
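The iteration can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the 3-page matrix `H` is a made-up example, the starting value 1/N and the stopping tolerance `tol` are choices made here, and `power_iteration` is a hypothetical helper name.

```python
def power_iteration(H, tol=1e-10, max_iter=1000):
    """Repeat P <- H P until P stops changing (within `tol`)."""
    n = len(H)
    P = [1 / n] * n  # assumed initial value: equal rank for every page
    for _ in range(max_iter):
        # Matrix-vector product: entry i collects rank from every page j.
        new_P = [sum(H[i][j] * P[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_P, P)) < tol:
            return new_P
        P = new_P
    return P


# Hypothetical 3-page web: A -> B, A -> C, B -> C, C -> A.
# H[i][j] is the share of page j's rank that is passed to page i.
H = [
    [0.0, 0.0, 1.0],  # A receives all of C's rank
    [0.5, 0.0, 0.0],  # B receives half of A's rank
    [0.5, 1.0, 0.0],  # C receives half of A's and all of B's rank
]

P = power_iteration(H)
```

For this particular graph the ranks settle near 0.4 for A, 0.2 for B and 0.4 for C.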
What is a sink page?
A page that has no outgoing links
What is the problem caused by sink pages?
A sink page receives rank but passes none on, so rank leaks out of the system at every iteration and the scores decay towards 0, even for important pages
How do we solve the sink page problem?
Distribute the rank of the sink page evenly over all N pages of the web, so that each page receives 1/N of it
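One possible way to express this fix, sketched in Python. The three-page graph and the helper name `step_with_sink_fix` are assumptions made for illustration. The point is that the total rank stays at 1 after the update instead of leaking away.

```python
def step_with_sink_fix(links, rank):
    """One update step in which a sink spreads its rank evenly over all N pages."""
    n = len(links)
    new_rank = {page: 0.0 for page in links}
    for page, targets in links.items():
        if targets:
            # Ordinary page: split its rank among the pages it links to.
            share = rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        else:
            # Sink page: every page (the sink included) receives 1/N of its rank.
            for target in links:
                new_rank[target] += rank[page] / n
    return new_rank


# Hypothetical chain A -> B -> C, where C is a sink.
links = {"A": ["B"], "B": ["C"], "C": []}
rank = {page: 1 / len(links) for page in links}

rank = step_with_sink_fix(links, rank)
total = sum(rank.values())
```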
What are cycle pages?
Webpages that link only to one another, forming a closed cycle with no links leading out of it
What is the problem caused by cycle pages?
Rank that flows into the cycle never leaves it, so the cycle keeps accumulating authority at the expense of every other page
How do we solve the cycle page problem?
The random surfer model
What is the random surfer model?
A model in which a web surfer either follows a link on the current page or gets bored and starts a new session on a random page
How do we incorporate the random surfer model into PageRank?
We fix a probability d (the damping factor) that the surfer follows a link on the current page, and multiply the original PageRank sum by d. We then add the probability that the surfer gets bored, 1 - d, divided by the number of pages N
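Written out, the update becomes PR(p) = (1 - d)/N + d * (sum of PR(q)/L(q) over every page q linking to p), where L(q) is the number of outbound links on q. A minimal Python sketch follows, under several assumptions that are not from the cards: d = 0.85 is a commonly quoted value, sinks are handled by spreading their rank over all pages, and the four-page graph and the function name `pagerank` are invented for illustration.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)) until the ranks settle."""
    n = len(links)
    rank = {page: 1 / n for page in links}
    for _ in range(max_iter):
        # Every page starts with the "bored surfer" share (1 - d)/N.
        new_rank = {page: (1 - d) / n for page in links}
        for page, targets in links.items():
            # Assumption: a sink spreads its rank over all pages.
            receivers = targets if targets else list(links)
            share = d * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        if max(abs(new_rank[p] - rank[p]) for p in links) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical graph: D links to C, but nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
best = max(scores, key=scores.get)
```

Because no page links to D, its score is only the random-jump share (1 - d)/N, while C, which collects links from three pages, ends up with the highest score.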