Friday, 17 October 2014

Post 3 - Advantages and Disadvantages of PageRank and HITS

PageRank

PageRank is a link analysis algorithm invented by Sergey Brin and Larry Page. It uses a recursive scheme similar to Kleinberg's HITS algorithm, but PageRank calculates the page ranking from the linking relationships between web pages alone. The idea is very simple: a page is assumed to be important if other important pages point to it. The ranking score of a page is determined by summing the ranking scores of all the pages that point to it, with each linking page's score shared equally among its out-links. In effect, the algorithm reviews and weights all the links posted on a page.
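To make the recursive idea concrete, here is a minimal sketch of PageRank by repeated averaging (power iteration). The damping factor of 0.85, the handling of pages without out-links, and the toy graph `toy_web` are my own assumptions for illustration, not something taken from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Sketch of PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small base score (the "random jump" part).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its score equally among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no out-links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is pointed at by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C collects links from three pages and ends up on top, while D, which nobody links to, keeps only the base score.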

Here is some of the maths for the PageRank algorithm (ref: section 3.2 of the slides below).
http://www.slideshare.net/shatakirti/pagerank-and-hits
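For reference, a commonly cited form of the PageRank equation is sketched below. The notation is an assumption on my part and may differ from the slides: d is the damping factor, N is the number of pages, B_p is the set of pages linking to p, and L(q) is the number of out-links of page q.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}
```

The sum is the "important pages point to important pages" part: each page q passes an equal share of its own score to every page it links to.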













HITS

HITS was developed by Jon Kleinberg, a professor in the Department of Computer Science at Cornell. It was designed to solve the web search problem. The HITS algorithm uses the link structure of the web to rank the pages relevant to a particular search keyword. Each page gets two scores: an authority score, which is high when good hubs point to the page, and a hub score, which is high when the page points to good authorities.
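A minimal sketch of the HITS iteration may help here. It keeps two scores per page (hub and authority) and updates them in turn. The normalisation step, the iteration count, and the toy graph are assumptions for illustration only.

```python
import math


def hits(links, iterations=50):
    """Sketch of HITS; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalise so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: three hub-like pages pointing at two authority-like pages.
toy_web = {"H1": ["A1", "A2"], "H2": ["A1", "A2"], "H3": ["A1"]}
hub_scores, auth_scores = hits(toy_web)
print("best authority:", max(auth_scores, key=auth_scores.get))
print("hub scores:", {p: round(s, 3) for p, s in sorted(hub_scores.items())})
```

A1 is linked from all three hubs, so it gets the highest authority score, and H1 and H2 beat H3 as hubs because they point to both authorities.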

Here is some of the maths for the HITS algorithm (ref: section 2.4).
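For reference, the usual HITS update rules can be sketched as follows. The notation is my own assumption and may differ from the slides: a(p) is the authority score of page p, h(p) is its hub score, and A is the adjacency matrix of the neighbourhood graph, with A_{pq} = 1 when page p links to page q.

```latex
a(p) = \sum_{q \,:\, q \to p} h(q), \qquad h(p) = \sum_{q \,:\, p \to q} a(q)

\mathbf{a} = A^{\mathsf{T}} \mathbf{h}, \qquad \mathbf{h} = A \, \mathbf{a}
```

Both vectors are normalised after each update, and the two updates are repeated until the scores stop changing.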

 



























Advantages and Disadvantages of PageRank and HITS

After the lecture on 17 Oct, we went through the PageRank and HITS algorithms. So
let us discuss the advantages and disadvantages of PageRank and HITS in more detail.


Advantages of PageRank


1. PageRank makes link analysis of web pages more robust.
2. PageRank is a global measure: it is computed over the whole web graph.
3. PageRank is query independent.

Disadvantages of PageRank


1. Older pages may have a higher rank. This is because a new page, even one with very good content, may not have many links in its early stage.
2. PageRank can easily be increased by "link farms" (see figure 1).

Link farm (definition from http://en.wikipedia.org/wiki/Link_farm): a link farm is any group of web sites that all hyperlink to every other site in the group. In graph-theoretic terms, a link farm is a clique.



















figure 1

3. Rank can be raised by buying "links".
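The first two disadvantages can be seen in a small experiment. The sketch below is only an illustration under my own assumptions: the damping factor of 0.85, the tiny graph, and the helper `add_link_farm` (which puts the target page into a clique with some hypothetical farm pages) are all made up for the demo.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Sketch of PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # Assumption: a page with no out-links spreads its score evenly.
            targets = links.get(page, []) or list(pages)
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def add_link_farm(links, target, farm_size):
    """Hypothetical helper: put `target` into a clique with new farm pages."""
    farm = [f"farm{i}" for i in range(farm_size)]
    group = farm + [target]
    boosted = {page: list(targets) for page, targets in links.items()}
    for page in group:
        others = [other for other in group if other != page]
        boosted[page] = boosted.get(page, []) + others
    return boosted


# "new" is a fresh page: it links out, but nobody links to it yet.
web = {"A": ["B"], "B": ["C"], "C": ["A"], "new": ["A"]}
base = pagerank(web)
boosted = pagerank(add_link_farm(web, "new", farm_size=5))

print("new page, no farm:  ", round(base["new"], 4))
print("new page, with farm:", round(boosted["new"], 4))
```

Without the farm, the new page has the lowest score in the graph because it has no in-links. With the farm, its score goes up even though the total rank is now shared among more pages.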



Advantages of HITS

1. It allows queries by topic, which may provide more relevant authority and hub pages.
2. The adjacency matrices are built from the small neighbourhood graphs, so the power iterations do not present any computational burden.
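As a rough sketch of the second point, the code below builds a neighbourhood graph and its adjacency matrix from a root set. The root set would normally come from a text search for the query; here it is simply given, and the function name `neighbourhood_graph` and the toy web are my own assumptions.

```python
def neighbourhood_graph(links, root_set):
    """Sketch: expand a root set by one link step and build the adjacency matrix."""
    root = set(root_set)
    base = set(root)
    for src, targets in links.items():
        if src in root:
            base.update(targets)      # pages the root set points to
        if root.intersection(targets):
            base.add(src)             # pages that point into the root set
    nodes = sorted(base)
    index = {page: i for i, page in enumerate(nodes)}
    # matrix[i][j] is 1 when nodes[i] links to nodes[j] inside the graph.
    matrix = [[0] * len(nodes) for _ in nodes]
    for src in nodes:
        for t in links.get(src, []):
            if t in index:
                matrix[index[src]][index[t]] = 1
    return nodes, matrix


# Hypothetical web; only "r1" matches the query.
web = {"r1": ["x"], "y": ["r1"], "z": ["w"], "x": ["z"]}
nodes, matrix = neighbourhood_graph(web, ["r1"])
print(nodes)
for row in matrix:
    print(row)
```

The matrix only covers the pages near the root set, not the whole web, which is why the iterations on it are cheap.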

Disadvantages of HITS

1. Neighbourhood graphs need to be constructed "on the fly", at query time.
2. It suffers from topic drift.
The neighbourhood graph N could contain nodes which have high authority scores for a topic unrelated to the original query.
3. HITS cannot detect advertisements.
4. HITS can easily be spammed by adding out-links to one's own page.
5. The query time is slow because the neighbourhood graph is created "on the fly".
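Disadvantage 4 can be shown with a small experiment. This is only a sketch under my own assumptions: the toy graph and the page named "spam" are made up, and the HITS loop is a minimal version with simple normalisation.

```python
import math


def hits(links, iterations=50):
    """Sketch of HITS; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Before: the spam page has no out-links at all.
before = {"H1": ["A1", "A2", "A3"], "H2": ["A1", "A2"], "spam": []}
# After: the spam page simply copies the out-links of the best hub.
after = dict(before, spam=["A1", "A2", "A3"])

hub_before, _ = hits(before)
hub_after, _ = hits(after)
print("spam hub score before:", round(hub_before["spam"], 3))
print("spam hub score after: ", round(hub_after["spam"], 3))
print("H2 hub score after:   ", round(hub_after["H2"], 3))
```

Just by adding out-links to good authorities, the spam page gets the same hub score as H1 and overtakes H2, and nobody else had to link to it.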
 

1 comment:

  1. Link analysis algorithms truly have advantages and disadvantages. Google is proud of its search ranking algorithm, which helps it hold a monopolistic position in the internet world, but it is still improving its algorithms to provide better search results. In HITS, pages are ranked by their "hubness" and "authority", which mutually reinforce each other. In reality, search engines use more diverse and complicated ranking techniques besides PageRank or HITS. For example, suppose a page ranked 10th is clicked by every searcher, while another page ranked 2nd is clicked by fewer searchers. The former should then rank higher, and the latter lower, in the next search for that keyword.
