Topic-specific Crawling Algorithm Based on EIPageRank
Chunxia Yin; Jian Liu; Huiying Zhang
Abstract: A topic-specific crawler collects pages related to a given topic from the web. It starts with a list of URLs and recursively retrieves all linked pages. PageRank, the link analysis algorithm used by the Google search engine, is a topic-independent measure of the importance of a web page, so it must be combined with one or more measures of query relevance to rank the results of a search. The IPageRank algorithm was developed by combining the extensive metadata method RW with the hyperlink analysis method PageRank, but a topic-specific crawler that uses it often has a low recall rate. This paper first proposes EIPageRank, an extension of IPageRank that uses the vector space model to compute the relevance score of extensive metadata and adds a parent-page relevance weight to the authority that a page passes on to the pages it links to. It then presents a URL ordering algorithm based on EIPageRank for topic-specific crawlers. Experiments indicate that a crawler based on EIPageRank achieves a higher harvest rate and target recall rate than crawlers based on PageRank or on IPageRank.
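The abstract does not state the EIPageRank formula, so the following is only a minimal Python sketch of the three ideas it names: a vector space (cosine) relevance score, a PageRank-style iteration in which the authority a parent passes on is scaled by the parent's relevance, and URL ordering by score. The function names, the term-frequency weighting, the damping factor of 0.85, and the exact way relevance enters the update are all assumptions made for illustration, not the authors' method.

```python
import heapq
import math
from collections import Counter


def cosine_relevance(text, topic):
    """Cosine similarity of raw term-frequency vectors (assumed weighting)."""
    a = Counter(text.lower().split())
    b = Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def relevance_weighted_rank(links, relevance, damping=0.85, iterations=50):
    """PageRank-style iteration; each parent's contribution is scaled by
    its relevance score (an assumed form, not the published formula).
    Scores are not renormalized, so they need not sum to 1."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for parent, outs in links.items():
            if not outs:
                continue
            share = damping * rank[parent] * relevance.get(parent, 0.0) / len(outs)
            for child in outs:
                new[child] += share
        rank = new
    return rank


def order_frontier(scores):
    """Return URLs from highest to lowest score using a heap of negated scores."""
    heap = [(-score, url) for url, score in scores.items()]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# Hypothetical toy link graph: "a" is a relevant parent, "b" is not.
links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"]}
relevance = {"seed": 1.0, "a": 0.9, "b": 0.1}
rank = relevance_weighted_rank(links, relevance)
frontier = order_frontier({"c": rank["c"], "d": rank["d"]})
```

In this toy graph, page "c" is linked from the more relevant parent, so it receives more authority than "d" and is placed ahead of it in the crawl frontier.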