Web search engine companies are intensively running learning to rank algorithms to improve search relevance. Neural network (NN)-based approaches, such as LambdaRank, can significantly increase ranking quality. However, their training is very slow on a single computer, and their inherent coarse-grained parallelism can hardly be utilized by computer clusters. Thus, an efficient implementation is necessary to generate acceptable NN models in a timely manner on frequently updated training datasets. This paper presents our work on an FPGA-based accelerator. A SIMD streaming architecture is proposed to i) efficiently map the query-level NN computation and data structure to FPGA, ii) fully exploit the inherent fine-grained parallelism, and iii) provide scalability to large-scale datasets. The accelerator shows up to 17.9X speedup over the software implementation on datasets from a commercial search engine.
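As background for the query-level computation mentioned above, the following is a minimal software sketch of LambdaRank's per-query lambda gradients, assuming a standard pairwise formulation. It is a plain-Python reference, not the paper's FPGA design; the function names, the `sigma` parameter, and the omission of the ideal-DCG normalization are illustrative simplifications. Each document pair within a query is independent, which is the kind of fine-grained parallelism a SIMD streaming architecture can exploit.

```python
import numpy as np

def ndcg_delta(labels, ranks, i, j):
    """|Change in DCG| if documents i and j swapped rank positions
    (the ideal-DCG normalization is omitted for brevity)."""
    gain = lambda lab: 2.0 ** lab - 1.0
    disc = lambda r: 1.0 / np.log2(r + 2.0)    # ranks are 0-based
    return abs((gain(labels[i]) - gain(labels[j])) * (disc(ranks[i]) - disc(ranks[j])))

def query_lambdas(scores, labels, sigma=1.0):
    """Lambda gradients for one query; every document pair is independent."""
    order = np.argsort(-scores)                # current ranking by model score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    lambdas = np.zeros_like(scores, dtype=float)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:          # i should be ranked above j
                l_ij = -sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
                l_ij *= ndcg_delta(labels, ranks, i, j)
                lambdas[i] += l_ij
                lambdas[j] -= l_ij
    return lambdas                             # drives the NN weight update for this query

# Tiny example: three documents of one query
scores = np.array([0.2, 1.3, -0.4])
labels = np.array([1, 2, 0])
print(query_lambdas(scores, labels))
```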
ABSTRACT This paper describes an FPGA-based hardware acceleration system for the LambdaRank algorithm. LambdaRank is a neural network (NN)-based learning to rank algorithm that is intensively used by web search engine companies to increase search relevance. Since i) the cost function for the ranking problem is much more complex than that of traditional Back-Propagation (BP) NNs, and ii) no coarse-grained parallelism exists, LambdaRank is hard to accelerate efficiently on GPUs or computer clusters. We present an FPGA-based accelerator solution to provide high computing performance. A compact deep pipeline is proposed to handle the complex computation in the batch update, and its area scales linearly with the number of hidden nodes in the NN model. We also carefully design a data format that enables streaming consumption of the training data from the host computer. The accelerator shows up to 24.6X speedup compared with the pure software implementation on datasets from a commercial search engine.
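The abstract does not specify the streaming data format, so the following is only a hypothetical illustration of the idea: training data grouped by query and packed into a self-describing binary stream with fixed-point features, so an accelerator can consume it sequentially without random access to host memory. The record layout, field widths, and Q16.16 fixed-point encoding are assumptions made for this sketch.

```python
import struct

def pack_query_block(query_id, docs):
    """Pack one query's documents as: header (query id, doc count, feature count)
    followed by each document's relevance label and fixed-point features."""
    n_feat = len(docs[0]["features"])
    blob = struct.pack("<III", query_id, len(docs), n_feat)        # block header
    for d in docs:
        blob += struct.pack("<h", d["label"])                      # relevance label
        fixed = [int(round(x * 65536)) for x in d["features"]]     # Q16.16 fixed point
        blob += struct.pack(f"<{n_feat}i", *fixed)
    return blob

# Tiny example stream with one query and two documents
training_queries = [
    (7, [{"label": 2, "features": [0.10, 0.53]},
         {"label": 0, "features": [0.31, 0.22]}]),
]
stream = b"".join(pack_query_block(qid, docs) for qid, docs in training_queries)
print(len(stream), "bytes")
```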
A web search engine ranks web pages according to their relevance to user queries, which is critical to the success of commercial search engines. The RankBoost algorithm is promising in the Web relevance ranking area, but its computational complexity makes our existing implementations (including a single-node software implementation and an FPGA-based accelerator) too slow to reflect the dynamics of the Web. Moreover, previous implementations cannot handle huge web-scale data. In this paper, we therefore present a RankBoost implementation on MPI-based distributed FPGA accelerators. Our results show that combining the coarse-grained parallelism of the distributed system with the fine-grained parallelism of the reconfigurable hardware accelerators significantly increases computing performance.
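As a rough illustration of how coarse-grained (MPI) and fine-grained (per-node accelerator) parallelism can be combined, here is a minimal mpi4py sketch of the communication pattern one would expect per boosting round: each node builds weak-learner histograms for its shard of the data (the part a local FPGA card would accelerate), and a single Allreduce lets every node select the same weak learner. The histogram shapes and the use of random data in place of real statistics are assumptions; this is not the paper's actual protocol.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each node owns a shard of the training data; random numbers stand in for the
# per-shard weak-learner statistics that the local FPGA card would produce.
n_features, n_bins = 64, 256
rng = np.random.default_rng(rank)
local_hist = rng.random((n_features, n_bins))

# One Allreduce per boosting round merges all shards' statistics, so every node
# can pick the same weak learner (feature index, threshold bin) locally.
global_hist = np.empty_like(local_hist)
comm.Allreduce(local_hist, global_hist, op=MPI.SUM)

best_feature, best_bin = np.unravel_index(np.argmax(np.abs(global_hist)),
                                          global_hist.shape)
if rank == 0:
    print(f"selected weak learner: feature {best_feature}, threshold bin {best_bin}")
```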
ACM Transactions on Reconfigurable Technology and Systems, 2009
ABSTRACT Search relevance is a key measure of the usefulness of search engines. A shift in search relevance among search engines can easily change a search company's market cap by tens of billions of dollars. With the ever-increasing scale of the Web, machine learning technologies have become important tools for improving search relevance ranking. RankBoost is a promising algorithm in this area, but it is not widely used due to its long training time. To reduce the computation time of RankBoost, we designed an FPGA-based accelerator system and its upgraded version. The accelerator, plugged into a commodity PC, increased the training speed on MSN search engine data by up to 1800x compared to the original software implementation on a server. The proposed accelerator has been successfully used by researchers in search relevance ranking.
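For context on what such an accelerator has to speed up, the hot loop of RankBoost's weak-learner search can be written as a histogram build followed by a suffix sum per feature, as in the simplified software reference below. The bin count, the `pi` document weights, and the synthetic data are assumptions, and this sketch is not the accelerator's hardware structure.

```python
import numpy as np

def rankboost_round(features_binned, pi, n_bins=256):
    """One boosting round's weak-learner search, the hot loop an accelerator targets.

    features_binned : (n_docs, n_features) integer bin index of each feature value
    pi              : (n_docs,) per-document weight derived from the pair weights
    Returns (feature, threshold_bin, r) maximizing |r|, where
    r(f, t) = sum_d pi[d] * [features_binned[d, f] > t].
    """
    n_docs, n_features = features_binned.shape
    best = (None, None, 0.0)
    for f in range(n_features):
        # Histogram of pi over this feature's bins, then a suffix sum yields
        # r(f, t) for every threshold t in one pass -- a structure that maps
        # naturally onto a hardware pipeline.
        hist = np.bincount(features_binned[:, f], weights=pi, minlength=n_bins)
        r = np.cumsum(hist[::-1])[::-1][1:]        # r(f, t) for t = 0..n_bins-2
        t = int(np.argmax(np.abs(r)))
        if abs(r[t]) > abs(best[2]):
            best = (f, t, float(r[t]))
    return best

# Tiny usage example with synthetic data
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(1000, 16))
pi = rng.normal(size=1000)
print(rankboost_round(X, pi))
```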