Abstract
Processing-intensive web server requests can degrade Quality of Service (QoS), for example by increasing mean response time and lowering throughput, which calls for a new web server software framework that improves web server performance. The request-level parallelism of web servers is a good fit for many-core accelerators such as GPUs and Intel Xeon Phi co-processors, but the traditional web server model cannot make full use of these accelerators. We propose a new web server software framework, called the MIC-based Server-side Accelerator Framework (MSAF), for machines equipped with both multi-core CPUs and Intel Xeon Phi co-processors; it is based on the Staged Event-Driven Architecture (SEDA). The framework can fully exploit the performance of the Intel Xeon Phi co-processors and the multi-core CPUs, and it improves power/energy efficiency by offloading the request-handling stage to the co-processors. We implemented web server simulation software based on MSAF on a machine with multi-core CPUs and Intel Xeon Phi co-processors, and evaluated it with Apache Benchmark (AB). Our evaluation shows that the performance of MSAF is roughly equivalent to that of a web server cluster consisting of four to five computing nodes. These results indicate that, when MSAF is applied, Intel Xeon Phi co-processors are suitable for server-side software such as web servers, DNS servers, and database servers, because of the lower communication latency between the co-processors and the host, their more powerful logic processing ability, and their higher energy efficiency.
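To make the staged, event-driven idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes three hypothetical stages (parse, handle, respond), each with its own event queue and worker pool, and a made-up request format "id:n". The wider worker pool on the handle stage only stands in for the request-handling stage that MSAF offloads to the co-processors; no actual offloading is performed here.

```python
import queue
import threading


class Stage:
    """One SEDA-style stage: an event queue drained by its own worker threads."""

    def __init__(self, handler, workers, downstream=None):
        self.handler = handler
        self.downstream = downstream  # next stage, or None for the last stage
        self.inbox = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True) for _ in range(workers)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def _run(self):
        while True:
            event = self.inbox.get()
            try:
                result = self.handler(event)
                # Forward the result before marking this event as done, so that
                # joining the queues in pipeline order sees every event.
                if self.downstream is not None:
                    self.downstream.inbox.put(result)
            finally:
                self.inbox.task_done()


def serve(raw_requests, handle_workers=4):
    """Run hypothetical 'id:n' requests through parse -> handle -> respond."""
    responses = {}
    lock = threading.Lock()

    def parse(raw):
        request_id, n = raw.split(":")
        return int(request_id), int(n)

    def handle(request):
        # Placeholder for the processing-intensive step (assumption: a sum of squares).
        request_id, n = request
        return request_id, sum(i * i for i in range(n))

    def respond(result):
        request_id, body = result
        with lock:
            responses[request_id] = body

    respond_stage = Stage(respond, workers=1)
    handle_stage = Stage(handle, workers=handle_workers, downstream=respond_stage)
    parse_stage = Stage(parse, workers=1, downstream=handle_stage)
    stages = (parse_stage, handle_stage, respond_stage)

    for stage in stages:
        stage.start()
    for raw in raw_requests:
        parse_stage.inbox.put(raw)
    # Wait for each queue to drain, in pipeline order.
    for stage in stages:
        stage.inbox.join()
    return responses
```

Calling serve(["1:4", "2:3"]) returns a dictionary that maps each request id to its computed body. Because every stage owns its queue and thread pool, the pool size of the expensive stage can be tuned independently of the others, which is the property a staged design relies on when one stage is moved to different hardware.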
Acknowledgements
This paper has been supported by the Fundamental Research Funds for the Central Universities (Grant No. PT1607), and CHEMCLOUDCOMPUTING@BUCT.
Cite this article
You, G., Wang, X. A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems. Cluster Comput 23, 2591–2608 (2020). https://doi.org/10.1007/s10586-019-03030-z
DOI: https://doi.org/10.1007/s10586-019-03030-z