Abstract
In this paper we investigate how to automatically identify the “natural” degree of parallelism of an application using software transactional memory (STM), i.e., the workload-specific multiprogramming level that maximizes the application’s performance. We discuss the importance of adapting the concurrency level in two different scenarios: a shared-memory and a distributed STM infrastructure. We propose and evaluate two alternative self-tuning methodologies, explicitly tailored to the considered scenarios. In shared-memory STM, we show that lightweight, black-box approaches relying solely on on-line exploration can be extremely effective. For distributed STMs, we introduce a novel hybrid approach that combines model-driven performance forecasting and on-line exploration to get the best of both techniques, namely robustness despite the model’s inaccuracies and fast convergence towards optimal solutions.










Notes
One should note at this point that none of the benchmarks we experimented with (i.e., STAMP applications and various micro-benchmarks) exhibits multiple maxima when observing throughput as a function of the number of threads, up to the hardware limit of our 48-core test machine.
This algorithm, although modifying the number of threads frequently, already achieves good performance results. Our intent was to avoid using fixed thresholds in order to provide a generic solution while keeping the approach simple. A more elaborate algorithm may detect portions of the execution where it is possible to keep the number of threads constant and would achieve even better performance.
A large collection of other experimental results can be found in a companion research report [25].
Note that state transfer is required upon elastic scaling of both replicated DSTMs (which are considered in this paper) and non-replicated ones. Even without replication, in fact, state transfer is needed to send a portion of the data set to joining nodes when the system scales up, and to preserve the data set when the platform shrinks, by redistributing data from the leaving nodes to the remaining ones.
The original TPC-C benchmark is designed to operate on a relational database, hence we developed a port that runs directly on top of a transactional key-value store such as Infinispan (code available here: http://github.com/cloudtm).
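The first two notes refer to an exploration-based tuner that keeps changing the number of threads and relies on throughput having a single maximum. The following is a minimal sketch of that general idea, written as plain hill climbing; it is not the authors' algorithm. The function `hill_climb_threads`, the callback `measure`, and the synthetic throughput curve are all hypothetical names introduced for illustration only.

```python
from typing import Callable


def hill_climb_threads(measure: Callable[[int], float],
                       max_threads: int,
                       start: int = 1,
                       steps: int = 50) -> int:
    """Explore thread counts one step at a time and return the best one kept.

    `measure(n)` is an assumed callback that runs the workload with n threads
    for one sampling interval and returns the observed throughput.
    """
    current = max(1, min(start, max_threads))
    current_tp = measure(current)
    direction = 1  # +1: try adding a thread, -1: try removing one
    for _ in range(steps):
        candidate = current + direction
        if candidate < 1 or candidate > max_threads:
            # Hit a boundary of the allowed range: explore the other way.
            direction = -direction
            continue
        candidate_tp = measure(candidate)
        if candidate_tp > current_tp:
            # The move improved throughput: keep it and continue this way.
            current, current_tp = candidate, candidate_tp
        else:
            # No improvement: stay put and reverse the exploration direction.
            direction = -direction
    return current


def synthetic_throughput(n: int) -> float:
    """Made-up unimodal curve peaking at 12 threads (illustration only)."""
    return 100.0 - (n - 12) ** 2


if __name__ == "__main__":
    print(hill_climb_threads(synthetic_throughput, max_threads=48))
```

The sketch compares each sample only against the last accepted one, so it uses no fixed thresholds, and it keeps probing neighbouring thread counts even after reaching the peak. With a single maximum, as reported for the benchmarks above, such a local search cannot get stuck in a suboptimal peak. A real tuner would also have to cope with measurement noise, which this sketch ignores.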
References
Abouzour M, Salem K, Bumbulis P (2010) Automatic tuning of the multiprogramming level in Sybase SQL Anywhere. In: Proc. of ICDE workshops
Cao Minh C, Chung J, Kozyrakis C, Olukotun K (2008) STAMP: Stanford transactional applications for multi-processing. In: Proc. of IISWC
Couceiro M, Romano P, Carvalho N, Rodrigues L (2009) D2STM: dependable distributed software transactional memory. In: Proc. of PRDC
Di Sanzo P, Ciciani B, Palmieri R, Quaglia F, Romano P (2012) On the analytical modeling of concurrency control algorithms for software transactional memories: the case of commit-time-locking. Performance Evaluation
Di Sanzo P, Ciciani B, Quaglia F, Romano P (2008) A performance model of multi-version concurrency control. In: Proc. of MASCOTS
Didona D, Felber P, Harmanci D, Romano P, Schenker J (2013) Identifying the optimal level of parallelism in transactional memory applications. In: Proc. of NETYS
Didona D, Romano P, Peluso S, Quaglia F (2012) Transactional auto scaler: elastic scaling of in-memory transactional data grids. In: Proc. of ICAC
Dragojevic A, Guerraoui R (2010) Predicting the scalability of an STM: a pragmatic approach. In: TRANSACT
Elnikety S, Dropsho S, Cecchet E, Zwaenepoel W (2009) Predicting replicated database scalability from standalone database profiling. In: Proc. of EuroSys
Ghanbari S, Soundararajan G, Chen J, Amza C (2007) Adaptive learning of metric correlations for temperature-aware database provisioning. In: Proc. of ICAC
Harmanci D, Gramoli V, Felber P, Fetzer C (2010) Extensible transactional memory testbed. Journal of Parallel and Distributed Computing, Special Issue (Transactional Memory) 70(10):1053–1067
Harris T, Larus JR, Rajwar R (2010) Transactional memory, 2nd edn. Synthesis lectures on computer architecture. Morgan & Claypool Publishers, San Rafael
Heindl A, Pokam G, Adl-Tabatabai AR (2009) An analytic model of optimistic software transactional memory. In: Proc. of ISPASS
Heiss HU, Wagner R (1991) Adaptive load control in transaction processing systems. In: Proc. of VLDB
Herlihy M, Moss JEB (1993) Transactional memory: architectural support for lock-free data structures. In: Proc. of ISCA
Jiménez-Peris R, Patiño-Martínez M, Alonso G (2002) Non-intrusive, parallel recovery of replicated data. In: Proc. of SRDS
Marchioni F, Surtani M (2012) Infinispan Data Grid Platform. Packt Publishing, Birmingham
Ansari M, Luján M, Kotselidis C, Jarvis K, Kirkham C, Watson I (2008) Robust adaptation to available parallelism in transactional memory applications. HIPEAC J
Quinlan JR Rulequest Cubist. http://www.rulequest.com/cubist-info.html. Accessed Nov 2013
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, Burlington
Raghavan N, Vitenberg R (2011) Balancing the communication load of state transfer in replicated systems. In: Proc. of SRDS
Red Hat/JBoss (2011) JBoss Infinispan. http://www.jboss.org/infinispan. Accessed Nov 2013
Reimer N, Haenssgen S, Tichy WF (1996) Dynamically adapting the degree of parallelism with reflexive programs. In: Proc. of IRREGULAR
Rughetti D, Di Sanzo P, Ciciani B, Quaglia F (2012) Machine learning-based self-adjusting concurrency in software transactional memory systems. In: Proc. of MASCOTS
Schenker J (2012) Optimistic synchronization and the natural degree of parallelism of concurrent applications, MSc Thesis
Schroeder B, Harchol-Balter M, Iyengar A, Nahum E, Wierman A (2006) How to determine a good multi-programming level for external scheduling. In: Proc. of ICDE
Singh R, Sharma U, Cecchet E, Shenoy P (2010) Autonomic mix-aware provisioning for non-stationary data center workloads. In: Proc. of ICAC
TPC Council: TPC-C Benchmark. http://www.tpc.org/tpcc. Accessed Nov 2013
Yoo RM, Lee HHS (2008) Adaptive transaction scheduling for transactional memory systems. In: Proc. of SPAA
Yu PS, Dias DM, Lavenberg SS (1993) On the analytical modeling of database concurrency control. J ACM 40(4):831–872
Zhang Q, Cherkasova L, Smirni E (2007) A regression-based analytic model for dynamic resource provisioning of multi-tier applications. In: Proc. of ICAC
Acknowledgments
This work has been partially supported by the projects “Cloud-TM” and “ParaDIME” (co-financed by the European Commission through the contracts no. 257784 and 318693), project specSTM (PTDC/EIA-EIA/122785/2010), the COST Action Euro-TM (IC1001) and by FCT (INESC-ID multiannual funding) through the PEst-OE/EEI/LA0021/2013 Program Funds.
Additional information
A shorter version of this article [6] appeared in Proc. of International Conference on Networked Systems, 2013.
Cite this article
Didona, D., Felber, P., Harmanci, D. et al. Identifying the optimal level of parallelism in transactional memory applications. Computing 97, 939–959 (2015). https://doi.org/10.1007/s00607-013-0376-3
DOI: https://doi.org/10.1007/s00607-013-0376-3
Keywords
- Transactional memory
- Self-tuning
- Multi-programming level
- Analytical modelling
- Machine learning
- Gradient descent