Pesterev A, Zeldovich N and Morris R. Locating cache performance bottlenecks using data profiling. Proceedings of the 5th European conference on Computer systems. (335-348).
Berg E and Hagersten E. (2002). SIP: Performance Tuning through Source Code Interdependence. Euro-Par 2002 Parallel Processing. 10.1007/3-540-45706-2_22. (177-186).
Jeon D, Garcia S, Louie C and Taylor M. Kismet. Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications. (519-536).
Mouhoub R and Hammami O. (2006). NoC Monitoring Hardware Support for Fast NoC Design Space Exploration and Potential NoC Partial Dynamic Reconfiguration. 2006 International Symposium on Industrial Embedded Systems. 10.1109/IES.2006.357481. 0-7803-9759-2. (1-10).
Hollingsworth J, Snavely A, Sbaraglia S and Ekanadham K. EMPS. Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11.
Mellor-Crummey J, Fowler R and Whalley D. Tools for application-oriented performance tuning. Proceedings of the 15th international conference on Supercomputing. (154-165).
Zilles C and Sohi G. A programmable co-processor for profiling. HPCA-7 - 7th IEEE Symposium on High Performance Computer Architecture. 10.1109/HPCA.2001.903267. 0-7695-1019-1. (241-252).
Gibson J, Kunz R, Ofelt D, Horowitz M, Hennessy J and Heinrich M. FLASH vs. (Simulated) FLASH. Proceedings of the ninth international conference on Architectural support for programming languages and operating systems. (49-58).
Buck B and Hollingsworth J. Using hardware performance monitors to isolate memory bottlenecks. Proceedings of the 2000 ACM/IEEE conference on Supercomputing. (40-es).
Karl W, Leberecht M and Schulz M. Optimizing data locality for SCI-based PC-clusters with the SMiLE monitoring approach. 1999 International Conference on Parallel Architectures and Compilation Techniques. 10.1109/PACT.1999.807523. 0-7695-0425-6. (169-176).
Reinhardt S, Pfile R and Wood D. (1998). Hardware Support for Flexible Distributed Shared Memory. IEEE Transactions on Computers. 47:10. (1056-1072). Online publication date: 1-Oct-1998.
Liao C, Martonosi M and Clark D. Performance monitoring in a Myrinet-connected SHRIMP cluster. Proceedings of the SIGMETRICS symposium on Parallel and distributed tools. (21-29).
Liao C, Jiang D, Iftode L, Martonosi M and Clark D. Monitoring shared virtual memory performance on a Myrinet-based PC cluster. Proceedings of the 12th international conference on Supercomputing. (251-258).
Hockauf R, Karl W, Leberecht M, Oberhuber M and Wagner M. (1998). Exploiting spatial and temporal locality of accesses: A new hardware-based monitoring approach for DSM systems. Euro-Par’98 Parallel Processing. 10.1007/BFb0057854. (206-215).
Xu Z, Larus J and Miller B. Shared-memory performance profiling. Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming. (240-251).