SIMD
Recent papers in SIMD
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront-and...
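The rectangular-blocking idea mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation. It assumes a 2D five-point averaging stencil with fixed boundary cells, and the function names and the `tile` parameter are hypothetical. The example shows that sweeping the grid tile by tile, which improves cache reuse on real hardware, gives exactly the same result as a plain row-by-row sweep. Time skewing and wavefront parallelism are not shown.

```python
from typing import List

Grid = List[List[float]]


def sweep_naive(grid: Grid) -> Grid:
    """One Jacobi sweep of a 5-point averaging stencil; boundary cells stay fixed."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the boundary values survive unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def sweep_blocked(grid: Grid, tile: int = 4) -> Grid:
    """Same sweep, but the interior is visited in tile x tile rectangular blocks."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, rows - 1, tile):          # row origin of the current tile
        for jj in range(1, cols - 1, tile):      # column origin of the current tile
            for i in range(ii, min(ii + tile, rows - 1)):
                for j in range(jj, min(jj + tile, cols - 1)):
                    out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                                       + grid[i][j - 1] + grid[i][j + 1])
    return out


def run(grid: Grid, steps: int, blocked: bool = False, tile: int = 4) -> Grid:
    """Apply `steps` Jacobi sweeps using either loop order."""
    for _ in range(steps):
        grid = sweep_blocked(grid, tile) if blocked else sweep_naive(grid)
    return grid
```

Because every cell reads only the previous grid, the order in which cells are visited cannot change the result. That independence is what makes reordering the loops into blocks legal.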
The performance of shared memory processors shows negative performance impulses (drawbacks) in certain regions when executing the basic matrix multiplication algorithm. In this paper we continue the analysis of the GPU memory hierarchy and...
In this work we present a programmable and reconfigurable single instruction multiple data (SIMD) visual processor based on the S-CNN architecture, namely, the Simplicial CNN Digital Visual Processor (SCDVP), oriented to high-performance...
At the nanometer scale, the focus of micro-architecture will move from processing to communication. Most general computer architectures to date have been based on a "stored program" paradigm that differentiates between memory and...
This paper describes a new parallel architectural system which we have called an MIMD-SIMD hybrid system. As the name implies, the MIMD-SIMD hybrid system (also denoted as hybrid system in this paper) is a combination of both SIMD and MIMD...
Most contemporary processors offer some version of Single Instruction Multiple Data (SIMD) machinery: vector registers and instructions to manipulate data stored in such registers. The central idea of this paper is to use these SIMD...
Molecular dynamics (MD) simulations are able to provide a wealth of detailed information about biological systems, enabling studies impossible to perform in a laboratory. As such, MD simulations provide insight into important biological...
...ratio rather than on the high performance as in scientific applications.
A review of previous array Pascals leads on to a description of the Glasgow Pascal compiler. The compiler is an ISO-Pascal superset with semantic extensions to translate data parallel statements to run on multiple SIMD cores. An appendix is...
SIMD instructions have been commonly used to accelerate video codecs. The recently introduced HEVC codec, like its predecessors, is based on the hybrid video codec principle and is therefore also well suited to SIMD acceleration. In...
We live in a data-centric world, with information available on the web to satisfy the twofold objectives of secure, easy access and quick processing. The web solutions, which connect users to information, are well equipped to...
There is a feeling from the past that sometimes surfaces in the moments we are living through now. Some of it is pleasant, some of it is not. All the more so when it concerns the story of two people whose relationship ran aground midway. Some are able to forget it,...
Augmenting a processor with special hardware that is able to apply a Single Instruction to Multiple Data (SIMD) at the same time is a cost-effective way of improving processor performance. It also offers a means of improving the ratio of...
Dense linear algebra has been traditionally used to evaluate the performance and efficiency of new architectures. This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators.
Inspired by the function, power, and volume of the organic brain, we are developing TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture. TrueNorth consists of a scalable network of neurosynaptic cores, with...
Contemporary computer systems are multiprocessor or multicomputer machines. Their efficiency depends on good methods of administering the jobs being executed. Fast processing of a parallel application is possible only when its parts are...
In this paper we present a VLSI architecture that acts as both a SIMD and a wavefront processor to compute the dynamic time-warping (DTW) algorithm. The DTW algorithm is separated into two independent modules. One is for computing the...
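The DTW recurrence, and the reason it maps well onto a wavefront processor, can be illustrated with a small sketch. This is not the paper's VLSI design. It assumes absolute difference as the local distance (standing in for the first module) and the common three-predecessor recurrence for accumulating cost (an assumed second module). The function names are hypothetical. Cells on the same anti-diagonal i + j = s depend only on the two previous anti-diagonals, so each diagonal could be computed in parallel. Here each diagonal is evaluated sequentially and checked against an ordinary row-major fill.

```python
import math
from typing import List, Sequence


def local_distances(x: Sequence[float], y: Sequence[float]) -> List[List[float]]:
    """Local cost |x[i] - y[j]| for every pair of samples (assumed distance)."""
    return [[abs(a - b) for b in y] for a in x]


def dtw_row_major(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference DTW: fill the accumulated-cost table row by row."""
    d = local_distances(x, y)
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = d[i - 1][j - 1] + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_wavefront(x: Sequence[float], y: Sequence[float]) -> float:
    """Same table, filled one anti-diagonal (i + j = s) at a time."""
    d = local_distances(x, y)
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for s in range(2, n + m + 1):
        # Every cell on diagonal s reads only diagonals s-1 and s-2,
        # so the cells of this inner loop are independent of each other.
        for i in range(max(1, s - m), min(n, s - 1) + 1):
            j = s - i
            D[i][j] = d[i - 1][j - 1] + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

In a hardware wavefront array, each processing element would own one cell of the current diagonal. The sequential inner loop stands in for that single parallel step.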
MAX-2 illustrates how a small set of instruction extensions can provide subword parallelism to accelerate media processing and other data-parallel programs.
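Subword parallelism treats one wide register as several narrow lanes. The sketch below emulates a four-lane, 16-bit wraparound add inside a 64-bit word using ordinary integer operations (the software-only "SWAR" trick). It is an assumed illustration of the idea, not MAX-2's actual instruction set, and the helper names are hypothetical. Masking off each lane's top bit before the add stops carries from crossing lane boundaries, and an XOR then restores the correct top bits.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1      # 0xFFFF
HIGH_BITS = 0x8000_8000_8000_8000     # top bit of every 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF      # remaining 15 bits of every lane


def pack(values):
    """Pack four unsigned 16-bit values into one 64-bit word (lane 0 = lowest bits)."""
    word = 0
    for k, v in enumerate(values):
        word |= (v & LANE_MASK) << (k * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (k * LANE_BITS)) & LANE_MASK for k in range(LANES)]


def padd16(a, b):
    """Lane-wise modular (wraparound) add of two packed words."""
    # Adding only the low 15 bits of each lane may carry into bit 15 of
    # that same lane, but never into the neighbouring lane.
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    # Each lane's top bit must be carry XOR a15 XOR b15; the carry is already in partial.
    return partial ^ ((a ^ b) & HIGH_BITS)
```

For example, `unpack(padd16(pack([1, 2, 0xFFFF, 0x8000]), pack([1, 3, 1, 0x8000])))` gives `[2, 5, 0, 0]`: the two overflowing lanes wrap to zero without disturbing their neighbours. Saturating variants are not shown.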
In this paper, we present a new model of massively parallel Single Instruction Multiple Data (SIMD) machines in a distributed system. Among the modelled machines, we distinguish the linear, 2D and 3D meshes, pyramidal structures...
We present an efficient O(n) numerical algorithm for first-order approximation of geodesic distances on geometry images, where n is the number of points on the surface. The structure of our algorithm allows efficient implementation on...
Fog and mobile edge computing is a paradigm that augments resource-scarce mobile devices with resource-rich network servers to enable ubiquitous computing. Smartphone applications rely on code offloading techniques to leverage...
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, has made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application...
In this paper, we present a case study for the design, programming and usage of a reconfigurable system-on-chip, MorphoSys, which is targeted at computation-intensive applications. This 2-million-transistor design combines a...
Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities...
At-speed testing of GHz processors using external testers may not be technically and economically feasible. Hence, there is an emerging need for low-cost, high-quality self-test methodologies, which can be used by processors to test...
Smart cameras are among the emerging new fields of electronics. The points of interest are in the application areas, software and IC development. In order to reduce cost, it is worthwhile to invest in a single architecture that can be...
The aim of this work is to speed up biological (DNA and protein) sequence comparison by using a hybrid parallelisation technique that combines different parallel methods. The Smith-Waterman algorithm has been...
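For reference, the core Smith-Waterman recurrence (local alignment with a linear gap penalty) is small. The sketch below is a plain sequential version under assumed scores (match +2, mismatch -1, gap -1). It is not the paper's hybrid parallel implementation. Parallel versions commonly distribute this table fill across anti-diagonals, or compare many database sequences at once.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score of sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # a[i-1] aligned against a gap in b
            left = H[i][j - 1] + gap    # b[j-1] aligned against a gap in a
            # Clamping at 0 lets an alignment restart anywhere, which makes it local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best
```

For instance, `smith_waterman("TTACGTT", "ACG")` scores the exact local match `ACG` as 3 × 2 = 6 and ignores the unmatched flanks.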
Despite the widespread adoption of parallel operations in contemporary CPU designs, their use has been restricted by a lack of appropriate programming language abstractions and development environments. To fully exploit the SIMD model of...
This paper describes an implementation of a novel major line removal Hough transform on a new parallel architectural system, the Hybrid System. A Hybrid System is a combination of single instruction multiple data (SIMD) and multiple...
Design methodologies based on reuse of intellectual property (IP) components critically depend on techniques to protect IP ownership. IP protection is particularly challenging for hardware/software systems, where an IP core runs embedded...
We present an efficient algorithm for nonlocal image filtering with applications in electron cryomicroscopy. Our denoising algorithm is a rewriting of the recently proposed nonlocal means filter. It builds on the separable property of...
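The plain nonlocal means filter that this algorithm rewrites averages every sample with weights that depend on how similar the surrounding patches are. Below is a minimal, unoptimised 1D sketch of that baseline. It assumes a patch radius, a search radius, a filtering parameter `h`, unweighted patch distances and clamped borders, all of which are hypothetical choices. It does not reproduce the paper's separable rewriting.

```python
import math
from typing import List, Sequence


def patch(u: Sequence[float], center: int, radius: int) -> List[float]:
    """Samples u[center - radius .. center + radius], clamping indices at the borders."""
    n = len(u)
    return [u[min(max(center + k, 0), n - 1)] for k in range(-radius, radius + 1)]


def nl_means_1d(u: Sequence[float], patch_radius: int = 1,
                search_radius: int = 5, h: float = 1.0) -> List[float]:
    """Nonlocal means: weighted average of samples whose neighbourhoods look alike."""
    n = len(u)
    out = []
    for i in range(n):
        p_i = patch(u, i, patch_radius)
        num = 0.0
        den = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            p_j = patch(u, j, patch_radius)
            dist2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j))  # squared patch distance
            w = math.exp(-dist2 / (h * h))                        # similar patches -> weight near 1
            num += w * u[j]
            den += w
        out.append(num / den)  # den >= 1 because the sample's own weight is exp(0) = 1
    return out
```

On a step signal such as `[0.0] * 6 + [10.0] * 6`, patches on the other side of the step differ strongly and receive almost no weight, so the edge stays sharp instead of being blurred as a box filter would blur it. The nested loops cost O(n × window × patch) operations, which is the cost the faster rewritings aim to cut.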
Hierarchical, multiresolution data representations enable interactive analysis and visualization of large-scale simulations. One promising application of these techniques is to store high performance computing simulation output in a...
In the present computing landscape, interpreters are in use in a wide range of systems. Recent trends in consumer electronics have created a new category of portable, lightweight software applications. Typically, these applications have...
Software tools are an important part of the programming environment. Perhaps one of the most pervasive types of software tools is the "static analyzer", as exemplified by cross reference listing tools and call graph generators. In this...
Over the last decade, significant advances have been made in compilation technology for capitalizing on instruction-level parallelism (ILP). The vast majority of ILP compilation research has been conducted in the context of general-purpose...
MppSoC is a SIMD architecture composed of a grid of processors and memories connected by an X-Net neighbourhood network and a general-purpose global router. MppSoC is an evolution of the famous massively parallel systems proposed at the...
Parallel processing methods are a means to achieve significant speedup of computationally expensive image understanding algorithms, such as those applied to range images. Practical implementations of these algorithms must deal with the...
GPU devices offer great performance when dealing with algorithms that require intense computational resources. A developer can configure the L1 cache memory of the latest GPU Kepler architecture with different cache size and cache set...