The kNN machine learning method is widely used as a classifier in Human Activity Recognition (HAR) systems. Although the kNN algorithm works similarly in both online and offline modes, the use of all training instances is much more critical online than offline due to the time and memory restrictions of the online mode. Some methods propose decreasing the high computational cost of kNN by focusing, e.g., on approximate kNN solutions such as those relying on Locality-Sensitive Hashing (LSH). However, embedded kNN implementations also need to address the target device’s memory constraints, especially as online classification needs to cope with those constraints to be practical. This paper discusses online approaches to reduce the number of training instances stored in the kNN search space. To address practical implementations of HAR systems using kNN, this paper presents simple, energy/computationally efficient, and real-time feasible schemes to maintain at runtime a maximum ...
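The core idea of bounding the kNN search space can be sketched as follows. This is a minimal illustrative model, not the schemes proposed in the paper: the class name, the brute-force distance search, and the simple FIFO eviction policy are all assumptions made for the example.

```python
from collections import deque

class BoundedKNN:
    """Online kNN classifier that caps the number of stored training
    instances. Illustrative sketch: eviction is plain FIFO (oldest
    instance dropped first), not the paper's instance-selection schemes."""

    def __init__(self, k=3, max_instances=100):
        self.k = k
        self.store = deque(maxlen=max_instances)  # deque evicts the oldest entry when full

    def add(self, features, label):
        """Insert a new training instance; may evict the oldest one."""
        self.store.append((features, label))

    def predict(self, features):
        """Brute-force search over the bounded store, then majority vote."""
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        nearest = sorted(self.store, key=lambda inst: dist(inst[0], features))[:self.k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)
```

Because the store never exceeds `max_instances`, both the memory footprint and the per-query search time stay bounded, which is the property an online embedded classifier needs.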
Copyright 1998 Faculty of Electrical Engineering and Computing (FER). Published in the Proceedings of WDTA'98, Dubrovnik, Croatia, June 8-10, 1998. Personal use of this material is permitted. However, permission ...
The invention relates to a method for compiling programs on a system consisting of at least one first processor and a reconfigurable unit. In this method, the code parts suitable for the reconfigurable unit are determined and extracted, and the remaining code is likewise extracted for processing by the first processor.
Abstract Reconfigurable computing platforms offer the promise of substantially accelerating computations through the concurrent nature of hardware structures and the capacity of these architectures for hardware customization. Effectively programming such reconfigurable architectures, however, is an extremely cumbersome and error-prone process, as it requires programmers to assume the role of hardware designers while mastering hardware description languages, thus limiting the acceptance and dissemination of this promising ...
Embedded Systems permeate all aspects of our daily life, from the ubiquitous mobile devices (e.g., PDAs and smart-phones) to play-stations, set-top boxes, household appliances, and every electronic system, be it large or small (e.g., in cars and wrist-watches). Most embedded systems are characterized by stringent design constraints such as reduced memory and computing capacity, severe power and energy restrictions, weight and space limitations, and, most importantly, very short life spans and thus strict design cycles. ...
Abstract Sorting is an important operation in a myriad of applications. It can contribute substantially to the overall execution time of an application. Dedicated sorting architectures can be used to accelerate applications and/or to reduce energy consumption. In this paper, we propose an efficient sorting unit aiming at accelerating the sort operation in FPGA-based embedded systems. The proposed sorting unit, named Unbalanced FIFO Merge Sorting Unit, is based on a FIFO merger implementation and is easily scalable to handle different ...
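The behaviour of a FIFO merger, the building block the unit is based on, can be modelled in a few lines of software. This is an illustrative sketch under assumed names and interfaces, not the paper's hardware unit: each step compares the two FIFO heads and pops the smaller one, which is exactly what a hardware comparator between two FIFOs would do per cycle.

```python
from collections import deque

def fifo_merge(fifo_a, fifo_b):
    """Merge two sorted FIFOs into one sorted output stream.
    Software analogue of a hardware FIFO merger (illustrative only):
    compare the head elements, pop the smaller, repeat."""
    out = []
    while fifo_a and fifo_b:
        # Pop from whichever FIFO currently holds the smaller head element.
        out.append((fifo_a if fifo_a[0] <= fifo_b[0] else fifo_b).popleft())
    out.extend(fifo_a)  # drain whichever FIFO still holds data
    out.extend(fifo_b)
    return out
```

Note that nothing requires the two FIFOs to have equal depth; the merge works unchanged on inputs of different lengths, which is the "unbalanced" case the unit's name refers to.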
Abstract Sorting is an important operation for many embedded computing systems. Since sorting large datasets may slow down the overall execution, schemes to speed up sorting operations are needed. Bearing in mind the hardware acceleration of sorting, we show in this paper an analysis and comparison of three hardware sorting units: sorting network, insertion sorting, and FIFO-based merge sorting. We focus on embedded computing systems implemented with FPGAs, which give us the flexibility to accommodate ...
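Of the three unit types compared, the sorting network is the easiest to model in software, since it is just a fixed schedule of compare-exchange elements. The sketch below is an illustrative software model of a standard 5-comparator network for 4 inputs, not one of the units evaluated in the paper; in hardware, comparators in the same stage would operate in parallel.

```python
def sort4(values):
    """Fixed 4-input sorting network: five compare-exchange steps.
    Software model of a hardware sorting network (illustrative only)."""
    v = list(values)
    # Each pair (i, j) is one comparator: swap so that v[i] <= v[j].
    for i, j in [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v
```

Because the comparator schedule is fixed and data-independent, the hardware version has constant latency regardless of the input order, unlike insertion sorting.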
ABSTRACT In recent years, there has been increasing interest in using task-level pipelining to accelerate the overall execution of applications mainly consisting of producer/consumer tasks. This paper presents coarse/fine-grained data flow synchronization approaches to achieve pipelined execution of producer/consumer tasks in FPGA-based multicore architectures. Our approaches are able to speed up the overall execution of successive, data-dependent tasks by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated with the producer/consumer tasks. The experimental results show the feasibility of the approach when dealing with producer/consumer tasks with out-of-order communication, and reveal noticeable performance improvements for a number of benchmarks over a single-core implementation without task-level pipelining.
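The producer/consumer coupling through a bounded inter-stage buffer can be sketched with threads standing in for cores. This is a software analogue under assumed names and stage functions, not the paper's FPGA buffer schemes: the bounded queue makes the producer block when the buffer is full and the consumer block when it is empty, which is the synchronization the hardware buffers provide.

```python
import queue
import threading

def pipeline(data):
    """Two pipeline stages coupled by a bounded inter-stage buffer
    (illustrative sketch; threads stand in for FPGA cores)."""
    buf = queue.Queue(maxsize=4)   # bounded buffer: put() blocks when full
    results = []

    def producer():
        for x in data:
            buf.put(x * x)         # stage 1: compute and push downstream
        buf.put(None)              # sentinel: signal end of stream

    def consumer():
        while (x := buf.get()) is not None:
            results.append(x + 1)  # stage 2: consume as soon as data arrives

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

With more than one input, the two stages overlap in time: the consumer processes element i while the producer is already computing element i+1, which is where the pipelining speedup comes from.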
ABSTRACT A common migration path for applications to high-performance multicore architectures relies on code annotations with concurrent semantics. Some annotations, however, are very target-architecture specific and thus highly non-portable. In this paper we describe a source-to-source code transformation system that allows programmers to specify transformations using an aspect-oriented domain-specific language, LARA. LARA allows programmers to specify strategies to search large code transformation design spaces while preserving the original source code. As the experimental results reveal, this approach leads to a substantial reduction in code maintenance costs and promotes the portability of both code and performance.
Abstract The increasing capability and flexibility of reconfigurable hardware, such as Field-Programmable Gate Arrays (FPGAs), give developers a wide range of architectural choices that can satisfy various non-functional requirements, such as those involving performance, resource and energy efficiency. This paper describes a novel approach, based on an aspect-oriented language called LARA, that enables systematic coding and reuse of optimisation strategies that address such non-functional requirements. Our approach will be presented ...
Abstract The synthesis and mapping of applications to configurable embedded systems is a notoriously hard process. Tools have a wide range of parameters, which interact in very unpredictable ways, thus creating a large and complex design space. When exploring this space, designers must understand the interfaces to the various tools and apply, often manually, a sequence of tool-specific transformations making this an extremely cumbersome and error-prone process. This paper describes the use of aspect-oriented techniques for ...
Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance.
Abstract This paper presents the use of LALP to implement typical industrial application kernels, the ADPCM Encoder and Decoder, in FPGAs. LALP is a domain-specific language whose compilation framework aims at the direct mapping of algorithms originally described in a high-level language onto FPGAs. In particular, LALP focuses on loop pipelining, a key technique for the design of hardware accelerators. While the language syntax resembles C, it contains certain constructs that allow programmer interventions to enforce or relax data ...
Due to the large number of optimizations provided by modern compilers and to optimization opportunities that are specific to each piece of code, a Design Space Exploration (DSE) is necessary to search for the best sequence of compiler optimizations for a given code fragment (e.g., a function). As this exploration is a complex and time-consuming task, in this paper we present DSE strategies that select optimization sequences to both improve the performance of each function and reduce the exploration time. The DSE is based on a clustering approach which groups functions with similarities and then explores the reduced search space formed by the optimizations previously suggested for the functions in each group. The identification of similarities between functions uses a data mining method applied to a symbolic representation of the source code. The DSE process uses the reduced set identified by clustering in two ways: as the design space, or as the initial configuration. In both cases, the adoption of a clustering-based pre-selection allows the use of simple and fast DSE algorithms. Our experiments evaluating the effectiveness of the proposed approach address the exploration of compiler optimization sequences considering 49 compilation passes, target a Xilinx MicroBlaze processor, and aim at performance improvements for 41 functions. Experimental results reveal that our new clustering-based DSE approach achieved a significant reduction in the total exploration time of the search space (18x over a Genetic Algorithm approach for DSE) while obtaining important performance speedups (43% over the baseline) for the optimized codes.
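The clustering-based reduction of the search space can be sketched as follows. Everything here is a placeholder assumption for illustration: the feature vectors, the centroid dictionary, the candidate sequences, and the `evaluate` cost function are not the paper's data mining method or its symbolic code representation.

```python
# Illustrative sketch of clustering-based DSE: a function is assigned to
# the nearest cluster of similar functions, and only the optimization
# sequences previously suggested for that cluster are evaluated, instead
# of the full sequence search space.

def nearest_cluster(features, centroids):
    """Assign a function's (hypothetical) feature vector to the
    closest cluster centroid by squared Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda name: dist(features, centroids[name]))

def explore(features, centroids, cluster_sequences, evaluate):
    """Evaluate only the sequences suggested for the function's cluster
    and return the best one (lower evaluate() value = better code)."""
    cluster = nearest_cluster(features, centroids)
    candidates = cluster_sequences[cluster]  # the reduced search space
    return min(candidates, key=evaluate)
```

Because each function only evaluates the handful of sequences associated with its cluster, the costly compile-and-measure step runs far fewer times than an exploration over all sequences, which is where the reported reduction in exploration time comes from.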
This chapter describes the design-flow approach developed in the REFLECT project as presented originally in [1]. Over the course of the project, this design-flow has evolved and has been extended into a fully operational toolchain. We begin by presenting an overview of the underlying aspect-oriented compilation flow followed by an extended description of the design-flow and its toolchain.
