Mohamed Akil

    ESIEE, Informatics, Faculty Member
    The high computing power, low power consumption and integrated peripherals of the latest generation of multicore Digital Signal Processors (DSPs) will allow them to be embedded in the next generation of smart cameras. Such DSPs allow designers to evolve the vision landscape and simplify the developer's task of running more complex image and video processing applications without the burden of a separate Personal Computer (PC). This paper explains how to exploit the computing power of the multicore TMS320C6472 DSP to implement a real-time H.264/AVC video encoder. This work prepares the way for implementing the new High Efficiency Video Coding standard (HEVC/H.265). To improve encoding speed, an enhanced Frame Level Parallelism (FLP) approach is presented and implemented. A real-time, fully functional video demo is given, taking video capture and bitstream storage into account. Experimental results show how we efficiently exploit the potential and features of the multicore platform without inducing PSNR degradation or bitrate increase. The enhanced FLP using five DSP cores achieves an average speedup factor of 4.3 compared to a single-core implementation for Common Intermediate Format (CIF, 352x288), Standard Definition (SD, 720x480) and High Definition (HD, 1280x720) resolutions. This optimized implementation exceeds real time, reaching encoding speeds of 98 f/s (frames per second) for CIF and 32 f/s for SD, and saves up to 77% of encoding time for HD.
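The time saving and per-core efficiency follow directly from a speedup figure. The helper below is an illustrative sketch (the name `parallel_metrics` is hypothetical); it only applies the definitions and assumes nothing about how FLP distributes frames across cores.

```python
def parallel_metrics(speedup, cores):
    """Return (fraction of encoding time saved, parallel efficiency)."""
    time_saved = 1.0 - 1.0 / speedup  # T_multi / T_mono = 1 / speedup
    efficiency = speedup / cores       # 1.0 would mean perfectly linear scaling
    return time_saved, efficiency

saved, eff = parallel_metrics(4.3, 5)
print(f"time saved: {saved:.1%}, efficiency: {eff:.0%}")
```

With the average speedup of 4.3 on five cores this prints `time saved: 76.7%, efficiency: 86%`. The 77% figure above is reported specifically for HD, so this is a consistency check rather than a derivation of that number.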
    This paper presents an optimizing methodology for implementing a Learning Vector Quantization (LVQ) neural network in a Field Programmable Gate Array (FPGA) device. Starting from an algorithmic specification in the form of a Factorized and Conditioned Data Dependence Graph (GFCDD), we propose a design methodology for the LVQ-dedicated architecture. This formal methodology is called AAA, “Algorithm Architecture Adequation”. Using graph transformations, it generates an optimized circuit implementation at the Register Transfer Level (RTL). It is supported by the SynDEx-IC software tool. Based on this formal methodology, we are able to explore and generate various LVQ network implementations by varying the LVQ sizes while minimizing the hardware resources and the design time. In addition, real-time constraints must be respected to ensure reliable classification of human vigilance states from electroencephalographic (EEG) signals. To validate our approach, the optimized LVQ implementation was tested on two types of Virtex devices.
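The abstract does not spell out the LVQ rule itself. The sketch below assumes the standard LVQ1 rule (Kohonen) with Euclidean distance and a linearly decaying learning rate, as a software reference model only; it is not the RTL architecture generated by SynDEx-IC, and all function names and the toy data are hypothetical. The nearest-prototype search in `nearest` is the part a hardware classifier typically has to evaluate for every input vector.

```python
import math

def nearest(x, protos):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(protos)), key=lambda k: math.dist(x, protos[k]))

def lvq1_train(samples, labels, prototypes, proto_labels, lr=0.1, epochs=10):
    """LVQ1: pull the winning prototype toward a sample of its own class, push it away otherwise."""
    protos = [list(p) for p in prototypes]
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, y in zip(samples, labels):
            w = nearest(x, protos)
            sign = 1.0 if proto_labels[w] == y else -1.0
            protos[w] = [p + sign * alpha * (xi - p) for p, xi in zip(protos[w], x)]
    return protos

def lvq_classify(x, protos, proto_labels):
    """Recall phase: return the class of the nearest prototype."""
    return proto_labels[nearest(x, protos)]

# Toy two-class data (hypothetical feature vectors, not EEG features).
samples = [(0, 0), (0, 1), (5, 5), (5, 6)]
labels = [0, 0, 1, 1]
protos = lvq1_train(samples, labels, [(1, 1), (4, 4)], [0, 1])
print(lvq_classify((0.2, 0.3), protos, [0, 1]), lvq_classify((4.8, 5.2), protos, [0, 1]))
```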
    ... Flexibility is provided by exploiting the programmability of the PGA circuits used. ... Secondly, there is not a unique correspondence between the implementation of an FA and its ... of FPGA to obtain as flexible computing functions as possible, a trade-off between integrated memory ...
    ... The fifth paper, by Jiang, Crookes and Bouridane, describes a parallel-matching processor architecture to perform high-speed biometric fingerprint database retrieval. The processor was implemented on a Xilinx Virtex-E and runs at up to 65 MHz. ...
    ... Marcelo Alves de Barros, Mohamed Akil and René Natowicz, Groupe ESIEE - Laboratoire de Traitement de l'Information et des Systèmes, BP 99, Cité Descartes - 93162 - NOISY LE ... In this case, ADRi-1 corresponds to the position j of the element kj closest to p's grey level. ...
    ABSTRACT This paper deals with a dedicated hardware architecture for 1-D morphological opening and pattern spectrum. These operators allow the extraction and measurement of 1-D features in images, a commonly used technique in image analysis and texture classification. The architecture is based on a recently proposed opening algorithm and makes it possible to obtain arbitrary-oriented opening and granulometry at the same time. Since the data are accessed sequentially, several instances with different orientations can run in parallel on a single input dataflow, thus increasing performance (experimentally 414 Mpx/s per opening). This makes traditionally costly operators applicable in embedded, industrial applications.
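As a reference model for checking such hardware, the definitions can be written out naively. The sketch below assumes a horizontal 1-D signal of non-negative values, zero padding at the borders, and a flat segment as structuring element; it is the O(n·L) textbook definition, not the streaming opening algorithm the architecture is based on, and the function names are hypothetical. The pattern spectrum is taken as the area removed between successive opening sizes.

```python
def opening_1d(signal, length):
    """Flat 1-D opening with a segment of `length` samples (naive O(n*length)).

    Each output sample is the largest minimum over all windows of `length`
    samples that contain it; windows leaving the signal count as 0, i.e. the
    signal is assumed non-negative and zero-padded.
    """
    n = len(signal)
    out = [0] * n
    for start in range(n - length + 1):
        m = min(signal[start:start + length])   # erosion over this window
        for x in range(start, start + length):  # dilation back over the same window
            out[x] = max(out[x], m)
    return out

def pattern_spectrum(signal, max_length):
    """PS(L) = sum(opening_L) - sum(opening_{L+1}) for L = 1..max_length."""
    areas = [sum(opening_1d(signal, L)) for L in range(1, max_length + 2)]
    return [areas[i] - areas[i + 1] for i in range(max_length)]

print(pattern_spectrum([0, 1, 1, 0, 1, 1, 1, 0, 1], 3))  # [1, 2, 3]
```

For a binary signal, PS(L) counts the pixels belonging to runs of exactly L ones: here one run of length 1, one of length 2 and one of length 3.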
    In this paper we present the hardware implementation of an image segmentation chain based on topological operators on a mixed FPGA/DSP architecture. These operators can segment a bi-class image into regions. This original method is based on a few low-level operators that need no parameters. This is attractive from an architectural viewpoint because of its simplicity and because these low-level operators are reused in different phases of the algorithm, which significantly reduces the FPGA area used. Moreover, the segmentation result consists of closed, thin contours. The method is based on four basic operators that modify the topology of the image in order to segment it. The first operator simplifies the topology of the image while preserving its gray-level information. Once simplified, a real image is full of irregular regions or points, due to noise or to the texture of the regions. The second and third operators selectively eliminate the irregular points and the irregular regions, respectively. The fourth operator reconstructs the image. In this paper we present the implementation of these four operators on a PCI architecture based on an FPGA circuit and a DSP processor.
    In this article we present an implementation of a watershed algorithm on a multi-FPGA architecture. This implementation is based on a hierarchical FIFO, with a separate FIFO for each gray level. The gray-scale value of a pixel is taken as the altitude of the point, so the image is viewed as a relief. We proceed by a
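Flooding with a hierarchical queue (one FIFO per gray level) can be sketched in software as follows. This is classic marker-based flooding in the style of Meyer's algorithm, assuming an 8-bit image, 4-connectivity and user-supplied markers; it produces a labelled partition without explicit watershed lines and does not model how the work is split across several FPGAs. `watershed_hq` is a hypothetical name.

```python
from collections import deque

def watershed_hq(image, markers, levels=256):
    """Marker-based flooding using one FIFO per gray level (a hierarchical queue).

    image:   2-D list of ints in [0, levels)
    markers: 2-D list of the same shape, 0 = unlabelled, >0 = seed label
    """
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in markers]
    queues = [deque() for _ in range(levels)]  # queues[g] holds pixels at altitude g
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                queues[image[r][c]].append((r, c))
    level = 0
    while level < levels:
        if not queues[level]:
            level += 1  # the current level is exhausted: rise to the next altitude
            continue
        r, c = queues[level].popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]  # first flood to arrive claims the pixel
                # a neighbour lower than the current level joins the current queue
                queues[max(image[nr][nc], level)].append((nr, nc))
    return labels

print(watershed_hq([[0, 1, 3, 1, 0]], [[1, 0, 0, 0, 2]]))  # [[1, 1, 1, 2, 2]]
```

The FIFO discipline within each level is what makes pixels on a plateau get flooded in order of distance from the seeds.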
    The authors describe problems concerning the implementation of 2D convolution algorithms using reconfigurable technology. An approach for the automatic design of specific architectures in this technology is discussed. The Xilinx programmable gate array (PGA) resources are presented, with particular attention to their timing and area limits. The authors present an implementation of a real-time 3×3 programmable convolver with a Xilinx XC3090 PGA.
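A software reference model is useful for checking a hardware convolver's output. The sketch below computes a "valid" 3×3 convolution with nine loadable coefficients (reading "programmable" this way is an assumption); each output pixel costs nine multiply-accumulates, which is the per-pixel workload a real-time convolver must sustain. `convolve3x3` is a hypothetical name, and nothing here reflects the XC3090 mapping.

```python
def convolve3x3(image, kernel):
    """'Valid' 2-D convolution of an image with a 3x3 kernel (the kernel is flipped)."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            acc = 0
            for a in range(3):
                for b in range(3):
                    # true convolution: kernel (a, b) pairs with image (i+2-a, j+2-b)
                    acc += kernel[a][b] * image[i + 2 - a][j + 2 - b]
            row.append(acc)
        out.append(row)
    return out

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(convolve3x3(img, [[1, 1, 1], [1, 1, 1], [1, 1, 1]]))  # [[45]]
```

For symmetric kernels (box, Gaussian, Laplacian) the flip makes no difference; it matters only for asymmetric ones such as directional gradient masks.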
    Cognitive Radio Spectrum Evolution Prediction using Artificial Neural Networks based Multivariate Time Series Modelling. M. I. Taj and M. Akil, Université Paris-Est, Laboratoire d'Informatique Gaspard-Monge, Equipe A3SI ...
    The Timepix is a pixel detector that records the energy deposited by charged particles. Different particles leave different traces, which can be analyzed to identify the particles and, consequently, the source of the radiation. We propose an image processing approach to classifying particles based on the shape of their traces, using only a few basic morphological operations. Implemented in an FPGA, this method achieves performance and latency that allow a high acquisition rate. Embedded with the Timepix, it can be used to analyze radioactive fluxes of unknown sources and spectra.
    ABSTRACT Human vision has been studied extensively in recent years, and several models have been proposed to simulate it on a computer. Some of these models concern visual saliency, which is potentially very useful in many applications such as ...
    ABSTRACT PNG (Portable Network Graphics) is a lossless compression method for real-world pictures. Since its specification, it has continued to attract the interest of the image processing community. Indeed, PNG is an extensible file format for portable, well-compressed storage of raster images. In addition, it supports black-and-white (binary mask), grayscale, indexed-color and truecolor images. Within the framework of the Demat+ project, which aims to provide a complete solution for the storage and retrieval of scanned documents, we present a hardware design that accelerates the PNG encoder for binary mask compression on FPGA. To this end, an optimized architecture is proposed as part of a hybrid, cooperating software and hardware system. For its evaluation, the new PNG IP has been implemented on the ALTERA Arria II GX EP2AGX125EF35 FPGA. The experimental results show a good balance between the achieved compression ratio, the computational cost and the hardware resources used. © (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
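For reference, the file structure a binary-mask PNG encoder has to emit is small. The sketch below assumes a 1-bit grayscale image (IHDR color type 0, bit depth 1), filter type 0 (None) on every scanline and zlib level 9; per the PNG specification, each scanline starts with a filter-type byte and packs eight pixels per byte, leftmost pixel in the most significant bit. It uses only Python's standard `struct` and `zlib` modules. `encode_binary_png` is a hypothetical name, and this is not the FPGA IP described above, whose filter strategy the abstract does not state.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(tag, data):
    """One PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

def encode_binary_png(mask):
    """Encode a 2-D list of 0/1 values as a 1-bit grayscale PNG (1 = white)."""
    height, width = len(mask), len(mask[0])
    raw = bytearray()
    for row in mask:
        raw.append(0)  # filter type 0 (None) for this scanline
        for start in range(0, width, 8):
            byte = 0
            for bit, px in enumerate(row[start:start + 8]):
                if px:
                    byte |= 0x80 >> bit  # leftmost pixel goes in the MSB
            raw.append(byte)  # the last byte of a row is zero-padded
    # width, height, bit depth 1, color type 0, compression 0, filter 0, interlace 0
    ihdr = struct.pack(">IIBBBBB", width, height, 1, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(raw), 9)) + _chunk(b"IEND", b""))

png = encode_binary_png([[1, 0, 1], [0, 1, 0]])
print(len(png), png[:8] == PNG_SIGNATURE)
```

The PNG specification itself recommends filter type 0 for images with bit depths below 8, which makes this a reasonable default for binary masks; in this layout the Deflate stage is where the compression work is concentrated.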
    In this paper, a CAD tool is proposed that lets designers describe a Fuzzy Logic System (FLS) in a high-level language (fuzzy variables and rule base) and translates it into synthesizable VHDL code that meets real-time constraints. The generated code can be synthesized by any industry-standard synthesis tool (e.g. MAX+PLUS II, Xilinx ISE). The FLS is described using a GUI wizard: by answering a number of questions in the wizard window, the designer provides all the specifications of the system to be designed. This GUI is used to launch the design wizard and to generate the VHDL code. SynDEx-IC was used as the back-end of the system. The CAD tool was developed using Qt 3 to enable cross-platform portability.
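To make "fuzzy variables and rule base" concrete, here is a minimal software model of the kind of system such a tool takes as input. The variable `error`, its three triangular terms and the three rules are hypothetical, and zero-order Sugeno (weighted-average) inference is one common, hardware-friendly choice; the abstract does not say which inference or defuzzification method the tool generates VHDL for.

```python
def triangle(a, b, c):
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Hypothetical fuzzy variable "error" with three linguistic terms.
ERROR_TERMS = {
    "negative": triangle(-2.0, -1.0, 0.0),
    "zero": triangle(-1.0, 0.0, 1.0),
    "positive": triangle(0.0, 1.0, 2.0),
}
# Hypothetical rule base: IF error IS <term> THEN command = <constant>.
RULES = {"negative": -1.0, "zero": 0.0, "positive": 1.0}

def infer(error):
    """Zero-order Sugeno inference: firing-strength-weighted average of rule outputs."""
    strengths = {term: mu(error) for term, mu in ERROR_TERMS.items()}
    total = sum(strengths.values())
    if total == 0.0:
        return 0.0  # no rule fires outside the universe of discourse
    return sum(strengths[t] * RULES[t] for t in RULES) / total

print(infer(0.5))  # 0.5
```

A generator targeting hardware would typically tabulate or fix-point these membership functions; this floating-point model only shows what such a specification means.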
    In this work, we are concerned with improving an optimized static heuristic for distributed, heterogeneous architectures. To take the load-balancing criterion into account, the use of a list-scheduling heuristic calls for a dynamic model. Greedy list algorithms are well suited because they help the designer rapidly obtain an efficient implementation and thus shorten the development cycle of an application. This paper presents a dynamic model based on the agent concept, coupled with an offline (static) heuristic. Our model has been simulated in the parallel programming environment Xpvm. We show how our model is integrated into the transformation flow, which involves a heuristic for generating a real-time distributed executive. This integration will be implemented in SynDEx, a system-level CAD software tool that provides a seamless flow based on graph transformations.
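Greedy list scheduling can be sketched in a few lines. This is a simplified earliest-finish-time variant that assumes the task list is already in a valid priority (topological) order, models heterogeneity through per-processor execution times, and ignores communication costs; it is not SynDEx's heuristic, the agent-based dynamic model is not represented, and all names are hypothetical.

```python
def list_schedule(tasks, deps, cost, n_procs):
    """Greedy list scheduling on heterogeneous processors.

    tasks:   task names in priority (topological) order
    deps:    dict task -> list of predecessor tasks
    cost:    dict task -> list of execution times, one per processor
    Returns (placement, finish) dicts.
    """
    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    finish, placement = {}, {}
    for t in tasks:
        # a task is ready once all its predecessors have finished
        ready = max((finish[p] for p in deps.get(t, [])), default=0.0)
        # greedy choice: the processor giving the earliest finish time
        best = min(range(n_procs),
                   key=lambda k: max(proc_free[k], ready) + cost[t][k])
        start = max(proc_free[best], ready)
        finish[t] = start + cost[t][best]
        proc_free[best] = finish[t]
        placement[t] = best
    return placement, finish

placement, finish = list_schedule(
    ["A", "B", "C"], {"C": ["A", "B"]},
    {"A": [2, 4], "B": [3, 1], "C": [2, 2]}, n_procs=2)
print(placement, max(finish.values()))  # {'A': 0, 'B': 1, 'C': 0} 4.0
```

Each decision is made once and never revisited, which is why such heuristics are fast enough for rapid prototyping but cannot react to load changes at run time without an additional dynamic mechanism.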
    ABSTRACT electronic version (8 pp.)
    In this article, we present a popular lossless compression/decompression algorithm, GZIP, and a study of its implementation on an FPGA-based architecture, the ADM-XRC board from ALPHA DATA Parallel Systems Ltd. The algorithm is lossless and is applied to large “bi-level” images (A0 format). It ensures a minimum compression rate for the images we are considering. It aims to decrease
    Abstract This article presents the parallel implementation on a GPU of a real-time dynamic tone-mapping operator. The operator we describe in this article is generic and may be used by any application. However, the goal of our work is to integrate this operator into the graphic ...
    In many image processing applications, thinning and crest restoration are of great interest. The recommended algorithms for these procedures are those that act directly on grayscale images while preserving topology. However, their high computational cost remains the major drawback in choosing them. In this paper we present an efficient hardware implementation on a RISC processor of two powerful thinning and crest restoration algorithms developed by our team. The proposed implementation reduces execution time. A segmentation chain applied to medical imaging serves as a concrete example to illustrate the improvements brought by optimization techniques at both the algorithmic and architectural levels. In particular, the use of the SSE instruction set of x86_32 processors (PIV, 3.06 GHz) yields the best performance for real-time processing: a rate of 33 images (512×512) per second is achieved.
    Abstract. AAA is a methodology developed for the fast prototyping of real-time embedded applications, and SynDEx is the software tool based on this methodology. Based on formal transformations, AAA helps the designer to implement signal and image processing ...
    ... Implementing Real-Time Algorithms by using the AAA Prototyping Methodology. Thierry Grandpierre, Pierre Niang, Mohamed Akil.
    In this article we present local operations in image processing based on spatial 2D discrete convolution. We study different implementations of such local operations. We also present the principles and design flow of the AAA methodology and its associated CAD software tool for integrated circuits (SynDEx-IC). In this methodology, the algorithm is modeled by Conditioned (if - then
    Mixed architectures contain programmable devices and reconfigurable devices. They provide a powerful answer to the computational requirements of the latest digital signal processing applications. However, the complexity of the corresponding algorithms and the number and diversity of computing components usually lead to a huge number of possible implementations. Architecture Algorithm Adequation (AAA) is one of the rapid prototyping methodologies that make it possible to explore the solution space to build an optimized application. However, it requires improvement in order to support mixed architectures containing both programmable and configurable components. This paper proposes an extension of the AAA methodology to support mixed architectures. The AAA architecture model is first extended to mixed architectures. Then, we present the coupling of existing tools (SynDEx and SynDEx-IC) in order to support mixed architectures. Finally, a communication IP is proposed to manage FPGA communication and...
    Real-time H.264/AVC high-definition video encoding represents a challenging workload for most existing programmable processors. New programmable processor technologies such as the Graphics Processing Unit (GPU) and the multicore Digital Signal Processor (DSP) offer a very promising way to overcome these constraints. In this paper, an optimized implementation of an H.264/AVC video encoder on a single core of the six-core TMS320C6472 DSP is presented for the Common Intermediate Format (CIF, 352x288) resolution, as a first step toward a multicore implementation for standard and high definitions (SD, HD). Algorithmic optimization is applied to the intra prediction module to reduce the computational time. Furthermore, based on the DSP architectural features, various structural and hardware optimizations are adopted to minimize external memory access. The parallelism between CPU processing and data transfers is fully exploited using an Enhanced Direct Memory Access controller (EDM...
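The abstract does not detail its intra prediction optimization, but the module it targets works roughly as follows. The sketch implements three of the nine H.264 4×4 luma intra modes (vertical, horizontal, DC), assuming all neighboring samples are available, and selects a mode by sum of absolute differences (SAD); real encoders usually add a rate term or use SATD, and all function names are hypothetical.

```python
def predict_4x4(top, left, mode):
    """Three of the nine H.264 4x4 luma intra modes (all neighbors assumed available)."""
    if mode == "vertical":    # mode 0: copy the row above downward
        return [list(top) for _ in range(4)]
    if mode == "horizontal":  # mode 1: copy the left column rightward
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":          # mode 2: rounded mean of the 8 neighboring samples
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")

def sad(block, pred):
    """Sum of absolute differences between a 4x4 block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))

def best_mode(block, top, left):
    """Pick the candidate mode with the lowest SAD (a simple distortion-only cost)."""
    return min(("vertical", "horizontal", "dc"),
               key=lambda m: sad(block, predict_4x4(top, left, m)))

block = [[10, 20, 30, 40] for _ in range(4)]
print(best_mode(block, top=[10, 20, 30, 40], left=[0, 0, 0, 0]))  # vertical
```

Evaluating every mode for every block is the exhaustive baseline; reducing the number of candidate modes evaluated is one common way to cut intra prediction time.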
