Search | arXiv e-print repository

A Low-Computational Video Synopsis Framework with a Standard Dataset

Authors: Ramtin Malekpour, M. Mehrdad Morsali, Hoda Mohammadzade

Abstract: Video synopsis is an efficient method for condensing surveillance videos. This technique begins with the detection and tracking of objects, followed by the creation of object tubes. These tubes consist of sequences, each containing chronologically ordered bounding boxes of a unique object. To generate a condensed video, the first step involves rearranging the object tubes to maximize the number of… ▽ More Video synopsis is an efficient method for condensing surveillance videos. This technique begins with the detection and tracking of objects, followed by the creation of object tubes. These tubes consist of sequences, each containing chronologically ordered bounding boxes of a unique object. To generate a condensed video, the first step involves rearranging the object tubes to maximize the number of non-overlapping objects in each frame. Then, these tubes are stitched to a background image extracted from the source video. The lack of a standard dataset for the video synopsis task hinders the comparison of different video synopsis models. This paper addresses this issue by introducing a standard dataset, called SynoClip, designed specifically for the video synopsis task. SynoClip includes all the necessary features needed to evaluate various models directly and effectively. Additionally, this work introduces a video synopsis model, called FGS, with low computational cost. The model includes an empty-frame object detector to identify frames empty of any objects, facilitating efficient utilization of the deep object detector. Moreover, a tube grouping algorithm is proposed to maintain relationships among tubes in the synthesized video. This is followed by a greedy tube rearrangement algorithm, which efficiently determines the start time of each tube. Finally, the proposed model is evaluated using the proposed dataset. The source code, fine-tuned object detection model, and tutorials are available at https://github.com/Ramtin-ma/VideoSynopsis-FGS. △ Less

Submitted 8 September, 2024; originally announced September 2024.

Comments: 13 pages, 8 figures

arXiv:2408.08489 [pdf, other]

DFT-Based Adversarial Attack Detection in MRI Brain Imaging: Enhancing Diagnostic Accuracy in Alzheimer's Case Studies

Authors: Mohammad Hossein Najafi, Mohammad Morsali, Mohammadmahdi Vahediahmar, Saeed Bagheri Shouraki

Abstract: Recent advancements in deep learning, particularly in medical imaging, have significantly propelled the progress of healthcare systems. However, examining the robustness of medical images against adversarial attacks is crucial due to their real-world applications and profound impact on individuals' health. These attacks can result in misclassifications in disease diagnosis, potentially leading to… ▽ More Recent advancements in deep learning, particularly in medical imaging, have significantly propelled the progress of healthcare systems. However, examining the robustness of medical images against adversarial attacks is crucial due to their real-world applications and profound impact on individuals' health. These attacks can result in misclassifications in disease diagnosis, potentially leading to severe consequences. Numerous studies have explored both the implementation of adversarial attacks on medical images and the development of defense mechanisms against these threats, highlighting the vulnerabilities of deep neural networks to such adversarial activities. In this study, we investigate adversarial attacks on images associated with Alzheimer's disease and propose a defensive method to counteract these attacks. Specifically, we examine adversarial attacks that employ frequency domain transformations on Alzheimer's disease images, along with other well-known adversarial attacks. Our approach utilizes a convolutional neural network (CNN)-based autoencoder architecture in conjunction with the two-dimensional Fourier transform of images for detection purposes. The simulation results demonstrate that our detection and defense mechanism effectively mitigates several adversarial attacks, thereby enhancing the robustness of deep neural networks against such vulnerabilities. △ Less

Submitted 15 August, 2024; originally announced August 2024.

Comments: 10 pages, 4 figures, conference

arXiv:2404.10875 [pdf, other]

SA-DS: A Dataset for Large Language Model-Driven AI Accelerator Design Generation

Authors: Deepak Vungarala, Mahmoud Nazzal, Mehrdad Morsali, Chao Zhang, Arnob Ghosh, Abdallah Khreishah, Shaahin Angizi

Abstract: In the ever-evolving landscape of Deep Neural Networks (DNN) hardware acceleration, unlocking the true potential of systolic array accelerators has long been hindered by the daunting challenges of expertise and time investment. Large Language Models (LLMs) offer a promising solution for automating code generation which is key to unlocking unprecedented efficiency and performance in various domains… ▽ More In the ever-evolving landscape of Deep Neural Networks (DNN) hardware acceleration, unlocking the true potential of systolic array accelerators has long been hindered by the daunting challenges of expertise and time investment. Large Language Models (LLMs) offer a promising solution for automating code generation which is key to unlocking unprecedented efficiency and performance in various domains, including hardware descriptive code. The generative power of LLMs can enable the effective utilization of preexisting designs and dedicated hardware generators. However, the successful application of LLMs to hardware accelerator design is contingent upon the availability of specialized datasets tailored for this purpose. To bridge this gap, we introduce the Systolic Array-based Accelerator Data Set (SA-DS). SA-DS comprises a diverse collection of spatial array designs following the standardized Berkeley's Gemmini accelerator generator template, enabling design reuse, adaptation, and customization. SA-DS is intended to spark LLM-centered research on DNN hardware accelerator architecture. We envision that SA-DS provides a framework that will shape the course of DNN hardware acceleration research for generations to come. SA-DS is open-sourced under the permissive MIT license at https://github.com/ACADLab/SA-DS.git}{https://github.com/ACADLab/SA-DS. △ Less

Submitted 17 July, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 4 pages, 5 Figures

arXiv:2404.07553 [pdf, other]

SFSORT: Scene Features-based Simple Online Real-Time Tracker

Authors: M. M. Morsali, Z. Sharifi, F. Fallah, S. Hashembeiki, H. Mohammadzade, S. Bagheri Shouraki

Abstract: This paper introduces SFSORT, the world's fastest multi-object tracking system based on experiments conducted on MOT Challenge datasets. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similar… ▽ More This paper introduces SFSORT, the world's fastest multi-object tracking system based on experiments conducted on MOT Challenge datasets. To achieve an accurate and computationally efficient tracker, this paper employs a tracking-by-detection method, following the online real-time tracking approach established in prior literature. By introducing a novel cost function called the Bounding Box Similarity Index, this work eliminates the Kalman Filter, leading to reduced computational requirements. Additionally, this paper demonstrates the impact of scene features on enhancing object-track association and improving track post-processing. Using a 2.2 GHz Intel Xeon CPU, the proposed method achieves an HOTA of 61.7\% with a processing speed of 2242 Hz on the MOT17 dataset and an HOTA of 60.9\% with a processing speed of 304 Hz on the MOT20 dataset. The tracker's source code, fine-tuned object detection model, and tutorials are available at \url{https://github.com/gitmehrdad/SFSORT}. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.05037 [pdf, other]

Lightator: An Optical Near-Sensor Accelerator with Compressive Acquisition Enabling Versatile Image Processing

Authors: Mehrdad Morsali, Brendan Reidy, Deniz Najafi, Sepehr Tabrizchi, Mohsen Imani, Mahdi Nikdast, Arman Roohi, Ramtin Zand, Shaahin Angizi

Abstract: This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will sub… ▽ More This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will substantially diminish the energy consumption and latency of conversion, transmission, and processing within the established cloud-centric architecture as well as recently designed edge accelerators. Our device-to-architecture simulation results show that with favorable accuracy, Lightator achieves 84.4 Kilo FPS/W and reduces power consumption by a factor of ~24x and 73x on average compared with existing photonic accelerators and GPU baseline. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 6 pages, 10 figures

arXiv:2312.05212 [pdf, other]

Enabling Normally-off In-Situ Computing with a Magneto-Electric FET-based SRAM Design

Authors: Deniz Najafi, Mehrdad Morsali, Ranyang Zhou, Arman Roohi, Andrew Marshall, Durga Misra, Shaahin Angizi

Abstract: As an emerging post-CMOS Field Effect Transistor, Magneto-Electric FETs (MEFETs) offer compelling design characteristics for logic and memory applications, such as high-speed switching, low power consumption, and non-volatility. In this paper, for the first time, a non-volatile MEFET-based SRAM design named ME-SRAM is proposed for edge applications which can remarkably save the SRAM static power c… ▽ More As an emerging post-CMOS Field Effect Transistor, Magneto-Electric FETs (MEFETs) offer compelling design characteristics for logic and memory applications, such as high-speed switching, low power consumption, and non-volatility. In this paper, for the first time, a non-volatile MEFET-based SRAM design named ME-SRAM is proposed for edge applications which can remarkably save the SRAM static power consumption in the idle state through a fast backup-restore process. To enable normally-off in-situ computing, the ME-SRAM cell is integrated into a novel processing-in-SRAM architecture that exploits a hardware-optimized bit-line computing approach for the execution of Boolean logic operations between operands housed in a memory sub-array within a single clock cycle. Our device-to-architecture evaluation results on Binary convolutional neural network acceleration show the robust performance of ME- SRAM while reducing energy consumption on average by a factor of 5.3 times compared to the best in-SRAM designs. △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: 7 pages, 10 Figures, 4 Tables

arXiv:2311.18655 [pdf, other]

OISA: Architecting an Optical In-Sensor Accelerator for Efficient Visual Computing

Authors: Mehrdad Morsali, Sepehr Tabrizchi, Deniz Najafi, Mohsen Imani, Mahdi Nikdast, Arman Roohi, Shaahin Angizi

Abstract: Targeting vision applications at the edge, in this work, we systematically explore and propose a high-performance and energy-efficient Optical In-Sensor Accelerator architecture called OISA for the first time. Taking advantage of the promising efficiency of photonic devices, the OISA intrinsically implements a coarse-grained convolution operation on the input frames in an innovative minimum-conver… ▽ More Targeting vision applications at the edge, in this work, we systematically explore and propose a high-performance and energy-efficient Optical In-Sensor Accelerator architecture called OISA for the first time. Taking advantage of the promising efficiency of photonic devices, the OISA intrinsically implements a coarse-grained convolution operation on the input frames in an innovative minimum-conversion fashion in low-bit-width neural networks. Such a design remarkably reduces the power consumption of data conversion, transmission, and processing in the conventional cloud-centric architecture as well as recently-presented edge accelerators. Our device-to-architecture simulation results on various image data-sets demonstrate acceptable accuracy while OISA achieves 6.68 TOp/s/W efficiency. OISA reduces power consumption by a factor of 7.9 and 18.4 on average compared with existing electronic in-/near-sensor and ASIC accelerators. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 7 pages

arXiv:2303.14162 [pdf, other]

IMA-GNN: In-Memory Acceleration of Centralized and Decentralized Graph Neural Networks at the Edge

Authors: Mehrdad Morsali, Mahmoud Nazzal, Abdallah Khreishah, Shaahin Angizi

Abstract: In this paper, we propose IMA-GNN as an In-Memory Accelerator for centralized and decentralized Graph Neural Network inference, explore its potential in both settings and provide a guideline for the community targeting flexible and efficient edge computation. Leveraging IMA-GNN, we first model the computation and communication latencies of edge devices. We then present practical case studies on GN… ▽ More In this paper, we propose IMA-GNN as an In-Memory Accelerator for centralized and decentralized Graph Neural Network inference, explore its potential in both settings and provide a guideline for the community targeting flexible and efficient edge computation. Leveraging IMA-GNN, we first model the computation and communication latencies of edge devices. We then present practical case studies on GNN-based taxi demand and supply prediction and also adopt four large graph datasets to quantitatively compare and analyze centralized and decentralized settings. Our cross-layer simulation results demonstrate that on average, IMA-GNN in the centralized setting can obtain ~790x communication speed-up compared to the decentralized GNN setting. However, the decentralized setting performs computation ~1400x faster while reducing the power consumption per device. This further underlines the need for a hybrid semi-decentralized GNN approach. △ Less

Submitted 24 March, 2023; originally announced March 2023.

Comments: 6 pages, 8 Figures, 2 Tables

arXiv:2303.03666 [pdf, other]

Face: Fast, Accurate and Context-Aware Audio Annotation and Classification

Authors: M. Mehrdad Morsali, Hoda Mohammadzade, Saeed Bagheri Shouraki

Abstract: This paper presents a context-aware framework for feature selection and classification procedures to realize a fast and accurate audio event annotation and classification. The context-aware design starts with exploring feature extraction techniques to find an appropriate combination to select a set resulting in remarkable classification accuracy with minimal computational effort. The exploration f… ▽ More This paper presents a context-aware framework for feature selection and classification procedures to realize a fast and accurate audio event annotation and classification. The context-aware design starts with exploring feature extraction techniques to find an appropriate combination to select a set resulting in remarkable classification accuracy with minimal computational effort. The exploration for feature selection also embraces an investigation of audio Tempo representation, an advantageous feature extraction method missed by previous works in the environmental audio classification research scope. The proposed annotation method considers outlier, inlier, and hard-to-predict data samples to realize context-aware Active Learning, leading to the average accuracy of 90% when only 15% of data possess initial annotation. Our proposed algorithm for sound classification obtained average prediction accuracy of 98.05% on the UrbanSound8K dataset. The notebooks containing our source codes and implementation results are available at https://github.com/gitmehrdad/FACE. △ Less

Submitted 7 March, 2023; originally announced March 2023.

Comments: 9 pages, 4 figures

arXiv:2210.06698 [pdf, other]

A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks

Authors: Shaahin Angizi, Mehrdad Morsali, Sepehr Tabrizchi, Arman Roohi

Abstract: In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the… ▽ More In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on MNIST and SVHN data-sets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves 1.25 GHz and energy-efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2x and execution time by a factor of 4x compared to the best recent LBP-based networks. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 10 pages, 11 figures, 4 tables

arXiv:1702.03429 [pdf, other]

Decoupled Sampling Based Planning Method for Multiple Autonomous Vehicles

Authors: Fatemeh Mohseni, Mahdi Morsali

Abstract: This paper proposes a sampling based planning algorithm to control autonomous vehicles. We propose an improved Rapidly-exploring Random Tree which includes the definition of K- nearest points and propose a two-stage sampling strategy to adjust RRT in other to perform maneuver while avoiding collision. The simulation results show the success of the algorithm. This paper proposes a sampling based planning algorithm to control autonomous vehicles. We propose an improved Rapidly-exploring Random Tree which includes the definition of K- nearest points and propose a two-stage sampling strategy to adjust RRT in other to perform maneuver while avoiding collision. The simulation results show the success of the algorithm. △ Less

Submitted 11 February, 2017; originally announced February 2017.

arXiv:1612.06458 [pdf, other]

A sub-optimal sampling based method for path planning

Authors: Mahdi Morsali, Fatemeh Mohseni

Abstract: In this paper a search algorithm is proposed to find a sub optimal path for a non-holonomic system. For this purpose the algorithm starts sampling the front part of the vehicle and moves towards the destination with a cost function. The bicycle model is used to define the non-holonomic system and a stability analysis with different integration methods is performed on the dynamics of the system. A… ▽ More In this paper a search algorithm is proposed to find a sub optimal path for a non-holonomic system. For this purpose the algorithm starts sampling the front part of the vehicle and moves towards the destination with a cost function. The bicycle model is used to define the non-holonomic system and a stability analysis with different integration methods is performed on the dynamics of the system. A proper integration method is chosen with a reasonably large step size in order to decrease the computation time. When the tree is close enough to destination the algorithm returns the path and in order to connect the tree to destination point an optimal control problem using single shooting method is defined. To test the algorithm different scenarios are tested and the simulation results show the success of the algorithm. △ Less

Submitted 19 December, 2016; originally announced December 2016.

Showing 1–12 of 12 results for author: Morsali, M