Search | arXiv e-print repository

arXiv:2410.12584 [pdf, other]

Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier

Authors: Md. Sohanur Rahman, Muhammad E. H. Chowdhury, Hasib Ryan Rahman, Mosabber Uddin Ahmed, Muhammad Ashad Kabir, Sanjiban Sekhar Roy, Rusab Sarmun

Abstract: In this study, we propose a novel and robust framework, Self-DenseMobileNet, designed to enhance the classification of nodules and non-nodules in chest radiographs (CXRs). Our approach integrates advanced image standardization and enhancement techniques to optimize the input quality, thereby improving classification accuracy. To enhance predictive accuracy and leverage the strengths of multiple models, the prediction probabilities from Self-DenseMobileNet were transformed into tabular data and used to train eight classical machine learning (ML) models; the top three performers were then combined via a stacking algorithm, creating a robust meta-classifier that integrates their collective insights for superior classification performance. To enhance the interpretability of our results, we employed class activation mapping (CAM) to visualize the decision-making process of the best-performing model. Our proposed framework demonstrated remarkable performance on internal validation data, achieving an accuracy of 99.28% using a Meta-Random Forest Classifier. When tested on an external dataset, the framework maintained strong generalizability with an accuracy of 89.40%. These results highlight a significant improvement in the classification of CXRs with lung nodules.

Submitted 16 October, 2024; originally announced October 2024.

Comments: 31 pages

arXiv:2410.02011 [pdf]

A Census-Based Genetic Algorithm for Target Set Selection Problem in Social Networks

Authors: Md. Samiur Rahman, Mohammad Shamim Ahsan, Tim Chen, Vijayakumar Varadarajan

Abstract: This paper considers the Target Set Selection (TSS) Problem in social networks, a fundamental problem in viral marketing. In the TSS problem, a graph and a threshold value for each vertex of the graph are given. We need to find a minimum size vertex subset to "activate" such that all graph vertices are activated at the end of the propagation process. Specifically, we propose a novel approach called "a census-based genetic algorithm" for the TSS problem. In our algorithm, we use the idea of a census to gather and store information about each individual in a population and collect census data from the individuals constructed during the algorithm's execution so that we can achieve greater diversity and avoid premature convergence at locally optimal solutions. We use two distinct kinds of census information: (a) for each individual, the algorithm stores how many times it has been identified during the execution; (b) for each network node, the algorithm counts how many times it has been included in a solution. The proposed algorithm can also self-adjust by using a parameter specifying the aggressiveness employed in each reproduction method. Additionally, the algorithm is designed to run in a parallelized environment to minimize the computational cost and check each individual's feasibility. Moreover, our algorithm finds the optimal solution in all cases while experimenting on random graphs. Furthermore, we execute the proposed algorithm on 14 large graphs of real-life social network instances from the literature, improving the solution size by around 9.57 (on average) and 134 vertices (in total) compared to the best solutions obtained in previous studies.

Submitted 2 October, 2024; originally announced October 2024.
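
The propagation process mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common threshold model, in which a vertex becomes active once the number of its active neighbours reaches its threshold; the abstract does not spell out the rule, so this is an assumption, and the function names `activated_set` and `is_target_set` are hypothetical. The sketch only checks whether a candidate seed set is feasible; it is not the census-based genetic algorithm itself.

```python
from collections import deque


def activated_set(adj, threshold, seeds):
    """Return the vertices that are active once propagation stops.

    adj: dict mapping each vertex to a list of its neighbours (undirected).
    threshold: dict mapping each vertex to the number of active neighbours
        it needs before it activates (assumed threshold model).
    seeds: iterable of initially activated vertices.
    """
    active = set(seeds)
    active_neighbours = {v: 0 for v in adj}  # active neighbours counted so far
    queue = deque(active)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in active:
                continue
            active_neighbours[v] += 1
            if active_neighbours[v] >= threshold[v]:
                active.add(v)
                queue.append(v)
    return active


def is_target_set(adj, threshold, seeds):
    """A seed set is feasible if propagation activates every vertex."""
    return len(activated_set(adj, threshold, seeds)) == len(adj)


# Toy instance (hypothetical data): path graph 0-1-2-3, vertex 2 needs two
# active neighbours, every other vertex needs one.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
threshold = {0: 1, 1: 1, 2: 2, 3: 1}
```

On this toy instance, seeding only vertex 0 stalls at vertex 2, while seeding vertices 0 and 3 activates the whole path. A feasibility check of this kind is the sort of per-individual test that a search procedure would repeat many times.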

arXiv:2409.00567 [pdf]

Programmable refractive functions

Authors: Md Sadman Sakib Rahman, Tianyi Gan, Mona Jarrahi, Aydogan Ozcan

Abstract: Snell's law dictates the phenomenon of light refraction at the interface between two media. Here, we demonstrate, for the first time, arbitrary programming of light refraction through an engineered material where the direction of the output wave can be set independently for different directions of the input wave, covering arbitrarily selected permutations of light refraction between the input and output apertures. Formed by a set of cascaded transmissive layers with optimized phase profiles, this refractive function generator (RFG) spans only a few tens of wavelengths in the axial direction. In addition to monochrome RFG designs, we also report wavelength-multiplexed refractive functions, where a distinct refractive function is implemented at each wavelength through the same engineered material volume, i.e., the permutation of light refraction is switched from one desired function to another function by changing the illumination wavelength. As an experimental proof of concept, we demonstrate negative refractive function at the terahertz part of the spectrum using a 3D-printed material. Arbitrary programming of refractive functions enables new design capabilities for optical materials, devices and systems.

Submitted 31 August, 2024; originally announced September 2024.

Comments: 22 Pages, 10 Figures
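
For reference, the Snell's law relation that the abstract above takes as its starting point can be written as follows. The symbols are assumptions introduced here, not taken from the listing: $n_1$ and $n_2$ denote the refractive indices of the two media, and $\theta_1$ and $\theta_2$ the angles of incidence and refraction measured from the interface normal.

```latex
n_1 \sin\theta_1 = n_2 \sin\theta_2
```

Under this relation the output direction is fixed once the input direction and the two indices are given, which is the constraint the programmable refractive functions in the abstract are described as going beyond.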

arXiv:2408.09005 [pdf]

Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease

Authors: Nayeem Ahmed, Md Maruf Rahman, Md Fatin Ishrak, Md Imran Kabir Joy, Md Sanowar Hossain Sabuj, Md. Sadekur Rahman

Abstract: This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease. A carefully selected dataset of keratoconus, normal, and suspicious cases was used. The models tested include DenseNet121, EfficientNetB0, InceptionResNetV2, InceptionV3, MobileNetV2, ResNet50, VGG16, and VGG19. To maximize model training, bad sample removal, resizing, rescaling, and augmentation were used. The models were trained with similar parameters, activation function, classification function, and optimizer to compare performance. To determine class separation effectiveness, each model was evaluated on accuracy, precision, recall, and F1-score. MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications. InceptionV3 and DenseNet121 both performed well in keratoconus detection, but they had trouble with questionable cases. In contrast, EfficientNetB0, ResNet50, and VGG19 had more difficulty distinguishing dubious cases from regular ones, indicating the need for model refinement and development. A detailed comparison of state-of-the-art CNN architectures for automated keratoconus identification reveals each model's benefits and weaknesses. This study shows that advanced deep learning models can enhance keratoconus diagnosis and treatment planning. Future research should explore hybrid models and integrate clinical parameters to improve diagnostic accuracy and robustness in real-world clinical applications, paving the way for more effective AI-driven ophthalmology tools.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 14 pages, 3 tables, 27 figures

ACM Class: I.4.m

arXiv:2408.05449 [pdf]

Unidirectional imaging with partially coherent light

Authors: Guangdong Ma, Che-Yung Shen, Jingxi Li, Luzhe Huang, Cagatay Isil, Fazil Onuralp Ardic, Xilin Yang, Yuhang Li, Yuntian Wang, Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Unidirectional imagers form images of input objects only in one direction, e.g., from field-of-view (FOV) A to FOV B, while blocking the image formation in the reverse direction, from FOV B to FOV A. Here, we report unidirectional imaging under spatially partially coherent light and demonstrate high-quality imaging only in the forward direction (A->B) with high power efficiency while distorting the image formation in the backward direction (B->A) along with low power efficiency. Our reciprocal design features a set of spatially engineered linear diffractive layers that are statistically optimized for partially coherent illumination with a given phase correlation length. Our analyses reveal that when illuminated by a partially coherent beam with a correlation length of ~1.5 w or larger, where w is the wavelength of light, diffractive unidirectional imagers achieve robust performance, exhibiting asymmetric imaging performance between the forward and backward directions - as desired. A partially coherent unidirectional imager designed with a smaller correlation length of less than 1.5 w still supports unidirectional image transmission, but with a reduced figure of merit. These partially coherent diffractive unidirectional imagers are compact (axially spanning less than 75 w), polarization-independent, and compatible with various types of illumination sources, making them well-suited for applications in asymmetric visual information processing and communication.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 25 Pages, 8 Figures

arXiv:2408.00984 [pdf, other]

GraphAge: Unleashing the power of Graph Neural Network to Decode Epigenetic Aging

Authors: Saleh Sakib Ahmed, Nahian Shabab, Md. Abul Hassan Samee, M. Sohel Rahman

Abstract: DNA methylation is a crucial epigenetic marker used in various clocks to predict epigenetic age. However, many existing clocks fail to account for crucial information about CpG sites and their interrelationships, such as co-methylation patterns. We present a novel approach to represent methylation data as a graph, using methylation values and relevant information about CpG sites as nodes, and relationships like co-methylation, same gene, and same chromosome as edges. We then use a Graph Neural Network (GNN) to predict age. Thus our model, GraphAge, leverages both structural and positional information for prediction as well as better interpretation. Although we had to train in a constrained compute setting, GraphAge still showed competitive performance with a Mean Absolute Error (MAE) of 3.207 and a Mean Squared Error (MSE) of 25.277, slightly outperforming the current state of the art. Perhaps more importantly, we utilized GNN explainer for interpretation purposes and were able to unearth interesting insights (e.g., key CpG sites, pathways, and their relationships through Methylation Regulated Networks in the context of aging), which were not possible to 'decode' without leveraging the unique capability of GraphAge to 'encode' various structural relationships. GraphAge has the potential to consume and utilize all relevant information (if available) about an individual that relates to the complex process of aging. So, in that sense, it is one of a kind and can be seen as the first benchmark for a multimodal model that can incorporate all this information in order to close the gap in our understanding of the true nature of aging.

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.05461 [pdf, other]

CAV-AD: A Robust Framework for Detection of Anomalous Data and Malicious Sensors in CAV Networks

Authors: Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong

Abstract: The adoption of connected and automated vehicles (CAVs) has sparked considerable interest across diverse industries, including public transportation, underground mining, and agriculture sectors. However, CAVs' reliance on sensor readings makes them vulnerable to significant threats. Manipulating these readings can compromise CAV network security, posing serious risks for malicious activities. Although several anomaly detection (AD) approaches for CAV networks are proposed, they often fail to: i) detect multiple anomalies in specific sensor(s) with high accuracy or F1 score, and ii) identify the specific sensor being attacked. In response, this paper proposes a novel framework tailored to CAV networks, called CAV-AD, for distinguishing abnormal readings amidst multiple anomaly data while identifying malicious sensors. Specifically, CAV-AD comprises two main components: i) A novel CNN model architecture called optimized omni-scale CNN (O-OS-CNN), which optimally selects the time scale by generating all possible kernel sizes for input time series data; ii) An amplification block to increase the values of anomaly readings, enhancing sensitivity for detecting anomalies. Not only that, but CAV-AD integrates the proposed O-OS-CNN with a Kalman filter to instantly identify the malicious sensors. We extensively train CAV-AD using real-world datasets containing both instant and constant attacks, evaluating its performance in detecting intrusions from multiple anomalies, which presents a more challenging scenario. Our results demonstrate that CAV-AD outperforms state-of-the-art methods, achieving an average accuracy of 98% and an average F1 score of 89%, while accurately identifying the malicious sensors.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.10688 [pdf]

doi 10.1021/acsphotonics.4c01099

Integration of Programmable Diffraction with Digital Neural Networks

Authors: Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Optical imaging and sensing systems based on diffractive elements have seen massive advances over the last several decades. Earlier generations of diffractive optical processors were, in general, designed to deliver information to an independent system that was separately optimized, primarily driven by human vision or perception. With the recent advances in deep learning and digital neural networks, there have been efforts to establish diffractive processors that are jointly optimized with digital neural networks serving as their back-end. These jointly optimized hybrid (optical+digital) processors establish a new "diffractive language" between input electromagnetic waves that carry analog information and neural networks that process the digitized information at the back-end, providing the best of both worlds. Such hybrid designs can process spatially and temporally coherent, partially coherent, or incoherent input waves, providing universal coverage for any spatially varying set of point spread functions that can be optimized for a given task, executed in collaboration with digital neural networks. In this article, we highlight the utility of this exciting collaboration between engineered and programmed diffraction and digital neural networks for a diverse range of applications. We survey some of the major innovations enabled by the push-pull relationship between analog wave processing and digital neural networks, also covering the significant benefits that could be reaped through the synergy between these two complementary paradigms.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 30 Pages, 6 Figures

Journal ref: ACS Photonics (2024)

arXiv:2406.07710 [pdf, other]

Vehicle Speed Detection System Utilizing YOLOv8: Enhancing Road Safety and Traffic Management for Metropolitan Areas

Authors: SM Shaqib, Alaya Parvin Alo, Shahriar Sultan Ramit, Afraz Ul Haque Rupak, Sadman Sadik Khan, Md. Sadekur Rahman

Abstract: In order to ensure traffic safety through a reduction in fatalities and accidents, vehicle speed detection is essential. Relentless driving practices are discouraged by the enforcement of speed restrictions, which are made possible by accurate monitoring of vehicle speeds. Road accidents remain one of the leading causes of death in Bangladesh. The Bangladesh Passenger Welfare Association stated in 2023 that 7,902 individuals lost their lives in traffic accidents during the course of the year. Efficient vehicle speed detection is essential to maintaining traffic safety. Reliable speed detection can also help gather important traffic data, which makes it easier to optimize traffic flow and provide safer road infrastructure. The YOLOv8 model can recognize and track cars in videos with greater speed and accuracy when trained under close supervision. By providing insights into the application of supervised learning in object identification for vehicle speed estimation and concentrating on the particular traffic conditions and safety concerns in Bangladesh, this work represents a noteworthy contribution to the area. The MAE was 3.5 and the RMSE was 4.22 between the speed predicted by our model and the actual speed, i.e., the ground truth measured by the speedometer. Promising increased efficiency and wider applicability in a variety of traffic conditions, the suggested solution offers a financially viable substitute for conventional approaches.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.11188 [pdf, other]

Wind Power Prediction across Different Locations using Deep Domain Adaptive Learning

Authors: Md Saiful Islam Sajol, Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf

Abstract: Accurate prediction of wind power is essential for the grid integration of this intermittent renewable source and aiding grid planners in forecasting available wind capacity. Spatial differences lead to discrepancies in climatological data distributions between two geographically dispersed regions, consequently making the prediction task more difficult. Thus, a prediction model that learns from the data of a particular climatic region can suffer from being less robust. A deep neural network (DNN) based domain adaptive approach is proposed to counter this drawback. Effective weather features from a large set of weather parameters are selected using a random forest approach. A pre-trained model from the source domain is utilized to perform the prediction task, assuming no source data is available during target domain prediction. The weights of only the last few layers of the DNN model are updated throughout the task, keeping the rest of the network unchanged, making the model faster compared to the traditional approaches. The proposed approach demonstrates higher accuracy ranging from 6.14% to even 28.44% compared to the traditional non-adaptive method.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.06126 [pdf, other]

Quantum Secure Anonymous Communication Networks

Authors: Mohammad Saidur Rahman, Stephen DiAdamo, Miralem Mehic, Charles Fleming

Abstract: Anonymous communication networks (ACNs) enable Internet browsing in a way that prevents the accessed content from being traced back to the user. This allows a high level of privacy, protecting individuals from being tracked by advertisers or governments, for example. The Tor network, a prominent example of such a network, uses a layered encryption scheme to encapsulate data packets, using Tor nodes to obscure the routing process before the packets enter the public Internet. While Tor is capable of providing substantial privacy, its encryption relies on schemes, such as RSA and Diffie-Hellman for distributing symmetric keys, which are vulnerable to quantum computing attacks and are currently in the process of being phased out. To overcome the threat, we propose a quantum-resistant alternative to RSA and Diffie-Hellman for distributing symmetric keys, namely, quantum key distribution (QKD). Standard QKD networks depend on trusted nodes to relay keys across long distances; however, reliance on trusted nodes in the quantum network does not meet the criteria necessary for establishing a Tor circuit in the ACN. We address this issue by developing a protocol and network architecture that integrates QKD without the need for trusted nodes, thus meeting the requirements of the Tor network and creating a quantum-secure anonymous communication network.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted for publication in QCNC2024

arXiv:2404.12258 [pdf, ps, other]

DeepLocalization: Using change point detection for Temporal Action Localization

Authors: Mohammed Shaiqur Rahman, Ibne Farabi Shihab, Lynna Chu, Anuj Sharma

Abstract: In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior. Utilizing the power of advanced deep learning methodologies, our objective is to tackle the critical issue of distracted driving, a significant factor contributing to road accidents. Our strategy employs a dual approach: leveraging Graph-Based Change-Point Detection for pinpointing actions in time alongside a Video Large Language Model (Video-LLM) for precisely categorizing activities. Through careful prompt engineering, we customize the Video-LLM to adeptly handle driving activities' nuances, ensuring its classification efficacy even with sparse data. Engineered to be lightweight, our framework is optimized for consumer-grade GPUs, making it vastly applicable in practical scenarios. We subjected our method to rigorous testing on the SynDD2 dataset, a complex benchmark for distracted driving behaviors, where it demonstrated commendable performance, achieving 57.5% accuracy in event classification and 51% in event detection. These outcomes underscore the substantial promise of DeepLocalization in accurately identifying diverse driver behaviors and their temporal occurrences, all within the bounds of limited computational resources.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.09432 [pdf, other]

The 8th AI City Challenge

Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC) people tracking, highlighting significant enhancements in camera count, character number, 3D annotation, and camera matrices, alongside new rules for 3D tracking and online tracking algorithm encouragement. Track 2 introduced dense video captioning for traffic safety, focusing on pedestrian accidents using multi-camera feeds to improve insights for insurance and prevention. Track 3 required teams to classify driver actions in a naturalistic driving analysis. Track 4 explored fish-eye camera analytics using the FishEye8K dataset. Track 5 focused on motorcycle helmet rule violation detection. The challenge utilized two leaderboards to showcase methods, with participants setting new benchmarks, some surpassing existing state-of-the-art achievements.

Submitted 14 April, 2024; originally announced April 2024.

Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

arXiv:2403.06438 [pdf, other]

Unification of Secret Key Generation and Wiretap Channel Transmission

Authors: Yingbo Hua, Md Saydur Rahman

Abstract: This paper presents further insights into a recently developed round-trip communication scheme called "Secret-message Transmission by Echoing Encrypted Probes (STEEP)". The focus is on a legitimate wireless channel between a multi-antenna user (Alice) and a single-antenna user (Bob) in the presence of a multi-antenna eavesdropper (Eve). STEEP does not require full-duplex, channel reciprocity or Eve's channel state information, but is able to yield a positive secrecy rate in bits per channel use between Alice and Bob in every channel coherence period as long as Eve's receive channel is not noiseless. This secrecy rate does not diminish as coherence time increases. Various statistical behaviors of STEEP's secrecy capacity due to random channel fading are also illustrated.

Submitted 11 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for presentation at IEEE ICC 2024

arXiv:2403.04311 [pdf, other]

ALTO: An Efficient Network Orchestrator for Compound AI Systems

Authors: Keshav Santhanam, Deepti Raghavan, Muhammad Shahir Rahman, Thejas Venkatesh, Neha Kunjal, Pratiksha Thaker, Philip Levis, Matei Zaharia

Abstract: We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO achieves high throughput and low latency by taking advantage of an optimization opportunity specific to generative language models: streaming intermediate outputs. As language models produce outputs token by token, ALTO exposes opportunities to stream intermediate outputs between stages when possible. We highlight two new challenges of correctness and load balancing which emerge when streaming intermediate data across distributed pipeline stage instances. We also motivate the need for an aggregation-aware routing interface and distributed prompt-aware scheduling to address these challenges. We demonstrate the impact of ALTO's partial output streaming on a complex chatbot verification pipeline, increasing throughput by up to 3x for a fixed latency target of 4 seconds / request while also reducing tail latency by 1.8x compared to a baseline serving approach.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.01208 [pdf, other]

Location Agnostic Adaptive Rain Precipitation Prediction using Deep Learning

Authors: Md Shazid Islam, Md Saydur Rahman, Md Saad Ul Haque, Farhana Akter Tumpa, Md Sanzid Bin Hossain, Abul Al Arabi

Abstract: Rain precipitation prediction is a challenging task as it depends on weather and meteorological features which vary from location to location. As a result, a prediction model that performs well at one location does not perform well at other locations due to the distribution shifts. In addition, due to global warming, the weather patterns are changing very rapidly year by year which creates the possibility of ineffectiveness of those models even at the same location as time passes. In our work, we have proposed an adaptive deep learning-based framework in order to provide a solution to the aforementioned challenges. Our method can generalize the model for the prediction of precipitation for any location where the methods without adaptation fail. Our method has shown 43.51%, 5.09%, and 38.62% improvement after adaptation using a deep neural network for predicting the precipitation of Paris, Los Angeles, and Tokyo, respectively.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.01206 [pdf, other]

Comparative Evaluation of Weather Forecasting using Machine Learning Models

Authors: Md Saydur Rahman, Farhana Akter Tumpa, Md Shazid Islam, Abul Al Arabi, Md Sanzid Bin Hossain, Md Saad Ul Haque

Abstract: Gaining a deeper understanding of weather and being able to predict its future conduct have always been considered important endeavors for the growth of our society. This research paper explores the advancements in understanding and predicting nature's behavior, particularly in the context of weather forecasting, through the application of machine learning algorithms. By leveraging the power of machine learning, data mining, and data analysis techniques, significant progress has been made in this field. This study focuses on analyzing the contributions of various machine learning algorithms in predicting precipitation and temperature patterns using a 20-year dataset from a single weather station in Dhaka city. Algorithms such as Gradient Boosting, AdaBoosting, Artificial Neural Network, Stacking Random Forest, Stacking Neural Network, and Stacking KNN are evaluated and compared based on their performance metrics, including Confusion matrix measurements. The findings highlight remarkable achievements and provide valuable insights into their performances and features correlation.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.14422 [pdf, other]

Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation

Authors: Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf, Md Saiful Islam Sajol, Farhana Akter Tumpa

Abstract: The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a notable acceleration in the alteration of weather patterns on an annual basis. This phenomenon introduces the potential for diminished efficacy of existing models, even within the same geographical region, as time progresses. In this paper, a domain adaptive deep learning-based framework is proposed to estimate solar power generation using weather features that can solve the aforementioned challenges. A feed-forward deep convolutional network model is trained for a known location dataset in a supervised manner and utilized to predict the solar power of an unknown location later. This adaptive data-driven approach exhibits notable advantages in terms of computing speed, storage efficiency, and its ability to improve outcomes in scenarios where state-of-the-art non-adaptive methods fail. Our method has shown an improvement of 10.47%, 7.44%, and 5.11% in solar power prediction accuracy compared to best performing non-adaptive method for California (CA), Florida (FL) and New York (NY), respectively.

Submitted 6 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.10659 [pdf, other]

BadODD: Bangladeshi Autonomous Driving Object Detection Dataset

Authors: Mirza Nihal Baig, Rony Hajong, Mahdi Murshed Patwary, Mohammad Shahidur Rahman, Husne Ara Chowdhury

Abstract: We propose a comprehensive dataset for object detection in diverse driving environments across 9 districts in Bangladesh. The dataset, collected exclusively from smartphone cameras, provided a realistic representation of real-world scenarios, including day and night conditions. Most existing datasets lack suitable classes for autonomous navigation on Bangladeshi roads, making it challenging for researchers to develop models that can handle the intricacies of road scenarios. To address this issue, the authors proposed a new set of classes based on characteristics rather than local vehicle names. The dataset aims to encourage the development of models that can handle the unique challenges of Bangladeshi road scenarios for the effective deployment of autonomous vehicles. The dataset did not consist of any online images to simulate real-world conditions faced by autonomous vehicles. The classification of vehicles is challenging because of the diverse range of vehicles on Bangladeshi roads, including those not found elsewhere in the world. The proposed classification system is scalable and can accommodate future vehicles, making it a valuable resource for researchers in the autonomous vehicle sector.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 7 pages

arXiv:2401.08923 [pdf]

doi 10.1186/s43593-024-00067-5

Subwavelength Imaging using a Solid-Immersion Diffractive Optical Processor

Authors: Jingtian Hu, Kun Liao, Niyazi Ulas Dinc, Carlo Gigli, Bijie Bai, Tianyi Gan, Xurong Li, Hanlong Chen, Xilin Yang, Yuhang Li, Cagatay Isil, Md Sadman Sakib Rahman, Jingxi Li, Xiaoyong Hu, Mona Jarrahi, Demetri Psaltis, Aydogan Ozcan

Abstract: Phase imaging is widely used in biomedical imaging, sensing, and material characterization, among other fields. However, direct imaging of phase objects with subwavelength resolution remains a challenge. Here, we demonstrate subwavelength imaging of phase and amplitude objects based on all-optical diffractive encoding and decoding. To resolve subwavelength features of an object, the diffractive imager uses a thin, high-index solid-immersion layer to transmit high-frequency information of the object to a spatially-optimized diffractive encoder, which converts/encodes high-frequency information of the input into low-frequency spatial modes for transmission through air. The subsequent diffractive decoder layers (in air) are jointly designed with the encoder using deep-learning-based optimization, and communicate with the encoder layer to create magnified images of input objects at its output, revealing subwavelength features that would otherwise be washed away due to diffraction limit. We demonstrate that this all-optical collaboration between a diffractive solid-immersion encoder and the following decoder layers in air can resolve subwavelength phase and amplitude features of input objects in a highly compact design. To experimentally demonstrate its proof-of-concept, we used terahertz radiation and developed a fabrication method for creating monolithic multi-layer diffractive processors. Through these monolithically fabricated diffractive encoder-decoder pairs, we demonstrated phase-to-intensity transformations and all-optically reconstructed subwavelength phase features of input objects by directly transforming them into magnified intensity features at the output. This solid-immersion-based diffractive imager, with its compact and cost-effective design, can find wide-ranging applications in bioimaging, endoscopy, sensing and materials characterization.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 32 Pages, 9 Figures

Journal ref: eLight (2024)

arXiv:2401.03530 [pdf, other]

Detecting Anomalies in Blockchain Transactions using Machine Learning Classifiers and Explainability Analysis

Authors: Mohammad Hasan, Mohammad Shahriar Rahman, Helge Janicke, Iqbal H. Sarker

Abstract: As the use of Blockchain for digital payments continues to rise in popularity, it also becomes susceptible to various malicious attacks. Successfully detecting anomalies within Blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in Blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating eXplainable Artificial Intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The Shapley Additive exPlanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we have introduced an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances TPR and ROC-AUC scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and FPR scores.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.05780 [pdf, other]

PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition

Authors: Md. Zarif Ul Alam, Md Saiful Islam, Ehsan Hoque, M Saifur Rahman

Abstract: Parkinson's disease (PD) is a neuro-degenerative disorder that affects movement, speech, and coordination. Timely diagnosis and treatment can improve the quality of life for PD patients. However, access to clinical diagnosis is limited in low and middle income countries (LMICs). Therefore, development of automated screening tools for PD can have a huge social impact, particularly in the public health sector. In this paper, we present PULSAR, a novel method to screen for PD from webcam-recorded videos of the finger-tapping task from the Movement Disorder Society - Unified Parkinson's Disease Rating Scale (MDS-UPDRS). PULSAR is trained and evaluated on data collected from 382 participants (183 self-reported as PD patients). We used an adaptive graph convolutional neural network to dynamically learn the spatio temporal graph edges specific to the finger-tapping task. We enhanced this idea with a multi stream adaptive convolution model to learn features from different modalities of data critical to detect PD, such as relative location of the finger joints, velocity and acceleration of tapping. As the labels of the videos are self-reported, there could be cases of undiagnosed PD in the non-PD labeled samples. We leveraged the idea of Positive Unlabeled (PU) Learning that does not need labeled negative data. Our experiments show clear benefit of modeling the problem in this way. PULSAR achieved 80.95% accuracy in validation set and a mean accuracy of 71.29% (2.49% standard deviation) in independent test, despite being trained with limited amount of data. This is specially promising as labeled data is scarce in health care sector. We hope PULSAR will make PD screening more accessible to everyone. The proposed techniques could be extended for assessment of other movement disorders, such as ataxia, and Huntington's disease.

Submitted 16 February, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

arXiv:2310.16991 [pdf]

An Efficient Deep Learning-based approach for Recognizing Agricultural Pests in the Wild

Authors: Mohtasim Hadi Rafi, Mohammad Ratul Mahjabin, Md Sabbir Rahman

Abstract: One of the biggest challenges that the farmers go through is to fight insect pests during agricultural product yields. The problem can be solved easily and avoid economic losses by taking timely preventive measures. This requires identifying insect pests in an easy and effective manner. Most of the insect species have similarities between them. Without proper help from the agriculturist academician it is very challenging for the farmers to identify the crop pests accurately. To address this issue we have done extensive experiments considering different methods to find out the best method among all. This paper presents a detailed overview of the experiments done on mainly a robust dataset named IP102 including transfer learning with finetuning, attention mechanism and custom architecture. Some example from another dataset D0 is also shown to show robustness of our experimented techniques.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.14005 [pdf, ps, other]

Ophthalmic Biomarker Detection Using Ensembled Vision Transformers -- Winning Solution to IEEE SPS VIP Cup 2023

Authors: H. A. Z. Sameen Shahgir, Khondker Salman Sayeed, Tanjeem Azwad Zaman, Md. Asif Haider, Sheikh Saifur Rahman Jony, M. Sohel Rahman

Abstract: This report outlines our approach in the IEEE SPS VIP Cup 2023: Ophthalmic Biomarker Detection competition. Our primary objective in this competition was to identify biomarkers from Optical Coherence Tomography (OCT) images obtained from a diverse range of patients. Using robust augmentations and 5-fold cross-validation, we trained two vision transformer-based models: MaxViT and EVA-02, and ensembled them at inference time. We find MaxViT's use of convolution layers followed by strided attention to be better suited for the detection of local features while EVA-02's use of normal attention mechanism and knowledge distillation is better for detecting global features. Ours was the best-performing solution in the competition, achieving a patient-wise F1 score of 0.814 in the first phase and 0.8527 in the second and final phase of VIP Cup 2023, scoring 3.8% higher than the next-best solution.

Submitted 21 October, 2023; originally announced October 2023.

arXiv:2310.03384 [pdf]

doi 10.1117/1.APN.3.1.016010

Complex-valued universal linear transformations and image encryption using spatially incoherent diffractive networks

Authors: Xilin Yang, Md Sadman Sakib Rahman, Bijie Bai, Jingxi Li, Aydogan Ozcan

Abstract: As an optical processor, a Diffractive Deep Neural Network (D2NN) utilizes engineered diffractive surfaces designed through machine learning to perform all-optical information processing, completing its tasks at the speed of light propagation through thin optical layers. With sufficient degrees-of-freedom, D2NNs can perform arbitrary complex-valued linear transformations using spatially coherent light. Similarly, D2NNs can also perform arbitrary linear intensity transformations with spatially incoherent illumination; however, under spatially incoherent light, these transformations are non-negative, acting on diffraction-limited optical intensity patterns at the input field-of-view (FOV). Here, we expand the use of spatially incoherent D2NNs to complex-valued information processing for executing arbitrary complex-valued linear transformations using spatially incoherent light. Through simulations, we show that as the number of optimized diffractive features increases beyond a threshold dictated by the multiplication of the input and output space-bandwidth products, a spatially incoherent diffractive visual processor can approximate any complex-valued linear transformation and be used for all-optical image encryption using incoherent illumination. The findings are important for the all-optical processing of information under natural light using various forms of diffractive surface-based optical processors.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 16 Pages, 3 Figures

Journal ref: Advanced Photonics Nexus (2024)

arXiv:2309.12502 [pdf, ps, other]

doi 10.1109/TSP.2023.3310252

Secure Degree of Freedom of Wireless Networks Using Collaborative Pilots

Authors: Yingbo Hua, Qingpeng Liang, Md Saydur Rahman

Abstract: A wireless network of full-duplex nodes/users, using anti-eavesdropping channel estimation (ANECE) based on collaborative pilots, can yield a positive secure degree-of-freedom (SDoF) regardless of the number of antennas an eavesdropper may have. This paper presents novel results on SDoF of ANECE by analyzing secret-key capacity (SKC) of each pair of nodes in a network of multiple collaborative nodes per channel coherence period. Each transmission session of ANECE has two phases: phase 1 is used for pilots, and phase 2 is used for random symbols. This results in two parts of SDoF of ANECE. Both lower and upper bounds on the SDoF of ANECE for any number of users are shown, and the conditions for the two bounds to meet are given. This leads to important discoveries, including: a) The phase-1 SDoF is the same for both multi-user ANECE and pair-wise ANECE while the former may require only a fraction of the number of time slots needed by the latter; b) For a three-user network, the phase-2 SDoF of all-user ANECE is generally larger than that of pair-wise ANECE; c) For a two-user network, a modified ANECE deploying square-shaped nonsingular pilot matrices yields a higher total SDoF than the original ANECE. The multi-user ANECE and the modified two-user ANECE shown in this paper appear to be the best full-duplex schemes known today in terms of SDoF subject to each node using a given number of antennas for both transmitting and receiving.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.02588 [pdf, other]

Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework

Authors: Tariq Adnan, Md Saiful Islam, Wasifur Rahman, Sangwu Lee, Sutapa Dey Tithi, Kazi Noshin, Imran Sarker, M Saifur Rahman, Ehsan Hoque

Abstract: Parkinson's disease (PD) diagnosis remains challenging due to lacking a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings are from diverse sources encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to Hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic (AUROC) of 89.3% while being free from detectable bias across population subgroups based on sex and ethnicity on held-out data. Further analysis reveals that features from the smiling videos alone lead to comparable performance, even on two external test sets the model has never seen during training, suggesting the potential for PD risk assessment from smiling selfie videos.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2306.10159 [pdf, other]

Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

Authors: Md Zahid Hasan, Jiajing Chen, Jiyang Wang, Mohammed Shaiqur Rahman, Ameya Joshi, Senem Velipasalar, Chinmay Hegde, Anuj Sharma, Soumik Sarkar

Abstract: Recognizing the activities causing distraction in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, thereby limiting their efficiency and scalability. We aim to develop a generalized framework that showcases robust performance with access to limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning like distracted driving activity recognition. Vision-language pretraining models, such as CLIP, have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding offers zero-shot transfer and task-based finetuning, which can classify distracted activities from driving video data. Our results show that this framework offers state-of-the-art performance on zero-shot transfer and video-based CLIP for predicting the driver's state on two public datasets. We propose both frame-based and video-based frameworks developed on top of the CLIP's visual representation for distracted driving detection and classification tasks and report the results.

Submitted 21 March, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 15 pages, 7 figures

arXiv:2305.09224 [pdf, other]

doi 10.1109/JIOT.2022.3151982

Privacy-Preserving Ensemble Infused Enhanced Deep Neural Network Framework for Edge Cloud Convergence

Authors: Veronika Stephanie, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman

Abstract: We propose a privacy-preserving ensemble infused enhanced Deep Neural Network (DNN) based learning framework in this paper for Internet-of-Things (IoT), edge, and cloud convergence in the context of healthcare. In the convergence, edge server is used for both storing IoT produced bioimage and hosting DNN algorithm for local model training. The cloud is used for ensembling local models. The DNN-based training process of a model with a local dataset suffers from low accuracy, which can be improved by the aforementioned convergence and Ensemble Learning. The ensemble learning allows multiple participants to outsource their local model for producing a generalized final model with high accuracy. Nevertheless, Ensemble Learning elevates the risk of leaking sensitive private data from the final model. The proposed framework presents a Differential Privacy-based privacy-preserving DNN with Transfer Learning for a local model generation to ensure minimal loss and higher efficiency at edge server. We conduct several experiments to evaluate the performance of our proposed framework.

Submitted 16 May, 2023; originally announced May 2023.

Journal ref: IEEE Internet of Things Journal, vol. 10, no. 5, pp. 3763-3773, 1 March 2023

arXiv:2304.13379 [pdf, other]

doi 10.1007/978-3-031-23020-2_35

Blockchain-based Access Control for Secure Smart Industry Management Systems

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Abdelaziz Bouras

Abstract: Smart manufacturing systems involve a large number of interconnected devices resulting in massive data generation. Cloud computing technology has recently gained increasing attention in smart manufacturing systems for facilitating cost-effective service provisioning and massive data management. In a cloud-based manufacturing system, ensuring authorized access to the data is crucial. A cloud platform is operated under a single authority. Hence, a cloud platform is prone to a single point of failure and vulnerable to adversaries. An internal or external adversary can easily modify users' access to allow unauthorized users to access the data. This paper proposes a role-based access control to prevent modification attacks by leveraging blockchain and smart contracts in a cloud-based smart manufacturing system. The role-based access control is developed to determine users' roles and rights in smart contracts. The smart contracts are then deployed to the private blockchain network. We evaluate our solution by utilizing Ethereum private blockchain network to deploy the smart contract. The experimental results demonstrate the feasibility and evaluation of the proposed framework's performance.

Submitted 26 April, 2023; originally announced April 2023.

Journal ref: Network and System Security: 16th International Conference, NSS 2022, Denarau Island, Fiji, December, 2022

arXiv:2304.12889 [pdf, other]

doi 10.1109/TII.2022.3170348

Blockchain-based Federated Learning with Secure Aggregation in Trusted Execution Environment for Internet-of-Things

Authors: Aditya Pribadi Kalapaaking, Ibrahim Khalil, Mohammad Saidur Rahman, Mohammed Atiquzzaman, Xun Yi, Mahathir Almashor

Abstract: This paper proposes a blockchain-based Federated Learning (FL) framework with Intel Software Guard Extension (SGX)-based Trusted Execution Environment (TEE) to securely aggregate local models in Industrial Internet-of-Things (IIoTs). In FL, local models can be tampered with by attackers. Hence, a global model generated from the tampered local models can be erroneous. Therefore, the proposed framework leverages a blockchain network for secure model aggregation. Each blockchain node hosts an SGX-enabled processor that securely performs the FL-based aggregation tasks to generate a global model. Blockchain nodes can verify the authenticity of the aggregated model, run a blockchain consensus mechanism to ensure the integrity of the model, and add it to the distributed ledger for tamper-proof storage. Each cluster can obtain the aggregated model from the blockchain and verify its integrity before using it. We conducted several experiments with different CNN models and datasets to evaluate the performance of the proposed framework.

Submitted 25 April, 2023; originally announced April 2023.

Journal ref: IEEE Transactions on Industrial Informatics, vol. 19, no. 2, pp. 1703-1714, Feb. 2023

arXiv:2304.10087 [pdf]

doi 10.1038/s41467-023-42556-0

Learning Diffractive Optical Communication Around Arbitrary Opaque Occlusions

Authors: Md Sadman Sakib Rahman, Tianyi Gan, Emir Arda Deger, Cagatay Isil, Mona Jarrahi, Aydogan Ozcan

Abstract: Free-space optical systems are emerging for high data rate communication and transfer of information in indoor and outdoor settings. However, free-space optical communication becomes challenging when an occlusion blocks the light path. Here, we demonstrate, for the first time, a direct communication scheme, passing optical information around a fully opaque, arbitrarily shaped obstacle that partially or entirely occludes the transmitter's field-of-view. In this scheme, an electronic neural network encoder and a diffractive optical network decoder are jointly trained using deep learning to transfer the optical information or message of interest around the opaque occlusion of an arbitrary shape. The diffractive decoder comprises successive spatially-engineered passive surfaces that process optical information through light-matter interactions. Following its training, the encoder-decoder pair can communicate any arbitrary optical information around opaque occlusions, where information decoding occurs at the speed of light propagation. For occlusions that change their size and/or shape as a function of time, the encoder neural network can be retrained to successfully communicate with the existing diffractive decoder, without changing the physical layer(s) already deployed. We also validate this framework experimentally in the terahertz spectrum using a 3D-printed diffractive decoder to communicate around a fully opaque occlusion. Scalable for operation in any wavelength regime, this scheme could be particularly useful in emerging high data-rate free-space communication systems.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 23 Pages, 9 Figures

Journal ref: Nature Communications (2023)

arXiv:2304.07500 [pdf, other]

The 7th AI City Challenge

Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

Abstract: The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries. Track 1 was a brand new track that focused on multi-target multi-camera (MTMC) people tracking, where teams trained and evaluated using both real and highly realistic synthetic data. Track 2 centered around natural-language-based vehicle track retrieval. Track 3 required teams to classify driver actions in naturalistic driving analysis. Track 4 aimed to develop an automated checkout system for retail stores using a single view camera. Track 5, another new addition, tasked teams with detecting violations of the helmet rule for motorcyclists. Two leader boards were released for submissions based on different methods: a public leader board for the contest where external private data wasn't allowed and a general leader board for all results submitted. The participating teams' top performances established strong baselines and even outperformed the state-of-the-art in the proposed challenge tracks.

Submitted 15 April, 2023; originally announced April 2023.

Comments: Summary of the 7th AI City Challenge Workshop in conjunction with CVPR 2023

arXiv:2303.13037 [pdf]

doi 10.1038/s41377-023-01234-y

Universal Linear Intensity Transformations Using Spatially-Incoherent Diffractive Processors

Authors: Md Sadman Sakib Rahman, Xilin Yang, Jingxi Li, Bijie Bai, Aydogan Ozcan

Abstract: Under spatially-coherent light, a diffractive optical network composed of structured surfaces can be designed to perform any arbitrary complex-valued linear transformation between its input and output fields-of-view (FOVs) if the total number (N) of optimizable phase-only diffractive features is greater than or equal to ~2 Ni x No, where Ni and No refer to the number of useful pixels at the input and the output FOVs, respectively. Here we report the design of a spatially-incoherent diffractive optical processor that can approximate any arbitrary linear transformation in time-averaged intensity between its input and output FOVs. Under spatially-incoherent monochromatic light, the spatially-varying intensity point spread function (H) of a diffractive network, corresponding to a given, arbitrarily-selected linear intensity transformation, can be written as H(m,n;m',n')=|h(m,n;m',n')|^2, where h is the spatially-coherent point-spread function of the same diffractive network, and (m,n) and (m',n') define the coordinates of the output and input FOVs, respectively. Using deep learning, supervised through examples of input-output profiles, we numerically demonstrate that a spatially-incoherent diffractive network can be trained to all-optically perform any arbitrary linear intensity transformation between its input and output if N is greater than or equal to ~2 Ni x No. These results constitute the first demonstration of universal linear intensity transformations performed on an input FOV under spatially-incoherent illumination and will be useful for designing all-optical visual processors that can work with incoherent, natural light.

Submitted 23 March, 2023; originally announced March 2023.

Comments: 29 Pages, 10 Figures

Journal ref: Light: Science & Applications (2023)
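
The relation H(m,n;m',n')=|h(m,n;m',n')|^2 quoted in the abstract above can be illustrated with a small numerical sketch. This is not the authors' code: it assumes the FOV pixels are flattened to a single index, that the coherent point-spread function is given as a small complex matrix, and that spatial incoherence is modelled by averaging over the relative phase between two input pixels. The function names (`intensity_psf`, `apply_intensity`, `phase_averaged_output`) and the example values are hypothetical.

```python
import cmath
import math


def intensity_psf(h):
    """Element-wise squared magnitude of the coherent PSF: H = |h|^2."""
    return [[abs(c) ** 2 for c in row] for row in h]


def apply_intensity(H, intensity_in):
    """Linear intensity transformation: I_out[o] = sum_i H[o][i] * I_in[i]."""
    return [sum(H_oi * I_i for H_oi, I_i in zip(row, intensity_in)) for row in H]


def phase_averaged_output(h, intensity_in, steps=8):
    """Average output intensity over equally spaced relative phases.

    Assumption: exactly two input pixels. Averaging over the relative phase
    removes the interference (cross) term, which is how incoherence is
    modelled in this sketch.
    """
    a0, a1 = (math.sqrt(v) for v in intensity_in)
    out = []
    for row in h:
        total = 0.0
        for k in range(steps):
            phase = cmath.exp(2j * math.pi * k / steps)
            field = row[0] * a0 + row[1] * a1 * phase
            total += abs(field) ** 2
        out.append(total / steps)
    return out


# Hypothetical coherent PSF with 2 output pixels and 2 input pixels.
h = [[1 + 1j, 0.5j], [2, -1j]]
I_in = [1.0, 4.0]

H = intensity_psf(h)
I_out = apply_intensity(H, I_in)
I_avg = phase_averaged_output(h, I_in)
```

In this sketch the phase-averaged coherent output agrees with the direct application of H, which is the sense in which the intensity response is linear under incoherent illumination.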

arXiv:2303.07425 [pdf]

Immense Fidelity Enhancement of Encoded Quantum Bell Pairs at Short and Long-distance Communication along with Generalized Design of Circuit

Authors: Syed Emad Uddin Shubha, Md. Saifur Rahman, M. R. C. Mahdy

Abstract: Quantum entanglement is a unique criterion of the quantum realm and an essential tool to secure quantum communication. Ensuring high-fidelity entanglement has always been a challenging task owing to interaction with the hostile channel environment created due to quantum noise and decoherence. Though several methods have been proposed, achieving almost 100% error correction is still a gigantic task. As one of the main contributions of this work, a new model for large distance communication has been introduced, which can correct all bit flip errors or other errors quite extensively if proper encoding is used. To achieve this purpose, at the very first step, the idea of differentiating the long and short-distance applications has been introduced. Short-distance is determined by the maximum range of applying unitary control gates by the qubit technology. As far as we know, there is no previous work that distinguishes long and short distance applications. At the beginning, we have applied stabilizer formalism and Repetition Code for decoding to distinguish the error correcting ability in long and short distance communication. Particularly for short distance communication, it has been demonstrated that a properly encoded Bell state can identify all the bit flip, or phase flip errors with 100% accuracy theoretically. In contrast, if the Bell states are used in long distance communication, the error-detecting and correcting ability is greatly reduced. To increase the fidelity significantly and correct the errors quite extensively for long-distance communication, a new model based on classical communication protocol has been proposed. All the required circuits in these processes have been generalized during encoding. Proposed analytical results have also been verified with the simulation results of IBM QISKIT QASM.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 24 pages, 25 equations, 14 figures

arXiv:2303.06313 [pdf]

doi 10.1142/13337

Cloud Forensic: Issues, Challenges and Solution Models

Authors: Sayada Sonia Akter, Mohammad Shahriar Rahman

Abstract: Cloud computing is a web-based utility model that is becoming more popular every day with the emergence of the 4th Industrial Revolution; therefore, cybercrimes that affect web-based systems are also relevant to cloud computing. In order to conduct a forensic investigation into a cyber-attack, it is necessary to identify and locate the source of the attack as soon as possible. Although significant study has been done in this domain on obstacles and their solutions, research on approaches and strategies is still in its development stage. There are barriers at every stage of cloud forensics; therefore, before we can come up with a comprehensive way to deal with these problems, we must first comprehend the cloud technology and its forensics environment. Although there are articles that are linked to cloud forensics, there is not yet a paper that has accumulated the contemporary concerns and solutions related to cloud forensics. Throughout this chapter, we have looked at the cloud environment, as well as the threats and attacks that it may be subjected to. We have also looked at the approaches that cloud forensics may take, as well as the various frameworks and the practical challenges and limitations they may face when dealing with cloud forensic investigations.

Submitted 11 March, 2023; originally announced March 2023.

Comments: 23 pages; 6 figures; 4 tables. Book chapter of the book titled "A Practical Guide on Security and Privacy in Cyber Physical Systems Foundations, Applications and Limitations", World Scientific Series in Digital Forensics and Cybersecurity

arXiv:2301.12682 [pdf, other]

Image Contrast Enhancement using Fuzzy Technique with Parameter Determination using Metaheuristics

Authors: Mohimenul Kabir, Jaiaid Mobin, Ahmad Hassanat, M. Sohel Rahman

Abstract: In this work, we have presented a way to increase the contrast of an image. Our target is to find a transformation that will be image specific. We have used a fuzzy system as our transformation function. To tune the system according to an image, we have used Genetic Algorithm and Hill Climbing in multiple ways to evolve the fuzzy system and conducted several experiments. Different variants of the method are tested on several images and two variants that are superior to others in terms of fitness are selected. We have also conducted a survey to assess the visual improvement of the enhancements made by the two variants. The survey indicates that one of the methods can enhance the contrast of the images visually.

Submitted 30 January, 2023; originally announced January 2023.

Comments: 14 pages, 7 figures, Image Processing, Computer Vision, Evolutionary Computation

arXiv:2212.03298 [pdf, other]

WiSwarm: Age-of-Information-based Wireless Networking for Collaborative Teams of UAVs

Authors: Vishrant Tripathi, Igor Kadota, Ezra Tal, Muhammad Shahir Rahman, Alexander Warren, Sertac Karaman, Eytan Modiano

Abstract: The Age-of-Information (AoI) metric has been widely studied in the theoretical communication networks and queuing systems literature. However, experimental evaluation of its applicability to complex real-world time-sensitive systems is largely lacking. In this work, we develop, implement, and evaluate an AoI-based application layer middleware that enables the customization of WiFi networks to the needs of time-sensitive applications. By controlling the storage and flow of information in the underlying WiFi network, our middleware can: (i) prevent packet collisions; (ii) discard stale packets that are no longer useful; and (iii) dynamically prioritize the transmission of the most relevant information. To demonstrate the benefits of our middleware, we implement a mobility tracking application using a swarm of UAVs communicating with a central controller via WiFi. Our experimental results show that, when compared to WiFi-UDP/WiFi-TCP, the middleware can improve information freshness by a factor of 109x/48x and tracking accuracy by a factor of 4x/6x, respectively. Most importantly, our results also show that the performance gains of our approach increase as the system scales and/or the traffic load increases.

Submitted 6 December, 2022; originally announced December 2022.

Comments: To be presented at IEEE INFOCOM 2023

arXiv:2212.02666 [pdf, other]

doi 10.1145/3494110.3528242

Transformers for End-to-End InfoSec Tasks: A Feasibility Study

Authors: Ethan M. Rudd, Mohammad Saidur Rahman, Philip Tully

Abstract: In this paper, we assess the viability of transformer models in end-to-end InfoSec settings, in which no intermediate feature representations or processing steps occur outside the model. We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files - in a novel end-to-end approach, and explore a variety of architectural designs, training regimes, and experimental settings to determine the ingredients necessary for performant detection models. We show that in contrast to conventional transformers trained on more standard NLP-related tasks, our URL transformer model requires a different training approach to reach high performance levels. Specifically, we show that 1) pre-training on a massive corpus of unlabeled URL data for an auto-regressive task does not readily transfer to binary classification of malicious or benign URLs, but 2) that using an auxiliary auto-regressive loss improves performance when training from scratch. We introduce a method for mixed objective optimization, which dynamically balances contributions from both loss terms so that neither one of them dominates. We show that this method yields quantitative evaluation metrics comparable to those of several top-performing benchmark classifiers. Unlike URLs, binary executables contain longer and more distributed sequences of information-rich bytes. To accommodate such lengthy byte sequences, we introduce additional context length into the transformer by providing its self-attention layers with an adaptive span similar to Sukhbaatar et al. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets, but also point out the need for further exploration into model improvements in scalability and compute efficiency.

Submitted 5 December, 2022; originally announced December 2022.

Comments: Post-print of a manuscript accepted to ACM Asia-CCS Workshop on Robust Malware Analysis (WoRMA) 2022. 11 Pages total. arXiv admin note: substantial text overlap with arXiv:2011.03040

Journal ref: Proceedings of the 1st Workshop on Robust Malware Analysis (2022) 21-31

arXiv:2212.00414 [pdf]

A Comprehensive Study on Machine Learning Methods to Increase the Prediction Accuracy of Classifiers and Reduce the Number of Medical Tests Required to Diagnose Alzheimer's Disease

Authors: Md. Sharifur Rahman, Professor Girijesh Prasad

Abstract: Alzheimer's patients gradually lose their ability to think, behave, and interact with others. Medical history, laboratory tests, daily activities, and personality changes can all be used to diagnose the disorder. A series of time-consuming and expensive tests are used to diagnose the illness. The most effective way to identify Alzheimer's disease is using a Random-forest classifier in this study, along with various other Machine Learning techniques. The main goal of this study is to fine-tune the classifier to detect illness with fewer tests while maintaining a reasonable disease discovery accuracy. We successfully identified the condition in almost 94% of cases using four of the thirty frequently utilized indicators.

Submitted 1 December, 2022; originally announced December 2022.

Comments: Presented at the 3rd International Conference on Machine Learning Techniques and Data Science (MLDS 2022)

arXiv:2211.02141 [pdf, other]

Shapes2Toon: Generating Cartoon Characters from Simple Geometric Shapes

Authors: Simanta Deb Turja, Mohammad Imrul Jubair, Md. Shafiur Rahman, Md. Hasib Al Zadid, Mohtasim Hossain Shovon, Md. Faraz Kabir Khan

Abstract: Cartoons are an important part of our entertainment culture. Though drawing a cartoon is not for everyone, creating it using an arrangement of basic geometric primitives that approximates that character is a fairly frequent technique in art. The key motivation behind this technique is that human bodies - as well as cartoon figures - can be broken down into various basic geometric primitives. Numerous tutorials are available that demonstrate how to draw figures using an appropriate arrangement of fundamental shapes, thus assisting us in creating cartoon characters. This technique is very beneficial for children in terms of teaching them how to draw cartoons. In this paper, we develop a tool - shape2toon - that aims to automate this approach by utilizing a generative adversarial network which combines geometric primitives (e.g. circles) and generates a cartoon figure (e.g. Mickey Mouse) depending on the given approximation. For this purpose, we created a dataset of geometrically represented cartoon characters. We apply an image-to-image translation technique on our dataset and report the results in this paper. The experimental results show that our system can generate cartoon characters from an input layout of geometric shapes. In addition, we demonstrate a web-based tool as a practical application of our work.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Accepted as a full paper in AICCSA2022 (19th ACS/IEEE International Conference on Computer Systems and Applications)

arXiv:2210.12921 [pdf]

Investigating self-supervised, weakly supervised and fully supervised training approaches for multi-domain automatic speech recognition: a study on Bangladeshi Bangla

Authors: Ahnaf Mozib Samin, M. Humayon Kobir, Md. Mushtaq Shahriyar Rafee, M. Firoz Ahmed, Mehedi Hasan, Partha Ghosh, Shafkat Kibria, M. Shahidur Rahman

Abstract: Despite huge improvements in automatic speech recognition (ASR) employing neural networks, ASR systems still suffer from a lack of robustness and generalizability issues due to domain shifting. This is mainly because principal corpus design criteria are often not identified and examined adequately while compiling ASR datasets. In this study, we investigate the robustness of the state-of-the-art transfer learning approaches such as self-supervised wav2vec 2.0 and weakly supervised Whisper as well as fully supervised convolutional neural networks (CNNs) for multi-domain ASR. We also demonstrate the significance of domain selection while building a corpus by assessing these models on a novel multi-domain Bangladeshi Bangla ASR evaluation benchmark - BanSpeech, which contains approximately 6.52 hours of human-annotated speech and 8085 utterances from 13 distinct domains. SUBAK.KO, a mostly read speech corpus for the morphologically rich language Bangla, has been used to train the ASR systems. Experimental evaluation reveals that self-supervised cross-lingual pre-training is the best strategy compared to weak supervision and full supervision to tackle the multi-domain ASR task. Moreover, the ASR models trained on SUBAK.KO face difficulty recognizing speech from domains with mostly spontaneous speech. The BanSpeech will be publicly available to meet the need for a challenging evaluation benchmark for Bangla ASR.

Submitted 10 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2210.11174 [pdf, other]

Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN

Authors: Md Nurul Muttakin, Md Iqbal Hossain, Md Saidur Rahman

Abstract: Overlapping community detection is a key problem in graph mining. Some research has considered applying graph convolutional networks (GCN) to tackle the problem. However, it is still challenging to incorporate deep graph convolutional networks in the case of general irregular graphs. In this study, we design a deep dynamic residual graph convolutional network (DynaResGCN) based on our novel dynamic dilated aggregation mechanisms and a unified end-to-end encoder-decoder-based framework to detect overlapping communities in networks. The deep DynaResGCN model is used as the encoder, whereas we incorporate the Bernoulli-Poisson (BP) model as the decoder. Consequently, we apply our overlapping community detection framework in a research topics dataset without having ground truth, a set of networks from Facebook having a reliable (hand-labeled) ground truth, and in a set of very large co-authorship networks having empirical (not hand-labeled) ground truth. Our experimentation on these datasets shows significantly superior performance over many state-of-the-art methods for the detection of overlapping communities in networks.

Submitted 28 September, 2024; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2208.10802 [pdf]

doi 10.1002/aisy.202200387

Time-lapse image classification using a diffractive neural network

Authors: Md Sadman Sakib Rahman, Aydogan Ozcan

Abstract: Diffractive deep neural networks (D2NNs) define an all-optical computing framework comprised of spatially engineered passive surfaces that collectively process optical input information by modulating the amplitude and/or the phase of the propagating light. Diffractive optical networks complete their computational tasks at the speed of light propagation through a thin diffractive volume, without any external computing power while exploiting the massive parallelism of optics. Diffractive networks were demonstrated to achieve all-optical classification of objects and perform universal linear transformations. Here we demonstrate, for the first time, a "time-lapse" image classification scheme using a diffractive network, significantly advancing its classification accuracy and generalization performance on complex input objects by using the lateral movements of the input objects and/or the diffractive network, relative to each other. In a different context, such relative movements of the objects and/or the camera are routinely being used for image super-resolution applications; inspired by their success, we designed a time-lapse diffractive network to benefit from the complementary information content created by controlled or random lateral shifts. We numerically explored the design space and performance limits of time-lapse diffractive networks, revealing a blind testing accuracy of 62.03% on the optical classification of objects from the CIFAR-10 dataset. This constitutes the highest inference accuracy achieved so far using a single diffractive network on the CIFAR-10 dataset. Time-lapse diffractive networks will be broadly useful for the spatio-temporal analysis of input signals using all-optical processors.

Submitted 23 August, 2022; originally announced August 2022.

Comments: 17 Pages, 4 Figures, 2 Tables

Journal ref: Advanced Intelligent Systems (2023)

arXiv:2208.06568 [pdf, other]

On the Limitations of Continual Learning for Malware Classification

Authors: Mohammad Saidur Rahman, Scott E. Coull, Matthew Wright

Abstract: Malicious software (malware) classification offers a unique challenge for continual learning (CL) regimes due to the volume of new samples received on a daily basis and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the course of the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive Joint replay of the training data in nearly all settings -- in some cases reducing accuracy by more than 70 percentage points. A simple approach of selectively replaying 20% of the stored data achieves better performance, with 50% of the training time compared to Joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, with the hope that it spurs further research on developing techniques that are more effective in the malware classification domain.

Submitted 13 August, 2022; originally announced August 2022.

Comments: 19 pages, 11 figures, and 2 tables, Accepted at the Conference on Lifelong Learning Agents - CoLLAs 2022

arXiv:2206.14350 [pdf]

Convolutional Neural Network Based Partial Face Detection

Authors: Md. Towfiqul Islam, Tanzim Ahmed, A. B. M. Raihanur Rashid, Taminul Islam, Md. Sadekur Rahman, Md. Tarek Habib

Abstract: Due to the massive explanation of artificial intelligence, machine learning technology is being used in various areas of our day-to-day life. In the world, there are a lot of scenarios where a simple crime can be prevented before it may even happen or find the person responsible for it. A face is one distinctive feature that we have and can differentiate easily among many other species. But not just different species, it also plays a significant role in determining someone from the same species as us, humans. Regarding this critical feature, a single problem occurs most often nowadays. When the camera is pointed, it cannot detect a person's face, and it becomes a poor image. On the other hand, where there was a robbery and a security camera installed, the robber's identity is almost indistinguishable due to the low-quality camera. But just making an excellent algorithm to work and detecting a face reduces the cost of hardware, and it doesn't cost that much to focus on that area. Facial recognition, widget control, and such can be done by detecting the face correctly. This study aims to create and enhance a machine learning model that correctly recognizes faces. A total of 627 data samples have been collected from different Bangladeshi people's faces at four angles. In this work, five machine learning approaches (CNN, Haar Cascade, Cascaded CNN, Deep CNN and MTCNN) are implemented to get the best accuracy on our dataset. After creating and running the models, the Multi-Task Convolutional Neural Network (MTCNN) achieved the best model accuracy of 96.2% with training data, compared with the other machine learning models.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted in 7th International Conference for Convergence in Technology (I2CT), 2022, 6 pages, 7 figures

arXiv:2205.08723 [pdf, other]

doi 10.1016/j.nima.2022.167773

Pulse Shape Discrimination of low-energy nuclear and electron recoils for improved particle identification in NaI:Tl

Authors: N. J. Spinks, L. J. Bignell, G. J. Lane, A. Akber, E. Barberio, T. Baroncelli, B. J. Coombes, J. T. H. Dowie, T. K. Eriksen, M. S. M. Gerathy, T. J. Gray, I. Mahmood, B. P. McCormick, W. J. D. Melbourne, A. J. Mitchell, F. Nuti, M. S. Rahman, F. Scutti, A. E. Stuchbery, H. Timmers, P. Urquijo, Y. Y. Zhong, M. J. Zurowski

Abstract: The scintillation mechanism in NaI:Tl crystals produces different pulse shapes that are dependent on the incoming particle type. The time distribution of scintillation light from nuclear recoil events decays faster than for electron recoil events and this difference can be categorised using various Pulse Shape Discrimination (PSD) techniques. In this study, we measured nuclear and electron recoils in a NaI:Tl crystal, with electron equivalent energies between 2 and 40 keV. We report on a new PSD approach, based on an event-type likelihood; this outperforms the charge-weighted mean-time, which is the conventional metric for PSD in NaI:Tl. Furthermore, we show that a linear combination of the two methods improves the discrimination power at these energies.

Submitted 18 May, 2022; originally announced May 2022.

Comments: 12 pages, 12 figures, 1 table

arXiv:2205.04225 [pdf, ps, other]

doi 10.1016/j.ipl.2022.106284

New Results on Pairwise Compatibility Graphs

Authors: Sheikh Azizul Hakim, Bishal Basak Papan, Md. Saidur Rahman

Abstract: A graph $G=(V,E)$ is called a pairwise compatibility graph (PCG) if there exists an edge-weighted tree $T$ and two non-negative real numbers $d_{min}$ and $d_{max}$ such that each leaf $u$ of $T$ corresponds to a vertex $u \in V$ and there is an edge $(u, v) \in E$ if and only if $d_{min} \leq d_{T}(u, v) \leq d_{max}$, where $d_T(u, v)$ is the sum of the weights of the edges on the unique path from $u$ to $v$ in $T$. The tree $T$ is called the pairwise compatibility tree (PCT) of $G$. It has been proven that not all graphs are PCGs. Thus, it is interesting to know which classes of graphs are PCGs. In this paper, we prove that grid graphs are PCGs. Although there are a necessary condition and a sufficient condition known for a graph being a PCG, there are some classes of graphs that are intermediate to the classes defined by the necessary condition and the sufficient condition. In this paper, we show two examples of graphs that are included in these intermediate classes and prove that they are not PCGs.

Submitted 9 May, 2022; originally announced May 2022.

Comments: Manuscript accepted in Information Processing Letters
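
The PCG definition in the abstract above can be made concrete with a small sketch. It is an illustration only and is not taken from the paper: the function names `tree_distances` and `pcg_edges`, the edge-list representation of the tree, and the star-shaped example are all assumptions chosen for brevity. Given an edge-weighted tree $T$, its leaves, and the bounds $d_{min}$ and $d_{max}$, the code returns the edge set of the graph that $T$ induces.

```python
from itertools import combinations


def tree_distances(tree_edges, source):
    """Return the weighted path length from `source` to every node of the tree.

    `tree_edges` is a list of (u, v, weight) triples (a hypothetical
    representation chosen for this sketch).
    """
    adjacency = {}
    for u, v, w in tree_edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    # In a tree the path between two nodes is unique, so a plain
    # traversal yields the distance d_T(source, node) directly.
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour, w in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + w
                stack.append(neighbour)
    return dist


def pcg_edges(tree_edges, leaves, d_min, d_max):
    """Edges (u, v) between leaves with d_min <= d_T(u, v) <= d_max."""
    dist_from = {leaf: tree_distances(tree_edges, leaf) for leaf in leaves}
    edges = set()
    for u, v in combinations(sorted(leaves), 2):
        if d_min <= dist_from[u][v] <= d_max:
            edges.add((u, v))
    return edges


# Hypothetical example: a star with centre "c" and leaves "a", "b", "d".
star = [("c", "a", 1), ("c", "b", 2), ("c", "d", 4)]
induced = pcg_edges(star, ["a", "b", "d"], d_min=3, d_max=5)
```

In this example the leaf distances are 3 for a and b, 5 for a and d, and 6 for b and d, so with $d_{min}=3$ and $d_{max}=5$ the induced graph has the edges (a, b) and (a, d) but not (b, d). The sketch only evaluates a given tree; deciding whether any such tree exists for a given graph is the harder question the paper addresses.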

arXiv:2204.10380 [pdf, other]

The 6th AI City Challenge

Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Archana Venkatachalapathy, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Alice Li, Shangru Li, Rama Chellappa

Abstract: The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand new track for naturalistic driving analysis, where the data were captured by several cameras mounted inside the vehicle focusing on driver safety, and the task was to classify driver actions. Track 4 was another new track aiming to achieve retail store automated checkout using only a single view camera. We released two leader boards for submissions based on different methods, including a public leader board for the contest, where no use of external data is allowed, and a general leader board for all submitted results. The top performance of participating teams established strong baselines and even outperformed the state-of-the-art in the proposed challenge tracks.

Submitted 9 June, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: Summary of the 6th AI City Challenge Workshop in conjunction with CVPR 2022. arXiv admin note: text overlap with arXiv:2104.12233

arXiv:2204.08096 [pdf]

Synthetic Distracted Driving (SynDD2) dataset for analyzing distracted behaviors and various gaze zones of a driver

Authors: Mohammed Shaiqur Rahman, Jiyang Wang, Senem Velipasalar Gursoy, David Anastasiu, Shuo Wang, Anuj Sharma

Abstract: This article presents a synthetic distracted driving (SynDD2 - a continuum of SynDD1) dataset for machine learning models to detect and analyze drivers' various distracted behavior and different gaze zones. We collected the data in a stationary vehicle using three in-vehicle cameras positioned at locations: on the dashboard, near the rearview mirror, and on the top right-side window corner. The dataset contains two activity types: distracted activities and gaze zones for each participant, and each activity type has two sets: without appearance blocks and with appearance blocks such as wearing a hat or sunglasses. The order and duration of each activity for each participant are random. In addition, the dataset contains manual annotations for each activity, having its start and end time annotated. Researchers could use this dataset to evaluate the performance of machine learning algorithms to classify various distracting activities and gaze zones of drivers.

Submitted 10 April, 2023; v1 submitted 17 April, 2022; originally announced April 2022.

Showing 1–50 of 163 results for author: Rahman, M S