Search | arXiv e-print repository

Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning

Authors: Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar

Abstract: We present a novel method aimed at enhancing the sample efficiency of ensemble Q learning. Our proposed approach integrates multi-head self-attention into the ensembled Q networks while bootstrapping the state-action pairs ingested by the ensemble. This not only results in performance improvements over the original REDQ (Chen et al. 2021) and its variant DroQ (Hiraoka et al. 2022), thereby enhancing Q predictions, but also effectively reduces both the average normalized bias and standard deviation of normalized bias within Q-function ensembles. Importantly, our method also performs well even in scenarios with a low update-to-data (UTD) ratio. Notably, the implementation of our proposed method is straightforward, requiring minimal modifications to the base model.

Submitted 13 May, 2024; originally announced May 2024.

Comments: FLAIRS-37 (2024)

arXiv:2405.07153 [pdf, other]

Photon loss effects on light-mediated non-Gaussian entangled Bose-Einstein condensates projecting with different photon measurement outcomes

Authors: Shuai Gao, Manish Chaudhary, Alexey N. Pyrkov, Ebubechukwu O. Ilo-Okeke, Xin Meng, Jingyan Feng, Muhammad Jamil Khan, Tim Byners, Chaogang Lou

Abstract: The theory of quantum information processing for macroscopic qubits is based on the fact that every macroscopic qubit has a conserved number of particles. However, from an experimental point of view, every such qubit experiences decoherence processes that limit entanglement generation between such qubits and their efficient use in quantum information processing. One of the most promising methods for generating entanglement between distant atomic BECs is quantum nondemolition measurements. Here, we study how the effects of photon measurement impact the entanglement when photon loss decoherence is included. We employ the thermally entangled state representation (TESR) and integral within the ordered operator (IWOP) approach to obtain the accurate density matrix in a photon loss channel. We demonstrate that varying outcomes of photon number measurements lead to the generation of distinct entangled states, each exhibiting unique characteristics. We find that using the Hofmann-Takeuchi and Duan-Giedke-Cirac-Zoller criteria provides advantages in entanglement detection compared to the Wineland squeezing and EPR steering criteria in such settings.

Submitted 11 May, 2024; originally announced May 2024.

Comments: 13 pages, 9 figures

arXiv:2405.06128 [pdf, other]

Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion

Authors: Syed Hammad Ahmed, Muhammad Junaid Khan, Gita Sukthankar

Abstract: Due to the rise in video content creation targeted towards children, there is a need for robust content moderation schemes for video hosting platforms. A video that is visually benign may include audio content that is inappropriate for young children while being impossible to detect with a unimodal content moderation system. Popular video hosting platforms for children such as YouTube Kids still publish videos which contain audio content that is not conducive to a child's healthy behavioral and physical development. A robust classification of malicious videos requires audio representations in addition to video features. However, recent content moderation approaches rarely employ multimodal architectures that explicitly consider non-speech audio cues. To address this, we present an efficient adaptation of CLIP (Contrastive Language-Image Pre-training) that can leverage contextual audio cues for enhanced content moderation. We incorporate 1) the audio modality and 2) prompt learning, while keeping the backbone modules of each modality frozen. We conduct our experiments on a multimodal version of the MOB (Malicious or Benign) dataset in supervised and few-shot settings.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 8 pages, 3 figures, Accepted at The 37th International FLAIRS Conference

arXiv:2404.17704 [pdf, other]

SPLICE -- Streamlining Digital Pathology Image Processing

Authors: Areej Alsaafin, Peyman Nejat, Abubakr Shafique, Jibran Khan, Saghir Alfasly, Ghazal Alabtah, H. R. Tizhoosh

Abstract: Digital pathology and the integration of artificial intelligence (AI) models have revolutionized histopathology, opening new opportunities. With the increasing availability of Whole Slide Images (WSIs), there's a growing demand for efficient retrieval, processing, and analysis of relevant images from vast biomedical archives. However, processing WSIs presents challenges due to their large size and content complexity. Full computer digestion of WSIs is impractical, and processing all patches individually is prohibitively expensive. In this paper, we propose an unsupervised patching algorithm, Sequential Patching Lattice for Image Classification and Enquiry (SPLICE). This novel approach condenses a histopathology WSI into a compact set of representative patches, forming a "collage" of WSI while minimizing redundancy. SPLICE prioritizes patch quality and uniqueness by sequentially analyzing a WSI and selecting non-redundant representative features. We evaluated SPLICE for search and match applications, demonstrating improved accuracy, reduced computation time, and storage requirements compared to existing state-of-the-art methods. As an unsupervised method, SPLICE effectively reduces storage requirements for representing tissue images by 50%. This reduction enables numerous algorithms in computational pathology to operate much more efficiently, paving the way for accelerated adoption of digital pathology.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Under review for publication

arXiv:2404.13085 [pdf]

High-efficiency perovskite-organic blend light-emitting diodes featuring self-assembled monolayers as hole-injecting interlayers

Authors: Murali Gedda, Despoina Gkeka, Mohamad Insan Nugraha, Alberto D. Scaccabarozzi, Emre Yengel, Jafar I. Khan, Iain Hamilton, Yuanbao Lin, Marielle Deconinck, Yana Vaynzof, Frédéric Laquai, Donal D. C. Bradley, Thomas D. Anthopoulos

Abstract: The high photoluminescence efficiency, color purity, extended gamut, and solution processability make low-dimensional hybrid perovskites attractive for light-emitting diode (PeLED) applications. However, controlling the microstructure of these materials to improve the device performance remains challenging. Here, the development of highly efficient green PeLEDs based on blends of the quasi-2D (q2D) perovskite, PEA2Cs4Pb5Br16, and the wide bandgap organic semiconductor 2,7 dioctyl[1] benzothieno[3,2-b]benzothiophene (C8-BTBT) is reported. The presence of C8-BTBT enables the formation of single-crystal-like q2D PEA2Cs4Pb5Br16 domains that are uniform and highly luminescent. Combining the PEA2Cs4Pb5Br16:C8-BTBT with self-assembled monolayers (SAMs) as hole-injecting layers (HILs), yields green PeLEDs with greatly enhanced performance characteristics, including external quantum efficiency up to 18.6%, current efficiency up to 46.3 cd/A, the luminance of 45 276 cd m^-2, and improved operational stability compared to neat PeLEDs. The enhanced performance originates from multiple synergistic effects, including enhanced hole-injection enabled by the SAM HILs, the single crystal-like quality of the perovskite phase, and the reduced concentration of electronic defects. This work highlights perovskite:organic blends as promising systems for use in LEDs, while the use of SAM HILs creates new opportunities toward simpler and more stable PeLEDs.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11259 [pdf]

Photophysics of defect-passivated quasi-2D (PEA)2PbBr4 perovskite using an organic small-molecule

Authors: Jafar I. Khan, Murali Gedda, Mingcong Wang, Emre Yengel, Joshua A. Kreß, Yana Vaynzof, Thomas D. Anthopoulos, Frédéric Laquai

Abstract: 2D Ruddlesden-Popper perovskites are promising candidates for energy harvesting applications due to their tunable optical properties and excellent ambient stability. Moreover, they are solution-processable and compatible with upscalable manufacturing via various printing techniques. Unfortunately, such methods often induce large degrees of heterogeneity due to poorly controlled crystallization. Here, we address this issue by blending the well-known 2D perovskite (PEA)2PbBr4 with an organic small-molecule, namely C8-BTBT, employed as an additive with different blending ratios. Using terahertz (THz) absorption and temperature-dependent photoluminescence (PL) spectroscopy techniques, we observe that with the C8-BTBT additive the photophysical properties are altered while the perovskite structure in the film remains unaffected. More precisely, the inclusion of trace amounts of C8-BTBT in the hybrid films results in defect passivation at perovskite platelet boundaries and at the surfaces, as indicated by increased carrier lifetimes and substantially increased photoluminescence quantum yields (PLQY). This in turn improves the responsivity of photodetectors using the 2D perovskite as active layer. Our study highlights a straightforward strategy for fabricating high-quality 2D perovskites via large-area processing techniques.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10162 [pdf, other]

Optimal Kernel Tuning Parameter Prediction using Deep Sequence Models

Authors: Khawir Mahmood, Jehandad Khan, Hammad Afzal

Abstract: GPU kernels have come to the forefront of computing due to their utility in varied fields, from high-performance computing to machine learning. A typical GPU compute kernel is invoked millions, if not billions, of times in a typical application, which makes its performance highly critical. Due to the unknown nature of the optimization surface, an exhaustive search is required to discover the global optimum, which is infeasible due to the possibly exponential number of parameter combinations. In this work, we propose a methodology that uses deep sequence-to-sequence models to predict the optimal tuning parameters governing compute kernels. This work considers the prediction of kernel parameters as a sequence-to-sequence translation problem, borrowing models from the Natural Language Processing (NLP) domain. Parameters describing the input, output and weight tensors are considered as the input language to the model that emits the corresponding kernel parameters. In essence, the model translates the problem parameter language to kernel parameter language. The core contributions of this work are: a) proposing that a sequence-to-sequence model can accurately learn the performance dynamics of a GPU compute kernel, b) a novel network architecture which predicts the kernel tuning parameters for GPU kernels, c) a constrained beam search which incorporates the physical limits of the GPU hardware as well as other expert knowledge, reducing the search space.
The proposed algorithm can achieve more than 90% accuracy on various convolutional kernels in MIOpen, the AMD machine learning primitives library. As a result, the proposed technique can reduce the development time and compute resources required to tune unseen input configurations, resulting in shorter development cycles, reduced development costs, and better user experience.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08024 [pdf, other]

The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health

Authors: M. Jaleed Khan, Ioana Duta, Beth Albert, William Cooke, Manu Vatish, Gabriel Davis Jones

Abstract: The rapid advancement of Artificial Intelligence (AI) in healthcare presents a unique opportunity for advancements in obstetric care, particularly through the analysis of cardiotocography (CTG) for fetal monitoring. However, the effectiveness of such technologies depends upon the availability of large, high-quality datasets that are suitable for machine learning. This paper introduces the Oxford Maternity (OxMat) dataset, the world's largest curated dataset of CTGs, featuring raw time series CTG data and extensive clinical data for both mothers and babies, which is ideally placed for machine learning. The OxMat dataset addresses the critical gap in women's health data by providing over 177,211 unique CTG recordings from 51,036 pregnancies, carefully curated and reviewed since 1991. The dataset also comprises over 200 antepartum, intrapartum and postpartum clinical variables, ensuring near-complete data for crucial outcomes such as stillbirth and acidaemia. While this dataset also covers the intrapartum stage, around 94% of the constituent CTGs are antepartum. This allows for a unique focus on the underserved antepartum period, in which early detection of at-risk fetuses can significantly improve health outcomes. Our comprehensive review of existing datasets reveals the limitations of current datasets: primarily, their lack of sufficient volume, detailed clinical data and antepartum data.
The OxMat dataset lays a foundation for future AI-driven prenatal care, offering a robust resource for developing and testing algorithms aimed at improving maternal and fetal health outcomes.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.16347 [pdf, other]

doi 10.1145/3597503.3639194

ChatGPT Incorrectness Detection in Software Reviews

Authors: Minaoar Hossain Tanzil, Junaed Younus Khan, Gias Uddin

Abstract: We conducted a survey of 135 software engineering (SE) practitioners to understand how they use Generative AI-based chatbots like ChatGPT for SE tasks. We find that they want to use ChatGPT for SE tasks like software library selection but often worry about the truthfulness of ChatGPT responses. We developed a suite of techniques and a tool called CID (ChatGPT Incorrectness Detector) to automatically test and detect the incorrectness in ChatGPT responses. CID is based on the iterative prompting to ChatGPT by asking it contextually similar but textually divergent questions (using an approach that utilizes metamorphic relationships in texts). The underlying principle in CID is that for a given question, a response that is different from other responses (across multiple incarnations of the question) is likely an incorrect response. In a benchmark study of library selection, we show that CID can detect incorrect responses from ChatGPT with an F1-score of 0.74 - 0.75.

Submitted 24 March, 2024; originally announced March 2024.

Journal ref: IEEE/ACM 46th International Conference on Software Engineering (ICSE 2024)
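
The principle stated in the abstract above (a response that differs from the other responses to reworded versions of the same question is likely incorrect) can be illustrated with a minimal sketch. This is not the authors' CID tool: the token-set Jaccard similarity, the function names, and the 0.5 threshold are all assumptions chosen only to make the idea concrete.

```python
def token_set(text):
    """Lowercased whitespace tokens of a response (a crude stand-in for real text features)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_outliers(responses, threshold=0.5):
    """Return indices of responses whose mean similarity to all other responses is below threshold.

    `responses` holds the answers obtained for several rewordings of one question.
    The threshold is a hypothetical value, not one taken from the paper.
    """
    sets = [token_set(r) for r in responses]
    flagged = []
    for i, current in enumerate(sets):
        sims = [jaccard(current, other) for j, other in enumerate(sets) if j != i]
        mean_sim = sum(sims) / len(sims) if sims else 1.0
        if mean_sim < threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    answers = [
        "use requests library",
        "use requests library",
        "use requests library",
        "try urllib3 instead",
    ]
    print(flag_outliers(answers))
```

In this toy input the last answer shares no tokens with the others, so it is the only one flagged; a real system would need a far more robust notion of agreement between responses.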

arXiv:2403.14293 [pdf, other]

Human Reactions to Incorrect Answers from Robots

Authors: Ponkoj Chandra Shill, Md. Azizul Hakim, Muhammad Jahanzeb Khan, Bashira Akter Anima

Abstract: As robots grow more and more integrated into numerous industries, it is critical to comprehend how humans respond to their failures. This paper systematically studies how trust dynamics and system design are affected by human responses to robot failures. The three-stage survey used in the study provides a thorough understanding of human-robot interactions. While the second stage concentrates on interaction details, such as robot precision and error acknowledgment, the first stage collects demographic data and initial levels of trust. In the last phase, participants' perceptions are examined after the encounter, and trust dynamics, forgiveness, and propensity to suggest robotic technologies are evaluated. Results show that participants' trust in robotic technologies increased significantly when robots acknowledged their errors or limitations to participants and their willingness to suggest robots for activities in the future points to a favorable change in perception, emphasizing the role that direct engagement has in influencing trust dynamics. By providing useful advice for creating more sympathetic, responsive, and reliable robotic systems, the study advances the science of human-robot interaction and promotes a wider adoption of robotic technologies.

Submitted 21 March, 2024; originally announced March 2024.

Comments: 6 pages, 6 figures, 1 table, Ro-Man 2024

arXiv:2402.14571 [pdf, other]

Sorting of mesoporous silica derivatives by random optical fields

Authors: Mohammad Hadi Sadri, Ramin Jamali, Asif Jamal Khan, Fozia Rehman, Ali-Reza Moradi

Abstract: Mesoporous silica particles are promising candidates for drug delivery applications. In this paper, we first synthesize mesoporous silica MCM-41 and its derivative MCM-41GA with anchored glutaraldehyde bridges, and characterize them using a variety of techniques, including nitrogen adsorption/desorption, X-ray diffraction, NMR spectroscopy, scanning electron microscopy, and thermogravimetric analysis. Then, we employ random optical fields to sort mesoporous silica particles. Random optical fields, by containing local intensity gradients throughout a wide field of view, provide an elegant, easy-to-implement, and low-cost variant of multiple optical tweezers, which is known as speckle tweezers (ST). ST, similar to multiple optical tweezers, can be used for manipulation tasks, such as trapping, sorting, and guiding of collections of micro and sub-micro objects in several disciplines including statistical physics, chemistry, microfluidics and material science. We show that ST can restrict, sieve, and sort MCM-41 and MCM-41GA particles. The different interaction of mesoporous silica variations with the applied ST may be attributed to the pre-applied modification and the differences in the porosity structure and distribution. Therefore, the results provide insight into the textural and chemical characteristics of mesoporous materials, contributing to a deeper understanding of their potential applications.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.10389 [pdf]

Enabling Zero Trust Security in IoMT Edge Network

Authors: Maha Ali Allouzi, Javed Khan

Abstract: Internet of Medical Things (IoMT) deals with a patient-data-rich segment, which makes security and privacy a severe concern for patients. Therefore, access control is a significant aspect of ensuring trust in the IoMT. However, deploying existing authentication and authorization solutions to the IoMT is not straightforward because of highly dynamic and possibly unprotected environments and an untrusted supply chain for the IoT devices. In this article, we propose Soter, a Zero-Trust based authentication system for the IoMT. Soter incorporates trust negotiation mechanisms within the Zero Trust framework to enable dynamic trust establishment. When a user or device seeks access to a resource, a trust negotiation process is initiated. During this process, credentials, attributes, and contextual information are exchanged between the requester and the resource owner. Soter defines access rules based on various factors, including user identity, device health, and location. Access is granted or denied based on these conditions.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.00195 [pdf, other]

Dataset Condensation Driven Machine Unlearning

Authors: Junaid Iqbal Khan

Abstract: The current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning. The naive approach to unlearning training data by retraining over the complement of the forget samples is susceptible to computational challenges. These challenges have been effectively addressed through a collection of techniques falling under the umbrella of machine unlearning. However, existing techniques still do not sufficiently handle persistent computational challenges in harmony with the utility and privacy of the unlearned model. We attribute this to the lack of work on improving the computational complexity of approximate unlearning from the perspective of the training dataset. In this paper, we aim to fill this gap by introducing dataset condensation as an essential component of machine unlearning in the context of image classification. To achieve this goal, we propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance between machine unlearning privacy, utility, and efficiency. Furthermore, we present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks. Additionally, we explore a new application of our approach, which involves removing data from a 'condensed model', which can be employed to quickly train any arbitrary model without being influenced by unlearning samples.
The corresponding code is available at https://github.com/algebraicdianuj/DC_U.

Submitted 12 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.03271 [pdf, other]

Analysis and Validation of Image Search Engines in Histopathology

Authors: Isaiah Lahr, Saghir Alfasly, Peyman Nejat, Jibran Khan, Luke Kottom, Vaishnavi Kumbhar, Areej Alsaafin, Abubakr Shafique, Sobhan Hemati, Ghazal Alabtah, Nneka Comfere, Dennis Murphee, Aaron Mangold, Saba Yasir, Chady Meroueh, Lisa Boardman, Vijay H. Shah, Joaquin J. Garcia, H. R. Tizhoosh

Abstract: Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSI to WSI can serve as the critical method for patient matching. In this paper, we report extensive analysis and validation of four search methods: bag of visual words (BoVW), Yottixel, SISH, RetCCL, and some of their potential variants. We analyze their algorithms and structures and assess their performance. For this evaluation, we utilized four internal datasets (1269 patients) and three public datasets (1207 patients), totaling more than 200,000 patches from 38 different classes/subtypes across five primary sites. Certain search engines, for example, BoVW, exhibit notable efficiency and speed but suffer from low accuracy. Conversely, search engines like Yottixel demonstrate efficiency and speed, providing moderately accurate results. Recent proposals, including SISH, display inefficiency and yield inconsistent outcomes, while alternatives like RetCCL prove inadequate in both accuracy and efficiency. Further research is imperative to address the dual aspects of accuracy and minimal storage requirements in histopathological image search.

Submitted 8 June, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2401.03139 [pdf]

doi 10.2139/ssrn.4861487

A Molecular Dynamics Study of Mechanical Properties of Vertically Stacked Silicene/MoS2 van der Waals Heterostructure

Authors: Bishwajit Kar, Plabon Paul, Md Arshadur Rahman, Mohammad Jane Alam Khan

Abstract: Silicene is an intriguing silicon allotrope with a honeycomb lattice structure similar to graphene with slightly buckled geometry. Molybdenum disulfide (MoS2), on the other hand, is a significant 2D transition metal dichalcogenide that has demonstrated promise in a variety of applications. Van der Waals heterostructures, which are created by stacking distinct 2D crystals on top of each other, are becoming increasingly important due to their unique optoelectronic and electromechanical properties. Using molecular dynamics simulations, the mechanical characteristics of vertically stacked Silicene/MoS2 van der Waals heterostructures are examined in this study. The response and structural stability of the heterostructures at various loading orientations and temperatures are given particular attention. The research findings highlight that the fracture strength of the Silicene/MoS2 heterostructure decreases by 40% in both armchair and zigzag orientations when the temperature is raised from 100 K to 600 K. Furthermore, a linear decrease in Young's modulus is observed as temperature rises. It is noteworthy that the Rule of Mixture (ROM) predictions for Young's moduli are observed to be marginally lower than the simulation results. The analyses reveal that the silicene layer fractures first under both loading directions, with crack propagation at ±60° in the armchair direction and predominantly perpendicular in the zigzag direction, followed by subsequent MoS2 layer failure.
The study also shows that the MoS2 layer largely determines the elastic properties of the heterostructure, whereas the silicene layer primarily dictates the failure of the heterostructure. These findings offer an in-depth understanding of the mechanical properties of Silicene/MoS2 heterostructures, with significant implications for their use in cutting-edge nanoelectronics and nanomechanical systems.

Submitted 6 January, 2024; originally announced January 2024.

Comments: The 14th International Conference on Mechanical Engineering (ICME2023)

arXiv:2311.09902 [pdf, other]

Selection of Distinct Morphologies to Divide & Conquer Gigapixel Pathology Images

Authors: Abubakr Shafique, Saghir Alfasly, Areej Alsaafin, Peyman Nejat, Jibran A. Khan, H. R. Tizhoosh

Abstract: Whole slide images (WSIs) are massive digital pathology files illustrating intricate tissue structures. Selecting a small, representative subset of patches from each WSI is essential yet challenging. Therefore, following the "Divide & Conquer" approach becomes essential to facilitate WSI analysis including the classification and the WSI matching in computational pathology. To this end, we propose a novel method termed "Selection of Distinct Morphologies" (SDM) to choose a subset of WSI patches. The aim is to encompass all inherent morphological variations within a given WSI while simultaneously minimizing the number of selected patches to represent these variations, ensuring a compact yet comprehensive set of patches. This systematically curated patch set forms what we term a "montage". We assess the representativeness of the SDM montage across various public and private histopathology datasets. This is conducted by using the leave-one-out WSI search and matching evaluation method, comparing it with the state-of-the-art Yottixel's mosaic. SDM demonstrates remarkable efficacy across all datasets during its evaluation. Furthermore, SDM eliminates the necessity for empirical parameterization, a crucial aspect of Yottixel's mosaic, by inherently optimizing the selection process to capture the distinct morphological features within the WSI.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.08359 [pdf, other]

Rotation-Agnostic Image Representation Learning for Digital Pathology

Authors: Saghir Alfasly, Abubakr Shafique, Peyman Nejat, Jibran Khan, Areej Alsaafin, Ghazal Alabtah, H. R. Tizhoosh

Abstract: This paper addresses complex challenges in histopathological image analysis through three key contributions. Firstly, it introduces a fast patch selection method, FPS, for whole-slide image (WSI) analysis, significantly reducing computational cost while maintaining accuracy. Secondly, it presents PathDino, a lightweight histopathology feature extractor with a minimal configuration of five Transformer blocks and only 9 million parameters, markedly fewer than alternatives. Thirdly, it introduces a rotation-agnostic representation learning paradigm using self-supervised learning, effectively mitigating overfitting. We also show that our compact model outperforms existing state-of-the-art histopathology-specific vision transformers on 12 diverse datasets, including both internal datasets spanning four sites (breast, liver, skin, and colorectal) and seven public datasets (PANDA, CAMELYON16, BRACS, DigestPath, Kather, PanNuke, and WSSS4LUAD). Notably, even with a training dataset of 6 million histopathology patches from The Cancer Genome Atlas (TCGA), our approach demonstrates an average 8.5% improvement in patch-level majority vote performance. These contributions provide a robust framework for enhancing image analysis in digital pathology, rigorously validated through extensive evaluation. Project Page: https://kimialabmayo.github.io/PathDino-Page/

Submitted 12 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: CVPR 2024 - 23 pages, 10 figures, and 18 tables

arXiv:2311.01669 [pdf]

Motor vehicles accidents and teenage drivers: A statistical analysis of their age and injuries

Authors: Debo Brata Paul Argha, Md Javed Imtiaze Khan

Abstract: Motorcycle accidents are a prevalent problem in Texas, resulting in hundreds of injuries and deaths each year. Motorcycles provide the driver with little physical protection during accidents compared to cars and other vehicles, so when there is a collision involving a motorcycle, the motorcyclist is likely to be injured. While there are numerous reasons for motorcycle accidents, most are caused by negligence and could have been avoided. Because of the increasing popularity of motorcycles and scooters in Texas, coupled with an increase in the number of motorcycle accidents, the Texas Department of Transportation (TxDOT) has amped up its efforts to improve motorcycle safety. The data show that teenage drivers are the most vulnerable to motorcycle accidents. In this report, we estimate the probability of injury for young motorcycle drivers and passengers under different conditions and predict how the injury rate for this group will change in the upcoming years.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 10 pages

arXiv:2309.11510 [pdf, other]

When is a Foundation Model a Foundation Model

Authors: Saghir Alfasly, Peyman Nejat, Sobhan Hemati, Jibran Khan, Isaiah Lahr, Areej Alsaafin, Abubakr Shafique, Nneka Comfere, Dennis Murphree, Chady Meroueh, Saba Yasir, Aaron Mangold, Lisa Boardman, Vijay Shah, Joaquin J. Garcia, H. R. Tizhoosh

Abstract: Recently, several studies have reported on the fine-tuning of foundation models for image-text modeling in the field of medicine, utilizing images from online data sources such as Twitter and PubMed. Foundation models are large, deep artificial neural networks capable of learning the context of a specific domain through training on exceptionally extensive datasets. Through validation, we have observed that the representations generated by such models exhibit inferior performance in retrieval tasks within digital pathology when compared to those generated by significantly smaller, conventional deep networks.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.07284 [pdf]

Toward Lossless Homomorphic Encryption for Scientific Computation

Authors: Muhammad Jahanzeb Khan, Bo Fang, Dongfang Zhao

Abstract: This paper presents a comprehensive investigation into encrypted computations using the CKKS (Cheon-Kim-Kim-Song) scheme, with a focus on multi-dimensional vector operations and real-world applications. Through two meticulously designed experiments, the study explores the potential of the CKKS scheme in Super Computing and its implications for data privacy and computational efficiency. The first experiment reveals the promising applicability of CKKS to matrix multiplication, indicating marginal differences in Euclidean distance and near-to-zero mean square error across various matrix sizes. The second experiment, applied to a wildfire dataset, illustrates the feasibility of using encrypted machine learning models without significant loss in accuracy. The insights gleaned from the research set a robust foundation for future innovations, including the potential for GPU acceleration in CKKS computations within TenSEAL. Challenges such as noise budget computation, accuracy loss in multiplication, and the distinct characteristics of arithmetic operations in the context of CKKS are also discussed. The paper serves as a vital step towards understanding the complexities and potentials of encrypted computations, with broad implications for secure data processing and privacy preservation in various scientific domains.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.02795 [pdf]

ZePoP: A Distributed Leader Election Protocol using the Delay-based Closeness Centrality for Peer-to-Peer Applications

Authors: Md Amjad Hossain, Javed I. Khan

Abstract: This paper presents ZePoP, a leader election protocol for distributed systems, optimizing a delay-based closeness centrality. We design the protocol specifically for Peer-to-Peer (P2P) applications, where the leader peer (node) is responsible for collecting, processing, and redistributing data or control signals satisfying some timing constraints. The protocol elects an optimal leader node in the dynamically changing network and constructs a Data Collection and Distribution Tree (DCDT) rooted at the leader node. The elected optimal leader is closest to all nodes in the system compared to other nodes. We validate the proposed protocol through theoretical proofs as well as experimental results.

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2305.15551 [pdf, other]

doi 10.32473/flairs.36.133315

Malicious or Benign? Towards Effective Content Moderation for Children's Videos

Authors: Syed Hammad Ahmed, Muhammad Junaid Khan, H. M. Umer Qaisar, Gita Sukthankar

Abstract: Online video platforms receive hundreds of hours of uploads every minute, making manual content moderation impossible. Unfortunately, the most vulnerable consumers of malicious video content are children from ages 1-5 whose attention is easily captured by bursts of color and sound. Scammers attempting to monetize their content may craft malicious children's videos that are superficially similar to educational videos, but include scary and disgusting characters, violent motions, loud music, and disturbing noises. Prominent video hosting platforms like YouTube have taken measures to mitigate malicious content on their platform, but these videos often go undetected by current content moderation tools that are focused on removing pornographic or copyrighted content. This paper introduces our toolkit Malicious or Benign for promoting research on automated content moderation of children's videos. We present 1) a customizable annotation tool for videos, 2) a new dataset with difficult-to-detect test cases of malicious content, and 3) a benchmark suite of state-of-the-art video classification models.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures, The 36th International FLAIRS Conference

Journal ref: The International FLAIRS Conference Proceedings. 36, 1 (May 2023)

arXiv:2305.01315 [pdf, other]

Insights into Software Development Approaches: Mining Q&A Repositories

Authors: Arif Ali Khan, Javed Ali Khan, Muhammad Azeem Akbar, Peng Zhou, Mahdi Fahmideh

Abstract: Context: Software practitioners adopt approaches like DevOps, Scrum, and Waterfall for high-quality software development. However, limited research has been conducted on exploring software development approaches concerning practitioners' discussions on Q&A forums. Objective: We conducted an empirical study to analyze developers' discussions on Q&A forums to gain insights into software development approaches in practice. Method: We analyzed 13,903 developers' posts across Stack Overflow (SO), Software Engineering Stack Exchange (SESE), and Project Management Stack Exchange (PMSE) forums. A mixed method approach, consisting of the topic modeling technique (i.e., Latent Dirichlet Allocation (LDA)) and qualitative analysis, is used to identify frequently discussed topics of software development approaches, trends (popular, difficult topics), and the challenges faced by practitioners in adopting different software development approaches. Findings: We identified 15 frequently mentioned software development approaches topics on Q&A sites and observed an increase in trends for the top-3 most difficult topics requiring more attention. Finally, our study identified 49 challenges faced by practitioners while deploying various software development approaches, and we subsequently created a thematic map to represent these findings. Conclusions: The study findings serve as a useful resource for practitioners to overcome challenges, stay informed about current trends, and ultimately improve the quality of software products they develop.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2303.14542 [pdf, other]

Combining Contexts from Multiple Sources for Documentation-Specific Code Example Generation

Authors: Junaed Younus Khan, Gias Uddin

Abstract: A code example is a crucial part of good documentation. It helps developers understand the documentation easily and use the corresponding code unit (e.g., method) properly. However, much official documentation still lacks (good) code examples, which is one of the common documentation issues found by several studies. Hence, in this paper, we consider automatic code example generation for documentation, a direction less explored by existing research. We employ Codex, a GPT-3 based model pre-trained on both natural and programming languages, to generate code examples from source code and documentation given as input. Our preliminary investigation on 40 scikit-learn methods reveals that this approach is able to generate good code examples: 72.5% of the code examples executed without error (passability) and 82.5% properly dealt with the target method and documentation (relevance). We also find that incorporating error logs (produced by the compiler while executing a failed code example) in the input further improves the passability from 72.5% to 87.5%. Thus, our investigation sets the base of documentation-specific code example generation and warrants in-depth future studies.

Submitted 25 March, 2023; originally announced March 2023.

Comments: Accepted in 30th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2023) - ERA

arXiv:2303.08371 [pdf, other]

Comparative Evaluation of Data Decoupling Techniques for Federated Machine Learning with Database as a Service

Authors: Muhammad Jahanzeb Khan, Rui Hu, Mohammad Sadoghi, Dongfang Zhao

Abstract: Federated Learning (FL) is a machine learning approach that allows multiple clients to collaboratively learn a shared model without sharing raw data. However, current FL systems provide an all-in-one solution, which can hinder the wide adoption of FL in certain domains such as scientific applications. To overcome this limitation, this paper proposes a decoupling approach that enables clients to customize FL applications with specific data subsystems. To evaluate this approach, the authors develop a framework called Data-Decoupling Federated Learning (DDFL) and compare it with state-of-the-art FL systems that tightly couple data management and computation. Extensive experiments on various datasets and data management subsystems show that DDFL achieves comparable or better performance in terms of training time, inference accuracy, and database query time. Moreover, DDFL provides clients with more options to tune their FL applications regarding data-related metrics. The authors also provide a detailed qualitative analysis of DDFL when integrated with mainstream database systems.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 14 pages, 15 figures, 3 tables

ACM Class: F.2.2, I.2.7

arXiv:2303.08362 [pdf]

Transfer Learning Based Diagnosis and Analysis of Lung Sound Aberrations

Authors: Hafsa Gulzar, Jiyun Li, Arslan Manzoor, Sadaf Rehmat, Usman Amjad, Hadiqa Jalil Khan

Abstract: With the development of computer systems that can collect and analyze enormous volumes of data, the medical profession is establishing several non-invasive tools. This work attempts to develop a non-invasive technique for identifying respiratory sounds acquired by a stethoscope and voice recording software via machine learning techniques. This study suggests a trained and proven CNN-based approach for categorizing respiratory sounds. A visual representation of each audio sample is constructed, allowing resource identification for classification using methods like those used to effectively describe visuals. We used a technique called Mel Frequency Cepstral Coefficients (MFCCs). Here, features are retrieved and categorized via VGG16 (transfer learning) and prediction is accomplished using 5-fold cross-validation. Employing various data splitting techniques on the Respiratory Sound Database, the approach obtained cutting-edge results, including an accuracy of 95%, precision of 88%, recall score of 86%, and F1 score of 81%. The ICBHI dataset is used to train and test the model.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 12 pages, 9 figures

arXiv:2303.01197 [pdf, other]

Document Provenance and Authentication through Authorship Classification

Authors: Muhammad Tayyab Zamir, Muhammad Asif Ayub, Jebran Khan, Muhammad Jawad Ikram, Nasir Ahmad, Kashif Ahmad

Abstract: Style analysis, which is a relatively less explored topic, enables several interesting applications. For instance, it allows authors to adjust their writing style to produce a more coherent document in collaboration. Similarly, style analysis can also be used for document provenance and authentication as a primary step. In this paper, we propose an ensemble-based text-processing framework for the classification of single and multi-authored documents, which is one of the key tasks in style analysis. The proposed framework incorporates several state-of-the-art text classification algorithms including classical Machine Learning (ML) algorithms, transformers, and deep learning algorithms both individually and in merit-based late fusion. For the merit-based late fusion, we employed several weight optimization and selection methods to assign merit-based weights to the individual text classification algorithms. We also analyze the impact of the characters on the task that are usually excluded in NLP applications during pre-processing by conducting experiments on both clean and un-clean data. The proposed framework is evaluated on a large-scale benchmark dataset, significantly improving performance over the existing solutions.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 7 pages; 3 tables; 1 figure

Journal ref: IEEE ICAISC 2023

arXiv:2301.00321 [pdf, other]

Floods Relevancy and Identification of Location from Twitter Posts using NLP Techniques

Authors: Muhammad Suleman, Muhammad Asif, Tayyab Zamir, Ayaz Mehmood, Jebran Khan, Nasir Ahmad, Kashif Ahmad

Abstract: This paper presents our solutions for the MediaEval 2022 task on DisasterMM. The task is composed of two subtasks, namely (i) Relevance Classification of Twitter Posts (RCTP), and (ii) Location Extraction from Twitter Texts (LETT). The RCTP subtask aims at differentiating flood-related and non-relevant social posts while LETT is a Named Entity Recognition (NER) task and aims at the extraction of location information from the text. For RCTP, we proposed four different solutions based on BERT, RoBERTa, Distil BERT, and ALBERT obtaining an F1-score of 0.7934, 0.7970, 0.7613, and 0.7924, respectively. For LETT, we used three models namely BERT, RoBERTa, and Distil BERTA obtaining an F1-score of 0.6256, 0.6744, and 0.6723, respectively.

Submitted 31 December, 2022; originally announced January 2023.

Comments: 5 pages, 1 figure, and 4 tables

arXiv:2211.11916 [pdf, other]

Preprint: Open Source Compiling for V1Model RMT Switch: Making Data Center Networking Innovation Accessible

Authors: Debobroto Das Robin, Javed I. Khan

Abstract: Very few of the innovations in deep networking have seen data center scale implementation, because the data center network's extreme-scale performance requires hardware implementation, which is accessible only to a few. However, the emergence of reconfigurable match-action table (RMT) paradigm-based switches has finally opened up the development life cycle of data plane devices. The P4 language is the dominant language choice for programming these devices. Now, network operators can implement the desired feature over white box RMT switches. The process involves an innovator writing new algorithms in the P4 language and getting them compiled for the target hardware. However, there is still a roadblock. After designing an algorithm, the P4 program's compilation technology is not fully open-source. Thus, it is very difficult for an average researcher to get deep insight into the performance of his/her innovation when executed at the silicon level. There is no open-source compiler backend available for this purpose. Proprietary compiler backends provided by different hardware vendors are available for this purpose. However, they are closed-source and do not provide access to the internal mapping mechanisms. This inhibits experimenting with new mapping algorithms and innovative instruction sets for reconfigurable match-action table architecture. This paper describes our work toward an open-source compiler backend for compiling P4_16 targeted for the V1Model architecture-based programmable switches.

Submitted 21 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.12892

arXiv:2209.02235 [pdf, other]

Automatic Code Documentation Generation Using GPT-3

Authors: Junaed Younus Khan, Gias Uddin

Abstract: Source code documentation is an important artifact for efficient software development. Code documentation could greatly benefit from automation since manual documentation is often laborious and resource- and time-intensive. In this paper, we employed Codex for automatic code documentation creation. Codex is a GPT-3 based model pre-trained on both natural and programming languages. We find that Codex outperforms existing techniques even with basic settings like one-shot learning (i.e., providing only one example for training). Codex achieves an overall BLEU score of 20.6 for six different programming languages (11.2% improvement over earlier state-of-the-art techniques). Thus, Codex shows promise and warrants in-depth future studies for automatic code documentation generation to support diverse development tasks.

Submitted 6 September, 2022; originally announced September 2022.

Comments: Accepted in IEEE/ACM International Conference on Automated Software Engineering (ASE 2022) - NIER

arXiv:2208.12892 [pdf, other]

An Open-Source P4_16 Compiler Backend for Reconfigurable Match-Action Table Switches

Authors: Debobroto Das Robin, Javed I. Khan

Abstract: The P4 language has become the dominant choice for programming the reconfigurable match-action table based programmable switches. V1Model architecture is the most widely available realization of this paradigm. The open-source compiler frontend developed by the P4 consortium can execute syntax analysis and derive a hardware-independent representation of a program written using the latest version of P4 (also known as P4_16). A compiler backend is required to map this intermediate representation to the hardware resources of a V1Model switch. However, there is no open-source compiler backend available to check the realizability of a P4_16 program over a V1Model switch. Proprietary tools provided by different hardware vendors are available for this purpose. However, they are closed source and do not provide access to the internal mapping mechanisms. This inhibits experimenting with new mapping algorithms and innovative instruction sets for reconfigurable match-action table architecture. Moreover, the proprietary compiler backends are costly and come with various non-disclosure agreements. These factors pose serious challenges to programmable switch-related research. In this work, we present an open-source P4_16 compiler backend for the V1Model architecture-based programmable switches. It uses heuristic-based mapping algorithms to map a P4_16 program over the hardware resources of a V1Model switch. It allows developers to rapidly prototype different mapping algorithms. It also gives various resource usage statistics of a P4_16 program, enabling comparison among multiple P4_16 schemes.

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.07298 [pdf, other]

Transformer-based Value Function Decomposition for Cooperative Multi-agent Reinforcement Learning in StarCraft

Authors: Muhammad Junaid Khan, Syed Hammad Ahmed, Gita Sukthankar

Abstract: The StarCraft II Multi-Agent Challenge (SMAC) was created to be a challenging benchmark problem for cooperative multi-agent reinforcement learning (MARL). SMAC focuses exclusively on the problem of StarCraft micromanagement and assumes that each unit is controlled individually by a learning agent that acts independently and only possesses local information; centralized training is assumed to occur with decentralized execution (CTDE). To perform well in SMAC, MARL algorithms must handle the dual problems of multi-agent credit assignment and joint action evaluation. This paper introduces a new architecture TransMix, a transformer-based joint action-value mixing network which we show to be efficient and scalable as compared to the other state-of-the-art cooperative MARL solutions. TransMix leverages the ability of transformers to learn a richer mixing function for combining the agents' individual value functions. It achieves comparable performance to previous work on easy SMAC scenarios and outperforms other techniques on hard scenarios, as well as scenarios that are corrupted with Gaussian noise to simulate fog of war.

Submitted 15 August, 2022; originally announced August 2022.

Comments: AIIDE 2022

arXiv:2207.09378 [pdf, other]

P4TE: PISA Switch Based Traffic Engineering in Fat-Tree Data Center Networks

Authors: Debobroto Das Robin, Javed I. Khan

Abstract: This work presents P4TE, an in-band traffic monitoring, load-aware packet forwarding, and flow rate controlling mechanism for traffic engineering in fat-tree topology-based data center networks using PISA switches. It achieves sub-RTT reaction time to changes in network conditions, improved flow completion time, and balanced link utilization. Unlike the classical probe-based monitoring approach, P4TE uses an in-band monitoring approach to identify traffic events in the data plane. Based on these events, it re-adjusts the priorities of the paths. It uses a heuristic-based load-aware forwarding path selection mechanism to respond to changing network conditions and controls the flow rate by sending feedback to the end hosts. It is implementable on emerging v1model.p4 architecture-based programmable switches and capable of maintaining line-rate performance. Our evaluation shows that P4TE uses a small amount of resources in the PISA pipeline and achieves a better flow completion time than ECMP and HULA.

Submitted 19 July, 2022; originally announced July 2022.

Journal ref: Elsevier Computer Networks 2022

arXiv:2204.13574 [pdf, other]

An Explainable Regression Framework for Predicting Remaining Useful Life of Machines

Authors: Talhat Khan, Kashif Ahmad, Jebran Khan, Imran Khan, Nasir Ahmad

Abstract: Prediction of a machine's Remaining Useful Life (RUL) is one of the key tasks in predictive maintenance. The task is treated as a regression problem where Machine Learning (ML) algorithms are used to predict the RUL of machine components. These ML algorithms are generally used as a black box with a total focus on the performance without identifying the potential causes behind the algorithms' decisions and their working mechanism. We believe that performance (in terms of Mean Squared Error (MSE), etc.) alone is not enough to build the trust of the stakeholders in ML predictions; rather, more insight into the causes behind the predictions is needed. To this aim, in this paper, we explore the potential of Explainable AI (XAI) techniques by proposing an explainable regression framework for the prediction of machines' RUL. We also evaluate several ML algorithms including classical and Neural Networks (NNs) based solutions for the task. For the explanations, we rely on two model-agnostic XAI methods, namely Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). We believe this work will provide a baseline for future research in the domain.

Submitted 30 April, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: 6 pages, 3 figures

arXiv:2202.04462 [pdf]

Social Media as an Instant Source of Feedback on Water Quality

Authors: Khubaib Ahmad, Muhammad Asif Ayub, Kashif Ahmad, Jebran Khan, Nasir Ahmad, Ala Al-Fuqaha

Abstract: This paper focuses on an important environmental challenge, namely water quality, by analyzing the potential of social media as an immediate source of feedback. The main goal of the work is to automatically analyze and retrieve social media posts relevant to water quality with particular attention to posts describing different aspects of water quality, such as water color, smell, taste, and related illnesses. To this aim, we propose a novel framework incorporating different preprocessing, data augmentation, and classification techniques. In total, three different Neural Network (NN) architectures, namely (i) Bidirectional Encoder Representations from Transformers (BERT), (ii) Robustly Optimized BERT Pre-training Approach (XLM-RoBERTa), and (iii) a custom Long Short-Term Memory (LSTM) model, are employed in a merit-based fusion scheme. For merit-based weight assignment to the models, several optimization and search techniques are compared, including Particle Swarm Optimization (PSO), a Genetic Algorithm (GA), Brute Force (BF), Nelder-Mead, and Powell's optimization methods. We also provide an evaluation of the individual models, where the highest F1-score of 0.81 is obtained with the BERT model. In merit-based fusion, overall better results are obtained with BF, achieving an F1-score of 0.852. We also provide a comparison against existing methods, where a significant improvement for our proposed solutions is obtained. We believe such rigorous analysis of this relatively new topic will provide a baseline for future research.

Submitted 27 July, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: 10 pages, 2 figures, 8 tables

arXiv:2201.11626 [pdf, ps, other]

doi 10.1002/aenm.202102363

Chemical design rules for non-fullerene acceptors in organic solar cells

Authors: A. Markina, K. -H. Lin, W. Liu, C. Poelking, Y. Firdaus, D. R. Villalva, J. I. Khan, S. H. K. Paleti, G. T. Harrison, J. Gorenflot, W. Zhang, S. De Wolf, I. McCulloch, T. D. Anthopoulos, D. Baran, F. Laquai, D. Andrienko

Abstract: Efficiencies of organic solar cells have practically doubled since the development of non-fullerene acceptors (NFAs). However, generic chemical design rules for donor-NFA combinations are still needed. Such rules are proposed by analyzing inhomogeneous electrostatic fields at the donor-acceptor interface. It is shown that an acceptor-donor-acceptor molecular architecture, and molecular alignment parallel to the interface, results in energy level bending that destabilizes the charge transfer state, thus promoting its dissociation into free charges. By analyzing a series of PCE10:NFA solar cells, with NFAs including Y6, IEICO, and ITIC, as well as their halogenated derivatives, it is suggested that the molecular quadrupole moment of ca 75 Debye A balances the losses in the open circuit voltage and gains in charge generation efficiency.

Submitted 27 January, 2022; originally announced January 2022.

Journal ref: Adv. Energy Mater., 2102363, 2-11, 2021

arXiv:2201.11327 [pdf, other]

Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?

Authors: Chengran Yang, Bowen Xu, Junaed Younus Khan, Gias Uddin, Donggyun Han, Zhou Yang, David Lo

Abstract: APIs (Application Programming Interfaces) are reusable software libraries and are building blocks for modern rapid software development. Previous research shows that programmers frequently share and search for reviews of APIs on mainstream software question and answer (Q&A) platforms like Stack Overflow, which motivates researchers to design tasks and approaches for processing API reviews automatically. Among these tasks, classifying API reviews into different aspects (e.g., performance or security), which is called aspect-based API review classification, is of great importance. The current state-of-the-art (SOTA) solution to this task is based on a traditional machine learning algorithm. Inspired by the great success achieved by pre-trained models on many software engineering tasks, this study fine-tunes six pre-trained models for the aspect-based API review classification task and compares them with the current SOTA solution on an API review benchmark collected by Uddin et al. The investigated models include four models (BERT, RoBERTa, ALBERT and XLNet) that are pre-trained on natural languages, BERTOverflow that is pre-trained on a text corpus extracted from posts on Stack Overflow, and CosSensBERT that is designed for handling imbalanced data. The results show that all six fine-tuned models outperform the traditional machine learning-based tool. More specifically, the improvement in F1-score ranges from 21.0% to 30.2%. We also find that BERTOverflow, a model pre-trained on the corpus from Stack Overflow, does not show better performance than BERT. The results also suggest that CosSensBERT does not exhibit better performance than BERT in terms of F1, but it is still worth considering as it achieves better performance on MCC and AUC.

Submitted 27 January, 2022; originally announced January 2022.

Comments: Accepted by Research Track in SANER 2022

arXiv:2201.04241 [pdf, other]

Automatic Detection and Analysis of Technical Debts in Peer-Review Documentation of R Packages

Authors: Junaed Younus Khan, Gias Uddin

Abstract: Technical debt (TD) is a metaphor for code-related problems that arise as a result of prioritizing speedy delivery over perfect code. Given that the reduction of TDs can have a long-term positive impact in the software engineering life-cycle (SDLC), TDs are studied extensively in the literature. However, very little of the existing research has focused on the technical debts of the R programming language despite its popularity and usage. Recent research by Codabux et al. [21] finds that R packages can have 10 diverse TD types by analyzing peer-review documentation. However, the findings are based on the manual analysis of a small sample of R package review comments. In this paper, we develop a suite of Machine Learning (ML) classifiers to detect the 10 TDs automatically. The best performing classifier is based on the deep ML model BERT, which achieves F1-scores of 0.71 - 0.91. We then apply the trained BERT models to all available peer-review issue comments from two platforms, rOpenSci and BioConductor (13.5K review comments coming from a total of 1297 R packages). We conduct an empirical study on the prevalence and evolution of the 10 TDs in the two R platforms. We discovered that documentation debt is the most prevalent among all types of TD, and it is also expanding rapidly. We also find that R packages of the generic platform (i.e. rOpenSci) are more prone to TD compared to the domain-specific platform (i.e. BioConductor). Our empirical study findings can guide future improvement opportunities in R package documentation. Our ML models can be used to automatically monitor the prevalence and evolution of TDs in R package documentation.

Submitted 11 January, 2022; originally announced January 2022.

Comments: Accepted in SANER 2022

arXiv:2112.07913 [pdf, other]

A Comparative Analysis of Machine Learning Approaches for Automated Face Mask Detection During COVID-19

Authors: Junaed Younus Khan, Md Abdullah Al Alamin

Abstract: The World Health Organization (WHO) has recommended wearing face masks as one of the most effective measures to prevent COVID-19 transmission. In many countries, it is now mandatory to wear face masks, especially in public places. Since manual monitoring of face masks is often infeasible in the middle of a crowd, automatic detection can be beneficial. To facilitate that, we explored a number of deep learning models (i.e., VGG1, VGG19, ResNet50) for face-mask detection and evaluated them on two benchmark datasets. We also evaluated transfer learning (i.e., VGG19, ResNet50 pre-trained on ImageNet) in this context. We find that while the performances of all the models are quite good, transfer learning models achieve the best performance. Transfer learning improves the performance by 0.10%–0.40% with 30% less training time. Our experiment also shows that these high-performing models are not quite robust for real-world cases where the test dataset comes from a different distribution. Without any fine-tuning, the performance of these models drops by 47% in cross-domain settings.

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2112.06245 [pdf]

doi 10.1002/aenm.202203464

Rationalizing the influence of tunable energy levels on quantum efficiency to design optimal non-fullerene acceptor-based ternary organic solar cells

Authors: Safakath Karuthedath, Sri H. K. Paleti, Anirudh Sharma, Hang Yin, Catherine S. P. De Castro, Si Chen, Han Xi, Nisreen Alshehri, Nicolas Ramos, Jafar I. Khan, Jaime Martin, Gang Li, Frédéric Laquai, Derya Baran, Julien Gorenflot

Abstract: Non-fullerene acceptor (NFA)-based ternary bulk heterojunction solar cells (TSC) are the most efficient organic solar cells (OSCs) today due to their broader absorption and quantum efficiencies (QE) often surpassing those of corresponding binary blends. We study how the energetics driving charge transfer at the electron donor:electron acceptor (D/A) interfaces impact the QE in blends of PBDB-T-2F donor with several pairs of lower bandgap NFAs. As in binary blends, the ionization energy offset between donor and acceptor (ΔIE) controls the QE and maximizes for ΔIE > 0.5 eV. However, ΔIE is not controlled by the individual NFAs' IEs but by their average, weighted by their blending ratio. Using this property, we improved the QE of a PBDB-T-2F:IEICO binary blend that had an insufficient ΔIE for charge generation by adding a deep IE third component: IT-4F. Combining two NFAs makes it possible to optimize the D/A energy alignment and the cells' QE without molecular engineering.

Submitted 12 February, 2023; v1 submitted 12 December, 2021; originally announced December 2021.

Comments: S Karuthedath and S H K Paleti contributed equally. MS: 35 pages, 9 figures. SI: 21 pages, 23 figures - updates: added a model scheme as Fig1, updated T1 EQE and blends PL in Fig 2 (former 1). Added Figure (5): charge generation upon acceptor excitation. Corrected ΔIEs in fig7 and 9 (a few tens of meV off). Added more blends to show generality (2 groups, 10 blends compositions)

Journal ref: Adv. Energy Mater. 2023, 2203464

arXiv:2110.05343 [pdf, other]

Leveraging Transformers for StarCraft Macromanagement Prediction

Authors: Muhammad Junaid Khan, Shah Hassan, Gita Sukthankar

Abstract: Inspired by the recent success of transformers in natural language processing and computer vision applications, we introduce a transformer-based neural architecture for two key StarCraft II (SC2) macromanagement tasks: global state and build order prediction. Unlike recurrent neural networks which suffer from a recency bias, transformers are able to capture patterns across very long time horizons, making them well suited for full game analysis. Our model utilizes the MSC (Macromanagement in StarCraft II) dataset and improves on the top performing gated recurrent unit (GRU) architecture in predicting global state and build order as measured by mean accuracy over multiple time horizons. We present ablation studies on our proposed architecture that support our design decisions. One key advantage of transformers is their ability to generalize well, and we demonstrate that our model achieves an even better accuracy when used in a transfer learning setting in which models trained on games with one racial matchup (e.g., Terran vs. Protoss) are transferred to a different one. We believe that transformers' ability to model long games, potential for parallelization, and generalization performance make them an excellent choice for StarCraft agents.

Submitted 11 October, 2021; originally announced October 2021.

Comments: 6 pages, 2 figures, IEEE ICMLA 2021

arXiv:2108.05085 [pdf, other]

Researcher or Crowd Member? Why not both! The Open Research Knowledge Graph for Applying and Communicating CrowdRE Research

Authors: Oliver Karras, Eduard C. Groen, Javed Ali Khan, Sören Auer

Abstract: In recent decades, there has been a major shift towards improved digital access to scholarly works. However, even now that these works are available in digital form, they remain document-based, making it difficult to communicate the knowledge they contain. The next logical step is to extend these works with more flexible, fine-grained, semantic, and context-sensitive representations of scholarly knowledge. The Open Research Knowledge Graph (ORKG) is a platform that structures and interlinks scholarly knowledge, relying on crowdsourced contributions from researchers (as a crowd) to acquire, curate, publish, and process this knowledge. In this experience report, we consider the ORKG in the context of Crowd-based Requirements Engineering (CrowdRE) from two perspectives: (1) As CrowdRE researchers, we investigate how the ORKG practically applies CrowdRE techniques to involve scholars in its development to make it align better with their academic work. We determined that the ORKG readily provides social and financial incentives, feedback elicitation channels, and support for context and usage monitoring, but that there is improvement potential regarding automated user feedback analyses and a holistic CrowdRE approach. (2) As crowd members, we explore how the ORKG can be used to communicate scholarly knowledge about CrowdRE research. For this purpose, we curated qualitative and quantitative scholarly knowledge in the ORKG based on papers contained in two previously published systematic literature reviews (SLRs) on CrowdRE. This knowledge can be explored and compared interactively, and with more data than what the SLRs originally contained. Therefore, the ORKG improves access and communication of the scholarly knowledge about CrowdRE research. For both perspectives, we found the ORKG to be a useful multi-tool for CrowdRE research.

Submitted 11 August, 2021; originally announced August 2021.

Comments: Accepted for publication at 2021 IEEE 29th International Requirements Engineering Conference Workshops (REW)

arXiv:2107.05044 [pdf, other]

doi 10.3390/galaxies11020057

The simplest parametrization of equation of state parameter in the scalar field Universe

Authors: Preeti Shrivastava, A. J. Khan, G. K. Goswami, Anil Kumar Yadav, J. K. Singh

Abstract: In this paper, we have investigated a scalar field cosmological model of an accelerating Universe with the simplest parametrization of the equation of state parameter of the scalar field. We used $H(z)$ data, the Pantheon compilation of SN Ia data and BAO data to constrain the model parameters using the $χ^{2}$ minimization technique. We obtain the present values of the Hubble constant $H_{0}$ as $66.2^{+1.42}_{-1.34}$, $70.7^{+0.32}_{-0.31}$ and $67.74^{+1.24}_{-1.04}$ for $H(z)$, $H(z)$ + Pantheon and $H(z)$ + BAO respectively. Also, we have estimated the present age of the Universe in the derived model as $t_{0} = 14.38^{+0.63}_{-0.64}$ for the joint $H(z)$ and Pantheon compilation of SN Ia data, which has only $0.88~σ$ tension with its empirical value obtained by the Planck collaboration \cite{Ade/2016}. Moreover, the present values of the deceleration parameter $q_{0}$ come out to be $-0.55^{+0.031}_{-0.038}$, $-0.61^{+0.030}_{-0.021}$ and $-0.627^{+0.022}_{-0.025}$ by bounding the Universe in the derived model with the $H(z)$, $H(z)$ + Pantheon compilation of SN Ia and $H(z)$ + BAO data sets respectively. We have also performed the state-finder diagnostics to discover the nature of dark energy.

Submitted 11 July, 2021; originally announced July 2021.

Comments: 8 Pages, 12 Figures

Journal ref: Galaxies 11, 57 (2023)

arXiv:2107.01202 [pdf]

Language Identification of Hindi-English tweets using code-mixed BERT

Authors: Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim

Abstract: Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English speaking states. Prior knowledge gained by pre-training contextual embeddings has shown state-of-the-art results for a range of downstream tasks. Recently, models such as BERT have shown that, using a large amount of unlabeled data, pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu code-mixed text for language pre-training and Hindi-English code-mixed text for subsequent word-level language classification. The results show that the representations pre-trained over code-mixed data produce better results than their monolingual counterparts.

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2106.07307 [pdf]

A Recipe for Social Media Analysis

Authors: Shahid Alam, Juvariya Khan

Abstract: The ubiquitous nature of smartphones has significantly increased the use of social media platforms, such as Facebook, Twitter, TikTok, and LinkedIn, among the public, government, and businesses. Facebook generated ~70 billion USD in 2019 in advertisement revenues alone, a ~27% increase from the previous year. Social media has also played a strong role in outbreaks of social protests responsible for political changes in different countries. As we can see from the above examples, social media plays a big role in business intelligence and international politics. In this paper, we present and discuss a high-level functional intelligence model (recipe) of Social Media Analysis (SMA). This model synthesizes the input data and uses operational intelligence to provide actionable recommendations. In addition, it also matches the synthesized function of the experiences and learning gained from the environment. The SMA model presented is independent of the application domain, and can be applied to different domains, such as Education, Healthcare and Government. Finally, we also present some of the challenges faced by SMA and how the SMA model presented in this paper solves them.

Submitted 14 June, 2021; originally announced June 2021.

arXiv:2105.13435 [pdf, other]

Intrusion Detection using Machine Learning Techniques: An Experimental Comparison

Authors: Kathryn-Ann Tait, Jan Sher Khan, Fehaid Alqahtani, Awais Aziz Shah, Fadia Ali Khan, Mujeeb Ur Rehman, Wadii Boulila, Jawad Ahmad

Abstract: Due to an exponential increase in the number of cyber-attacks, the need for improved Intrusion Detection Systems (IDS) is more apparent than ever. In this regard, Machine Learning (ML) techniques are playing a pivotal role in the early classification of attacks in the case of intrusion detection within the system. However, due to the large number of algorithms available, the selection of the right method is a challenging task. To resolve this issue, this paper analyses some of the current state-of-the-art intrusion detection methods and discusses their pros and cons. Further, a review of different ML methods is carried out, with four methods shown to be the most suitable for classifying attacks. Several algorithms are selected and investigated to evaluate the performance of IDS. These IDS classify binary and multiclass attacks in terms of detecting whether the traffic is benign or an attack. The experimental results demonstrate that binary classification has greater consistency in its accuracy results, which ranged from 0.9938 to 0.9977, while multiclass ranges from 0.9294 to 0.9983. However, it has also been observed that multiclass provides the best results, with the k-Nearest Neighbor algorithm giving an accuracy score of 0.9983, while the highest binary classification score is 0.9977 from Random Forest. The experimental results demonstrate that multiclass classification produces better performance in terms of intrusion detection by specifically differentiating between the attacks and allowing a more targeted response to an attack.

Submitted 27 May, 2021; originally announced May 2021.

arXiv:2104.11580 [pdf]

Identifying and Modeling Security Threats for IoMT Edge Network using Markov Chain and Common Vulnerability Scoring System (CVSS)

Authors: Maha Ali Allouzi, Javed I. Khan

Abstract: In this work, we define an attack vector for networks utilizing Internet of Medical Things (IoMT) devices and compute the probability distribution of IoMT security threats based on a Markov chain and the Common Vulnerability Scoring System (CVSS). IoMT is an emerging technology that improves patients' quality of life by permitting personalized e-health services without restrictions on time and site. The IoMT consists of embedded objects, sensors, and actuators that transmit and receive medical data. These medical devices are vulnerable to different types of security threats, and thus, they pose a significant risk to patients' privacy and safety. Because security is a critical factor for successfully merging IoMT into pervasive healthcare systems, there is an urgent need for new security mechanisms to prevent threats on the IoMT edge network. Toward this direction, the first step is defining an attack vector that an attacker or unauthorized user can take advantage of to penetrate and tamper with medical data. In this article, we specify a threat model for the IoMT edge network. We identify any vulnerabilities or weaknesses within the IoMT network that allow unauthorized privileges and threats that can utilize these weaknesses to compromise the IoMT edge network. Finally, we compute the probability distribution of IoMT threats based on the Markov transition probability matrix.

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2104.01898 [pdf]

Catalyst-Free Growth of Atomically-thin Bi2O2Se Nanoribbons for High-performance Electronics and Optoelectronics

Authors: Usman Khan, Lei Tang, Baofu Ding, Yuting Luo, Simin Feng, Wenjun Chen, Muhammad Jahangir Khan, Bilu Liu, Hui-Ming Cheng

Abstract: One-dimensional (1D) materials have attracted significant research interest due to their unique quantum confinement effects and edge-related properties. An atomically thin 1D nanoribbon is particularly interesting because it is a valuable platform with physical limits of both thickness and width. Here, we develop a catalyst-free growth method and achieve the growth of Bi2O2Se nanostructures with tunable dimensionality. Significantly, Bi2O2Se nanoribbons with thickness down to 0.65 nm, corresponding to a monolayer, are successfully grown for the first time. Electrical and optoelectronic measurements show that Bi2O2Se nanoribbons possess decent performance in terms of mobility, on/off ratio, and photoresponsivity, suggesting their promise for devices. This work not only reports a new method for the growth of atomically thin nanoribbons but also provides a platform to study properties and applications of such nanoribbon materials at the thickness limit.

Submitted 5 April, 2021; originally announced April 2021.

Comments: 15 pages, 4 figures

arXiv:2103.15719 [pdf, ps, other]

Characterizations of Matrix Valued Asymmetric Truncated Toeplitz Operators

Authors: Rewayat Khan, Yagoub Ameur, Jamroz Khan

Abstract: Matrix valued asymmetric truncated Toeplitz operators are compressions of multiplication operators acting between two possibly different model spaces. In this paper, we characterize matrix valued asymmetric truncated Toeplitz operators by using compressed shifts.

Submitted 1 May, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: Dr. B. Łanucha waived all claim of copyright in this work

MSC Class: 47B35; 47B32; 30D20

arXiv:2102.08486 [pdf, other]

Automatic Detection of Five API Documentation Smells: Practitioners' Perspectives

Authors: Junaed Younus Khan, Md. Tawkat Islam Khondaker, Gias Uddin, Anindya Iqbal

Abstract: The learning and usage of an API is supported by official documentation. Like source code, API documentation is itself a software product. Several research results show that bad design in API documentation can make the reuse of API features difficult. Indeed, similar to code smells or code antipatterns, poorly designed API documentation can also exhibit 'smells'. Such documentation smells can be described as bad documentation styles that do not necessarily produce incorrect documentation but nevertheless make the documentation difficult to properly understand and use. Recent research on API documentation has focused on finding content inaccuracies in API documentation and complementing API documentation with external resources (e.g., crowd-shared code examples). We are aware of no research that has focused on the automatic detection of API documentation smells. This paper makes two contributions. First, we produce a catalog of five API documentation smells by consulting literature on API documentation presentation problems. We create a benchmark dataset of 1,000 API documentation units by exhaustively and manually validating the presence of the five smells in Java official API reference and instruction documentation. Second, we conduct a survey of 21 professional software developers to validate the catalog. The developers agreed that they frequently encounter all five smells in official API documentation, and 95.2% of them reported that the presence of the documentation smells negatively affects their productivity. The participants wished for tool support to automatically detect and fix the smells in official API documentation. We develop a suite of rule-based, deep and shallow machine learning classifiers to automatically detect the smells. The best performing classifier, BERT, a deep learning model, achieves F1-scores of 0.75 - 0.97.

Submitted 16 February, 2021; originally announced February 2021.

Journal ref: 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)

Showing 1–50 of 82 results for author: Khan, J