
Showing 1–20 of 20 results for author: Rao, K S

Searching in archive cs.
  1. arXiv:2406.13384  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection

    Authors: Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra, Vinod Rathod

    Abstract: Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data char…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2404.12679  [pdf, other]

    cs.CV cs.CR

    MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement

    Authors: Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra

    Abstract: Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-en…

    Submitted 19 April, 2024; originally announced April 2024.

  3. BOXREC: Recommending a Box of Preferred Outfits in Online Shopping

    Authors: Debopriyo Banerjee, Krothapalli Sreenivasa Rao, Shamik Sural, Niloy Ganguly

    Abstract: Over the past few years, automation of outfit composition has gained much attention from the research community. Most of the existing outfit recommendation systems focus on pairwise item compatibility prediction (using visual and text features) to score an outfit combination having several items, followed by recommendation of top-n outfits or a capsule wardrobe having a collection of outfits based…

    Submitted 26 February, 2024; originally announced February 2024.

    Journal ref: ACM Trans. Intell. Syst. Technol. 11, 6, Article 69 (December 2020), pages 69:1-69:28

  4. arXiv:2401.01356  [pdf, other]

    cs.IR

    Efficient Indexing of Meta-Data (Extracted from Educational Videos)

    Authors: Shalika Kumbham, Abhijit Debnath, Krothapalli Sreenivasa Rao

    Abstract: Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, m…

    Submitted 11 December, 2023; originally announced January 2024.

  5. arXiv:2310.12736  [pdf, other]

    cs.CV

    ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swapping

    Authors: Aravinda Reddy PN, K. Sreenivasa Rao, Raghavendra Ramachandra, Pabitra Mitra

    Abstract: We present a novel face swapping method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder decoder structures, embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concate…

    Submitted 19 October, 2023; originally announced October 2023.

  6. arXiv:2202.01078  [pdf, other]

    cs.SD eess.AS

    Melody Extraction from Polyphonic Music by Deep Learning Approaches: A Review

    Authors: Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das

    Abstract: Melody extraction is a vital music information retrieval task among music researchers for its potential applications in education pedagogy and the music industry. Melody extraction is a notoriously challenging task due to the presence of background instruments. Also, often melodic source exhibits similar characteristics to that of the other instruments. The interfering background accompaniment wit…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 72 pages

  7. arXiv:2112.04841  [pdf, other]

    eess.AS cs.MM cs.SD eess.SP

    On The Effect Of Coding Artifacts On Acoustic Scene Classification

    Authors: Nagashree K. S. Rao, Nils Peters

    Abstract: Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and classify audio recordings previously uploaded (or st…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: paper presented at the 2021 Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)

  8. arXiv:2109.04138  [pdf, other]

    cs.CR cs.CV

    Multilingual Audio-Visual Smartphone Dataset And Evaluation

    Authors: Hareesh Mandalapu, Aravinda Reddy P N, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to their usability, and also it will be challenging to spoof because of their multimodal nature. In this work, we present an audio-visual smartphone dataset captured in five different recent smartphones. This new dataset cont…

    Submitted 15 November, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

  9. Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

    Authors: Hareesh Mandalapu, P N Aravinda Reddy, Raghavendra Ramachandra, K Sreenivasa Rao, Pabitra Mitra, S R Mahadeva Prasanna, Christoph Busch

    Abstract: Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and…

    Submitted 12 March, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Journal ref: in IEEE Access, vol. 9, pp. 37431-37455, 2021

  10. arXiv:2011.06455  [pdf]

    cs.GT physics.soc-ph q-bio.PE

    Optimal governance and implementation of vaccination programmes to contain the COVID-19 pandemic

    Authors: Mahendra Piraveenan, Shailendra Sawleshwarkar, Michael Walsh, Iryna Zablotska, Samit Bhattacharyya, Habib Hassan Farooqui, Tarun Bhatnagar, Anup Karan, Manoj Murhekar, Sanjay Zodpey, K. S. Mallikarjuna Rao, Philippa Pattison, Albert Zomaya, Matjaz Perc

    Abstract: Since the recent introduction of several viable vaccines for SARS-CoV-2, vaccination uptake has become the key factor that will determine our success in containing the COVID-19 pandemic. We argue that game theory and social network models should be used to guide decisions pertaining to vaccination programmes for the best possible results. In the months following the introduction of vaccines, their…

    Submitted 9 June, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: 15 pages, 1 figure; published in Royal Society Open Science

    Journal ref: R. Soc. Open Sci. 8, 210429 (2021)

  11. arXiv:2011.04297  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Knowledge Distillation for Singing Voice Detection

    Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

    Abstract: Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods, one based on CNN and the other on RNN, exist in literature that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both these models have a huge number of parameters (1.4M for C…

    Submitted 19 August, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures

  12. arXiv:1909.03974  [pdf, other]

    eess.AS cs.LG cs.SD

    DNN-based cross-lingual voice conversion using Bottleneck Features

    Authors: M Kiran Reddy, K Sreenivasa Rao

    Abstract: Cross-lingual voice conversion (CLVC) is a quite challenging task since the source and target speakers speak different languages. This paper proposes a CLVC framework based on bottleneck features and deep neural network (DNN). In the proposed method, the bottleneck features extracted from a deep auto-encoder (DAE) are used to represent speaker-independent features of speech signals from different…

    Submitted 10 September, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

  13. arXiv:1908.09634  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Multilingual and Multimode Phone Recognition System for Indian Languages

    Authors: Kumud Tripathi, M. Kiran Reddy, K. Sreenivasa Rao

    Abstract: The aim of this paper is to develop a flexible framework capable of automatically recognizing phonetic units present in a speech utterance of any language spoken in any mode. In this study, we considered two modes of speech: conversation, and read modes in four Indian languages, namely, Telugu, Kannada, Odia, and Bengali. The proposed approach consists of two stages: (1) Automatic speech mode clas…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: 33 pages, 5 figures, 6 tables, article

  14. arXiv:1908.08668  [pdf, ps, other]

    eess.AS cs.SD

    VOP Detection for Read and Conversation Speech using CWT Coefficients and Phone Boundaries

    Authors: Kumud Tripathi, K. Sreenivasa Rao

    Abstract: In this paper, we propose a novel approach for accurate detection of the vowel onset points (VOPs). VOP is the instant at which the vowel begins in the speech signal. Precise identification of VOPs is important for various speech applications such as speech segmentation and speech rate modification. The existing methods detect the majority of VOPs within 40 ms deviation, and it may not be appropri…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: 21 pages, 8 figures, 4 tables, article

  15. arXiv:1904.09765  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    hf0: A hybrid pitch extraction method for multimodal voice

    Authors: Pradeep Rengaswamy, Gurunath Reddy M, Krothapalli Sreenivasa Rao

    Abstract: Pitch or fundamental frequency (f0) extraction is a fundamental problem studied extensively for its potential applications in speech and clinical applications. In literature, explicit mode specific (modal speech or singing voice or emotional/ expressive speech or noisy speech) signal processing and deep learning f0 extraction methods that exploit the quasi periodic nature of the signal in time, ha…

    Submitted 22 April, 2019; originally announced April 2019.

    Comments: Pitch Extraction, F0 extraction, harmonic signals, speech, monophonic songs, Convolutional Neural Network, 5 pages, 5 figures

  16. arXiv:1811.09956  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Glottal Closure Instants Detection From Pathological Acoustic Speech Signal Using Deep Learning

    Authors: Gurunath Reddy M, Tanumay Mandal, Krothapalli Sreenivasa Rao

    Abstract: In this paper, we propose a classification based glottal closure instants (GCI) detection from pathological acoustic speech signal, which finds many applications in vocal disorder analysis. Till date, GCI for pathological disorder is extracted from laryngeal (glottal source) signal recorded from Electroglottograph, a dedicated device designed to measure the vocal folds vibration around the larynx.…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/39

  17. arXiv:1807.07710  [pdf, ps, other]

    cs.CR

    Multivariate Public Key Cryptography and Digital Signature

    Authors: Pulugurtha Krishna Subba Rao, Duggirala Meher Krishna, Duggirala Ravi

    Abstract: In this paper, algorithms for multivariate public key cryptography and digital signature are described. Plain messages and encrypted messages are arrays, consisting of elements from a fixed finite ring or field. The encryption and decryption algorithms are based on multivariate mappings. The security of the private key depends on the difficulty of solving a system of parametric simultaneous multiv…

    Submitted 23 July, 2018; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1608.06472

    MSC Class: 03C10; 11C08; 11T71; 12E20; 12Y05; 13A15; 13P10; 81P94; 94A60

  18. arXiv:1405.2049  [pdf, other]

    cs.IT cs.CR

    A New Upperbound for the Oblivious Transfer Capacity of Discrete Memoryless Channels

    Authors: K. Sankeerth Rao, Vinod M. Prabhakaran

    Abstract: We derive a new upper bound on the string oblivious transfer capacity of discrete memoryless channels. The main tool we use is the tension region of a pair of random variables introduced in Prabhakaran and Prabhakaran (2014) where it was used to derive upper bounds on rates of secure sampling in the source model. In this paper, we consider secure computation of string oblivious transfer in the cha…

    Submitted 8 May, 2014; originally announced May 2014.

    Comments: 7 pages, 3 figures, extended version of submission to IEEE Information Theory Workshop, 2014

  19. arXiv:1209.4157  [pdf, ps, other]

    cs.OH

    AutoAmp : An Open-Source Analog Amplifier Design Tool - For Classroom and Lab Purposes

    Authors: Om Prasad Patri, K. Sanmukh Rao

    Abstract: This correspondence presents an open-source tool AutoAmp developed at the Indian Institute of Technology, Guwahati. It is available at http://sourceforge.net/projects/autoamp-iitg/ This tool helps the user to design different types of electronic amplifiers, using solid state devices, for a given specification. It can handle several types of designs namely common-emitter BJT amplifier (single and t…

    Submitted 19 September, 2012; originally announced September 2012.

    Comments: presented at the Indian Conference for Academic Research by Undergraduate Students (ICARUS), 2010, IIT Kanpur; AutoAmp : An Open-Source Analog Amplifier Design Tool - For Classroom and Lab Purposes, Proceedings of the Indian Conference for Academic Research by Undergraduate Students (ICARUS), 2010

  20. arXiv:1001.4190  [pdf]

    cs.SD

    Speech Recognition of the letter 'zha' in Tamil Language using HMM

    Authors: A. Srinivasan, K. Srinivasa Rao, K. Kannan, D. Narasimhan

    Abstract: Speech signals of the letter 'zha' in Tamil language of 3 males and 3 females were coded using an improved version of Linear Predictive Coding (LPC). The sampling frequency was at 16 kHz and the bit rate was at 15450 bits per second, where the original bit rate was at 128000 bits per second with the help of wave surfer audio tool. The output LPC cepstrum is implemented in first order three state…

    Submitted 23 January, 2010; originally announced January 2010.

    Comments: 6 Pages

    Report number: IJEST09-01-02-05

    Journal ref: IJEST Volume 1 Issue 2 2009 67-72