
Showing 1–39 of 39 results for author: Bhatia, K

Searching in archive cs.
  1. arXiv:2410.09187  [pdf, other]

    cs.LG cs.AI cs.CL

    Automated Rewards via LLM-Generated Progress Functions

    Authors: Vishnu Sarukkai, Brennan Shacklett, Zander Majercik, Kush Bhatia, Christopher Ré, Kayvon Fatahalian

    Abstract: Large Language Models (LLMs) have the potential to automate reward engineering by leveraging their broad domain knowledge across various tasks. However, they often need many iterations of trial-and-error to generate effective reward functions. This process is costly because evaluating every sampled reward function requires completing the full policy optimization process for each function. In this…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 26 pages, 5 figures

  2. arXiv:2410.05224  [pdf, other]

    cs.CL cs.LG

    Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates

    Authors: Avanika Narayan, Mayee F. Chen, Kush Bhatia, Christopher Ré

    Abstract: Fine-tuning large language models (LLMs) on instruction datasets is a common way to improve their generative capabilities. However, instruction datasets can be expensive and time-consuming to manually curate, and while LLM-generated data is less labor-intensive, it may violate user privacy agreements or terms of service of LLM providers. Therefore, we seek a way of constructing instruction dataset…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: COLM 2024

  3. arXiv:2406.19557  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Robustness Testing of Black-Box Models Against CT Degradation Through Test-Time Augmentation

    Authors: Jack Highton, Quok Zong Chong, Samuel Finestone, Arian Beqiri, Julia A. Schnabel, Kanwal K. Bhatia

    Abstract: Deep learning models for medical image segmentation and object detection are becoming increasingly available as clinical products. However, as details are rarely provided about the training data, models may unexpectedly fail when cases differ from those in the training distribution. An approach allowing potential users to independently test the robustness of a model, treating it as a black box and…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2402.04347  [pdf, other]

    cs.LG cs.CL

    The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

    Authors: Michael Zhang, Kush Bhatia, Hermann Kumbong, Christopher Ré

    Abstract: Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-conversion" of task-specific Transformers into linear versions that recover task performance, and (3) "pretrained-conversion" of Transformers such as large l…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 30 pages, 20 figures, 15 tables, ICLR 2024

  5. arXiv:2310.16763  [pdf, other]

    cs.CL cs.AI cs.LG

    SuperHF: Supervised Iterative Learning from Human Feedback

    Authors: Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush Bhatia, Silas Alberti

    Abstract: While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these models, Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). SFT is simple and robust, powering a host of open-source models, while RL…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to the Socially Responsible Language Modelling Research (SoLaR) workshop at NeurIPS 2023

  6. arXiv:2307.14430  [pdf, other]

    cs.CL cs.LG

    Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

    Authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré

    Abstract: The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when le…

    Submitted 26 July, 2023; originally announced July 2023.

  7. arXiv:2307.11031  [pdf, ps, other]

    cs.LG cs.CL

    Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

    Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

    Abstract: Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: 38 pages, 22 figures, 8 tables

  8. arXiv:2306.07536  [pdf, other]

    cs.LG cs.AI cs.CL

    TART: A plug-and-play Transformer module for task-agnostic reasoning

    Authors: Kush Bhatia, Avanika Narayan, Christopher De Sa, Christopher Ré

    Abstract: Large language models (LLMs) exhibit in-context learning abilities which enable the same model to perform several tasks without any task-specific training. In contrast, traditional adaptation approaches, such as fine-tuning, modify the underlying models for each specific task. In-context learning, however, consistently underperforms task-specific tuning approaches even when presented with the same…

    Submitted 13 June, 2023; originally announced June 2023.

  9. arXiv:2302.12349  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

    Authors: Kush Bhatia, Wenshuo Guo, Jacob Steinhardt

    Abstract: Specifying reward functions for complex tasks like object manipulation or driving is challenging to do by hand. Reward learning seeks to address this by learning a reward model using human feedback on selected query policies. This shifts the burden of reward specification to the optimal design of the queries. We propose a theoretical framework for studying reward learning and the associated optima…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: Accepted to AISTATS 2023

  10. arXiv:2301.09251  [pdf, other]

    cs.LG stat.ML

    Congested Bandits: Optimal Routing via Short-term Resets

    Authors: Pranjal Awasthi, Kush Bhatia, Sreenivas Gollapudi, Kostas Kollias

    Abstract: For traffic routing platforms, the choice of which route to recommend to a user depends on the congestion on these routes -- indeed, an individual's utility depends on the number of people using the recommended route at that instance. Motivated by this, we introduce the problem of Congested Bandits where each arm's reward is allowed to depend on the number of times it was played in the past $Δ$ ti…

    Submitted 22 January, 2023; originally announced January 2023.

    Comments: Published at ICML 2022

  11. arXiv:2212.04717  [pdf, other]

    cs.LG cs.AI

    On the Sensitivity of Reward Inference to Misspecified Human Models

    Authors: Joey Hong, Kush Bhatia, Anca Dragan

    Abstract: Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want. But doing so relies on models of how humans behave given their objectives. After decades of research in cognitive science, neuroscience, and behavioral economics, obtaining accurate human models remains an open research topic. This begs the question: how…

    Submitted 30 October, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: published as a paper in ICLR 2023; 17 pages, 12 figures

  12. arXiv:2210.02441  [pdf, other]

    cs.CL

    Ask Me Anything: A simple strategy for prompting language models

    Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

    Abstract: Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect promp…

    Submitted 19 November, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

  13. arXiv:2207.11208  [pdf, other]

    stat.ML cs.LG

    Statistical and Computational Trade-offs in Variational Inference: A Case Study in Inferential Model Selection

    Authors: Kush Bhatia, Nikki Lijing Kuang, Yi-An Ma, Yixin Wang

    Abstract: Variational inference has recently emerged as a popular alternative to the classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea is to trade statistical accuracy for computational efficiency. In this work, we study these statistical and computational trade-offs in variational inference via a case study in inferential model selection. Focusing on Gaussian infere…

    Submitted 6 August, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: 57 pages, 8 figures

  14. arXiv:2203.03706  [pdf, other]

    cs.SD cs.LG eess.AS

    Detection of AI Synthesized Hindi Speech

    Authors: Karan Bhatia, Ansh Agrawal, Priyanka Singh, Arun Kumar Singh

    Abstract: The recent advancements in generative artificial speech models have made possible the generation of highly realistic speech signals. At first, it seems exciting to obtain these artificially synthesized signals such as speech clones or deep fakes but if left unchecked, it may lead us to digital dystopia. One of the primary focus in audio forensics is validating the authenticity of a speech. Though…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 5 Pages, 6 Figures, 4 Tables

  15. arXiv:2201.03544  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

    Authors: Alexander Pan, Kush Bhatia, Jacob Steinhardt

    Abstract: Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied. To understand how reward hacking arises, we construct four RL environments with misspecified rewards. We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time. M…

    Submitted 14 February, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: ICLR 2022; 19 pages

  16. arXiv:2109.02273  [pdf, other]

    astro-ph.EP astro-ph.IM cs.LG

    Postulating Exoplanetary Habitability via a Novel Anomaly Detection Method

    Authors: Jyotirmoy Sarkar, Kartik Bhatia, Snehanshu Saha, Margarita Safonova, Santonu Sarkar

    Abstract: A profound shift in the study of cosmology came with the discovery of thousands of exoplanets and the possibility of the existence of billions of them in our Galaxy. The biggest goal in these searches is whether there are other life-harbouring planets. However, the question which of these detected planets are habitable, potentially-habitable, or maybe even inhabited, is still not answered. Some po…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: 12 pages, 3 figures, submitted to MNRAS

  17. arXiv:2108.03124  [pdf, other]

    cs.CV

    Contrastive Learning for View Classification of Echocardiograms

    Authors: Agisilaos Chartsias, Shan Gao, Angela Mumith, Jorge Oliveira, Kanwal Bhatia, Bernhard Kainz, Arian Beqiri

    Abstract: Analysis of cardiac ultrasound images is commonly performed in routine clinical practice for quantification of cardiac function. Its increasing automation frequently employs deep learning networks that are trained to predict disease or detect image features. However, such models are extremely data-hungry and training requires labelling of many thousands of images by experienced clinicians. Here we…

    Submitted 6 August, 2021; originally announced August 2021.

    Comments: Accepted in ASMUS workshop of MICCAI 2021

  18. arXiv:2105.01850  [pdf, other]

    cs.LG stat.ML

    Preference learning along multiple criteria: A game-theoretic perspective

    Authors: Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

    Abstract: The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well known that any Nash equilibrium of the zero sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, howe…

    Submitted 4 May, 2021; originally announced May 2021.

    Comments: 47 pages; published as a conference paper at NeurIPS 2020

  19. arXiv:2104.08482  [pdf, other]

    cs.LG stat.ML

    Agnostic learning with unknown utilities

    Authors: Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

    Abstract: Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems though, the utility of a decision depends on the underlying context $x$ and decision $y$. However, directly incorporating these utilities into the learning objective is often infeasible since these can be quite complex and difficult for humans to specify. We forma…

    Submitted 17 April, 2021; originally announced April 2021.

    Comments: 30 pages; published as a conference paper at ITCS 2021

  20. arXiv:2012.01705  [pdf, ps, other]

    cs.LG stat.ML

    Online learning with dynamics: A minimax perspective

    Authors: Kush Bhatia, Karthik Sridharan

    Abstract: We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. I…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: Published at NeurIPS 2020

  21. arXiv:2006.14782  [pdf, other]

    cs.CR cs.HC

    WorkerRep: Immutable Reputation System For Crowdsourcing Platform Based on Blockchain

    Authors: Gurpriya Kaur Bhatia, Shubham Gupta, Alpana Dubey, Ponnurangam Kumaraguru

    Abstract: Crowdsourcing is a process wherein an individual or an organisation utilizes the talent pool present over the Internet to accomplish their task. The existing crowdsourcing platforms and their reputation computation are centralised and hence prone to various attacks or malicious manipulation of the data by the central entity. A few distributed crowdsourcing platforms have been proposed but they lac…

    Submitted 25 June, 2020; originally announced June 2020.

  22. arXiv:1907.11826  [pdf, ps, other]

    stat.ML cs.LG stat.CO

    Bayesian Robustness: A Nonasymptotic Viewpoint

    Authors: Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

    Abstract: We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers. We propose Rob-ULA, a robust variant of the Unadjusted Langevin Algorithm (ULA), and provide a finite-sample analysis of its sampling distribution. In particular, we show that after…

    Submitted 26 July, 2019; originally announced July 2019.

    Comments: 30 pages, 5 figures

  23. arXiv:1907.05164  [pdf]

    eess.IV cs.CV cs.LG

    Disease classification of macular Optical Coherence Tomography scans using deep learning software: validation on independent, multi-centre data

    Authors: Kanwal K. Bhatia, Mark S. Graham, Louise Terry, Ashley Wood, Paris Tranos, Sameer Trikha, Nicolas Jaccard

    Abstract: Purpose: To evaluate Pegasus-OCT, a clinical decision support software for the identification of features of retinal disease from macula OCT scans, across heterogenous populations involving varying patient demographics, device manufacturers, acquisition sites and operators. Methods: 5,588 normal and anomalous macular OCT volumes (162,721 B-scans), acquired at independent centres in five countrie…

    Submitted 11 July, 2019; originally announced July 2019.

  24. arXiv:1903.08192  [pdf, ps, other]

    cs.LG stat.ML

    Adaptive Hard Thresholding for Near-optimal Consistent Robust Regression

    Authors: Arun Sai Suggala, Kush Bhatia, Pradeep Ravikumar, Prateek Jain

    Abstract: We study the problem of robust linear regression with response variable corruptions. We consider the oblivious adversary model, where the adversary corrupts a fraction of the responses in complete ignorance of the data. We provide a nearly linear time estimator which consistently estimates the true regression vector, even with $1-o(1)$ fraction of corruptions. Existing results in this setting eith…

    Submitted 19 March, 2019; originally announced March 2019.

  25. arXiv:1901.02358  [pdf, ps, other]

    cs.LG cs.AI cs.NE stat.ML

    FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network

    Authors: Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain, Manik Varma

    Abstract: This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: 23 pages, 10 figures, Published at Advances in Neural Information Processing Systems (NeurIPS) 2018

  26. arXiv:1812.08305  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

    Authors: Dhruv Malik, Ashwin Pananjady, Kush Bhatia, Koulik Khamaru, Peter L. Bartlett, Martin J. Wainwright

    Abstract: We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order eva…

    Submitted 18 May, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Comments: Version v3 consistent with paper appearing in JMLR

  27. arXiv:1811.08393  [pdf, ps, other]

    cs.LG cs.DS cs.NE stat.ML

    Gen-Oja: A Two-time-scale approach for Streaming CCA

    Authors: Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

    Abstract: In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting. We propose a simple and efficient algorithm, Gen-Oja, for these problems. We prove the global convergence of our algorithm, borrowing ideas from the theory of fast-mixing Markov chains and two-time-scale stochastic approximation, showing that it achiev…

    Submitted 31 January, 2020; v1 submitted 20 November, 2018; originally announced November 2018.

    Comments: Accepted at NeurIPS 2018

  28. arXiv:1810.08174  [pdf, other]

    cs.RO

    Establishing Appropriate Trust via Critical States

    Authors: Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan

    Abstract: In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts. Learned neural network policies make that particularly challenging. We propose an approach for helping end-users build a mental model of such policies. Our key observation is that for most tasks, the essence of the policy is captured in a few critical states…

    Submitted 18 October, 2018; originally announced October 2018.

    Comments: IROS 2018

  29. arXiv:1607.00146  [pdf, ps, other]

    cs.LG stat.ML

    Efficient and Consistent Robust Time Series Analysis

    Authors: Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar

    Abstract: We study the problem of robust time series analysis under the standard auto-regressive (AR) time series model in the presence of arbitrary outliers. We devise an efficient hard thresholding based algorithm which can obtain a consistent estimate of the optimal AR model despite a large fraction of the time series points being corrupted. Our algorithm alternately estimates the corrupted set of points…

    Submitted 1 July, 2016; originally announced July 2016.

  30. arXiv:1509.06847  [pdf]

    cs.IR

    Design and Implementation of Domain based Semantic Hidden Web Crawler

    Authors: Manvi, Komal Kumar Bhatia, Ashutosh Dixit

    Abstract: Web is a wide term which mainly consists of surface web and hidden web. One can easily access the surface web using traditional web crawlers, but they are not able to crawl the hidden portion of the web. These traditional crawlers retrieve contents from web pages, which are linked by hyperlinks ignoring the information hidden behind form pages, which cannot be extracted using simple hyperlink stru…

    Submitted 23 September, 2015; originally announced September 2015.

    Comments: 12 pages,10 figures

    Journal ref: IJIACS 2015 Volume 4 Special Issue ICRDESM-15 Paper id: 9D2N6Y

  31. A novel design of hidden web crawler using ontology

    Authors: Manvi, Komal Kumar Bhatia, Ashutosh Dixit

    Abstract: Deep Web is content hidden behind HTML forms. Since it represents a large portion of the structured, unstructured and dynamic data on the Web, accessing Deep-Web content has been a long challenge for the database community. This paper describes a crawler for accessing Deep-Web using Ontologies. Performance evaluation of the proposed work showed that this new approach has promising results.

    Submitted 10 August, 2015; originally announced August 2015.

    Comments: 7 pages,8 figures,2 tables, International Journal of Engineering Trends & Technology (IJETT),August 2015, ISSN: 2231-5381

  32. arXiv:1507.02743  [pdf, ps, other]

    cs.LG cs.IR math.OC stat.ML

    Locally Non-linear Embeddings for Extreme Multi-label Learning

    Authors: Kush Bhatia, Himanshu Jain, Purushottam Kar, Prateek Jain, Manik Varma

    Abstract: The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high dimensio…

    Submitted 9 July, 2015; originally announced July 2015.

  33. arXiv:1506.02428  [pdf, other]

    cs.LG stat.ML

    Robust Regression via Hard Thresholding

    Authors: Kush Bhatia, Prateek Jain, Purushottam Kar

    Abstract: We study the problem of Robust Least Squares Regression (RLSR) where several response variables can be adversarially corrupted. More specifically, for a data matrix X \in R^{p x n} and an underlying model w*, the response vector is generated as y = X'w* + b where b \in R^n is the corruption vector supported over at most C.n coordinates. Existing exact recovery results for RLSR focus solely on L1-p…

    Submitted 8 June, 2015; originally announced June 2015.

    Comments: 24 pages, 3 figures

  34. arXiv:1407.5732  [pdf]

    cs.IR

    A Comparative Study of Hidden Web Crawlers

    Authors: Sonali Gupta, Komal Kumar Bhatia

    Abstract: A large amount of data on the WWW remains inaccessible to crawlers of Web search engines because it can only be exposed on demand as users fill out and submit forms. The Hidden web refers to the collection of Web data which can be accessed by the crawler only through an interaction with the Web-based search form and not simply by traversing hyperlinks. Research on Hidden Web has emerged almost a d…

    Submitted 22 July, 2014; originally announced July 2014.

    Comments: 8 pages, 8 figures

    Journal ref: Vol 12 number 3 , Jun 2014 V12(3):111-118

  35. arXiv:1406.5690  [pdf]

    cs.IR

    WebParF: A Web partitioning framework for Parallel Crawlers

    Authors: Sonali Gupta, Komal kumar Bhatia, Pikakshi Manchanda

    Abstract: With the ever proliferating size and scale of the WWW [1] efficient ways of exploring content are of increasing importance. How can we efficiently retrieve information from it through crawling? And in this era of tera and multi-core processors, we ought to think of multi-threaded processes as a serving solution. So, even better how can we improve the crawling performance by using parallel crawlers…

    Submitted 22 June, 2014; originally announced June 2014.

    Comments: 8pages, 7 figures, ISSN : 0975-3397 Vol.5 no.8, 2013

  36. arXiv:1311.4900  [pdf]

    cs.IR cs.DB

    Query Interface Integrator For Domain Specific Hidden Web

    Authors: Sudhakar Ranjan, Komal K. Bhatia

    Abstract: Web is title admittance today mainly relies on search engines. A large amount of data is hidden in the databases behind the search interfaces referred to as Hidden web, which needs to be indexed so in order to serve user query. In this paper database and data mining techniques are used for query interface integration. The query interface must resemble the look and feel of local interface as much a…

    Submitted 16 November, 2013; originally announced November 2013.

    Comments: 8 Pages. International Journal of Computer Engineering and Applications, 2013

  37. arXiv:1311.0339  [pdf]

    cs.IR

    A Novel Term Weighing Scheme Towards Efficient Crawl of Textual Databases

    Authors: Sonali Gupta, Komal Kumar Bhatia

    Abstract: The Hidden Web is the vast repository of informational databases available only through search form interfaces, accessible by therein typing a set of keywords in the search forms. Typically, a Hidden Web crawler is employed to autonomously discover and download pages from the Hidden Web. Traditional hidden web crawlers do not provide the search engines with an optimal search experience because of…

    Submitted 2 November, 2013; originally announced November 2013.

    Comments: 12 Pages. IJCEA, 2013

  38. arXiv:1307.6814  [pdf]

    cs.LG

    A Propound Method for the Improvement of Cluster Quality

    Authors: Shveta Kundra Bhatia, V. S. Dixit

    Abstract: In this paper Knockout Refinement Algorithm (KRA) is proposed to refine original clusters obtained by applying SOM and K-Means clustering algorithms. KRA Algorithm is based on Contingency Table concepts. Metrics are computed for the Original and Refined Clusters. Quality of Original and Refined Clusters are compared in terms of metrics. The proposed algorithm (KRA) is tested in the educational dom…

    Submitted 25 July, 2013; originally announced July 2013.

    Journal ref: IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No 2, July 2012 ISSN (Online): 1694-0814

  39. arXiv:1201.4210  [pdf]

    cs.IR cs.AI

    Collaborative Personalized Web Recommender System using Entropy based Similarity Measure

    Authors: Harita Mehta, Shveta Kundra Bhatia, Punam Bedi, V. S. Dixit

    Abstract: On the internet, web surfers, in the search of information, always strive for recommendations. The solutions for generating recommendations become more difficult because of exponential increase in information domain day by day. In this paper, we have calculated entropy based similarity between users to achieve solution for scalability problem. Using this concept, we have implemented an online user…

    Submitted 20 January, 2012; originally announced January 2012.

    Comments: 10 pages

    Journal ref: IJCSI, Vol 8, Issue 6, No 3, Nov 2011