- research-article, January 2023
Least squares model averaging for distributed data
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 215, Pages 10235–10293
Divide and conquer algorithm is a common strategy applied in big data. Model averaging has the natural divide-and-conquer feature, but its theory has not been developed in big data scenarios. The goal of this paper is to fill this gap. We propose two ...
- research-article, January 2023
Distributed nonparametric regression imputation for missing response problems with large-scale data
The Journal of Machine Learning Research (JMLR), Volume 24, Issue 1, Article No.: 68, Pages 2961–3012
Nonparametric regression imputation is commonly used in missing data analysis. However, it suffers from the "curse of dimension". The problem can be alleviated by the explosive sample size in the era of big data, while the large-scale data size presents ...
- research-article, June 2022
NTP-VFL - A New Scheme for Non-3rd Party Vertical Federated Learning
ICMLC '22: Proceedings of the 2022 14th International Conference on Machine Learning and Computing, Pages 134–139, https://doi.org/10.1145/3529836.3529841
Vertical Federated Learning (FL) handles decentralized and partitioned vertically data about common entities. While most existing privacy-preserving federated learning algorithms require a third party (TP) as an intermediary data accessor to coordinate ...
- research-article, May 2020
Data Sharing via Differentially Private Coupled Matrix Factorization
ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 14, Issue 3, Article No.: 28, Pages 1–27, https://doi.org/10.1145/3372408
We address the privacy-preserving data-sharing problem in a distributed multiparty setting. In this setting, each data site owns a distinct part of a dataset and the aim is to estimate the parameters of a statistical model conditioned on the complete ...
- research-article, September 2019
Efficient privacy-preserving recommendations based on social graphs
RecSys '19: Proceedings of the 13th ACM Conference on Recommender Systems, Pages 78–86, https://doi.org/10.1145/3298689.3347013
Many recommender systems use association rules mining, a technique that captures relations between user interests and recommends new probable ones accordingly. Applying association rule mining causes privacy concerns as user interests may contain ...
- research-article, March 2019
Fast Approximate Score Computation on Large-Scale Distributed Data for Learning Multinomial Bayesian Networks
ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 13, Issue 2, Article No.: 14, Pages 1–40, https://doi.org/10.1145/3301304
In this article, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in an efficient and scalable ...
- research-article, January 2019
Training Normal Bayes Classifier on Distributed Data
Procedia Computer Science (PROCS), Volume 150, Issue C, Pages 389–396, https://doi.org/10.1016/j.procs.2019.02.068
The paper describes an approach to parallelization of Normal Bayes classifier training algorithm for distributed data. In the process of distributed data analysis and the algorithm performance, the results fail to join properly. Due to this, the ...
- research-article, May 2017
Efficient Matrix Sketching over Distributed Data
PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Pages 347–359, https://doi.org/10.1145/3034786.3056119
A sketch or synopsis of a large dataset captures vital properties of the original data while typically occupying much less space. In this paper, we consider the problem of computing a sketch of a massive data matrix A ∈ ℝ^{n×d}, which is distributed across ...
- research-article, May 2016
Management of distributed big data for social networks
CCGRID '16: Proceedings of the 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, Pages 639–648, https://doi.org/10.1109/CCGrid.2016.107
In the current era of big data, high volumes of a wide variety of valuable data can be easily collected and generated from a broad range of data sources of different veracities at a high velocity. Due to the well-known 5V's of these big data, many ...
- research-article, June 2015
Communication-Efficient Computation on Distributed Noisy Datasets
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and Architectures, Pages 313–322, https://doi.org/10.1145/2755573.2755575
This paper gives a first attempt to answer the following general question: Given a set of machines connected by a point-to-point communication network, each having a noisy dataset, how can we perform communication-efficient statistical estimations ...
- survey, April 2015
Classification Framework of MapReduce Scheduling Algorithms
ACM Computing Surveys (CSUR), Volume 47, Issue 3, Article No.: 49, Pages 1–38, https://doi.org/10.1145/2693315
A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution of users, jobs, and tasks execution. A comprehensive and structured ...
- article, February 2015
Privacy-Preserving Naïve Bayesian Classifier-Based Recommendations on Distributed Data
Data collected for recommendation purposes might be distributed among various e-commerce sites, which can collaboratively provide more accurate predictions. However, because of privacy concerns, they might not want to work together. If privacy measures ...
- Article, October 2014
The Communication Complexity of Distributed epsilon-Approximations
FOCS '14: Proceedings of the 2014 IEEE 55th Annual Symposium on Foundations of Computer Science, Pages 591–600, https://doi.org/10.1109/FOCS.2014.69
Data summarization is an effective approach to dealing with the "big data" problem. While data summarization problems traditionally have been studied in the streaming model, the focus is starting to shift to distributed models, as distributed/parallel ...
- Article, June 2014
Privacy-Preserving Kriging Interpolation on Distributed Data
Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014), Volume 8584, Pages 695–708, https://doi.org/10.1007/978-3-319-09153-2_52
Kriging is one of the most preferred geostatistical methods in many engineering fields. Basically, it creates a model using statistical properties of all measured points in the region, where a prediction value is sought. The accuracy of the kriging ...
- research-article, April 2012
The ERC webdam on foundations of web data management
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, Pages 211–214, https://doi.org/10.1145/2187980.2188011
The Webdam ERC grant is a five-year project that started in December 2008. The goal is to develop a formal model for Web data management that would open new horizons for the development of the Web in a well-principled way, enhancing its functionality, ...
- research-article, March 2012
Privacy preserving distributed DBSCAN clustering
EDBT-ICDT '12: Proceedings of the 2012 Joint EDBT/ICDT Workshops, Pages 177–185, https://doi.org/10.1145/2320765.2320819
DBSCAN is a well-known density-based clustering algorithm which offers advantages for finding clusters of arbitrary shapes compared to partitioning and hierarchical clustering methods. However, there are few papers studying the DBSCAN algorithm under ...
- Article, December 2011
An approach to access the distributed data based on the multi-agent system for interoperability
FGIT'11: Proceedings of the Third international conference on Future Generation Information Technology, Pages 215–222, https://doi.org/10.1007/978-3-642-27142-7_25
In this paper, we present an approach to access the distributed data for interoperability in the distributed environments based on the multi-agent system that is designed on the proposed structure of multi-agent by FIPA (IEEE Foundation for Intelligent ...
- research-article, October 2011
Data mining without data: a novel approach to privacy-preserving collaborative distributed data mining
WPES '11: Proceedings of the 10th annual ACM workshop on Privacy in the electronic society, Pages 159–164, https://doi.org/10.1145/2046556.2046578
With the proliferation of organizations that independently collect various types of data, with the growing awareness of corporations and public to keep their sensitive data private, and with the ever-increasing need of government and corporate policy ...
- Article, September 2011
Privacy-Preserving Trust-Based Recommendations on Vertically Distributed Data
ICSC '11: Proceedings of the 2011 IEEE Fifth International Conference on Semantic Computing, Pages 376–379, https://doi.org/10.1109/ICSC.2011.43
Providing recommendations on trusts between entities is receiving increasing attention lately. Customers may prefer different online vendors for shopping. Thus, their preferences about various products might be distributed among multiple parties. To ...
- article, August 2011
Secure construction and publication of contingency tables from distributed data
Contingency tables are widely used in many fields to analyze the relationship or infer the association between two or more variables. Indeed, due to their simplicity and ease, they are one of the first methods used to analyze gathered data. Typically, ...