Search | arXiv e-print repository

Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

Authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong

Abstract: Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challenges associated with privacy-sensitive data, such domains could still benefit from valuable vision services provided by diffusion models. Federated learning (FL) plays a crucial role in enabling decentralized model training without compromising data privacy. Instead of collecting data, an FL system gathers model parameters, effectively safeguarding the private data of different parties involved. This makes FL systems vital for managing decentralized learning tasks, especially in scenarios where privacy-sensitive data is distributed across a network of clients. Nonetheless, FL presents its own set of challenges due to its distributed nature and privacy-preserving properties. Therefore, in this study, we explore the FL strategy to train diffusion models, paving the way for the development of federated diffusion models. We conduct experiments on various FL scenarios, and our findings demonstrate that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2303.12296

Prototype Helps Federated Learning: Towards Faster Convergence

Authors: Yu Qiao, Seong-Bae Park, Sun Moo Kang, Choong Seon Hong

Abstract: Federated learning (FL) is a distributed machine learning technique in which multiple clients cooperate to train a shared model without exchanging their raw data. However, heterogeneity of data distribution among clients usually leads to poor model inference. In this paper, a prototype-based federated learning framework is proposed, which can achieve better inference performance with only a few changes to the last global iteration of the typical federated learning process. In the last iteration, the server aggregates the prototypes transmitted from distributed clients and then sends them back to local clients for their respective model inferences. Experiments on two baseline datasets show that our proposal can achieve higher accuracy (at least 1%) and relatively efficient communication than two popular baselines under different heterogeneous settings.

Submitted 22 March, 2023; originally announced March 2023.

Comments: 3 pages, 3 figures

arXiv:2110.11751

Forecasting Financial Market Structure from Network Features using Machine Learning

Authors: Douglas Castilho, Tharsis T. P. Souza, Soong Moon Kang, João Gama, André C. P. L. F. de Carvalho

Abstract: We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. For such, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure, namely Dynamic Asset Graph (DAG), Dynamic Minimal Spanning Tree (DMST) and Dynamic Threshold Networks (DTN). Experimental results show that the proposed model can forecast market structure with high predictive performance with up to 40% improvement over a time-invariant correlation-based benchmark. Non-pair-wise correlation features showed to be important compared to traditionally used pair-wise correlation measures for all markets studied, particularly in the long-term forecasting of stock market structure. Evidence is provided for stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. Findings can be useful to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 22 pages, 13 figures

arXiv:2105.12661

Detecting Biological Locomotion in Video: A Computational Approach

Authors: Soo Min Kang, Richard P. Wildes

Abstract: Animals locomote for various reasons: to search for food, find suitable habitat, pursue prey, escape from predators, or seek a mate. The grand scale of biodiversity contributes to the great locomotory design and mode diversity. Various creatures make use of legs, wings, fins and other means to move through the world. In this report, we refer to the locomotion of general biological species as biolocomotion. We present a computational approach to detect biolocomotion in unprocessed video. Significantly, the motion exhibited by the body parts of a biological entity to navigate through an environment can be modeled by a combination of an overall positional advance with an overlaid asymmetric oscillatory pattern, a distinctive signature that tends to be absent in non-biological objects in locomotion. We exploit this key trait of positional advance with asymmetric oscillation along with differences in an object's common motion (extrinsic motion) and localized motion of its parts (intrinsic motion) to detect biolocomotion. An algorithm is developed to measure the presence of these traits in tracked objects to determine if they correspond to a biological entity in locomotion. An alternative algorithm, based on generic features combined with learning is assembled out of components from allied areas of investigation, also is presented as a basis of comparison. A novel biolocomotion dataset encompassing a wide range of moving biological and non-biological objects in natural settings is provided. Also, biolocomotion annotations to an extant camouflage animals dataset are provided. Quantitative results indicate that the proposed algorithm considerably outperforms the alternative approach, supporting the hypothesis that biolocomotion can be detected reliably based on its distinct signature of positional advance with asymmetric oscillation and extrinsic/intrinsic motion dissimilarity.

Submitted 26 May, 2021; originally announced May 2021.

arXiv:2104.10297

FPGA Synthesis of Ternary Memristor-CMOS Decoders

Authors: Xiaoyuan Wang, Zhiru Wu, Pengfei Zhou, Herbert H. C. Iu, Jason K. Eshraghian, Sung Mo Kang

Abstract: The search for a compatible application of memristor-CMOS logic gates has remained elusive, as the data density benefits are offset by slow switching speeds and resistive dissipation. Active microdisplays typically prioritize pixel density (and therefore resolution) over that of speed, where the most widely used refresh rates fall between 25-240 Hz. Therefore, memristor-CMOS logic is a promising fit for peripheral IO logic in active matrix displays. In this paper, we design and implement a ternary 1-3 line decoder and a ternary 2-9 line decoder which are used to program a seven segment LED display. SPICE simulations are conducted in a 50-nm process, and the decoders are synthesized on an Altera Cyclone IV field-programmable gate array (FPGA) development board which implements a ternary memristor model designed in Quartus II. We compare our hardware results to a binary coded decimal (BCD)-to-seven segment display decoder, and show our memristor-CMOS approach reduces the total IO power consumption by a factor of approximately 6 times at a maximum synthesizable frequency of 293.77 MHz. Although the speed is approximately half of the native built-in BCD-to-seven decoder, the comparatively slow refresh rates of typical microdisplays indicate this to be a tolerable trade-off, which promotes data density over speed.

Submitted 20 April, 2021; originally announced April 2021.

arXiv:2102.06536

CrossStack: A 3-D Reconfigurable RRAM Crossbar Inference Engine

Authors: Jason K. Eshraghian, Kyoungrok Cho, Sung Mo Kang

Abstract: Deep neural network inference accelerators are rapidly growing in importance as we turn to massively parallelized processing beyond GPUs and ASICs. The dominant operation in feedforward inference is the multiply-and-accumulate process, where each column in a crossbar generates the current response of a single neuron. As a result, memristor crossbar arrays parallelize inference and image processing tasks very efficiently. In this brief, we present a 3-D active memristor crossbar array 'CrossStack', which adopts stacked pairs of Al/TiO2/TiO2-x/Al devices with common middle electrodes. By designing CMOS-memristor hybrid cells used in the layout of the array, CrossStack can operate in one of two user-configurable modes as a reconfigurable inference engine: 1) expansion mode and 2) deep-net mode. In expansion mode, the resolution of the network is doubled by increasing the number of inputs for a given chip area, reducing IR drop by 22%. In deep-net mode, inference speed per-10-bit convolution is improved by 29% by simultaneously using one TiO2/TiO2-x layer for read processes, and the other for write processes. We experimentally verify both modes on our 10×10×2 array.

Submitted 7 February, 2021; originally announced February 2021.

Comments: 5 pages, 4 figures

arXiv:1908.07193

Counterfactual Distribution Regression for Structured Inference

Authors: Nicolo Colombo, Ricardo Silva, Soong M Kang, Arthur Gretton

Abstract: We consider problems in which a system receives external perturbations from time to time. For instance, the system can be a train network in which particular lines are repeatedly disrupted without warning, having an effect on passenger behavior. The goal is to predict changes in the behavior of the system at particular points of interest, such as passenger traffic around stations at the affected rails. We assume that the data available provides records of the system functioning at its "natural regime" (e.g., the train network without disruptions) and data on cases where perturbations took place. The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations. We approach this problem from the point of view of a mapping from the counterfactual distribution of the system behavior without disruptions to the distribution of the disrupted system. A variant on distribution regression is developed for this setup.

Submitted 20 August, 2019; originally announced August 2019.

Comments: 24 pages, 5 figures

arXiv:1610.06906

Review of Action Recognition and Detection Methods

Authors: Soo Min Kang, Richard P. Wildes

Abstract: In computer vision, action recognition refers to the act of classifying an action that is present in a given video and action detection involves locating actions of interest in space and/or time. Videos, which contain photometric information (e.g. RGB, intensity values) in a lattice structure, contain information that can assist in identifying the action that has been imaged. The process of action recognition and detection often begins with extracting useful features and encoding them to ensure that the features are specific to serve the task of action recognition and detection. Encoded features are then processed through a classifier to identify the action class and their spatial and/or temporal locations. In this report, a thorough review of various action recognition and detection algorithms in computer vision is provided by analyzing the two-step process of a typical action recognition and detection algorithm: (i) extraction and encoding of features, and (ii) classifying features into action classes. In efforts to ensure that computer vision-based algorithms reach the capabilities that humans have of identifying actions irrespective of various nuisance variables that may be present within the field of view, the state-of-the-art methods are reviewed and some remaining problems are addressed in the final chapter.

Submitted 1 November, 2016; v1 submitted 21 October, 2016; originally announced October 2016.

Report number: EECS-2016-04

arXiv:1105.5294

A long-time limit of world subway networks

Authors: Camille Roth, Soong Moon Kang, Michael Batty, Marc Barthelemy

Abstract: We study the temporal evolution of the structure of the world's largest subway networks in an exploratory manner. We show that, remarkably, all these networks converge to a shape which shares similar generic features despite their geographic and economic differences. This limiting shape is made of a core with branches radiating from it. For most of these networks, the average degree of a node (station) within the core has a value of order 2.5 and the proportion of k=2 nodes in the core is larger than 60%. The number of branches scales roughly as the square root of the number of stations, the current proportion of branches represents about half of the total number of stations, and the average diameter of branches is about twice the average radial extension of the core. Spatial measures such as the number of stations at a given distance to the barycenter display a first regime which grows as r^2 followed by another regime with different exponents, and eventually saturates. These results -- difficult to interpret in the framework of fractal geometry -- confirm and yield a natural explanation in the geometric picture of this core and their branches: the first regime corresponds to a uniform core, while the second regime is controlled by the interstation spacing on branches. The apparent convergence towards a unique network shape in the temporal limit suggests the existence of dominant, universal mechanisms governing the evolution of these structures.

Submitted 16 May, 2012; v1 submitted 26 May, 2011; originally announced May 2011.

Comments: 11 pages, 13 figures, revised version, accepted for publication in Royal Society Interface

Journal ref: Journal of the Royal Society Interface, 9:2540-2550 (2012)

Showing 1–9 of 9 results for author: Kang, S M