The MapReduce paradigm has become a compelling choice for performing advanced analytics over unstructured information and enabling efficient "Big Data" processing. There is an increasing number of MapReduce applications, e.g., personalized advertising, sentiment analysis, spam detection, real-time event log analysis, etc., that are deadline-driven and require completion time guarantees. In an enterprise setting, users share Hadoop clusters and benefit from processing a diverse variety of applications over the same or different datasets. The existing Hadoop schedulers (Hadoop Fair Scheduler, Capacity Scheduler) do not support completion time guarantees. Given a MapReduce workload consisting of diverse jobs with deadlines, how do we schedule them...
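As a concrete illustration of deadline-driven scheduling, here is a minimal earliest-deadline-first (EDF) sketch. The Job fields, the slot estimator, and the workload below are illustrative assumptions, not the scheduler the abstract proposes:

```python
# Minimal EDF ordering sketch for deadline-driven MapReduce jobs. The job
# fields and the slot-allocation rule are illustrative, not the paper's.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    deadline: float      # seconds from now
    map_work: float      # estimated total map time on one slot
    reduce_work: float   # estimated total reduce time on one slot

def min_slots_to_meet_deadline(job: Job) -> int:
    """Smallest slot count whose idealized parallel completion time
    (work divided evenly across slots) still meets the deadline."""
    total = job.map_work + job.reduce_work
    slots = 1
    while total / slots > job.deadline:
        slots += 1
    return slots

jobs = [
    Job("ad-targeting", deadline=600, map_work=2400, reduce_work=600),
    Job("spam-filter",  deadline=300, map_work=500,  reduce_work=200),
]
# EDF: admit jobs in deadline order, give each the minimum slots it needs.
for job in sorted(jobs, key=lambda j: j.deadline):
    print(job.name, "->", min_slots_to_meet_deadline(job), "slots")
```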
The core business of many companies depends on the timely analysis of large quantities of new data. MapReduce clusters that routinely process petabytes of data represent a new entity in the evolving landscape of clouds and data centers. During the lifetime of a data center, old hardware eventually needs to be replaced by new hardware. The hardware selection process needs to be driven by the performance objectives of the existing production workloads. In this work, we present a general framework, called Ariel, that automates system administrators' efforts in evaluating different hardware choices and predicting completion times of MapReduce applications for their migration to a Hadoop cluster based on the new hardware. The proposed framework consists of two key components: (i) a set of microbenchmarks to profile the MapReduce processing pipeline on a given platform, and (ii) a regression-based model that establishes a performance relationship between the source and t...
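The regression idea can be sketched in a few lines: benchmark the same operations on both platforms, fit a relationship between their durations, and scale an application's measured phase times. The numbers and the linear form below are illustrative assumptions, not Ariel's actual model:

```python
# Sketch of the regression idea behind a framework like Ariel: run the same
# microbenchmarks on the source and target hardware, fit a linear relation
# between their durations, then scale an application's measured phase times.
# All numbers below are made up for illustration.
import numpy as np

# Microbenchmark durations (seconds) for the same benchmark set,
# measured on the source cluster and on the candidate target hardware.
source = np.array([12.0, 30.0, 55.0, 80.0, 140.0])
target = np.array([ 7.5, 18.0, 34.0, 50.0,  88.0])

# Fit target = a * source + b by least squares.
a, b = np.polyfit(source, target, deg=1)

# Predict how long a production job's map phase would take after migration.
measured_map_phase = 65.0
print(f"predicted on new hardware: {a * measured_map_phase + b:.1f}s")
```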
A fast, degradation-free solution for the DCT block extraction problem is proposed. The problem is defined as extracting a DCT block from a DCT-compressed frame composed of DCT blocks. This problem is encountered both in video/image manipulation in the compressed domain and in transcoders, for example, when converting from MPEG to Motion JPEG. Traditionally, solutions involve using pixel-domain manipulation...
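A classic compressed-domain identity gives a feel for why degradation-free extraction is possible: because the DCT basis matrix T is orthonormal, a spatial shift-and-crop (a linear operation) can be applied directly to DCT coefficients. A minimal sketch for the one-dimensional, two-block horizontal case, with invented data; this illustrates the identity, not necessarily the paper's specific fast method:

```python
# An 8x8 block straddling two neighbouring DCT blocks can be assembled
# directly from their coefficients: dct2(x @ A) = dct2(x) @ (T A T') since
# the DCT basis T is orthonormal. Offsets and data are illustrative.
import numpy as np

N = 8
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
T = np.sqrt(2.0 / N) * np.cos((2 * j + 1) * i * np.pi / (2 * N))
T[0, :] = np.sqrt(1.0 / N)                     # orthonormal 8x8 DCT-II basis

dct2 = lambda x: T @ x @ T.T
idct2 = lambda X: T.T @ X @ T

rng = np.random.default_rng(0)
x1, x2 = rng.random((N, N)), rng.random((N, N))  # two adjacent pixel blocks
k = 3                                            # horizontal offset in pixels

# Pixel-domain extraction: last N-k columns of x1, first k columns of x2.
A = np.zeros((N, N)); A[k:, :N - k] = np.eye(N - k)
B = np.zeros((N, N)); B[:k, N - k:] = np.eye(k)
x_shifted = x1 @ A + x2 @ B

# Compressed-domain extraction: the same combination on DCT coefficients.
X_shifted = dct2(x1) @ (T @ A @ T.T) + dct2(x2) @ (T @ B @ T.T)

assert np.allclose(idct2(X_shifted), x_shifted)  # identical up to FP error
print("compressed-domain extraction matches pixel-domain crop")
```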
When threads use context switching, they incur an overhead in addition to the minimum required running time. The source of this overhead is both direct overhead due to running the context switch code and indirect overhead due to perturbation of caches. We calculate the indirect overhead by measuring the running time of tasks that use context switching and subtracting the direct overhead. We also measure the indirect overhead's impact on the running time of tasks due to processor interrupt servicing. Experimental results are presented for the Linux kernel running on a mobile device platform.
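A toy measurement in the spirit of the direct-overhead experiments: two Unix processes ping-pong a byte over pipes, forcing a context switch on every hop. This bounds rather than isolates the switch cost, since pipe syscall time is included; the paper's methodology, not reproduced here, factors that out:

```python
# Toy direct-overhead measurement: two processes ping-pong one byte over
# pipes, forcing a context switch per hop. Dividing elapsed time by the
# number of hops bounds the direct switch cost plus pipe I/O. Unix-only.
import os, time

ROUNDS = 20_000
r1, w1 = os.pipe()   # parent -> child
r2, w2 = os.pipe()   # child  -> parent

if os.fork() == 0:                    # child: echo every byte back
    for _ in range(ROUNDS):
        os.read(r1, 1)
        os.write(w2, b"x")
    os._exit(0)

start = time.perf_counter()
for _ in range(ROUNDS):
    os.write(w1, b"x")
    os.read(r2, 1)
elapsed = time.perf_counter() - start

# Each round is two hops (parent -> child, child -> parent).
print(f"~{elapsed / (2 * ROUNDS) * 1e6:.1f} us per hop (switch + pipe I/O)")
os.wait()
```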
When two people chat face to face, they know the conversation's context—each knows, for example, the other person's location, what he or she is doing, who else is nearby, and the room's condition (lighting, sound, and so forth). There is also significant communication through facial and other nonverbal cues. Knowing such information makes the conversation richer. However, when you chat with someone electronically, you don't get to exchange this type of contextual information. Recently, there has been great interest in making applications context aware so that they can adapt to different situations and be more receptive to users' needs. In particular, text-based chat programs could greatly benefit from being context aware. Today's chat programs let users set their status (such as "out to lunch" or "on the phone"), but they generally don't let the two parties exchange any other type of contextual information. Systems such as Babb...
The Information Trust Institute (ITI) at the University of Illinois at Urbana-Champaign is developing an entirely new multidisciplinary undergraduate curriculum on the topic of digital forensics, and this paper presents the findings of the development process, including initial results and evaluation of a pilot offering of the coursework to students. The curriculum consists of a four-course sequence, including introductory and advanced lecture courses with parallel laboratory courses, followed by an advanced course. The content has been designed to reflect both the emerging national standards and the strong multidisciplinary character of the profession of digital forensics, and includes modules developed collaboratively by faculty experts in multiple fields of computer science, law, psychology, social sciences, and accountancy. A preliminary plan for the introductory course was presented to a workshop of digital forensics experts in May 2013 and received their strong approval. Pilot...
Ubiquitous computing enables an environment that assimilates digital and physical devices seamlessly and presents a unified programming interface to the user. Users can program the environment much as they would program a computer. With the widespread availability of personal devices and personal area networks, there is a growing need for personal devices to share resources and services among themselves to support complex applications. Middleware support is required for enabling interactions among services and for sharing resources. We introduce Mobile Gaia, a middleware framework to enable the construction of personal ubiquitous computing environments, or personal spaces, which are formed ad hoc using personal devices carried or worn by a person as well as devices that are physically nearby. We discuss the architecture and services of Mobile Gaia and some of the challenges that need to be addressed in this endeavor.
In this paper, we present an overview of our research project with GaiaOS, a middleware operating system that provides a generic computational environment for ubiquitous computing. In addition to an outline of the GaiaOS architecture, we describe how we address some mobility issues in this infrastructure.
Neural networks allow the implementation of complicated applications such as stock market prediction on low-end PCs. However, the training of neural networks can take many hours on a PC. In this paper we propose a technique for training complicated neural networks on a commodity GPU (available in a low-end PC) that completes 6 times faster than training on a multi-core CPU. For our analysis we use the Proben1 benchmark, drawing 15 datasets from 12 different domains to explore our solution. Our technique allows the training to be done with minimal CPU utilization, which allows the user to carry out other tasks while the training is in progress. We compare several avenues of neural network training on a general-purpose computer. The benchmark we use covers real-life pattern classification problems and hence is well suited to our tests, as we aim to solve the problem of stock market prediction.
This work presents the first-ever detailed and large-scale measurement analysis of storage consumption behavior of applications (apps) on smart mobile devices. We start by carrying out a five-year longitudinal static analysis of millions of Android apps to study the increase in their sizes over time and identify various sources of app storage consumption. Our study reveals that mobile apps have evolved as large monolithic packages that are packed with features to monetize/engage users and optimized for performance at the cost of redundant storage consumption. We also carry out a mobile storage usage study with 140 Android participants. We built and deployed a lightweight context-aware storage tracing tool, called cosmos, on each participant's device. Leveraging the traces from our user study, we show that only a small fraction of apps/features are actively used and usage is correlated to user context. Our findings suggest a high degree of app feature bloat and unused functionali...
The Machine Learning 2020 Organizing Committee is dedicated to gathering you all for the International Conference on Computer Science and Machine Learning, slated to be held November 02-03, 2020 in Tokyo, Japan. The conference will revolve around the theme of making the world a new place with technology. Machine Learning 2020 offers a major global gathering that unites analysts and experts from diverse fields to investigate the basic roles, the interaction, and the practical impact of Artificial Intelligence (AI). Computer Graphics & Animation 2020 is distinguished by the attendance of Organizing Committee Members and Editorial Board Members of supporting journals, scientists, young and brilliant researchers, business delegates, and talented research student communities representing developed and developing countries, who made this conference rewarding and fecund. The conference theme, "Innovate the immersive environments with Computer Graphics and Animation," offers a unique ...
The analysis phase of the digital forensic process is the most complex. This phase grows more complicated as the size and ubiquity of digital devices increase. There are many tools aimed at assisting the investigator in the analysis process; however, they do not address these growing challenges. In this paper, we discuss the application of graph theory, the study of mathematical structures that model pairwise relations between objects, to aid the investigation process of digital forensic examiners. We explore how graph theory can be used as a basis for further analysis. We demonstrate the potential of this application through its implementation in a case study.
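A minimal sketch of the graph-based view, using the networkx library; the artifacts and relations below are invented case data, not drawn from the paper's case study:

```python
# Model forensic artifacts (users, files, devices) as nodes and observed
# relations as edges; standard graph algorithms then surface indirect
# connections. The case data here is invented for illustration.
import networkx as nx

G = nx.Graph()
G.add_edge("user:alice", "file:report.docx", relation="modified")
G.add_edge("file:report.docx", "device:usb-1234", relation="copied_to")
G.add_edge("device:usb-1234", "host:workstation-7", relation="mounted_on")
G.add_edge("user:bob", "host:workstation-7", relation="logged_in")

# Does any chain of evidence connect alice to bob?
path = nx.shortest_path(G, "user:alice", "user:bob")
print(" -> ".join(path))
```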
Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed...
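The core idea, stripped of the actual video architecture, fits in a few lines: condition the next-frame prediction on a sampled latent variable, so each sample yields a different plausible future. The linear "decoder" below is a placeholder for illustration, not SV2P's network:

```python
# Toy illustration of the stochastic-prediction idea: the same context
# frame plus different latent samples z produces different futures.
import numpy as np

rng = np.random.default_rng(0)
FRAME, LATENT = 16, 4                      # tiny flattened frame, latent dims
W_frame = rng.standard_normal((FRAME, FRAME)) * 0.1
W_z = rng.standard_normal((FRAME, LATENT))

def predict_next(frame, z):
    """Placeholder decoder: deterministic given (frame, z)."""
    return np.tanh(W_frame @ frame + W_z @ z)

frame = rng.random(FRAME)
futures = [predict_next(frame, rng.standard_normal(LATENT)) for _ in range(3)]
# Each latent sample produces a distinct future for the same context frame.
print([np.round(f[:3], 2) for f in futures])
```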
Convolutional autoregressive models have recently demonstrated state-of-the-art performance on a number of generation tasks. While fast, parallel training methods have been crucial for their success, generation is typically implemented in a naive fashion where redundant computations are unnecessarily repeated. This results in slow generation, making such models infeasible for production environments. In this work, we describe a method to speed up generation in convolutional autoregressive models. The key idea is to cache hidden states to avoid redundant computation. We apply our fast generation method to the Wavenet and PixelCNN++ models and achieve up to 21× and 183× speedups respectively.
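The caching idea can be shown on a toy causal network with two-tap convolutions: each layer only ever needs its previous input, so generation keeps one cached value per layer instead of re-running the network over the whole history. The weights and network shape are illustrative, not Wavenet's:

```python
# Naive generation re-runs every layer over the full history each step
# (O(T * LAYERS)); cached generation keeps one hidden value per layer
# (O(LAYERS)). The assert verifies the two paths agree.
import numpy as np

rng = np.random.default_rng(1)
LAYERS = 4
W = [rng.standard_normal(2) * 0.5 for _ in range(LAYERS)]  # [w_prev, w_cur]

def naive_step(history):
    """Re-run every layer over the whole history; return the newest output."""
    h = np.asarray(history, dtype=float)
    for w in W:
        padded = np.concatenate([[0.0], h])               # causal zero pad
        h = np.tanh(w[0] * padded[:-1] + w[1] * padded[1:])
    return h[-1]

class CachedGenerator:
    """Keep the previous input of each layer; O(LAYERS) work per step."""
    def __init__(self):
        self.prev = [0.0] * LAYERS

    def step(self, x):
        h = x
        for l, w in enumerate(W):
            h, self.prev[l] = np.tanh(w[0] * self.prev[l] + w[1] * h), h
        return h

gen = CachedGenerator()
history = []
for x in [0.3, -1.0, 0.7, 0.2, 0.9]:
    history.append(x)
    assert np.isclose(naive_step(history), gen.step(x))
print("cached one-step generation matches full recomputation")
```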
The COrona VIrus Disease (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in a challenging number of infections and deaths worldwide. In order to combat the pandemic, several countries worldwide enforced mitigation measures in the form of lockdowns, social distancing, and disinfection measures. In an effort to understand the dynamics of this disease, we propose a Long Short-Term Memory (LSTM) based model. We train our model on over three months of cumulative COVID-19 cases and deaths. Our model's parameters can be adjusted in order to provide predictions as needed. We provide results at both the country and county levels. We also perform a quantitative comparison of mitigation measures in various counties in the United States based on the rate of difference of a short- and a long-window parameter of the proposed LSTM model. The analyses provided by our model can provide valuable insights based on the trends in the rat...
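A minimal sketch of the modeling setup in PyTorch: an LSTM trained on a sliding window of a cumulative series to predict the next value. The synthetic series, window size, and training budget are placeholders, not the paper's data or configuration:

```python
# LSTM one-step-ahead forecaster over a sliding window of a cumulative
# series. Everything numeric here is a placeholder for illustration.
import torch
import torch.nn as nn

series = torch.cumsum(torch.rand(120), dim=0)          # fake cumulative cases
series = (series - series.mean()) / series.std()       # normalize

WINDOW = 14
X = torch.stack([series[i:i + WINDOW] for i in range(len(series) - WINDOW)])
y = series[WINDOW:]

class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                               # x: (batch, window)
        out, _ = self.lstm(x.unsqueeze(-1))             # (batch, window, hidden)
        return self.head(out[:, -1]).squeeze(-1)        # last step -> next value

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()

print("one-step-ahead prediction:", model(series[-WINDOW:].unsqueeze(0)).item())
```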
Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and predict which actions will lead to desirable outcomes. In this paper, we explore how video prediction models can similarly enable agents to solve Atari games with fewer interactions than model-free methods. We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting. Our experiments evaluate SimPLe on a range of Atari games in a low-data regime of 100k interactions between the agent and the environmen...
While many isolation mechanisms are available to cloud service providers, including virtual machines, containers, etc., the problem of side-channel attacks grows in importance as a remaining security vulnerability, particularly in the presence of shared caches and multicore processors. In this paper we present a hardware-software mechanism that improves the isolation of cloud processes in the presence of shared caches on multicore chips. Combining the Intel CAT architecture, which enables cache partitioning on the fly, with novel scheduling techniques and state-cleansing mechanisms, we enable cache-side-channel-free computing for Linux-based containers and virtual machines, in particular those managed by KVM. We present a preliminary evaluation of our system using a CPU-bound workload. Our system allows Simultaneous Multithreading (SMT) to remain enabled and does not require application-level changes.
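On Linux, Intel CAT is exposed through the resctrl filesystem, which gives a feel for the partitioning primitive the paper builds on. The group name, mask value, and PID below are hypothetical, the mask width depends on the CPU, and this sketch omits the paper's scheduling and state-cleansing layers; it needs root and a mounted /sys/fs/resctrl:

```python
# Sketch of L3 cache partitioning via Linux resctrl (the kernel front-end
# to Intel CAT): create a resource group, assign it dedicated cache ways,
# and pin a workload's PID into it. A single cache domain is assumed; the
# group name, the 4-way mask 0x00f, and PID 12345 are all hypothetical.
import os

GROUP = "/sys/fs/resctrl/isolated_vm"      # hypothetical group name
os.makedirs(GROUP, exist_ok=True)

# Give this group 4 dedicated L3 ways (mask 0x00f) on cache domain 0.
with open(os.path.join(GROUP, "schemata"), "w") as f:
    f.write("L3:0=00f\n")

# Move the protected workload (hypothetical PID) into the partition.
with open(os.path.join(GROUP, "tasks"), "w") as f:
    f.write("12345")
```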
To demonstrate compliance with privacy and security principles, information technology (IT) service providers often rely on security standards and certifications. However, the appearance of new service models such as cloud computing has brought new threats to information assurance, weakening the protection that existing standards can provide. In this study, we analyze four highly regarded IT security standards used to assess, improve, and demonstrate information systems assurance and cloud security. ISO/IEC 27001, SOC 2, C5, and FedRAMP are standards adopted worldwide and constantly updated and improved since the first release of ISO/IEC 27001 in 2005. We examine their adequacy in addressing current threats to cloud security, and provide an overview of the evolution over the years of their ability to cope with threats and vulnerabilities. By comparing the standards alongside each other, we investigate their complementarity, their redundancies, and the level of protection they offer to informa...
We propose a neural network-based learning mechanism for tracking in an RFID tag field. As users move through the field to a desired destination, they train localities of tags, creating digital trails. Later on, users seeking the destination, but without knowledge of any path, can follow the digital trails. Training information (weights from the neural networks) is stored in the tags. Our system is entirely distributed and robust to failures.
In this paper we propose reactor mirage theory as a deception-based intrusion detection approach for digital I&C systems in nuclear power plants (NPPs). We draw from military deception techniques based on the simulation of physical targets such as troops, radar-equipped air defense installations, tanks, bridges, airfields, etc. We propose employing genuine digital I&C systems to simulate physical components of an NPP via generation of Modbus protocol data units (PDUs) typical of the operation of these components. Communicating finite state machines are used to generate and recognize such deceptive PDUs. Artificially generated Modbus traffic is the reactor mirage theory counterpart of the electromagnetic beam reflections, heat emitters, etc., commonly used as deceptive mechanisms by the military in warfare to indicate the existence of physical targets. These deceptive PDUs produce a drastic increase in the uncertainty that attackers may be subject to during the selection of tar...
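A minimal sketch of the generation side: a finite state machine that walks through plausible operating states of a simulated component and packs Modbus Read Holding Registers response PDUs consistent with each state. The states, the register, and the value ranges are invented for illustration:

```python
# FSM-driven generation of deceptive Modbus PDUs: each state constrains
# what a simulated 'pump RPM' holding register may plausibly read.
import struct, random

# state -> (allowed next states, plausible RPM range in that state)
FSM = {
    "stopped":  (["starting"],            (0, 0)),
    "starting": (["running"],             (200, 900)),
    "running":  (["running", "stopping"], (1400, 1500)),
    "stopping": (["stopped"],             (100, 600)),
}

def response_pdu(rpm: int) -> bytes:
    """Modbus Read Holding Registers response carrying one register:
    function code 0x03, byte count 2, then the big-endian value."""
    return struct.pack(">BBH", 0x03, 2, rpm)

state = "stopped"
for _ in range(6):
    nxt, (lo, hi) = FSM[state]
    pdu = response_pdu(random.randint(lo, hi))
    print(state, pdu.hex())
    state = random.choice(nxt)
```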
Choices, an object-oriented framework for the design of distributed virtual memory consistency protocols, is presented. It is shown that custom-designed protocols for different applications are easy to construct and use with this framework. Consistency protocols are shown to be useful in implementing atomic update and in controlling the assignment of pages to processes. Experimental results are presented.
Pervasive computing environments feature massively distributed systems containing a large number of devices, services and applications that help end-users perform various kinds of tasks. However, these systems are very complex to configure and manage. They are highly dynamic and fault-prone. Another challenge is that since these environments are rich in devices and services, they offer different ways of performing the...
Petri nets are used to define a path and process notation which is more general in its ability to express synchronization than previous path notations. The Petri net classes corresponding to the path notation prove to be interesting in their own right and have demonstrable properties such as liveness and safeness.
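A minimal Petri net interpreter makes the firing rule and the safeness/deadlock questions concrete; the two-transition net below is illustrative, not one of the paper's path-notation nets:

```python
# Tiny Petri net: transitions fire when all input places hold a token.
# Markings are sets of marked places, which suffices for a 1-safe net.
# transition -> (input places, output places)
NET = {
    "t1": (("p1",), ("p2",)),
    "t2": (("p2",), ("p1",)),
}
initial = frozenset({"p1"})

def fire(marking, t):
    ins, outs = NET[t]
    if not set(ins) <= marking:
        return None                     # transition not enabled
    return frozenset((marking - set(ins)) | set(outs))

# Explore all reachable markings (finite for a 1-safe net).
seen, frontier = {initial}, [initial]
while frontier:
    m = frontier.pop()
    for t in NET:
        nxt = fire(m, t)
        if nxt is not None and nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)

print("reachable markings:", [sorted(m) for m in seen])
deadlocked = any(all(fire(m, t) is None for t in NET) for m in seen)
print("deadlock-free:", not deadlocked)
```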
To identify novel genes associated with ALS, we undertook two lines of investigation. We carried out a genome-wide association study comparing 20,806 ALS cases and 59,804 controls. Independently, we performed a rare variant burden analysis comparing 1,138 index familial ALS cases and 19,494 controls. Through both approaches, we identified kinesin family member 5A (KIF5A) as a novel gene associated with ALS. Interestingly, mutations predominantly in the N-terminal motor domain of KIF5A are causative for two neurodegenerative diseases: hereditary spastic paraplegia (SPG10) and Charcot-Marie-Tooth type 2 (CMT2). In contrast, ALS-associated mutations are primarily located at the C-terminal cargo-binding tail domain and patients harboring loss-of-function mutations displayed an extended survival relative to typical ALS cases. Taken together, these results broaden the phenotype spectrum resulting from mutations in KIF5A and strengthen the role of cytoskeletal defects in the pathogenesis o...
The Choices operating system architecture [3, 4, 15] uses class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry Parallel Computing Laboratory at the University of Illinois to study the performance of algorithms, mechanisms, and policies for parallel systems. This paper describes the architectural design and class hierarchy of the Choices memory and secondary storage management system. The mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-offs between response times and storage capacities. In Choices, the notion of a memory hierarchy is represented by layers in which abstract classes define interfaces between and internal to the layers. Concrete subclasses implement new algorithms or data structures or specializations of existing ones. This paper describes the motivation for an obj...
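The layering style the abstract describes, abstract classes fixing the interfaces and concrete subclasses supplying algorithms, can be sketched briefly; the class and method names here are invented, not Choices' actual C++ hierarchy:

```python
# Sketch of the described layered design: an abstract class fixes the
# interface a memory-management layer depends on; concrete subclasses
# swap in different algorithms without changing the layer.
from abc import ABC, abstractmethod

class ReplacementPolicy(ABC):
    """Interface between the paging layer and any eviction algorithm."""
    @abstractmethod
    def victim(self, resident_pages):
        ...

class FIFOPolicy(ReplacementPolicy):
    def victim(self, resident_pages):
        return resident_pages[0]                  # evict the oldest page

class LRUPolicy(ReplacementPolicy):
    def victim(self, resident_pages):
        return min(resident_pages, key=lambda p: p["last_used"])

class PagingLayer:
    """A layer is configured with a policy, not hard-wired to one."""
    def __init__(self, policy: ReplacementPolicy):
        self.policy = policy

    def evict(self, resident_pages):
        return self.policy.victim(resident_pages)

pages = [{"id": 1, "last_used": 9}, {"id": 2, "last_used": 3}]
print(PagingLayer(LRUPolicy()).evict(pages))      # -> page with id 2
```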
Specification of parallel and distributed systems is difficult because it involves describing a two-dimensional relationship between the flow of control of many processes and the synchronized flow of data between those processes. Many researchers have turned to graphical representations for such specifications; perhaps one of the most well-known techniques involves Petri nets. However, parallel and distributed systems are often dynamic. The components' distribution and degree of parallelism vary with time. This is often difficult to express in existing graphical notations. The Actor approach has tried to capture such changing behavior (in a textual form) but can lead to system specifications that lack structure and are therefore prone to error. In this paper, we introduce a new graph-grammar-based notation for specifying concurrent and distributed systems. The notation has a formal algebraic basis which both defines the meanings of graphs and forms the foundation for verifying transformations from specifications to code. Specifications show no implementation bias and are therefore independent of many low-level issues which clutter up conventional notations. The use of graphics greatly simplifies the tasks of writing and understanding specifications of concurrent systems.
Genetics has proven to be a powerful approach in neurodegenerative disease research, resulting in the identification of numerous causal and risk variants. Previously, we introduced the NeuroX Illumina genotyping array, a fast and efficient genotyping platform designed for the investigation of genetic variation in neurodegenerative diseases. Here, we present its updated version, named NeuroChip. The NeuroChip is a low-cost, custom-designed array containing a tagging variant backbone of about 306,670 variants, complemented with manually curated custom content comprising 179,467 variants implicated in diverse neurological diseases, including Alzheimer's disease, Parkinson's disease, Lewy body dementia, amyotrophic lateral sclerosis, frontotemporal dementia, progressive supranuclear palsy, corticobasal degeneration, and multiple system atrophy. The tagging backbone was chosen because of its low cost and good genome-wide resolution; the custom content can be combined with oth...
An Object-Oriented Implementation of Distributed Virtual Memory. Gary Johnston and Roy Campbell, October 5, 1989. Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave., Urbana, IL 61801-2987. (Slide presentation; only the title page and outline are recoverable.)
This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator...
Currently, there is substantial availability of low-bandwidth wireless networks which can be used to enhance mobile computing. However, to get the same computing service as on wireline networks, one needs greater bandwidth at reasonable prices. For the foreseeable future, wide-area wireless networks are not expected to have bandwidth comparable to wireline networks or to become much cheaper. This necessitates software techniques to provide users of laptop computers with (i) quick response time, and (ii) quick consistency with the backbone network. The former has been addressed by research in optimistically replicated filesystems, while the latter problem is unsolved. This study substantially reduces the communication requirement needed to achieve quick consistency. As a result, wide-area wireless networks and low-bandwidth telephone modems can be used for achieving quick synchronization with the backbone network. Using trace-driven simulation and a user-level implementation, a preliminary validation of the proposed methods has been done. We postulate that by looking at the events which occur during the execution of processes, it is possible to attack the problems of bandwidth reduction and storage management simultaneously. Finally, we conjecture that automated operating system techniques can be used to substantially reduce the bandwidth needed for downloading digital documents. A full-blown implementation remains to be done.

And 459 more publications.