
    Seth C Goldstein

    International audience. Scalable distributed computing with distributed intelligent MEMS
    Nanoelectronics presents the opportunity of incorporating billions of devices into a single system. Its opportunity is also its challenge: the economic design, verification, manufacturing, and testing of billion component systems. In this presentation I will explore how the abstractions used in computer systems change as we approach nanoscale dimensions.
    Motion planning for a self-reconfigurable robot involves coordinating the movement and connectivity of each of its homogeneous modules. Reconfiguration occurs when the shape of the robot changes from some initial configuration to a target configuration. Finding an optimal solution to reconfiguration problems involves searching the space of possible robot configurations. As this space grows exponentially with the number of modules, optimal planning becomes intractable. We propose a hierarchical planning approach that computes heuristic global reconfiguration strategies efficiently. Our approach consists of a base planner that computes an optimal solution for a few modules and a hierarchical planner that calls this base planner or reuses pre-computed plans at each level of the hierarchy to ultimately compute a global suboptimal solution. We present results from a prototype implementation of the method that efficiently plans for self-reconfigurable robots with several thousan...
    We present an improvement to the simultaneous heuristic allocator component of the global progressive register allocator described in our previous work [Koes06]. Our improved allocator decomposes the control flow graph into linear traces which are allocated in the same manner as a single basic block. We investigate two methods for handling the control flow within the traces, both of which produce better-quality allocations than the simultaneous heuristic allocator.
    Declarative programming in the style of functional and logic programming has been hailed as an alternative parallel programming style where computer programs are automatically parallelized without programmer control. Although this approach removes many pitfalls of explicit parallel programming, it hides important information about the underlying parallel architecture that could be used to improve the scalability and efficiency of programs. In this paper, we present a novel programming model that allows the programmer to reason about thread state in data-driven declarative programs. This abstraction has been implemented on top of Linear Meld, a linear logic programming language that is designed for writing graph-based programs. We present several programs that show the flavor of our new programming model, including graph algorithms and a machine learning algorithm. Our goal is to show that it is possible to take advantage of architectural details without losing the key advantages of l...
    Modular robots form autonomous distributed systems in which modules use communications to coordinate their activities in order to achieve common goals. The complexity of distributed algorithms is generally expressed as a function of network properties, e.g., the number of nodes, the number of links and the radius/diameter of the system. In this paper, we characterize the networks of some lattice-based modular robots which use only neighbor-to-neighbor communications. We demonstrate that they form sparse and large-diameter networks. Additionally, we provide tight bounds for the radius and the diameter of these networks. We also show that, because of the huge diameter and the huge average distance of massive-scale lattice-based networks, complex distributed algorithms for programmable matter pose a significant design challenge. Indeed, communications over a large number of hops cause, for instance, latency and reliability issues.
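    The radius and diameter mentioned in this abstract can be illustrated with a small centralized computation. The sketch below is not taken from the paper: it assumes a hypothetical rectangular 2D square lattice in which each module talks only to its four face neighbors, and it uses breadth-first search to obtain every module's eccentricity (its largest hop distance to any other module). The function names are illustrative.

```python
from collections import deque


def grid_neighbors(node, width, height):
    """Yield the 4-connected neighbors of a module in a width x height lattice."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def eccentricity(source, width, height):
    """Return the largest hop distance from source to any module (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in grid_neighbors(node, width, height):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def radius_and_diameter(width, height):
    """Radius is the smallest eccentricity, diameter the largest."""
    eccs = [
        eccentricity((x, y), width, height)
        for x in range(width)
        for y in range(height)
    ]
    return min(eccs), max(eccs)


if __name__ == "__main__":
    for side in (5, 10, 20):
        r, d = radius_and_diameter(side, side)
        print(f"{side * side} modules: radius={r} hops, diameter={d} hops")
```

    In this assumed square lattice the diameter grows with the side length, that is, with the square root of the module count, which gives a feel for why neighbor-to-neighbor messaging becomes slow in large ensembles.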
    Many distributed algorithms require a specific role to be played by a leader, a single node in the system. The choice of this node often has a direct impact on the performance. In particular, selecting a central node as the leader can significantly improve algorithm efficiency. Classical distributed algorithms require global information about the connectivity network to elect a centroid node. Thus, they are not suitable for large-scale distributed embedded systems with scarce computation, memory and energy resources. We present E2ACE, an Effective and Efficient Approximate-Centroid Election algorithm that uses O(1) memory space per node, O(d) time and O(mn^2) messages of size O(1), where n is the number of nodes, m the number of connections and d the diameter of the system. We evaluate our algorithm on the Blinky Blocks system using simulations. Experimental results show that E2ACE scales well in terms of accuracy, execution time and number of messages. We show that E2ACE is more accurate than the only existing algorithm with similar complexity results.
    In this paper, we propose the Modular Robot Time Protocol (MRTP), a network-wide time synchronization protocol for modular robots. Our protocol achieves its performance by combining several mechanisms: central time master election, low-level time-stamping and clock skew compensation using linear regression. We evaluate our protocol on the Blinky Blocks hardware. Experimental results show that MRTP can potentially manage real systems composed of up to 27,775 Blinky Blocks. We observe that the synchronization precision depends on the hardware, the hop distance to the time master, the synchronization periods and the number of synchronization points used for the linear regressions. Furthermore, we show that our protocol is able to keep a Blinky Blocks system synchronized to a few milliseconds, using few network resources at runtime, even though the Blinky Blocks hardware clocks exhibit very poor accuracy and resolution.
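    Clock skew compensation by linear regression can be sketched as follows. This is a generic least-squares illustration, not MRTP's actual implementation: it assumes each module keeps a short list of synchronization points, each pairing a local clock reading with the master time received at that instant, and fits master ≈ skew × local + offset. All names here are hypothetical.

```python
def fit_clock_model(sync_points):
    """Least-squares fit of master_time = skew * local_time + offset.

    sync_points is a list of (local_time, master_time) pairs; at least two
    points with distinct local times are needed.
    """
    n = len(sync_points)
    if n < 2:
        raise ValueError("need at least two synchronization points")
    mean_local = sum(local for local, _ in sync_points) / n
    mean_master = sum(master for _, master in sync_points) / n
    variance = sum((local - mean_local) ** 2 for local, _ in sync_points)
    if variance == 0:
        raise ValueError("local timestamps must not all be identical")
    covariance = sum(
        (local - mean_local) * (master - mean_master)
        for local, master in sync_points
    )
    skew = covariance / variance
    offset = mean_master - skew * mean_local
    return skew, offset


def estimate_master_time(local_time, skew, offset):
    """Translate a local clock reading into estimated master time."""
    return skew * local_time + offset


if __name__ == "__main__":
    # Made-up readings in milliseconds: the local clock runs about 1% slow.
    points = [(0, 50), (1000, 1060), (2000, 2070)]
    skew, offset = fit_clock_model(points)
    print(estimate_master_time(3000, skew, offset))
```

    Using more synchronization points smooths out time-stamping noise at the cost of memory, which is consistent with the abstract's remark that precision depends on the number of points used in the regression.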
    Modular robots are composed of many independent connected modules which are able to achieve common goals through communications. Many distributed algorithms perform better if the modules that have to communicate with all the others are placed at the center of the system. In this paper, we propose ABC-Center, an iterative algorithm for electing an approximate-center module in modular robots. ABC-Center uses O(1) space per module and O(kd) time, where k is the number of iterations required to terminate and d the diameter of the system. We evaluated our algorithm both on hardware modular robots and in a simulator for large ensembles of robots. The average expected eccentricity of the module elected by ABC-Center is less than 1.25 blocks off for random systems composed of up to 1000 modules. Furthermore, experiments show that our algorithm terminates after a few iterations. Hence, ABC-Center is scalable and adapted to modular robots with low memory resources.
    Speedups of coupled processor-FPGA systems over traditional microprocessor systems are limited by the cost of hardware reconfiguration. In this paper we compare several new configuration caching algorithms that reduce the latency of reconfiguration. We also present a cache replacement strategy for a 3-level hierarchy. Using the techniques we present, total latency for loading the configurations is reduced, lowering the configurable overhead.
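    To make the idea of configuration caching concrete, here is a minimal sketch of a single-level cache with least-recently-used (LRU) replacement. LRU is a stand-in baseline chosen for illustration; the abstract does not say which algorithms the paper compares, and the class name, the fixed per-load latency, and the capacity model are all assumptions.

```python
from collections import OrderedDict


class LRUConfigCache:
    """Toy model of on-chip configuration storage with LRU replacement."""

    def __init__(self, capacity, load_latency):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity          # number of configurations that fit
        self.load_latency = load_latency  # assumed fixed cost of one load
        self.total_latency = 0
        self._resident = OrderedDict()    # oldest entry first

    def request(self, config_id):
        """Return the latency paid to make config_id resident."""
        if config_id in self._resident:
            # Hit: mark as most recently used, no reconfiguration cost.
            self._resident.move_to_end(config_id)
            return 0
        if len(self._resident) >= self.capacity:
            # Miss with a full cache: evict the least recently used entry.
            self._resident.popitem(last=False)
        self._resident[config_id] = True
        self.total_latency += self.load_latency
        return self.load_latency


if __name__ == "__main__":
    cache = LRUConfigCache(capacity=2, load_latency=10)
    for config in ["A", "B", "A", "C", "B"]:
        cache.request(config)
    print(cache.total_latency)
```

    Comparing `total_latency` across replacement policies on the same request trace is one simple way to evaluate such algorithms.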
    We present an improvement to the simultaneous heuristic allocator component of the global progressive register allocator described in our previous work [Koes06]. Our improved allocator decomposes the control flow graph into linear traces which are allocated in the same manner as a single basic block. We investigate two methods for handling the control flow within the traces, both of which produce better-quality allocations than the simultaneous heuristic allocator. KEYWORDS: Register Allocation; Progressive Solver
    This paper presents a methodology for improving the speed of high-speed adders. As a starting point, a previously proposed method, called “speculative completion,” is used in which fast-terminating additions are automatically detected. Unlike the previous design, the method proposed in this paper is able to adapt dynamically to (1) application-specific behavior and (2) adder-specific behavior, resulting in a higher detection rate of fast additions and, consequently, a faster average-case speed for addition. Our experimental results show detection rates of over 99%, and adder average-case speed improvements of up to 14.8%.
    We examine how modular robots can be used to enable remote robotic construction of planetary and orbital outposts. Each modular robot, called a catom, contains sufficient actuation, adhesion, control, and power to allow it to function as part of an ensemble of similar units. We describe the catom design and construction as well as initial experiments carried out to verify the system.
    In this paper we introduce a reconfigurable architecture based on chemically assembled electronic nanotechnology (CAEN). CAEN is a promising alternative to CMOS that takes advantage of chemical synthesis techniques to combine molecular-scale circuit elements (such as resistors, diodes, and reconfigurable switches) using directed self-assembly. A serious drawback to CAEN is the inability (in the near-term) to include three-terminal devices (e.g., transistors) in circuits. Here, we present a molecular latch, based on molecular resonant tunneling diodes (RTDs) [6], which provides the most important benefits of the lithographically fabricated transistor: voltage restoration, fan-out, tolerance to manufacturing variability, and I/O isolation. To make CAEN circuits economically viable they must
    this paper we outline the computational cache architecture and execution model. While our current effort is motivated towards creating scalable silicon-based systems we believe the core results of our research -- the algorithms and techniques for instruction placement, node design, computation and communication will be of interest to the nanoscale device research community. While our two domains (silicon and non-silicon) are currently orders of magnitude apart in terms of scale, they are approaching each other. This raises many exciting research possibilities for both disciplines. The first part of this paper (Sections 2 and 3) will outline the silicon-based architecture we are currently designing. Section 4 discusses the convergence of scaling trends between silicon and non-silicon devices and how the computation cache architecture applies to both. We follow this with Section 5 which describes some of the modifications necessary for molecular level implementation. Next in Section 6...
    In the Claytronics project, we have used Meld, a logic programming language suitable for writing scalable and concise distributed programs for ensembles. Meld allows declarative code to be compiled into distributed code that can be executed in thousands of computing units. We are now using Meld to program more traditional algorithms that run on multicore machines. We made several modifications to the core language, to the compiler and to the runtime system to take advantage of the characteristics of the target architecture. Our experimental results show that the new compiler and runtime system are capable of exploiting implicit parallelism in programs such as graph algorithms, neural networks and belief propagation algorithms.
    Fault tolerance is becoming an increasingly important issue, especially in mission-critical applications where data integrity is a paramount concern. Performance, however, remains a large driving force in the marketplace. Runtime reconfigurable hardware architectures have the power to balance fault tolerance with performance, allowing the amount of fault tolerance to be tuned at run-time. This paper describes a new built-in self-test designed to run on, and take advantage of, runtime reconfigurable architectures, using the PipeRench architecture as a model. In addition, this paper introduces a new metric by which a user can set the desired fault tolerance of a runtime reconfigurable device.
    Distributed reconfiguration is an important problem in multi-robot systems such as mobile sensor nets and metamorphic robot systems. In this work, we present a scalable distributed reconfiguration algorithm, Hierarchical Median Decomposition, to achieve arbitrary target configurations. Our algorithm is built on top of a novel distributed median consensus estimator. The algorithms presented are fully distributed and do not require global communication. We show results from simulations in an open source multi-robot simulator.
    Claytronics is a form of programmable matter that takes the concept of modular robots to a new extreme. The concept of modular robots has been around for some time. (See [14] for a survey.) Previous approaches to modular robotics sought to create an ensemble of tens or even hundreds of small autonomous robots which could, through coordination, achieve a global effect not possible by any single unit. In general the goal of these projects was to adapt to the environment to facilitate, for example, improved locomotion. Our work on claytronics departs from previous work in several important ways. One of the primary goals of claytronics is to form the basis for a new media type, pario. Pario, a logical extension of audio and video, is a media type used to reproduce moving 3D objects in the real world. A direct result of our goal is that claytronics must scale to millions of micron-scale units. Having scaling (both in number and size) as a primary design goal impacts the work significantly...
    This research was sponsored by the National Aeronautics and Space Administration (NASA) under grant nos. NAG2-1230 and NAG2-6054, and by a generous donation from the Intel Corporation. The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any sponsoring party or the U.S. Government.
    This paper presents Meld, a programming language for modular robots, i.e., for independently executing robots where inter-robot communication is limited to immediate neighbors. Meld is a declarative language, based on P2, a logic-programming language originally designed for programming overlay networks. By using logic programming, the code for an ensemble of robots can be written from a global perspective, as opposed to a large collection of independent robot views. This greatly simplifies the thought process needed for programming large ensembles. Initial experience shows that this also leads to a considerable reduction in code size and complexity. An initial implementation of Meld has been completed and has been used to demonstrate its effectiveness in the Claytronics simulator. Early results indicate that Meld programs are considerably more concise (more than 20x shorter) than programs written in C++, while running nearly as efficiently.
    Speedups of coupled processor-FPGA systems over traditional microprocessor systems are limited by the cost of hardware reconfiguration. In this paper we compare several new configuration caching algorithms that reduce the latency of reconfiguration. We also present a cache replacement strategy for a 3-level hierarchy. Using the techniques we present, total latency for loading the configurations is reduced, lowering the configurable overhead.
    We present a high-level language for programming modular robotic systems, based on locally distributed predicates (LDP), which are distributed conditions that hold for a connected subensemble of the robotic system. An LDP program is a collection of LDPs with associated actions which are triggered on any subensemble that matches the predicate. The result is a reactive programming language which efficiently and concisely supports ensemble-level programming. We demonstrate the utility of LDP by implementing three common, but diverse, modular robotic tasks.
    Internal localization, the problem of estimating relative pose for each module (part) of a modular robot, is a prerequisite for many shape control, locomotion, and actuation algorithms. In this paper, we propose a robust hierarchical approach that uses normalized cut to identify dense subregions with small mutual localization error, then progressively merges those subregions to localize the entire ensemble. Our method works well in both 2D and 3D, and requires neither exact measurements nor rigid inter-module connectors. Most of the computations in our method can be effectively distributed. The result is a robust algorithm that scales to large, non-homogeneous ensembles. We evaluate our algorithm in accurate 2D and 3D simulations of scenarios with up to 10,000 modules.
    Optical photolithography techniques are approaching physical and economic limits that will drastically reduce the current scaling rate of device miniaturization. New technologies being investigated include Next Generation Lithography (NGL) and Chemically Assembled Electronic Nanotechnology (CAEN), which hold the promise of extremely high densities and sub-10nm feature sizes. However, these technologies are likely to have significantly higher defect densities than current ones. This is especially true for CAEN-based devices: we expect the very nature of chemical fabrication to result in defect densities of as much as 10%. Such high defect densities require a completely new approach to manufacturing computational devices: since every chip is expected to have multiple defects, it will no longer be possible to test them and throw defective ones away. Instead, we will have to devise a way to use defective chips. A natural solution is suggested by reconfigurable fabrics, i.e
    This paper presents Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In this paper, the authors present Pegasus (Predicated Explicit GAted Simple Uniform SSA), a new intermediate representation (IR) that makes explicit -- in a single representation -- the control flow, the data flow, and the synchronization of operations that interfere through side-effects. More importantly, Pegasus has a clean semantics, independent of the target architecture. This enables its use to bridge the gap between C programs and hardware implementations, enabling the conversion from an imperative, single-threaded model of computation to a highly parallel, asynchronous, explicitly synchronized target. Pegasus combines predicated static single assignment (SSA) representation and gated SSA, making explicit the switching of data values, enabling the compil...
    Chemically assembled electronic nanotechnology (CAEN) is a promising alternative to CMOS-based computing. However, CAEN-based circuits are expected to have huge defect densities. To solve this problem CAEN can be used to build reconfigurable fabrics which, assuming the defects can be found, are inherently defect tolerant. In this paper, we propose a scalable testing methodology for finding defects in reconfigurable devices.
