
    Jose A Gregorio

    ABSTRACT
    Petri Net Modeling of Interconnection Networks for Massively Parallel Architectures. J. A. Gregorio, F. Vallejo, R. Beivide and C. Carrion, Departamento de Electrónica, Universidad de Cantabria, 39005 Santander, Spain. E-mail: ja@ctrhp3.unican.es Abstract. ...
    The bubble algorithm evaluated in this paper assures message deadlock freedom in k-ary n-cube networks without using virtual channels. This algorithm is based both on dimension order routing (DOR) and on a restricted injection policy extended to the dimension changes. An exhaustive comparison between the bubble mechanism and the classical deterministic virtual channels solution is presented here. For that purpose, the message routers of both proposals have been designed by using VHDL descriptions and the Synopsys VLSI CAD tool. Additionally, formal models of the routers, based on colored Petri nets, have been developed together with simulation techniques in order to validate the results and shorten the design cycle. The performance evaluation of n-dimensional tori highlights the benefits of the bubble algorithm, as both the temporal delay and the necessary silicon area of the message router are reduced.
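    The two ingredients named in the abstract above can be illustrated with a minimal sketch. It is not the router design from the paper: the function names are hypothetical, and the injection rule assumes the commonly described form of the bubble restriction, in which a packet entering a ring (a new injection or a dimension change) needs two free packet-sized buffer slots, while a packet continuing along the same ring needs one.

```python
def dor_next_hop(cur, dst, k):
    """Next hop under dimension-order routing in a k-ary n-cube (torus).

    Returns (dimension, direction) with direction in {+1, -1},
    or None when cur already equals dst. Dimensions are corrected
    in increasing order; the shorter way around each ring is taken.
    """
    for dim, (c, d) in enumerate(zip(cur, dst)):
        if c != d:
            forward = (d - c) % k  # hops needed in the +1 direction
            direction = 1 if forward <= k - forward else -1
            return dim, direction
    return None


def may_advance(free_slots, entering_ring):
    """Assumed bubble condition, counted in packet-sized buffer slots.

    A packet entering a ring must leave one free slot (the "bubble")
    behind, so it needs two free slots; a packet already travelling
    along the ring needs only one.
    """
    required = 2 if entering_ring else 1
    return free_slots >= required


def route(src, dst, k):
    """Full DOR path from src to dst as a list of node coordinates."""
    path = [tuple(src)]
    cur = list(src)
    while (hop := dor_next_hop(cur, dst, k)) is not None:
        dim, direction = hop
        cur[dim] = (cur[dim] + direction) % k
        path.append(tuple(cur))
    return path
```

    For example, `route((0, 0), (3, 1), 4)` first wraps around dimension 0 in the negative direction and then corrects dimension 1, so the single dimension change is the only point along the path where the stricter two-slot condition would apply.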
    The analysis, design and evaluation of the interconnection subsystem for massively parallel architectures is normally carried out using computer simulation tools, which incurs high computational costs. Moreover, in some cases, these simulation processes show serious difficulties when both experiments and results have to be reproduced by other research or design teams. This work shows the suitability of formal representation methods, like DSPN (stochastic Petri nets with deterministic and exponential firing times), for the description of the message routers, focusing on two important features. The first is the possibility of obtaining network performance indicators through the simulation of the obtained models with a lower computational cost than using conventional techniques; in some cases, analytical results can also be obtained. The second is making the basic parameters of the network design relatively independent of the router implementation features, thus simplifying the method of establishing the behavior of new router structures. This approach has been successfully applied to the analysis of both symmetrical torus and asymmetrical mesh interconnection topologies, with virtual cut-through flow control, oblivious routing and random traffic. It should be noted that most modern parallel computers employ a local buffer space big enough to store at least a complete packet. Two different functional router structures have been used in each case: transit buffers located at the input or at the output router links.
    A comparative study is presented of the effect of slew-rate-induced distortion on the main single-amplifier biquadratic stages. Evidence is presented to show how only one of those stages is free of regenerative phenomena for all input conditions. A normalized comparison criterion is defined that can be used to establish the differences between the various biquadratic stages independently of the input-output transfer functions implemented in each filter. This criterion can be used to obtain the precise operating conditions that will guarantee a regenerative-phenomena-free linear response in each of the different structures.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 18, NO. 1, JANUARY 1992, p. 55. Performance Evaluation of Parallel Systems by Using Unbounded Generalized Stochastic Petri Nets. Mercedes Granda, José M. Drake, and José A. Gregorio ...
    ... When the number of processors is higher, the complete lack of scalability of the bus bandwidth makes it necessary to use interconnection networks ... astar hmmer lbm bt cg ft is lu mg sp ua apache jbb zeus blacks cann fluid swapt AVG ... [15] H. Jin, M. Frumkin, and J. Yan, “The ...
    Towards a Shared/Private Non-Uniform Cache Architecture in CMP Systems. Javier Merino∗,1, Valentín Puente∗,1, Pablo Prieto∗,1, José Ángel Gregorio∗,1 ∗ Grupo de Arquitectura y Tecnología de Computadores, Universidad de Cantabria. ABSTRACT ...
    ... “Energy Scalability of On-Chip Interconnection Networks in Multicore Architectures”, MIT CSAIL Technical Report, November, 2007 [11] PS Magnusson, M. Christensson, J. Eskilson, D. Forsgren, F. Larsson, A. Moestedt, B. Werner, “Simics: A Full System Simulation Platform”. ...
    In this paper we develop a new and generic theory about the necessary and sufficient conditions for deadlock-free routing in interconnection networks. An extension of the channel dependency graph described by Dally is defined: the channel dynamic dependency graph. The main achievement of this new concept is a consequence of introducing the concept of time and the flow control function into its definition. Our theory remains valid for different routing and flow control functions, showing that even if the conditions of Duato’s theorem are not fulfilled, the network can be deadlock-free. Index Terms: Multicomputer networks, deadlock, flow control, routing
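    As background for the abstract above, the classical static check on a channel dependency graph can be sketched as a cycle search. This is only the conventional starting point, not the channel dynamic dependency graph proposed in the paper, which additionally accounts for time and flow control. The graph encoding and the name `has_cycle` are assumptions made for illustration.

```python
def has_cycle(deps):
    """Detect a cycle in a static channel dependency graph.

    `deps` maps each channel to the channels a packet holding it may
    request next under the routing function. An acyclic graph is the
    classical condition for deadlock freedom; a cycle here does not by
    itself prove deadlock once flow control is taken into account.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(channel):
        color[channel] = GREY
        for nxt in deps.get(channel, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: dependency loop found
                return True
            if state == WHITE and visit(nxt):
                return True
        color[channel] = BLACK
        return False

    return any(color.get(ch, WHITE) == WHITE and visit(ch) for ch in deps)


# A unidirectional 4-node ring: each channel depends on the next one.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
# The same channels without the wrap-around dependency.
chain = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}
```

    Here `has_cycle(ring)` is true and `has_cycle(chain)` is false. The ring case is exactly the kind of static cycle that a purely graph-based condition rejects, even though a suitable flow control function may keep the network deadlock-free.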
    The impact of any new architectural proposal must be evaluated under realistic working conditions. This class of analysis requires trustworthy simulation tools and representative workloads that allow us to know the real effectiveness of the improvement. We propose a methodology that enables the use of an important family of transactional workloads, such as decision support system workloads, in a full system simulator. In contrast to numerical applications, with this type of workload it is not possible to scale down the problem size in order to reduce the computational requirements of the simulation. We will show the stationary behaviour of the workload and how it can be employed to reduce computational requirements without significant loss. Taking this fact into account, we will show that when only 3% of the benchmark is simulated, the maximum error in the main system performance metrics is approximately 5%.
    The trend towards increasing the number of processor cores and cache capacity in future Chip-Multiprocessors (CMPs) will require scalable packet-switched interconnection networks adapted to the restrictions imposed by the CMP environment. This paper presents an innovative router design, which successfully addresses CMP cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate either clockwise or anti-clockwise, traveling through every port of the router. It uses a completely decentralized scheduling scheme, which allows the design to: (1) take advantage of wide links, (2) reduce Head of Line blocking, (3) use adaptive routing, (4) be topology agnostic, (5) scale with network degree, and (6) have reasonable power consumption and implementation cost. A thorough comparative performance analysis against competitive conventional routers shows an advantage for our proposal of up to 50% in terms of raw performance and nearly 60...
    This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces a feasible way to allocate cache blocks according to the access pattern. Each L2 bank is dynamically partitioned at set level in private and shared content. Simply by adjusting the replacement algorithm, we can place private data closer to its owner processor. In contrast, independently of the accessing processor, shared data is always placed in the same position. This approach is capable of reducing on-chip latency without significantly sacrificing hit rates or increasing implementation cost of a conventional static NUCA. Additionally, most of the unnecessary interference between cores in private accesses is removed. To support the architectural decisions adopted and provide a comparative study, a comprehensive evaluation framework is employed. The workbench is composed of a full system simulator, and a rep...
    Abstract—In this work we have analyzed some possibilities for accelerating coherence protocols by using the knowledge that the interconnection network has, or can have, about the messages circulating through multiprocessor systems. The number of hops of the messages needed to maintain coherence has been determined using a simulation environment based on the
