To meet the various requirements of cloud computing users, research on guaranteeing Quality of Service (QoS) is gaining widespread attention in the field of cloud computing. However, because cloud computing platforms adopt virtualization as an enabling technology, it becomes challenging to distribute system resources to each user according to their diverse requirements. Although ample research has been conducted to meet QoS requirements, the proposed solutions lack simultaneous support for multiple policies, degrade the aggregated throughput of network resources, and incur CPU overhead. In this paper, we propose a new mechanism, called ANCS (Advanced Network Credit Scheduler), to guarantee QoS through dynamic allocation of network resources in virtualization. To meet the various network demands of cloud users, ANCS aims to concurrently provide multiple performance policies; these include weight-based proportional sharing, minimum bandwidth reservation, and maximum bandwidth limitation.
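As a rough illustration of how these three policies can coexist in one credit-based allocator, the following sketch distributes a period's bandwidth budget by weight and then clamps each share to its reservation and cap. All names, the per-period model, and the VM parameters are hypothetical; ANCS's actual credit accounting inside the hypervisor is more involved.

```python
# Hypothetical per-period credit allocation combining weight-based
# proportional sharing with minimum reservations and maximum caps.

def allocate_credits(vms, total_bw):
    """vms: list of dicts with 'weight', 'min_bw', 'max_bw' (max_bw may be None).
    Returns per-VM bandwidth shares for one accounting period."""
    total_weight = sum(vm['weight'] for vm in vms)
    shares = {}
    for vm in vms:
        share = total_bw * vm['weight'] / total_weight   # proportional share
        share = max(share, vm['min_bw'])                 # enforce reservation
        if vm['max_bw'] is not None:
            share = min(share, vm['max_bw'])             # enforce cap
        shares[vm['name']] = share
    return shares

vms = [
    {'name': 'vm1', 'weight': 2, 'min_bw': 100, 'max_bw': None},
    {'name': 'vm2', 'weight': 1, 'min_bw': 100, 'max_bw': 300},
    {'name': 'vm3', 'weight': 1, 'min_bw': 200, 'max_bw': None},
]
print(allocate_credits(vms, 1000))  # {'vm1': 500.0, 'vm2': 250.0, 'vm3': 250.0}
```

Note that naive clamping can over- or under-commit the budget; a real scheduler would redistribute leftover credits across VMs and periods.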
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015
Enterprise servers require customized solid-state drives (SSDs) to satisfy their specialized I/O performance and reliability requirements. For effective use of SSDs for enterprise purposes, SSDs must be designed with requirements such as performance, lifetime, and cost constraints in mind. However, SSDs have numerous hardware and software design options, such as flash memory types and block allocation methods, which have not been well analyzed yet but on which SSD performance depends. Furthermore, there is no methodology for determining the optimal design for a particular I/O workload. This paper proposes SSD-Tailor, a customization tool for SSDs. SSD-Tailor determines a near-optimal set of design options for a given workload. SSD designers can use SSD-Tailor in the early design stage to customize SSDs to meet customer requirements. We evaluate SSD-Tailor with nine I/O workload traces collected from real-world enterprise servers. We observe that SSD-Tailor finds near-optimal SSD designs for these workloads by exploring only about 1% of the entire set of design candidates. We also show that the near-optimal designs increase the average I/O operations per second by up to 17% and decrease the average response time by up to 163% compared to an SSD with a general design.
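The abstract does not detail the search strategy, so the sketch below only illustrates the general idea of finding a near-optimal design while evaluating a small fraction of a discrete design space. The option lists and the stand-in fitness function are invented for the example.

```python
# Illustrative exploration of a discrete SSD design space that evaluates
# only a small slice of the full cartesian product of options.
import itertools, random

OPTIONS = {
    'flash_type': ['SLC', 'MLC', 'TLC'],
    'allocation': ['static', 'dynamic'],
    'over_prov':  [0.07, 0.15, 0.28],
    'channels':   [4, 8, 16],
}

def evaluate(design):
    # Stand-in for trace-driven simulation of IOPS / response time.
    rng = random.Random(hash(tuple(sorted(design.items()))))
    return rng.random()

def hill_climb(steps=40):
    keys = list(OPTIONS)
    current = {k: random.choice(OPTIONS[k]) for k in keys}
    best = evaluate(current)
    for _ in range(steps):              # evaluates ~steps designs in total
        k = random.choice(keys)
        cand = dict(current)
        cand[k] = random.choice(OPTIONS[k])
        score = evaluate(cand)
        if score > best:
            current, best = cand, score
    return current, best

total = len(list(itertools.product(*OPTIONS.values())))  # 54 designs here
print(hill_climb(), 'of', total, 'candidates')
```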
2015 IEEE International Conference on Consumer Electronics (ICCE), 2015
Solid state drives (SSDs) are becoming increasingly popular in computing environments that employ virtualization. The Xen hypervisor is the most popular hypervisor for virtualization. In this paper, we discuss the performance of SSDs on the Xen hypervisor, show that the Xen I/O model causes performance degradation in SSDs, and analyze the reasons for the degradation. We then propose a new I/O processing method, clustering of fragmented I/O requests, which improves the I/O performance of SSDs. The evaluations indicate that our proposed method improves I/O performance by as much as 7%.
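A minimal sketch of the clustering idea, assuming requests are represented as (start sector, length) pairs: fragmented requests whose sector ranges are adjacent are merged into one larger request before being issued to the device.

```python
# Merge adjacent sector ranges that were queued as separate requests.

def cluster_requests(reqs):
    """reqs: list of (start_sector, num_sectors) tuples.
    Returns merged requests covering contiguous ranges."""
    merged = []
    for start, length in sorted(reqs):
        if merged and merged[-1][0] + merged[-1][1] == start:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)  # extend
        else:
            merged.append((start, length))
    return merged

frags = [(100, 8), (108, 8), (116, 8), (300, 16)]
print(cluster_requests(frags))  # [(100, 24), (300, 16)]
```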
2014 IEEE International Conference on Consumer Electronics (ICCE), 2014
In this paper, we propose the intelligent Rate Determinate Algorithm (iRDA) for the content centric network (CCN) to provide a seamless video streaming service under fluctuating network bandwidth. iRDA includes a method that restarts the bit-rate selection of the current segment when its deadline crosses a threshold value. Compared to the standard, Dynamic Adaptive Streaming over HTTP (DASH), iRDA reduces the amount of freeze time.
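The following sketch shows one plausible reading of that mechanism, assuming a DASH-like client: pick the highest bit-rate that fits within the segment duration at the estimated bandwidth, and restart the selection when the remaining time before the segment deadline drops below a threshold. The parameter names and threshold policy are illustrative, not iRDA's actual definitions.

```python
# Hedged sketch of deadline-driven bit-rate (re)selection.

def select_bitrate(bitrates, seg_duration, bw_estimate):
    """Highest bit-rate (bps) downloadable within one segment duration."""
    feasible = [b for b in bitrates
                if b * seg_duration / bw_estimate <= seg_duration]
    return max(feasible) if feasible else min(bitrates)

def maybe_restart(bitrate, seg_duration, bytes_left, bw_estimate,
                  time_left, threshold=0.2):
    """Restart selection if the download cannot finish before the deadline."""
    eta = bytes_left * 8 / bw_estimate           # seconds to finish download
    if time_left - eta < threshold * seg_duration:
        return select_bitrate([400e3, 800e3, 1500e3, 3000e3],
                              seg_duration, bw_estimate)
    return bitrate

print(select_bitrate([400e3, 800e3, 1500e3, 3000e3], 4, 1.2e6))  # 800000.0
```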
To solve the above problems, we propose an autonomic defragmentation file system that does not degrade system performance. First, we design the Automatic Layout Scoring (ALS) system, which measures the fragmentation ratio of files. The ALS counts the number of contiguous blocks in a file when the blocks are allocated for the file, allowing us to recognize fragmented files. After detection, we search for free, contiguous blocks with which to defragment the file. If the search succeeds, we copy the file at idle time. To reduce the effect of the additional I/O operations, we track the idle time of the hard disk drive that contains the file. When idle time is found, we copy a small portion of the file. Because our goal is to minimize the effect of the additional I/O, we divide the copying process into several pieces. The size of a piece equals the maximum size of one I/O request for the block device, for example, 64 KB. When the file copy is completed, we edit the block in...
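A toy sketch of the two mechanisms described above, with hypothetical data structures: a layout score computed over a file's physical block numbers, and a generator that splits a copy into I/O-request-sized pieces.

```python
# (1) Layout score: fraction of adjacent block pairs that are contiguous.
# (2) Chunked copy: each step issues at most one block-device I/O request.

def layout_score(blocks):
    """blocks: physical block numbers in file order.
    Returns 1.0 for a fully contiguous file, lower when fragmented."""
    if len(blocks) < 2:
        return 1.0
    contiguous = sum(1 for a, b in zip(blocks, blocks[1:]) if b == a + 1)
    return contiguous / (len(blocks) - 1)

def copy_in_pieces(file_size, piece_size=64 * 1024):
    """Yield (offset, length) pieces sized to one I/O request."""
    for off in range(0, file_size, piece_size):
        yield off, min(piece_size, file_size - off)

print(layout_score([10, 11, 12, 40, 41]))    # 0.75
print(list(copy_in_pieces(200 * 1024))[:2])  # [(0, 65536), (65536, 65536)]
```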
It is well known that TCP shows performance degradation over wired/wireless networks, since TCP regards packet loss as network congestion. TCP Veno has successfully addressed this fundamental problem by proposing an end-to-end loss differentiation algorithm, which distinguishes the cause of packet loss by the number of packets in the router buffer. Unfortunately, Veno's algorithm shows very low accuracy on small-buffer routers, which have recently emerged as a new challenge for Internet routers. In small-buffer networks, routers can overflow easily in a short time even while Veno diagnoses wireless loss, and this leads to failure in loss differentiation. Furthermore, when congestion loss is misdiagnosed as wireless loss, Veno shows poor Reno-friendliness, since it can increase its sending rate by setting ssthresh to a larger value than Reno does. In this paper, we propose a more accurate loss differentiation algorithm for small-buffer heterogeneous networks. Our algorithm accurately distinguishes wireless from wired packet loss by newly defining the Congestive rate. Through extensive network simulations, we confirm that our new TCP Feno achieves not only higher accuracy but also better Reno-friendliness without losing performance efficiency.
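The abstract does not define the Congestive rate, so the sketch below shows only the classic Veno-style backlog estimate that such loss differentiation builds on: losses that arrive while the estimated router backlog is small are attributed to the wireless link.

```python
# Veno-style backlog estimate N and the resulting loss classification.

def backlog(cwnd, rtt, base_rtt):
    """Estimated packets queued at the bottleneck (Veno's N)."""
    expected = cwnd / base_rtt   # sending rate with an empty queue
    actual = cwnd / rtt          # currently observed rate
    return (expected - actual) * base_rtt

def classify_loss(cwnd, rtt, base_rtt, beta=3):
    """Loss seen with a small estimated backlog is treated as wireless."""
    return 'congestion' if backlog(cwnd, rtt, base_rtt) >= beta else 'wireless'

print(classify_loss(cwnd=32, rtt=0.12, base_rtt=0.10))   # congestion
print(classify_loss(cwnd=32, rtt=0.102, base_rtt=0.10))  # wireless
```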
This paper addresses the I/O latency issue within Xen-ARM. Although Xen-ARM's split driver provides reliable driver isolation, it requires additional inter-VM scheduling. Consequently, the credit scheduler within Xen-ARM results in unsatisfactory I/O latency for a real-time guest OS. This paper analyzes the I/O latency in Xen-ARM's interrupt path and proposes a new scheduler to bound I/O latency. Our scheduler dynamically assigns
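The abstract is cut off after "dynamically assigns", so the concrete policy is not stated here. The sketch below shows one plausible shape of such a latency-bounding scheduler, in which a VCPU with a pending interrupt is picked ahead of plain credit order; everything in it is an assumption for illustration.

```python
# Plausible (not the paper's) boost rule: run the interrupt target first.
from collections import deque

def pick_next(run_queue, pending_irq):
    """run_queue: deque of VCPU names in credit order.
    pending_irq: set of VCPUs with undelivered interrupts."""
    for v in list(run_queue):
        if v in pending_irq:       # boost: bounds time until the target runs
            run_queue.remove(v)
            return v
    return run_queue.popleft()     # otherwise plain credit order

rq = deque(['vm1', 'dom0', 'vm2'])
print(pick_next(rq, pending_irq={'dom0'}))  # dom0
print(pick_next(rq, pending_irq=set()))     # vm1
```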
Hybrid genetic algorithms are presented that use constrained heuristic search and genetic techniques for the timetabling problem (TP). The TP is an NP-hard problem for which no general polynomial-time deterministic algorithm is known. The paper describes the classification of constraints and the constraint ordering used to minimize backtracking and maximize parallelism. The school timetabling problem is discussed in detail as a case study. The genetic algorithm approach is particularly well suited to this kind of problem, since there exists an easy way to assess a good timetable but no well-structured automatic technique for constructing one. So, a population of timetables is created that evolves toward the best solution. The evaluation function and the genetic operators are well separated from the domain-specific parts, such as the knowledge of the problem and the heuristics, i.e., from the timetable builder. The present paper illustrates an approach based on the hybridization of constrained heuristic search with novel genetic algorithm techniques. It compares favourably with known programs that solve decision problems under logic constraints. The cost of the new algorithm and the quality of the solutions obtained in significant experiments are reported.
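For concreteness, here is a minimal GA skeleton on a toy timetabling instance. The paper's constraint classification and heuristic timetable builder are collapsed into a simple clash-counting fitness; the instance, operators, and parameters are all illustrative.

```python
# Toy GA: assign 8 lessons to 5 slots, penalizing teacher/class clashes.
import random

SLOTS, CLASSES = 5, 4
LESSONS = [(c, t) for c in range(CLASSES) for t in range(2)]  # (class, teacher)

def fitness(tt):
    """Negative clash count: same teacher or class twice in one slot."""
    clashes = 0
    for s in range(SLOTS):
        busy_t, busy_c = set(), set()
        for (c, t), slot in zip(LESSONS, tt):
            if slot != s:
                continue
            clashes += (t in busy_t) + (c in busy_c)
            busy_t.add(t); busy_c.add(c)
    return -clashes

def evolve(pop_size=30, gens=100):
    pop = [[random.randrange(SLOTS) for _ in LESSONS] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(len(LESSONS))
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < 0.2:                 # mutation
                child[random.randrange(len(LESSONS))] = random.randrange(SLOTS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))  # a timetable and its clash penalty (0 is ideal)
```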
Although extremely high-speed interconnects are available today, traditional protocol stacks such as TCP/IP and UDP/IP cannot utilize the maximum network bandwidth due to inherent overheads in the protocol stacks. Such overheads are a big obstacle for high-performance computing applications seeking to exploit high-speed interconnects in cluster environments. To address this issue, many researchers have presented analyses of protocol overheads and suggested a number of optimization approaches to harness the TCP/IP suite over high-speed interconnects. However, to the best of our knowledge, there is no study that analyzes and optimizes the protocol overheads thoroughly in an integrated manner. In this paper, we exploit a set of protocol optimization mechanisms in an integrated manner, covering the full spectrum of the protocol layers from the transport layer to the physical layer. To evaluate the impact of each protocol overhead, we apply the optimization mechanisms one by one and perform detailed analyses at each step. The thorough overhead measurements and analyses reveal the dependencies between protocol overheads. With our comprehensive optimizations, we show that UDP/IP can utilize more than 95% of the maximum network throughput that a Myrinet-based experimental system can provide.
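The step-by-step methodology can be pictured as a simple harness that enables one mechanism at a time and records the throughput delta at each step. The mechanism names and numbers below are placeholders, and measure stands in for a real benchmark run.

```python
# Cumulative one-by-one measurement: enable a mechanism, re-run, diff.

OPTIMIZATIONS = ['baseline', 'zero-copy', 'checksum-offload',
                 'jumbo-frames', 'interrupt-coalescing']

def measure(enabled):
    # Placeholder: returns Mb/s for the given set of enabled mechanisms.
    gains = {'zero-copy': 280, 'checksum-offload': 120,
             'jumbo-frames': 350, 'interrupt-coalescing': 90}
    return 900 + sum(gains.get(o, 0) for o in enabled)

enabled, prev = [], None
for opt in OPTIMIZATIONS:
    enabled.append(opt)
    tput = measure(enabled)
    delta = '' if prev is None else f' (+{tput - prev})'
    print(f'{opt:22s}{tput:5d} Mb/s{delta}')
    prev = tput
```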
IEICE Transactions on Information and Systems, 2012
Facing practical limits to increasing processor frequencies, manufacturers have resorted to multi-core designs in their commercial products. In multi-core implementations, cores in a physical package share the last-level caches to improve inter-core communication. To efficiently exploit this facility, operating systems must employ cache-aware schedulers. Unfortunately, virtualization software, which is a foundation technology of cloud computing, is not yet cache-aware or does not fully exploit the locality of the last-level caches. In this paper, we propose a cache-aware virtual machine scheduler for multi-core architectures. The proposed scheduler exploits the locality of the last-level caches to improve the performance of concurrent applications running on virtual machines. For this purpose, we first provide a space-partitioning algorithm that migrates and clusters communicating virtual CPUs (VCPUs) in the same cache domain. Second, we provide a time-partitioning algorithm that co-schedules, or schedules in sequence, clustered VCPUs. Finally, we present a theoretical analysis proving that our scheduling algorithm is more efficient in supporting concurrent applications than the default credit scheduler in Xen. We implemented our virtual machine scheduler in a recent Xen hypervisor with para-virtualized Linux-based operating systems. We show that our approach can improve the performance of concurrent virtual machines by up to 19% compared to the credit scheduler.
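A sketch of the space-partitioning step under stated assumptions: given a pairwise communication matrix, VCPUs are greedily pulled into the cache domain of their heaviest communication peer, subject to a domain capacity. The data layout and greedy rule are illustrative, not the paper's exact algorithm.

```python
# Greedy clustering of communicating VCPUs into last-level-cache domains.

def cluster_vcpus(comm, n_domains, domain_size):
    """comm: dict {(v1, v2): traffic}. Returns vcpu -> cache domain id."""
    placement, load = {}, [0] * n_domains
    for (a, b), _ in sorted(comm.items(), key=lambda kv: -kv[1]):
        for v in (a, b):
            if v in placement:
                continue
            peer = b if v == a else a
            d = placement.get(peer)
            if d is None or load[d] >= domain_size:  # fall back: least loaded
                d = min(range(n_domains), key=lambda i: load[i])
            placement[v] = d
            load[d] += 1
    return placement

comm = {('v0', 'v1'): 90, ('v2', 'v3'): 70, ('v0', 'v2'): 10}
print(cluster_vcpus(comm, n_domains=2, domain_size=2))
# {'v0': 0, 'v1': 0, 'v2': 1, 'v3': 1} -- heavy pairs share a domain
```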