- research-article, August 2024
μMon: Empowering Microsecond-level Network Monitoring with Wavelets
ACM SIGCOMM '24: Proceedings of the ACM SIGCOMM 2024 Conference, Pages 274–290, https://doi.org/10.1145/3651890.3672236
Network monitoring is essential for network management and optimization. In modern data centers, fluctuations in flow rates and network congestion events (e.g., microbursts) typically manifest on a microsecond timescale. However, the time granularity of ...
- research-article, April 2024
Unison: A Parallel-Efficient and User-Transparent Network Simulation Kernel
- Songyuan Bai,
- Hao Zheng,
- Chen Tian,
- Xiaoliang Wang,
- Chang Liu,
- Xin Jin,
- Fu Xiao,
- Qiao Xiang,
- Wanchun Dou,
- Guihai Chen
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, Pages 115–131, https://doi.org/10.1145/3627703.3629574
Discrete-event simulation (DES) is a prevalent tool for evaluating network designs. Although DES offers full fidelity and generality, its slow performance limits its application. To speed up DES, many network simulators employ parallel discrete-event ...
- poster, September 2023
Dilemma of Proactive Congestion Control Protocols
APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking, Pages 176–177, https://doi.org/10.1145/3600061.3603123
Reactive congestion control (RCC) protocols have undergone decades of evolution, where senders first send data packets and then back off when congestion occurs. Recently, there has been a surge of interest in proactive congestion control (PCC) that ...
- poster, September 2023
AFNFA: An Approach to Automate NCCL Configuration Exploration
APNet '23: Proceedings of the 7th Asia-Pacific Workshop on Networking, Pages 204–205, https://doi.org/10.1145/3600061.3600068
With the continuously increasing scale of deep neural network models, there is a clear trend towards distributed DNN model training. State-of-the-art training frameworks support this approach using collective communication libraries such as NCCL, MPI, ...
- research-article, May 2023
I/O-Efficient Butterfly Counting at Scale
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 34, Pages 1–27, https://doi.org/10.1145/3588714
Butterfly (a cyclic graph motif) counting is a fundamental task with many applications in graph analysis, which aims at computing the number of butterflies in a large graph. With the rapid growth of graph data, it is more and more challenging to do ...
- research-article, December 2022
PayDebt: Reduce Buffer Occupancy Under Bursty Traffic on Large Clusters
- Kexin Liu,
- Chen Tian,
- Qingyue Wang,
- Yanqing Chen,
- Bingchuan Tian,
- Wenhao Sun,
- Ke Meng,
- Long Yan,
- Lei Han,
- Jie Fu,
- Wanchun Dou,
- Guihai Chen
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4707–4722, https://doi.org/10.1109/TPDS.2022.3202504
The average/tail Flow Completion Times (FCTs) are critical to many datacenter applications. Congestion control plays a central role in optimizing FCT. Inappropriate congestion control can exacerbate buffer occupancy, thus hurting the flow performance. Our ...
- research-article, December 2022
PushBox: Making Use of Every Bit of Time to Accelerate Completion of Data-Parallel Jobs
- Chen Tian,
- Yi Wang,
- Bingchuan Tian,
- Yang Zhao,
- Yuhang Zhou,
- Chenxu Wang,
- Haoran Guan,
- Wanchun Dou,
- Guihai Chen
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 4256–4269, https://doi.org/10.1109/TPDS.2022.3182037
To minimize a job's completion time, we need to minimize the completion time of its final stage's last task. Scheduling of machine slots and networks largely dominates the variable part of each task's duration. Finding an optimal ...
- research-article, December 2022
Meet: Rack-Level Pooling Based Load Balancing in Datacenter Networks
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 33, Issue 12, Pages 3628–3639, https://doi.org/10.1109/TPDS.2022.3162297
Datacenter networks enable multiple paths between hosts to provide large bisection bandwidth. It requires load balancers to cope with network uncertainties such as traffic dynamics and topology asymmetry. Existing edge-based load balancing schemes are ...
- research-article, November 2022
A reconfigurable test method based on LFSR for 3D stacking integrated circuits
Integration, the VLSI Journal (INTG), Volume 87, Issue C, Pages 82–89, https://doi.org/10.1016/j.vlsi.2022.06.011
Abstract: The specificity of the 3D integrated circuit and the changes in the design integration flow bring many new problems to the test process, such as reducing the controllability of the test and increasing the test cost. To address these problems, ...
Highlights: A reconfigurable structure test structure for LFSR is designed in 3D integrated circuits. The proposed test structure achieves optimization of the in-binding and post-binding test. The relationship between test power consumption ...
- research-article, August 2022
FlyMon: enabling on-the-fly task reconfiguration for network measurement
SIGCOMM '22: Proceedings of the ACM SIGCOMM 2022 Conference, Pages 486–502, https://doi.org/10.1145/3544216.3544239
Network measurement is important to data center operators. Most existing efforts focus on developing new implementation schemes for measurement tasks. Little attention is paid to on-the-fly task reconfiguration. Due to resource constraints, it is ...
- research-article, July 2022
Analyzing and Optimizing Packet Corruption in RDMA Network
- Yi-Xiao Gao,
- Chen Tian,
- Wei Chen,
- Duo-Xing Li,
- Jian Yan,
- Yuan-Yuan Gong,
- Bing-Quan Wang,
- Tao Wu,
- Lei Han,
- Fa-Zhi Qi,
- Shan Zeng,
- Wan-Chun Dou,
- Gui-Hai Chen
Journal of Computer Science and Technology (JCST), Volume 37, Issue 4, Pages 743–762, https://doi.org/10.1007/s11390-022-2123-8
Abstract: Remote direct memory access (RDMA) has become one of the state-of-the-art high-performance network technologies in datacenters. The reliable transport of RDMA is designed based on a lossless underlying network and cannot endure a high packet loss ...
- research-article, July 2022
SMART: Speedup Job Completion Time by Scheduling Reduce Tasks
- Jia-Qing Dong,
- Ze-Hao He,
- Yuan-Yuan Gong,
- Pei-Wen Yu,
- Chen Tian,
- Wan-Chun Dou,
- Gui-Hai Chen,
- Nai Xia,
- Hao-Ran Guan
Journal of Computer Science and Technology (JCST), Volume 37, Issue 4, Pages 763–778, https://doi.org/10.1007/s11390-022-2118-5
Abstract: Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these ...
- research-article, July 2022
Approximation Designs for Energy Harvesting Relay Deployment in Wireless Sensor Networks
Journal of Computer Science and Technology (JCST), Volume 37, Issue 4, Pages 779–796, https://doi.org/10.1007/s11390-022-1964-5
Abstract: Energy harvesting technologies allow wireless devices to be recharged by the surrounding environment, providing wireless sensor networks (WSNs) with higher performance and longer lifetime. However, directly building a wireless sensor network with ...
- article, June 2022
Moneo: Monitoring Fine-grained Metrics Nonintrusively in AI Infrastructure
ACM SIGOPS Operating Systems Review (SIGOPS), Volume 56, Issue 1, Pages 18–25, https://doi.org/10.1145/3544497.3544501
Cloud-based AI infrastructure is becoming increasingly important, especially on large-scale distributed training. To improve its efficiency and serviceability, real-time monitoring of the infrastructure and workload profiling are proved to be the ...
- research-article, May 2022
On Designing Secure Cross-user Redundancy Elimination for WAN Optimization
IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, Pages 1589–1598, https://doi.org/10.1109/INFOCOM48880.2022.9796893
Redundancy elimination (RE) systems allow network users to remove duplicate parts in their messages by introducing caches at both message senders’ and receivers’ sides. While RE systems have been successfully deployed for handling ...
- research-article, January 2022
Technology Based on Interactive Theatre Performance Production and Performance Platform
Drama refers to a comprehensive art that realizes the purpose of narrative through language, movement, dance, music, puppets, and other forms. From the pre-Qin period⟶the Han and Wei periods⟶the Tang, Song, and Jin periods⟶the Yuan Dynasty⟶the late Yuan ...
Floodgate: taming incast in datacenter networks
- Kexin Liu,
- Chen Tian,
- Qingyue Wang,
- Hao Zheng,
- Peiwen Yu,
- Wenhao Sun,
- Yonghui Xu,
- Ke Meng,
- Lei Han,
- Jie Fu,
- Wanchun Dou,
- Guihai Chen
CoNEXT '21: Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies, Pages 30–44, https://doi.org/10.1145/3485983.3494854
Incast occurs frequently in datacenter networks where a large number of senders send data to a single receiver simultaneously, which makes the last hop the network bottleneck. Incast can hurt flows' performance. However, congestion control protocols are ...
- research-article, September 2021
Performance analysis of opportunistic NOMA strategy in uplink coordinated multi-points systems
Computer Communications (COMS), Volume 177, Issue C, Pages 207–212, https://doi.org/10.1016/j.comcom.2021.07.001
Abstract: Non-orthogonal multiple access (NOMA) has attracted a great deal of interest due to its potential contribution to the 5th generation (5G) mobile networks. In conventional power-domain NOMA, one of the disadvantages is that the edge-...