Weirong Jiang
  • 3740 McClintock Avenue, EEB-244,
    Department of EE-Systems,
    University of Southern California,
    Los Angeles, CA 90089-2562
Abstract Modern networks are increasingly becoming content aware to improve data delivery and security via content-based network processing. Content-aware processing at the front end of distributed network systems, such as application identification for datacenter load-balancers and deep packet inspection for security gateways, is more challenging due to the wire-speed and low-latency requirements.
Abstract: Both IP lookup and packet classification in IP routers can be implemented by some form of tree traversal. SRAM-based pipelining can improve the throughput dramatically. However, previous pipelining schemes result in unbalanced memory allocation over the pipeline stages. This has been identified as a major challenge for scalable pipelined solutions. This paper proposes a flexible bidirectional linear pipeline architecture based on widely-used dual-port SRAMs.
The Internet is built as a packet-switching network. The kernel function of Internet infrastructure, including routers and switches, is to forward packets received from one subnet to another. Packet forwarding is accomplished by using the header information extracted from a packet to look up the forwarding table maintained in the routers/switches. Due to the rapid growth of network traffic, packet forwarding has long been a performance bottleneck in routers/switches.
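The forwarding-table lookup described above is a longest-prefix match (LPM). As a minimal illustration (a hypothetical binary-trie sketch, not any particular router's implementation), the lookup walks the destination address bit by bit and remembers the longest prefix seen so far:

```python
# Minimal binary-trie longest-prefix-match (LPM) sketch.
# Hypothetical example; production routers use far more compact
# structures (multibit tries, compressed tries, pipelined SRAM).

class TrieNode:
    def __init__(self):
        self.children = {}    # '0'/'1' -> child TrieNode
        self.next_hop = None  # set if a prefix ends at this node

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:
        node = node.children.setdefault(b, TrieNode())
    node.next_hop = next_hop

def lookup(root, addr_bits):
    node, best = root, None
    for b in addr_bits:
        if node.next_hop is not None:
            best = node.next_hop       # longest match so far
        node = node.children.get(b)
        if node is None:
            break                      # no deeper prefix to try
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "port1")      # prefix 10*
insert(root, "1011", "port2")    # prefix 1011*
print(lookup(root, "10110000"))  # longest match is 1011* -> port2
print(lookup(root, "10000000"))  # only 10* matches -> port1
```

The per-bit descent is exactly the tree traversal that the pipelined schemes in the papers below map onto hardware stages.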
Abstract Network virtualization has become a powerful scheme to make efficient use of networking hardware. It allows multiple virtual networks to co-exist on the same physical networking substrate. This requires the hardware router to maintain multiple lookup tables. Hence, ultimately the hardware router should be capable of handling packets from different virtual networks. In this paper, we introduce a memory-efficient solution for router virtualization named Multiroot.
Abstract Packet classification is a fundamental enabling function for various applications in switches, routers and firewalls. Due to their performance and scalability limitations, current packet classification solutions are insufficient in addressing the challenges from the growing network bandwidth and the increasing number of new applications. This paper presents a scalable parallel architecture, named ParaSplit, for high-performance packet classification.
Abstract Due to the dual trends of increasing cellular network transmission capacity and coverage as well as improving computational capacity, storage and intelligence of mobile handsets, mobile peer-to-peer (MP2P) networking has emerged as an attractive research field in recent years. However, these trends have not been clearly articulated from the perspective of either technology or business.
SRAM (static random access memory)-based pipelined algorithmic solutions have become competitive alternatives to TCAMs (ternary content addressable memories) for high-throughput IP lookup. Multiple pipelines can be utilized in parallel to improve the throughput further. However, several challenges must be addressed to make such solutions feasible. First, the memory distribution over different pipelines, as well as across different stages of each pipeline, must be balanced. Second, the traffic among these pipelines should be balanced. Third, the intra-flow packet order (i.e. the sequence) must be preserved. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for IP lookup. A two-level mapping scheme is developed to balance the memory requirement among the pipelines as well as across the stages in each pipeline. To balance the traffic, we propose an early caching scheme to exploit the data locality inherent in the architecture. Our technique uses neither a large reorder buffer nor complex reorder logic. Instead, a flow-aware queuing scheme exploiting the flow information is used to maintain the intra-flow sequence. Extensive simulation using real-life traffic traces shows that the proposed architecture with 8 pipelines can achieve a throughput of up to 10 billion packets per second, i.e. 3.2 Tbps for minimum size (40 bytes) packets, while preserving intra-flow packet order.
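The flow-aware queuing idea above can be sketched simply: if every packet of a flow is hashed to the same queue, intra-flow order is preserved by construction, with no reorder buffer. This is a hypothetical illustration (queue count and hash choice are mine, not the paper's):

```python
# Sketch of flow-aware queuing: packets of the same flow hash to the
# same queue, so intra-flow order is preserved without reorder logic.
# Hypothetical parameters; the paper's architecture additionally
# balances traffic with caching and dynamic remapping.

from collections import deque
import hashlib

NUM_QUEUES = 8

def flow_hash(flow_id: str) -> int:
    # Deterministic hash so a flow always lands on the same queue.
    return hashlib.sha256(flow_id.encode()).digest()[0] % NUM_QUEUES

queues = [deque() for _ in range(NUM_QUEUES)]

packets = [("flowA", 1), ("flowB", 1), ("flowA", 2), ("flowA", 3), ("flowB", 2)]
for flow_id, seq in packets:
    queues[flow_hash(flow_id)].append((flow_id, seq))

# All of flowA's packets sit in one queue, in arrival order.
qa = queues[flow_hash("flowA")]
print([seq for fid, seq in qa if fid == "flowA"])  # [1, 2, 3]
```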
Multi-pattern string matching remains a major performance bottleneck in network intrusion detection and anti-virus systems for high-speed deep packet inspection (DPI). Although Aho-Corasick deterministic finite automaton (AC-DFA) based solutions produce deterministic throughput and are widely used in today's DPI systems such as Snort and ClamAV, the high memory requirement of AC-DFA (due to its large number of state transitions) inhibits efficient hardware implementation for high performance. Some recent work has shown that the AC-DFA can be reduced to a character trie containing only the forward transitions by incorporating pipelined processing, but such approaches have limitations either in handling long patterns or in extending to multi-character input per clock cycle for high throughput. This paper generalizes the problem and proves formally that a linear pipeline with H stages can remove all cross transitions to the top H levels of an AC-DFA. A novel and scalable pipeline architecture for memory-efficient multi-pattern string matching is then presented. The architecture can be easily extended to support multi-character input per clock cycle by mapping a compressed AC-DFA onto multiple pipelines. Simulation using Snort and ClamAV pattern sets shows that an 8-stage pipeline can remove more than 99% of the transitions in the original AC-DFA. The implementation on a state-of-the-art field programmable gate array (FPGA) shows that our architecture can store on a single FPGA device the full set of string patterns from the latest Snort rule set. Our FPGA implementation sustains 10+ Gbps throughput while consuming a small amount of on-chip logic resources. Desirable scalability is also achieved: the increase in resource requirement of our solution is sub-linear in the throughput improvement.
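The cross transitions discussed above are the AC-DFA's failure-derived transitions, which overwhelmingly point back into the shallow levels of the trie. A small sketch (hypothetical pattern set; plain textbook Aho-Corasick, not the paper's compressed automaton) makes this concrete by building the failure links and counting how many land in the top H levels:

```python
# Sketch of Aho-Corasick failure-link construction, counting "cross"
# transitions whose targets lie in the top H levels of the trie.
# Hypothetical minimal example, not the paper's pipelined AC-DFA.

from collections import deque

def build_ac(patterns):
    # State 0 is the root; goto[s] maps char -> state; depth[s] is trie depth.
    goto, fail, depth = [{}], [0], [0]
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); depth.append(depth[s] + 1)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
    # BFS to compute failure links (root's children fail to the root).
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][c] if c in goto[f] and goto[f][c] != t else 0
    return goto, fail, depth

goto, fail, depth = build_ac(["he", "she", "his", "hers"])
H = 2
shallow = sum(1 for s in range(1, len(fail))
              if fail[s] != 0 and depth[fail[s]] <= H)
print("failure links into top", H, "levels:", shallow)
```

In this toy automaton every non-root failure link targets a state at depth 2 or less, which is the property the paper's H-stage pipeline exploits to drop those transitions entirely.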
Power consumption has become a limiting factor in next-generation routers. IP forwarding engines dominate the overall power dissipation in a router. Although SRAM-based pipeline architectures have recently been developed as a promising alternative to power-hungry TCAM-based solutions for high-throughput IP forwarding, it remains a challenge to achieve low power. This paper proposes several novel architecture-specific techniques to reduce the dynamic power consumption in SRAM-based pipelined IP forwarding engines. First, the pipeline architecture itself is built as an inherent cache, exploiting the data locality in Internet traffic. The number of memory accesses, which contribute the majority of the power consumption, is thus reduced; no external cache is needed. Second, instead of using a global clock, different pipeline stages are driven by separate clocks. The local clocking scheme is carefully designed to exploit the traffic rate variation and improve the caching performance. Third, a fine-grained memory enabling scheme is developed to eliminate unnecessary memory accesses while preserving the packet order. Simulation experiments using real-life traces show that our solutions can achieve up to a 15-fold reduction in dynamic power dissipation over the baseline pipeline architecture that does not employ the proposed schemes. FPGA implementation results show that our design sustains 40 Gbps throughput for minimum size (40 bytes) packets while consuming a small amount of logic resources.
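The inherent-caching effect can be sketched as a simple simulation: when a recently seen address hits the cache at the front of the pipeline, the memory accesses of the later stages are skipped. Everything here (capacity, stage count, the stand-in lookup) is a hypothetical illustration, not the paper's design:

```python
# Sketch of the "pipeline as inherent cache" idea: recently looked-up
# addresses are answered at the front, skipping later-stage memory
# accesses. Hypothetical LRU simulation of the power-saving effect.

from collections import OrderedDict

class FrontStageCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.cache = OrderedDict()   # addr -> next hop, LRU order
        self.memory_accesses = 0

    def lookup(self, addr, stages=4):
        if addr in self.cache:
            self.cache.move_to_end(addr)  # hit: zero stage accesses
            return self.cache[addr]
        self.memory_accesses += stages    # miss: traverse all stages
        hop = f"hop-{hash(addr) % 4}"     # stand-in for the real lookup
        self.cache[addr] = hop
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return hop

engine = FrontStageCache()
trace = ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]  # skewed traffic
for addr in trace:
    engine.lookup(addr)
print("memory accesses:", engine.memory_accesses)  # 8 instead of 16
```

With the skewed trace above, half the lookups hit the cache, so only 8 of the 16 possible stage accesses occur, mirroring the dynamic-power reduction the abstract describes.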
Multi-match packet classification is a critical function in network intrusion detection systems (NIDS), where all matching rules for a packet need to be reported. Most of the previous work is based on ternary content addressable memories (TCAMs) which are expensive and are not scalable with respect to clock rate, power consumption, and circuit area. This paper studies the characteristics of real-life Snort NIDS rule sets, and proposes a novel SRAM-based architecture. The proposed architecture is called field-split parallel bit vector (FSBV) where some header fields of a packet are further split into bit-level subfields. Unlike previous multi-match packet classification algorithms which suffer from memory explosion, the memory requirement of FSBV is linear in the number of rules. FPGA technology is exploited to provide high throughput and to support dynamic updates. Implementation results show that our architecture can store on a single Xilinx Virtex-5 FPGA the full set of packet header rules extracted from the latest Snort NIDS and sustains 100 Gbps throughput for minimum size (40 bytes) packets. The design achieves 1.25× improvement in throughput while the power consumption is approximately one fourth that of the state-of-the-art solutions.
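The bit-vector principle behind FSBV can be sketched in a few lines: each bit position of a header field keeps a vector marking which rules are compatible with that bit value, and ANDing the vectors yields every matching rule at once. This is a hypothetical 4-bit toy (my rules and encoding), not the paper's field-split scheme:

```python
# Sketch of bit-vector multi-match classification: per-bit rule vectors
# are ANDed to find ALL matching rules. Hypothetical 4-bit field.

RULES = ["10**", "1***", "*01*"]  # ternary rules over a 4-bit field
NUM_RULES = len(RULES)

# Precompute, per bit position and bit value, a bitmask of compatible rules.
vectors = [[0, 0] for _ in range(4)]  # vectors[pos][bit]
for r, rule in enumerate(RULES):
    for pos, c in enumerate(rule):
        for bit in (0, 1):
            if c == "*" or c == str(bit):
                vectors[pos][bit] |= 1 << r

def classify(value_bits):
    result = (1 << NUM_RULES) - 1     # start with all rules possible
    for pos, bit in enumerate(value_bits):
        result &= vectors[pos][bit]   # AND the per-bit vectors
    return [r for r in range(NUM_RULES) if result >> r & 1]

print(classify([1, 0, 0, 1]))  # "1001" matches 10** and 1*** -> [0, 1]
print(classify([0, 0, 1, 0]))  # "0010" matches only *01*     -> [2]
```

Note the memory grows linearly in the number of rules (one bit per rule per position), which is the property the abstract highlights.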
Multi-field packet classification is a key enabling function of a variety of network applications, such as firewall processing, Quality of Service differentiation, traffic billing, and other value-added services. Although a plethora of research has been done in this area, wire-speed packet classification while supporting large rule sets remains difficult. This paper exploits the features provided by current FPGAs and proposes a decision-tree-based, two-dimensional dual-pipeline architecture for multi-field packet classification. To fit the current largest rule set in the on-chip memory of the FPGA device, we propose several optimization techniques for the state-of-the-art decision-tree-based algorithm, so that the memory requirement is almost linear in the number of rules. Specialized logic is developed to support a varying number of branches at each decision-tree node. A tree-to-pipeline mapping scheme is carefully designed to maximize memory utilization. Since our architecture is linear and memory-based, on-the-fly update without disturbing ongoing operations is feasible. The implementation results show that our architecture can store 10K real-life rules in the on-chip memory of a single Xilinx Virtex-5 FPGA, and sustain 80 Gbps (i.e. 2× OC-768 rate) throughput for minimum size (40 bytes) packets. To the best of our knowledge, this work is the first FPGA-based packet classification engine that achieves wire-speed throughput while supporting 10K unique rules.
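The decision-tree traversal underlying such architectures works by cutting the multi-field search space at each internal node until a leaf holds only a few candidate rules to check linearly. A hypothetical two-field, hand-built tree (my rules and cut points, not the paper's algorithm) illustrates the shape of the lookup:

```python
# Sketch of decision-tree packet classification (HiCuts/HyperCuts style):
# internal nodes cut on one field; leaves hold a small candidate set
# that is checked linearly. Hypothetical two-field example.

# A rule is (src_range, dst_range, rule_id); ranges are inclusive.
RULES = [((0, 7), (0, 15), "R1"),
         ((8, 15), (0, 7), "R2"),
         ((8, 15), (8, 15), "R3")]

def matches(rule, pkt):
    (slo, shi), (dlo, dhi), _ = rule
    return slo <= pkt[0] <= shi and dlo <= pkt[1] <= dhi

def classify(pkt):
    # Tiny hand-built tree: root cuts on src, right child cuts on dst.
    if pkt[0] <= 7:            # root node: cut on source field
        candidates = [RULES[0]]
    elif pkt[1] <= 7:          # child node: cut on destination field
        candidates = [RULES[1]]
    else:
        candidates = [RULES[2]]
    # Leaf: linear check over the (small) candidate set.
    return [r[2] for r in candidates if matches(r, pkt)]

print(classify((3, 9)))   # ['R1']
print(classify((12, 3)))  # ['R2']
```

In the hardware version each tree level maps to a pipeline stage, so one packet finishes classification per clock cycle.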
Continuous growth in network link rates poses a strong demand on high-speed IP lookup engines. While Ternary Content Addressable Memory (TCAM) based solutions serve most of today's high-end routers, they do not scale well for the next generation. On the other hand, pipelined SRAM-based algorithmic solutions have become attractive. Intuitively, multiple pipelines can be utilized in parallel to have a multiplicative effect on the throughput. However, several challenges must be addressed for such solutions to realize high throughput. First, the memory distribution across different stages of each pipeline, as well as across different pipelines, must be balanced. Second, the traffic on the various pipelines should be balanced. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for terabit IP lookup. To balance the memory requirement over the stages, a two-level mapping scheme is presented. By trie partitioning and subtrie-to-pipeline mapping, we ensure that each pipeline contains an approximately equal number of trie nodes. Then, within each pipeline, a fine-grained node-to-stage mapping is used to achieve evenly distributed memory across the stages. To balance the traffic on different pipelines, both pipelined prefix caching and dynamic subtrie-to-pipeline remapping are employed. Simulation using real-life data shows that the proposed architecture with 8 pipelines can store a core routing table with over 200K unique routing prefixes using 3.5 MB of memory. It achieves a throughput of up to 3.2 billion packets per second, i.e. 1 Tbps for minimum size (40 bytes) packets.
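The subtrie-to-pipeline mapping step can be approximated by a classic greedy packing: sort subtries by node count, then assign each to the currently lightest pipeline. The sizes below are invented for illustration, and the paper's actual scheme also remaps dynamically at runtime:

```python
# Sketch of subtrie-to-pipeline mapping: greedy largest-first packing
# onto the currently lightest pipeline keeps node counts balanced.
# Hypothetical subtrie sizes; not the paper's exact algorithm.

import heapq

def map_subtries(subtrie_sizes, num_pipelines):
    heap = [(0, p) for p in range(num_pipelines)]  # (total_nodes, pipeline)
    heapq.heapify(heap)
    assignment = {}
    for idx, size in sorted(enumerate(subtrie_sizes),
                            key=lambda x: -x[1]):  # largest subtrie first
        total, p = heapq.heappop(heap)             # lightest pipeline
        assignment[idx] = p
        heapq.heappush(heap, (total + size, p))
    return assignment

sizes = [900, 700, 650, 400, 300, 250, 150, 100]   # trie nodes per subtrie
assign = map_subtries(sizes, 4)
loads = [0] * 4
for idx, p in assign.items():
    loads[p] += sizes[idx]
print(sorted(loads))  # [800, 850, 900, 900] -- near-balanced
```

The resulting per-pipeline loads differ by only about 10%, which is the kind of balance the two-level mapping scheme targets.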
Abstract Routing metrics play a critical role in wireless mesh networks (WMNs). Several metrics have been proposed, but none of them meets the specific requirements of large-scale multi-radio mesh networks (LSMRMNs). In LSMRMNs, most traffic traverses much longer paths than in small-scale WMNs. The channel distribution on a long path thus has a significant impact on route performance. In this paper, we identify this challenge and study five existing routing metrics. Then we describe a novel ...
Abstract Routing in large-scale multi-radio wireless mesh networks (WMNs) faces two challenges in achieving high throughput. One is the long path between the source and the destination; the other is the high routing overhead. We study both aspects and develop our schemes accordingly. First, a new routing metric for selecting multi-channel routes with maximum end-to-end capacity is presented. Second, a feedback-based algorithm to maximize the control message broadcasting interval is proposed to minimize ...
Abstract Link state routing (LSR) is widely adopted in wireless mesh networks (WMNs), but it is criticized for its high routing overhead. This work presents a hybrid mobility model for large-scale WMNs and develops a feedback-based distributed algorithm for each node to minimize the routing overhead by adaptively maximizing the control message broadcasting interval while accommodating its local mobility. This algorithm is integrated, with other proposed schemes, into a routing protocol named THU-OLSR. Both theoretical analysis and ...
Abstract IP forwarding with longest prefix matching (LPM) is the kernel function of routers in the Internet. Most LPM algorithms can be implemented by some form of tree traversal, whose throughput can be improved dramatically by pipelining. On the other hand, balancing the memory allocation over the pipeline stages has been identified as a major challenge for scalable solutions. Most previous pipelining schemes balance the memory distribution across stages at the cost of lowering the worst-case throughput. This paper proposes a ...
Abstract This study focuses on optimizing the clustering algorithm employed in multi-radio ad-hoc networks with the objective of maximizing end-to-end capacity. We theoretically demonstrate that clusters of uniform size outperform non-uniform ones. A distributed uniforming algorithm, which can be applied as a complement to most existing clustering algorithms, is developed to balance the cluster scale. Preliminary simulations confirm our conjectures and demonstrate the effectiveness of our algorithm.