Graduated in 1984 from Tripoli University in Computer Engineering. Worked at NCR and AES before starting an MSc degree in Computer Engineering at the University of Waterloo. Completed a PhD in Computer Engineering in 1995. Joined Ryerson University in 1997 and is currently a faculty member at the School of Engineering, University of Guelph, Canada.
Phone: 1-519-8244120 x53819
Address: 50 Stone Road, Guelph, Ontario N1G 2W1, Canada
Timing-driven placement tools for FPGAs rely on the availability of accurate delay estimates for nets in order to identify and optimize critical paths. In this paper, we propose a machine-learning framework for predicting net delay to reduce miscorrelation between placement and detailed routing. Features relevant to timing delay are engineered based on characteristics of nets, available routing resources, and the behavior of the detailed router. Our results show an accuracy above 94%, and when integrated within an FPGA analytical placer, Critical Path Delay (CPD) is improved by 10% on average compared to a static delay model.
One of the most time-consuming steps in the FPGA CAD flow is placement, which directly impacts the completion of the design flow. Accordingly, a routability-driven FPGA placement contest was organized by Xilinx at ISPD 2016 to address this problem. Due to variations in the ISPD benchmark characteristics and heterogeneity of the FPGA architectures, as well as the different optimization strategies employed by the participating placers, placement algorithms that performed well on some circuits performed poorly on others. In this paper, we propose a Machine-Learning (ML) framework that is capable of recommending the best FPGA placement algorithm within the CAD flow. Results obtained indicate that the ML framework selects the correct flow with 83% accuracy.
Placement remains one of the most critical steps in the FPGA design flow. In this paper, we propose a novel machine-learning framework that enables a placement tool to predict key parameters used to inflate Look-Up Tables (LUTs) during placement. LUT inflation assists the placer in spreading cells in congested regions to reduce congestion and improve routability. Empirical results show that when employed in a state-of-the-art placement tool, the proposed framework improves routed wirelength and increases the total number of routable placements found.
In this work, a Convolutional Encoder-Decoder (CED) is utilized to significantly reduce placement runtimes for large, high-utilization designs. The proposed CED uses features available during the early stages of placement to predict the congestion present in subsequent placement iterations, including the final placement. This congestion information is then used by the placer to improve decision making, leading to reduced runtimes. Experimental results show that reductions in placer runtime between 27% and 40% are achievable with no significant deterioration in quality-of-result.
Due to the rapid growth of technologies, Systems-on-Chip (SoC) have started to become a key issue in today's electronic industry. In deep submicron designs, the interconnect is responsible for more than 90 percent of the signal delay in a chip. This paper presents ...
2017 29th International Conference on Microelectronics (ICM), 2017
Supervised machine-learning algorithms require relatively large amounts of runtime to perform training and/or classification. Therefore, a need exists to accelerate their runtime, especially for real-time applications. In this paper, we propose and compare several hardware accelerators for the K-Nearest Neighbor (K-NN) classification algorithm. The accelerators are developed using Xilinx Vivado High-Level Synthesis (HLS) and represent examples of semi-tightly coupled architectures. Our experimental results, based on standard benchmarks, show speedups ranging from 48x to 168x.
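For readers unfamiliar with the kernel these accelerators target, a minimal software sketch of K-NN classification follows. It is a plain Python illustration, not the paper's HLS implementation; the function name `knn_classify`, the Euclidean distance metric, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Return the majority label among the k training points nearest to query.

    Hypothetical helper for illustration only; the metric (Euclidean)
    and the voting rule (simple majority) are assumptions.
    """
    # Distance stage: Euclidean distance from the query to every training point.
    dists = [(math.dist(point, query), label)
             for point, label in zip(train, labels)]
    # Selection stage: keep the k smallest distances.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Voting stage: the most frequent label among the k neighbours wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data set (assumed): two well-separated clusters.
train = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["a", "a", "b", "b"]
print(knn_classify(train, labels, (0.2, 0.1), k=3))
```

The distance stage dominates the work in this sketch, since it touches every training point for each query, and each distance is independent of the others.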
2016 28th International Conference on Microelectronics (ICM), 2016
Machine-learning algorithms are playing an increasingly important role in embedded and real-time systems. Applications such as wireless sensor networks, security, and commercial enterprises rely on machine-learning algorithms to efficiently make predictive decisions based on the large volumes of data these systems collect. Therefore, there is a need to accelerate the runtime of these algorithms, especially for real-time applications. In this paper, we propose several Application Specific Instruction Processor (ASIP) architectures for the K-Nearest Neighbor (KNN) classification algorithm. Each ASIP is developed using Cadence Tensilica tools and represents a tightly-coupled architecture. Our experimental results, based on several benchmarks, show that the proposed ASIPs achieve speedups of 86× to 650× over the original software implementation.
2016 IEEE 21st International Workshop on Computer Aided Modelling and Design of Communication Links and Networks (CAMAD), 2016
Many packet classification algorithms with varying performance and capabilities are available. However, no single algorithm is guaranteed to outperform every other one in every case. Meta-Learning is a subfield of Machine Learning that aims to apply statistical techniques to automate the algorithm selection process. In this work, we propose a novel framework for efficient, automatic packet classification algorithm selection. By utilizing Meta-Learning and Artificial Neural Networks (ANNs), we achieve an average accuracy of 90% in automatically choosing the most appropriate algorithm across over a hundred different rulesets ranging in size from 1K to 5K.
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17, 2017
Many of the key stages in the traditional FPGA CAD flow require substantial amounts of computational effort. Moreover, due to limited overlap among individual stages, poor decisions made in earlier stages will often adversely affect the quality of result in later stages. To help address these issues, we propose a machine-learning framework that uses training data to learn the underlying relationship between circuits and the CAD algorithms used to map them onto a particular FPGA device. The framework does not solve the problem at an arbitrary stage in the flow. Rather, it seeks to assist the designer or the tool to solve the problem. The potential capabilities of the framework are demonstrated by applying it to the placement stage, where it is used to recommend the best placement flow for circuits with different features, and to predict placement and routing results without actually performing placement and routing. Results show that when trained using 372 challenging benchmarks for a Xilinx UltraScale device, the classification models employed in the framework achieve average accuracies in the range 92% to 95%, while the regression models have an average error rate in the range of 0.5% to 3.6%.
Annotation: This paper adapts a basic node interchange scheme for solving the circuit partitioning problem and develops a clustering technique that uses GRASP to generate clusters of moderate sizes. The number of clusters is predetermined as a function of the number of partitions ...
2020 30th International Conference on Field-Programmable Logic and Applications (FPL), 2020
The ability to quickly and accurately predict congestion has emerged as one of the most critical problems during placement. In this paper, we present DLCong, a deep-learning congestion-estimation framework based on a convolutional encoder-decoder. Experimental results show that compared to MLCong, a state-of-the-art machine-learning-based congestion-estimation model, DLCong achieves an almost 9% improvement in congestion accuracy, while exhibiting inference times of a few milliseconds. Moreover, the accuracy of DLCong scales better with increasing congestion compared to MLCong.
ACM Transactions on Design Automation of Electronic Systems, 2018
Optimizing for routability during FPGA placement is becoming increasingly important, as failure to spread and resolve congestion hotspots throughout the chip, especially in the case of large designs, may result in placements that either cannot be routed or that require the router to work excessively hard to obtain success. In this article, we introduce a new, analytic routability-aware placement algorithm for Xilinx UltraScale FPGA architectures. The proposed algorithm, called GPlace3.0, seeks to optimize both wirelength and routability. Our work contains several unique features including a novel window-based procedure for satisfying legality constraints in lieu of packing, an accurate congestion estimation method based on modifications to the pathfinder global router, and a novel detailed placement algorithm that optimizes both wirelength and external pin count. Experimental results show that compared to the top three winners at the recent ISPD’16 FPGA placement contest, GPlace3.0 ...
Papers by Shawki Areibi