The way developers edit day-to-day code tends to be repetitive, often using existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates that are applied in a limited scope. The advancement of Neural Machine Translation (NMT) and the availability of vast open-source evolutionary data open up the possibility of automatically learning those templates from the wild. However, unlike natural languages, for which NMT techniques were originally devised, source code and its changes have certain properties. For instance, compared to natural language, source code vocabulary can be significantly larger. Further, good changes in code do not break its syntactic structure. Thus, deploying state-of-the-art NMT models without adapting the methods to the source code domain yields sub-optimal results. To this end, we propose a novel tree-based NMT system to model source code changes and learn code change patterns from the wild. We ...
Pre-trained transformer models have recently shown promise for understanding source code. Most existing works expect to understand code from its textual features and limited structural knowledge. However, program functionality sometimes cannot be fully revealed by the code sequence, even with structure information. Programs can contain very different tokens and structures while sharing the same functionality, whereas changing only one or a few code tokens can introduce unexpected or malicious program behaviors while preserving the syntax and most tokens. In this work, we present BOOST, a novel self-supervised model that focuses pre-training on the characteristics of source code. We first employ automated, structure-guided code transformation algorithms that generate (i) functionally equivalent code that looks drastically different from the original, and (ii) textually and syntactically very similar code that is functionally distinct from the original. We train...
2015 18th International Conference on Computer and Information Technology (ICCIT), 2015
Intrusion Detection Systems (IDS) primarily detect malicious attacks. Many researchers have proposed IDS designs that combine clustering with Artificial Neural Networks (ANN) to achieve the best accuracy. Clustering- and ANN-based models give better precision and accuracy where attack records are few. However, not all features of a dataset are relevant for classifying different attacks, so feature selection can improve the stability and accuracy of an IDS. This paper proposes an IDS that uses the most effective features selected by Principal Component Analysis (PCA), which can reduce the computational complexity of the system. PCA is combined with the K-means clustering technique, which clusters specific groups of attacks, and with an Artificial Neural Network, trained on the formulation of different base models, to produce the final output. The resulting model is named FP-ANK. Experimental results are reported on the NSL-KDD dataset, where the accuracy is compared with that of other models to validate the proposed system.
Papers by Saikat Chakraborty