ASNM Program Explain

ASNM-CDX 2009

Pandas: for reading the dataset

Numpy: for array manipulation

Sklearn: for label encoding and scaling, for splitting the dataset into training and testing parts, and for the evaluation metrics (accuracy score, precision score, F1 score, recall score, ROC curve and AUC)

Tensorflow: to implement the neural network

Matplotlib: for visualization of data

We have used three datasets that were built from network traffic traces using ASNM (Advanced Security Network Metrics) features, designed in our previous work. The first dataset was built from the state-of-the-art CDX 2009 dataset, which was collected during a cyber defense exercise, while the remaining two datasets were collected by us in 2015 and 2018 using publicly available network services containing buffer overflow and other high-severity vulnerabilities. These two datasets contain several adversarial obfuscation techniques that were applied to malicious as well as legitimate traffic samples during the execution of their TCP network connections.

1 and 2. So let's just start by importing all the necessary modules, which I explained a couple of minutes back, followed by importing our dataset from my local computer.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
import tensorflow as tf
from tensorflow.keras.layers import Dense
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, f1_score, recall_score, roc_curve, auc
import matplotlib.pyplot as plt
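
The dataset read itself is not shown above; a minimal sketch, assuming the ASNM-CDX data is available as a CSV file on the local machine (the filename here is an assumption, adjust it to your local copy):

data = pd.read_csv('ASNM-CDX-2009.csv')  # hypothetical filename for the local dataset file
#print(data.shape)  # sanity-check the number of rows and columns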

3 and 4. So let's normalize the dataset. In simple words, normalization is a technique used to change the values of an array to a common scale, without distorting differences in the ranges of values. It is an important step, and you can check the difference in accuracies on our dataset by removing it. It is mainly required when the dataset's features vary in range. The code discussed below makes two arrays, X and y: X takes all the attributes (from the 2nd column onward), while y takes the labels present in column 1 (true = 1 and false = 0).

y = data.iloc[:, 1]   # label column (true/false)
#print(y)
X = data.iloc[:, 2:]  # attribute columns, from the 2nd column to the last
#print(X)
y = [int(bool(i)) for i in y]  # convert true/false labels to 1/0
#print(y)
y = np.array(y)       # turn the label list into a NumPy array
#print(y)
For 8: Now our dataset is processed and ready to feed into the neural network.

Generally, it is better to split data into training and testing data. Training data is the data on which
we will train our neural network. Test data is used to check our trained neural network. This data is
totally new for our neural network and if the neural network performs well on this dataset, it shows
that there is no overfitting.

This will split our dataset into training and testing sets. The training data will have 75% of the samples and the test data will have 25%.
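
A minimal sketch of this step, using the train_test_split and StandardScaler imported earlier (the random_state value is an assumption; fitting the scaler on the training data and applying the same transform to the test data is the usual practice):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)  # 75/25 split; random_state is an assumed value

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training data only
X_test = sc.transform(X_test)        # apply the same scaling to the test data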

For 9: Building the Neural Network

model = tf.keras.Sequential()
model.add(Dense(12, activation='relu', input_shape=(len(X.iloc[0]),)))  # first layer: 12 neurons, input = number of attributes
model.add(Dense(32, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # output layer: 1 neuron for the binary class

Keras is a simple tool for constructing a neural network. It is a high-level framework that runs on TensorFlow, Theano or CNTK backends.

We have used a total of 9 layers: an input layer of dimension 12, 7 hidden layers, and an output layer of dimension 1.

Now let's understand the code line by line.

Sequential specifies to Keras that we are creating the model sequentially: the output of each layer we add is the input to the next layer we specify.

model.add is used to add a layer to our neural network. We need to specify as an argument what type of layer we want; Dense specifies a fully connected layer. The arguments of Dense are the output dimension (12 in the first case), the input shape (the number of attributes in X), and the activation function to be used (relu in this case). The second layer is similar; we don't need to specify the input dimension because we have defined the model to be sequential, so Keras will automatically take the input dimension to be the same as the output of the last layer, i.e. 12. This continues until the second-to-last layer, and in the last layer (the output layer) the output dimension is 1 (the number of classes) and the activation function is sigmoid.
Now we need to specify the loss function and the optimizer. This is done using the compile function in Keras.

model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Here the loss is binary_crossentropy, which computes the cross-entropy loss between the true labels and the predicted labels. The optimizer is SGD (stochastic gradient descent). Metrics is used to specify the way we want to judge the performance of our neural network; here we have set it to accuracy.

Now we are done with building a neural network and we will train it.

Training the model is simple in Keras: we use model.fit to train on the data.

H = model.fit(X_train, y_train, epochs=10, batch_size=26, verbose=1)  # 77.16 - test, 81 train

y_pred = model.predict(X_test).round()
#y_pred_bool = np.argmax(y_pred, axis=1)

Here X_train is the input data, y_train holds the labels, epochs is 10, and batch_size is 26. We use a batch size because the dataset is very big and we cannot fit the complete data at once: only this number of samples is loaded into memory and processed at a time. Once we are done with one batch, it is flushed from memory and the next batch is processed.

So we can see that our training accuracy is blah blah…

Now we can check the confusion matrix

a = confusion_matrix(y_test, y_pred)

print(a)
print('Classification Report:\n', classification_report(y_test, y_pred))
print('Training Accuracy : {:.2f}%'.format(accuracy_score(y_train, model.predict(X_train).round()) * 100))
print('Testing Accuracy : {:.2f}%'.format(accuracy_score(y_test, y_pred) * 100))
print('Sensitivity: {:.2f}'.format(a[1][1]/(a[1][0]+a[1][1])))
print('Specificity: {:.2f}'.format(a[0][0]/(a[0][0]+a[0][1])))
print('Precision: {:.2f}'.format(precision_score(y_test, y_pred)))
print('F1 Score: {:.2f}'.format(f1_score(y_test, y_pred)))
print('Recall: {:.2f}'.format(recall_score(y_test, y_pred)))
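
The imports also bring in roc_curve, auc and matplotlib, so the ROC curve can be plotted as well. A minimal sketch (the plot styling is an assumption); note that it uses the raw predicted probabilities rather than the rounded labels:

y_prob = model.predict(X_test).ravel()    # predicted probabilities, not the rounded labels
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label='ROC curve (AUC = {:.2f})'.format(roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--')  # chance line for reference
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()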

1-9:

Pandas: for reading the dataset

Numpy: for array manipulation

Sklearn: for label encoding and scaling, for splitting the dataset into training and testing parts, and for the evaluation metrics (accuracy score, precision score, F1 score, recall score, ROC curve and AUC)

Tensorflow: to implement the neural network

Matplotlib: for visualization of data

12:

Importing the dataset; we use pandas to read it.

14-17:

14: the 'y' variable takes the labels from the 1st column of the dataset, i.e. label_2 (which has true and false values)

15: the 'X' variable takes the attributes, i.e. from the 2nd column to the last column of the dataset

16: we have converted the values of the 1st column from true and false to 1 and 0 respectively

17: and converted them into a NumPy array

22-26:

We have converted the string values of label_poly, SrcIP, DstIP, SrcMAC, DstMac into integer values.
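
The corresponding code is not shown here; a minimal sketch of what lines 22-26 could look like, using the LabelEncoder imported earlier and assuming these columns live in the attribute frame X:

for col in ['label_poly', 'SrcIP', 'DstIP', 'SrcMAC', 'DstMac']:  # column names taken from the text
    X[col] = LabelEncoder().fit_transform(X[col])                 # encode each string column as integers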

27-30:

We have converted the true and false values in SrcIPInVlan and DstIPInVlan to 1 and 0 respectively, and further converted them into an array.
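
A sketch of lines 27-30 under the same assumption that these flags are columns of X, mirroring the earlier label conversion:

for col in ['SrcIPInVlan', 'DstIPInVlan']:
    X[col] = np.array([int(bool(v)) for v in X[col]])  # true/false -> 1/0, stored as an integer array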

32-33:

We are splitting the data into training data and test data, and further using a standard scaler fitted on the training data.

50-61

I have taken a total of 9 layers, the first containing 12 neurons, followed by 32, then 128, then 256 and so on, with activation function "relu", passing the number of attributes as the input shape. In the last layer I have used the sigmoid function so that I can get the output as 0 or 1.

Here the loss used is "binary_crossentropy" because I need output between 0 and 1, the optimizer is "sgd", and the metric is accuracy.

We are training for 10 epochs with a batch size of 26 at a time, and verbose set to 1.
