Evaluating the Performance — Part 4

We’ll need to evaluate the performance of the detector we’ve built to make sure we are achieving a high true positive rate while keeping the false positive rate low. And as we increase the types of features we build and use, we’ll need to keep monitoring that performance.

ROC Curve

In order to evaluate the performance of the detector, we are going to use the Receiver Operating Characteristic (ROC) curve. We plot the false positive rate against the true positive rate at various detection thresholds. This will help us work out how to configure our detector for the best trade-off. Detectors are not perfect and there will always be false positives, but we can use this method to reduce the false positive rate while increasing our true positive rate.
When you think about the process and the possibilities it can seem like a never-ending story, but we should look at it as evolving our detector.

As we implement our function to evaluate the detector performance, we will delve further into the requirements of the ROC curve and see the results.
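Before that, here is a small standalone sketch with made-up labels and scores (none of these numbers come from the detector, they are just for illustration) showing how scikit-learn turns a set of scores into false positive and true positive rates at each threshold:

from sklearn import metrics

# Made-up example: 1 = malware, 0 = benign, and a score per sample.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# roc_curve returns one (fpr, tpr) pair per candidate threshold.
fpr, tpr, thresholds = metrics.roc_curve(labels, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print("threshold %.2f -> FPR %.2f, TPR %.2f" % (th, f, t))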

Evaluate function

We have a function called cv_evaluate (remember, as always all the code is on my GitHub page):

def cv_evaluate(X, y, hasher):

Recall that “X” holds the feature data for both the malware and the benign software examples, “y” holds the matching label for each example (malware or benign), and the hasher is the feature hasher we set to 20,000 features.
We’ll need to convert the training data (X, y) to “NumPy” arrays so that we can use NumPy's enhanced array indexing. (https://numpy.org/devdocs/user/quickstart.html)
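As a rough sketch of that conversion step (assuming X and y arrive as plain Python lists at this point in cv_evaluate):

import numpy as np

# Convert the lists to NumPy arrays so we can index them with the
# integer index arrays that KFold will give us (fancy indexing).
X = np.array(X)
y = np.array(y)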

The type of evaluation we will be doing is “cross-validation”. Cross-validation splits the training examples into a number of folds and runs multiple tests, rotating which fold is held out for testing while the remaining folds are used for training, so every fold gets pitted against the others.
Scikit-learn has a class called “KFold”, in its “model_selection” module, which we will use. (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html)

from sklearn.model_selection import KFold

for train, test in KFold(n_splits=3, shuffle=True).split(X):
    training_X, training_y = X[train], y[train]
    test_X, test_y = X[test], y[test]

We create the KFold splitter with 3 folds and tell it to shuffle the data before dividing it into the folds; calling its split method on X then yields the train and test index arrays for each fold.
Inside the loop, we set our training X and y, and our test X and y, to the corresponding elements using those indices.

With our training data X and y set, we need to run them through our classifier (remember that we use the classifier below to determine whether it can tell if the binary features we pass into it come from malware or benign software).

classifier = ensemble.RandomForestClassifier()
classifier.fit(training_X, training_y)

Now that we’ve fitted the classifier to the training data, we can get the scores for the test fold:

# The last predict_proba column is the probability of the malware class,
# which we treat as the detection score.
scores = classifier.predict_proba(test_X)[:,-1]
fpr, tpr, thresholds = metrics.roc_curve(test_y, scores)

“fpr” holds the false positive rates and “tpr” the true positive rates at each threshold.
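If you want to use these arrays to pick an operating point for the detector, a hypothetical sketch (the 1% false positive target is just an illustrative number, and fpr, tpr, and thresholds are the arrays returned by roc_curve above) might look like this:

import numpy as np

# Find the point on the curve with the best TPR while keeping the
# false positive rate at or below an example target of 1%.
target_fpr = 0.01
candidates = np.where(fpr <= target_fpr)[0]
best = candidates[np.argmax(tpr[candidates])]
print("threshold %.3f gives TPR %.3f at FPR %.3f"
      % (thresholds[best], tpr[best], fpr[best]))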

With our fpr and tpr values, we use “pyplot” from “matplotlib” to plot the ROC curve and see the outcome.
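A minimal plotting sketch (the labels and styling here are my own assumptions, using the fpr and tpr arrays from the fold above) could be:

from matplotlib import pyplot

# Draw the ROC curve for this fold: false positive rate on the x axis,
# true positive rate on the y axis.
pyplot.plot(fpr, tpr)
pyplot.xlabel("False positive rate")
pyplot.ylabel("True positive rate")
pyplot.title("Detector ROC curve")
pyplot.show()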

Now…we just need to run like a gazillion tests…..
