
Posts

Showing posts from January, 2023

Evaluating the Performance — Part 4

  We need to evaluate the performance of the detector we built, to confirm that its true positive rate is higher than its false positive rate. As we add more types of features, we’ll also need to monitor how they affect performance.

ROC Curve

To evaluate the detector, we are going to use the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate at various thresholds. This helps us decide which threshold to configure the detector with. Detectors are not perfect and there will always be some false positives, but we can use this method to pick a setting that lowers the false positive rate while keeping the true positive rate high. The process and its possibilities can seem like a never-ending story, but we should look at it as evolving our detector. As we implement our function to evaluate the detector performance, we will delve further into the requirements of the ROC curve and ...
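To make the ROC idea concrete, here is a minimal sketch in plain Python. It is not the evaluation function from the post itself: the name `roc_points`, the label convention (1 = malicious, 0 = benign) and the example scores are all made up for illustration. It sweeps every distinct score as a threshold and records the false positive rate and true positive rate at each one.

```python
def roc_points(labels, scores):
    """Return (threshold, fpr, tpr) tuples, one per distinct score.

    labels: 1 for malicious, 0 for benign (assumed convention).
    scores: detector output, higher means "more likely malicious".
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")

    points = []
    # Sweep from the strictest threshold to the most lenient one.
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(labels, scores) if s >= threshold]
        true_pos = sum(1 for y in flagged if y == 1)
        false_pos = len(flagged) - true_pos
        points.append((threshold, false_pos / negatives, true_pos / positives))
    return points


# Hypothetical detector scores for six samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

for threshold, fpr, tpr in roc_points(labels, scores):
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Reading down the printed rows shows the trade-off: lowering the threshold catches more malware but also flags more benign files, and the threshold to pick is the one whose false positive rate you can tolerate.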

Applying Data Science to Malware — Part 3

  Now we will build a machine learning detector. To do so, we need to extract a substantial number of features from software binaries, both malicious and benign, because the point of the detector is to determine whether a binary is malicious or benign. At the moment I’m only using the strings feature; I plan to add more features in the future.

Strings feature

    import re
    import numpy

    def get_string_features(path, hasher):
        # match runs of at least five printable ASCII characters
        chars = r" -~"
        min_length = 5
        string_regexp = '[%s]{%d,}' % (chars, min_length)
        file_object = open(path)
        data = file_object.read()
        pattern = re.compile(string_regexp)
        strings = pattern.findall(data)
        # record the presence of each string, not how often it occurs
        string_features = {}
        for string in strings:
            string_features[string] = 1
        # hash the string features into a fixed-size vector
        hashed_features = hasher.transform([string_features])
        hashed_features = hashed_features.todense()
        hashed_features = numpy.asarray(hashed_features)
        hashed_features = hashed_features[0]
        print "Extracted...
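The excerpt above is cut off and is written in Python 2 style (note the `print` statement), so here is a rough, dependency-free Python 3 sketch of the same idea. It is an illustration only, not the post's code. The `hasher` in the excerpt looks like a feature hasher from a machine learning library, but that is an assumption since it is never shown; this sketch swaps in a simple SHA-256 based hashing trick from the standard library instead. The names `extract_strings` and `hash_features` and the vector size are made up.

```python
import hashlib
import re


def extract_strings(data, min_length=5):
    """Return runs of printable ASCII (space through tilde) found in raw bytes."""
    pattern = re.compile(rb"[ -~]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


def hash_features(strings, n_features=2 ** 14):
    """Map a set of strings onto a fixed-size vector (the hashing trick).

    Each distinct string adds 1 to one slot, so two different strings
    that collide in the same slot are counted together.
    """
    vector = [0] * n_features
    for string in set(strings):
        digest = hashlib.sha256(string.encode("ascii")).digest()
        index = int.from_bytes(digest[:8], "big") % n_features
        vector[index] += 1
    return vector


# Stand-in for the contents of a binary read with open(path, "rb").
data = b"\x00\x01hello world\x00abc\x00GetProcAddress\xff"
strings = extract_strings(data)
vector = hash_features(strings, n_features=1024)
print(strings)
print(sum(vector), "features hashed into", len(vector), "slots")
```

Reading the file as bytes matters in Python 3: a binary opened in text mode will usually fail to decode. The fixed vector length is what lets every sample, however many strings it has, feed the same detector.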

Applying Data Science to Malware — Part 2

  Shared code analysis

In the last section, I wrote about building networks and producing a visual graph that shows the connections between Malware. In this section, I will go through the script that creates a system to show the links between Malware based on shared code analysis.

Terminology

Before we start to build the system, we first need to understand the following:

1. Jaccard index
2. Minhashes

Jaccard index

The Jaccard index is quite simple: it is worked out by dividing the number of attributes shared between two malware samples by the total number of distinct attributes across both. For example, Jaccard index = 0.5 when shared attributes (5) / total attributes (10). This is practical for small data sets, but when we want to compare large data sets we turn to “minhashes”.

Minhashes

Minhashes aren’t so simple. Minhashing is a technique used to estimate the similarity of two sets. Our minhash is a malware sample’s feature (in our below system the features will be the results from “strings”) and...
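As a rough illustration of the two terms, and not the script from the post, the sketch below computes the exact Jaccard index for two small feature sets and then estimates the same value with minhash signatures. The function names, the seeded SHA-256 hash and the signature length of 256 are all assumptions chosen to keep the example self-contained.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index: shared attributes / all distinct attributes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(item, seed):
    """One member of a family of hash functions, selected by seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the smallest hash over the set."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_seeded_hash(item, seed) for item in items)
            for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching minimums estimates the Jaccard index."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Two hypothetical samples: 5 shared strings, 10 distinct strings in total.
sample_a = {f"string{i}" for i in range(8)}
sample_b = {f"string{i}" for i in (0, 1, 2, 3, 4, 8, 9)}

exact = jaccard(sample_a, sample_b)
estimate = estimate_similarity(minhash_signature(sample_a),
                               minhash_signature(sample_b))
print(f"exact Jaccard index: {exact}")
print(f"minhash estimate:    {estimate:.2f}")
```

The point of the signature is size: each sample is reduced to a fixed-length list of numbers, so large feature sets can be compared without intersecting the full sets every time. The estimate gets closer to the exact value as the number of hash functions grows.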

Applying Data Science to Malware — Part 1

  With Malware exploding in numbers, I decided to learn Data Science and apply it to Malware. First I needed a number of Malware samples, which I obtained from https://github.com/fabrimagic72/malware-samples

The following techniques can work on any set of Malware. If you’re a business or organization that is being targeted, or you’ve been following a certain group of Malware authors, and you want to see how the Malware is connected (whether the samples use the same resources, hosts, code, etc.), then that would yield some interesting data and start to paint a picture. Unfortunately, I don’t have access to those sets of Malware, but that doesn’t mean we can’t apply the techniques to Malware collected from honeypots.

Ransomware samples

From the Malware samples, the Ransomware folder looks to have a number of samples we could apply the techniques to.

Step one: unzip all the Malware within that dir:

    find . -name "*.zip" | while read -r filename; do 7z x "$filename" -pinfected -aou; done

Step two: st...
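If 7z is not available, step one can be approximated in Python. This is only a sketch under some assumptions: the helper name `extract_archives` is made up, and Python's `zipfile` module can only decrypt the legacy ZIP encryption scheme, so AES-encrypted archives are skipped here and would still need 7z. Rather than auto-renaming files as `-aou` does, it unpacks each archive into its own directory so nothing is overwritten.

```python
import zipfile
from pathlib import Path


def extract_archives(root, password=b"infected"):
    """Extract every .zip under root into '<archive name>.extracted'."""
    extracted = []
    # Collect the archive list first so new directories do not affect the walk.
    for archive in sorted(Path(root).rglob("*.zip")):
        target = archive.parent / (archive.name + ".extracted")
        try:
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(path=target, pwd=password)
        except (zipfile.BadZipFile, RuntimeError, NotImplementedError) as exc:
            # Wrong password, corrupt file, or unsupported encryption.
            print(f"skipped {archive}: {exc}")
            continue
        extracted.append(target)
    return extracted
```

Calling `extract_archives(".")` from inside the samples directory mirrors the shell loop. As with the 7z command, only run this in an isolated analysis environment, since the extracted files are live Malware.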