Malware research project
Type: #Project
Link: https://github.com/ItWozNotMe/MalwareResearch
Aims & Goals
- Develop a framework for automated malware analysis
- Illustrate malware attributes using the synthesised dataset
To use the framework, first retrieve a dataset. Any dataset can be used; I used https://www.unb.ca/cic/datasets/malmem-2022.html. The hashes in its Excel file can be scraped with a regular expression in a simple Python script and written to a text file. The script then iterates through the text file line by line, inserting each hash into the query URL and writing the result to a JSON file.
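The hash-scraping step described above could look roughly like the sketch below. It assumes the hashes are SHA-256 digests (64 hexadecimal characters) and that the spreadsheet content is already available as text. The function names, the default file name `hashes.txt`, and the VirusTotal API v3 URL pattern are illustrative assumptions, not taken from the repository.

```python
import re
from pathlib import Path

# Assumption: samples are identified by SHA-256 (64 hex characters).
# The lookarounds stop the pattern matching inside a longer hex string.
HASH_RE = re.compile(r"(?<![0-9a-fA-F])[0-9a-fA-F]{64}(?![0-9a-fA-F])")

# Assumption: VirusTotal API v3 file-report endpoint; the project may use another URL.
VT_URL = "https://www.virustotal.com/api/v3/files/{}"


def extract_hashes(text):
    """Return each distinct hash once, lower-cased, in first-seen order."""
    return list(dict.fromkeys(h.lower() for h in HASH_RE.findall(text)))


def save_hashes(hashes, path="hashes.txt"):
    """Write one hash per line so the file can be iterated line by line."""
    Path(path).write_text("".join(h + "\n" for h in hashes), encoding="utf-8")


def build_url(file_hash):
    """Insert a hash into the report URL."""
    return VT_URL.format(file_hash)
```

Deduplicating at this stage means a hash that appears in several rows of the spreadsheet only costs one query later on.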
A counter is incremented for each line processed, because VirusTotal allows only 500 free queries a day, and the script stops when the count reaches 500. The script may stop earlier, but it can be re-run from the hash where it ended: “updateTxt” removes processed hashes from the text file via string slicing, which prevents the same hash being queried twice.
Once the desired number of hashes has been reached or the dataset is finished, the createCSV.py script can be run to turn the results into readable data. This script uses a series of regular expressions to find data. Regular expressions were chosen because the JSON data is nested and, as every malware sample is different, some reports do not contain the relevant data. This makes the approach more scalable and avoids errors and false data.
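The quota handling and resume behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's code: `run_batch`, `update_txt`, and the `fetch` callable (a stand-in for the real HTTP request) are hypothetical names, and one JSON file per hash is an assumption.

```python
import json
from pathlib import Path

DAILY_LIMIT = 500  # free daily query quota stated in the text


def update_txt(text, done):
    """Drop the first `done` lines from the hash list by slicing."""
    return "".join(text.splitlines(keepends=True)[done:])


def run_batch(hash_file, out_dir, fetch, limit=DAILY_LIMIT):
    """Query up to `limit` hashes, save each report, then trim the hash file.

    `fetch` is a hypothetical callable that takes a hash and returns a
    JSON-serialisable report; it stands in for the real HTTP request.
    """
    hash_path, out_path = Path(hash_file), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    text = hash_path.read_text(encoding="utf-8")
    consumed = queried = 0
    try:
        for line in text.splitlines():
            if queried >= limit:
                break
            file_hash = line.strip()
            if file_hash:
                report = fetch(file_hash)
                target = out_path / f"{file_hash}.json"
                target.write_text(json.dumps(report), encoding="utf-8")
                queried += 1
            consumed += 1
    finally:
        # Runs even if fetch() raises, so finished hashes are never re-queried
        # and the hash that failed stays at the top of the file.
        hash_path.write_text(update_txt(text, consumed), encoding="utf-8")
    return queried
```

Because the hash file is trimmed in the `finally` clause, re-running the script the next day simply continues with whatever lines remain.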
Before being compiled into a CSV file, the data was cleaned by a series of functions, one for each data type, illustrated below.
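As a rough sketch of what cleaning by data type can look like, the functions below each handle one kind of value and return `None` when the value is missing or malformed. The function names, the choice of types, and the semicolon separator are assumptions made for illustration; the project's own cleaning functions may differ.

```python
from datetime import datetime, timezone


def clean_text(value):
    """Collapse whitespace; return None for missing or empty strings."""
    if value is None:
        return None
    cleaned = " ".join(str(value).split())
    return cleaned or None


def clean_int(value):
    """Convert to int; return None when the value is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return None


def clean_timestamp(value):
    """Convert a Unix timestamp to an ISO 8601 UTC string, or None."""
    seconds = clean_int(value)
    if seconds is None:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def clean_list(values):
    """Join the non-empty items of a list into one semicolon-separated cell."""
    items = [clean_text(v) for v in (values or [])]
    return ";".join(i for i in items if i) or None
```

Returning `None` instead of a placeholder string keeps missing values distinguishable from real ones when the rows are later written out.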
The getData() function creates a pandas data frame and opens the first iteration of final_dataset.txt to get every hash used in the project. It then iterates through the hash.JSON files to get the required data; each iteration is formatted and then written to the data frame. The code is usable and scalable, and is available at https://github.com/ItWozNotMe/MalwareResearch/blob/main/createCSV.py alongside my dissertation, which describes the project in more depth, detailing the malware analysis and future project improvements. The CSV file can also be downloaded at https://github.com/ItWozNotMe/MalwareResearch/blob/main/Final_Dataset.csv.
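The overall flow of reading the hash list, pulling fields out of each report with regular expressions, and writing a table can be sketched as below. This is not the code in createCSV.py: the field names and patterns are hypothetical, and the sketch uses the standard-library `csv` module in place of pandas so that it runs without dependencies.

```python
import csv
import re
from pathlib import Path

# Hypothetical fields. Each pattern captures the first matching key at any
# nesting depth, so the report does not have to be walked level by level.
FIELD_PATTERNS = {
    "type_description": re.compile(r'"type_description"\s*:\s*"([^"]*)"'),
    "size": re.compile(r'"size"\s*:\s*(\d+)'),
    "malicious": re.compile(r'"malicious"\s*:\s*(\d+)'),
}


def extract_row(file_hash, raw_json):
    """Build one table row; fields missing from the report are left empty."""
    row = {"hash": file_hash}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_json)
        row[name] = match.group(1) if match else ""
    return row


def get_data(hash_file, json_dir, csv_path):
    """Read every hash, extract its report fields, and write them to a CSV."""
    rows = []
    for line in Path(hash_file).read_text(encoding="utf-8").splitlines():
        file_hash = line.strip()
        report = Path(json_dir) / f"{file_hash}.json"
        if file_hash and report.exists():
            rows.append(extract_row(file_hash, report.read_text(encoding="utf-8")))
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["hash", *FIELD_PATTERNS])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With pandas installed, the returned list of dictionaries could instead be passed to `pandas.DataFrame(rows)` and saved with `to_csv(csv_path, index=False)`.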