The Malware That Wasn't
By: Roger Johnsen, 29.12.2024
Introduction
In this article, I’ll take you through my journey of reverse engineering a malware sample that was sent to me for investigation. The malware arrived as an executable file and, at the time, was shrouded in mystery. While it did have some detections on VirusTotal, it remained largely absent from other threat intelligence feeds I rely on, making it an interesting case to explore. Join me as I detail each step of the process, from the initial analysis to unraveling the malicious operations hidden within.
Methodology
Let us kick this off with a word or two about methodologies. Adhering to a methodology when analyzing files is essential for maintaining consistency, efficiency, and accuracy. A structured approach minimizes the chance of missing critical details and allows analysts to work methodically, saving both time and effort. It also ensures reproducibility, enabling others to replicate the process and validate findings - a key factor in legal and investigative scenarios. Additionally, a well-defined methodology supports comprehensive analysis by covering crucial aspects such as metadata, file content, and network communications. It enhances quality control by identifying gaps, promoting thorough documentation, and fostering transparency, accountability, and compliance with regulations.
The following methodology is designed from a SOC analyst’s perspective to facilitate a swift initial investigation of suspicious files before escalating them to higher tiers. The process is divided into three main stages:
- Preliminary Analysis
- File Analysis
- Reverse Engineering
The exact steps within each stage can vary depending on the case. For this particular scenario, the methodology was structured as follows:
flowchart LR
    START((Start)) --> PI
    subgraph PI["Preliminary Analysis"]
        A[VirusTotal]
    end
    PI --> FA
    subgraph FA["File Analysis"]
        B[Record Evidence Information]
        C[Identify Binary]
        D[Detect Human Readable Strings]
        B --> C --> D
    end
    FA --> RE
    subgraph RE["Reverse Engineering"]
        E[pyinstxtractor]
        F[PyLingual]
        E --> F
    end
    RE --> STOP((Stop))
The investigation was carried out on my analysis platform of choice, Fedora Linux, with a custom-selected toolchain.
Preliminary Analysis
When I received this malware sample, I first uploaded it to VirusTotal to check whether it was already known. This is not something I do automatically: depending on the nature of the sample and the specific circumstances, I sometimes make a carefully considered decision not to upload it immediately. In this particular instance, I chose to upload the sample right away, as there was no immediate, severe, or active threat ongoing.
Seven out of seventy-two engines had something to say about the malware sample, which was remarkably few. For clarity, I have included the engine results below.
Vendor | Detection |
---|---|
Avast | Win64:Malware-gen |
AVG | Win64:Malware-gen |
Malwarebytes | Malware.AI.4283851620 |
SecureAge | Malicious |
SentinelOne (Static ML) | Static AI - Suspicious PE |
Skyhigh (SWG) | BehavesLike.Win64.PUPXEO.vc |
Zillya | Trojan.Agent.Win32.4064176 |
Much can be said about the engines on VirusTotal. For example, several of them use the same underlying detection engine. Avast and AVG, for instance, share the same detection technology, as both are owned by Gen Digital Inc. (formerly Avast Software and NortonLifeLock). Their antivirus products leverage the same core technologies, often resulting in identical or very similar detections, such as “Win64:Malware-gen.”
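To make the point about shared engines concrete, here is a minimal Python sketch that groups the vendors from the table above by their detection label. The helper name `group_by_label` is my own, and identical labels are only a rough hint of a shared engine, not proof of one.

```python
from collections import defaultdict

# Detections copied from the VirusTotal table above.
detections = {
    "Avast": "Win64:Malware-gen",
    "AVG": "Win64:Malware-gen",
    "Malwarebytes": "Malware.AI.4283851620",
    "SecureAge": "Malicious",
    "SentinelOne (Static ML)": "Static AI - Suspicious PE",
    "Skyhigh (SWG)": "BehavesLike.Win64.PUPXEO.vc",
    "Zillya": "Trojan.Agent.Win32.4064176",
}


def group_by_label(results):
    """Map each detection label to the sorted list of vendors reporting it."""
    groups = defaultdict(list)
    for vendor, label in results.items():
        groups[label].append(vendor)
    return {label: sorted(vendors) for label, vendors in groups.items()}


if __name__ == "__main__":
    groups = group_by_label(detections)
    for label, vendors in groups.items():
        print(f"{label}: {', '.join(vendors)}")
    print(f"{len(detections)} flagging engines, {len(groups)} distinct labels")
```

Counting distinct labels rather than vendors gives a slightly more conservative view of how many independent verdicts the sample actually received.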
However, when examining which engines flagged the sample, I found it odd that major players like Microsoft, Fortinet, Palo Alto Networks, Trend Micro, Sophos, and Symantec had nothing to report. In my view, when industry leaders of this caliber - whom I consider forerunners - fail to detect anything, yet lesser-known engines flag the sample, it strongly suggests that the detection is a false positive. I thought I'd better move on to the file analysis stage.
File Analysis
Recording Evidence Information
Calculating file hashes is essential in malware investigations as it verifies file integrity, uniquely identifies files, and enables quick classification by comparing against threat databases like VirusTotal. Hashes help detect duplicates, share threat intelligence, provide historical context, and automate analysis workflows. They also serve as evidence in forensic investigations, ensuring file authenticity and maintaining a chain of custody.
In my case I use hashes to verify that I am working on the correct version of the sample, and to refer to a particular file in the report. To start, I calculated cryptographic hashes of the suspicious sample for reference:
md5sum main.exe
c6c144b3278373a9a7cecdae34ec6a44 main.exe
sha256sum main.exe
49f908c69750b369ea54fc48ed85be624574599815e82669bd95d2bfe5e611f7 main.exe
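When the hashes need to end up in a script or a report, the same values can be computed with Python's standard `hashlib` module. This is a small sketch rather than part of my original workflow, and the function name `file_hashes` is hypothetical.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return (md5, sha256) hex digests, reading the file in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large samples never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Calling `file_hashes("main.exe")` should return the same two digests that `md5sum` and `sha256sum` print for the sample.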
Identify Binary
The `file` command is used to quickly determine key details about the sample, such as its file type (PE32+ executable), architecture (64-bit), application type (console), platform (Windows), and structure (6 sections). This initial information helps guide further analysis by confirming the file's format and preparing the right tools and environment for deeper investigation.
file main.exe
main.exe: PE32+ executable (console) x86-64, for MS Windows, 6 sections
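The fields that `file` reports come straight from the PE headers, so they can also be read by hand. The sketch below parses only those fields, using offsets from the published PE/COFF layout. The function name `describe_pe` and the small lookup tables are my own and deliberately incomplete.

```python
import struct

# Partial lookup tables; unknown values are reported as hex.
MACHINES = {0x014C: "x86", 0x8664: "x86-64"}
FORMATS = {0x010B: "PE32", 0x020B: "PE32+"}
SUBSYSTEMS = {2: "GUI", 3: "console"}


def describe_pe(data):
    """Summarise the PE header fields that `file` prints for a Windows binary."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature, not a DOS/PE executable")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF header follows the signature; Machine and NumberOfSections
    # are its first two 16-bit fields.
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    # Optional header starts after the 4-byte signature and 20-byte COFF header.
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, optional)
    # Subsystem sits at offset 68 of the optional header in PE32 and PE32+.
    (subsystem,) = struct.unpack_from("<H", data, optional + 68)
    return {
        "format": FORMATS.get(magic, hex(magic)),
        "subsystem": SUBSYSTEMS.get(subsystem, hex(subsystem)),
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": sections,
    }
```

Passing the bytes of `main.exe` to `describe_pe` should yield the same format, subsystem, architecture, and section count as the `file` output above.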
Detect Human Readable Strings
Running `strings -n10 main.exe` extracts human-readable strings of at least ten characters from the malware, revealing potential indicators of compromise (IOCs), hardcoded information, error messages, and clues about its behavior. This helps us quickly identify malicious activity, hidden details, and communication patterns, aiding in faster investigation and analysis.
strings -n10 main.exe
Sample output (end excerpt) from my malware sample:
... SNIP ...
bbase_library.zip
blibcrypto-3.dll
bpython313.dll
bselect.pyd
bucrtbase.dll
bunicodedata.pyd
opyi-contents-directory _internal
zPYZ-00.pyz
9python313.dll
At the very end of the output from the above command there were some references to Python:
bpython313.dll
9python313.dll
The appearance of `bpython313.dll` and `9python313.dll` strongly suggested this was a Python-based application. To dig deeper, I searched for further Python-related references:
strings -n10 main.exe | grep -i python
Nothing much of interest was revealed, except for some minor artefacts:
pyi-python-flag
Reported length (%d) of Python shared library name (%s) exceeds buffer size (%d)
Path of Python shared library (%s) and its name (%s) exceed buffer size (%d)
Failed to pre-initialize embedded python interpreter!
Failed to allocate PyConfig structure! Unsupported python version?
Failed to set python home path!
Failed to start embedded python interpreter!
9python313.dll
It is still apparent that this is a Python application packed as a binary. Most likely it was packed using either PyInstaller or Py2Exe. But which? Let's find out by searching for more Python references!
strings -n10 main.exe | grep -i py
And yes - there were more Python references:
[PYI-%d:%s]
[PYI-%d:ERROR]
Absolute path to script exceeds PYI_PATH_MAX
_pyi_main_co
PYINSTALLER_RESET_ENVIRONMENT
_PYI_ARCHIVE_FILE
_PYI_APPLICATION_HOME_DIR
_PYI_PARENT_PROCESS_LEVEL
_PYI_SPLASH_IPC
Invalid value in _PYI_PARENT_PROCESS_LEVEL: %s
PYINSTALLER_STRICT_UNPACK_MODE
... SNIP ...
Based on these indicators, I confidently concluded the binary was packed using PyInstaller.
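The string-based triage above can be approximated in a few lines of Python. This sketch assumes that a handful of marker substrings taken from the output above are sufficient. The marker list and both function names are my own choices, and the regular expression only approximates what `strings` treats as printable.

```python
import re

# Marker substrings taken from the strings output above; illustrative only.
PYINSTALLER_MARKERS = (
    "PYINSTALLER_",
    "_PYI_",
    "pyi-contents-directory",
    "PYZ-00.pyz",
)


def printable_strings(data, min_length=10):
    """Yield runs of printable ASCII at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def pyinstaller_markers(data, min_length=10):
    """Return the sorted PyInstaller markers found in the extracted strings."""
    found = set()
    for text in printable_strings(data, min_length):
        for marker in PYINSTALLER_MARKERS:
            if marker in text:
                found.add(marker)
    return sorted(found)
```

A non-empty result from `pyinstaller_markers` on the sample's bytes points towards PyInstaller, while an empty result says nothing either way and calls for a manual look.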
Reverse Engineering
I then moved on to reverse engineering the sample. For this I used the following two tools, both of which I knew from past reverse engineering sessions:
Tool | Description |
---|---|
PyInstaller Extractor | Extracts contents from PyInstaller-packaged executables. |
PyLingual | A Python decompilation service for restoring bytecode to source code with semantic verification. |
Reversing process
First, I had to extract the compiled Python files (`.pyc` and `.pyz`) from the binary using the tool `pyinstxtractor`, like so:
python3 ../Tools/pyinstxtractor/pyinstxtractor.py main.exe
Artefacts were extracted into a folder called `main.exe_extracted`:
main.pyc
pyiboot01_bootstrap.pyc
pyimod01_archive.pyc
pyimod02_importers.pyc
pyimod03_ctypes.pyc
pyimod04_pywin32.pyc
pyi_rth_inspect.pyc
PYZ-00.pyz
PYZ-00.pyz_extracted
struct.pyc
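With larger samples the extracted folder can get crowded, so it helps to filter out PyInstaller's own bootstrap files first. The sketch below assumes that the name prefixes seen in the listing above (`pyiboot`, `pyimod`, `pyi_rth_`) identify that bootstrap code. Both the prefix list and the function name `candidate_entry_points` are heuristics of mine.

```python
# Name prefixes of PyInstaller's bootstrap modules and runtime hooks, as seen
# in the listing above. Treat this list as a heuristic, not a specification.
BOOTSTRAP_PREFIXES = ("pyiboot", "pyimod", "pyi_rth_")


def candidate_entry_points(names):
    """Return .pyc names that do not look like PyInstaller bootstrap code."""
    return [
        name
        for name in names
        if name.endswith(".pyc") and not name.startswith(BOOTSTRAP_PREFIXES)
    ]
```

Applied to the listing above, this leaves `main.pyc` and `struct.pyc`. The latter appears to be the standard-library module of that name, which leaves `main.pyc` as the file worth decompiling.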
There were several `.pyc` files available, but the one that caught my eye was `main.pyc`. Whenever I see a file named `main`, or anything else resembling an entry point, it usually is the entry point of the application. In order to decompile it, I uploaded it to PyLingual:
After processing, PyLingual returned the following decompiled source code:
# Decompiled with PyLingual (https://pylingual.io)
# Internal filename: main.py
# Bytecode version: 3.13.0rc3 (3571)
# Source timestamp: 1970-01-01 00:00:00 UTC (0)
import requests
from pprint import pprint

class Software:
    def __init__(self):
        return None

    def run(self):
        res = requests.get('https://api.ipify.org?format=json')
        pprint(res.json())

if __name__ == '__main__':
    print("This is the Malware that wasn't! ;)")
    Software = Software()
    Software.run()
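The bytecode version and source timestamp in the comment header above correspond to fields in the 16-byte header that precedes the bytecode in a `.pyc` file. As a hedged sketch, the header can be decoded with the layout described in PEP 552. The function name `parse_pyc_header` is hypothetical, and this is not how PyLingual itself is implemented.

```python
import struct
from datetime import datetime, timezone


def parse_pyc_header(header):
    """Decode the 16-byte .pyc header layout described in PEP 552."""
    if len(header) < 16:
        raise ValueError("a .pyc header is 16 bytes long")
    # Bytes 0-1: bytecode version, bytes 2-3: "\r\n", bytes 4-7: flags.
    version, marker, flags = struct.unpack_from("<H2sI", header, 0)
    if marker != b"\r\n":
        raise ValueError("unexpected magic terminator, not a .pyc file?")
    info = {"bytecode_version": version, "flags": flags}
    if flags & 0x1:
        # Hash-based .pyc: the remaining 8 bytes hold a hash of the source.
        info["source_hash"] = header[8:16].hex()
    else:
        # Timestamp-based .pyc: source mtime and source size, 4 bytes each.
        mtime, size = struct.unpack_from("<II", header, 8)
        info["source_timestamp"] = datetime.fromtimestamp(mtime, tz=timezone.utc)
        info["source_size"] = size
    return info
```

A timestamp of 1970-01-01, as reported for this sample, is what a zeroed timestamp field decodes to, so it says nothing about when the code was actually written.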
The other `.pyc` files did not contain anything relevant, so I skipped them.
Conclusion
The analysis concluded here. This so-called “malware” lacked traditional characteristics commonly associated with malicious software, such as complex library calls, process carving, or code injection techniques. The original sample was somewhat more complex, but I had to simplify it for this article. Despite detections on VirusTotal, the analysis revealed that the binary was not malicious, highlighting that we cannot fully trust VirusTotal’s verdicts, though it can guide us. This case emphasizes the importance of thorough investigation beyond tool detections and reinforces the value of context and a structured methodology in cybersecurity analysis. It’s also important to note that VirusTotal flagged PyInstaller artefacts, not the underlying code itself - and that is something we should carry with us in future investigations.
Revised Date | Author | Comment |
---|---|---|
29.12.2024 | Roger Johnsen | Article added |