The Malware That Wasn't

By: Roger Johnsen, 29.12.2024


Introduction

In this article, I’ll take you through my journey of reverse engineering a malware sample that was sent to me for investigation. The malware arrived as an executable file and, at the time, was shrouded in mystery. While it did have some detections on VirusTotal, it remained largely absent from other threat intelligence feeds I rely on, making it an interesting case to explore. Join me as I detail each step of the process, from the initial analysis to unraveling the malicious operations hidden within.


Methodology

Let us kick this off with a word or two about methodologies. Adhering to a methodology when analyzing files is essential for maintaining consistency, efficiency, and accuracy. A structured approach minimizes the chance of missing critical details and allows analysts to work methodically, saving both time and effort. It also ensures reproducibility, enabling others to replicate the process and validate findings - a key factor in legal and investigative scenarios. Additionally, a well-defined methodology supports comprehensive analysis by covering crucial aspects such as metadata, file content, and network communications. It enhances quality control by identifying gaps, promoting thorough documentation, and fostering transparency, accountability, and compliance with regulations.

The following methodology is designed from a SOC analyst’s perspective to facilitate a swift initial investigation of suspicious files before escalating them to higher tiers. The process is divided into three main stages:

  • Preliminary Analysis
  • File Analysis
  • Reverse Engineering

The exact steps within each stage can vary depending on the case. For this particular scenario, the methodology was structured as follows:

flowchart  LR
    START((Start)) --> PI

    subgraph PI["Preliminary Analysis"]
        A[VirusTotal]
    end

    PI --> FA 

    subgraph FA["File Analysis"]
        B[Record Evidence Information]
        C[Identify Binary]
        D[Detect Human Readable Strings]

        B-->C-->D
    end

    FA-->RE

    subgraph RE["Reverse Engineering"]
        E[pyinstxtractor]
        F[PyLingual]

        E --> F
    end

    RE-->STOP((Stop))

The investigation was carried out on my analysis platform of choice, Fedora Linux, with a custom-selected toolchain.

Preliminary Analysis

When I received this malware sample, my first step was to check on VirusTotal whether it was already known. Uploading is not always the right call: depending on the nature of the sample and the circumstances, I sometimes make a carefully considered decision not to upload it immediately. In this instance, I uploaded the sample right away, as there was no severe or active threat ongoing.

VirusTotal detection

Seven out of seventy-two engines had something to say about the malware sample, which was remarkably few. For clarity, I have included the engine results below.

Vendor                     Detection
Avast                      Win64:Malware-gen
AVG                        Win64:Malware-gen
Malwarebytes               Malware.AI.4283851620
SecureAge                  Malicious
SentinelOne (Static ML)    Static AI - Suspicious PE
Skyhigh (SWG)              BehavesLike.Win64.PUPXEO.vc
Zillya                     Trojan.Agent.Win32.4064176

Much can be said about the engines on VirusTotal. For example, several of them use the same underlying detection engine. Avast and AVG, for instance, share the same detection technology, as both are owned by Gen Digital Inc. (formerly Avast Software and NortonLifeLock). Their antivirus products leverage the same core technologies, often resulting in identical or very similar detections, such as “Win64:Malware-gen.”

However, when examining which engines flagged the sample, I found it odd that major players like Microsoft, Fortinet, Palo Alto Networks, Trend Micro, Sophos, and Symantec had nothing to report. In my view, when industry leaders of this caliber - whom I consider forerunners - fail to detect anything, yet lesser-known engines flag the sample, it suggests that the detection is likely a false positive. I thought I'd better move on to the file analysis stage.

File Analysis

Recording Evidence Information

Calculating file hashes is essential in malware investigations as it verifies file integrity, uniquely identifies files, and enables quick classification by comparing against threat databases like VirusTotal. Hashes help detect duplicates, share threat intelligence, provide historical context, and automate analysis workflows. They also serve as evidence in forensic investigations, ensuring file authenticity and maintaining a chain of custody.

In my case I use hashes to verify that I am working on the correct version of the sample, and to refer to a particular file in the report. To start, I calculated cryptographic hashes of the suspicious sample for reference:

md5sum main.exe
c6c144b3278373a9a7cecdae34ec6a44  main.exe

sha256sum main.exe
49f908c69750b369ea54fc48ed85be624574599815e82669bd95d2bfe5e611f7  main.exe
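
The same digests can also be computed from Python, which is handy when the hashing step is part of an automated workflow. The following is a minimal sketch using only the standard library; the function name file_digests and the chunk size are illustrative choices, not part of the original investigation.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5_hex, sha256_hex) for the file at path."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large samples are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Calling file_digests("main.exe") should return the same two values that md5sum and sha256sum print, which makes it easy to record them in a report or compare them against a known hash.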

Identify Binary

The file command is used to quickly determine key details about the sample, such as its file type (PE32+ executable), architecture (64-bit), application type (console), platform (Windows), and structure (6 sections). This initial information helps guide further analysis by confirming the file’s format and preparing the right tools and environment for deeper investigation.

file main.exe
main.exe: PE32+ executable (console) x86-64, for MS Windows, 6 sections
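
To see where these details come from, the relevant header fields can be read directly. The sketch below is a simplified illustration, not how the file command itself is implemented: it follows the documented PE layout (the MZ signature, the e_lfanew pointer at offset 0x3C, the COFF header and the optional header). The function name describe_pe and the small lookup tables are assumptions made for this example and only cover the values relevant here.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x86-64"}
FORMATS = {0x010B: "PE32", 0x020B: "PE32+"}
SUBSYSTEMS = {2: "GUI", 3: "console"}


def describe_pe(data):
    """Summarize basic PE header fields from the raw bytes of a file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF header starts right after the 4-byte signature.
    machine, sections = struct.unpack_from("<HH", data, pe_offset + 4)
    # The optional header follows the 20-byte COFF header.
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, optional)
    # Subsystem sits at offset 68 of the optional header (PE32 and PE32+).
    (subsystem,) = struct.unpack_from("<H", data, optional + 68)
    return {
        "format": FORMATS.get(magic, hex(magic)),
        "machine": MACHINES.get(machine, hex(machine)),
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
        "sections": sections,
    }
```

For a file like the one described above, describe_pe(open("main.exe", "rb").read()) would be expected to report the PE32+ format, the x86-64 machine type, the console subsystem and six sections.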

Detect Human Readable Strings

Running strings -n10 main.exe extracts human-readable strings from the malware, revealing potential indicators of compromise (IOCs), hardcoded information, error messages, and clues about its behavior. This helps us quickly identify malicious activity, hidden details, and communication patterns, aiding in faster investigation and analysis.

strings -n10 main.exe

Sample output (end excerpt) from my malware sample:

... SNIP ...
bbase_library.zip
blibcrypto-3.dll
bpython313.dll
bselect.pyd
bucrtbase.dll
bunicodedata.pyd
opyi-contents-directory _internal
zPYZ-00.pyz
9python313.dll

At the very end of the output from the above command there were some references to Python:

  • bpython313.dll
  • 9python313.dll

The appearance of bpython313.dll and 9python313.dll strongly suggested this was a Python-based application. To dig deeper, I searched for further Python-related references:

strings -n10 main.exe | grep -i python

Nothing much of interest was revealed, except for some minor artefacts:

pyi-python-flag
Reported length (%d) of Python shared library name (%s) exceeds buffer size (%d)
Path of Python shared library (%s) and its name (%s) exceed buffer size (%d)
Failed to pre-initialize embedded python interpreter!
Failed to allocate PyConfig structure! Unsupported python version?
Failed to set python home path!
Failed to start embedded python interpreter!
9python313.dll

It was still apparent that this was a Python application packed as a binary, most likely with either PyInstaller or py2exe. But which? Let's find out by searching for more Python references!

strings -n10 main.exe | grep -i py

And yes - there were more Python references:

[PYI-%d:%s]
[PYI-%d:ERROR]
Absolute path to script exceeds PYI_PATH_MAX
_pyi_main_co
PYINSTALLER_RESET_ENVIRONMENT
_PYI_ARCHIVE_FILE
_PYI_APPLICATION_HOME_DIR
_PYI_PARENT_PROCESS_LEVEL
_PYI_SPLASH_IPC
Invalid value in _PYI_PARENT_PROCESS_LEVEL: %s
PYINSTALLER_STRICT_UNPACK_MODE
... SNIP ...

Based on these indicators, I confidently concluded the binary was packed using PyInstaller.
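
The string triage above can be approximated in a few lines of Python. This is a hedged sketch rather than a replacement for strings and grep: printable_strings only matches ASCII runs (GNU strings also counts tabs as printable), and the marker list is an assumption built from the strings seen in this sample plus the archive cookie that tools such as pyinstxtractor look for in an unmodified PyInstaller executable. Both function names are hypothetical.

```python
import re

# Markers assumed to indicate PyInstaller: three strings from the output
# above and the CArchive cookie b"MEI\014\013\012\013\016".
PYINSTALLER_MARKERS = (
    b"_PYI_ARCHIVE_FILE",
    b"PYINSTALLER_RESET_ENVIRONMENT",
    b"pyi-contents-directory",
    b"MEI\x0c\x0b\x0a\x0b\x0e",
)


def printable_strings(data, min_len=10):
    """Yield runs of printable ASCII of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def pyinstaller_markers(data):
    """Return the known markers that occur anywhere in data."""
    return [marker for marker in PYINSTALLER_MARKERS if marker in data]
```

A non-empty result from pyinstaller_markers is a hint, not proof: a modified bootloader could change or remove these values, so the absence of markers does not rule PyInstaller out.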

Reverse Engineering

With PyInstaller identified as the packer, I went on to reverse engineer the sample using two tools I knew from past reverse engineering sessions:

Tool                     Description
PyInstaller Extractor    Extracts contents from PyInstaller-packaged executables.
PyLingual                A Python decompilation service for restoring bytecode to source code with semantic verification.

Reversing process

First, I had to extract the compiled Python files (.pyc and .pyz) from the binary using pyinstxtractor, like so:

python3 ../Tools/pyinstxtractor/pyinstxtractor.py main.exe

Artefacts were extracted into a folder called main.exe_extracted:

main.pyc
pyiboot01_bootstrap.pyc
pyimod01_archive.pyc
pyimod02_importers.pyc
pyimod03_ctypes.pyc
pyimod04_pywin32.pyc
pyi_rth_inspect.pyc
PYZ-00.pyz
PYZ-00.pyz_extracted
struct.pyc
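
A quick way to narrow down such a listing is to filter out the files that belong to PyInstaller itself. The sketch below is only a heuristic: the prefixes are taken from the file names in the listing above, and candidate_entry_points is a hypothetical helper, not part of pyinstxtractor.

```python
from pathlib import Path

# Name prefixes of PyInstaller's own bootstrap and runtime-hook modules,
# as seen in the listing above (assumption: other versions may differ).
BOOTSTRAP_PREFIXES = ("pyiboot", "pyimod", "pyi_rth_")


def candidate_entry_points(extracted_dir):
    """Return top-level .pyc names that are not PyInstaller bootstrap files."""
    return sorted(
        path.name
        for path in Path(extracted_dir).glob("*.pyc")
        if not path.name.startswith(BOOTSTRAP_PREFIXES)
    )
```

On the listing above, this would leave main.pyc and struct.pyc, the latter presumably being the standard-library struct module rather than code written by the author of the sample.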

There were several .pyc files available, but the one that caught my eye was main.pyc. A file named main is usually the program's entry point (or something resembling one). To decompile it, I uploaded it to PyLingual:

PyLingual Upload page

After processing, the decompiled source code was revealed:

PyLingual decompiled code

Here’s the resulting code:

# Decompiled with PyLingual (https://pylingual.io)
# Internal filename: main.py
# Bytecode version: 3.13.0rc3 (3571)
# Source timestamp: 1970-01-01 00:00:00 UTC (0)

import requests
from pprint import pprint

class Software:

    def __init__(self):
        return None

    def run(self):
        res = requests.get('https://api.ipify.org?format=json')
        pprint(res.json())
if __name__ == '__main__':
    print("This is the Malware that wasn't! ;)")
    Software = Software()
    Software.run()

The other .pyc files did not contain anything relevant, so I skipped them.
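
The bytecode version and source timestamp shown at the top of the decompiler output correspond to fields in the header of a .pyc file. As a minimal sketch, assuming the 16-byte header layout described in PEP 552 (used since Python 3.7), those fields can be read as follows. The helper name parse_pyc_header is hypothetical, and how PyLingual itself derives these values is not something this sketch claims to reproduce.

```python
import struct


def parse_pyc_header(header):
    """Parse the first 16 bytes of a .pyc file (PEP 552 layout)."""
    if len(header) < 16:
        raise ValueError("a .pyc header is 16 bytes long")
    # Magic: a 2-byte little-endian number followed by b"\r\n".
    magic_number, terminator = struct.unpack_from("<H2s", header, 0)
    if terminator != b"\r\n":
        raise ValueError("unexpected magic terminator")
    (flags,) = struct.unpack_from("<I", header, 4)
    info = {"magic_number": magic_number, "flags": flags}
    if flags & 0x1:
        # Hash-based .pyc: the last 8 bytes are a hash of the source file.
        info["source_hash"] = header[8:16].hex()
    else:
        # Timestamp-based .pyc: source modification time, then source size.
        mtime, size = struct.unpack_from("<II", header, 8)
        info["source_mtime"] = mtime
        info["source_size"] = size
    return info
```

Reading the first 16 bytes of main.pyc and passing them to parse_pyc_header would be expected to yield the magic number 3571 reported above; a source_mtime of 0 corresponds to the 1970-01-01 timestamp in the decompiler header.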

Conclusion

The analysis concluded here. This so-called “malware” lacked traditional characteristics commonly associated with malicious software, such as complex library calls, process carving, or code injection techniques. The original sample was somewhat more complex, but I had to simplify it for this article. Despite detections on VirusTotal, the analysis revealed that the binary was not malicious, highlighting that we cannot fully trust VirusTotal’s verdicts, though it can guide us. This case emphasizes the importance of thorough investigation beyond tool detections and reinforces the value of context and a structured methodology in cybersecurity analysis. It’s also important to note that VirusTotal flagged PyInstaller artefacts, not the underlying code itself - and that is something we should carry with us in future investigations.


Revised Date    Author           Comment
29.12.2024      Roger Johnsen    Article added