Authored by: Subrat Sarkar, Arunpreet Singh, and Clemens Kolbitsch
Diving deeply into the ModPOS malware framework using sandbox process snapshotting
Point-of-sale (POS) systems are amongst the most valuable targets for attackers today: with direct access to systems processing payment information, miscreants are able to circumvent any encryption between point-of-sale devices and the payment processor, allowing them to spy on - or even tamper with - sensitive payment information.
With ModPOS, malware authors have developed a system that not only compromises payment processes at the origin device but does so from the kernel of these systems, well outside the reach of most security solutions.
As we will describe in this post, the ModPOS malware is much more than a system for compromising POS systems: it is a versatile framework that allows an attacker to leverage a practically unlimited range of tools to interfere with a compromised system. Even more, this malware works on any 32-bit Microsoft Windows system (many POS systems on the market today still run Microsoft Windows XP), so it can be used against more than just POS systems.
By leveraging process snapshots extracted by the Lastline FUSE sandbox, we walk through the most important parts of this versatile framework. We will show in detail how the infection spreads through the entire operating system, and how the attackers behind ModPOS can leverage the framework to load arbitrary plugins into user- and kernel-space of an infected system.
Although ModPOS was first reported only late last year by iSight in their blog, alongside a high-level technical paper published at the same time, the iSight analysis suggests that parts of the system can be found as far back as 2012. With such a long history, it comes as no surprise that the malware authors behind this framework have picked up a number of interesting tricks that make it extremely powerful.
So what do we mean by saying ModPOS is a framework? ModPOS is not just a specific attack on point-of-sale devices: it attacks a device’s kernel and provides an attacker with a versatile, modular system for loading arbitrary plugins into the system (we have seen PlugX use a similar plugin-based approach a while back). These plugins carry out the actual malicious behavior, such as leaking payment information from the POS device to the command-and-control infrastructure, changing payment details to the attacker’s choosing, or stealing sensitive information (using memory scraping to steal passwords) from the device to compromise more systems and expand the attacker’s control through lateral movement.
ModPOS plugins are loaded on-demand when instructed by the attacker: the framework downloads a plugin from the command-and-control backend over an encrypted channel (interestingly this is plain HTTP containing encrypted content using a key embedded in an HTTP header), and executes the plugin’s functionality by injecting new code into a user- or system-process of the attacker’s choosing. Thus, to analyze such a versatile framework, analysts clearly require a system that is able to dive deeply into malware running in kernel-mode, and that is able to see all behaviors performed there. Otherwise, without this level of visibility, an analyst is blind to the behavior of the ModPOS framework or the activity of a given plugin.
The Lastline sandbox uses full-system emulation (FUSE), which is able to extract behavior at the instruction level in user- as well as kernel-mode. This lets us monitor everything needed to understand the ModPOS framework. Even when focusing only on the actions performed by the framework itself (that is, without an attacker actively pushing a plugin that performs a specific malicious behavior), we can classify this malware by identifying that the kernel module injects data into (system) user-mode processes or contacts a command-and-control site.
Even more, our sandbox extracts full-process snapshots whenever new code blocks are found inside an analyzed process (for example, after code has been unpacked or is injected into another process). This allows us to analyze the different stages of the malware through off-the-shelf tools, like IDA Pro.
In the remainder of this post, we will walk through the different steps of how ModPOS works using process snapshot examples generated by the Lastline FUSE analysis system.
While it is not yet clear how most POS infections happen in detail, the fact that most systems on the market today run on the discontinued Microsoft Windows XP OS allows attackers to deliver the malware using a plethora of ways (such as delivering the malware through remote exploits or by using social-engineering to trick a user into running an infected file on a target host).
In this post, we inspect one ModPOS variant that targets 32-bit Windows XP. The initial infection (dropper component) is a packed executable (using the same packing technique explained in the section "Unpacking: Stage 2"). This dropper contains an encrypted PE file (embedded as a Bitmap resource) that is decrypted byte-by-byte with a 16-byte key and subsequently executed.
After unpacking, the malware injects a 32-bit driver into the Windows kernel using an interesting mechanism: instead of registering a new service (potentially raising suspicion), the malware searches for an existing driver-service on the system (selecting one at random) and reuses this service for loading its own driver.
To do so, the malware reads the legitimate driver’s image module path and loads this (benign) image into memory. Next, it overwrites the driver image on disk with its own (malicious) driver (again, embedded inside its own PE) and starts the driver-service either using ZwLoadDriver or by sending a command to the Service Control Manager for starting the hijacked service. Once the malicious driver has been loaded into the kernel, the malware restores the original driver on disk (using the data previously loaded into memory) and terminates.
The Lastline sandbox is able to track malware running in the user- as well as kernel-mode. Thus, the system finds that new code is loaded into the kernel and continues tracking the malware code there:
Even more, whenever the sandbox encounters interesting artifacts (such as code unpacked in memory, code injected into a user-mode process, a driver loaded into the kernel, or a file being written or modified on the file-system), the sandbox automatically extracts this artifact to allow further analysis by users:
The driver loaded into the kernel is obfuscated through several layers of packing: in the first layer, the sample contains multiple functions that implement basic mathematical operations (such as add and xor) that operate on the second stage that is hard-coded in the dropped file.
For example, the function EncData_Add takes one argument that is added to the data at offset B7345A00 (the location of the second-stage payload in memory).
Interestingly, most of these arithmetic operations done by the unpacker in the first stage are only included to throw off human analysts: the main loop of the unpacker generates a set of random numbers using functions like PsGetCurrentThreadId, PsGetCurrentProcessId, KeQuerySystemTime, or GetRuntimeCount as seeds. These random numbers are then passed as arguments to invoke the arithmetic functions mentioned above. At the same time, the arguments are stored on a separate stack, so that later the same operations can be applied in reverse order. Given the nature of these add and xor functions, this means that the content of the second stage is back to its original content once the loop completes. Our dynamic analysis is of course not affected by this technique.
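The net effect of this decoy loop can be illustrated with a small simulation. This is a sketch under stated assumptions, not the unpacker's actual code: the function names, the 32-bit word size, and the use of subtraction as the reverse of an add are our own choices, and Python's seeded random generator stands in for the thread ID, process ID, and system time seeds.

```python
import random

MASK32 = 0xFFFFFFFF


def enc_add(value, arg):
    return (value + arg) & MASK32


def enc_sub(value, arg):
    return (value - arg) & MASK32


def enc_xor(value, arg):
    return value ^ arg


# Each forward operation is paired with the operation that undoes it.
# ASSUMPTION: an add is reversed by a subtract; xor is its own inverse.
OPS = {"add": (enc_add, enc_sub), "xor": (enc_xor, enc_xor)}


def decoy_round_trip(words, rounds=8, seed=0):
    """Scramble the words with random operations, then undo them in reverse order."""
    rng = random.Random(seed)  # stand-in for the system-derived seeds
    data = list(words)
    undo_stack = []  # plays the role of the separate argument stack

    for _ in range(rounds):
        name = rng.choice(sorted(OPS))
        arg = rng.getrandbits(32)
        forward, _inverse = OPS[name]
        data = [forward(word, arg) for word in data]
        undo_stack.append((name, arg))

    scrambled = list(data)

    # Pop the recorded operations so the last one applied is undone first.
    while undo_stack:
        name, arg = undo_stack.pop()
        _forward, inverse = OPS[name]
        data = [inverse(word, arg) for word in data]

    return scrambled, data
```

Whatever the random arguments are, the second list returned equals the input, which is why the loop leaves the second stage untouched.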
Once this decoy decryption is complete, the actual main decryption function uses the first 64 bytes of the encrypted data (at 0xB7345A00 in our case) as a XOR key to decrypt the remaining data (stored beyond the first 64 bytes). In multiple iterations over the code, the XOR key is used to decrypt 4 bytes at a time (location B734529F) and is rotated right by 3 bits after each iteration (location B73452A9). Eventually, the unpacker jumps to the decrypted, second-stage code (location B73452BB).
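For analysts who want to reproduce this step offline on an extracted snapshot, the following sketch shows one possible reading of the scheme. The description above does not say how the 64-byte key maps onto 4-byte blocks, so the sketch assumes the key is treated as sixteen little-endian 32-bit words that are used round-robin, with each word rotated right by 3 bits after it has been used. The function names are hypothetical.

```python
import struct

KEY_SIZE = 64  # the first 64 bytes of the blob act as the key


def ror32(value, bits):
    """Rotate a 32-bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def xor_rotating_key(blob):
    """XOR the data after the key, 4 bytes at a time, rotating the key words.

    ASSUMPTION: the key is sixteen little-endian DWORDs used round-robin.
    """
    key_words = list(struct.unpack("<16I", blob[:KEY_SIZE]))
    body = blob[KEY_SIZE:]
    if len(body) % 4:
        raise ValueError("payload length must be a multiple of 4 bytes")

    out = bytearray()
    for offset in range(0, len(body), 4):
        slot = (offset // 4) % len(key_words)
        (word,) = struct.unpack_from("<I", body, offset)
        out += struct.pack("<I", word ^ key_words[slot])
        key_words[slot] = ror32(key_words[slot], 3)  # rotate after each use
    return bytes(out)
```

Because the key schedule does not depend on the data, applying the function twice with the same key returns the original bytes, which makes the sketch easy to sanity-check.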
Unpacking: Stage 2
The first 0x2FF bytes of the second stage payload act as minimal loader: this small code section is responsible for allocating memory, arranging code sections, relocating variables and jumps, building import tables, and finally jumping to the final payload.
The loader reads header information from memory right after the loader code itself (that is, after the first 0x2FF bytes of the second stage). This information is separated into four different headers: main-, section-, relocation-, and import-header.
The Main Header contains 5 fields:
- Size: complete size of data
- Base Alignment: base values subtracted when relocating a value
- AOE: address of entry point to data
- Relocation Sections: number for relocation header
- Sections: number of sections present inside the header
The Section Header contains 3 fields:
- Index: section start address (RVA) in memory
- Section Size: length of the raw section data
- Data: raw section data
The Relocation Header contains values that need to be changed to make code work properly after relocation. The header stores the offsets of values inside the memory to relocate; during relocation, the values stored at these offsets are adjusted to the new base-address (plus alignment) of the memory.
The header has 2 properties:
- Element Count: number of elements in the relocation array
- Array of Offsets: each offset entry points to the relative virtual address (RVA) that needs to be updated as part of the memory relocation
The Import Header contains the imports that the code needs to run. The code builds a lookup table of function addresses for each function and uses this table to later call the functions. Each entry in the header has 3 fields:
- Module Name: null terminated ASCII string storing the module from which the API function is imported
- Index: lookup table index at which the address of the imported function is stored
- API Name: null terminated ASCII string storing the name of the API function to import
The second stage loader starts with allocating the required memory as defined in the main header (Size property at 0xF080 + additional 0x100 bytes). Then, it reads the section header, copies all sections (based on header details), and performs the relocation of addresses using the data from the relocation header.
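The relocation step can be sketched as follows. This is one reading of the header description, not the loader's verified arithmetic: we assume each relocated value is a little-endian 32-bit address, that the Base Alignment value is subtracted from it, and that the new base address is then added. The function name and the sample addresses in the usage note are hypothetical.

```python
import struct


def apply_relocations(image, offsets, base_alignment, new_base):
    """Return a copy of the image with every listed 32-bit value rebased.

    ASSUMPTION: new value = old value - base_alignment + new_base.
    """
    patched = bytearray(image)
    for rva in offsets:
        (value,) = struct.unpack_from("<I", patched, rva)
        value = (value - base_alignment + new_base) & 0xFFFFFFFF
        struct.pack_into("<I", patched, rva, value)
    return bytes(patched)
```

For example, a stored value of 0x10001234 with a Base Alignment of 0x10000000 and a new base of 0x89830000 would become 0x89831234 under this reading.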
Next, the loader builds imports as follows:
- read the name of the library (module name) from the import header and obtain the base address in memory at which the module is loaded using NtQuerySystemInformation (with value SystemModuleInformation(0xB) as parameter),
- read the Index number from the import header and calculate the address (base-address of the lookup table + index) where the function address will be stored, and
- read the API function name from the import header, resolve the address of the function in memory, and store this address in the lookup table at the calculated location.
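The three steps above can be sketched as a small parser plus a table builder. The exact binary layout is an assumption on our part: we treat each entry as a null-terminated module name, a little-endian 32-bit index, and a null-terminated API name, and we model address resolution with a plain dictionary (the `exports` argument) instead of walking loaded kernel modules. All function names and addresses are hypothetical.

```python
import struct


def read_cstring(buf, pos):
    """Read a null-terminated ASCII string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii"), end + 1


def parse_import_entries(buf, count):
    """Parse `count` entries of (module name, table index, API name)."""
    entries, pos = [], 0
    for _ in range(count):
        module, pos = read_cstring(buf, pos)
        (index,) = struct.unpack_from("<I", buf, pos)  # ASSUMPTION: 32-bit index
        pos += 4
        api, pos = read_cstring(buf, pos)
        entries.append((module, index, api))
    return entries


def build_lookup_table(entries, exports):
    """Map each table index to the resolved address of its API function."""
    table = {}
    for module, index, api in entries:
        table[index] = exports[module][api]
    return table
```

An analyst can run the parser over the import header from a process snapshot to recover which kernel APIs the payload depends on, without executing any of the code.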
With all imports in place, the loader gets the offset to AOE (address of entry point) from the main header, adds it to the base address, and jumps into this final payload at the computed address.
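The size and entry-point arithmetic can be sketched like this. The field widths and ordering are assumptions: the sketch treats the main header as five consecutive little-endian 32-bit values in the order listed earlier, and the class and function names are ours. Only the Size value (0xF080) and the extra 0x100 bytes come from the analysis; the other sample values in the usage note are made up.

```python
import struct
from dataclasses import dataclass

EXTRA_ALLOC = 0x100  # additional bytes allocated on top of the Size field


@dataclass
class MainHeader:
    size: int
    base_alignment: int
    entry_offset: int  # "AOE" in the header description
    relocation_sections: int
    sections: int


def parse_main_header(raw):
    # ASSUMPTION: five little-endian DWORDs, in the order the fields are listed.
    return MainHeader(*struct.unpack_from("<5I", raw, 0))


def allocation_size(header):
    """Number of bytes the loader allocates for the final payload."""
    return header.size + EXTRA_ALLOC


def entry_point(header, base_address):
    """Address the loader jumps to once sections, relocations, and imports are set up."""
    return (base_address + header.entry_offset) & 0xFFFFFFFF
```

With a Size of 0xF080 the allocation comes to 0xF180 bytes; with a hypothetical base of 0x89830000 and entry offset of 0x1A40, the jump target would be 0x89831A40.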
Unpacking: Final Payload
The final stage starts by importing system native service routines (“Nt-functions”, such as NtCreateFile or NtWriteVirtualMemory) from ntdll into the kernel, since these functions are not directly accessible to kernel modules. To do so, the code parses the ntdll module’s export table and code section to get the System Service Descriptor Table (SSDT) index numbers:
With this information, the code can now directly call native routines without having to call the “Zw-function” counterparts (such as ZwCreateFile or ZwWriteVirtualMemory) that go through the SSDT for each call.
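To illustrate how a service index can be recovered from ntdll's code section: on 32-bit Windows, the stub for an Nt-function typically begins with a `mov eax, imm32` instruction (opcode B8) whose immediate operand is the service index. The sketch below reads that immediate from the first bytes of a stub. The function name is hypothetical and the stub bytes in the usage note are illustrative, not taken from the analyzed sample.

```python
import struct

MOV_EAX_IMM32 = 0xB8  # opcode of "mov eax, imm32"


def service_index_from_stub(stub):
    """Return the service index encoded in an Nt stub, or None if the
    stub does not start with the expected instruction."""
    if len(stub) < 5 or stub[0] != MOV_EAX_IMM32:
        return None
    (index,) = struct.unpack_from("<I", stub, 1)
    return index
```

For instance, a stub starting with the bytes `B8 25 00 00 00` yields index 0x25. The same check is handy when triaging snapshots: an export whose first byte is not B8 may have been hooked or patched.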
To complete the unpacking, the code creates a system thread and launches the decrypted ModPOS functionality.
System Thread Analysis: User-Mode Injection
The main functionality of the ModPOS framework operates as system thread inside the Windows kernel. Once invoked, it first generates a system fingerprint, using information such as processor name, processor serial number, and the system drive serial number. This information is combined into a system GUID using key 0x380F7A53 to XOR-encode the gathered information. This GUID is used as name for creating a Mutex (via NtCreateMutant) - if this mutex is already present on the system, the thread simply exits assuming the framework is already running.
As the next interesting step, the thread injects code into the trusted user-mode process csrss.exe (the Windows subsystem process). To this end, it enumerates the processes running on the system using NtQuerySystemInformation (with parameter SystemProcessInformation(0x5)), checks the name of each process, and, once found, opens the target process. The code retrieves the process handle and allocates 0x1000 bytes inside the target process using NtAllocateVirtualMemory. Then it combines two location-independent shellcode buffers embedded inside the kernel module and writes them into the target process's memory through NtWriteVirtualMemory (location 89837A07). This shellcode serves as the entry point to running in user-mode, as described in more detail below. Once again, as new code is injected into another process, the sandbox extracts this code to allow further analysis.
Next the system thread queues an asynchronous procedure call (APC) on all threads of the target process via NtQueueApcThread (location 89837A51). The APC entry point is set to the base address of the allocated memory (location 89837A46) at offset of 0x1C (location 89837A4C):
Then the system thread waits for the injected code to be executed in the context of the target process. For this, it tries (up to 20 times) to read a specific memory address inside the target process (location 89837A8E) that will be filled by the injected code once the APC has been executed. The value used for polling is the thread ID for which the APC was invoked (location 89837A98).
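The handshake amounts to a bounded polling loop, which the following simulation models. It is a sketch, not the driver's logic: the `read_slot` callable stands in for reading the memory address inside the target process, any non-zero value is treated as the signal that the APC has run, and the delay between attempts is an assumption because the analysis does not state one.

```python
import time

MAX_ATTEMPTS = 20  # the driver gives up after 20 reads


def wait_for_marker(read_slot, max_attempts=MAX_ATTEMPTS, delay=0.0):
    """Poll a memory slot until it holds a non-zero value.

    Returns (value, attempts_used); value is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        value = read_slot()
        if value:
            return value, attempt
        time.sleep(delay)  # ASSUMPTION: some pause between attempts
    return None, max_attempts
```

Passing a callable that returns 0 twice and then a thread ID returns that ID together with an attempt count of 3; a slot that never changes yields `(None, 20)`.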
User-Mode Analysis: csrss.exe
Similar to the final stage of the kernel component, the code injected into csrss.exe starts by resolving addresses of API functions. More specifically, the code registers an exception handler (location 00E000A8) for ignoring memory read violations and starts searching for the Kernel32 module by iteratively reading memory addresses starting at address 0x80000000 (down to address 0x0).
When a PE image is found in memory (by matching the “MZ” magic of the PE header at a 0x1000-byte alignment), the code parses the export table from memory. The code locates interesting functions by matching the function name against hashes of API functions. To compute the hash of the function name, the code uses the same custom algorithm also used by the kernel-mode driver component described earlier.
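The search strategy can be modeled against a memory dump with the sketch below. Two things are assumptions and should be read as such: memory is represented by a `read_page` callable that raises `LookupError` for unreadable pages (standing in for the exception handler), and the hash is a rotate-right-by-13 additive hash chosen only as a stand-in, because ModPOS's custom algorithm is not reproduced here. All function names and addresses are hypothetical.

```python
PAGE = 0x1000
SCAN_START = 0x80000000


def find_images(read_page, start=SCAN_START, step=PAGE):
    """Yield page-aligned addresses, highest first, whose page starts with 'MZ'."""
    addr = start
    while addr >= 0:
        try:
            page = read_page(addr)
        except LookupError:
            page = b""  # unreadable page: ignore and keep scanning
        if page[:2] == b"MZ":
            yield addr
        addr -= step


def stand_in_hash(name):
    """Placeholder name hash (rotate right by 13, then add). NOT ModPOS's algorithm."""
    h = 0
    for ch in name.encode("ascii"):
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_by_hash(exports, wanted_hashes):
    """Return {hash: address} for every export whose name hash is wanted."""
    resolved = {}
    for name, address in exports.items():
        h = stand_in_hash(name)
        if h in wanted_hashes:
            resolved[h] = address
    return resolved
```

Matching on hashes instead of names means no API name strings appear in the injected code, which is why recovering the hash routine from a snapshot is usually the first step in labeling the resolved imports.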
For example, these are hashes the code is searching for:
Once the location of the required kernel32 functions is known, the code creates a new thread (to exit the context of the APC and make tracking control flow more difficult) and exits. The entry point of this new thread is Base Memory Address + 0x2E3.
Since the new thread has its own thread ID, it stores this updated value to the memory location the driver component is polling:
Once the system thread (polling in a loop) has read the user-mode thread ID, it knows that the injection was successful and establishes a communication channel between the kernel- and user-mode components. To this end, the system thread creates a Named Pipe with the name
where the first integer is the ID of the user-mode process in whose context the injected code is running, and the second integer counts the attempts made to create this pipe. After successfully creating this pipe, the system thread registers another APC for reading data from the pipe by calling ZwReadFile, passing the pipe as the handle to read from.
Code Injection Into Browser and Service Processes
Once the malware has successfully spread from the kernel to the user-mode system process csrss.exe, it continues spreading to other processes running on the system. To do so, it searches for the processes iexplore.exe (Internet Explorer), firefox.exe (Firefox browser), and svchost.exe and gets their process IDs.
For any process found this way, ModPOS injects code into their memory. Interestingly, the injection method is different this time: instead of injecting the code through kernel alone, this time it leverages code running inside csrss.exe to perform parts of the injection as follows:
- Get the target process handle: the driver component gets the list of running processes using NtQuerySystemInformation and searches for the target process. Once found, it opens a handle to the target process by its process ID using ZwOpenProcess.
- Duplicate the handle of the named pipe (which was created between the kernel component and the code injected into csrss.exe) into the target process using ZwDuplicateObject. This way, code injected into the next process does not need to know the name of the named pipe for communicating with the driver component.
- Write the shellcode into the target process using NtWriteVirtualMemory. This shellcode contains the code as well as the handle for communicating on the named pipe.
- Duplicate the handle to the target process into csrss.exe using ZwDuplicateObject. This way, csrss.exe can interact with the target process without having to know which process is being accessed (that is, the code works the same for any of the above mentioned processes).
- Execute the shellcode via the APC in the context of csrss.exe to inject a remote thread into the target process using CreateRemoteThread:
The Lastline sandbox detects that new code is executed in the context of the hijacked processes and continues tracking the malicious behavior in these processes and - again - extracts process snapshots to allow analysis of the injected code.
Finally, the ModPOS code executes its target payload in the context of the correct process. Once again, the first step is to build the import table - the information is stored in the same way as for the kernel component. After building the import table, the code starts executing its main payload, such as connecting to the command-and-control infrastructure.
The ModPOS C&C works via HTTP, and the payload uses standard Windows API functions to communicate with the server:
The variant that we analyzed for this blog uses an HTTP POST for requesting the page "/robots.txt" on the following IPs
Upon successful connection, the code downloads and stores the C&C response. Interestingly, the response is in HTML format - the code searches the response for the HTML comment tag "<!--" and extracts the data embedded in it.
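When reviewing captured C&C traffic, the comment payload can be pulled out of a response with a few lines of code. This is a minimal sketch: it assumes the data sits between "<!--" and a closing "-->", and it makes no attempt to decode or decrypt the content, since the encoding is not covered here. The function name is hypothetical.

```python
import re

# ASSUMPTION: the payload ends at the standard closing comment marker.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_comment_payloads(html):
    """Return the text of every HTML comment in the response, whitespace-trimmed."""
    return [match.strip() for match in COMMENT_RE.findall(html)]
```

Running this over each stored response body gives the raw blobs to feed into whatever decoding step the analyzed variant applies next.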
Once the data from the C&C server is extracted, the user-mode component can then do whatever the attacker wants: it can load new plugins into memory or use the named pipe to interact with the kernel component (for example to inject the downloaded plugin into the kernel or any other process running on the system).
ModPOS is a sophisticated, heavily-obfuscated and packed malware framework that gives an attacker an incredible amount of control over a target system. The malware uses a series of unpacking steps to spread the malicious code to user- and kernel-mode processes, and injects code into a series of processes on an infected host.
The full-system emulation approach used by the Lastline sandbox allows us to track all parts of the ModPOS infection chain, and to extract the full range of behavior exhibited by this malware in user- as well as kernel-space. This allows us to catch malware even before an attacker pushes a specific plugin to an infected system by detecting spreading throughout the operating system. Additionally, the extracted process snapshots allow for deeper analysis of all steps in the infection chain.