Leveraging the new clustering feature of the Lastline Knowledge Base to study recent ransomware threats
Authored by: Grégoire Jacob and Stefano Ortolani
With this blog post, we want to demonstrate how you can leverage the Lastline Knowledge Base (LLKB) and its new clustering feature to extract key observations about a given threat. Using recent real-world threats as case studies, we present different workflows to retrieve the analysis data related to these threats, to cross-reference this data with online information and blogs, and, finally, to extract actionable items from this data in order to react to these threats and build informed remediation plans.
Clustering at Lastline
Before diving into our case studies, a few words about our new clustering feature. The Knowledge Base now offers clustering services that group analyzed executables into families of similar programs or threats. The service supports multiple clustering perspectives in parallel, each using a different approach to compare samples and determine their similarity:
Similarity based on runtime activity: Dynamic clusters identify malware families that share common observable behaviors during their execution. Such behaviors include using a common C&C infrastructure, reusing the same persistence mechanisms, or targeting and tampering with the same system components.
Similarity based on code structure: Code-hash clusters identify malware families that share significant portions of their code base. Code hashes identify the different code blocks of the analyzed programs; they are computed in a resilient way, which makes them robust against various factors such as the relative location of the code in memory. Our sandbox extracts and hashes these code blocks during execution, so that the unpacked code can be accessed. By construction, code-hash clusters are less influenced by the dynamic environment and by the configuration embedded by the malware author (e.g. C&C configurations), factors that are mainly determined by the data embedded in the sample and the data accessed at runtime. Instead, code-hash clusters rely on a notion of similarity based on equivalent functionality, as determined by the sample's code blocks.
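Lastline does not describe its clustering algorithm here, so the following is only a minimal, generic sketch of the idea behind both perspectives. It assumes each sample has already been reduced to a set of features (code-block hashes for code-hash clusters, or normalized behaviors such as contacted hosts and persistence keys for dynamic clusters). Samples are compared with the Jaccard index and linked whenever their similarity reaches a threshold. The feature sets, the `threshold` value and the single-linkage strategy are illustrative assumptions, not the LLKB implementation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return len(a & b) / len(a | b)


def cluster(samples: dict, threshold: float = 0.7) -> list:
    """Single-linkage grouping: two samples are linked when their Jaccard
    similarity reaches `threshold`; clusters are the connected components."""
    parent = {name: name for name in samples}

    def find(x):
        # Follow parent pointers to the component root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical samples reduced to sets of code-block hashes.
    samples = {
        "sample_a": {"h1", "h2", "h3", "h4"},
        "sample_b": {"h1", "h2", "h3", "h5"},
        "sample_c": {"h9", "h10"},
    }
    print(cluster(samples, threshold=0.5))
```

Because code-block hashes ignore configuration data, two samples with different C&C settings but the same code would still score high under this kind of comparison, which is the property described for code-hash clusters.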
The clustering results provided by the service are used to identify samples and associate them with known threat families. Threat identification helps Incident Response (IR) and Security Operations Center (SOC) teams in their remediation and recovery process, as we will demonstrate in this post.
Rise of the ransomware threat
To choose interesting case studies, we looked at recent security news and blogs. Recent reports from various companies and governmental organizations such as the FBI show that ransomware has become a growing concern in recent years. Ransomware families are rapidly multiplying, as shown in the timeline below.
A lot of material can be found about ransomware and the different techniques used to build their payloads. In 2015, a Lastline presentation at Black Hat underlined how predictable their behaviors are: intensive searches over local and network-accessible file systems for potential targets, modification of numerous sensitive files whose entropy increases due to encryption, and, finally, noisy user notifications demanding a ransom. Despite these shortcomings, ransomware remains a lucrative criminal activity and new families keep appearing on a regular basis. In this blog post, we chose to cover two of these new families: CryptXXX and Zepto.
Ransomware CryptXXX (a.k.a. Exxroute)
In mid-April, Proofpoint released an article about a new ransomware family called CryptXXX. This family has evolved quickly, with version 3.1 already disclosed in a follow-up article in June. We recently observed a sample of this family:
In this sample, we observed the behaviors found in most ransomware families: an intensive sweep over the different directories of the guest system in search of documents and other sensitive data files to encrypt. The picture below shows the analysis overview produced by Lastline for this sample.
With the new Lastline release, the Knowledge Base provides additional information about a sample by attributing it to different types of clusters. This information is integrated into the analysis results, and a new section called ‘Intelligence Attribution’ has been introduced in the analysis overview. The clustering results for the given sample are pictured below. As the image shows, a sample belongs to multiple clusters, depending both on the operating system used for analysis and on the chosen notion of similarity, whether dynamic execution or static code similarity.
In this case, multiple clustering results converge towards CryptXXX. This information offers significant additional value: once the threat is named, we can use that name to decide how to remediate it. In the case of ransomware, it is particularly handy for finding recovery tools for your encrypted files. A quick Google search with the right keywords, ‘cryptxxx file recovery’, already points to interesting results.
Another valuable aspect of clustering is the ability to look for similar samples. From the clustering results of the given sample, you just have to click on the intelligence widget to trigger a search in LLKB for the given cluster label. Here we will follow the CryptXXX dynamic cluster with identifier 386. A new page, shown below, opens with examples of analysis reports belonging to the given cluster.
Having access to analysis reports from the same family, an analyst can quickly compare multiple executions and extract the shared information. This is particularly useful for building a family model such as a robust IOC profile. Indeed, with a single execution path, it is really hard to determine which parts of the execution depend on the environment or are randomized at runtime. Comparing multiple execution reports gives the analyst reference points to understand where such runtime dependencies occur and where samples share similar activity. The LLKB search interface already bubbles up the network information shared by the different samples, as shown in the picture below. A quick look at the results shows that the CryptXXX samples do not perform any domain resolution but share a common hard-coded IP, 220.127.116.11. In the context of blacklist generation, this IP address constitutes an immediately actionable item to remediate the threat.
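LLKB performs this aggregation itself in its search interface. As a hedged illustration of the underlying idea, the sketch below assumes each analysis report has been reduced to a set of contacted IPs and domains (a hypothetical representation) and keeps only the indicators that appear in at least a given share of the reports. Indicators present in every report are candidates for an IOC profile, while those seen only once are more likely environment-dependent or randomized. The documentation IP ranges used in the usage example are placeholders, not CryptXXX data.

```python
from collections import Counter


def shared_indicators(reports, min_share=1.0):
    """Return (indicator, count) pairs seen in at least `min_share` of the
    reports, most frequent first. Each report is a set of IPs/domains."""
    counts = Counter()
    for indicators in reports:
        counts.update(set(indicators))  # count an indicator once per report
    needed = min_share * len(reports)
    return [(ind, n) for ind, n in counts.most_common() if n >= needed]


if __name__ == "__main__":
    # Hypothetical reports from one cluster.
    reports = [
        {"192.0.2.10", "a.example"},
        {"192.0.2.10", "b.example"},
        {"192.0.2.10"},
    ]
    print(shared_indicators(reports))
```

Lowering `min_share` trades precision for recall, which can help when some samples in a cluster were analyzed after part of the infrastructure went offline.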
Ransomware Zepto (Locky variant)
Recently reported in a blog post from the Cisco Talos team, Zepto is the newest variant of the Locky ransomware. Since its discovery on June 30, 2016, this ransomware has regularly made the headlines.
The name Zepto comes from the specific extension this ransomware appends to encrypted files. An article from BleepingComputer briefly describes the naming scheme used by Zepto: “With this new version, Locky uses the .zepto extension and files are renamed to a name like 024BCD33-41D1-ACD3-3EEA-84083E322DFA.zepto. This new naming format is in the form of [first_8_hexadecimal_chars_of_id]-[next_4_hexadecimal_chars_of_id]-[next_4_hexadecimal_chars_of_id]-[4_hexadecimal_chars]-[12_hexadecimal_chars].zepto. For example, for a file called 024BCD33-41D1-ACD3-3EEA-84083E322DFA.zepto, the extracted victim ID would be 024BCD3341D1ACD3.”
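The naming scheme quoted above is simple enough to check mechanically. The sketch below is a hypothetical helper, not a Lastline or BleepingComputer tool: it matches the 8-4-4-4-12 hexadecimal layout followed by `.zepto` and concatenates the first three groups into the 16-character victim ID, exactly as in the article's example.

```python
import re
from typing import Optional

# Layout quoted from the BleepingComputer article:
# 8-4-4-4-12 hexadecimal groups followed by the .zepto extension.
ZEPTO_NAME = re.compile(
    r"^(?P<g1>[0-9A-F]{8})-(?P<g2>[0-9A-F]{4})-(?P<g3>[0-9A-F]{4})-"
    r"[0-9A-F]{4}-[0-9A-F]{12}\.zepto$",
    re.IGNORECASE,
)


def zepto_victim_id(filename: str) -> Optional[str]:
    """Return the victim ID (first 8 + next 4 + next 4 hex characters)
    embedded in a Zepto file name, or None if the name does not match."""
    match = ZEPTO_NAME.match(filename)
    if match is None:
        return None
    return (match["g1"] + match["g2"] + match["g3"]).upper()


if __name__ == "__main__":
    print(zepto_victim_id("024BCD33-41D1-ACD3-3EEA-84083E322DFA.zepto"))
```

Since the victim ID changes for every infection, this structure is also why an exact file-name search is unlikely to match anything, as discussed next.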
Most recent ransomware families, Zepto included, target accessible writable directories and network shares in their search for sensitive files to encrypt. Our Lastline sandbox uses such shares to lure malware into exhibiting its behaviors. Using this information, we can combine one of these shares with the example file name from the article to craft our search entry:
Since the victim identifier that makes up the encrypted file name is unique, searching our Knowledge Base for the exact file name given in the BleepingComputer article is unlikely to yield results. However, we can leverage some of the features offered by LLKB. Most of the information indexed in our base is normalized beforehand to erase known dependencies on the environment (e.g. the user name) and runtime randomizations. User queries are normalized in the same way to maximize the chances of a match. In this case, the file name structure used by Zepto is recognized as a valid UUID structure and normalized as shown in the picture above.
The query returns a first set of results, as shown in the picture above. Interestingly, despite the conflicting AV labels shown in the report list, the clustering facet shows that all samples were grouped under a single dynamic cluster label: dyn_wxp_ransomware_zepto_9083. As with CryptXXX, the network information is bubbled up with a list of shared domains and IPs, as shown in the picture below.
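The exact normalization rules used by LLKB are not published. As a minimal sketch of the principle, assuming two hypothetical rules (replace the user-name component of a Windows profile path with `<USER>`, and any UUID-shaped token with `<UUID>`), applying the same function to indexed paths and to user queries makes two different Zepto file names collapse to the same searchable form. The placeholder names and the path examples are illustrative assumptions.

```python
import re

# Hypothetical rule: any UUID-shaped token (8-4-4-4-12 hex digits).
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)
# Hypothetical rule: the user-name component of a Windows profile path.
USER_RE = re.compile(r"(?i)(\\users\\)[^\\]+")


def normalize(path: str) -> str:
    """Erase environment dependencies and runtime randomization from a path.
    The same function is applied to indexed data and to user queries."""
    path = USER_RE.sub(r"\1<USER>", path)
    return UUID_RE.sub("<UUID>", path)


if __name__ == "__main__":
    print(normalize(r"C:\Users\alice\Documents\024BCD33-41D1-ACD3-3EEA-84083E322DFA.zepto"))
```

Normalizing both sides, rather than only the query, is what allows an article's example file name to match files produced by entirely different infections.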
So far, we have only investigated dynamic clusters. We have seen that dynamic clusters successfully group together samples with similar runtime behaviors that share the same C&C infrastructure. We can now obtain a second perspective on the results with the second type of clustering supported by LLKB, based on static code similarity. Using the facet navigation, we can choose the code-hash clustering view of the data: as pictured below, the samples from our search all belong to the same cluster, 'c#_wxp_ransomware_unknown_1552'.
Clicking on that view and selecting this cluster automatically triggers a new search, which returns a larger number of results. This was to be expected, since static clusters group together samples sharing significant portions of their code base, independently of the malware author's configuration and of the runtime environment. Code-wise, Zepto is known to be a straight variant of Locky, configured differently to use another file extension and to point to a different C&C infrastructure. These differences, observable at runtime, disappear at the level of static code hashes, as illustrated by the cluster overlap in the picture below (a single code-hash cluster pointing to multiple dynamic clusters).
This second search has also enriched the set of IPs and domains bubbled up from the reports. Code-hash clusters, as we just said, are more resistant to environment changes, in particular at the network level. Over time, the C&C infrastructure of a malware family can be relocated to avoid blacklisting, or sinkholes can be deployed by researchers or law enforcement to observe the C&C communications. The two pictures below show, first, the enriched set of IPs and domains and, second, an extract of the HTTP traffic observed in one of the samples. The latter shows a clear difference in traffic between an active C&C host and a sinkholed host.
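The two labels seen so far, dyn_wxp_ransomware_zepto_9083 and c#_wxp_ransomware_unknown_1552, suggest a structured naming convention. The sketch below assumes, without confirmation from Lastline, that labels follow the pattern <perspective>_<os>_<category>_<family>_<id>, where `dyn` denotes a dynamic cluster, `c#` a code-hash cluster, and `wxp` presumably the analysis operating system. Parsing labels this way makes it easy to group a sample's attributions by perspective or to check whether they agree on a family name.

```python
from typing import NamedTuple

# Assumed prefixes, inferred from the two labels shown in this post.
PERSPECTIVES = {"dyn": "dynamic", "c#": "code-hash"}


class ClusterLabel(NamedTuple):
    perspective: str
    os: str
    category: str
    family: str
    cluster_id: int


def parse_label(label: str) -> ClusterLabel:
    """Split a label of the assumed form <perspective>_<os>_<category>_<family>_<id>."""
    prefix, os_tag, category, rest = label.split("_", 3)
    family, cluster_id = rest.rsplit("_", 1)  # family names may contain '_'
    return ClusterLabel(PERSPECTIVES.get(prefix, prefix), os_tag, category,
                        family, int(cluster_id))


if __name__ == "__main__":
    for label in ("dyn_wxp_ransomware_zepto_9083", "c#_wxp_ransomware_unknown_1552"):
        print(parse_label(label))
```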
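The overlap between the two perspectives can be expressed as a simple mapping. In this hedged sketch, each sample contributes one hypothetical (code-hash cluster, dynamic cluster) pair; grouping the pairs by code-hash cluster reveals code-hash clusters that span several dynamic clusters, which is the pattern expected for a code base reused with different configurations, as with Zepto and Locky. The second dynamic label in the usage example is a made-up placeholder.

```python
from collections import defaultdict


def cluster_overlap(assignments):
    """Map each code-hash cluster to the set of dynamic clusters its samples
    fall into. `assignments` holds one (code_cluster, dyn_cluster) pair per sample."""
    overlap = defaultdict(set)
    for code_cluster, dyn_cluster in assignments:
        overlap[code_cluster].add(dyn_cluster)
    return dict(overlap)


def split_code_clusters(overlap):
    """Keep only code-hash clusters that span more than one dynamic cluster."""
    return {code: dyn for code, dyn in overlap.items() if len(dyn) > 1}


if __name__ == "__main__":
    assignments = [
        ("c#_wxp_ransomware_unknown_1552", "dyn_wxp_ransomware_zepto_9083"),
        ("c#_wxp_ransomware_unknown_1552", "dyn_hypothetical_other"),
        ("c#_wxp_ransomware_unknown_1552", "dyn_wxp_ransomware_zepto_9083"),
    ]
    print(split_code_clusters(cluster_overlap(assignments)))
```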
Another interesting piece of intelligence provided by the query, shown in the picture below, is the timeline of the samples and their associated AV labels. These labels are generated for reference from the AV information available at the time of the analysis. A first observation is that the cluster label is usually more consistent than the AV labels associated with the samples in the cluster: AV names tend to be mixed and inconsistent across samples and products. A second observation is that a certain number of samples are labeled as 'Undetected'; only samples missed by the major AV products when the label was generated are marked as undetected. Given that Zepto was discovered on June 30th, the Lastline scores, in comparison, show that the threat was successfully covered from day one.
From there, we can start drilling down by picking the first domain returned by the system: rbwubtpsyokqn.info. Simply copy and paste it into the search bar to start a new search.
When you search by IP address or domain, LLKB provides additional network information around your search term, such as passive DNS data. This information, illustrated in the example below, helps to understand the evolution and movement of the threat and its C&C infrastructure.
Through these different use cases, we have shown that the Lastline Knowledge Base is a very flexible tool supporting different types of workflows. LLKB allows users to rapidly obtain global information about a threat, in particular thanks to its new clustering feature. As we have seen, this global information enables a better assessment of the threat and its properties, and also provides actionable items to build informed remediation plans: threat identification to search for remediation tools, identification of the dominant dropping mechanisms, information shared by the threat such as IPs and domains for blacklisting, and similar threat reports for robust IOC profile generation. LLKB also allows users to drill down and perform deeper analyses by searching for any of the elements shared by the threat under scrutiny.
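One way to make the consistency observation concrete is to measure, for a set of samples, the share carrying the most common label: 1.0 means every sample agrees. The sketch below applies this measure to a hypothetical list of AV labels and to the cluster label of the same samples, and also computes the share marked 'Undetected'. The AV names are invented placeholders, not labels observed for Zepto.

```python
from collections import Counter


def majority_share(labels):
    """Fraction of samples carrying the most common label (1.0 = fully consistent)."""
    if not labels:
        return 0.0
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


def undetected_share(labels):
    """Fraction of samples whose reference AV label is 'Undetected'."""
    if not labels:
        return 0.0
    return sum(label == "Undetected" for label in labels) / len(labels)


if __name__ == "__main__":
    av_labels = ["Ransom.Locky", "Trojan.Generic", "Undetected", "Ransom.Locky"]
    cluster_labels = ["dyn_wxp_ransomware_zepto_9083"] * 4
    print(majority_share(av_labels), majority_share(cluster_labels))
    print(undetected_share(av_labels))
```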
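Passive DNS data is typically a set of observed (domain, IP) resolutions with first-seen and last-seen dates. As a minimal sketch, assuming records in that hypothetical shape rather than the actual LLKB format, sorting the resolutions of one domain by date turns them into a timeline in which a relocation of the C&C infrastructure, or a move to a sinkhole, appears as a change of IP. The domain names, IPs and dates in the usage example are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PdnsRecord:
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def resolution_timeline(records, domain):
    """Return (first_seen, last_seen, ip) periods for one domain, oldest first,
    so that successive IPs reveal moves of the infrastructure."""
    periods = [(r.first_seen, r.last_seen, r.ip) for r in records if r.domain == domain]
    return sorted(periods)


if __name__ == "__main__":
    records = [
        PdnsRecord("c2.example", "198.51.100.7", date(2016, 7, 10), date(2016, 7, 20)),
        PdnsRecord("c2.example", "192.0.2.44", date(2016, 7, 1), date(2016, 7, 9)),
        PdnsRecord("other.example", "203.0.113.5", date(2016, 7, 2), date(2016, 7, 3)),
    ]
    for first, last, ip in resolution_timeline(records, "c2.example"):
        print(first, last, ip)
```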