08/04/2025

100 Days of YARA: Writing Signatures for .NET Malware

If YARA signatures for .NET assemblies only rely on strings, they are very limited. We explore more detection opportunities, including IL code, method signature definitions and specific custom attributes. Knowledge about the underlying .NET metadata structures, tokens and streams helps to craft more precise and efficient signatures, even in cases where relevant malware samples might be unavailable.

Case 1: YARA signatures based on screenshots

Occasionally, a malware analyst’s job requires writing threat hunting or detection signatures based on articles or social media posts without having the sample at hand. Samples might be confidential, not available on public sharing sites, or hashes might be unavailable.

Although this is a specific use case, this article will also teach you how to add context to .NET signatures and how to choose correct formatting without the intermediate step of a hex editor when samples are present.

If there is only a screenshot, what information can be used?

Firstly, screenshots of a dnSpy session may show method names, parameter names, method identifiers, and class names. Additionally, there might be integer arrays with unique salts, passwords, or encoded payloads. Decompiled code is often also visible in screenshots, but generally not reversible to an IL code pattern. We will discuss how to choose the appropriate format for each of these patterns.

A screenshot like the following shows assembly information which is saved internally as custom attributes.

Figure 1: Screenshot of assembly info for a malware sample in dnSpy

An analyst who does not know .NET internals might write a signature like the following. To be extra sure that they account for different encodings of the string, they may apply ascii and wide modifiers to all of them.

⚠️The YARA rules are provided as images to avoid detection of the blog and prevent accidents with the URL.

Faulty YARA signatures with various errors in them — Figure 2: Faulty YARA signature, can you spot all mistakes?

However, the condition will not match the sample because the patterns suffer from common pitfalls. We will explain those pitfalls and create an improved, working signature after discussing .NET Internals.

.NET Metadata Header and Streams

.NET files are Portable Executable files that contain Common Language Runtime (CLR) metadata. The location of the CLR header is set in the 15th entry of the data directory in the PE file headers, which is named CLR Runtime Header in the PE COFF specification.

The CLR header points to the metadata header, which starts with the storage signature ‘BSJB’. The metadata header defines the stream headers. A standard .NET executable has the following streams: #GUID, #Strings, #US, #Blob and either an optimized (#~) or an unoptimized (#-) metadata stream (see figure 3).

The metadata stream references data in #GUID, #Strings, #Blob and points to the IL code. The IL code itself may reference user defined strings on the #US heap.

Diagram showing the structure of .NET Streams with a metadata header containing a storage header and stream headers, leading to different streams like #GUID, #Strings (UTF-8), #Blob, #US (UTF-16), and #~/#-, which are used by IL Code. The storage signature "BSJB" identifies the format. — Figure 3: The metadata header and .NET streams

First detection opportunities arise in the metadata header, because obfuscators may add invalid streams, e.g., two streams with the same name, or stream names that are not defined in the specification. This anomaly alone typically does not suffice for detecting malware but can be used to craft robust obfuscator-detection signatures, which provide important information for reverse engineers and malware analysts.
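To make the idea of stream anomalies more tangible, here is a minimal Python sketch that lists the stream names declared after the ‘BSJB’ signature and reports duplicates as well as names outside the standard set mentioned above. The helper names parse_stream_names and stream_anomalies are hypothetical, and the header offsets reflect my reading of the metadata header layout, so treat them as assumptions to verify against the specification. It also simply takes the first ‘BSJB’ occurrence, which a real tool should not rely on.

```python
import struct
from collections import Counter

# Stream names of a standard .NET executable, as listed in this article.
STANDARD_STREAMS = {"#~", "#-", "#Strings", "#US", "#Blob", "#GUID"}


def parse_stream_names(data: bytes) -> list:
    """Return the stream names declared in the first metadata header found."""
    root = data.find(b"BSJB")
    if root < 0:
        return []
    # Assumed layout: signature(4) major(2) minor(2) reserved(4) version_len(4)
    (version_len,) = struct.unpack_from("<I", data, root + 12)
    pos = root + 16 + ((version_len + 3) & ~3)  # skip the padded version string
    _flags, count = struct.unpack_from("<HH", data, pos)
    pos += 4
    names = []
    for _ in range(count):
        pos += 8  # skip stream offset(4) and size(4)
        end = data.index(b"\x00", pos)
        names.append(data[pos:end].decode("ascii", "replace"))
        pos += (end - pos + 1 + 3) & ~3  # zero-terminated name, padded to 4 bytes
    return names


def stream_anomalies(names):
    """Return (duplicate names, names outside the standard set), both sorted."""
    counts = Counter(names)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    unknown = sorted(n for n in counts if n not in STANDARD_STREAMS)
    return duplicates, unknown
```

A non-empty result from stream_anomalies is a hint that an obfuscator touched the file, not a verdict about maliciousness.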

The following table denotes purposes and a high-level format description of each stream. Use this as a reference when deciding which modifiers and patterns to use in a YARA signature.

Stream name	Format	Content
#Blob	heap with binary objects of arbitrary size, aligned on 4-byte boundaries, each object preceded by the compressed length, strings are usually UTF-8, e.g., “\x05Hello”	saves among others: default names, method and property signatures, custom attributes, e.g., assembly information, typelib guid
#GUID	array of 16-byte binary objects	globally unique identifiers like MVID
#Strings	UTF-8 strings which are always enclosed by zero-bytes, e.g., “\x00Hello\x00”	method names, class names, field names, parameter names
#US	heap of UTF-16 strings preceded by compressed length, trailing byte is either 0 or 1 which indicates if at least one of the characters needed two bytes to be encoded, e.g. for “\x05H\x00E\x00\x00” the trailing byte is 0x00, but for “\x05\xCA\xFE\xBA\xBE\x01” it must be 0x01	string constants defined in user-code
#~ or #-	-	metadata tables

Some streams, #Blob and #US, prepend a compressed length to each element of the stream. The compressed length is calculated as follows (see p.68 in [2]):

Value Range	Compressed Size	Compressed Value
0x0 - 0x7F	1 byte	<value>
0x80 - 0x3FFF	2 bytes	0x8000 | <value>
0x4000 - 0x1FFFFFFF	4 bytes	0xC0000000 | <value>

As long as #US strings and #Blob entries are shorter than 128 bytes, the prepended compressed length is the same as the actual length. This is likely the case for most patterns that a malware analyst wants to create.
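The table translates into a few lines of Python. This is a sketch with a hypothetical helper name, compress_length, and it assumes that the 2-byte and 4-byte forms are written with the most significant byte first, which is how I read the specification.

```python
def compress_length(value: int) -> bytes:
    """Encode a length the way #Blob and #US entries prepend it."""
    if value < 0 or value > 0x1FFFFFFF:
        raise ValueError("length out of range for the compressed encoding")
    if value <= 0x7F:
        return value.to_bytes(1, "big")  # stored as is
    if value <= 0x3FFF:
        return (0x8000 | value).to_bytes(2, "big")
    return (0xC0000000 | value).to_bytes(4, "big")
```

For lengths below 128 the function returns the length itself as a single byte, which is the common case for signature patterns.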

Because of the prepended length, the fullword modifier may prevent matches if the length byte happens to be an alphanumeric character.

GUIDs

The GUID that is shown in our example screenshot is also called TypeLib ID and was first described by Brian Wallace in his article “Using .NET GUIDs to help hunt for malware” [1].

The Typelib ID is added by Visual Studio and uniquely identifies a project. It is saved in the #Blob stream, therefore it is always prepended by its length 0x24, which is the same as the ‘$’ character. It is a strong pattern that can stand on its own and is also robust against recompilation.

For malware families like AgentTesla with leaked sources, whose code or code snippets are re-used in various projects, the TypeLib ID should probably not be used if detecting the family is the goal.

Another GUID mentioned by Wallace is the MVID in the #GUID stream. The MVID changes with recompilation and serves well in identifying a specific sample, e.g., to see if the same payload was re-packed. It does not work for writing recompilation-robust detection signatures.

Fixed Signature for Case 1

Now we can fix the faulty YARA signature that was based on the assembly information screenshot:

Figure 5: Fixed YARA signature based on Assembly info and an additional url to demonstrate compressed length check

In the fixed YARA signature, we removed the “AssemblyTitle” and “Guid” strings because these are encoded in the metadata tables and do not actually appear in the binary.

Furthermore, we use the formatting based on table 1. The $guid and $title strings are from the assembly information, so they are saved in the #Blob stream with their compressed length prepended. That means there is no need for a wide modifier.

A common pitfall is the usage of YARA’s fullword modifier for #Blob (or #US) strings. The prepended length can be within the alphanumeric range, as is the case with $title in our example. It has the length 0x34, which is incidentally also the character ‘4’. Therefore, the fullword modifier prevents matches for such strings, which is not what we want.

By checking the prepended length, we have a splendid replacement for the intention of the fullword modifier. There are three different ways to check the length:

It can be directly embedded into the string pattern (see $guid and $title).
We can use hex patterns (not shown here), but because they are less readable, I recommend supplementing these with a comment of the decoded string.
We can check the length in the condition, which is useful for keeping wide strings readable (see condition for $url).

In addition to being able to work like a fullword modifier, the inclusion of the prepended length also adds structural context for the signature pattern. An assembly info text that accidentally appears as method name (#Strings stream) is probably not the pattern we were looking for.
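Whether the length byte gets in the way of fullword can be checked with a few lines of Python. This is only a sketch: blob_entry and fullword_would_fail are hypothetical helper names, the strings in the usage note are placeholders, and I assume that YARA treats ASCII letters and digits as word characters.

```python
def blob_entry(text: str) -> bytes:
    """Return a short #Blob string entry: 1-byte length followed by UTF-8 text."""
    data = text.encode("utf-8")
    if len(data) > 0x7F:
        raise ValueError("this sketch only covers 1-byte compressed lengths")
    return bytes([len(data)]) + data


def fullword_would_fail(entry: bytes) -> bool:
    """True if the length byte in front of the text is an ASCII letter or digit."""
    return entry[:1].isalnum()
```

A placeholder title of 52 characters gets the length byte 0x34, so fullword_would_fail(blob_entry("A" * 52)) is True. A 36-character GUID string gets 0x24, which is not alphanumeric, so the check returns False.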

To show an additional example for strings in #US and 2-byte compressed length, I also added the $url. Such a download URL might have been mentioned in an analysis report and here we assume it might be referenced by the IL code, thus part of the #US stream. The length of this URL is 98 characters, which is 98*2 = 196 bytes (0xC4) because the #US stream saves them in UTF-16. Additionally, the #US stream entries have an appended 0x0 or 0x1, which means we have to add 1 byte to the length, which is now 0xC5. The value 0xC5 is within the range of 0x80 – 0x3FFF, so 2 bytes are used to encode this length. Applying the formula, we get: (0x8000 | 0xC5) = 0x80C5.
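The same calculation can be scripted to avoid arithmetic slips. The following Python sketch builds a complete #US entry; us_entry is a hypothetical helper and the URL in the usage note is a placeholder. For the trailing byte I follow my reading of ECMA-335 (II.24.2.4), which sets it to 1 not only for characters with a non-zero high byte but also for a few low-byte values such as 0x27 and 0x2D. That detail goes beyond the summary in the table above, so treat it as an assumption to verify.

```python
# Low bytes that also set the trailing flag (assumption, see lead-in).
_SPECIAL_LOW = set(range(0x01, 0x09)) | set(range(0x0E, 0x20)) | {0x27, 0x2D, 0x7F}


def us_entry(text: str) -> bytes:
    """Return compressed length + UTF-16LE text + trailing flag byte."""
    body = text.encode("utf-16-le")
    flag = 0
    for i in range(0, len(body), 2):
        low, high = body[i], body[i + 1]
        if high != 0 or low in _SPECIAL_LOW:
            flag = 1
            break
    size = len(body) + 1  # the trailing flag byte counts towards the length
    if size <= 0x7F:
        prefix = bytes([size])
    elif size <= 0x3FFF:
        prefix = (0x8000 | size).to_bytes(2, "big")
    else:
        raise ValueError("this sketch only covers 1- and 2-byte lengths")
    return prefix + body + bytes([flag])


# Hypothetical 98-character URL, padded only to reach the length from the text.
placeholder_url = "http://example.invalid/".ljust(98, "a")
entry = us_entry(placeholder_url)
```

Here entry starts with the bytes 0x80 0xC5 and ends with 0x00. If the assumption about the flag holds, a URL that contains a hyphen would end with 0x01 instead, which is worth keeping in mind before hard-coding the trailing byte.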

The buggy $timestamp did not consider that the timestamp is saved in little-endian format. Knowing that this timestamp is part of the PE header, we add this context to the pattern by placing it at a fixed offset from the PE signature. Alternatively, YARA’s “pe” module parses the timestamp. However, parsing may cost performance and only works on sufficiently valid PE images, so it may fail to detect the malware in embedded files, memory dumps or broken files. Hence, the more versatile option is the pattern-based solution.
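As a sketch of the pattern-based approach, the following Python helper turns a timestamp value into a YARA hex string body. The function name pe_timestamp_pattern and the timestamp in the usage note are made up for illustration. The four wildcards stand for the Machine and NumberOfSections fields that sit between the PE signature and the TimeDateStamp field in the COFF file header.

```python
import struct


def pe_timestamp_pattern(timestamp: int) -> str:
    """Build a hex string: PE signature, 4 wildcarded bytes, little-endian timestamp."""
    stamp = struct.pack("<I", timestamp)  # TimeDateStamp is stored little endian
    # "PE\0\0", then Machine (2 bytes) and NumberOfSections (2 bytes) as wildcards
    parts = ["50", "45", "00", "00", "??", "??", "??", "??"]
    parts += [f"{b:02X}" for b in stamp]
    return "{ " + " ".join(parts) + " }"
```

For the hypothetical value 0x5F3E2A10 this gives { 50 45 00 00 ?? ?? ?? ?? 10 2A 3E 5F }, with the timestamp bytes in reversed order compared to how a tool would display the value.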

Finally, we change the $forms string because “WindowsFormsApp54”, “Program” and “Main” are namespace, class and method, which are placed as separate entries in the #Strings heap. Their connection is encoded in the metadata tables and cannot be covered by a single pattern. We remove “Program” and “Main” entirely from the YARA rule because they are relatively common strings. The “WindowsFormsApp54” is a default name used by Visual Studio. With exception of programming exercises, it should be uncommon among clean files and coupled with the timestamp we might find the sample that was used for the screenshot. Because “WindowsFormsApp54” is saved in #Strings, it is enclosed by zero-bytes.

One word of caution: Specifically for threat hunting rules, which often must be written without samples, it may be error-prone to manually calculate details like the compressed length. But the knowledge about underlying structures and encodings used in the .NET streams helps to avoid typical errors as we have seen in the buggy hunt rule. When you craft actual detection signatures for production, these structural details are easily extracted and work great to avoid false positives.

Case 2: Detecting Methods and IL Code

For simple cases, a strings listing of a malicious .NET sample provides enough information to write a YARA signature. However, obfuscation makes this approach unusable if it encodes user defined strings and replaces method, field and class names. To successfully create signatures for such files, the versatile analyst may want to look at actual IL code and method signatures.

The method we will look at for case 2 is the following:

Figure 6: Decompiled method that shall be detected

Tokens

Anyone who writes signatures for x86 code knows that addresses to functions or data locations generally should be wildcarded to create robust signature patterns. That is because small changes to the code like additional variables, functions and instructions will also affect these addresses after recompilation.

A .NET token is similar to an address in x86 in that regard. Just like addresses they might change their values with recompilation. However, they are not exactly the same and wildcarding the whole token is not recommended.

There are two types of tokens in .NET assemblies: Coded tokens and non-coded ones. The non-coded tokens are part of the IL code.

.NET metadata consists of a lot of tables, which define classes, parameters, methods and more. Tokens reference a row in a metadata table. That means they describe two data points: a record identifier to specify which row is used and a table index to indicate which table they reference.

Every token consists of 4 bytes. The most significant byte is the table index, which is also called token type. The remaining three bytes are the record identifier (RID). While the token type selects the metadata table, the RID defines which entry in this table is being used.

Why is the table index also called token type? That is because every metadata table is responsible for storing entries of a certain type. For instance, methods are saved in the table mdtMethodDef. That means any token that points into this table is a method definition reference with the token type 0x06.

The token types themselves have the same values in every .NET assembly, making them an important data point when writing signatures. The following table lists their values (see page 76 in [2]).

Token type	Value (RID | (Type << 24))
mdtModule	0x00000000
mdtTypeRef	0x01000000
mdtTypeDef	0x02000000
mdtFieldDef	0x04000000
mdtMethodDef	0x06000000
mdtParamDef	0x08000000
mdtInterfaceImpl	0x09000000
mdtMemberRef	0x0A000000
mdtCustomAttribute	0x0C000000
mdtPermission	0x0E000000
mdtSignature	0x11000000
mdtEvent	0x14000000
mdtProperty	0x17000000
mdtModuleRef	0x1A000000
mdtTypeSpec	0x1B000000
mdtAssembly	0x20000000
mdtAssemblyRef	0x23000000
mdtFile	0x26000000
mdtExportedType	0x27000000
mdtManifestResource	0x28000000
mdtGenericParam	0x2A000000
mdtMethodSpec	0x2B000000
mdtGenericParamConstraint	0x2C000000

The RIDs on the other hand should rather be wildcarded, because similar to addresses in x86 their values might change when table entries are added or removed, and the sample is recompiled.
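The split into token type and RID is plain bit arithmetic. The Python sketch below uses two hypothetical helpers: token_from_il_bytes reads the 4 operand bytes as they appear in a file, assuming little-endian storage, and split_token separates the two parts. The example values in the usage note are illustrative only.

```python
def token_from_il_bytes(operand: bytes) -> int:
    """Convert the 4 token bytes as stored in the file into the token value."""
    if len(operand) != 4:
        raise ValueError("a token has exactly 4 bytes")
    return int.from_bytes(operand, "little")


def split_token(token: int):
    """Return (token type, RID): top byte and lower three bytes of the value."""
    return token >> 24, token & 0x00FFFFFF
```

The stored bytes 20 00 00 0A become the token value 0x0A000020, which splits into the token type 0x0A (mdtMemberRef) and the RID 0x20. Only the RID part is expected to shift between builds.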

IL Code Patterns and Wildcards

Let’s use the knowledge about tokens to create an IL code signature. To see the opcode, open a sample in dnSpy and select “IL code” as language. Then copy and paste the code sequence that you want to add to the signature.

A partial output for our Buffer method looks as follows. This code initializes an array of size 256 and a dictionary, then uses Enumerable.Range(0, 256) to iterate the array.

/* 0x00000378 2000010000   */ IL_0000: ldc.i4    256 
/* 0x0000037D 8D19000001   */ IL_0005: newarr    [mscorlib]System.String 
/* 0x00000382 0A           */ IL_000A: stloc.0 
/* 0x00000383 731F00000A   */ IL_000B: newobj    instance void class [mscorlib]System.Collections.Generic.Dictionary`2<string, uint8>::.ctor() 
/* 0x00000388 0B           */ IL_0010: stloc.1 
/* 0x00000389 16           */ IL_0011: ldc.i4.0 
/* 0x0000038A 2000010000   */ IL_0012: ldc.i4    256 
/* 0x0000038F 282000000A   */ IL_0017: call      class [mscorlib] System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32)

The opcode part is the second column in this listing. For instance, the last instruction, which calls Range(0,256), has the following hexadecimal byte sequence:

28 20 00 00 0A

The first byte, 0x28, is the opcode for the call instruction.

The following three bytes, 0x20 0x00 0x00, are the RID, because the token is saved in little-endian format. The last byte, 0x0A, is the token type mdtMemberRef.

That means for this call instruction we wildcard bytes 2-4 because we want to preserve the information that we call a member reference. The resulting sub pattern will look as follows:

28 ?? ?? ?? 0A
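When a sample is at hand, this wildcarding step can be scripted. The following Python sketch works on the bytes copied from the listing above. The names wildcard_rid, LISTING and build_pattern are hypothetical, and the flag that marks which instructions carry a token is set by hand here, because the sketch deliberately does not contain an opcode table.

```python
def wildcard_rid(instruction_hex: str) -> str:
    """Wildcard the 3 RID bytes of a 1-byte opcode followed by a 4-byte token."""
    raw = bytes.fromhex(instruction_hex)
    if len(raw) != 5:
        raise ValueError("expected a 1-byte opcode plus a 4-byte token")
    return f"{raw[0]:02X} ?? ?? ?? {raw[4]:02X}"


# (instruction bytes from the listing, True if the operand is a metadata token)
LISTING = [
    ("2000010000", False),  # ldc.i4 256
    ("8D19000001", True),   # newarr System.String
    ("0A", False),          # stloc.0
    ("731F00000A", True),   # newobj Dictionary<string, uint8>::.ctor()
    ("0B", False),          # stloc.1
    ("16", False),          # ldc.i4.0
    ("2000010000", False),  # ldc.i4 256
    ("282000000A", True),   # call Enumerable::Range(int32, int32)
]


def build_pattern(listing) -> str:
    """Join all instructions into one hex pattern, keeping only the token types."""
    parts = []
    for hex_bytes, has_token in listing:
        if has_token:
            parts.append(wildcard_rid(hex_bytes))
        else:
            parts.append(" ".join(f"{b:02X}" for b in bytes.fromhex(hex_bytes)))
    return " ".join(parts)
```

For the listing, build_pattern(LISTING) yields 20 00 01 00 00 8D ?? ?? ?? 01 0A 73 ?? ?? ?? 0A 0B 16 20 00 01 00 00 28 ?? ?? ?? 0A. Keeping the comments next to the bytes also makes it easy to carry the disassembly over into the rule.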

The full YARA signature for the IL code may look as follows:

Figure 7: IL code signature with wildcards for the token RIDs

Note that we keep the integer values for the array size and the Range(0,256) call. Depending on the context and the probability that these values change, such values might need to be wildcarded too. Values that represent keys for encryption, campaign IDs or version numbers are often subject to change.

Some articles recommend wildcarding the full token including the token type. However, there is usually no advantage in doing that. On the contrary, in addition to losing type information, this might result in bad performance if the remaining byte sequences do not have sufficient length. For YARA patterns, a minimum length of 4 consecutive bytes without wildcards is recommended because YARA’s search algorithm does a first sweep with 4-byte substrings (see [4]), which are called atoms.

Detecting each part of a Method

A method consists of a body, the method name, parameter names, and a signature. In .NET assemblies these are saved in different streams, therefore different locations of the assembly.

Diagram explaining .NET method data locations, showing how the method and parameter tables in the metadata stream (#~/#-) link to #Strings (method and parameter names), #Blob (method signature), and the method body containing IL Code. The method body also connects to #US, which holds user-defined strings. — Figure 8: The locations where data about .NET methods is saved

Let’s assume we want to use all that information for a YARA signature.

Firstly, the name of a method as well as the parameter names are saved in the #Strings stream. Therefore, we know that the method name and the parameter names will be enclosed in zero-bytes and in UTF-8 format. This does not only serve us well if we need to write signatures based on screenshots alone, but also saves time if samples are available because we do not need to look up the representation of these names in a hex editor.

Secondly, any string that is referenced by IL code is stored UTF-16 encoded in the #US heap.

We already discussed #Strings and #US strings for case 1.

Thirdly, the method’s body is the actual IL code. We discussed this part in the previous section.

Lastly, the method signature is saved in the #Blob stream. Method signature in this context means the calling convention, parameter types and return type that the method expects and should not be confused with a detection signature. The structure of such a method signature is as follows:

method_sig ::= <callconven_method> <num_of_args> <return_type> [<arg_type>[,<arg_type>]*]

Ildasm.exe shows the method signature’s byte sequence. Using the fully qualified name of a method, a suitable command to display the byte sequence is:

ILDasm.exe /text /bytes /nobar /item="ns11.Class9::method_22" sample

The following shows an example output with the byte sequence of the method signature in the last line:

.method public hidebysig instance void 

          method_22(class [System.Drawing]System.Drawing.Imaging.BitmapData Param_55, 

                    class [mscorlib]System.IO.MemoryStream Param_56) cil managed 

SIG: 20 02 01 12 29 12 2D

The method signature has the following meaning:

0x20 is the calling convention IMAGE_CEE_CS_CALLCONV_HASTHIS, which means this is an instance method, see [3] and p.146 in [2]
0x02 is the number of arguments, which is 2
0x01 is the return type VOID, see p.141 in [2]
0x12 0x29 is the first argument, 0x12 refers to the CLASS type (see p.145 in [2]) and 0x29 is a coded token of the class reference
0x12 0x2D is the second argument, 0x12 refers to the CLASS type (see p.145 in [2]) and 0x2D is a coded token of the class reference

Just as we wildcard RIDs in IL code patterns, we should also wildcard the coded tokens in method signatures. Coded tokens are compressed forms of tokens to allow for smaller sizes than 4 bytes. They are not used in IL code, but in internal structures like the method signatures.

Additionally, we prepend the length 0x07 to our pattern because it is required for every #Blob entry.

The final hex pattern for this method signature is:

07 20 02 01 12 ?? 12 ??
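The step from the ildasm output to this pattern can be sketched in Python. The helper names are hypothetical, and the sketch only covers simple signatures: it assumes a non-generic method with fewer than 128 arguments and treats every byte after the first two as an element type, so nested constructs such as multi-dimensional arrays would need a real signature parser. The coded token is assumed to be a compressed integer whose size follows from its leading bits.

```python
ELEMENT_TYPE_VALUETYPE = 0x11
ELEMENT_TYPE_CLASS = 0x12


def coded_token_size(first_byte: int) -> int:
    """Size in bytes of a compressed integer, derived from its leading bits."""
    if first_byte & 0x80 == 0:
        return 1
    if first_byte & 0xC0 == 0x80:
        return 2
    return 4


def method_sig_pattern(sig_hex: str) -> str:
    """Prepend the #Blob length and wildcard coded tokens after CLASS/VALUETYPE."""
    sig = bytes.fromhex(sig_hex)
    if len(sig) > 0x7F:
        raise ValueError("this sketch only covers 1-byte compressed lengths")
    out = [f"{len(sig):02X}"]
    out += [f"{b:02X}" for b in sig[:2]]  # calling convention, argument count
    i = 2
    while i < len(sig):
        out.append(f"{sig[i]:02X}")
        if sig[i] in (ELEMENT_TYPE_VALUETYPE, ELEMENT_TYPE_CLASS):
            size = coded_token_size(sig[i + 1])
            out += ["??"] * size  # wildcard the whole coded token
            i += size
        i += 1
    return " ".join(out)
```

Calling method_sig_pattern("20 02 01 12 29 12 2D") returns 07 20 02 01 12 ?? 12 ??. Note that the prepended length only stays valid as long as the coded tokens keep their size after recompilation.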

Method signature patterns on their own are a weak data point. Furthermore, unless the scanning engine parses .NET metadata, the method signature cannot be associated with the method body and name. So, for plain pattern search, any method with the same method signature will match. Thus, it is useful to add context to a YARA rule but will certainly not suffice on its own.

Final Signature for Case 2

Based on the knowledge from the previous section, we add a few more strings to our YARA signature for the Buffer method:

Figure 9: YARA signature with patterns from IL code and various streams that are used to store .NET method data

The code references the string “X2”. Although it only consists of 2 characters, we make the $us_string pattern sufficiently long by using our knowledge that #US elements have a prepended length and, in this case, a trailing 0.

Additionally, we include the class name and the namespace for this exercise.

We extract the method signature via:

ildasm.exe /text /bytes /nobar /item="WindowsFormsApp54.Small::Buffer" sample

The byte sequence for the method signature, 0x05 0x00 0x01 0x1D 0x05 0x0E, is composed as follows:

0x05 is the prepended length of 5
0x00 is the default calling convention, IMAGE_CEE_CS_CALLCONV_DEFAULT, see [3] and p.146 in [2]
0x01 is the number of arguments
0x1D means the return type is SZARRAY, see p.137 in [2]
0x05 means the array's underlying type is byte, see p.134 in [2]
0x0E means the first argument is of type string, see p.145 in [2]

There is no wildcard necessary because no coded token is present.
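For reviewing such byte sequences later, a small decoder helps to produce the comment that should accompany the pattern. The Python sketch below is a hypothetical helper, not a full signature parser: it only understands primitive element types and single-dimensional arrays, and it labels the calling convention merely as “default” or “hasthis”.

```python
# Element type codes for primitives (subset used by this sketch).
PRIMITIVES = {
    0x01: "void", 0x02: "bool", 0x03: "char", 0x04: "sbyte", 0x05: "byte",
    0x06: "short", 0x07: "ushort", 0x08: "int", 0x09: "uint", 0x0A: "long",
    0x0B: "ulong", 0x0C: "float", 0x0D: "double", 0x0E: "string", 0x1C: "object",
}
ELEMENT_TYPE_SZARRAY = 0x1D


def read_type(sig: bytes, pos: int):
    """Decode one type at pos and return (readable name, next position)."""
    code = sig[pos]
    if code == ELEMENT_TYPE_SZARRAY:
        inner, pos = read_type(sig, pos + 1)
        return inner + "[]", pos
    if code in PRIMITIVES:
        return PRIMITIVES[code], pos + 1
    raise ValueError(f"element type 0x{code:02X} is not covered by this sketch")


def describe_blob_entry(entry_hex: str) -> str:
    """Describe a #Blob method signature entry including its length byte."""
    entry = bytes.fromhex(entry_hex)
    length, sig = entry[0], entry[1:]
    if length != len(sig):
        raise ValueError("prepended length does not match the signature size")
    convention = "hasthis" if sig[0] & 0x20 else "default"
    arg_count = sig[1]
    return_type, pos = read_type(sig, 2)
    args = []
    for _ in range(arg_count):
        arg, pos = read_type(sig, pos)
        args.append(arg)
    return f"{convention} {return_type} ({', '.join(args)})"
```

For the entry 05 00 01 1D 05 0E the function returns “default byte[] (string)”, which matches the manual breakdown above.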

Tips for .NET YARA signatures

Knowledge about internal structures helps to add context to signatures. This results in more accurate and robust detection signatures because we increase the chances that the patterns are embedded in the correct structures.

Furthermore, it increases our ability for expression in YARA and leads to more flexibility and fewer mistakes when dealing with missing information. It also increases efficiency because we do not need to look up the correct formatting in the hex editor.

This is not only true for .NET. Other types of signatures, like those on CPython bytecode, also profit from factoring in the underlying file and data structures.

The value of readability and maintainability should not be underestimated. Signatures that require reverse engineering the sample’s code to determine what they detect often take as much time for quality checks and maintenance as writing a similar signature from scratch. YARA byte patterns for IL code should always contain comments with the disassembly or decompilation of the detected IL code.

References

[1] Brian Wallace, 2015, “Using .NET GUIDs to help hunt for malware”, VirusBulletin

[2] Serge Lidin, 2014, “.NET IL Assembler”, Apress

[3] https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/metadata/corcallingconvention-enumeration

[4] Florian Roth, 2021, Revision 1.5, https://github.com/Neo23x0/YARA-Performance-Guidelines?tab=readme-ov-file#atoms

Sample Hashes

f9ee3eff3345ea280c01d5fce5461b24c537cf6c3dfadc626ef73eed815c2008


Karsten Hahn

Principal Malware Researcher