Note to experienced readers: You may be able to skip the first history lesson and scroll straight to the “Off to work” part of this post.
Firstly, you may ask, “What do you mean by ‘Signature Generation’?” Have you ever wondered how Anti-Virus software does its job when heuristics are turned off? It searches the file in question for certain unique sequences of bytes, typically the encoded assembly instructions (opcodes) of the code it wants to detect. If it finds the designated sequence of bytes, it can trigger an alert accordingly.
Secondly, “Why would I want to bother creating a signature?” There are many reasons why you may want to. If you’re reading this, chances are you’ve already got your reason, but I’m going to list a few just for the sake of completeness.
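To make the idea concrete, here is a minimal sketch of a byte-signature search. It assumes a textual signature of hex bytes where `??` stands for "any byte" (a common convention, but an assumption here), and the names `parse_signature` and `find_signature` are made up for illustration. Real scanners are far more optimized than this.

```python
def parse_signature(text):
    """Turn '60 E8 ?? ?? 5D' into a list of ints, with None for wildcards."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find_signature(data, pattern, start=0):
    """Return the first offset >= start where the pattern matches, or -1."""
    last = len(data) - len(pattern)
    for offset in range(start, last + 1):
        # A wildcard (None) matches anything; other positions must be equal.
        if all(p is None or data[offset + i] == p
               for i, p in enumerate(pattern)):
            return offset
    return -1


if __name__ == "__main__":
    sample = bytes.fromhex("909060e8112233445dc3")
    sig = parse_signature("60 E8 ?? ?? ?? ?? 5D")
    print(find_signature(sample, sig))
```

The wildcards let a signature skip over bytes that change between samples, such as call targets, while still pinning down the constant bytes around them.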
- You need to identify a sample that currently has no existing signatures.
- You’re passionate about what you do and want to further your knowledge.
Now, that isn’t a lot of reasons, but keep in mind that reason 1 covers a very wide spectrum. In my case I needed to remove an infection (a Windows PE infector) that had no publicly available signatures. I wanted to automate the disinfection process, so I needed a way to confirm that a file was actually infected before attempting to disinfect it…
I hope you can now grasp (if you didn’t already) why understanding the basics of signature creation can be useful.
Locating packer/protector signatures in malware is usually not a problem when you’ve got tools such as PEiD and RDG Packer Detector, along with their corresponding databases, lying around on your hard drive… But at times they won’t find anything, and the same applies to Anti-Virus software.
Off to work… We need to know where in the file to look for signatures. There are two commonly used methods:
- With a Windows PE file we may be able to start searching from the executable’s entrypoint.
- We can also consider searching from the start of the file (offset 0).
Clearly option 1 should be preferred: when you’re scanning a file against potentially thousands of signatures, scanning the entire file’s contents can take quite a while, even on fast systems.
My usual path is to start at the entrypoint of the file and look for any static or constant code. It’s a good idea to search around core initialization routines, because if they are changed too much the executable may no longer function.
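Starting at the entrypoint means turning the entrypoint's RVA from the PE header into a raw file offset first. The following is a rough sketch of that lookup using only the standard `struct` module. The function name `entrypoint_file_offset` is hypothetical, and the code assumes a well-formed PE with no bounds checking beyond the two magic values.

```python
import struct


def entrypoint_file_offset(data):
    """Map AddressOfEntryPoint (an RVA) to a raw file offset, or None."""
    if data[:2] != b"MZ":
        return None
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)        # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        return None
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    # The optional header starts 24 bytes after the PE signature;
    # AddressOfEntryPoint sits 16 bytes into it.
    (ep_rva,) = struct.unpack_from("<I", data, pe_off + 24 + 16)
    section_table = pe_off + 24 + opt_size
    for i in range(num_sections):
        sec = section_table + i * 40                        # 40-byte headers
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, sec + 8)
        if vaddr <= ep_rva < vaddr + max(vsize, raw_size):
            return ep_rva - vaddr + raw_ptr
    return None
```

With the offset in hand, a scan can be limited to a small window such as `data[offset:offset + 64]` instead of the whole file.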
You can also generate MD5 hashes of common compiler-generated entrypoints and then scan some of your samples to check which ones have known entrypoints. If you don’t get a match, there’s a good chance your executable is packed or protected, and should the packer be unknown you can start looking for signatures. If you’ve got a load of undetected samples lying around, it helps to have this process automated.
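The entrypoint-hash check could be automated along these lines. This is only a sketch: the 32-byte window, the function names and the `known` table are all assumptions, and the table has to be filled with hashes you compute yourself from known compiler stubs. Note that a plain hash only matches when every byte in the window is identical, so stubs containing addresses that vary between builds will not match this way.

```python
import hashlib


def entrypoint_md5(data, ep_offset, length=32):
    """MD5 of the first `length` bytes at the entrypoint's file offset."""
    return hashlib.md5(data[ep_offset:ep_offset + length]).hexdigest()


def identify_entrypoint(data, ep_offset, known, length=32):
    """Look the entrypoint hash up in `known` (hex digest -> label).

    Returns the label, or None when the entrypoint is not recognised,
    which hints that the file may be packed or protected.
    """
    return known.get(entrypoint_md5(data, ep_offset, length))
```

A sample whose lookup returns `None` is a candidate for the manual signature hunt described in the rest of this post.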
There are various ways to find signatures in PE files, common places and things to check are as follows:
- Similarities in section names.
- Section offsets and sizes.
- Bytes around the entrypoint.
- Start and end of sections.
- Around the Import Table.
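For the section-related checks in the list above, it helps to dump the section table of each sample and compare them side by side. Here is a small sketch of such a dump. The name `list_sections` is hypothetical, and the code assumes a well-formed PE file.

```python
import struct


def list_sections(data):
    """Return (name, raw_offset, raw_size) for each section of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)        # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    section_table = pe_off + 24 + opt_size
    sections = []
    for i in range(num_sections):
        sec = section_table + i * 40                        # 40-byte headers
        name = data[sec:sec + 8].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_ptr = struct.unpack_from("<II", data, sec + 16)
        sections.append((name, raw_ptr, raw_size))
    return sections
```

Running this over several infected samples and looking for names, offsets or sizes that repeat is one way to spot candidates for a signature.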
Once you have a signature, you can take a collection of infected samples and scan them all. If every one matches, you can be reasonably confident you’ve got a signature that covers this strain, and hopefully one that is unique to it.
Once this is done, you should consider adapting the signature so it is compatible with PEiD’s database and the like, and consider sharing it with the world to save others time in the future… Of course, you should make sure your signature files are up to date before you start, otherwise you may find you’re working on an already detected sample…
I’ve covered the basic idea behind generating signatures. I may provide a case study in the future, but I don’t have time at the moment, and at 623+ words I think this post is long enough without going into depth with case studies.
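Checking a candidate signature against a whole collection is easy to script. The sketch below compiles a hex signature with `??` wildcards (an assumed notation) into a bytes regular expression and reports which samples match. The function names are made up, and `samples` is assumed to be a mapping of sample name to file contents that you load yourself.

```python
import re


def compile_signature(text):
    """Compile '60 E8 ?? ?? 5D' into a bytes regex; '??' matches any byte."""
    parts = []
    for tok in text.split():
        if tok == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes([int(tok, 16)])))
    # DOTALL so that '.' also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def scan_samples(samples, signature):
    """Return (matched, missed) lists of sample names, both sorted."""
    rx = compile_signature(signature)
    matched = sorted(name for name, data in samples.items()
                     if rx.search(data))
    missed = sorted(name for name in samples if name not in matched)
    return matched, missed
```

It is worth running the same scan over a set of known-clean files as well: a signature that matches every infected sample but also hits clean files is not unique enough.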
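PEiD's external database (`userdb.txt`) stores each signature as a bracketed name followed by `signature` and `ep_only` lines, with `??` as the wildcard byte. A small helper along these lines can emit an entry in that layout. The helper name `to_peid_entry` and the example packer name are made up.

```python
def to_peid_entry(name, signature, ep_only=True):
    """Format a hex signature as a PEiD userdb.txt style entry."""
    tokens = [tok.upper() for tok in signature.split()]
    for tok in tokens:
        is_hex_byte = len(tok) == 2 and all(
            c in "0123456789ABCDEF" for c in tok)
        if tok != "??" and not is_hex_byte:
            raise ValueError("bad signature token: %r" % tok)
    return "[%s]\nsignature = %s\nep_only = %s\n" % (
        name, " ".join(tokens), "true" if ep_only else "false")


if __name__ == "__main__":
    print(to_peid_entry("Example Packer 1.0", "60 e8 ?? ?? ?? ?? 5d"))
```

Setting `ep_only` to true tells PEiD to test the signature only at the entrypoint, which matches the entrypoint-first approach described earlier.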
If you have any questions, please leave a comment.
KOrUPt.