An Automata-Theoretic Approach to Binary String Virus Recognition and Verification Using Deterministic Finite Automata

Main Article Content

Aldaruhz T. Darkis

Abstract

The development and simulation of transition graphs and deterministic finite automata (DFA) were performed through the Java Formal Languages and Automata Package (JFLAP). Binary input string based experiments were completed with both valid and invalid, benign and malicious strings included in the input testing. After uploaded executable files were converted from a binary stream to a set of inputs that could be processed by a DFA and compared against predefined malicious pattern signatures that had been saved to the DFA pattern library, the malware detection module conducted the analysis of these uploads. The purpose of this study was to determine if the development of a mathematically patterned DFA can be used to automatically direct the design of an efficient machine to recognize binary strings and to detect malware. The research work focused on the recognition of binary languages specified on the alphabet set {0,1}; binary languages exhibiting the use of forbidden byte patterns, modular arithmetic restrictions, and the recognition of malicious binary sequences which are representative of the behavior of malware. In particular, this study integrates the theory of formal languages with the definition and operation of an automated malware detection system by applying DFA-based recognition techniques in conjunction with the operation of the malware detection system. The results of the JFLAP simulations indicated that the correct operation of the proposed DFAs (DFA based algorithms) was validated and that“don’t care” states do not produce false acceptances. Also, the minimal DFAs were effective in detecting binary test string inputs as well as detecting and identifying malicious byte sequences with the minimum rate of false negatives. Finally, the malware detection simulation was able to detect and identify suspicious execution behavior by processing the inputs through deterministic state transitions in conjunction with formal verification techniques.

Article Details

Section
Articles