By Celeste Fralick, chief data scientist and senior principal engineer, McAfee
Just about everyone in the cybersecurity field has accepted the notion that adversaries are just as smart as we are. Anytime we celebrate the latest threat detection and prevention breakthrough, we’re well aware that the bad guys are at work devising ways to evade or disrupt it. Artificial intelligence (AI) in its various permutations—from traditional machine learning (ML) to deep learning (DL)—is no exception. As it evolves and grows, we fully expect that adversaries will ramp up their knowledge and ability to exploit tools that use AI.
What does this mean for enterprise security teams and cybersecurity vendors? Increasingly, enterprises are relying on AI to improve their risk posture. AI can be especially valuable in threat hunting and detection, but it is not without its flaws. Adversaries have the ability to manipulate AI data and algorithms to the point where the AI system is defeated. Malware can then pass through undetected, putting vital corporate data, systems, and users at risk. This type of malicious interference with AI-based security systems has been dubbed adversarial machine learning (AML). As a consequence, both vendors and enterprise security teams need to be extra vigilant about continually monitoring AI-based security systems to ensure that they are doing what they are meant to do as they evolve and adapt to the changing threat landscape.
Ultimately, it’s really all about the data. With more than 278 exabytes of IP traffic per month projected by 2021, you can be sure that adversaries will be looking for new and inventive ways of attacking us. Here’s the bottom line: as the amount of data increases, reliance on AI will inevitably increase too, so we can pretty much guarantee that adversaries will exploit vulnerabilities within our analytics.
The good news is that researchers and data scientists have been proactive and are diligently working on countermeasures. Many researchers follow a process of Attack, Detect, and Protect to test the efficacy and performance of AI systems and their applications in areas such as facial recognition, automotive systems, and medical data. We’re now able to model the adversary, simulate an attack, and create countermeasures before a real attack occurs. Detection and mitigation of AML is an essential risk reduction strategy.
But adversaries are just as diligent as we are, and they will read the same literature to help plot their schemes. As we’ve discovered, there are many different attack methods and ways in which adversaries can use AI to their advantage. One example is evasion attacks. In this case, adversaries deluge the system with false negatives (malware disguised as benign code), causing security analysts to ignore or de-prioritize alerts. Another example is poisoning attacks, which inject false data with the intent of poisoning the training data set and creating biases toward certain classifications. This can actually change the AI model and significantly affect decisions and outcomes. Adversaries can even use AI to crawl the internet in search of vulnerabilities, which can open up opportunities for attack.
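To make the poisoning idea concrete, here is a minimal sketch of an injected-data attack against a toy malware classifier. The synthetic features, the scikit-learn logistic regression standing in for the detector, and the size of the poison set are all illustrative assumptions, not a description of any real product.

```python
# Minimal sketch of a poisoning attack: falsely labeled samples are injected into
# the training set so the learned model lets similar malware through.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data: 20 numeric features per sample; label 1 = malicious, 0 = benign.
X = rng.normal(size=(2000, 20))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Poison: 600 samples that sit in the "malicious" region but arrive labeled benign.
X_poison = rng.normal(loc=0.5, size=(600, 20))
y_poison = np.zeros(600, dtype=int)
poisoned = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_train, X_poison]), np.concatenate([y_train, y_poison])
)

# Compare how often each model flags the held-out malware.
malware = X_test[y_test == 1]
print("clean model detection rate:   ", clean.predict(malware).mean())
print("poisoned model detection rate:", poisoned.predict(malware).mean())
```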
Researchers are studying other significant variables, like how much the attacker actually knows about the AI system. For example, in what we call “white-box” attacks, the adversary knows the model and its features. In “gray-box” attacks, they don’t know the model, but do know the features. In “black-box” attacks, they know neither the model nor the features. Even in a black-box scenario, adversaries remain undaunted. They can persistently use brute-force attacks to break through and manipulate the AI malware classifier. This is an example of what is called “transferability”—the use of one model to trick another model.
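The black-box scenario can be illustrated with a short transferability sketch: the attacker never sees the target model, only its verdicts, so they train a local surrogate from those query responses and craft evasions against the surrogate instead. The synthetic data, the random-forest target, the logistic-regression surrogate, and the crude gradient-sign step below are assumptions made purely for illustration.

```python
# Minimal sketch of transferability: evasions crafted against a local surrogate
# model also degrade a separate black-box target model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 30))
y = (X[:, :10].sum(axis=1) > 0).astype(int)   # 1 = malicious

# The defender's black-box target: the attacker can only query its predictions.
target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:2000], y[:2000])

# The attacker labels their own samples by querying the target, then fits a surrogate.
X_attack = rng.normal(size=(1000, 30))
surrogate = LogisticRegression(max_iter=1000).fit(X_attack, target.predict(X_attack))

# Craft evasions against the surrogate: nudge each malicious sample in the
# direction that most lowers the surrogate's "malicious" score (a gradient-sign step).
malware = X[2000:][y[2000:] == 1]
evasions = malware - 0.5 * np.sign(surrogate.coef_[0])

print("target detection, original malware:", target.predict(malware).mean())
print("target detection, evasive malware: ", target.predict(evasions).mean())
```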
So, you might ask, what is the actual potential for bringing a seemingly robust AI detection system to its knees? It doesn’t take much, in fact. In one study, malware that had been detected 99% of the time was accepted as benign 99% of the time after only 11 of the 700 features per malware sample were changed.1
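The sketch below captures the spirit of that result, though not the cited study’s actual model or data: it greedily flips binary features (think imported APIs or strings) on a synthetic 700-feature classifier until a detected sample scores as benign, then reports how few flips were needed.

```python
# Illustrative sketch: how many single-feature flips does it take to turn a
# "malicious" verdict into a "benign" one? The 700 binary features and the
# logistic-regression model are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(5000, 700))              # 700 binary features per sample
w = rng.normal(size=700)
y = (X @ w > np.median(X @ w)).astype(int)            # 1 = malicious
model = LogisticRegression(max_iter=2000).fit(X, y)

detected = X[(y == 1) & (model.predict(X) == 1)]      # malware the model flags
sample = detected[0].copy()
flips = 0
while model.predict(sample.reshape(1, -1))[0] == 1:
    # Build all 700 one-flip candidates and keep the one that looks most benign.
    candidates = np.tile(sample, (700, 1))
    candidates[np.arange(700), np.arange(700)] ^= 1
    best = int(np.argmin(model.predict_proba(candidates)[:, 1]))
    sample[best] ^= 1
    flips += 1

print(f"features changed before the sample was classified as benign: {flips}")
```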
Let’s look at another example involving speech recognition. Researchers at UC Berkeley discovered that, by injecting crafted non-speech noise that sounds like white noise into a voice assistant, an attacker could execute commands like “unlock your front door” or “place a $1,000 purchase.”2
This gives you a taste of some of the research that’s been done in AML. On a positive note, forward-thinking cybersecurity vendors are working in concert with researchers at other organizations to develop simulation experiments with the aim of testing and attacking existing AI algorithms and models. I believe that breaking these analytics will ultimately make them stronger. Along with gaining an understanding of potential adversarial attack and evasion methods, we are in the process of refining a host of methodologies that will defend against AML. It’s imperative that our community follows this proactive trajectory, as it will enable all of us to develop solutions that ensure continued growth and economic success.
What’s the big takeaway? Cybersecurity vendors should take the vulnerabilities of AI into account when they design detection and classification systems. In the meantime, here are some actions you can take to maximize your defenses:
- Keep an eye out for an unusually high number of false positives and false negatives (a minimal monitoring sketch follows this list).
- Engage human analysts in critical decision-making rather than in executing low-level, repetitive tasks.
- Initiate an active risk program that includes development and verification of analytics.
- Ensure that you upgrade your software on a regular basis.
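As a starting point for the first item on that list, here is a minimal monitoring sketch. The window size, the alert thresholds, and the assumption that an analyst or sandbox eventually confirms each verdict’s true label are illustrative choices rather than recommendations.

```python
# Sketch: track false-positive and false-negative rates over a sliding window of
# recent verdicts and raise an alert when either drifts past a threshold.
from collections import deque

WINDOW = 1000        # most recent confirmed verdicts to consider
FP_LIMIT = 0.05      # alert if more than 5% of benign samples are flagged
FN_LIMIT = 0.02      # alert if more than 2% of confirmed malware slips through

verdicts = deque(maxlen=WINDOW)   # (predicted_malicious, actually_malicious) pairs

def record_verdict(predicted_malicious: bool, actually_malicious: bool) -> None:
    """Append a verdict once an analyst or sandbox confirms the true label."""
    verdicts.append((predicted_malicious, actually_malicious))

def check_drift() -> list:
    """Return alert messages if recent FP or FN rates look unusually high."""
    benign = [p for p, a in verdicts if not a]
    malicious = [p for p, a in verdicts if a]
    alerts = []
    if benign and sum(benign) / len(benign) > FP_LIMIT:
        alerts.append("false-positive rate above threshold")
    if malicious and sum(not p for p in malicious) / len(malicious) > FN_LIMIT:
        alerts.append("false-negative rate above threshold")
    return alerts
```

A real deployment would want per-model baselines and statistical tests rather than fixed thresholds, but the point is simply that drift in either direction deserves a human look.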