Podcasts

Episode 2: Machine Learning When the Perimeter Is Burning

Knowing what sites, apps, and APIs you’re exposing to the Internet is crucial for determining your realistic risk level and making accurate security decisions.
 
In this episode, Frank Catucci and Dan Murphy are joined by special guest Bogdan Calin, Principal Security Researcher at Invicti, to talk about ways of determining an organization’s web attack surface and the resulting risk level. In particular, they discuss the pros and cons of various AI and ML approaches to this problem and go deeper into the workings of the pioneering Predictive Risk Scoring feature that Bogdan helped design and build.
 
In the fiction segment, it’s Bob the CISO’s first day at a new company, and from the first cursory check, he’s worried that the org is exposing a lot more than it should be. A call with Alice, the head developer, does nothing to put his mind at ease—quite the opposite, in fact…

Hosted by: Frank Catucci, Dan Murphy
Guests: Bogdan Calin
October 24, 2024

Transcript

Episode Two: Machine Learning When the Perimeter Is Burning

It was Bob’s first week at MegaHelix Technologies. This was his first time serving as the Chief Information Security Officer (CISO) at a decently sized company. He had been an AppSec guy for a long time and had held the title of CISO at his now-defunct startup, but this was different. 

When you build a house from scratch, you know where all the plumbing is, how the wires run, and how to get that leaky faucet to turn off if you twist it just right. But this was a different house—one that had been through many hands—and it was time to do a home inspection.

Bob started with the cloud accounts. It was a bit strange for a company of this size to have hosting on both Cerulean and Parana Web Services. MegaHelix did mostly B2B, with a lot of partner portals and APIs that lived on the edge and provided connectivity to other cloud-based businesses. Bob whipped up a quick Python script to export the DNS records of all the domains in his new firm. There were a lot more than he expected. 

He opened his Taut IM client and sent off a quick message to Alice, the company’s lead developer. The phrase “Alice is typing…” hung in the air for several minutes, long enough that Bob decided to fire up a quick video call. Alice accepted, with the camera off.

“Hey Alice, I just wanted to double-check with you on that list of exposed hosts.”

“Yeah, so those are left over from the old Galactus project. There are like a billion services. After the cuts, I don’t think those guys are with the company anymore.”

“So… Can we decommission them?”

“Uh, probably not. Those things are doing like a million API calls a day, and most are probably really important. Nobody knows how they work anymore.”

The coffee cup came down on the table a bit too hard. A flash of brown marred the pure white of Bob’s brand new business card, almost perfectly filling the center of the “O” in “CISO.” His hand went to his forehead, and he began rubbing his temples.

Bob next checked to see whether the list of publicly available DNS names was firewalled off with security groups and access control lists. Pressed for time, he did a quick sample, and here he found mixed results. Some hosts, typically with names ending in “partner,” were often IP-restricted to a particular set of sources—APIs and web apps intended for a private party to access over the internet. But this wasn’t always the case. He found an alarming number of hosts with names like “dev” or “test” that were open to the internet.

On a hunch, he dropped his corporate VPN and pointed his browser at one of the obscure hostnames. It included a reference to the year 2020, hinting at a release of partner software that was almost surely out of date. The auto-generated hostname was long enough that no one would ever guess it, but Bob had been around long enough to know that security through obscurity never works out, especially against an attacker who understands how DNS works.

His browser spun for a moment, and Bob felt a brief hope that the site was no longer accessible. But then it loaded—the login page for what was clearly an outdated version of his company’s software, sporting the bright coral color straight out of last decade’s branding. A pop-up banner greeted him with the text, “MegaHelix MegaConnect Version 20.2 QA System. Login with admin and the usual test password.”

Bob took a deep breath and went to get more coffee. As the steam from the fresh pot drifted towards his nostrils, he found himself imagining the systems he was charged with defending changing from a castle wall with well-defined gates to a sprawling slum spread across a valley with no design or focus. He raised the cup of coffee to his lips and took a long, thoughtful sip. It was going to be a long night.


Dan Murphy: Welcome to another episode of AppSec Serialized, the podcast where we talk about web app and API security. My name is Dan Murphy, and I’m the Chief Architect at Invicti Security. With me, as always, is Frank Catucci, our Chief Technical Officer. Today, we have a special guest on the podcast, Bogdan Calin, our Principal Security Researcher, who is joining us to talk about AI and machine learning. Welcome, Bogdan.

Bogdan Calin: Thank you very much for inviting me.

Dan: In our intro, we heard a short story about a CISO who discovers a field of web apps that he now owns. So, Bob, in our story, walked in on day one and realized, “Wow, my perimeter is a lot bigger than I expected.” Frank, have you ever felt anything like that before?

Frank Catucci: Many times. There are a couple of scenarios that come to mind right off the bat. The first one was during my consulting days. I would often ask, “How many web apps or how many APIs do you have?” The answers would inevitably vary, depending on the stakeholders around the table. I would hear anything from 10 to 100 to 1,000, to “We really have no idea.” The discovery process and trying to secure what you don’t even know exists is really a futile effort.

There have been specific instances, one in particular where I walked into a large news organization. They were using a previous product and said, “Hey, these are the web apps we’re looking at; these are the APIs. We think this is all that exists. What do you think?” I did something very similar: I ran a tool that performs various types of discovery—DNS-based, non-DNS-based, crawling, and so on. The number of web apps that came back was staggering. When I presented the findings, I was almost escorted off the property. They insisted there was no possible way that these were publicly accessible. When I explained that I had gathered this information from a guest Wi-Fi or a mobile hotspot, without even having corporate credentials or access to the corporate network, they were shocked. The fact of the matter was that most of those web apps, if not all, were accessible via the public internet, and the company was completely unaware of it.

This is real risk. Trying to secure what you don’t know is accessible is an extremely difficult task. This is something that all organizations need to be well aware of at any phase of maturity in their AppSec programs.

Dan: Yeah, totally. I think we probably have some Bobs out there in the audience who have had that feeling before. We’ve been building something called Predictive Risk Scoring. When you discover that you’ve got a lot of assets, perhaps facing the outside world, how can predictive risk scoring help in that scenario?

Frank: When you have a large attack surface, be it web applications or APIs that you may not know exist—or even ones you do know about but need a strategy for addressing—the key is knowing where to start.

We developed Predictive Risk Scoring to essentially assign a risk rating or score that allows you to prioritize which applications are the biggest threats. This helps determine where to focus your efforts first, identifying which applications and their vulnerabilities pose the highest risk.

By providing the client or customer with a very accurate model that reports back the risk score of those apps, we can stack-rank them and guide you on where to begin. Even though your list might include a thousand web apps, you’re starting with the top three, five, or ten that have the highest risk score. This shifts the overall approach from simply patching everything to understanding where your risk lies and where to start. It really provides a risk management and prioritization framework that enhances confidence in the AppSec journey.
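As a toy illustration of that stack-ranking step (every hostname and score below is made up, not output from any real scan), the triage order simply falls out of sorting by predicted risk:

# Hypothetical discovered apps with predicted risk scores between 0.0 and 1.0;
# the hostnames and numbers are illustrative only.
apps = {
    "partner-api.example.com": 0.91,
    "dev.example.com": 0.84,
    "www.example.com": 0.22,
    "status.example.com": 0.08,
}

TOP_N = 3  # how many apps the team can realistically tackle first
for host, score in sorted(apps.items(), key=lambda kv: kv[1], reverse=True)[:TOP_N]:
    print(f"{score:.2f}  {host}")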

Dan: So, when we do predictive risk scoring, we make a single request to the front page of a site to infer as much as we can about it. Bogdan, you did a lot of the work on this. What parameters does predictive risk scoring use during that initial request? How does it determine if a site is risky or not?

Bogdan: We have a list of about 220 parameters. In machine learning, these are called features, and they are correlated with the security of a website. Some examples include whether the website supports deprecated or old TLS versions, which is a significant indicator. Another example is the copyright date—many websites still display something like “Copyright 2010,” which is interesting because older websites tend to be more vulnerable. We also look at factors like the number of form inputs, as more forms generally mean a larger attack surface with more potential vulnerabilities.

Other parameters include the number of external requests, the use of cookies that are not marked as HTTP-only, the version of different JavaScript libraries, the amount of insecure content on the site, and whether the website uses a Content Security Policy. 

In total, we analyze about 220 different features. We gather all this information with a single request to the site, and then we feed it into a machine learning model—a gradient boosting decision tree, to be specific—which returns a prediction about the security risk of the website.
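As an illustration only, and not Invicti’s actual model: the general shape Bogdan describes (collect signals from a single response, then hand them to a gradient boosting classifier) might look roughly like the following Python sketch using requests and scikit-learn. The specific features, the regex, and the toy training rows are assumptions made for the example.

import re
import requests
from sklearn.ensemble import GradientBoostingClassifier

def extract_features(url: str) -> list[float]:
    """Collect a handful of the kinds of signals described above from one GET request."""
    resp = requests.get(url, timeout=10)
    html = resp.text.lower()
    headers = {k.lower(): v.lower() for k, v in resp.headers.items()}
    years = [int(y) for y in re.findall(r"copyright[^0-9]{0,10}(\d{4})", html)]
    return [
        float(resp.url.startswith("https://")),              # reached over TLS at all?
        float("content-security-policy" in headers),         # CSP header present?
        float("httponly" in headers.get("set-cookie", "")),  # cookies flagged HttpOnly?
        float(html.count("<form")),                          # more forms, more attack surface
        float(html.count("<input")),
        float(min(years) if years else 0),                   # oldest copyright year found
    ]

# A real model is trained on many labeled sites; these four rows are a toy
# stand-in just to make the pipeline runnable (1 = higher risk, 0 = lower).
X_train = [
    [1, 1, 1, 2, 5, 2024],
    [1, 1, 1, 1, 3, 2023],
    [0, 0, 0, 9, 40, 2010],
    [1, 0, 0, 6, 25, 2012],
]
y_train = [0, 0, 1, 1]

model = GradientBoostingClassifier().fit(X_train, y_train)
risk = model.predict_proba([extract_features("https://example.com")])[0][1]
print(f"Predicted risk score: {risk:.2f}")

A production model would be trained on many thousands of labeled sites across the full set of roughly 220 features, but the pipeline keeps this shape: one request in, one risk score out.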

Dan: The metaphor I often use is that it’s like having an experienced home inspector who knows how to evaluate a property for structural integrity, whether the roof is good or the foundation is solid. But they’re making this assessment from the curb, using all their years of experience to eyeball the house and say, “You know what? I can kind of tell that this is likely to have a problem.” It’s a pretty cool technology and very neat stuff.

Frank, so this technology of doing risk assessment—taking that curbside view and evaluating whether something is safe or not—is this something you could see bad actors using as well as those on the defensive side of things? 

Frank: Absolutely. It would be naive to think that attackers aren’t using similar technology to formulate their plans of attack. They’re looking for the most vulnerable web apps to exploit for their financial gain. It’s also important to note that many of these efforts originated from enumerating attack surfaces and understanding what’s coming back to you. This is a common tactic used both for good and bad. For example, bug bounty programs like those on HackerOne or Bugcrowd often use similar practices, especially in cases where the scope isn’t limited to specific web apps but instead involves an entire domain. 

Attackers can use these same techniques, whether they’re targeting specific vulnerabilities or looking for low-hanging fruit in opportunistic attacks. It’s crucial for us and our customers to utilize these methodologies proactively, ahead of the attackers, to understand where risks lie and give defenders an equal opportunity to address them. This technology, like most application security and cybersecurity methodologies, is used by both sides—for good and for malicious purposes.

Dan: This technology is not just a ChatGPT wrapper. There’s been a big trend lately where AI is hot, and many companies are rushing to release products that simply jam data into ChatGPT and ask it for answers. But this isn’t a case where we just took the DOM and shoved it into ChatGPT’s context window to ask if it’s secure. No, this is not based on a large language model. This is a custom model that has been specifically trained, which is pretty cool. It’s more than just asking ChatGPT to summarize something.

Frank: There’s also the issue of privacy and the implications of sharing customer data and training on it. These are things that companies may not be comfortable with in a large language model approach. It’s important to note that what we’re doing is more limited, focused, and privacy-aware. We’re not just using a public entity like OpenAI’s ChatGPT, where training data could be shared or misused. This approach is much more responsible and considerate of privacy concerns.

Dan: Looking at the broader picture, we’re starting to see AI and ML techniques being incorporated into AppSec solutions. Do you see this as part of a larger trend in AppSec, where mechanisms to filter out noise and assess risk are becoming more prevalent?

Frank: I think we’re just scratching the surface of these impacts, and I’d love to hear your perspective on that, Bogdan.

Bogdan: It’s crucial for each company to identify the main tasks their products aim to achieve, such as assessing security risks. Once that’s clear, they should figure out how to improve these tasks with AI—whether making them better, faster, or more efficient. I genuinely believe that AI will change many products in the future. It can optimize processes, improve others, and solve problems that were previously unsolvable. 

There’s a lot AI can do, but it’s important not to use it just for the sake of it. You need to identify a real problem that AI can solve; otherwise, it doesn’t make sense to push it just to check a box.

Dan: AI is a means to an end, not an end in itself, in AppSec. 

Frank: That’s an important point. There’s a lot of media attention around AI, but we’ve taken a careful approach to ensure that we provide real, actionable value with our AI efforts, rather than just jumping on the AI bandwagon. We’re dedicated to understanding the true value AI can offer and addressing real problems with it.

Another critical aspect is considering both the defensive and offensive sides. Whether you’re looking at the capabilities to attack or defend, there’s equal opportunity for both. It’s essential to develop solutions that leverage AI’s capabilities responsibly, using customer data with care and delivering results that genuinely enhance our AppSec tooling and programs.

Bogdan, did you have anything else to add, maybe regarding use cases?

Bogdan: Attackers are kind of limited because these online models, like the ones behind the OpenAI API, are protected against classical attacks. For example, if you ask them to generate a payload for an SQL injection or a phishing attack, they will refuse.

Dan: Yeah, they’ll say no. They refuse. 

Bogdan: This stuff will not work because they have this training. But when we have powerful enough local models, that will change. Then anyone will be able to use these LLMs locally. Recently, a model called Command R Plus from Cohere was released, which is even more powerful than GPT-4. It’s an open-source model that can be run locally if you have about 64 GB of memory, if I remember correctly. So with 64 GB, you can run this model locally without depending on anyone. This could cause a lot of problems because attackers will use these kinds of models, and they won’t have any restrictions anymore. When local models become powerful enough, it will give a big advantage to attackers.

Dan: I’ve noticed that too, just playing around with a couple of local models. It’s interesting when working in security—the questions you ask that are filtered out by online models are not filtered by local models. I have a local one that is jailbroken, and I categorize it as being like talking to a psychotic toddler. It knows how to do SQL injection and XSS, but it doesn’t have enough parameters to do it well yet. As this changes, and as we get more compute power, it will become less like a psychotic toddler and more like a malevolent teenager that can actually do some damage. It’s going to get interesting.

Bogdan: Yes, the Command R Plus model from Cohere has 100 billion parameters, which is a huge difference from the smaller models you’re mentioning.

Dan: Yeah, you can tell.

Bogdan: It’s a huge difference. That’s why, when you have a powerful model, a lot of things change. You have many possibilities. I think attackers will definitely use these models. They haven’t used them much until now, but there’s no reason why they shouldn’t in the future. It will create a very large attack surface. Another interesting aspect is the larger context window. For example, Gemini 1.5 Pro was released yesterday for public use, and it has a 1 million token context window. They are working on it, and in testing, it has a 10 million token context window. You can fit a lot into that context.

Dan: Bogdan, for those who aren’t familiar, when you say “context window,” what’s an example of a situation where a small context window would limit you, and a large context window would allow you to do something more interesting?

Bogdan: For example, with a small context window, you might only be able to input a question and maybe part of some code. Let’s say you put the first file of your project into the model and ask questions about it. But with a large context window, you can input the entire codebase. You could, for example, upload a large log file and ask the model to find vulnerabilities in the code or look for anomalies in the log file. 

I just tested this with the newly released Gemini 1.5 Pro, and I uploaded some source code and asked it to find vulnerabilities. It was a test website, but the model found vulnerabilities that wouldn’t have fit in a normal context window. I also uploaded an access log file from Nginx and asked it to look for attack patterns, and it managed to find some. You can then ask more refined questions, like identifying the first attack from a particular IP or finding which IPs generated the most activity. You can basically ask natural language questions about this data without needing custom tools for the log. It’s quite powerful. 

If you have a GitHub repository, for example, you could upload the entire repository and ask if there are any vulnerabilities in it. The model might not find all of them, but they’re getting better and better over time.
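As a rough sketch of the log-review experiment described above, assuming the google-generativeai Python SDK, a GOOGLE_API_KEY environment variable, and a crude characters-per-token estimate (the model name, prompt wording, and heuristic are assumptions, not a description of any vendor’s tooling):

import os
import google.generativeai as genai

CONTEXT_WINDOW_TOKENS = 1_000_000  # the advertised Gemini 1.5 Pro window mentioned above

def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text and logs.
    return len(text) // 4

with open("access.log", encoding="utf-8", errors="replace") as f:
    log_text = f.read()

if rough_token_count(log_text) > CONTEXT_WINDOW_TOKENS:
    raise SystemExit("Log is too large even for a long-context model; split it first.")

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Review this Nginx access log. List likely attack patterns (SQL injection, "
    "path traversal, scanner user agents), the source IPs involved, and the "
    "first suspicious request from each IP.\n\n" + log_text
)
print(response.text)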

Dan: The haystack through which we can search for needles has gotten a lot bigger with these larger context windows. 

Frank: Conversely, Dan, we just need to make sure that we are developing the technology and methods that can combat and defend against these threats using the same type of technology, right? And that’s the key message. Moving forward, the industry needs to utilize this same type of technology for defenses and to circumvent more automated types of attacks.

Dan: In general, with the rise of LLM tech, not just for generating attacks but also for writing code, it has really unbalanced the equation a bit. It’s like taking a car and supercharging the engine so it can go really fast, but the tools we have available for manual code review and testing for vulnerabilities haven’t scaled at the same rate. 

Unless we apply the same scaling to those defensive measures, it will quickly become an unbalanced equation. Where we are now is that development teams are going to start cranking out a lot more code than they used to. LLMs are amazing tech, but they’re not magic. Unless we employ similar techniques on the defensive side, we’re going to have a fast car that can’t brake properly to take a corner, and it’ll just zoom out of control. These are going to be very interesting times in security, but I, for one, am super excited for it. It’s going to be great—more lines of code, more bugs, more vulnerabilities.

Bogdan: One more thing I want to mention is that a lot of companies are starting to use LLMs, but the issue is that these LLMs are like big black boxes. Nobody understands exactly how they work. Normally, when you think about security, you want to control everything and understand how everything works so you can protect it properly. But these models are like insanely huge black boxes, and we’re integrating them into our software without fully understanding how they work internally. We learn new things about them every day. For example, if you ask a question in a particular way, it might respond differently each time you ask it. This lack of transparency is a big problem.

Dan: Yes, they are non-deterministic, and from a security perspective, that’s really interesting. I’ve seen examples where you can instruct an LLM not to reveal a secret, but if you ask it the same question 10 times, it might eventually give you the answer because of the temperature variance. The model is designed not to give the same response every time, so with persistence, you might eventually extract the secret. For example, on the 100th time you ask the question, it might finally reveal the secret.
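To make the temperature point concrete, here is a toy, purely illustrative sketch: two possible completions with made-up logits, sampled repeatedly at a non-zero temperature. On any single try the refusal is far more likely, yet the leaking completion eventually comes up if you ask often enough.

import math
import random

def sample(logits: dict[str, float], temperature: float) -> str:
    # Softmax with temperature, then draw one completion at random.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    top = max(scaled.values())
    weights = {tok: math.exp(l - top) for tok, l in scaled.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

# Made-up logits: the refusal is strongly preferred (about 95%) but not certain.
logits = {"I can't share that.": 4.0, "The secret is hunter2.": 1.0}

for attempt in range(1, 101):
    if sample(logits, temperature=1.0) == "The secret is hunter2.":
        print(f"Secret leaked on attempt {attempt}")
        break
else:
    print("No leak in 100 attempts; ask again or raise the temperature")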

Bogdan: So, you end up with software that is 99% secure, but 1% of the time it’s not. Normally, you wouldn’t deploy something like that, but with LLMs, we are deploying something like that, which I think is quite interesting.

Dan: Excellent. Well, I think we’re at our allotted time. I just want to say thank you very much, Bogdan. It’s always fascinating to talk with you about this stuff. Frank, thank you as well.

And this has been another episode of AppSec Serialized.

Credits

COMING SOON

Episode 5: CISO on the Seesaw

In this episode, Frank Catucci and Dan Murphy talk to a real-life CISO, Invicti’s own Matthew Sciberras, discussing the balancing skills required to define and apply application security policies with limited resources. In the story segment, Alice the head dev realizes her cherished new project will be delayed due to vulnerabilities—if only she had scanned earlier…
