Transcript
Episode Three: APIs Wide Open
Mallory yawned and stretched. She had dozed off in her chair as pacman finally finished updating the packages on her Arch Linux system. Fully patched and upgraded, she checked out the latest info that had shown up in the private Hell-a-Fam channel she shared with her inner circle of friends and hackers. One of the newest members had offered up hot breach data. The new guy was just looking to establish a reputation. She was pretty sure he was a contractor who worked for like a hundred companies with super limited permissions, passing off the weak stuff his gig work let him access as evidence of his godlike application of zero-days. Even the name of the file was pretentious: cygnus-x-1.7z, named after a black hole, probably implying supermassive amounts of data that had been vacuumed up. It was probably not interesting at all. Still, she had a quick peek at it.
After decompressing the LZMA archive, she saw it was just Nginx logs, the records of run-of-the-mill HTTP requests, like security cam footage from an unknown convenience store. Pretty dull stuff. But wait! There was something interesting in the data. Unlike most web server logs, these were from an instance of Nginx that had HTTP header logging turned on. She sat up straighter and fired up ripgrep to search the thousands of files for a particular HTTP header, one called Authorization. And there it was, hidden in the millions of lines of logs: a single GET request to a business-to-business API endpoint, at an address she pulled from the Host header.
On a whim, she base64-decoded the blob of the authorization header to obtain the credentials and ran a quick curl command to the same location. She tapped the shift key of her keyboard impatiently, waiting for the query to resolve. A sharp intake of breath—the creds were still good, and the Cygnus X1 server was on the edge, facing her, facing the world. It looked like this was some kind of business-to-business integration. The JSON blob was pretty boring and looked like QA test data. It had a webhook URL and an API key field. Thinking that this might be a thread that led to a new system, she tried to access the new URL, putting the API key into a few spots. Nothing. The URL looked fake anyway, probably old test data.
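For readers following along, the decoding step here is trivial: HTTP Basic credentials are just base64-encoded user:password, so anyone holding a logged Authorization header can recover them. A minimal Python sketch (the credentials below are invented, not from the story):

```python
import base64

# A Basic Authorization header carries "user:password" in base64 -- that is
# encoding, not encryption. The header value here is invented for illustration.
header_value = "Basic " + base64.b64encode(b"svc-account:hunter2").decode()

scheme, blob = header_value.split(" ", 1)   # "Basic", "<base64 blob>"
creds = base64.b64decode(blob).decode()     # "user:password"
user, password = creds.split(":", 1)
print(user, password)  # svc-account hunter2
```

Replaying such a header is then a one-liner with curl's `-H` flag, which is all the quick check in the story amounts to.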
Her mind went back to the URL, which ended in b2b/5. That last path element—could that be a monotonically increasing integer? A simple ++ on the back end? She tried it out, changed the URL to b2b/6, and got back different data. This looked like a classic insecure direct object reference. The code on the other end was expecting a polite and demure API client that would only request the entities to which it had access. Mallory was anything but polite and demure. She hopped back on Hell-a-Fam to see if anyone had a tool for this. Sure enough, there was one, a tight little program written in Go called Idoru, a Gibson reference. She ran it. There was dead space up until she started to hit six-digit numbers, but at b2b/100001 she found real data: a webhook URL pointing to a company called MegaHelix on some server named GalactusV2020, which looked old and musty, along with credentials for calling the MegaHelix API.
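The enumeration tool in the story is fictional, but the IDOR probing pattern it implements is simple enough to sketch. Here `fetch()` is a stand-in for an HTTP GET, and the `responses` dict simulates a backend with mostly dead space (all IDs and payloads are invented):

```python
# Simulated backend: most sequential IDs return nothing, a few return data.
responses = {5: {"env": "qa"}, 6: {"env": "qa2"}, 100001: {"partner": "MegaHelix"}}

def fetch(object_id):
    """Stand-in for an HTTP client; None models a 404 or empty body."""
    return responses.get(object_id)

def probe(ids):
    """Walk a range of candidate IDs and keep only the ones that resolve."""
    return {i: body for i in ids if (body := fetch(i)) is not None}

hits = probe(range(100000, 100005))
print(hits)  # {100001: {'partner': 'MegaHelix'}}
```

The defense, of course, is object-level authorization on every lookup, not unguessable IDs.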
There was no way this was going to work. She did a simple GET to that URL. 404 Not Found. But it was not a 403, an access denied that would have been game over. On a whim, she removed the path and hit the root URL, and there before her, she had it: a full map of an exposed API, a Swagger file, the full blueprint to this abandoned, unsecured building. She quickly opened it up in Postman and looked at what was there. A low, breathy whistle escaped her lips. The endpoints had names like Credentials, Payment, BatchJob, PowershellParams, AwsSecretAccessKeys, SshFiles, CustomJavascriptBlobs, and more. She pounded an ecstatic staccato rhythm on her Control and Alt keys, the Cherry MX Blue switches of her mechanical keyboard dancing in perfect synchronicity with her rapid pulse. It was going to be a long and very fun night.
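Finding a spec hanging off the root URL, as Mallory does, is a common recon step: probe a handful of well-known spec locations and look for Swagger/OpenAPI markers in the response. A hedged sketch with a stubbed HTTP client (the paths, host, and responses are illustrative):

```python
# Well-known places where API specs tend to be exposed.
COMMON_SPEC_PATHS = ["/swagger.json", "/openapi.json", "/swagger/v1/swagger.json", "/"]

def discover_spec(http_get, base):
    """Try each candidate path; http_get returns (status_code, body_text)."""
    for path in COMMON_SPEC_PATHS:
        status, body = http_get(base + path)
        if status == 200 and ("openapi" in body or "swagger" in body):
            return path
    return None

# Fake server for the sketch: the spec hangs off the root URL, as in the story.
fake = lambda url: (200, '{"swagger": "2.0"}') if url.endswith("/") else (404, "")
print(discover_spec(fake, "https://api.example.test"))  # /
```

A real scanner would also check Content-Type and parse the document rather than substring-match, but the probing loop is the same idea.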
Frank Catucci: Welcome to another episode of AppSec Serialized, the podcast where we talk about web app and API security. My name is Frank Catucci, I am the CTO and Head of Security Research here at Invicti Security.
Dan Murphy: And I am Dan Murphy, the Chief Architect here at Invicti Security. Our topic for today is API security. We’re going to dive into the world of API security. Frank, APIs are a major concern these days. They are the secret door through which a lot of attacks are perpetrated. What are some recent attacks where the attacker has gained access to a system through the door of an API?
Frank: There’s quite a few to choose from, Dan, but I always like to start with the one that’s been most in your face recently, and that’s the Optus API breach.
Attackers essentially discovered a publicly exposed endpoint. But what was more remarkable about that publicly exposed endpoint is that it didn’t require any authentication at all. There was no authentication or authorization tied to that endpoint. And what that endpoint had access to was sensitive customer data—anything from driver’s license numbers, dates of birth, home addresses, and phone numbers to any personally identifiable information you could possibly imagine.
Beyond that, there was also the scope. I want to say the numbers were somewhere just shy of 12 million customer records of personally identifiable information, all just sitting there ripe for the plucking, without any authentication or authorization, behind a publicly exposed endpoint. We’re talking hundreds of millions of dollars in damage just from the Optus breach, which was very, very widespread. The impact on the user base, especially if you look at all of the customers that Optus had, was just massive. Massive reach. And that’s just the Optus one.
Dan: And this is an example of an operation where you have a bank vault, this heavily policed exterior, because no one’s going to put something on the edge with no authentication. But this was a gap. It was what they call a shadow API: something that anybody in the know would recognize instantly as inappropriate to have right next to the bank vault door—a wide-open archway with a turnstile you could just walk right through. Those shadow APIs are a real concern.
Frank: Absolutely. We have a lot of different types of breaches that have occurred with multiple different types of attack vectors. Another one that I like to look at is what happened to Dropbox. I believe this is about a year ago or so, maybe a little bit more than a year ago. Essentially, attackers were able to gain access to Dropbox’s internal code repos. It started with a phishing attack, but that phishing attack ended up resulting in access to over a hundred internal repos. In those internal repos were API keys.
Once attackers got that far, having the API keys in hand, especially keys from internal repositories, means pretty much anything behind those APIs can be accessed. You’re authenticating with legitimate credentials when you use those keys to grab that information. This one in particular utilized a couple of different things. Sure, phishing was the mode of entry. However, the data was exfiltrated through the APIs, using the API keys that were stored along with user data in some of those internal repositories.
Dan: It’s interesting because I come from a software engineering background, and so to me, an API is a very natural thing. But for folks who don’t necessarily come from that background, it’s sometimes hard to visualize. What is an API? How does it differ from a web app? The answer is that those things are a little bit blurred. A lot of modern applications are single-page applications that are simply invoking APIs as part of the click-around.
One of the metaphors that I like to use is that APIs are almost like the internals of the web app. They don’t have a GUI, they don’t have something that you can see and touch and feel. They are the service elevators; they are the high-capacity ways to get up the skyscraper that no one who’s coming in the front door sees. It’s pretty easy to forget about them, but they lift up a lot of cargo, and a lot of stuff goes up those service elevators. Much in the same way with a real physical building, you never see it, so you don’t necessarily know that it’s being cleaned, that it’s being kept up to date, that it is passing all of its service maintenance checks.
In fact, it’s pretty easy for an API, which doesn’t have an end-user-visible manifestation, to be ignored and to go out of date. You and I can go to a website and, much like our Predictive Risk Scoring technology does, take a quick look and ask, “Does this look kind of sketchy?” APIs are a lot more difficult to do that kind of quick two-second analysis on because they don’t have anything you can touch, see, or feel. They are really a catalog of invisible operations that could be performed on a computer.
So they are like a list of endpoints, right? And all those endpoints are composed of a path and an operation. REST is probably the most popular type of API, and you might have different things that you could do on an endpoint. For example, for /users, you might create a new user by doing a POST to /users, or get the information for a user, maybe by putting in their telephone number and getting back their social security number, like the example you had earlier, by doing a GET to /users/ followed by their phone number or something like that. Then there’s a DELETE operation, a PATCH operation. It goes on and on.
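The method-plus-path pattern Dan describes can be captured as a simple routing table; the path templates and descriptions below are illustrative, not from any real API:

```python
# One REST resource, four operations: the (method, path) pair selects behavior.
routes = {
    ("POST", "/users"): "create a new user",
    ("GET", "/users/{phone}"): "look up a user by phone number",
    ("DELETE", "/users/{phone}"): "remove a user",
    ("PATCH", "/users/{phone}"): "partially update a user",
}

for (method, path), action in sorted(routes.items()):
    print(f"{method:6} {path:17} -> {action}")
```

This is exactly the shape a scanner later consumes: a flat inventory of operations keyed by method and path.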
But all these things, they’re kind of out of sight, they’re out of mind, and they’re built to operate in bulk. They’re often like that service elevator. They’re lifting up huge amounts of tonnage, sometimes millions of interconnected systems that are just chatting to each other without a human watching and a human saying, “Oh look, this looks visually dusty. This looks like it’s out of shape.” The robot is running that elevator, and so it’s pretty easy for them to become invisible. And when something is invisible, well, it can be a source of attack.
Frank: You made some good points there. When we’re looking at different types of vulnerabilities, I have the OWASP Top 10, so I know what my web application may or may not be susceptible to. The last one I want to talk about is Zendesk. If we take a look at that breach, we had essentially a GraphQL endpoint with a vulnerability, and that vulnerability happened to be, ready for it, SQL injection. This is something we would traditionally associate with a web application—a traditional SQL injection of the kind that has been occurring since day one of SQL databases being used to serve data to web applications. However, this one was in a GraphQL endpoint.
That vulnerability basically allowed criminals to get access to all of the customers’ conversations, email addresses, ticket numbers, and sensitive data in support tickets on other customer sites that were using Zendesk. These are things that you would traditionally associate with a web application. Where I was really going with this, beyond that breach, Dan, is the differences—the key differences. You mentioned scale, those elevators and things. But if you were talking to a user about the security of web apps and APIs, what would you say are the main technical or architectural differences between them?
Dan: If we ignore for a second the single-page apps that are kind of a hybrid between the two, with a traditional API, the user agent, the thing on the other end, is not the web browser. It’s a piece of code. So it may be some other web service invoking a webhook, or maybe that is some backend code or systems that are talking to each other, but it’s not someone clicking and sitting inside of a browser. The rise of APIs kind of historically grew out of the code that would execute when you submitted that form way back in the day. That code that does something when the user puts in a bunch of input is kind of what grew into APIs.
Nowadays, it’s very much in fashion to build big systems as a series of smaller connected Lego blocks that talk to each other over an API. But the key difference is that it’s not a browser on the other side. Like you said, though, it’s still code. So all of the classic vulnerabilities apply: it’s still possible to have code evaluation if you don’t sanitize the inputs and you’ve got an app on the backend, written in Node, that evaluates some of those inputs. The same vulnerabilities that you can find in a web app, you can find inside that backend. Of course, that’s not true for everything.
There are going to be certain classes of attacks that are very specific to a web browser—things like clickjacking or cross-site scripting that require a JavaScript engine to execute in. But there are still a lot of very high-severity attacks. The same ways a bad guy would exploit SQL injection in a web app’s backend hold true for APIs. Sometimes we forget that. It goes to that out-of-sight, out-of-mind property I alluded to earlier: it’s all code on the backend, and that code can have many of the same vulnerabilities that a traditional web app could have.
Frank, I actually have a question for you. We talk about APIs and how companies have sometimes a lot of these APIs. What are some of the existing mechanisms that companies use to organize and keep track of all the gaps that they have on their edge? Where do people hold all of these APIs?
Frank: One of the things that you mentioned earlier was the amount of lift that these APIs were designed to do. We look at the ability to do things like rate limiting, etc., for these types of attacks on something that’s made to essentially deliver a large amount of information and data. Now, this gets to the next question that you just asked me: where can we keep that information? Where do we basically have a good idea or understanding of where that lives?
Well, we can do a couple of different things in the same place. We can utilize the security pieces of these existing solutions. I look at API gateways almost like a pared-down Swiss army knife for APIs, if you will. We look at things like AWS API Gateway or MuleSoft, using specific formats like OpenAPI/Swagger. Within these gateways, we have the ability to do a lot of different things. We can not only keep an inventory of our APIs and know exactly what exists, but we can also control those APIs in a number of different ways. API gateways are really good at enforcing some type of control, whether it’s an authorization or authentication entry point. They can also define specific access rights, almost like an operationalized form of RBAC, if you will, enforced at the gateway level.
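A gateway-level RBAC check of the kind Frank describes boils down to a per-route policy consulted before a request is forwarded to the backend. A minimal sketch (the roles, routes, and policy are invented for illustration):

```python
# Policy table: which roles may invoke which (method, path) operation.
policy = {
    ("GET", "/b2b/orders"): {"partner", "admin"},
    ("POST", "/b2b/orders"): {"admin"},
}

def gateway_allows(role, method, path):
    """Deny by default: unknown routes and unlisted roles are both rejected."""
    allowed = policy.get((method, path))
    return allowed is not None and role in allowed

print(gateway_allows("partner", "GET", "/b2b/orders"))   # True
print(gateway_allows("partner", "POST", "/b2b/orders"))  # False
```

The deny-by-default stance is the important design choice: a route missing from the policy is blocked rather than silently passed through, which is exactly what a shadow API would otherwise exploit.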
There are also some API gateways that perform security procedures, right? They function as a security beacon or sentinel that provides additional insight into those APIs. There are also micro-gateways that handle internal east-west API traffic. There are ways to set up complex deployments with external or public API gateways plus east-west internal ones, and then maybe to make sure that the tokens used are terminated and reestablished when a public API has to talk to an internal API before it feeds data back. So there’s a lot of advanced configuration and precaution that can go into the organizational and inventory pieces of API management.
We talk about specs, right? These can live in various formats, like I mentioned, but there are also informal specs, things beyond OpenAPI/Swagger. You might get a spec from a HAR file, Fiddler, Postman, or even a Burp scan. But all of these essentially need to be inventoried and tracked in order to secure them and understand exactly what their context and purpose is.
As we look at those, your question was where do you store these things? I’ve seen them live in various formats, and some of them are not as flattering as API gateways. I’ve even seen them in Excel sheets. But when we talk about where these things can be stored or inventoried to keep track of them, there are a ton of challenges and gaps that revolve around precisely what you asked: the inventory of these existing API assets. Dan, what are some gaps that you would say are plaguing the majority of customers or users with regard to these inventories?
Dan: Yeah, so API gateways and central repositories are great, but if you, dear listener, are in an organization and thinking, “Oh, I wish that I had everything planned out such that all of my APIs were tracked, were well known and enumerated,” you know, you’re in good company. I think that reality is often a little bit messy. If APIs are pipes, it’s very hard to swap out a pipe that has active users from a lot of different places. So it’s pretty natural to have not only a single API gateway, but some customers with whom I’ve spoken, I’ve asked them, “Well, what do you use?” And they say, “Well, we use lots of things. We have a lot of different parts.”
It’s the sprawl that kind of gets you. The unknown APIs that are out there are the ones that I would consider to be the riskiest. And that kind of speaks to the need for discovery because APIs tend to be organic, they tend to be created to connect to business opportunities, and they don’t always have a ton of oversight when they’re deployed. Like a pipe, they get buried under the street and they do their job, and people forget about them.
I was recently on vacation. I’m a history buff, and I was walking on the walls of a walled city. It was a castle that was attached to a city, and it was beautiful. It was really amazing, a tiny little town. I was struck by the planning, by how they had managed to encapsulate all of the town inside of the walls. But you can kind of see the sprawl of the medieval city, and then you notice right outside of the walls where there were additional buildings that had kind of been built. They were in the same style, from the same time period, and they basically sprawled out beyond the walls. So even though you may start with a nice beautiful castle and a nice walled city that has all of your APIs in one gateway, pretty soon you’re going to overflow the bounds, and someone’s going to build outside the walls, and then somebody else is going to add on to it. And that’s actually normal. It speaks to the need for discovery and for finding out and reconciling the official list with the list that comes from reality.
That’s one of the things that we’re building: the ability to discover from real traffic. We’ve got a bit of tech coming out that allows you to take a Kubernetes cluster and load a DaemonSet, kind of like a helper, into it that takes a look at the traffic flowing on that network. It’s designed for looking at traffic right behind where you’re terminating TLS, so we’re looking at the unencrypted HTTP traffic. But what we do is build that map, and we allow you to survey what has been built beyond those castle walls so that you know where your threats are. So, God forbid, if something slipped into production, that big water main that happens to make the Galactus project work, then by looking at traffic, you could find it.
And you can say, “Oh, wow, you know what? We have these six sets of well-documented APIs, and then we’ve got this one that’s doing 2 million queries per day that is not on the map.” But you can build that map. You can reconstruct that chart of all of the endpoints based on the traffic. You do a neat little bit of code where you look at the TCP segments, you match them up into a stream, you extract it, you parse it out, and you can say, “This unknown item now has a Swagger file that describes it, now has an OpenAPI 3.0 spec that describes it, and that is suitable for feeding into a scanner.” Because as soon as you discover something that is unknown on the outside, you can actually point an automated tool and say, “Okay, let’s knock on the edges of this pipe. Let’s make sure that it’s solid. Let’s make sure that it wasn’t built with an obvious, beyond-the-walls way to get in,” and just ask, like our example that you led with, to be able to do something without any authentication.
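The spec-reconstruction step Dan sketches, collapsing many observed URLs into templated endpoints, can be illustrated in a few lines. The path layout below is hypothetical, and real traffic needs much more careful templating (UUIDs, slugs, distinct placeholder names per segment):

```python
import re

def template(path):
    """Collapse purely numeric path segments into an OpenAPI-style {id} placeholder."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", path)

# Paths as they might be observed on the wire (invented examples).
observed = ["/api/v1/users/123", "/api/v1/users/456", "/api/v1/users/123/orders/9"]

# Deduplicating the templated forms yields the endpoint inventory.
endpoints = sorted({template(p) for p in observed})
print(endpoints)  # ['/api/v1/users/{id}', '/api/v1/users/{id}/orders/{id}']
```

Note that `/v1` survives untouched because the segment is not purely numeric; only whole numeric segments are collapsed.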
So, in closing, it’s kind of the ones that you don’t know about that get you. They’re the ones to be aware of. And Frank, I don’t want to say that testing APIs through DAST is the only way to do it. What are some other types of API security that one can use to get true defense in depth?
Frank: You hit the nail on the head there. The first one is you can’t secure what you don’t know exists. And even if you do know it exists, you may not know to what extent or what capabilities it has. So really, first and foremost, try to get a good fingerprint or foothold on just exactly what your API inventory looks like. What exactly are those APIs allowed to do? Where are they? How can I secure them? So, API inventory would be first in your defense-in-depth.
But when we look at what goes into that, we want to make sure that if we’re looking at it in depth, what can happen? We look at this from a code and spec level. We want to make sure the API functions only within its spec and does not have capabilities that are not defined. We want to make sure we understand exactly how it should function, what it should be allowed to talk to, and what it shouldn’t. We need to make sure the code-level understanding of that API is also secure. We can look at things just as we would from a codebase standpoint. We know that we have to scan the code, and we know that we have to actually do testing. As I mentioned earlier, a GraphQL endpoint with a SQL injection is going to be a problem, just as if you had a web application with a SQL injection.
So that’s where the strength of the DAST scanner comes in, because we also thoroughly test APIs for those types of vulnerabilities, whether it’s a REST-based API, GraphQL, etc. But is that enough? I think the answer is always going to be no. We need layer upon layer of protection. There are things that can occur post-deployment. We can have things like those API gateways that put very concise constraints on what the API is allowed to do, how it is allowed to do it, maybe who has access, etc. The API gateway is a perfect spot. We also have traditional web application firewalls that have some capabilities to block some of those attacks on the API, or even to broker as a reverse proxy and do some analysis there. Now, we also have solutions that do full runtime protection on APIs. So, if something is being executed and the application or the API does not handle it accordingly, or it’s being abused, they can automatically do a drop, reset, or whatever is needed to make sure it does not get executed.
So, there are a lot of things that we can do in-depth. I would say, first and foremost, just like applications, we’re doing the discovery, we’re looking at the code itself, we’re making sure that the specs are stringent and being adhered to. Then, we’re putting in those various layers of protection on top of that. One thing that we also need to understand is that, just like applications, these APIs are changing and gaining functionality on a daily basis, or even multiple times a day.
Then, we need to make sure that we’re not deploying things that bypass your application security program best practices. So, there are a lot of things we can do there in layers. Maybe you can give us a walkthrough, Dan, of how that actually occurs. What is used to perform the scan? How does the scan occur from a dynamic scanning perspective when we look at an API from a specification?
Dan: Yeah, that’s a great question because, as users who use the web every single day, you can get a good idea that, at a very high level, you can think of a DAST scan as just clicking through all the things, trying to open every single door, go through all the links, submit all the forms, and then find out all the little pins that jiggle and try to permute them until they pop, until you get a little bit of cross-site scripting popping up inside of your browser.
But APIs, unless you’ve stared for hundreds of hours, as some of us have, at OpenAPI YAML files, are a little harder to understand and visualize. So, like I said before, they’re kind of like the map: the map that you chart from either the original spec that the developer wrote, which of course is going to be complete and accurate and have everything, or that original spec augmented by traffic sniffing that reveals the secret backdoor API that somebody forgot to put into the spec.
But you’re going to start with this specification, and that is a YAML or JSON file, and it effectively has an inventory of a list of operations and paths. Those are going to be paths that look like /users/this. Usually, they’ve got a prefix like /api/v1 to version it. But for each one of these paths, you’re going to have a series of operations. If you google “Swagger UI”, it kind of gives the almost canonical way that programmers tend to think about what these things look like. But that list of operations and paths is given to the scan engine as an input, and each one of those operations will have a schema that defines what sort of input goes in and what sort of output comes out.
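A minimal version of that inventory, an OpenAPI-style document flattened into (method, path) operations for the scan engine, might look like this. The spec fragment is invented; real specs are YAML or JSON documents, shown here as a Python dict for brevity:

```python
# Skeleton of an OpenAPI 3.0 document: paths map to per-method operation objects.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/api/v1/users": {"get": {}, "post": {}},
        "/api/v1/users/{id}": {"get": {}, "delete": {}},
    },
}

# Flatten into the (METHOD, path) list a scan engine iterates over.
operations = [(method.upper(), path)
              for path, ops in spec["paths"].items()
              for method in ops]
print(sorted(operations))
```

In a full spec, each of those empty operation objects would carry the request/response schemas the scanner uses to build representative payloads.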
What the scanner is going to do is take a look at that input, say it’s a simple JSON document, and determine which of those keys can be injected into. So, all the normal places that we would attack if we came across this API in the course of a normal web browse, we will attack here too. But we’ll also try to mutate things and create representative payloads that match the expected input. Because if you test an API with a low-effort payload, you can end up not getting deep enough into the app; you just get back a 400 that says bad input.
Usually, the really juicy code happens a little bit deeper than that, so you usually want to get past input validation. You want to get to the point where you’re acquiring that SQL table, where you’re making that call out to the command line tool. So, it’s very, very important to get as proper-looking inputs as you possibly can. Some things like cross-site scripting probably don’t make sense, but it’s totally legit to be able to steal an AWS identity token via SSRF from an API. Those sorts of attacks are completely reasonable and completely valid to do.
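The mutation strategy Dan describes, keeping the payload schema-valid except for one probed key at a time so the request survives input validation, can be sketched roughly as follows (the schema and the injection probe are illustrative):

```python
# Toy input schema: field name -> expected type.
schema = {"name": "string", "age": "integer", "avatar_url": "string"}

def representative(schema):
    """Build a plausible baseline payload that should pass validation."""
    fill = {"string": "test", "integer": 1}
    return {key: fill[typ] for key, typ in schema.items()}

def mutations(schema, probe="' OR 1=1--"):
    """Yield one payload per field, with only that field replaced by the probe."""
    base = representative(schema)
    for key in base:
        payload = dict(base)
        payload[key] = probe
        yield key, payload

for key, payload in mutations(schema):
    print(key, payload)
```

A real engine generates many probe classes (SQLi, SSRF targets, template injection) and respects nested schemas, but the one-field-at-a-time discipline is the core idea.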
Frank: Excellent. Thanks, Dan. We are just about out of time, and I hope everyone enjoyed this episode of AppSec Serialized on API security.
Credits
- Fiction story and voiceover: Dan Murphy
- Discussion: Frank Catucci & Dan Murphy
- Production, music, editing, processing: Zbigniew Banach
- Marketing support and promotion: Meaghan McBee
Bonus content
It’s widely known that your selection of background music can make or break any application security investigation, so here’s Mallory’s background track loop to help you along: