XML vs HTML

Similar syntax, very different rules — a practical comparison for developers.

Validate or format XML (not HTML) with the free XML Formatter — runs entirely in your browser.

The Core Difference in Purpose

HTML (HyperText Markup Language) is a markup language for describing web pages. It has a fixed set of elements (<p>, <div>, <form>) with defined semantics, and browsers know how to render them.

XML (Extensible Markup Language) is a general-purpose data format. It has no predefined elements — you define your own vocabulary. XML carries data structure and meaning, not presentation. RSS feeds, SOAP messages, Android layouts, and Maven build files are all XML with application-specific vocabularies.
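
As a rough illustration of "define your own vocabulary", the sketch below parses a small invoice document with Python's standard xml.etree.ElementTree module. The element and attribute names (invoice, product, quantity, unitPrice, sku) and the invoice_total helper are invented for this example; the XML parser itself attaches no meaning to them.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice vocabulary: none of these names are predefined by XML.
INVOICE_XML = """
<invoice id="INV-001">
  <product sku="A-100">
    <name>Widget</name>
    <quantity>3</quantity>
    <unitPrice currency="USD">4.50</unitPrice>
  </product>
  <product sku="B-200">
    <name>Gadget</name>
    <quantity>1</quantity>
    <unitPrice currency="USD">12.00</unitPrice>
  </product>
</invoice>
""".strip()


def invoice_total(xml_text: str) -> float:
    """Sum quantity * unitPrice over every <product> child of the root."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for product in root.findall("product"):
        quantity = int(product.findtext("quantity"))
        unit_price = float(product.findtext("unitPrice"))
        total += quantity * unit_price
    return total


if __name__ == "__main__":
    print(invoice_total(INVOICE_XML))
```

The parser only checks that the markup is well-formed. Knowing that quantity is an integer or that unitPrice is money is the application's job (or a schema's), which is the sense in which XML carries structure without built-in semantics.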

Side-by-Side Comparison

Tag names
HTML: fixed set defined by the spec (<p>, <table>, <video>…).
XML: you define any tag names you need (<invoice>, <product>, <quantity>).
Case sensitivity
HTML: tag and attribute names are case-insensitive, so <P> and <p> are the same element.
XML: fully case-sensitive — <Note> and <note> are distinct elements that must each have matching closing tags.
Unclosed tags
HTML: void elements like <br>, <img>, and <input> never take a closing tag, and some end tags (such as </p> and </li>) may be omitted.
XML: every element must close, either with </tagname> or as a self-closing empty element <tagname/>.
Attribute values
HTML: attribute values may be unquoted if they contain no whitespace and none of the characters " ' = < > `, and boolean attributes like disabled need no value at all.
XML: all attribute values must be quoted (single or double) and all attributes must have explicit values.
Error handling
HTML: the specification defines a detailed error-recovery algorithm, so browsers still build and display a page from malformed markup instead of rejecting it.
XML: parsers stop immediately on any well-formedness error and report a fatal error. There is no recovery.
Multiple root elements
HTML: a page is implicitly rooted at <html>, and parsers handle fragments gracefully.
XML: exactly one root element is required. A document with two top-level elements is not well-formed.
Whitespace handling
HTML: runs of whitespace collapse to a single space by default. The CSS white-space property controls this per element (<pre>, for example, preserves whitespace).
XML: whitespace is significant and preserved by default unless the parser or application explicitly strips it.
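
Several rows of the comparison above can be checked with Python's standard library. This is a minimal sketch, not a conformance test: is_well_formed, TagCollector, html_start_tags, and the SAMPLES snippets are names made up for the example, and html.parser is only a lenient tokenizer, not the full browser parsing algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(xml_text: str) -> bool:
    """Return True if an XML parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def html_start_tags(html_text: str):
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.tags


# Illustrative snippets, one per rule in the comparison.
SAMPLES = {
    "well-formed": '<note lang="en"><br/></note>',
    "case mismatch": "<Note></note>",
    "unclosed element": "<note><br></note>",
    "unquoted attribute": "<note lang=en></note>",
    "attribute without value": "<input disabled/>",
    "two root elements": "<a/><b/>",
}

if __name__ == "__main__":
    for label, snippet in SAMPLES.items():
        print(label, is_well_formed(snippet))

    # The HTML tokenizer accepts uppercase names, unquoted values,
    # valueless attributes, and unclosed void elements without complaint.
    print(html_start_tags("<P CLASS=intro><br><input disabled>"))

    # XML whitespace inside an element reaches the application untouched.
    print(repr(ET.fromstring("<a>  hi  </a>").text))
```

Only the first sample parses as XML; each of the others trips a different well-formedness rule. The same habits are accepted in the HTML snippet, where names are lowercased and the valueless attribute is reported with a value of None.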

The XHTML Middle Ground

XHTML is HTML reformulated as an XML application. An XHTML document must obey XML's strict rules (all tags closed, all attributes quoted, exactly one root element) while using HTML's element vocabulary (<p>, <div>, etc.).

XHTML was popular in the early 2000s as a stepping stone toward modular web formats. It fell out of favor with HTML5's arrival. You may still encounter XHTML in legacy codebases or feed validators — serve it as application/xhtml+xml, not text/html, to get strict XML parsing in browsers.
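
Because XHTML is XML, a generic XML parser can read it. The sketch below is an assumption-laden example: XHTML_DOC, HTML_STYLE_DOC, and paragraph_count are invented names, and the sample omits the DOCTYPE to keep it short. It parses a small XHTML page with xml.etree.ElementTree and shows that the same page written with HTML shortcuts is rejected.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# XHTML: HTML vocabulary, XML syntax (closed tags, quoted attribute values).
XHTML_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>First line<br/>second line</p>
    <input type="checkbox" disabled="disabled"/>
  </body>
</html>"""

# The same page using HTML shortcuts: unclosed <br>, unquoted and valueless attributes.
HTML_STYLE_DOC = """<html>
  <body>
    <p>First line<br>second line</p>
    <input type=checkbox disabled>
  </body>
</html>"""


def paragraph_count(xhtml_text: str) -> int:
    """Count <p> elements in the XHTML namespace; raises ET.ParseError if not well-formed."""
    root = ET.fromstring(xhtml_text)
    return len(root.findall(f".//{{{XHTML_NS}}}p"))


if __name__ == "__main__":
    print(paragraph_count(XHTML_DOC))
    try:
        paragraph_count(HTML_STYLE_DOC)
    except ET.ParseError as err:
        print("not well-formed:", err)
```

Note that elements are looked up by namespace-qualified name. The xmlns declaration on the root is what identifies the vocabulary as XHTML to an XML parser.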

When to Use XML vs HTML

Use HTML when the content is a page for a browser to render. Use XML when you need to store or exchange structured data in a vocabulary of your own, as RSS feeds, SOAP messages, and build files do.

SVG: Where XML and the Browser Meet

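SVG (Scalable Vector Graphics) is the XML format developers most often meet in web pages. A standalone .svg file, served as image/svg+xml, is parsed as XML and must be well-formed: every element closed, attributes quoted, no overlapping elements. Inline SVG inside a text/html page is handled by the lenient HTML parser instead, although within SVG content that parser does honor self-closing syntax such as <circle/> and restores mixed-case names such as viewBox. Writing inline SVG as well-formed XML keeps the same markup usable in both contexts.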
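
A standalone SVG file is an XML document, so any XML parser can check it. The following sketch uses xml.etree.ElementTree. The two sample drawings and the svg_view_box helper are made up for illustration, and the check covers well-formedness only, not validity against the SVG specification.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

GOOD_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="teal"/>
</svg>"""

# Same drawing, but <circle> is never closed, so the XML is not well-formed.
BAD_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="40" fill="teal">
</svg>"""


def svg_view_box(svg_text: str):
    """Return the root viewBox attribute (or None); raises ET.ParseError if not well-formed."""
    root = ET.fromstring(svg_text)
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root element is not <svg> in the SVG namespace")
    return root.get("viewBox")


if __name__ == "__main__":
    print(svg_view_box(GOOD_SVG))
    # Attribute names are case-sensitive in XML: "viewbox" is a different name.
    print(ET.fromstring(GOOD_SVG).get("viewbox"))
    try:
        svg_view_box(BAD_SVG)
    except ET.ParseError as err:
        print("not well-formed:", err)
```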

The XML Formatter can format, validate, and minify SVG files — paste the SVG content and click Format to clean up the indentation.