ReFreezed edited this page May 18, 2021 · 6 revisions

[v1.2] The XML module, available through the xml global, handles XML data parsing and contains XML-related functionality.

Note: The API is very similar to Penlight's. Most functions have new names, but the Penlight names also work.


Introduction

Accessing XML data through the data object (or calling xml.parseXml()) will get you an XML node back (specifically, an element). A node can be one of two things: XML tags become elements (represented by tables), while all other data becomes text nodes (represented by strings).

Elements are sometimes also called documents in this documentation and other places, especially when referring to the root element in a node tree.

Elements

Elements always have a tag field and an attr field (for attributes). They are also arrays containing child nodes.

element = {
	tag  = tagName,
	attr = {
		[name1]=value1, [name2]=value2, ...
	},
	[1]=childNode1, [2]=childNode2, ...
}

A similar format is used in other libraries too. LuaExpat calls it LOM.

Example

The following XML...

<animal type="dog" name="Puddles">
	<hobbies>Biting &amp; eating</hobbies>
	<!-- Comments are ignored. -->
	How did this <![CDATA[ get here? ]]>
</animal>

...results in this table:

document = {
	tag  = "animal",
	attr = {
		["name"] = "Puddles",
		["type"] = "dog",
	},
	[1] = "\n\t",
	[2] = {
		tag  = "hobbies",
		attr = {},
		[1]  = "Biting & eating",
	},
	[3] = "\n\t\n\tHow did this  get here? \n",
}

Notice how all whitespace is preserved, and that CDATA sections become text.
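Because elements are tables and text nodes are strings, plain Lua is enough to tell the two kinds of node apart. The following is a minimal sketch that writes the example document out by hand instead of parsing it; describeNode is a hypothetical helper for illustration and not part of the xml module.

```lua
-- The example document from above, written out as a plain Lua table.
-- Positional entries in the constructor become child nodes 1, 2 and 3.
local document = {
	tag  = "animal",
	attr = { name = "Puddles", type = "dog" },
	"\n\t",
	{ tag = "hobbies", attr = {}, "Biting & eating" },
	"\n\t\n\tHow did this  get here? \n",
}

-- Hypothetical helper (not part of the xml module): describe a node by kind.
local function describeNode(node)
	if type(node) == "table" then
		return "element <" .. node.tag .. ">"
	else
		return "text of length " .. #node
	end
end

for i, child in ipairs(document) do
	print(i, describeNode(child))
end
```

Inside the tool, xml.isElement() and xml.isText() perform this kind of check for you.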

Functions

clone

nodeClone = xml.clone( node [, textSubstitutionCallback ] )

Clones a node and its children. If the textSubstitutionCallback argument is given, it should be a function with this signature:

text = textSubstitutionCallback( text, kind, parentElement )

This function is called for every text node, tag name and attribute in the node tree. It can modify the values of these things for the clone by returning a modified string. kind will be "*TEXT" for text nodes, "*TAG" for tag names, and the attribute name for attributes. parentElement will be nil if the initial node argument is a text node.
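As an illustration of the callback signature, here is a small sketch of a callback that trims surrounding whitespace from text nodes and leaves tag names and attribute values untouched. The name trimText is made up for this example; only the three parameters come from the signature above.

```lua
-- Example callback: trim text nodes, leave tag names and attributes alone.
-- kind is "*TEXT" for text nodes, "*TAG" for tag names, and the attribute
-- name for attribute values.
local function trimText(text, kind, parentElement)
	if kind == "*TEXT" then
		-- Capture everything between leading and trailing whitespace.
		return (text:match("^%s*(.-)%s*$"))
	end
	return text
end

print(trimText("  Glitter ", "*TEXT", nil)) -- Glitter

-- Inside the tool, where the xml global exists, the callback would be
-- passed like this:
--   local copy = xml.clone(document, trimText)
```

The callback is called with the same arguments whether it is used through xml.clone() or Element:filter().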

compare

nodesLookEqual = xml.compare( value1, value2 )

Returns true if the values are two nodes that look equal, false otherwise. Returns false if any value is not a node.

element

element = xml.element( tag [, childNode ] )
element = xml.element( tag, attributesAndChildNodes )

Convenience function for creating a new element. The second argument, if given, can be either a node to put in the element as its first child, or a single table that combines an array of child elements with attributes given as named fields. Examples:

local person = xml.element("person")
local month  = xml.element("month", "April")
local planet = xml.element("planet", xml.element("moon"))

local chicken = xml.element("chicken", {
	age = "3",
	id  = "942-8483",

	xml.element("egg"),
	xml.element("egg"),
})

Also see xml.newElement().

Penlight alias: xml.elem()
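The combined table works because a Lua table constructor can mix named fields and positional entries. The sketch below shows one way such a table could be split into attributes and child nodes; splitMixedTable is a hypothetical helper for illustration and is not how the module is known to be implemented.

```lua
-- Hypothetical helper, not part of the xml module: split a mixed table into
-- attributes (string keys) and child nodes (array part).
local function splitMixedTable(mixed)
	local attributes, children = {}, {}
	for key, value in pairs(mixed) do
		if type(key) == "string" then
			attributes[key] = value
		end
	end
	for i, child in ipairs(mixed) do
		children[i] = child
	end
	return attributes, children
end

local attributes, children = splitMixedTable{
	age = "3",
	id  = "942-8483",

	{ tag = "egg", attr = {} },
	{ tag = "egg", attr = {} },
}

print(attributes.age) -- 3
print(#children)      -- 2
```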

isElement

bool = xml.isElement( value )

Check if a value is an element.

Penlight alias: xml.is_tag()

isText

bool = xml.isText( value )

Check if a value is a text node. (Any string value will make the function return true.)

makeElementConstructors

constructor1, constructor2, ... = xml.makeElementConstructors( tags )
constructor1, constructor2, ... = xml.makeElementConstructors "tag1,tag2,..."

Given a list of tag names, return a number of element constructors. The argument can either be an array of tag names, or a string with comma-separated tags.

A constructor creates a new element with the respective tag name every time it's called. It's a function with this signature:

element = constructor( [ childNode ] )
element = constructor( attributesAndChildNodes )

The argument, if given, can be either a node to put in the element as its first child, or a single table that combines an array of child elements with attributes given as named fields (same as the argument for xml.element()).

Example:

local bowl,fruit = xml.makeElementConstructors "bowl,fruit"
local document   = bowl{ size="small", fruit"Apple", fruit"Orange" }
print(document) -- <bowl size="small"><fruit>Apple</fruit><fruit>Orange</fruit></bowl>

Penlight alias: xml.tags()

newElement

element = xml.newElement( tag [, attributes ] )

Create a new element, optionally initialized with a given attributes table. Examples:

local person  = xml.newElement("person")
local chicken = xml.newElement("chicken", {age="3", id="942-8483"})

Also see xml.element().

Penlight alias: xml.new()

parseHtml

element = xml.parseHtml( xmlString [, filePathForErrorMessages ] )

Parse a string containing HTML markup. Returns nil and a message on error. Example:

local document = xml.parseHtml("<!DOCTYPE html>\n<html><head><script> var result = 1 & 3; </script></head></html>")
print(document[1][1].tag) -- script

parseXml

element = xml.parseXml( xmlString [, filePathForErrorMessages ] )

Parse a string containing XML markup. Returns nil and a message on error. Example:

local document = xml.parseXml("<foo><bar/></foo>")
print(document[1].tag) -- bar

removeWhitespaceNodes

xml.removeWhitespaceNodes( document )

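Recursively remove from the document all text nodes that consist only of whitespace. Text nodes that contain other characters are left unchanged, including their leading and trailing whitespace. Example: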

print(document)
--[[ Output:
<horses>
	<horse>
		<name> Glitter </name>
	</horse>
	<horse>
		<name>Rush  </name>
	</horse>
</horses>
]]

xml.removeWhitespaceNodes(document)
print(document)
--[[ Output:
<horses><horse><name> Glitter </name></horse><horse><name>Rush  </name></horse></horses>
]]
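For readers curious about what this does to the underlying tables, here is a rough plain-Lua sketch of the same idea applied to a hand-written document. It is an illustration under the table format described in the introduction, not the module's actual implementation.

```lua
-- Illustrative re-implementation (assumption: not the module's real code).
local function removeWhitespaceNodes(element)
	-- Iterate backwards so table.remove() doesn't shift unvisited children.
	for i = #element, 1, -1 do
		local child = element[i]
		if type(child) == "string" then
			-- Remove text nodes that contain no non-whitespace character.
			if not child:find("%S") then
				table.remove(element, i)
			end
		else
			removeWhitespaceNodes(child)
		end
	end
end

local horses = {
	tag = "horses", attr = {},
	"\n\t",
	{
		tag = "horse", attr = {},
		"\n\t\t",
		{ tag = "name", attr = {}, " Glitter " },
		"\n\t",
	},
	"\n",
}

removeWhitespaceNodes(horses)
print(#horses)                          -- 1
print("[" .. horses[1][1][1] .. "]")    -- [ Glitter ]
```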

substitute

newDocument = xml.substitute( xmlString, data )
newDocument = xml.substitute( document, data )

Create a substituted copy of a document. This is the opposite function of Element:match(). See the Penlight manual on the subject for more info (look for the sections describing templates). Returns nil and a message on error.

toHtml

htmlString = xml.toHtml( node [, preface=false ] )

Convert a node into an HTML string.

preface, if given, can either be a boolean that says whether a standard <!DOCTYPE html> string should be prepended, or be a string containing the given preface that should be added. Example:

local document = xml.parseHtml('<html  x = "y"  ><body><input type=text disabled></body></html>')

print(xml.toHtml(document))
--[[ Output:
<html x="y"><body><input type="text" disabled></body></html>
]]

toPrettyXml

xmlString = xml.toPrettyXml( node [, initIndent="", indent=noIndent, attrIndent=noIndent, preface=false ] )

Convert a node into an XML string with some "pretty" modifications.

(Generally, you probably want to use xml.toXml() instead of this function.)

initIndent will be prepended to each line. Specifying indent puts each tag on a new line. Specifying attrIndent puts each attribute on a new line. preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added. Examples:

local document = xml.parseXml('<foo x="y"><bar/></foo>')

print(xml.toPrettyXml(document, "", "  "))
--[[ Output:
<foo x="y">
  <bar/>
</foo>
]]

print(xml.toPrettyXml(document, "", "    ", "  ", '<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo
  x="y"
>
    <bar/>
</foo>
]]

This function is used when calling tostring(element). Also see xml.toXml().

Penlight alias: xml.tostring()

toXml

xmlString = xml.toXml( node [, preface=false ] )

Convert a node into an XML string.

preface, if given, can either be a boolean that says whether a standard <?xml...?> string should be prepended, or be a string containing the given preface that should be added. Examples:

local document = xml.parseXml('<foo  x = "y"  ><bar /></foo>')

print(xml.toXml(document))
--[[ Output:
<foo x="y"><bar/></foo>
]]

print(xml.toXml(document, '<?xml version="1.0"?>'))
--[[ Output:
<?xml version="1.0"?>
<foo x="y"><bar/></foo>
]]

Also see xml.toPrettyXml().

walk

xml.walk( document, depthFirst, callback )
callback = function( tag, element )

Recursively call a function on every element in a document (including the document element itself, excluding text nodes). If depthFirst is true then child elements are visited before their parent elements. Example:

xml.walk(document, false, function(tag, el)
	if tag == "dog" then
		local dogName = (el.attr.name or "something")
		printf("Found doggo called %s!", dogName)
	end
end)
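The effect of depthFirst on visiting order can be seen with a small stand-alone sketch. The walk function below is an illustrative plain-Lua version written for this example, assuming the table format from the introduction; it is not the module's own code.

```lua
-- Illustrative re-implementation (assumption: not the module's real code).
local function walk(element, depthFirst, callback)
	if not depthFirst then
		callback(element.tag, element) -- Parent before children.
	end
	for _, child in ipairs(element) do
		if type(child) == "table" then -- Skip text nodes.
			walk(child, depthFirst, callback)
		end
	end
	if depthFirst then
		callback(element.tag, element) -- Children before parent.
	end
end

local document = {
	tag = "a", attr = {},
	{ tag = "b", attr = {}, "text", { tag = "c", attr = {} } },
	{ tag = "d", attr = {} },
}

-- Collect the tag names in the order they are visited.
local function visitOrder(depthFirst)
	local tags = {}
	walk(document, depthFirst, function(tag, el)
		tags[#tags+1] = tag
	end)
	return table.concat(tags, " ")
end

print(visitOrder(false)) -- a b c d
print(visitOrder(true))  -- c b d a
```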

Element Methods

Element.addChild

Element:addChild( childNode )

Add a child node to the element.

Penlight alias: Element:add_direct_child()

Element.eachChild

for childNode in Element:eachChild( )

Iterate over child nodes.

Penlight alias: Element:children()

Element.eachChildElement

for childElement in Element:eachChildElement( )

Iterate over child elements (skipping over text nodes).

Penlight alias: Element:childtags()

Element.eachMatchingChildElement

for childElement in Element:eachMatchingChildElement( tag )

Iterate over child elements that have the given tag name.
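The three iterators differ only in which children they yield. As a rough sketch of the filtering involved, the plain-Lua iterator below skips text nodes and optionally filters by tag name; eachChildElement here is a stand-alone function written for illustration, not the method itself.

```lua
-- Illustrative stand-alone iterator (assumption: not the module's real code).
-- Yields child elements only; if tag is given, only elements with that tag.
local function eachChildElement(element, tag)
	local i = 0
	return function()
		while true do
			i = i + 1
			local child = element[i]
			if child == nil then
				return nil -- No more children: ends the for loop.
			elseif type(child) == "table" and (tag == nil or child.tag == tag) then
				return child
			end
		end
	end
end

local farm = {
	tag = "farm", attr = {},
	"\n\t",
	{ tag = "cow", attr = { name = "Bella" } },
	{ tag = "hen", attr = { name = "Dot" } },
	{ tag = "cow", attr = { name = "Moo" } },
}

local names = {}
for cow in eachChildElement(farm, "cow") do
	names[#names+1] = cow.attr.name
end
print(table.concat(names, ", ")) -- Bella, Moo
```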

Element.filter

Element:filter( [ textSubstitutionCallback ] )

Clones the element and its children. This is the same function as xml.clone().

If the textSubstitutionCallback argument is given, it should be a function with this signature:

text = textSubstitutionCallback( text, kind, parentElement )

This function is called for every text node, tag name and attribute in the node tree. It can modify the values of these things for the clone by returning a modified string. kind will be "*TEXT" for text nodes, "*TAG" for tag names, and the attribute name for attributes.

Element.findAllElementsByName

elements = Element:findAllElementsByName( tag [, doNotRecurse=false ] )

Get all descendant elements that have the given tag. If doNotRecurse is true, only direct child elements are considered.

Penlight alias: Element:get_elements_with_name()

Element.getAttributes

attributes = Element:getAttributes( )

Get the attributes table (i.e. Element.attr). Note that the actual table is returned - not a copy of it!

Note: You can use Element:setAttribute() or Element:updateAttributes() for updating attributes.

Penlight alias: Element:get_attribs()

Element.getChildByName

element = Element:getChildByName( tag )

Get the first child element with a given tag name. Returns nil if none exist.

Penlight alias: Element:child_with_name()

Element.getFirstElement

element = Element:getFirstElement( )

Get the first child element. Returns nil if none exist.

Penlight alias: Element:first_childtag()

Element.getText

text = Element:getText( )

Get the full text value of the element (i.e. the concatenation of all child text nodes, recursively).

Element.getTextOfDirectChildren

text = Element:getTextOfDirectChildren( )

Get the full text value of the element's direct children (i.e. the concatenation of all child text nodes, non-recursively).

(In most cases you probably want to use Element:getText() instead of this method.)

Penlight alias: Element:get_text()
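The difference between the two methods shows up as soon as text sits inside a nested element. The following plain-Lua sketch mimics both behaviors on a hand-written element; both functions are illustrative stand-ins and not the module's implementation.

```lua
-- Illustrative stand-ins (assumption: not the module's real code).

-- Concatenate all text nodes, recursively.
local function getText(node)
	if type(node) == "string" then
		return node
	end
	local parts = {}
	for i, child in ipairs(node) do
		parts[i] = getText(child)
	end
	return table.concat(parts)
end

-- Concatenate only the text nodes that are direct children.
local function getTextOfDirectChildren(element)
	local parts = {}
	for _, child in ipairs(element) do
		if type(child) == "string" then
			parts[#parts+1] = child
		end
	end
	return table.concat(parts)
end

-- Corresponds to: <p>Hello <b>big</b> world</p>
local p = { tag = "p", attr = {}, "Hello ", { tag = "b", attr = {}, "big" }, " world" }

print(getText(p))                 -- Hello big world
print(getTextOfDirectChildren(p)) -- Hello  world
```

Note the two spaces in the second result: the text inside the nested element is skipped, but the whitespace around it is kept.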

Element.mapElements

self = Element:mapElements( callback )
replacementNode = callback( childElement )

Visit and call a function on all child elements of the element (non-recursively), possibly modifying the document. Returning a node from the callback replaces the current child element, while returning nil removes it.

Penlight alias: Element:maptags()
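To make the replace-or-remove rule concrete, here is a plain-Lua sketch that applies it to a hand-written element. mapElements below is an illustrative stand-alone function, assuming the table format from the introduction; it is not the module's own code.

```lua
-- Illustrative stand-alone version (assumption: not the module's real code).
local function mapElements(element, callback)
	local i = 1
	while i <= #element do
		local child = element[i]
		if type(child) == "table" then
			local replacement = callback(child)
			if replacement == nil then
				-- Removed: the next child slides into index i, so don't advance.
				table.remove(element, i)
			else
				element[i] = replacement
				i = i + 1
			end
		else
			i = i + 1 -- Text nodes are left alone.
		end
	end
	return element
end

local list = {
	tag = "list", attr = {},
	{ tag = "item",  attr = {}, "one" },
	"\n",
	{ tag = "draft", attr = {}, "two" },
	{ tag = "item",  attr = {}, "three" },
}

mapElements(list, function(child)
	if child.tag == "draft" then
		return nil -- Remove drafts.
	end
	return child   -- Keep everything else unchanged.
end)

print(#list) -- 3
```

A callback that wants to keep an element has to return it explicitly, since returning nothing counts as nil.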

Element.match

matches = Element:match( xmlStringPattern )
matches = Element:match( elementPattern )

Find things in the document by supplying a pattern. This is the opposite function of Element:substitute(). See the Penlight manual on the subject for more info (look for the sections describing templates). Returns nil and a message on error.

Element.removeWhitespaceNodes

Element:removeWhitespaceNodes()

Recursively remove from the document all text nodes that consist only of whitespace. This is the same function as xml.removeWhitespaceNodes().

Element.setAttribute

Element:setAttribute( attributeName, attributeValue )
Element:setAttribute( attributeName, nil )

Add a new attribute, or update the value of an existing one. Specify a nil value to remove the attribute.

Penlight alias: Element:set_attrib()

Element.substitute

newDocument = Element:substitute( data )

Create a substituted copy of a document. This is the same function as xml.substitute(), and the opposite of Element:match(). See the Penlight manual on the subject for more info (look for the sections describing templates). Returns nil and a message on error.

Penlight alias: Element:subst()

Element.toHtml

htmlString = Element:toHtml( [ preface=false ] )

Convert the node into an HTML string. This is the same function as xml.toHtml().

Element.toPrettyXml

xmlString = Element:toPrettyXml( [ initIndent="", indent=noIndent, attrIndent=noIndent, preface=false ] )

Convert the node into an XML string with some "pretty" modifications. This is the same function as xml.toPrettyXml().

(Generally, you probably want to use Element:toXml() instead of this method.)

Element.toXml

xmlString = Element:toXml( [ preface=false ] )

Convert the node into an XML string. This is the same function as xml.toXml().

Element.updateAttributes

Element:updateAttributes( attributes )

Add new attributes, or update the values of existing ones.

Penlight alias: Element:set_attribs()
