Commit 189bed1

Updated for HTML5 use; HTML 4.01 is no longer supported.
1 parent fd340bd commit 189bed1

3 files changed: +154 -152 lines changed

README.md

Lines changed: 6 additions & 6 deletions
@@ -3,9 +3,9 @@

 A working demo can be seen [here](http://htmlpreview.github.io/?https://github.com/blowsie/Pure-JavaScript-HTML-Parser/blob/master/demo.html).

-Credit goes to John Resig for his [code](http://ejohn.org/blog/pure-javascript-html-parser/) written back in 2008.
+_Credit goes to John Resig for his [code](http://ejohn.org/blog/pure-javascript-html-parser/) written back in 2008 and Erik Arvidsson for his [code](http://erik.eae.net/simplehtmlparser/simplehtmlparser.js) written prior to that._

-This code has been updated to fix several problems.
+This code has been updated to work with HTML 5 to fix several problems.



@@ -25,7 +25,7 @@ Handles tag, text, and comments with callbacks. For example, let’s say you wan
 for ( var i = 0; i < attrs.length; i++ )
 results += " " + attrs[i].name + '="' + attrs[i].escaped + '"';

-results += (unary ? "/" : "") + ">";
+results += ">";
 },
 end: function( tag ) {
 results += "</" + tag + ">";
@@ -45,7 +45,7 @@ Handles tag, text, and comments with callbacks. For example, let’s say you wan
 Now, there’s no need to worry about implementing the above, since it’s included directly in the library, as well. Just feed in HTML and it spits back an XML string.

 var results = HTMLtoXML("<p>Data: <input disabled>")
-results == '<p>Data: <input disabled="disabled"/></p>'
+results == '<p>Data: <input disabled="disabled"></p>'


 ### DOM Builder ###
@@ -83,7 +83,7 @@ While this library doesn’t cover the full gamut of possible weirdness that HTM
 HTMLtoXML("<p><b>Hello") == '<p><b>Hello</b></p>'
 **Empty Elements:**

-HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg"/>'
+HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg">'

 **Block vs. Inline Elements:**

@@ -93,4 +93,4 @@ While this library doesn’t cover the full gamut of possible weirdness that HTM
 HTMLtoXML("<p>Hello<p>World") == '<p>Hello</p><p>World</p>'
 **Attributes Without Values:**

-HTMLtoXML("<input disabled>") == '<input disabled="disabled"/>'
+HTMLtoXML("<input disabled>") == '<input disabled="disabled">'
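
The README hunks above drop the trailing `/` from empty elements in the serializer's `start` callback. The following is a minimal JavaScript sketch of what that handler looks like after the change, assuming handler objects of the shape shown in the README. `makeSerializer` and the manual callback calls are hypothetical stand-ins for the parser, which is not included here; this is an illustration, not the library's actual code.

```javascript
// Hypothetical helper: builds a handler object like the README example,
// using the post-commit behaviour (no "/" before ">" on unary tags).
function makeSerializer() {
  var results = "";
  return {
    start: function (tag, attrs, unary) {
      results += "<" + tag;
      for (var i = 0; i < attrs.length; i++)
        results += " " + attrs[i].name + '="' + attrs[i].escaped + '"';
      // Before this commit: results += (unary ? "/" : "") + ">";
      results += ">";
    },
    end: function (tag) {
      results += "</" + tag + ">";
    },
    chars: function (text) {
      results += text;
    },
    comment: function (text) {
      results += "<!--" + text + "-->";
    },
    // Hypothetical accessor so the accumulated string can be read back.
    result: function () {
      return results;
    }
  };
}

// Drive the callbacks by hand, in the order a parser would plausibly
// fire them for the input '<p>Data: <input disabled>'.
var s = makeSerializer();
s.start("p", [], false);
s.chars("Data: ");
s.start("input", [{ name: "disabled", escaped: "disabled" }], true);
s.end("p");
var output = s.result();
console.log(output); // <p>Data: <input disabled="disabled"></p>
```

The `unary` argument is still passed to `start` but is no longer used for serialization in this sketch, which mirrors the one-line change in the hunk at line 28.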

demo.html

Lines changed: 66 additions & 66 deletions
@@ -1,79 +1,79 @@
-<!DOCTYPE HTML>
-<html>
-<head>
-<title>Pure JavaScript HTML Parse - Demo</title>
-<link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.2/css/bootstrap-combined.min.css" rel="stylesheet">
-</head>
-<body>
-<div class="container" style="padding-top: 30px">
-<div class="row">
-<div class="span12">
-<div class="hero-unit">
-<h1>Pure JavaScript HTML Parser</h1>
-<p>All-in-one: XML Serializer, DOM Builder, DOM Document Creator, A SAX-style API </p>
-<p>
-<a class="btn btn-primary btn-large" href="https://github.com/blowsie/Pure-JavaScript-HTML-Parser">Learn more</a>
-</p>
-</div>
-</div>
-</div>
-<div class="row">
-<div class="span5">
-<div style="padding: 10px">
-<form id="form">
-<label>Input (HTML):</label><br />
-<textarea cols="60" rows="10" id="input" style="width: 100%;"></textarea><br />
-<input type="submit" value="Run" class="btn btn-primary" />
-</form>
-<br />
-<label>Output (XML):</label><br />
-<textarea cols="60" rows="10" id="output" style="width: 100%;"></textarea>
-</div>
-</div>
-<div class="span7">
-<div style="padding: 10px">
-<p>While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:</p>
-<ul>
-<li>Unclosed Tags:
-<pre>HTMLtoXML("&lt;p>&lt;b>Hello") == '&lt;p>&lt;b>Hello&lt;/b>&lt;/p>'</pre>
-</li>
-<li>Empty Elements:
-<pre>HTMLtoXML("&lt;img src=test.jpg>") == '&lt;img src="test.jpg"/>'</pre>
-</li>
-<li>Block vs. Inline Elements:
-<pre>HTMLtoXML("&lt;b>Hello &lt;p>John") == '&lt;b>Hello &lt;/b>&lt;p>John&lt;/p>'</pre>
-</li>
-<li>Self-closing Elements:
-<pre>HTMLtoXML("&lt;p>Hello&lt;p>World") == '&lt;p>Hello&lt;/p>&lt;p>World&lt;/p>'</pre>
-</li>
-<li>Attributes Without Values:
-<pre>HTMLtoXML("&lt;input disabled>") == '&lt;input disabled="disabled"/>'</pre>
-</li>
-</ul>
-<br />
-<div class="alert alert-info"><b>Note:</b> It does <b>not</b> take into account where in the document an element should exist. Right now you can put block elements in a head or th inside a p and it'll happily accept them. It's not entirely clear how the logic should work for those, but it's something that I'm open to exploring.</div>
-</div>
-</div>
-</div>
-</div>
-<script src="htmlparser.js"></script>
+<!DOCTYPE HTML>
+<html>
+<head>
+<title>Pure JavaScript HTML5 Parser - Demo</title>
+<link href="http://netdna.bootstrapcdn.com/twitter-bootstrap/2.3.2/css/bootstrap-combined.min.css" rel="stylesheet">
+</head>
+<body>
+<div class="container" style="padding-top: 30px">
+<div class="row">
+<div class="span12">
+<div class="hero-unit">
+<h1>Pure JavaScript HTML5 Parser</h1>
+<p>All-in-one: XML Serializer, DOM Builder, DOM Document Creator, A SAX-style API </p>
+<p>
+<a class="btn btn-primary btn-large" href="https://github.com/blowsie/Pure-JavaScript-HTML-Parser">Learn more</a>
+</p>
+</div>
+</div>
+</div>
+<div class="row">
+<div class="span5">
+<div style="padding: 10px">
+<form id="form">
+<label>Input (HTML):</label><br />
+<textarea cols="60" rows="10" id="input" style="width: 100%;"></textarea><br />
+<input type="submit" value="Run" class="btn btn-primary" />
+</form>
+<br />
+<label>Output (XML):</label><br />
+<textarea cols="60" rows="10" id="output" style="width: 100%;"></textarea>
+</div>
+</div>
+<div class="span7">
+<div style="padding: 10px">
+<p>While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:</p>
+<ul>
+<li>Unclosed Tags:
+<pre>HTMLtoXML("&lt;p>&lt;b>Hello") == '&lt;p>&lt;b>Hello&lt;/b>&lt;/p>'</pre>
+</li>
+<li>Empty Elements:
+<pre>HTMLtoXML("&lt;img src=test.jpg>") == '&lt;img src="test.jpg">'</pre>
+</li>
+<li>Block vs. Inline Elements:
+<pre>HTMLtoXML("&lt;b>Hello &lt;p>John") == '&lt;b>Hello &lt;/b>&lt;p>John&lt;/p>'</pre>
+</li>
+<li>Self-closing Elements:
+<pre>HTMLtoXML("&lt;p>Hello&lt;p>World") == '&lt;p>Hello&lt;/p>&lt;p>World&lt;/p>'</pre>
+</li>
+<li>Attributes Without Values:
+<pre>HTMLtoXML("&lt;input disabled>") == '&lt;input disabled="disabled">'</pre>
+</li>
+</ul>
+<br />
+<div class="alert alert-info"><b>Note:</b> It does <b>not</b> take into account where in the document an element should exist. Right now you can put block elements in a head or th inside a p and it'll happily accept them. It's not entirely clear how the logic should work for those, but it's something that I'm open to exploring.</div>
+</div>
+</div>
+</div>
+</div>
+<script src="htmlparser.js"></script>
 <script>
-window.onload = function () {
+window.onload = function () {
 var input = document.getElementById("input");
 var output = document.getElementById("output");
 var form = document.getElementById("form");

 input.value = "<p>hello <b style='test foo' disabled align=\"b\\\"ar\">john <a href='http://ejohn.org/'>resig</b><img src=test.jpg></img><div>test</div><p>hello world";
 output.value = "";

-form.onsubmit = function (e) {
+form.onsubmit = function (e) {
 if (e) e.preventDefault();
 if (typeof event != "undefined") event.returnValue = false;

 output.value = HTMLtoXML(input.value);
-return false;
-};
+return false;
+};
 };
-</script>
-</body>
-</html>
+</script>
+</body>
+</html>
