Unicode literal in BufferedStream._readFromBuffer causes failure in HTMLBinaryInputStream

In html5lib/inputstream.py, `unicode_literals` is imported from `__future__`. This causes `html5lib.inputstream.BufferedStream` to misbehave, specifically in the `_readFromBuffer` method, which ends with `return "".join(rv)`. Because `""` is a unicode literal there, on Python 2 the join returns unicode even when every chunk in `rv` is a byte string, so any read after the first returns a chunk of unicode instead of a chunk of bytes.

An example of the problem caused:

``` python
from urllib2 import Request, urlopen
from html5lib.inputstream import HTMLBinaryInputStream

req = Request(url='http://example.org/')
source = urlopen(req)
HTMLBinaryInputStream(source)
```

This produces the following traceback:

``` pytb
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
  File ".../html5lib/inputstream.py", line 411, in __init__
    self.charEncoding = self.detectEncoding(parseMeta, chardet)
  File ".../html5lib/inputstream.py", line 448, in detectEncoding
    encoding = self.detectEncodingMeta()
  File ".../html5lib/inputstream.py", line 535, in detectEncodingMeta
    assert isinstance(buffer, bytes)
AssertionError
```

(That is, when `HTMLBinaryInputStream` is given a file-like object (such as the result of `urllib2.urlopen`), it wraps the object in a `BufferedStream`, and the unicode chunk returned by that wrapper then trips `assert isinstance(buffer, bytes)` at line 535.)

This can be fixed by using a byte literal in `_readFromBuffer` instead, i.e. `return b"".join(rv)`. (There are at least three places in inputstream.py where string literals are used like this: at lines 117, 318 and 348.)
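
For illustration, here is a minimal sketch of the difference between the two joins, independent of html5lib. The helper names `join_with_text_literal` and `join_with_byte_literal` are hypothetical stand-ins for the last line of `_readFromBuffer`, not the library's code. `u""` is written out explicitly because that is what `""` means once `unicode_literals` is in effect.

``` python
def join_with_text_literal(chunks):
    # Hypothetical stand-in for the current `return "".join(rv)`;
    # under unicode_literals the separator is a text (unicode) string.
    return u"".join(chunks)


def join_with_byte_literal(chunks):
    # Hypothetical stand-in for the proposed `return b"".join(rv)`.
    return b"".join(chunks)


# Two byte chunks, as a buffered binary stream would hold them.
chunks = [b"<html>", b"<head>"]

fixed = join_with_byte_literal(chunks)

try:
    # Python 2: the byte chunks are implicitly decoded and the
    # result is unicode, which is the behaviour reported here.
    buggy = join_with_text_literal(chunks)
except TypeError:
    # Python 3: mixing a text separator with byte chunks is an error.
    buggy = None

print(type(fixed), type(buggy))
```

Either way, only the byte-literal join hands back something that satisfies `isinstance(buffer, bytes)`.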

