Commit 65bd878
switch to nosetests as preferred way to run tests. also updated README clarifying about using string for training data (closes #16)
Parent: 8f73625

File tree: 3 files changed (+12, -32 lines)


README.rst

12 additions, 4 deletions
@@ -35,16 +35,16 @@ Scrapely has a powerful API, including a template format that can be edited
 externally, that you can use to build very capable scrapers.
 
 What follows is a quick example of the simplest possible usage, that you can
-run in the Python shell. This example is also available in the ``example.py``
-script, located at the root of the repository.
+run in a Python shell.
 
 Start by importing and instantiating the Scraper class::
 
     >>> from scrapely import Scraper
     >>> s = Scraper()
 
 Then, proceed to train the scraper by adding some page and the data you expect
-to scrape from there::
+to scrape from there (note that all keys and values in the data you pass must
+be strings)::
 
     >>> url1 = 'http://pypi.python.org/pypi/w3lib'
     >>> data = {'name': 'w3lib 1.0', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
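The strings-only rule the README now states can be made concrete with a small check. The helper below is invented for illustration (it is not part of the scrapely API); it simply enforces the constraint the diff documents:

```python
def ensure_string_training_data(data):
    """Hypothetical helper (not in scrapely): reject training data whose
    keys or values are not strings, per the README's new note."""
    for key, value in data.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(
                "training data keys and values must be strings, "
                "got {!r}: {!r}".format(key, value))
    return data

# Accepted: everything is a string, as in the README example.
ensure_string_training_data({
    'name': 'w3lib 1.0',
    'author': 'Scrapy project',
    'description': 'Library of web-related functions',
})

# Rejected: a non-string value such as a float raises TypeError.
# ensure_string_training_data({'version': 1.0})
```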
@@ -156,6 +156,12 @@ And then install scrapely with::
 
     aptitude install python-scrapely
 
+Tests
+=====
+
+`nose`_ is the preferred way to run tests. Just run: ``nosetests`` from the
+root directory.
+
 Architecture
 ============
 
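The bare ``nosetests`` command in the new Tests section works because nose discovers tests by name: it collects modules, classes, and functions whose names match its ``testMatch`` regular expression. The sketch below uses what I understand to be nose's default pattern; ``looks_like_test`` is an invented helper for illustration, not part of nose:

```python
import re

# Believed default of nose's testMatch setting: 'test'/'Test' at the
# start of a name, or after '_', '.', '-', or a word break.
TEST_MATCH = re.compile(r'(?:^|[\b_\.-])[Tt]est')

def looks_like_test(name):
    """Illustrative helper: would nose collect an object with this name?"""
    return TEST_MATCH.search(name) is not None

looks_like_test('test_scraper')   # modules like test_scraper.py are collected
looks_like_test('iter_samples')   # helpers in tests/__init__.py are skipped
```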
@@ -183,7 +189,8 @@ the other hand, the extraction code is reliable and production-ready. So, if
 you want to use Scrapely in production, you should use train() with caution and
 make sure it annotates the area of the page you intent being annotated.
 
-Alternatively, you can use the Scrapely tool to annotate pages.
+Alternatively, you can use the Scrapely command line tool to annotate pages,
+which provides more manual control for higher accuracy.
 
 License
 =======
@@ -197,3 +204,4 @@ Scrapely library is licensed under the BSD license.
 .. _same Github account: https://github.com/scrapy
 .. _slybot: https://github.com/scrapy/slybot
 .. _selectors: http://doc.scrapy.org/en/latest/topics/selectors.html
+.. _nose: http://readthedocs.org/docs/nose/en/latest/

scrapely/tests/__init__.py

0 additions, 27 deletions
@@ -1,8 +1,6 @@
 import sys
 from os import path
 from itertools import count
-from unittest import TestSuite, TestLoader, main
-from doctest import DocTestSuite
 from scrapely import json
 
 _PATH = path.abspath(path.dirname(__file__))
@@ -25,28 +23,3 @@ def iter_samples(prefix, html_encoding='utf-8', **json_kwargs):
         html_str = open(html_page, 'rb').read()
         sample_data = json.load(open(fname + '.json'), **json_load_kwargs)
         yield html_str.decode(html_encoding), sample_data
-
-
-UNIT_TESTS = [
-    'scrapely.tests.test_extraction',
-    'scrapely.tests.test_htmlpage',
-    'scrapely.tests.test_htmlpage_data',
-    'scrapely.tests.test_pageparsing',
-    'scrapely.tests.test_template',
-    'scrapely.tests.test_scraper',
-]
-
-DOC_TESTS = [
-    'scrapely.extractors',
-    'scrapely.extraction.regionextract',
-    'scrapely.extraction.similarity',
-    'scrapely.extraction.pageobjects',
-]
-
-def suite():
-    suite = TestSuite()
-    for m in UNIT_TESTS:
-        suite.addTests(TestLoader().loadTestsFromName(m))
-    for m in DOC_TESTS:
-        suite.addTest(DocTestSuite(__import__(m, {}, {}, [''])))
-    return suite
-
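The deleted ``suite()`` hand-built a unittest suite and wrapped each module in ``DOC_TESTS`` with ``DocTestSuite``, a job nose now takes over. A minimal, self-contained sketch of what that wrapping did; the ``demo`` module here is invented for illustration, standing in for modules like ``scrapely.extractors``:

```python
import types
import unittest
from doctest import DocTestSuite

# Stand-in for a DOC_TESTS module: a module whose docstring carries a doctest.
demo = types.ModuleType('demo')
demo.__doc__ = """
>>> 2 + 2
4
"""

suite = unittest.TestSuite()
suite.addTest(DocTestSuite(demo))  # the same call the removed suite() made
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The suite runs the module's docstring example as one test case, which is what made the doctests show up in ``python setup.py test`` before this commit.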

setup.py

0 additions, 1 deletion
@@ -25,7 +25,6 @@
 
 try:
     from setuptools import setup
-    args['test_suite'] = 'scrapely.tests.suite'
     args['install_requires'] = ['numpy', 'w3lib']
     if sys.version_info < (2, 6):
         args['install_requires'] += ['simplejson']

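With the ``test_suite`` hook gone, what remains in this setup.py block is the version-conditional dependency list. Its logic can be sketched as a pure function; ``install_requires_for`` is an invented name, and the (2, 6) threshold comes from the diff (Pythons before 2.6 lack the stdlib ``json`` module, hence the ``simplejson`` fallback):

```python
def install_requires_for(version_info):
    """Hypothetical mirror of setup.py's logic: numpy and w3lib always,
    plus simplejson on Pythons older than 2.6 (no stdlib json module)."""
    requires = ['numpy', 'w3lib']
    if version_info < (2, 6):
        requires.append('simplejson')
    return requires

install_requires_for((2, 5))  # ['numpy', 'w3lib', 'simplejson']
install_requires_for((2, 7))  # ['numpy', 'w3lib']
```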