Commit 8f73625

clarify difference between Scrapely and Scrapy in README
1 parent eb94694 commit 8f73625

1 file changed

+37
-5
lines changed

README.rst

Lines changed: 37 additions & 5 deletions
@@ -1,11 +1,33 @@
11
========
2-
scrapely
2+
Scrapely
33
========
44

55
Scrapely is a library for extracting structured data from HTML pages. Given
66
some example web pages and the data to be extracted, scrapely constructs a
77
parser for all similar pages.
88

9+
How does Scrapely relate to `Scrapy`_?
10+
======================================
11+
12+
Despite the similarity in their names, Scrapely and `Scrapy`_ are quite
13+
different things. The only similarity they share is that they both depend on
14+
`w3lib`_, and they are both maintained by the same group of developers (which
15+
is why both are hosted on the `same GitHub account`_).
16+
17+
Scrapy is an application framework for building web crawlers, while Scrapely is
18+
a library for extracting structured data from HTML pages. If anything, Scrapely
19+
is more similar to `BeautifulSoup`_ or `lxml`_ than to Scrapy.
20+
21+
Scrapely doesn't depend on Scrapy nor the other way around. In fact, it is
22+
quite common to use Scrapy without Scrapely, and vice versa.
23+
24+
If you are looking for a complete crawler-scraper solution, there is (at least)
25+
one project called `Slybot`_ that integrates both, but you can definitely use
26+
Scrapely with other web crawlers since it's just a library.
27+
28+
Scrapy has a built-in extraction mechanism called `selectors`_ which (unlike
29+
Scrapely) is based on XPaths.
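
To make the contrast concrete, here is a minimal sketch of what XPath-based extraction looks like. It uses Python's standard-library ``xml.etree.ElementTree`` (which supports only a subset of XPath) as a stand-in. It is not Scrapy's selectors API, and the page fragment and the ``extract_with_xpath`` helper are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <h1 class="name">Example product</h1>
    <span class="price">19.99</span>
  </body>
</html>
""".strip()


def extract_with_xpath(markup):
    """Pull fields out of markup using hand-written XPath expressions."""
    root = ET.fromstring(markup)
    # Each field needs an explicit path written by the developer.
    name = root.find(".//h1[@class='name']")
    price = root.find(".//span[@class='price']")
    return {"name": name.text, "price": price.text}


record = extract_with_xpath(PAGE)
print(record)
```

With this approach the paths are written by hand, whereas Scrapely is given example pages plus the data to be extracted and constructs the parser itself.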
30+
931
Usage (API)
1032
===========
1133

@@ -111,10 +133,13 @@ To scrape another similar page with the already added templates::
111133
Requirements
112134
============
113135

114-
* numpy
115-
* w3lib
116-
* simplejson or Python 2.6+
136+
Scrapely depends on the following libraries:
137+
138+
* numpy
139+
* w3lib
140+
* simplejson or Python 2.6+
117141

142+
Note that Scrapely **does not** depend on `Scrapy`_ in any way.
118143

119144
Installation
120145
============
@@ -131,7 +156,6 @@ And then install scrapely with::
131156

132157
aptitude install python-scrapely
133158

134-
135159
Architecture
136160
============
137161

@@ -165,3 +189,11 @@ License
165189
=======
166190

167191
The Scrapely library is licensed under the BSD license.
192+
193+
.. _Scrapy: http://scrapy.org/
194+
.. _w3lib: https://github.com/scrapy/w3lib
195+
.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
196+
.. _lxml: http://lxml.de/
197+
.. _same Github account: https://github.com/scrapy
198+
.. _slybot: https://github.com/scrapy/slybot
199+
.. _selectors: http://doc.scrapy.org/en/latest/topics/selectors.html
