- Install Python 3.5+
- Install NLTK 3
- Open a terminal or command prompt, start the Python interpreter, and download the NLTK stopwords corpus:
$ python
>>> import nltk
>>> nltk.download('stopwords')
To index data, run the index.py script, passing the directory that contains the documents and the directory where the indexed data should be stored:
$ python index.py --help
usage: index.py [-h] docs_path data_path

Index data for boolean retrieval

positional arguments:
  docs_path   Directory for documents to be indexed
  data_path   Directory for storing indexed data

optional arguments:
  -h, --help  show this help message and exit
$ python index.py ./docs ./data

After the data has been indexed successfully, run the query.py script to perform a query:
$ python query.py --help
usage: query.py [-h] query

Boolean query

positional arguments:
  query       words seperated by space

optional arguments:
  -h, --help  show this help message and exit
$ python query.py "popular available"
{'D:\\workspace\\boolean-retrieval-engine\\docs\\A Festival of Books.txt'}

Words in the query must be separated by spaces. For example, the input "popular available" finds all documents that contain both popular AND available. The result is the set of documents that satisfy the query. Numbers, punctuation, and words that are not in the dictionary are ignored.
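
The AND semantics described here can be illustrated with a small, self-contained sketch. It is not the code in index.py or query.py: the function names (`tokenize`, `build_index`, `and_query`), the tiny `STOPWORDS` set (a stand-in for the NLTK stopwords list), and the in-memory sample documents are all assumptions made for illustration. The idea is that indexing maps each word to the set of documents containing it, and a query intersects those sets.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the NLTK English stopwords list, so the sketch
# runs without any downloads.
STOPWORDS = {"a", "an", "and", "the", "of", "is", "are", "to", "in"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_index(docs):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return dict(index)


def and_query(index, query):
    """Return the documents that contain every dictionary term in the query."""
    # tokenize() already drops numbers and punctuation; terms that are not
    # in the dictionary (the index keys) are ignored here.
    terms = [t for t in tokenize(query) if t in index]
    if not terms:
        # Assumption: a query with no usable terms matches nothing.
        return set()
    return set.intersection(*(index[t] for t in terms))


# Hypothetical sample documents, keyed by file name.
docs = {
    "festival.txt": "Popular books are available at the festival.",
    "library.txt": "The library has many popular titles.",
}
index = build_index(docs)
print(and_query(index, "popular available"))  # {'festival.txt'}
```

In this sketch a query such as "popular 2024, xyzzy!" behaves like "popular" alone, because the number, the punctuation, and the unknown word are all discarded before the intersection is taken.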