Seq2Seq model that restores punctuation on English input text.
The only dependencies are Python 3.7.4, virtualenv, and TensorFlow:
virtualenv env
source env.sh
pip install tensorflow # or tensorflow-gpu / custom wheel
For more information on the project structure, see the README in the tensorflow-boilerplate repository.
- Google News Word2Vec: place the .bin file in the data directory, run python -m scripts.install_word2vec, and optionally delete the .bin file.
- WikiText: download, unzip, and place both of the word-level datasets in the data directory. Clean the data with python -m scripts.clean_wikitext, and optionally delete the original *.tokens and *.raw files.
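For intuition about the task, a punctuation-restoration training pair can be formed from a tokenized sentence by dropping its punctuation tokens to get the input and keeping the full sentence as the target. The sketch below is illustrative only: `make_pair` and `PUNCTUATION` are hypothetical names, and the actual preprocessing in scripts/clean_wikitext and data_loaders/wikitext.py may differ (for example in casing or in which symbols count as punctuation). It relies on the fact that word-level WikiText separates punctuation marks into their own whitespace-delimited tokens.

```python
import string
from typing import List, Tuple

# Assumption: every single-character ASCII punctuation token is removed
# from the input. The project may use a different set.
PUNCTUATION = set(string.punctuation)


def make_pair(sentence: str) -> Tuple[List[str], List[str]]:
    """Return (unpunctuated input tokens, punctuated target tokens)."""
    target = sentence.split()
    source = [tok for tok in target if tok not in PUNCTUATION]
    return source, target


source, target = make_pair("the cat sat , and the dog barked .")
print(" ".join(source))
print(" ".join(target))
```

A seq2seq model trained on such pairs learns to map the first printed line back to the second.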
The following commands train a model with a non-default batch size and learning rate, and save its weights in experiments/myexperiment0:
source env.sh
run fit myexperiment0 seq2seq wikitext --batch_size=32 --learning_rate=0.001
Modify other hyperparameters similarly with --name=value. To see all supported hyperparameters, check the main classes in models/seq2seq.py and data_loaders/wikitext.py.
To evaluate the trained model based on gold-normalized edit distance using beam search:
run evaluate myexperiment0 --beam_width=5
To interact with the trained model in the console by typing input sentences:
run interact myexperiment0
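As a rough illustration of the evaluation metric named above, gold-normalized edit distance can be read as the Levenshtein distance between the model output and the gold (reference) sequence, divided by the length of the gold sequence. The sketch below assumes that reading and operates on whitespace-separated tokens. The function names are hypothetical, and the project's own implementation may differ (for example by working on characters or by averaging differently across sentences).

```python
from typing import Sequence


def edit_distance(hyp: Sequence[str], gold: Sequence[str]) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of hyp
    # and the first j items of gold.
    prev = list(range(len(gold) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, g in enumerate(gold, start=1):
            cost = 0 if h == g else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert g
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def gold_normalized_edit_distance(hyp: Sequence[str],
                                  gold: Sequence[str]) -> float:
    """Edit distance divided by the gold length (assumed normalization)."""
    if not gold:
        raise ValueError("gold sequence must be non-empty")
    return edit_distance(hyp, gold) / len(gold)


gold = "hello , how are you ?".split()
hyp = "hello how are you ?".split()
score = gold_normalized_edit_distance(hyp, gold)
print(round(score, 3))  # one missing comma over six gold tokens -> 0.167
```

Under this definition a score of 0 means the output matches the gold sequence exactly, and lower is better.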