@@ -39,24 +39,28 @@ pip install bert-pytorch
 ## Quickstart
 
 **NOTICE : Your corpus should be prepared with two sentences in one line with tab(\t) separator**
+
+### 0. Prepare your corpus
 ```
-Welcome to the \t the jungle \n
-I can stay \t here all night \n
+Welcome to the \t the jungle\n
+I can stay \t here all night\n
 ```
 
-### 1. Building vocab based on your corpus
-```shell
-bert-vocab -c data/corpus.small -o data/corpus.small.vocab
+or tokenized corpus (tokenization is not in package)
 ```
+Wel_ _come _to _the \t _the _jungle\n
+_I _can _stay \t _here _all _night\n
+```
+
 
-### 2. Building BERT train dataset with your corpus
+### 1. Building vocab based on your corpus
 ```shell
-bert-dataset -d data/corpus.small -v data/corpus.small.vocab -o data/dataset.small
+bert-vocab -c data/corpus.small -o data/vocab.small
 ```
 
-### 3. Train your own BERT model
+### 2. Train your own BERT model
 ```shell
-bert -d data/dataset.small -v data/corpus.small.vocab -o output/bert.model
+bert -c data/dataset.small -v data/vocab.small -o output/bert.model
 ```
 
 ## Language Model Pre-training
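The corpus format from step 0 can be produced and checked with standard tools. The sketch below assumes that `\t` and `\n` in the README stand for a literal tab and newline character, and it writes the tab without the surrounding spaces shown in the example. It then uses `awk` to flag any line that does not split into exactly two non-empty sentences. The file name `data/corpus.small` matches the commands above; the validation rule itself is an assumption, not something bert-pytorch documents.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption: "\t" in the README means a real tab character).
# Writes a two-line corpus: two sentences per line, separated by one tab.
set -euo pipefail

mkdir -p data
printf 'Welcome to the\tthe jungle\n' >  data/corpus.small
printf 'I can stay\there all night\n' >> data/corpus.small

# Validate: every line must split on the tab into exactly two non-empty fields.
# Offending line numbers are printed, and awk exits non-zero if any are found.
awk -F'\t' '
  BEGIN { bad = 0 }
  NF != 2 || $1 == "" || $2 == "" { print "bad line " NR; bad = 1 }
  END { exit bad }
' data/corpus.small

echo "corpus OK: $(awk 'END { print NR }' data/corpus.small) lines"
```

Running it prints `corpus OK: 2 lines`. A line with no tab, or with a second tab, is reported by line number and the script stops with a non-zero exit status because of `set -e`.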
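Because tokenization is not in the package, a tokenized corpus has to be prepared beforehand. As a purely illustrative sketch, the script below is a hypothetical pre-tokenizer: it is not part of bert-pytorch, and it is not the tokenizer behind the `Wel_ _come` example, which appears to split words into subwords. It only marks the start of each whitespace-separated word with `_` and keeps the single tab between the two sentences. The file names `data/corpus.raw` and `data/corpus.tok` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-tokenizer (NOT part of bert-pytorch): prefixes every
# whitespace-separated word with "_" and preserves the tab between sentences.
# It performs no subword splitting.
set -euo pipefail

mkdir -p data
printf 'I can stay\there all night\n' > data/corpus.raw

# -F'\t' splits each line into its two sentences; OFS keeps the tab on output.
awk -F'\t' -v OFS='\t' '{
  for (f = 1; f <= NF; f++) {
    # split(..., " ") uses awk default splitting: runs of blanks, no empty pieces.
    n = split($f, words, " ")
    out = ""
    for (i = 1; i <= n; i++)
      out = out (out == "" ? "" : " ") "_" words[i]
    $f = out
  }
  print
}' data/corpus.raw > data/corpus.tok

cat data/corpus.tok
```

For the input line above, the output is `_I _can _stay`, a tab, then `_here _all _night`. A real subword tokenizer would replace the inner loop; the tab-preserving field handling is the part that keeps the two-sentence layout intact.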