toolbox/waifu
11b 5dbde00d27 feat: bring down target word count per episode
After tokenization, most stuff was going over the 2048-token context window, so let's bring this down a little.
2022-12-26 17:31:28 -03:00
core feat: bring down target word count per episode 2022-12-26 17:31:28 -03:00
datasets fix: human/bot messages being incorrectly labeled as each other 2022-12-24 17:58:33 -03:00
modules feat: some minor filtering to hopefully improve CAI data 2022-12-26 12:04:04 -03:00
scripts chore: update module list in build_dataset.py 2022-12-23 16:45:18 -03:00
utils feat: add the LIGHT dataset and VDM 2022-12-18 17:29:15 -03:00
__init__.py fixup! feat: initial commit 2022-12-18 17:29:15 -03:00