5 Commits

Author SHA1 Message Date
11b 5e34b105dc feat: alternative way of handling and augmenting episode data (wip) 2023-01-04 09:05:51 -03:00
11b 96b41dee60 feat: improve handling of special tokens in the Kajiwoto dataset 2022-12-27 12:52:08 -03:00
11b 5dbde00d27 feat: bring down target word count per episode 2022-12-26 17:31:28 -03:00
    After tokenization, most stuff was going over the 2048 context window so let's bring this down a little.
11b 60e649f57a feat: some minor filtering to hopefully improve CAI data 2022-12-26 12:04:04 -03:00
11b e0552639fa feat: update CAI dataset/module to handle userscript dumps and use definitions 2022-12-23 16:38:13 -03:00