Project Roadmap

An intro for dummies

If you're not familiar with all the theory/technology behind the project, here's a super simplified rundown:

  • There are "text generation AIs" that are freely available to researchers. These are called open-source LMs (language models).
  • Modern chatbots are usually made by taking a language model and "fine-tuning" it, which basically just means feeding it data similar to what you want it to generate.
    • In our case, this means fine-tuning it with conversation and roleplay data (research usually calls this "dialogue data", and models fine-tuned on dialogue data "dialogue models"). A rough code sketch of what fine-tuning looks like follows this list.
  • LMs can have different "sizes". For example, Meta's OPT language model is offered in 125m, 350m, 1.3B, 2.7B, 6.7B, 30B and 66B sizes (where "m" = million and "B" = billion parameters).
    • The bigger the model, the better its quality, but also the more hardware you need to fine-tune and run it. And when we say more, we don't mean a couple of extra gigabytes of system RAM; we mean going from a single 6GB GPU to hundreds of 80GB GPUs.
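
If you're curious what "fine-tuning" actually looks like in code, here's a minimal sketch using the HuggingFace transformers Trainer. To be clear, this is not our training pipeline; the data file, its format, the base model choice and every hyperparameter below are placeholders purely for illustration.

```python
# Minimal illustration of "fine-tuning": take a pre-trained LM and keep
# training it on your own text. This is NOT our actual pipeline; the data
# file, its format, and every hyperparameter below are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "facebook/opt-350m"  # a small open-source LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical plain-text file with one conversation per line.
dataset = load_dataset("text", data_files={"train": "dialogue_data.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="checkpoints", num_train_epochs=1),
    train_dataset=train_data,
    # mlm=False -> plain next-token prediction, which is what causal LMs learn.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```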

So, knowing the above, our main "top-level"/medium-term objective at the moment is to gather as much good-quality data as we can and fine-tune the biggest model we can. From there, we can play around with the models and see what the results are like, then debate and decide how to move forward.


For anyone who's interested in the actual details, here's a TL;DR version of the project's current roadmap at the task level:

Current Status

  • We have all the tooling to build a dataset from various sources, fine-tune a pre-trained LM on that dataset, and then run inference on checkpoints saved during the fine-tuning process.
    • All of that tooling can be found within this repository.
  • We have taken a small model, Meta's OPT-350m, and fine-tuned it on a small dataset we've built with the tooling described above. We've released it as a tiny prototype.
    • The model checkpoint is hosted on HuggingFace under Pygmalion-AI/pygmalion-350m.
    • Note: Inference should not be done through the regular HuggingFace web UI, since we need to do some prompt trickery and response parsing (the sketch after this list shows the general idea). To play around with the model, try out this notebook.
  • We have written a userscript which can anonymize and dump your CharacterAI chats, and made a website where you can upload them to be used as training data for future models. If you're interested in contributing, please read through this Rentry for more information.
  • We released a prototype 1.3B model fine-tuned on a new dataset, which includes the anonymized CharacterAI data.
    • It's hosted on HuggingFace under Pygmalion-AI/pygmalion-1.3b.
    • We've already received feedback from several users (thank you to everyone who took the time to test out the model and write to us!) and identified several shortcomings in it.
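
For anyone wondering what the "prompt trickery" mentioned above means in practice: the model expects the character definition and chat history wrapped into one prompt, and the generated text has to be trimmed back down to a single reply. Below is a minimal sketch of that flow with the transformers library; the exact prompt format and generation settings are guesses for illustration, and the linked notebook remains the proper way to try the model.

```python
# Rough idea of the "prompt trickery" and response parsing mentioned above.
# The prompt format and sampling settings here are illustrative guesses;
# the linked notebook is the authoritative way to run the model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Pygmalion-AI/pygmalion-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the character definition and chat history into a single prompt string.
prompt = (
    "Bot's Persona: A friendly, curious chatbot.\n"
    "<START>\n"
    "You: Hi there! How are you today?\n"
    "Bot:"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    temperature=0.8,
)

# The output contains the prompt too, so keep only the newly generated tokens,
# then cut the reply off if the model starts writing the user's next turn.
new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
reply = reply.split("You:")[0].strip()
print(reply)
```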

Next Steps

  • We're on the lookout for more high-quality data sources, and we still welcome new CharacterAI dumps.
  • We plan on training a new 1.3B model, but only after brainstorming and making changes to our data processing pipeline in an attempt to address the problems we've seen in the current 1.3B.
    • Feel free to join the Matrix server if you want to join in on the discussions.
  • Once we have a decently performing 1.3B model, we plan on setting up some way for people to rate/rank bot responses and send that data to us. That data will then be used to train a reward model for a PPO fine-tune, similar to what OpenAI did for ChatGPT. We might join forces with other communities for this step as well.
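
To give a concrete sense of what "train a reward model" means: in the InstructGPT/ChatGPT recipe, you collect comparisons where a human preferred one response over another and teach a model to score the preferred response higher. Here's a minimal sketch of that pairwise objective; the base model and example data are placeholders, not decisions we've made.

```python
# Sketch of the standard pairwise reward-model objective from the
# InstructGPT/ChatGPT papers: score the human-preferred response higher than
# the rejected one. Base model, data, and prompt below are made-up placeholders.
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any model with a single-logit head can serve as a reward model.
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=1
)
tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")

def score(text):
    """Return a scalar reward for a (prompt + response) string."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    return reward_model(**batch).logits.squeeze(-1)

# One hypothetical comparison collected from user ratings.
prompt = "You: What's your favorite book?\nBot:"
chosen = prompt + " I adore Pride and Prejudice; the banter is wonderful."
rejected = prompt + " I don't know."

# Pairwise ranking loss: push the chosen reward above the rejected reward.
loss = -F.logsigmoid(score(chosen) - score(rejected)).mean()
loss.backward()  # in real training, an optimizer step would follow here
```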