The Challenge
Chai uses the best language models in the world to power the chatbots that users speak to on our platform. We currently have more than 100,000 daily active users on our mobile app, who send more than 1 million messages per day. You can be part of our journey by helping us build state-of-the-art language models and improve our chatbots.
The goal of this competition is to improve the state-of-the-art language model and the quality of the conversations people have with the chatbots.
Prize
$1,000 every time you beat the best score on the MCL leaderboard by at least 0.5. $500 every time you beat the best score on the Reward model leaderboard.
Evaluation
Submissions are first evaluated on the median conversation length (MCL) that actual users have with your model. When you submit a new model, we run an A/B test on our app and have users speak to it. Your score is the median length of the conversations that users have with it.
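To make the MCL metric concrete, here is a minimal sketch of computing it from per-conversation message counts. The data is made up, the function name is hypothetical, and reporting the score as a difference from a baseline is an assumption suggested by the signed leaderboard scores, not something stated here.

```python
from statistics import median


def median_conversation_length(conversation_lengths):
    """Return the median number of messages per conversation."""
    if not conversation_lengths:
        raise ValueError("need at least one conversation")
    return median(conversation_lengths)


# Hypothetical message counts, one entry per conversation (not real data).
baseline_lengths = [3, 5, 8, 10, 12, 20, 41]
candidate_lengths = [4, 6, 9, 12, 15, 22, 60]

baseline_mcl = median_conversation_length(baseline_lengths)
candidate_mcl = median_conversation_length(candidate_lengths)

# Assumption: the score is shown relative to the baseline model.
print(candidate_mcl - baseline_mcl)
```

Because the median ignores the size of outliers, one very long conversation (the 60 above) does not move the score; only shifting the middle of the distribution does.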
Submissions are also evaluated on a Reward Model score. The reward model scores your model's responses: if the score is 0.9, we estimate a 10% probability that the conversation will end. Higher reward model scores should lead to higher MCL, but do not always. You can run the reward model locally to better understand how to improve your models.
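The sketch below shows one way to read a reward score as an end-of-conversation probability, and how that could relate to conversation length. The geometric model (every response ends the conversation independently with the same probability) is an illustrative assumption, not how Chai computes MCL, and both function names are hypothetical.

```python
import math


def end_probability(reward_score):
    """Estimated probability that the conversation ends after a response."""
    if not 0.0 <= reward_score <= 1.0:
        raise ValueError("reward score must be between 0 and 1")
    return 1.0 - reward_score


def median_length_if_independent(reward_score):
    """Median number of responses before the conversation ends.

    ASSUMPTION: each response independently ends the conversation with
    probability 1 - reward_score (a geometric model). Real conversations
    need not behave this way.
    """
    p_end = end_probability(reward_score)
    if p_end <= 0.0:
        return math.inf  # the conversation never ends under this model
    if p_end >= 1.0:
        return 1  # the conversation always ends after the first response
    # Smallest k such that 1 - (1 - p_end) ** k >= 0.5
    return math.ceil(math.log(0.5) / math.log(1.0 - p_end))


print(end_probability(0.9))
print(median_length_if_independent(0.9))
```

Real users do not end conversations independently at each turn, which is one reason a higher reward score may not translate into a higher MCL.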
Leaderboard
Median conversation length (MCL)
Submission | Participant | Score | Latency (s) |
---|---|---|---|
🥇hakurei/litv2-6B-rev2 | Reimu Hakurei | +1.95 | 1.74 |
🥈hakurei/litv2-6B-rev1 | Reimu Hakurei | +1.24 | 1.65 |
🥉hakurei/lit-6B | Reimu Hakurei | +1.12 | 2.17 |
KoboldAI/GPT-J-6B-Shinen | Julius ter Pelkwijk | +0.65 | 3.34 |
EleutherAI/gpt-j-6B | EleutherAI | 0 | 2.48 |
KoboldAI/OPT-6B-nerys-v2 | Julius ter Pelkwijk | -6.31 | 12.80 |
Reward model
Submission | Participant | Reward |
---|---|---|
🥇KoboldAI/GPT-J-6B-Shinen | Julius ter Pelkwijk | 0.8803 |
🥈EleutherAI/gpt-j-6B | EleutherAI | 0.8797 |
🥉hakurei/litv2-6B-rev2 | Reimu Hakurei | 0.8796 |
KoboldAI/OPT-6B-nerys-v2 | Julius ter Pelkwijk | 0.8781 |
Example
The models we use are hosted on HuggingFace and you can run the example script from our Google Colab notebook.
Submission
Upload your model to HuggingFace and send the link to us here. The deadline for submission is January 1, 2023.
FAQ
How can I submit a model? Once you have a model you want to submit, upload it to HuggingFace and share it in the submissions channel for the team to review! We will be in touch with you immediately.
Do you provide computing resources? Not at this stage. We expect people to use Colab or their own setup. If this is a major obstacle, we can look into it with you.
Is latency important? The latency of our currently deployed models is ~1.5 s per inference. We estimate that every 1 s improvement in latency raises your score by 0.4 MCL. We cannot deploy submissions that take longer than ~4 s per inference.
Can I get some help? Yes! We are happy to help you with any questions or clarifications. Reach out to us on WhatsApp or Discord.
What is the deadline for submissions? The competition will run until January 1, 2023.
What happens once I submit? We will deploy your model as soon as possible and be in touch with you. If your solution tops the leaderboard, we will have a call with you to see your code: we want to share winning solutions with other participants so that the community can build on one another's work.
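As a rough illustration of the latency answer above, the sketch below turns the two quoted figures (0.4 MCL per second and the ~4 s limit) into a quick check. The function and constant names are hypothetical. Treating the estimate as linear and the limit as a hard 4.0 s cut-off are simplifying assumptions.

```python
MCL_GAIN_PER_SECOND = 0.4   # estimate quoted in the FAQ
MAX_LATENCY_SECONDS = 4.0   # approximate limit quoted in the FAQ


def is_deployable(latency_seconds):
    """Check a model's per-inference latency against the approximate limit."""
    return latency_seconds <= MAX_LATENCY_SECONDS


def estimated_mcl_change(old_latency_seconds, new_latency_seconds):
    """Estimated MCL change from a latency change.

    ASSUMPTION: the 0.4 MCL per second estimate extrapolates linearly.
    A positive result means the score is expected to go up.
    """
    return MCL_GAIN_PER_SECOND * (old_latency_seconds - new_latency_seconds)


# Example: speeding a model up from 2.5 s to 1.5 s per inference.
print(is_deployable(2.5))
print(round(estimated_mcl_change(2.5, 1.5), 2))
```

Under these assumptions, a one-second speed-up is worth almost as much as the 0.5 MCL margin needed for a prize, so latency is worth measuring before you submit.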
Resources
- Guide to fine-tuning Text Generation models: GPT-2, GPT-Neo and T5
- Example user conversation dataset on HuggingFace: ChaiML/user_model_inputs