Initialize
Hi!
This site has been completely revamped in a way that I hope is mostly non-obvious. My main hope is to actually post on it, especially now that my projects are entering a much more interesting phase of their development. The original site was built on the Whiteglass theme for Jekyll, but I found myself kind of constrained by it when trying to make visual improvements.
I actually rewrote all of this in Svelte first, but ran into a lot of pain: Markdown formatting kept conflicting with Svelte functions, and I was trying to auto-escape things so that I could write normal KaTeX without escaping fifty characters. There was probably an elegant way to do it, but the way I chose was “port all of my work to Next.js”.
Here are a few of the things I’m working on, in no particular order and at an extremely surface level:
- HackPPO - I’ve been experimenting with developing modern RL methods using torchtune as a base. Currently it’s just an RLVR recipe, which I am in the process of fleshing out and benchmarking.
- Omniclassifier - I’ve been working on a generic LM classifier checkpoint for use in post-training, both as a sort of distillation of RLAIF and as a foundation for more specific reward/value models. This ties in pretty well with HackPPO.
- No-SFT Post-Training - Is it a good idea to not use any SFT at all when post-training? Probably not. But one can dream. A combination of RLVR and standard PPO can probably get you to the Minimum Viable Chatbot, which, while not especially useful, would certainly be interesting to talk to. A lot of the above is in service of this kind of silly goal.
Anyway, I hope to post a lot more here in the future. Thanks for reading!
— Aria