Welcome to the RunPod Roundup! This week we'll be discussing new text and image generation models, including an exciting new Stable Diffusion model.
The goal of RunPod Roundup is to keep you abreast of new developments over the week that you might have missed, with a focus on new models and offerings and other actionable developments that you can run in a RunPod instance right this moment.
High-Context LLMs Now Available - 8k Through 16k Tokens
After languishing for a very long time at the sufficient-but-not-optimal limit of 2k tokens, LLMs are finally breaking through this barrier. Earlier in the month, we saw SuperHot 8k context models from TheBloke, and this week we saw 16k models from Panchovix. Despite higher VRAM requirements, all of these models fit comfortably within RunPod instances (though if you were already on the borderline for your chosen GPU/model combo, you may need to bump up to the next higher level of GPU).
These larger context options could not have come at a better time, as many applications of LLMs have been butting up against this limit for quite some time. For example, in a roleplay where the AI plays multiple characters, a single round of responses can easily run into hundreds of tokens, which fills up a 2k window quickly. Give the new models a shot and let us know what you think!
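To put rough numbers on how quickly a window fills, here is a minimal sketch in Python. The figures of 400 tokens per round and 300 tokens reserved for a system prompt and character descriptions are illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def rounds_until_full(context_tokens: int, tokens_per_round: int,
                      reserved_tokens: int = 0) -> int:
    """Return how many complete rounds fit in the context window.

    reserved_tokens is space set aside for a system prompt or character
    descriptions (an assumed value, not something a model reports).
    """
    if tokens_per_round <= 0:
        raise ValueError("tokens_per_round must be positive")
    usable = max(context_tokens - reserved_tokens, 0)
    return usable // tokens_per_round


if __name__ == "__main__":
    # Assumed workload: 400 tokens per round, 300 tokens reserved up front.
    for window in (2048, 8192, 16384):
        rounds = rounds_until_full(window, tokens_per_round=400, reserved_tokens=300)
        print(f"{window:>6}-token window: about {rounds} rounds")
```

Under these assumptions, a 2k window holds about 4 rounds before older messages have to be dropped, while 8k and 16k windows hold about 19 and 40 rounds respectively. Your own numbers will vary with the tokenizer and how long each response runs.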
Stable Diffusion XL (SDXL) Released For All
After spending some time percolating in beta, SDXL is now available for anyone to download. According to the creators, this version corrects some long-standing (and often parodied) problems with the original model, such as its rendering of human anatomy and text. It does not appear that SDXL is compatible with Automatic1111 yet, but fear not - we'll have a how-to article coming out shortly on how to get it set up in a RunPod instance through alternate means. For now, though, you can grab the model from the Stability AI page on Hugging Face.
Meta and Microsoft Release Their Llama 2 Open-Source LLM
Llama 2 is now available for download, with 7B, 13B, and 70B parameter sizes available. The model boasts a 4k context length and has been built with dialogue in mind using Reinforcement Learning from Human Feedback. According to human evaluators, the model performs comparably to ChatGPT, and you can run it right in your own RunPod pod. The Hugging Face page advises that you may need an A100 just for the 13B model, so be aware of the heightened resource requirements compared to other commonly available open-source LLMs.
Note: You'll need to first request access to Llama 2 from Meta here and have it approved before you'll be able to download it from Hugging Face, and the turnaround may be a few days.
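For a rough sense of why the 13B model can call for a larger GPU, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It assumes the nominal parameter counts (7, 13, and 70 billion) and counts weights only; the context cache, activations, and framework overhead all add to this, so treat the results as a lower bound rather than a sizing guarantee. The function and table names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Estimate memory for model weights alone, in decimal gigabytes.

    Ignores the KV cache, activations, and runtime overhead, so real
    VRAM usage will be higher than this figure.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    total_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Nominal Llama 2 sizes; exact parameter counts differ slightly.
    for size in (7, 13, 70):
        estimates = ", ".join(
            f"{precision}: {weight_memory_gb(size, precision):.1f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{size}B -> {estimates}")
```

At 16-bit precision the 13B weights alone come to roughly 26 GB, which already exceeds a 24 GB card before any context is loaded. Quantized variants shrink that figure considerably, at some cost in output quality.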
Feel free to reach out to RunPod directly if you have any questions about these latest developments, and we'll see what we can do for you!