Building Models & Improvement Wishes - Llama Stack RFC
Building Models - Llama Stack RFC
Meta wants to enable everyone to get the most out of the 405B model, such as:
- Real-time and batch inference
- Supervised fine-tuning
- Evaluation of your model for your specific application
- Continual pre-training
- Retrieval-Augmented Generation (RAG)
- Function calling
- Synthetic data generation
RFC-0001 - Llama Stack · Issue #6 · meta-llama/llama-toolchain (github.com)
Model Improvement Wishes
Model Capabilities
- Structured Output: Improved support for generating structured data (e.g., JSON) through restricted prediction.
- Knowledge Distillation: Efficient tools and pipelines for transferring knowledge from larger models to smaller ones.
Fine-Tuning
- Continued Pre-training: Easier access to sampled pre-training data for maintaining data distribution consistency.
- Preference Optimization: Best practices and recipes for fine-tuning using Rejection Sampling (RS) and Direct Preference Optimization (DPO).
Model Architecture
- Agentic Capabilities: Tools and interfaces for integrating Monte Carlo Tree Search (MCTS) with LLMs to enhance logical reasoning.
Models
- Restricted Prediction: It would be awesome to have native and efficient support for restricted prediction against a pre-defined schema, such as the JSON schema support in llama.cpp grammars. This could boost the usability of the models for generating reliable structured data.
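As a rough illustration of what restricted prediction looks like today, here is a minimal sketch using the llama-cpp-python bindings with a hand-written GBNF grammar. The model path, the grammar, and the prompt are illustrative assumptions, not a recommended setup.

```python
# Sketch: restricted prediction with a llama.cpp GBNF grammar via llama-cpp-python.
# The model path and the tiny schema below are illustrative assumptions.
from llama_cpp import Llama, LlamaGrammar

# A small GBNF grammar that only admits a {"name": ..., "age": ...} JSON object.
GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [a-zA-Z ]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf")  # hypothetical local checkpoint
grammar = LlamaGrammar.from_string(GRAMMAR)

out = llm(
    "Extract the person mentioned in: 'Alice is 30 years old.' as JSON:",
    grammar=grammar,   # decoding is constrained to tokens the grammar allows
    max_tokens=64,
)
print(out["choices"][0]["text"])  # output should conform to the grammar above
```

The wish here is essentially for this kind of constrained decoding to be a first-class, schema-driven feature rather than something stitched together per project.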
Fine-tuning
- Continued Pretraining: Maintaining the original data distribution during continued pre-training is tricky. A common approach is to mix sampled pre-training data with the new training data. If Llama models could provide access to a sampled pre-training dataset, this process would be a lot smoother and would help keep the distribution consistent.
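A minimal sketch of the mixing step, assuming a (currently hypothetical) sampled pre-training corpus is available alongside the new domain data; dataset names and the 70/30 ratio are illustrative assumptions.

```python
# Sketch: interleave a sampled pre-training corpus with new domain data at a fixed
# ratio so continued pre-training stays close to the original distribution.
from datasets import load_dataset, interleave_datasets

pretrain_sample = load_dataset("my-org/sampled-pretraining-corpus", split="train", streaming=True)
domain_data     = load_dataset("my-org/new-domain-corpus", split="train", streaming=True)

# Roughly 70% original-distribution examples, 30% new-domain examples.
mixed = interleave_datasets(
    [pretrain_sample, domain_data],
    probabilities=[0.7, 0.3],
    seed=42,
)

for i, example in enumerate(mixed):
    if i >= 3:
        break
    print(example["text"][:80])  # assumes both corpora expose a "text" column
```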
- Knowledge Distillation: The 405B model is amazing. I wish we could have an end-to-end (E2E) knowledge distillation tool/pipeline/API for fine-tuning smaller models on the token distributions of a teacher model.
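In the absence of such a pipeline, the core of token-level distillation is usually a KL term between temperature-scaled teacher and student distributions mixed with the ordinary cross-entropy loss. The sketch below shows that loss only; shapes, the temperature, and the 0.5 mixing weight are illustrative assumptions, not an official recipe.

```python
# Sketch: token-level distillation loss (teacher/student KL + ground-truth CE).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # student_logits, teacher_logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random tensors standing in for real model outputs.
s = torch.randn(2, 8, 128)           # student logits
t = torch.randn(2, 8, 128)           # teacher (e.g. 405B) logits
y = torch.randint(0, 128, (2, 8))    # ground-truth token ids
print(distillation_loss(s, t, y).item())
```

The hard part the wish targets is not this loss but getting teacher token distributions out of a 405B model cheaply and at scale, which is why an official tool/API would help.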
- RS/DPO: It is good to know that Llama 3.1 has switched from RLHF to Rejection Sampling (RS) and Direct Preference Optimization (DPO) for preference optimization. It would be amazing to offer sample fine-tuning recipes or best practices that could help us fine-tune Llama 3.1 models effectively and avoid overfitting.
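For reference, the DPO objective itself is compact: it rewards the policy for increasing its log-probability margin over a frozen reference model on the chosen response relative to the rejected one. The sketch below takes per-response summed log-probabilities as inputs; beta=0.1 is an illustrative value, not a recommended hyperparameter.

```python
# Sketch: Direct Preference Optimization (DPO) loss over (chosen, rejected) pairs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_ratio   = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random per-example log-probabilities.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```

The wish is less about the loss (libraries such as TRL already implement a DPO trainer) and more about Llama-specific recipes: data formats, reference-model choice, and hyperparameters that avoid overfitting.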
Agentic
- MCTS + LLM: There have been some cool attempts to use Monte Carlo Tree Search (MCTS) to boost the logical reasoning capabilities of LLMs. However, there is still a gap when it comes to tools or viable paths for tightly integrating MCTS with LLMs, aside from some rumored projects in closed-source models. Creating a robust tool or interface for this integration would be a huge win for AI agent developers.
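To make the shape of that integration concrete, here is a bare-bones MCTS skeleton in which an LLM is assumed to supply two callables: propose(state), returning candidate next reasoning steps, and score(state), returning a value estimate in [0, 1]. Both callables are placeholders; this is a sketch of the search loop, not a real integration.

```python
# Skeleton of an MCTS loop where an LLM would drive expansion and evaluation.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_state, propose, score, iterations=50):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: walk down by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: ask the LLM for candidate next reasoning steps.
        for step in propose(node.state):
            node.children.append(Node(node.state + [step], parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Evaluation: ask the LLM to judge the partial solution.
        reward = score(leaf.state)
        # 4. Backpropagation: update visit counts and values up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    best = max(root.children, key=lambda n: n.visits) if root.children else root
    return best.state

# Toy usage with stubbed-out "LLM" callables.
print(mcts([], propose=lambda s: [f"step{len(s)}a", f"step{len(s)}b"],
           score=lambda s: random.random()))
```

The missing piece the wish points at is the glue: batched LLM calls for propose/score, caching, and an interface that makes this loop practical at scale.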