Building Models & Improvement Wishes - Llama Stack RFC
Building Models - Llama Stack RFC
Meta wants to enable everyone to get the most out of the 405B model, such as:
- Real-time and batch inference
- Supervised fine-tuning
- Evaluation of your model for your specific application
- Continual pre-training
- Retrieval-Augmented Generation (RAG)
- Function calling
- Synthetic data generation
RFC-0001 - Llama Stack · Issue #6 · meta-llama/llama-toolchain (github.com)
Model Improvement Wishes
Model Capabilities
- Structured Output: Improved support for generating structured data (e.g., JSON) through restricted prediction.
- Knowledge Distillation: Efficient tools and pipelines for transferring knowledge from larger models to smaller ones.
Fine-Tuning
- Continued Pre-training: Easier access to sampled pre-training data for maintaining data distribution consistency.
- Preference Optimization: Best practices and recipes for fine-tuning using Rejection Sampling (RS) and Direct Preference Optimization (DPO).
Model Architecture
- Agentic Capabilities: Tools and interfaces for integrating Monte Carlo Tree Search (MCTS) with LLMs to enhance logical reasoning.
Models
- Restricted Prediction: It would be awesome to have native and efficient support for restricted prediction against a pre-defined schema, such as the JSON schema support in llama.cpp grammars. This could boost the usability of the models for generating reliable structured data.
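As a rough illustration of what restricted prediction looks like today, here is a minimal sketch using the llama-cpp-python bindings with a hand-written GBNF grammar. The model path, the grammar, and the prompt are illustrative assumptions, not a recommended setup.

```python
# Sketch: restricted prediction with a llama.cpp GBNF grammar via llama-cpp-python.
# The model path and the tiny schema below are illustrative assumptions.
from llama_cpp import Llama, LlamaGrammar

# A small GBNF grammar that only admits a {"name": ..., "age": ...} JSON object.
GRAMMAR = r'''
root   ::= "{" ws "\"name\"" ws ":" ws string "," ws "\"age\"" ws ":" ws number ws "}"
string ::= "\"" [a-zA-Z ]* "\""
number ::= [0-9]+
ws     ::= [ \t\n]*
'''

llm = Llama(model_path="llama-3.1-8b-instruct.Q4_K_M.gguf")  # hypothetical local checkpoint
grammar = LlamaGrammar.from_string(GRAMMAR)

out = llm(
    "Extract the person mentioned in: 'Alice is 30 years old.' as JSON:",
    grammar=grammar,   # decoding is constrained to tokens the grammar allows
    max_tokens=64,
)
print(out["choices"][0]["text"])  # output should conform to the grammar above
```

The wish here is essentially for this kind of constrained decoding to be a first-class, schema-driven feature rather than something stitched together per project.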
Fine-tuning
- Continued Pretraining: Maintaining the original data distribution during continued pre-training is tricky. A common approach is to mix sampled pre-training data with the new training data. If Llama models could provide access to a sampled pre-training dataset, this process would be a lot smoother and would help keep the distribution consistent.
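A minimal sketch of the mixing step, assuming a (currently hypothetical) sampled pre-training corpus is available alongside the new domain data; dataset names and the 70/30 ratio are illustrative assumptions.

```python
# Sketch: interleave a sampled pre-training corpus with new domain data at a fixed
# ratio so continued pre-training stays close to the original distribution.
from datasets import load_dataset, interleave_datasets

pretrain_sample = load_dataset("my-org/sampled-pretraining-corpus", split="train", streaming=True)
domain_data     = load_dataset("my-org/new-domain-corpus", split="train", streaming=True)

# Roughly 70% original-distribution examples, 30% new-domain examples.
mixed = interleave_datasets(
    [pretrain_sample, domain_data],
    probabilities=[0.7, 0.3],
    seed=42,
)

for i, example in enumerate(mixed):
    if i >= 3:
        break
    print(example["text"][:80])  # assumes both corpora expose a "text" column
```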
- Knowledge Distillation: The 405B model is amazing. I wish we could have an end-to-end (E2E) knowledge distillation tool/pipeline/API for fine-tuning smaller models on the token distributions of a teacher model.
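In the absence of such a pipeline, the core of token-level distillation is usually a KL term between temperature-scaled teacher and student distributions mixed with the ordinary cross-entropy loss. The sketch below shows that loss only; shapes, the temperature, and the 0.5 mixing weight are illustrative assumptions, not an official recipe.

```python
# Sketch: token-level distillation loss (teacher/student KL + ground-truth CE).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # student_logits, teacher_logits: (batch, seq_len, vocab); labels: (batch, seq_len)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kd + (1 - alpha) * ce

# Toy usage with random tensors standing in for real model outputs.
s = torch.randn(2, 8, 128)           # student logits
t = torch.randn(2, 8, 128)           # teacher (e.g. 405B) logits
y = torch.randint(0, 128, (2, 8))    # ground-truth token ids
print(distillation_loss(s, t, y).item())
```

The hard part the wish targets is not this loss but getting teacher token distributions out of a 405B model cheaply and at scale, which is why an official tool/API would help.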
- RS/DPO: It is good to know that Llama 3.1 has switched from RLHF to Rejection Sampling (RS) and Direct Preference Optimization (DPO) for preference optimization. It would be amazing to offer sample fine-tuning recipes or best practices that could help us fine-tune Llama 3.1 models effectively and avoid overfitting.
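For reference, the DPO objective itself is compact: it rewards the policy for increasing its log-probability margin over a frozen reference model on the chosen response relative to the rejected one. The sketch below takes per-response summed log-probabilities as inputs; beta=0.1 is an illustrative value, not a recommended hyperparameter.

```python
# Sketch: Direct Preference Optimization (DPO) loss over (chosen, rejected) pairs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_ratio   = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen margin - rejected margin)), averaged over the batch
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with random per-example log-probabilities.
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(loss.item())
```

The wish is less about the loss (libraries such as TRL already implement a DPO trainer) and more about Llama-specific recipes: data formats, reference-model choice, and hyperparameters that avoid overfitting.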
Agentic
- MCTS + LLM: There have been some cool attempts to use Monte Carlo Tree Search (MCTS) to boost the logical reasoning capabilities of LLMs. However, there is still a gap when it comes to tools or viable paths for tightly integrating MCTS with LLMs, aside from some rumored projects in closed-source models. Creating a robust tool or interface for this integration would be a huge win for AI agent developers.
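To make the shape of that integration concrete, here is a bare-bones MCTS skeleton in which an LLM is assumed to supply two callables: propose(state), returning candidate next reasoning steps, and score(state), returning a value estimate in [0, 1]. Both callables are placeholders; this is a sketch of the search loop, not a real integration.

```python
# Skeleton of an MCTS loop where an LLM would drive expansion and evaluation.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_state, propose, score, iterations=50):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: walk down by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: ask the LLM for candidate next reasoning steps.
        for step in propose(node.state):
            node.children.append(Node(node.state + [step], parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Evaluation: ask the LLM to judge the partial solution.
        reward = score(leaf.state)
        # 4. Backpropagation: update visit counts and values up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    best = max(root.children, key=lambda n: n.visits) if root.children else root
    return best.state

# Toy usage with stubbed-out "LLM" callables.
print(mcts([], propose=lambda s: [f"step{len(s)}a", f"step{len(s)}b"],
           score=lambda s: random.random()))
```

The missing piece the wish points at is the glue: batched LLM calls for propose/score, caching, and an interface that makes this loop practical at scale.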