Building Models & Improvement Wishes - Llama Stack RFC

Building Models

Meta wants to enable everyone to get the most out of the 405B model, supporting use cases such as:

  • Real-time and batch inference
  • Supervised fine-tuning
  • Evaluation of your model for your specific application
  • Continual pre-training
  • Retrieval-Augmented Generation (RAG)
  • Function calling
  • Synthetic data generation

RFC-0001 - Llama Stack · Issue #6 · meta-llama/llama-toolchain (github.com)

Model Improvement Wishes

Model Capabilities

  • Structured Output: Improved support for generating structured data (e.g., JSON) through restricted prediction.
  • Knowledge Distillation: Efficient tools and pipelines for transferring knowledge from larger models to smaller ones.

Fine-Tuning

  • Continued Pre-training: Easier access to sampled pre-training data for maintaining data distribution consistency.
  • Preference Optimization: Best practices and recipes for fine-tuning using Rejection Sampling (RS) and Direct Preference Optimization (DPO).

Model Architecture

  • Agentic Capabilities: Tools and interfaces for integrating Monte Carlo Tree Search (MCTS) with LLMs to enhance logical reasoning.

Discussion

Models

  • Restricted Prediction: It would be awesome to have native and efficient support for restricted prediction constrained by a pre-defined schema, similar to the JSON schema support in llama.cpp grammars. This could boost the usability of the models for generating reliable structured data.
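To make the request concrete, here is a minimal, self-contained sketch of what restricted prediction does at the logits level: at every decoding step, tokens not permitted by the target schema are masked out before sampling. The toy VOCAB, the GRAMMAR table, and the random logits_fn are hypothetical stand-ins; real implementations such as llama.cpp's GBNF grammars compile a schema into this kind of per-step token constraint.

```python
import numpy as np

# Toy vocabulary for illustration only; a real tokenizer has tens of
# thousands of sub-word tokens.
VOCAB = ['{', '"name"', '"age"', ':', ',', '"Alice"', '"Bob"', '42', '7', '}']

# Hypothetical "grammar": the tokens allowed at each position of a fixed
# {"name": <str>, "age": <int>} skeleton. A real implementation would walk a
# JSON-schema or GBNF grammar state machine instead of a fixed list.
GRAMMAR = [{'{'}, {'"name"'}, {':'}, {'"Alice"', '"Bob"'}, {','},
           {'"age"'}, {':'}, {'42', '7'}, {'}'}]

def constrained_decode(logits_fn, rng=np.random.default_rng(0)):
    generated = []
    for step_allowed in GRAMMAR:
        logits = logits_fn(generated)                      # unconstrained model scores
        mask = np.array([tok in step_allowed for tok in VOCAB])
        logits = np.where(mask, logits, -np.inf)           # forbid out-of-grammar tokens
        probs = np.exp(logits - logits[mask].max())
        probs /= probs.sum()
        generated.append(VOCAB[rng.choice(len(VOCAB), p=probs)])
    return ''.join(generated)

# Stand-in for a real model: random scores over the toy vocabulary.
print(constrained_decode(lambda ctx: np.random.randn(len(VOCAB))))
```

The output is always valid against the skeleton (e.g. {"name":"Alice","age":42}) regardless of how noisy the underlying model scores are, which is exactly the reliability benefit described above.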

Fine-tuning

  • Continued Pretraining: Maintaining the original data distribution during continued pre-training is tricky. A usual approach is to mix sampled pre-training data with new training data. If Llama models could provide access to a sampled pre-training dataset, it would make this process a lot smoother and ensure consistency (see the data-mixing sketch after this list).

  • Knowledge Distillation: The 405B model is amazing. I wish we could have an end-to-end (E2E) knowledge distillation tool/pipeline/API for fine-tuning smaller models on token distributions from teacher models (see the distillation-loss sketch after this list).

  • RS/DPO: It is good to know that Llama 3.1 has switched from RLHF to Rejection Sampling (RS) and Direct Preference Optimization (DPO) for preference optimization. It would be amazing to offer sample fine-tuning recipes or best practices to help us fine-tune Llama 3.1 models effectively and avoid overfitting (see the DPO loss sketch after this list).
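Here is a minimal sketch of the data-mixing approach mentioned in the continued pre-training item: replay a sample of the original pre-training distribution alongside the new domain corpus at a configurable ratio. The corpora, document contents, and the 30% replay ratio are placeholder assumptions, not anything Meta currently ships.

```python
import random

def mix_corpora(pretrain_sample, new_domain, replay_ratio=0.3, seed=0):
    """Mix replay examples from a (hypothetical) sampled pre-training set with
    new domain data so that roughly `replay_ratio` of the result is replay."""
    rng = random.Random(seed)
    n_replay = int(len(new_domain) * replay_ratio / (1.0 - replay_ratio))
    replay = rng.sample(pretrain_sample, min(n_replay, len(pretrain_sample)))
    mixed = new_domain + replay
    rng.shuffle(mixed)
    return mixed

# Toy usage with placeholder documents; the pre-training sample is exactly
# what the wish above asks Meta to provide.
pretrain_sample = [f"pretrain doc {i}" for i in range(1000)]
new_domain = [f"domain doc {i}" for i in range(700)]
mixed = mix_corpora(pretrain_sample, new_domain, replay_ratio=0.3)
print(len(mixed), mixed[:3])
```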
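The knowledge-distillation wish boils down to training a student on the teacher's per-token distributions. Below is a minimal PyTorch sketch of such a loss, with toy tensors standing in for real teacher (e.g. 405B) and student logits; the temperature and blending weight are illustrative assumptions, not a Meta-provided API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Blend KL divergence on temperature-softened token distributions with
    ordinary cross-entropy on the ground-truth next tokens."""
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),   # student in log-space
        F.softmax(teacher_logits / T, dim=-1),       # soft targets from the teacher
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kl + (1 - alpha) * ce

# Toy tensors standing in for real model outputs: batch=2, seq=4, vocab=32.
student = torch.randn(2, 4, 32, requires_grad=True)
teacher = torch.randn(2, 4, 32)
labels = torch.randint(0, 32, (2, 4))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(loss.item())
```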
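And to make the RS/DPO recipe request concrete, here is a minimal sketch of the standard DPO objective, computed from per-sequence log-probabilities under the policy and a frozen reference model. The numbers are placeholder values and beta = 0.1 is an illustrative choice, not a recommended setting for Llama 3.1.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO objective: prefer the chosen response by a margin measured
    against a frozen reference model."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy per-sequence log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -11.0, -10.2]),
                torch.tensor([-13.5, -10.0, -12.4, -11.1]),
                torch.tensor([-12.2, -9.8, -11.3, -10.5]),
                torch.tensor([-13.0, -9.9, -12.0, -10.9]))
print(loss.item())
```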

Agentic

  • MCTS + LLM: There have been some cool attempts to use Monte Carlo Tree Search (MCTS) to boost the logical reasoning capabilities of LLMs. However, there’s still a gap when it comes to tools or viable paths for tightly integrating MCTS with LLMs, except for some rumored projects in closed-source models. Creating a robust tool or interface for this integration would be a huge win for AI agent developers.
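A minimal sketch of the kind of integration meant here: plain UCB-based MCTS over partial reasoning traces, where propose_steps stands in for an LLM call that suggests candidate next steps and value stands in for a value model or self-evaluation prompt. Both stubs and the search hyperparameters are hypothetical placeholders, not an existing tool.

```python
import math
import random

def propose_steps(state, k=3):
    """Placeholder for an LLM call that proposes k candidate next reasoning steps."""
    return [f"{state}->s{random.randint(0, 9)}" for _ in range(k)]

def value(state):
    """Placeholder for a value model / self-evaluation scoring a partial trace."""
    return random.random()

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.total / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts(root_state, iterations=50):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: ask the "LLM" for candidate next reasoning steps.
        if node.visits > 0:
            node.children = [Node(s, node) for s in propose_steps(node.state)]
            if node.children:
                node = random.choice(node.children)
        # Simulation: score the partial trace with the "value model".
        reward = value(node.state)
        # Backpropagation.
        while node:
            node.visits += 1
            node.total += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).state

print(mcts("question"))
```

A real integration would replace the two stubs with model calls (step proposal and trace scoring), which is exactly the interface gap described above.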
