Hugging Face Model Trainer
by Hugging Face
Hugging Face Model Trainer is a community-built agent skill that brings professional-grade model fine-tuning directly into your AI coding assistant. Developed by the Hugging Face team, it enables AI agents running in Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor to orchestrate the entire lifecycle of language model training -- from dataset validation and hardware selection through job submission, real-time monitoring, and model publication on the Hugging Face Hub.

At its core, the skill leverages TRL (Transformer Reinforcement Learning), the industry-standard library for post-training foundation models. It supports three primary training paradigms: Supervised Fine-Tuning (SFT) for instruction-following with high-quality input-output pairs, Direct Preference Optimization (DPO) for aligning model outputs with human preferences using chosen-rejected response pairs, and Group Relative Policy Optimization (GRPO) for reinforcement learning on tasks with verifiable reward signals such as mathematical reasoning and code generation. Reward modeling for RLHF pipelines is also supported.

The skill abstracts away the complexity of GPU infrastructure by running training jobs on Hugging Face's managed cloud compute. It automatically selects appropriate hardware based on model size, from affordable T4 instances for sub-1B-parameter demos to A100 and H100 GPUs for production workloads of 13B parameters and beyond. For larger models, it automatically applies LoRA (Low-Rank Adaptation) to reduce memory consumption while preserving quality, and built-in cost estimation lets users preview expected expenses before committing GPU resources.

After training, the skill supports GGUF conversion with quantization for local deployment via llama.cpp, Ollama, or LM Studio. Real-time training metrics are streamed through Trackio dashboards, providing visibility into loss curves, learning rates, and estimated completion times.
All trained models are automatically pushed to the Hugging Face Hub with proper versioning and metadata, making them immediately available for inference or further iteration. With 8.1k GitHub stars on the parent repository and backing from Hugging Face's ecosystem of 1M+ models, this skill transforms any compatible AI coding agent into a capable machine learning engineering assistant.
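The three training paradigms above expect differently shaped dataset records. As a minimal sketch, the field names below follow TRL's standard dataset formats (prompt/completion for SFT, prompt/chosen/rejected for DPO), while `validate_record` is a hypothetical helper for illustration, not part of the skill:

```python
# Sketch of the record shapes the three training methods expect.
# Field names follow TRL's standard dataset formats; validate_record
# is an illustrative helper, not part of the skill itself.

REQUIRED_FIELDS = {
    "sft": {"prompt", "completion"},          # instruction -> response pairs
    "dpo": {"prompt", "chosen", "rejected"},  # preference pairs
    "grpo": {"prompt"},                       # rewards come from a verifier function
}

def validate_record(method: str, record: dict) -> bool:
    """Return True if the record carries every field the method requires."""
    return REQUIRED_FIELDS[method].issubset(record.keys())

sft_row = {"prompt": "Translate 'hello' to French.", "completion": "Bonjour"}

print(validate_record("sft", sft_row))  # True
print(validate_record("dpo", sft_row))  # False: missing chosen/rejected
```

Checking records against a schema like this before job submission is the kind of pre-flight validation the skill performs before any GPU is allocated.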
Key Features
- Supports three core training methods -- SFT for instruction tuning, DPO for preference alignment, and GRPO for reinforcement learning with verifiable rewards -- plus reward modeling for RLHF pipelines
- Automatic hardware selection and cost estimation across GPU tiers, from T4 ($1-2 for demos) through A10G and A100 to H100 instances for production-scale training
- Built-in LoRA and PEFT support that automatically engages for models above 3B parameters, reducing VRAM requirements by up to 60%, with optional Unsloth integration
- Real-time Trackio monitoring dashboards showing training loss, learning rate, validation metrics, and estimated completion time for running jobs
- GGUF conversion pipeline with quantization support for deploying fine-tuned models locally via llama.cpp, Ollama, or LM Studio
- Dataset validation that checks format compatibility before GPU allocation, catching the format mismatches responsible for over 50% of training failures
- Automatic model persistence to the Hugging Face Hub with proper metadata, making trained models immediately available for inference or sharing
- Cross-platform compatibility with Claude Code, OpenAI Codex, Google Gemini CLI, and Cursor, enabling the same fine-tuning workflow across different AI assistants
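The size-based hardware selection described above can be sketched as a simple lookup. The GPU tiers and the 3B LoRA threshold come from the feature list; the size cutoffs and hourly rates are hypothetical placeholders, not the skill's actual pricing:

```python
# Illustrative size-based hardware selection and cost estimation.
# Tiers match the GPUs named in the feature list; thresholds and
# hourly rates are assumed placeholders, not real pricing.

GPU_TIERS = [
    # (max model size in billions of params, GPU, assumed $/hour)
    (1.0, "t4", 0.50),
    (3.0, "a10g", 1.50),
    (13.0, "a100", 4.00),
    (float("inf"), "h100", 8.00),
]

def pick_hardware(model_params_b: float) -> tuple[str, float, bool]:
    """Return (gpu, hourly_rate, use_lora) for a model size in billions."""
    use_lora = model_params_b > 3.0  # LoRA engages above 3B, per the feature list
    for max_b, gpu, rate in GPU_TIERS:
        if model_params_b <= max_b:
            return gpu, rate, use_lora
    raise ValueError("unreachable: last tier is unbounded")

def estimate_cost(model_params_b: float, hours: float) -> float:
    """Preview the expected spend before committing GPU resources."""
    _, rate, _ = pick_hardware(model_params_b)
    return round(rate * hours, 2)

print(pick_hardware(0.5))   # ('t4', 0.5, False): sub-1B demo tier
print(estimate_cost(7, 3))  # 12.0: a100 at an assumed $4/h for 3 hours
```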
Use Cases
- Fine-tuning a small language model on domain-specific customer support conversations using SFT to create a specialized chatbot
- Aligning a code generation model with developer preferences using DPO training on chosen-rejected response pairs from code review data
- Training a math reasoning model using GRPO on benchmarks like GSM8K, where correctness can be programmatically verified
- Converting a fine-tuned model to GGUF format with Q4_K_M quantization for local deployment in air-gapped or privacy-sensitive environments
- Running low-cost demo training runs at under $1 to validate dataset formatting and pipeline configuration before committing to production-scale GPU hours
- Building a reward model for RLHF pipelines that scores response quality based on human preference annotations