Large language models today are static. Deploy one, and it stays fixed—unable to adapt, grow, or absorb new knowledge without risking the loss of everything it already knows. For businesses relying on AI to handle specialized tasks, this rigidity forces a costly workaround: maintaining separate models for each function, draining resources and complicating operations.
Researchers at MIT, the Improbable AI Lab, and ETH Zurich have introduced a solution. Called self-distillation fine-tuning (SDFT), the technique enables a single model to learn new capabilities—such as medical reasoning or legal analysis—without degrading its existing skills. Unlike traditional methods that either freeze knowledge or require brute-force retraining, SDFT leverages a model’s own reasoning to refine itself, mimicking how humans accumulate expertise over time.
The implications for enterprises are profound. Instead of juggling a fragmented collection of models—each optimized for a single purpose—companies could deploy a single, evolving AI system capable of handling diverse tasks. This could slash inference costs, streamline deployment, and eliminate the need for repeated retraining cycles.
The Core Problem: Why AI Can’t Learn Like Humans
Most AI training today follows one of two approaches, both with critical flaws. Supervised fine-tuning (SFT) trains models by feeding them static datasets of expert demonstrations. The result? A model that excels at its assigned task but often forgets how to perform older ones—a phenomenon called catastrophic forgetting. Worse, it struggles to generalize beyond the examples it was shown.
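The dynamic behind catastrophic forgetting can be sketched with a deliberately tiny toy model (a single scalar weight — nothing like the paper's actual setup): fine-tuning on a new task pulls the shared parameters away from the optimum for the old one, so old-task loss climbs even as new-task loss falls.

```python
# Toy illustration of catastrophic forgetting: one scalar parameter,
# two "tasks" whose optima pull it in opposite directions.
# Purely illustrative -- not the setup used in the SDFT paper.

def loss(w, target):
    return (w - target) ** 2

def grad(w, target):
    return 2 * (w - target)

w = 0.0
LR = 0.1

# Phase 1: fine-tune on task A (optimum at w = 1.0).
for _ in range(100):
    w -= LR * grad(w, 1.0)
loss_a_before = loss(w, 1.0)   # near zero: task A is mastered

# Phase 2: naive SFT on task B alone (optimum at w = -1.0).
for _ in range(100):
    w -= LR * grad(w, -1.0)
loss_a_after = loss(w, 1.0)    # task-A loss has blown up

print(f"task A loss before B: {loss_a_before:.6f}, after B: {loss_a_after:.4f}")
```

Because both tasks share the same weight, optimizing one overwrites the other — the scalar version of what happens across billions of parameters in a real model.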
The alternative, reinforcement learning (RL), rewards models for correct outputs. While effective for tasks with clear metrics—like solving math problems—RL fails when teaching entirely new information. Without prior knowledge, a model has no way to generate correct answers, meaning it never receives the positive feedback needed to learn. As one MIT researcher noted, a model with zero knowledge of a topic will never improve, no matter how many attempts it makes.
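The zero-knowledge failure mode is easy to see in a toy bandit-style simulation (a heuristic sketch, not any real RL training code): a reward-weighted update only fires after a success, so a model whose success probability starts at zero never receives a learning signal.

```python
import random

random.seed(0)

# Toy reward-only learner: p is the probability of producing the
# correct answer. The update is a simple reward-weighted nudge toward
# success -- a sketch of the RL failure mode, not a real algorithm.

def train(p, steps=1000, lr=0.5):
    for _ in range(steps):
        correct = random.random() < p       # sample an answer
        reward = 1.0 if correct else 0.0
        # Update fires only when reward > 0: no success, no learning.
        p = min(1.0, p + lr * reward * (1.0 - p))
    return p

p_some_knowledge = train(0.05)   # a little prior knowledge: RL bootstraps
p_zero_knowledge = train(0.0)    # zero knowledge: reward never arrives

print(p_some_knowledge, p_zero_knowledge)
```

With even a 5% starting success rate, occasional wins compound; starting from exactly zero, the learner is stuck forever — which is why RL alone can't inject genuinely new knowledge.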
How SDFT Breaks the Barrier
SDFT sidesteps these limitations by turning the model into its own teacher. The process works in two phases:

- Teacher Role: A frozen version of the model analyzes a query alongside expert demonstrations, deducing the correct reasoning path using in-context learning (ICL)—a technique where models solve new problems by studying examples provided during inference.
- Student Role: A separate, trainable version of the same model attempts the task blindly, without demonstrations. The teacher then evaluates its performance and guides it toward better reasoning.
This creates an on-policy learning loop: the model learns from its own mistakes, not just static data. Because the feedback comes from the model’s own reasoning—rather than an external reward function—SDFT can internalize new knowledge without overwriting old skills.
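The loop above can be sketched on a toy categorical "model" — a minimal, illustrative distillation step, not the paper's implementation. We assume the teacher is a frozen copy of the student's weights that sees a demonstration in context (modeled here as a logit boost toward the demonstrated answer), and the student is trained to match the teacher's distribution by minimizing KL divergence.

```python
import math

# Minimal sketch of a self-distillation step. All names and the logit-boost
# model of in-context learning are illustrative assumptions, not SDFT's code.

ANSWERS = ["A", "B", "C"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Shared weights: per-answer logits. The student trains these; the teacher
# is a frozen copy taken at the start of each step.
student_logits = [0.0, 0.0, 0.0]

def teacher_distribution(frozen_logits, demo_answer):
    # The frozen teacher "reads" the expert demonstration in context,
    # modeled here as a fixed logit boost toward the demonstrated answer.
    boosted = list(frozen_logits)
    boosted[ANSWERS.index(demo_answer)] += 2.0
    return softmax(boosted)

LR = 0.5
for step in range(50):
    frozen = list(student_logits)                  # freeze the teacher copy
    target = teacher_distribution(frozen, "B")     # teacher: query + demo
    pred = softmax(student_logits)                 # student: query alone
    # Gradient of KL(target || softmax(logits)) w.r.t. logits is pred - target.
    for i in range(len(ANSWERS)):
        student_logits[i] -= LR * (pred[i] - target[i])

final_probs = dict(zip(ANSWERS, softmax(student_logits)))
print({k: round(v, 3) for k, v in final_probs.items()})
```

Because the target distribution comes from the model's own (frozen) reasoning over the demonstration rather than from an external label or reward, the student is pulled toward behavior already consistent with its weights — the intuition behind why self-distillation disturbs existing skills less than SFT.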
Real-World Tests: Stability Meets Performance
To test its effectiveness, the team evaluated SDFT on the Qwen 2.5 model across three demanding enterprise scenarios: science question-answering, software tool use, and medical reasoning. The results were striking.
- Science Q&A: SDFT achieved 70.2% accuracy, outperforming standard SFT (66.2%). More importantly, it retained 64.5% performance on unrelated tasks—whereas SFT collapsed entirely when learning new skills.
- Knowledge Injection: When taught fictional 2025 natural disaster data, SDFT scored 98% on indirect reasoning (e.g., predicting humanitarian aid needs). Standard SFT memorized facts but failed to apply them logically.
- Sequential Learning: After mastering science, tool use, and medical tasks in sequence, SDFT accumulated all three skills without regression. Traditional methods oscillated, losing ground each time a new task was added.
For businesses, this means a single model could handle everything from HR queries to legal drafting—growing over time without requiring a full rebuild.
Limitations and the Road Ahead
SDFT isn’t without trade-offs. The method demands 2.5x more compute than standard fine-tuning and is currently optimized for models with at least 4 billion parameters (though the team expects 1-billion-parameter models to work soon). Smaller architectures lack the in-context learning sophistication needed to act as their own teachers.
Yet the long-term vision is clear: AI that improves through use, not just initial training. As compute shifts from training to inference—where models interact with real-world data—the potential to harness those interactions for continuous learning becomes critical. If SDFT or similar methods take hold, the era of static AI models may finally be over.
For now, the technique is available on GitHub, with integration planned for Hugging Face’s Transformer Reinforcement Learning (TRL) library. Enterprises with dynamic needs—and the compute to support it—may soon have a powerful new tool for building AI that truly learns.