AntAngelMed: The Open-Source Medical AI Model Pushing Efficiency Boundaries with MoE Architecture

A New Milestone in Open-Source Medical AI

A team of researchers from China has unveiled AntAngelMed, an open-source medical large language model that they describe as the largest and most capable of its kind to date. With a total of 103 billion parameters, the model introduces a novel approach to balancing performance and computational cost, potentially transforming how medical AI is deployed in resource-constrained settings.


What Makes AntAngelMed Different? The MoE Architecture

Unlike conventional dense models, in which every parameter participates in processing every input, AntAngelMed employs a Mixture-of-Experts (MoE) architecture with a remarkably low 1/32 activation ratio: only about 6.1 billion of its 103 billion parameters are active for any given token, while the rest remain dormant.

To understand the significance, it helps to compare how MoE works. In a typical dense network, all parameters process every token, so computational cost grows with total model size. An MoE model instead splits its feed-forward layers into many 'expert' sub-networks, and a routing mechanism selects only a small subset of experts for each token. The model can therefore keep a large total parameter count (which correlates with knowledge capacity) while inference cost scales with the much smaller active parameter count. According to the research team, with only 6.1B active parameters AntAngelMed matches the performance of a dense model of roughly 40 billion parameters, and as output length grows during inference its speed advantage over dense models of comparable size can exceed 7×.
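To make the routing idea concrete, here is a minimal, self-contained top-k MoE layer in PyTorch. It is a sketch only: the layer sizes, expert count, and top-k value are placeholders and do not reflect AntAngelMed's actual router or its 1/32 activation ratio.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative)."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)                # mix only the chosen experts
        out = torch.zeros_like(x)
        # Only the k selected experts run for each token; all others stay dormant,
        # so compute scales with active parameters rather than total parameters.
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 64)      # a batch of 8 token embeddings
print(TopKMoE()(tokens).shape)   # -> torch.Size([8, 64])
```

With k=2 of 8 experts, each token exercises only a quarter of the expert parameters per forward pass, which is the same capacity-versus-cost trade-off AntAngelMed pushes to a 1/32 ratio.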

Built on Ling-flash-2.0 with Custom Optimizations

AntAngelMed inherits its core design from Ling-flash-2.0, a base model developed by inclusionAI and guided by the team's Ling Scaling Laws. The researchers layered several custom optimizations on top of this foundation.

These enhancements together enable small-activation MoE models to rival much larger dense models in performance while being significantly more efficient.

The model is available on ModelScope: AntAngelMed on ModelScope.


Three-Stage Training Pipeline

AntAngelMed underwent a carefully designed three-stage training process that builds general language understanding and then deeply adapts to the medical domain.

Stage 1: Continual Pre-Training on Medical Corpora

The first stage involved continual pre-training on large-scale medical data sources, including encyclopedias, web text, and academic publications. Starting from the Ling-flash-2.0 checkpoint, the model already had a strong foundation in general reasoning. This stage refines its capabilities specifically for medical knowledge, absorbing terminology, clinical guidelines, and biomedical literature.
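Mechanically, continual pre-training just resumes next-token training from an existing checkpoint on new domain text. The snippet below shows that loop in miniature with Hugging Face transformers; the tiny public "sshleifer/tiny-gpt2" checkpoint and the one-sentence "corpus" are stand-ins, since AntAngelMed's actual base checkpoint, data mixture, and training setup are far larger and are not reproduced here.

```python
# Continual pre-training in miniature: resume from a pretrained checkpoint
# and keep minimizing the causal (next-token) loss, now on medical text.
# "sshleifer/tiny-gpt2" stands in for the real Ling-flash-2.0 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

batch = tok("Myocardial infarction is caused by ...", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])  # standard causal LM loss
out.loss.backward()
optimizer.step()
print(float(out.loss))
```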

Stage 2: Supervised Fine-Tuning (SFT) with Multi-Source Instructions

In the second stage, the model underwent Supervised Fine-Tuning on a diversified instruction dataset that deliberately mixes general reasoning tasks – such as math, programming, and logic – with specialized medical scenarios, preserving the model's chain-of-thought capabilities while teaching domain-specific behavior.

This balanced approach ensures AntAngelMed remains proficient in logical reasoning while gaining expert-level medical understanding.
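A toy sketch of that mixing step, assuming a simple per-batch sampling ratio (the article does not disclose the real proportions, and the instruction strings below are invented examples):

```python
import random

# Hypothetical instruction pools; the real SFT dataset is far richer.
general = ["Solve step by step: 17 * 23",
           "Write a function that reverses a list"]
medical = ["Summarize the contraindications of warfarin",
           "Given these symptoms, list differential diagnoses"]

def sample_sft_batch(general, medical, medical_ratio=0.5, n=8,
                     rng=random.Random(0)):
    """Mix general-reasoning and medical instructions in one SFT batch,
    so chain-of-thought skill is retained while domain depth grows.
    The 50/50 ratio is a placeholder, not the paper's value."""
    return [rng.choice(medical if rng.random() < medical_ratio else general)
            for _ in range(n)]

print(sample_sft_batch(general, medical))
```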

Stage 3: Reinforcement Learning with GRPO

The final stage employs Reinforcement Learning using the GRPO (Group Relative Policy Optimization) algorithm. Combined with task-specific reward models, GRPO fine-tunes the model to produce more accurate, safe, and clinically useful responses. Reinforcement learning helps the model generalize beyond its training data and avoid common pitfalls in medical AI, such as hallucination or unsafe advice.
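The core of GRPO is easy to state: sample a group of responses for each prompt, score each one with a reward model, and normalize every reward against its own group's mean and standard deviation, which removes the need for a learned value network. A minimal sketch of that advantage computation follows; the reward values are made-up numbers for illustration.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each sampled response is judged against
    the mean/std of its own group, replacing PPO's learned critic."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, a group of 4 sampled answers scored by a task-specific
# reward model (values invented for illustration).
rewards = torch.tensor([[0.9, 0.2, 0.5, 0.4]])
print(grpo_advantages(rewards))  # above-average answers get positive advantage
```

These advantages then weight a clipped policy-gradient update, as in PPO, nudging the model toward the higher-reward responses within each group.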

By the end of this training pipeline, AntAngelMed emerges as a highly capable medical language model that balances massive knowledge capacity with practical inference efficiency – a combination that could make advanced AI assistance more accessible in healthcare settings worldwide.

For those interested in the technical details, the official ModelScope repository contains the model weights and additional documentation.
