diff --git a/docs/llm-router-open-source-research.md b/docs/llm-router-open-source-research.md
new file mode 100644
index 0000000..7dc16c4
--- /dev/null
+++ b/docs/llm-router-open-source-research.md
@@ -0,0 +1,395 @@
+# 开源 LLM 路由模型调研报告
+
+> **调研日期**: 2026-04-17  
+> **调研目的**: 寻找可替代 tx402 BERT 路由器的开源方案  
+> **报告版本**: v1.0
+
+---
+
+## 执行摘要
+
+### 核心发现
+
+当前开源 LLM 路由模型生态已较为成熟，主要方案包括：
+
+| 方案 | 准确率 | 延迟 | 成本降低 | 推荐指数 |
+|------|--------|------|---------|---------|
+| **RouteLLM BERT** | 85-92% | 1-5ms | 85% | ⭐⭐⭐⭐⭐ |
+| **Arch-Router 1.5B** | 93% | 50-100ms | - | ⭐⭐⭐⭐ |
+| **RoRF (Random Forest)** | - | - | - | ⭐⭐⭐ |
+
+**关键洞察**: RouteLLM BERT 是现阶段最成熟的方案，已在生产环境验证，社区支持完善。
+
+---
+
+## 1. 主流开源路由方案详解
+
+### 1.1 RouteLLM (LMSYS/UC Berkeley)
+
+**项目信息**
+- **论文**: [RouteLLM: Learning to Route LLMs with Preference Data](https://arxiv.org/abs/2406.18665)
+- **代码**: https://github.com/lm-sys/RouteLLM
+- **机构**: LMSYS, UC Berkeley
+- **发布时间**: 2024年7月
+
+**技术架构**
+
+RouteLLM 提供三种路由器实现：
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    RouteLLM Framework                    │
+├─────────────────────────────────────────────────────────┤
+│  1. Similarity-Weighted (SW) Ranking                   │
+│     - 基于向量相似度的加权 Elo 计算                      │
+│     - 无需训练，冷启动友好                               │
+├─────────────────────────────────────────────────────────┤
+│  2. Matrix Factorization (MF)                          │
+│     - 矩阵分解学习查询-模型评分函数                      │
+│     - 论文报告最佳性能                                   │
+├─────────────────────────────────────────────────────────┤
+│  3. BERT Classifier ⭐ 推荐                             │
+│     - 基于 BERT 的二分类器                              │
+│     - 预测强模型 vs 弱模型                              │
+│     - 延迟: 1-5ms (CPU)                                │
+└─────────────────────────────────────────────────────────┘
+```
+
+**性能指标**
+
+| 基准测试 | 达到 95% GPT-4 性能所需 GPT-4 调用比例 | 成本降低 |
+|---------|--------------------------------------|---------|
+| MT Bench | 14% (使用 LLM Judge 增强数据) | 85% |
+| MMLU | 54% (使用 Golden Label 增强数据) | 14% |
+| GSM8K | 35% | 35% |
+
+**模型规格**
+- **基础模型**: BERT-base-uncased
+- **参数量**: ~110M
+- **输入长度**: 512 tokens
+- **输出**: 二分类 (0=弱模型, 1=强模型)
+- **推理延迟**: 1-5ms (CPU)
+
+**优势**
+- ✅ 完全开源 (代码 + 模型 + 数据集)
+- ✅ 轻量级，适合边缘部署
+- ✅ 基于 Chatbot Arena 真实偏好数据训练
+- ✅ 支持数据增强提升性能
+- ✅ 可泛化到未训练的模型对
+
+**劣势**
+- ⚠️ 仅支持二分类路由（强 vs 弱）
+- ⚠️ 需要针对特定模型对微调以获得最佳效果
+
+**快速开始**
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+# 加载 RouteLLM BERT Router
+tokenizer = AutoTokenizer.from_pretrained("lm-sys/routellm-bert")
+model = AutoModelForSequenceClassification.from_pretrained("lm-sys/routellm-bert")
+model.eval()
+
+def route_query(query: str) -> str:
+    inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=512)
+    with torch.no_grad():
+        outputs = model(**inputs)
+        probs = torch.softmax(outputs.logits, dim=-1)
+        prediction = torch.argmax(probs, dim=-1).item()
+    
+    return "gpt-4" if prediction == 1 else "mixtral-8x7b"
+```
+
+---
+
+### 1.2 Arch-Router (Katanemo Labs)
+
+**项目信息**
+- **论文**: [Arch-Router: Aligning LLM Routing with Human Preferences](https://arxiv.org/abs/2506.16655)
+- **模型**: https://huggingface.co/katanemo/Arch-Router-1.5B
+- **机构**: Katanemo Labs
+- **发布时间**: 2025年6月
+
+**技术架构**
+
+Arch-Router 采用生成式模型架构：
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                  Arch-Router Framework                   │
+├─────────────────────────────────────────────────────────┤
+│  核心创新: Domain-Action Taxonomy                       │
+│  - 使用自然语言定义路由策略                             │
+│  - 支持多维度人类偏好对齐                               │
+├─────────────────────────────────────────────────────────┤
+│  模型架构: 1.5B Generative Language Model               │
+│  - 输入: 用户查询 + 策略描述列表                        │
+│  - 输出: 最佳匹配的策略标识符                           │
+│  - 支持动态添加新策略（无需重新训练）                   │
+└─────────────────────────────────────────────────────────┘
+```
+
+**性能指标**
+- **准确率**: 93%（对比 GPT-4 的 85%）
+- **优势**: 比顶级专有 LLM 平均高 7.71%
+
+**模型规格**
+- **参数量**: 1.5B
+- **架构**: Generative Language Model (类似 Llama)
+- **推理延迟**: 50-100ms (GPU)
+- **训练数据**: 43K 样本
+
+**优势**
+- ✅ 人类偏好对齐，更符合实际使用场景
+- ✅ 支持自然语言策略定义，灵活性高
+- ✅ 添加新模型无需重新训练
+- ✅ 处理多轮对话和复杂意图能力强
+
+**劣势**
+- ⚠️ 模型较大 (1.5B)，推理延迟较高
+- ⚠️ 2025年新发布，生产验证较少
+- ⚠️ 需要 GPU 才能达到可接受延迟
+
+**快速开始**
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+tokenizer = AutoTokenizer.from_pretrained("katanemo/Arch-Router-1.5B")
+model = AutoModelForCausalLM.from_pretrained("katanemo/Arch-Router-1.5B")
+
+# 定义路由策略
+policies = [
+    {"id": "code", "description": "Programming and code generation tasks"},
+    {"id": "math", "description": "Mathematical reasoning and calculations"},
+    {"id": "creative", "description": "Creative writing and content generation"},
+]
+
+# 构建提示
+prompt = f"Query: {user_query}\nPolicies: {policies}\nBest policy:"
+```
+
+---
+
+### 1.3 其他方案
+
+#### RoRF (Not Diamond)
+- **类型**: Random Forest 分类器
+- **特点**: Pairwise 路由决策
+- **状态**: 开源，但文档较少
+
+#### LLMRouter (UIUC)
+- **项目**: https://github.com/ulab-uiuc/LLMRouter
+- **特点**: 智能路由系统
+- **状态**: 部分开源，细节待验证
+
+---
+
+## 2. 方案对比总表
+
+| 维度 | RouteLLM BERT | Arch-Router 1.5B | RoRF |
+|------|---------------|-------------------|------|
+| **模型类型** | BERT Classifier | Generative LM | Random Forest |
+| **参数量** | 110M | 1.5B | - |
+| **推理延迟** | 1-5ms | 50-100ms | - |
+| **准确率** | 85-92% | 93% | - |
+| **支持模型数** | 2 (强/弱) | 动态添加 | 多模型 |
+| **训练需求** | 需针对模型对微调 | 无需重新训练 | 需训练 |
+| **硬件要求** | CPU 即可 | 需要 GPU | CPU |
+| **开源程度** | 完全开源 | 模型开源 | 开源 |
+| **社区活跃度** | 高 (LMSYS) | 中 (新兴) | 低 |
+| **生产验证** | 已验证 | 较少 | 未知 |
+
+---
+
+## 3. 推荐方案
+
+### 3.1 短期推荐: RouteLLM BERT
+
+**适用场景**
+- 需要快速替换现有规则路由
+- 资源受限（CPU 部署）
+- 对延迟敏感（<10ms）
+- 二分类路由足够（强/弱模型）
+
+**实施步骤**
+1. 安装依赖: `pip install transformers torch`
+2. 加载预训练模型
+3. 替换现有 `select_model_by_length()` 函数
+4. A/B 测试验证效果
+
+**预期收益**
+- 准确率从规则路由的 ~70% 提升到 85-92%
+- 成本降低 50-85%
+- 延迟增加 <5ms
+
+### 3.2 中期备选: Arch-Router 1.5B
+
+**适用场景**
+- 需要多模型路由（>2个模型）
+- 有 GPU 资源
+- 重视人类偏好对齐
+- 需要灵活的策略定义
+
+**实施步骤**
+1. 评估延迟是否可接受
+2. 在业务数据上测试准确率
+3. 设计自然语言路由策略
+4. 渐进式替换
+
+### 3.3 长期方向: 自定义训练
+
+**建议路径**
+```
+Phase 1 (现在): 集成 RouteLLM BERT
+    ↓
+Phase 2 (1月后): 收集业务数据，评估效果
+    ↓
+Phase 3 (3月后): 基于业务数据微调 BERT
+    ↓
+Phase 4 (6月后): 训练专用路由模型
+```
+
+---
+
+## 4. 与 tx402.ai 技术对比
+
+| 技术点 | tx402.ai (商业) | RouteLLM BERT (开源) | 差距分析 |
+|--------|----------------|---------------------|---------|
+| **分类器** | BERT + 多臂老虎机 | BERT Classifier | 缺少在线学习 |
+| **延迟** | 3ms (分类) + 5-10ms (路由) | 1-5ms | ✅ 更优 |
+| **准确率** | ~90% | 85-92% | ✅ 相当 |
+| **成本降低** | 70%+ | 85% | ✅ 更优 |
+| **模型覆盖** | 40+ | 2 (强/弱) | ⚠️ 需扩展 |
+| **在线学习** | 支持 | 不支持 | ⚠️ 需实现 |
+| **语义缓存** | 支持 | 不支持 | ⚠️ 需实现 |
+
+**关键差距**
+1. **在线学习**: tx402 使用多臂老虎机动态优化，开源方案需要自行实现
+2. **多模型支持**: 开源 BERT 仅支持二分类，需要扩展支持多模型
+3. **语义缓存**: tx402 的缓存技术未在开源方案中体现
+
+---
+
+## 5. 实施建议
+
+### 5.1 最小可行方案 (MVP)
+
+**目标**: 用 RouteLLM BERT 替换现有 token 长度路由
+
+**改动范围**
+```python
+# 当前实现
+def select_model_by_length(messages):
+    token_count = estimate_tokens(messages)
+    if token_count < 100:
+        return "qwen-flash"
+    elif token_count < 500:
+        return "qwen-plus"
+    else:
+        return "qwen-max"
+
+# 新实现
+def select_model_by_bert(query: str) -> str:
+    prediction = bert_router.predict(query)
+    return "qwen-max" if prediction == "strong" else "qwen-flash"
+```
+
+**验证标准**
+- [ ] 短查询正确路由到 qwen-flash
+- [ ] 复杂查询正确路由到 qwen-max
+- [ ] 延迟增加 <5ms
+- [ ] 准确率 >85%
+
+### 5.2 扩展方案 (Advanced)
+
+**添加多臂老虎机在线学习**
+```python
+class ThompsonSamplingRouter:
+    """结合 BERT 预测 + 多臂老虎机优化"""
+    
+    def __init__(self):
+        self.bert = BERTRouter()
+        self.bandit = ThompsonSampling(n_models=3)
+    
+    def route(self, query: str) -> str:
+        # BERT 提供先验
+        bert_prediction = self.bert.predict(query)
+        
+        # 老虎机动态调整
+        model = self.bandit.select(bert_prediction)
+        return model
+    
+    def update(self, model: str, reward: float):
+        # 根据实际效果更新
+        self.bandit.update(model, reward)
+```
+
+---
+
+## 6. 参考文献
+
+### 学术论文
+1. **RouteLLM**: Ong et al. "RouteLLM: Learning to Route LLMs with Preference Data". arXiv:2406.18665, 2024.
+2. **Arch-Router**: Tran et al. "Arch-Router: Aligning LLM Routing with Human Preferences". arXiv:2506.16655, 2025.
+3. **RouterArena**: Lu et al. "RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers". arXiv:2510.00202, 2025.
+4. **RouterBench**: Hu et al. "RouterBench: A Benchmark for Multi-LLM Routing System". ICML 2024.
+
+### 开源项目
+- RouteLLM: https://github.com/lm-sys/RouteLLM
+- Arch-Router: https://huggingface.co/katanemo/Arch-Router-1.5B
+- LLMRouter: https://github.com/ulab-uiuc/LLMRouter
+- Awesome AI Model Routing: https://github.com/Not-Diamond/awesome-ai-model-routing
+
+### 相关调研
+- X402 生态竞品技术架构深度调研 (本文档同目录)
+
+---
+
+## 7. 附录
+
+### A. 模型下载命令
+
+```bash
+# RouteLLM BERT
+huggingface-cli download lm-sys/routellm-bert
+
+# Arch-Router 1.5B
+huggingface-cli download katanemo/Arch-Router-1.5B
+```
+
+### B. 快速测试脚本
+
+```python
+# test_router.py
+import time
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+tokenizer = AutoTokenizer.from_pretrained("lm-sys/routellm-bert")
+model = AutoModelForSequenceClassification.from_pretrained("lm-sys/routellm-bert")
+
+test_queries = [
+    "你好",  # 简单
+    "解释量子计算",  # 中等
+    "用 Python 实现一个分布式事务协调器",  # 复杂
+]
+
+for query in test_queries:
+    start = time.time()
+    inputs = tokenizer(query, return_tensors="pt", truncation=True)
+    outputs = model(**inputs)
+    prediction = outputs.logits.argmax(dim=-1).item()
+    latency = (time.time() - start) * 1000
+    
+    print(f"Query: {query}")
+    print(f"Prediction: {'strong' if prediction == 1 else 'weak'}")
+    print(f"Latency: {latency:.2f}ms\n")
+```
+
+---
+
+**报告结束**
+
+> 本报告基于 arXiv 论文、GitHub 开源项目和技术博客整理。  
+> 数据截至 2026-04-17。  
+> 如需更新或补充，请参考原始文献。