Ollama + LlamaIndex in Practice: Building an LLM Decision Router from Scratch

28.7's Blog · 2026-02-22

Tutorial: Building an LLM-Driven Router Module

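An LLM-driven router module works like the dispatcher of an AI system: based on a semantic understanding of the user's question, it sends the question to the most suitable "handler" (a dedicated knowledge base, tool, function, or submodule). Rule-based routing cannot do this, because it only sees the surface wording of the question.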

I. Core Value: Why Use an LLM for Routing?

1.1 The Pain Points of Traditional Routing

Without an LLM router, an AI system with several data sources or tools is left with two inefficient options:

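A single catch-all handler: one general knowledge base answers every question (e.g., the apple knowledge base answers a question about oranges), so answers are off-topic or inaccurate;
Exhaustive traversal: every data source or tool processes the question, which costs compute and time and still requires filtering out the irrelevant results.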

1.2 Advantages of an LLM Router

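Dimension	Traditional routing (keywords / if-else)	LLM routing
Semantic understanding	Matches surface text only; no intent recognition	Understands the actual intent and matches it precisely
Input tolerance	Handles only standardized phrasing	Handles vague, colloquial, or poorly formed questions
Cost and efficiency	Often falls back to querying every handler, with many wasted calls	Calls only the selected handler, at the price of one extra LLM call for the routing decision
Extensibility	Every new handler means hand-editing rules; maintenance cost grows quickly	Only the option list needs updating
Intent routing	Routes on surface wording	Routes on the underlying intent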

Example comparison

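Traditional routing: for the question "Which fruit's peel is used to make chenpi?", a keyword router sees only words like "peel" and "chenpi" and cannot map the question to the citrus knowledge base;
LLM routing: knows that chenpi is made from dried citrus peel and routes the question to the orange handler.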
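To make the contrast concrete, here is a minimal sketch of the keyword-style router described above. The rule table `KEYWORD_RULES` and the function `keyword_route` are hypothetical illustrations, not part of the original project. Because no rule mentions chenpi, the query falls through even though its answer belongs to the citrus knowledge base.

```python
from typing import Optional

# Hypothetical keyword rules: each knowledge base is triggered by literal words.
KEYWORD_RULES = {
    "apples": ["apple", "apples"],
    "oranges": ["orange", "oranges", "citrus"],
}


def keyword_route(query: str) -> Optional[str]:
    """Return the first knowledge base whose keyword appears in the query, else None."""
    words = query.lower().replace("?", " ").split()
    for target, keywords in KEYWORD_RULES.items():
        if any(k in words for k in keywords):
            return target
    return None  # no rule matched: the question cannot be routed


print(keyword_route("What color are oranges?"))                    # oranges
print(keyword_route("Which fruit's peel is used to make chenpi?"))  # None
```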

1.3 Typical Use Cases

An LLM router is a must-have module in complex AI systems. Typical scenarios include:

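Enterprise knowledge-base Q&A: route to dedicated product, after-sales, or finance knowledge bases;
AI agent systems: decide whether to call a calculator, a weather API, or a document-retrieval tool;
Multi-model collaboration: route simple questions to a lightweight model (to save cost) and complex ones to a high-performance model (to preserve quality).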
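In each of these scenarios the router itself only outputs an option number; the caller still needs a table that turns that number into an action. Below is a minimal sketch for the multi-model case, assuming the routing LLM has already answered 1 (simple) or 2 (complex). `MODEL_BY_CHOICE` and `pick_model` are hypothetical names, and `qwen2.5:32b` is an assumed stand-in for whatever larger model you have pulled.

```python
from typing import Dict

# Hypothetical mapping from a router's 1-based option number to an Ollama model tag.
MODEL_BY_CHOICE: Dict[int, str] = {
    1: "qwen2.5:7b",   # option 1: simple question -> lightweight model
    2: "qwen2.5:32b",  # option 2: complex question -> stronger model (assumed tag)
}


def pick_model(choice: int) -> str:
    """Map a router choice to a model tag, rejecting numbers outside the table."""
    if choice not in MODEL_BY_CHOICE:
        raise ValueError(f"choice {choice} is not one of {sorted(MODEL_BY_CHOICE)}")
    return MODEL_BY_CHOICE[choice]


print(pick_model(1))  # qwen2.5:7b
print(pick_model(2))  # qwen2.5:32b
```

Keeping the mapping in one dictionary means adding a handler is a one-line change, which is the extensibility point made in the table above.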

II. Hands-On Project: Building an LLM Router from Scratch

2.1 Environment Setup and Dependencies

First, install the dependencies (this tutorial uses Ollama as the local LLM engine):

pip install llama-index-readers-file pymupdf llama-index-core llama-index-llms-ollama llama-index-embeddings-ollama
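The pip packages cover only the Python side: the code in this tutorial also assumes an Ollama server at its default address `http://localhost:11434` with `qwen2.5:7b` and `nomic-embed-text:latest` already pulled (for example with `ollama pull qwen2.5:7b`). A small pre-flight check sketch, relying on Ollama's `GET /api/tags` endpoint that lists local models; the helper names are hypothetical.

```python
import http.client
import json
import urllib.request
from typing import List


def list_local_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Return model names reported by Ollama's /api/tags endpoint, or [] if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and timeouts are OSError subclasses; invalid JSON raises ValueError
        return []
    if not isinstance(data, dict):
        return []
    return [m.get("name", "") for m in data.get("models", []) if isinstance(m, dict)]


def missing_models(required: List[str], available: List[str]) -> List[str]:
    """Return the required models that are not in the local model list."""
    return [name for name in required if name not in available]


if __name__ == "__main__":
    needed = ["qwen2.5:7b", "nomic-embed-text:latest"]
    for name in missing_models(needed, list_local_models()):
        print(f"missing model, try: ollama pull {name}")
```

If the server is not running, every model is reported as missing, which is also a useful hint.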

2.2 Basic Version: Generating a Structured Router Prompt

Core idea: define the routing options and use the prompt to make the LLM return only an option number; that number is the routing decision.

from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama

# Define the routing options
choices = [
    "Useful for questions related to apples",
    "Useful for questions related to oranges",
]

# Format the options as a numbered string
def get_choice_str(choices):
    choices_str = "\n\n".join(
        [f"{idx+1}. {c}" for idx, c in enumerate(choices)]
    )
    return choices_str

choices_str = get_choice_str(choices)

# Strict routing prompt: the model should return only the option number
router_prompt = PromptTemplate(
    """
    Given the question below, choose the most suitable option and return only its number.
    Question: {query_str}
    Options:
    {choices_str}
    """
)

# Initialize the Ollama LLM (needs a running Ollama server with qwen2.5:7b pulled)
llm = Ollama(model="qwen2.5:7b", request_timeout=60.0)
# Test query
query = "What color are oranges?"

# Fill in the prompt and call the LLM
filled_prompt = router_prompt.format(
    query_str=query,
    choices_str=choices_str
)
response = llm.complete(filled_prompt)
print("Option number chosen by the LLM:", response.text.strip())

Basic version routing result
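The basic version prints `response.text.strip()` as-is. In practice a model may answer `2.` or `Option 2`, and converting that text with `int(...)` would raise `ValueError`. A lenient fallback sketch (the helper `extract_choice` is hypothetical): take the first integer in the reply that is a valid option number.

```python
import re
from typing import Optional


def extract_choice(text: str, num_choices: int) -> Optional[int]:
    """Return the first integer in `text` that is a valid 1-based option, else None."""
    for match in re.finditer(r"\d+", text):
        value = int(match.group())
        if 1 <= value <= num_choices:
            return value
    return None  # nothing usable: let the caller decide how to fail


print(extract_choice("2", 2))           # 2
print(extract_choice("Option 2.", 2))   # 2
print(extract_choice("I choose 3", 2))  # None
```

Because any number in an explanation can be picked up, this suits a fallback better than a primary parser; the JSON approach in the next version is the stricter option.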

2.3 Advanced Version: Constraining the LLM Output Format (JSON Parsing)

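Problem addressed: the LLM may wrap its answer in explanatory text, which breaks parsing. Requiring JSON output and reading it with a dedicated parser keeps the output under control.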

from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama
from llama_index.core.types import BaseOutputParser
import json

choices = [
    "Useful for questions related to apples",
    "Useful for questions related to oranges",
]

def get_choice_str(choices):
    choices_str = "\n\n".join(
        [f"{idx+1}. {c}" for idx, c in enumerate(choices)]
    )
    return choices_str

choices_str = get_choice_str(choices)

# Upgraded prompt: ask for a JSON string only
router_prompt = PromptTemplate(
    """
    Given the question below, choose the most suitable option and return the result strictly in the specified format (a JSON string only, nothing else).
    Question: {query_str}
    Options:
    {choices_str}

    Output format (JSON):
    {"choice": <selected option number, e.g. 1 or 2>}
    """
)

# Custom output parser
class SimpleRouterParser(BaseOutputParser):
    def __init__(self, choices: list):
        self.choices = choices

    def parse(self, output: str) -> int:
        try:
            # Try to parse the output as JSON
            output_dict = json.loads(output.strip())
            print(f"LLM JSON output: {output_dict}")
            choice_num = int(output_dict["choice"])

            # Validate the option range
            if choice_num < 1 or choice_num > len(self.choices):
                raise ValueError(f"Option number {choice_num} is out of range (1-{len(self.choices)})")

            return choice_num

        # Fallback for a bare-number reply. TypeError is needed because
        # json.loads("2") returns the int 2, which cannot be indexed with ["choice"]
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as e:
            try:
                choice_num = int(output.strip())
                if choice_num < 1 or choice_num > len(self.choices):
                    raise ValueError(f"Option number {choice_num} is out of range (1-{len(self.choices)})")
                return choice_num
            except ValueError as inner:
                raise RuntimeError(f"Parsing failed: {inner}; raw output: {output}") from e

# Initialize the LLM and run the router
llm = Ollama(model="qwen2.5:7b", request_timeout=60.0)
query = "What color are oranges?"

filled_prompt = router_prompt.format(
    query_str=query,
    choices_str=choices_str
)
response = llm.complete(filled_prompt)

# Parse the output
parser = SimpleRouterParser(choices=choices)
try:
    choice_num = parser.parse(response.text)
    print("Option number chosen by the LLM:", choice_num)
except RuntimeError as e:
    print("Parsing failed:", e)

Advanced version routing result
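`SimpleRouterParser` calls `json.loads` on the whole reply, so a reply wrapped in a markdown code fence or preceded by a sentence, which chat models often produce, still ends up on the error path. A more tolerant sketch uses the standard library's `json.JSONDecoder.raw_decode` to decode the first JSON object found anywhere in the text; `extract_json_object` is a hypothetical helper, not part of LlamaIndex.

```python
import json
from typing import Any, Dict


def extract_json_object(text: str) -> Dict[str, Any]:
    """Decode the first JSON object in `text`, skipping code fences or leading prose."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # this "{" did not start valid JSON; try the next one
        if isinstance(obj, dict):
            return obj
    raise ValueError(f"no JSON object found in: {text!r}")


reply = 'Sure!\n```json\n{"choice": 2}\n```'
print(extract_json_object(reply))  # {'choice': 2}
```

The returned dict can then go through the same range check as in `parse`.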

2.4 Enhanced Version: Wrapping the Result in an Answer Class

Key improvement: wrap the routing result (option number plus reason) in an object so routing decisions are explainable and traceable.

from dataclasses import dataclass
from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama

# Answer dataclass: holds the chosen option and the reason for it
@dataclass
class Answer:
    choice: int   # selected option number
    reason: str   # reason for the choice

# Routing options
route_choices = [
    "Useful for questions related to apples",
    "Useful for questions related to oranges",
]

# Format the options
def format_choices(choices):
    choices_str = "\n\n".join([f"{idx+1}. {c}" for idx, c in enumerate(choices)])
    return choices_str

# Prompt with explicit format requirements
router_prompt = PromptTemplate(
    """
    Your task is to choose the option that best matches the user's question and return the result in the specified format.

    [User question]: {query_str}
    [Options]:
    {choices_str}

    [Output requirements]:
    1. On the first line, return only the selected option number (a digit such as 1 or 2);
    2. From the second line on, give the reason for choosing that option (explain the matching logic clearly);
    3. Follow the format strictly and add nothing else.
    """
)

# Initialize the LLM
llm = Ollama(model="qwen2.5:7b", request_timeout=60.0)

# Core routing function
def run_llm_router(query: str, choices: list) -> Answer:
    """
    Run the LLM router and return an Answer holding the choice and the reason.
    :param query: the user's question
    :param choices: list of routing options
    :return: the populated Answer object
    """
    formatted_choices = format_choices(choices)
    filled_prompt = router_prompt.format(
        query_str=query,
        choices_str=formatted_choices
    )
    response = llm.complete(filled_prompt)
    response_text = response.text.strip()

    try:
        # Split the number (first line) from the reason (the rest)
        lines = response_text.split("\n", 1)
        choice_num = int(lines[0].strip())
        reason = lines[1].strip() if len(lines) > 1 else "No reason given"

        # Validate the option range
        if choice_num < 1 or choice_num > len(choices):
            raise ValueError(f"Option number {choice_num} is out of range (1-{len(choices)})")

        return Answer(choice=choice_num, reason=reason)

    except (ValueError, IndexError) as e:
        raise RuntimeError(f"Failed to parse the LLM routing result: {e}; raw response: {response_text}")

# Test run
if __name__ == "__main__":
    test_query = "What color are oranges?"
    try:
        answer = run_llm_router(test_query, route_choices)
        print("=== LLM routing result ===")
        print(f"Selected option: {answer.choice}")
        print(f"Reason: {answer.reason}")
    except RuntimeError as e:
        print("Routing failed:", e)

Enhanced version routing result
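Because `Answer` is a plain dataclass, one simple way to make decisions traceable is to append each one to a JSON Lines file. A minimal sketch; `log_decision` and the log file name are hypothetical, and `Answer` is redefined here only so the block runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Answer:
    choice: int   # selected option number
    reason: str   # reason given by the LLM


def log_decision(query: str, answer: Answer, path: Path) -> dict:
    """Append one routing decision as a JSON line and return the record written."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "query": query,
        **asdict(answer),  # adds "choice" and "reason"
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Usage: call `log_decision(test_query, answer, Path("router_log.jsonl"))` right after `run_llm_router` returns. Each line of the file is an independent JSON record that can be searched or reloaded later.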

2.5 Final Version: Building a Router Query Engine

Goal: combine local RAG indexes (VectorStoreIndex for factual queries, SummaryIndex for summary queries) into an end-to-end routed query pipeline.

Step 1: Prepare a test document (flag.txt)

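### 28.7Blog's domain is https://blog.kong.college. The site owner majored in cybersecurity at university, studied LLMs and workflows in his spare time, and plans to develop automated penetration-testing workflows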
***
### Temporary fragments
- s9k2pql&@￥
- 7xntf∽∝≠
```markdown
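[Placeholder link](https://)
**Bold placeholder** __Underline placeholder__
> Meaningless quote line
> Scattered character string: j8d7gh2m
***
| Empty table 1 | Empty table 2 | Empty table 3 |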
| ------ | ------ | ------ |
|        |        |        |
|        |        |        |

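### Unsorted fragments
* d5f8g9j2k
* porcelain shards-pine needles-gravel
***
> Random character stream: p3r7y9bn2v cxz8kjhgf dsawq12367
# Empty command-line placeholder
ls -l 
cd / 
*  * half-width space placeholder ▏▎▍▌▋▊▉ block-symbol placeholder
### Temporary markers
- 9s6a8z7x
- Illogical short phrase: wind crosses the windowsill, paper curls at the edge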
***
{
  "key1": "",
  "key2": []
}
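> Repeated characters: rrrrrrrrr ttttttttt yyyyyyyyy
- 7894561230
- Symbol string: !@#$%^&*()_+-=[]{};':",./<>?
***
### Empty heading
#### 
##### 
Plain-text code-block placeholder with no real meaning, used only as filler noise. Combined with markdown tags it creates illogical visual clutter, interspersed with scattered digit and letter combinations, so the text structure is messy while the core information stays intact; only unrelated markdown formatting and random characters are added around it, to reach roughly 500 characters of noise. More scattered characters follow: m2n3b4v5c6x7z8a9s0d1f2g3h4j5k6l7, until the noise reaches the target length. It has no semantic meaning or logical connection and serves only as filler.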

Step 2: Full code

from dataclasses import dataclass
from llama_index.core import PromptTemplate, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.types import BaseOutputParser
import json
from pathlib import Path
from llama_index.readers.file import PyMuPDFReader
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.base.response.schema import Response
from pydantic import Field

# Global settings: initialize the embedding model and the LLM
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text:latest",
    base_url="http://localhost:11434",
    request_timeout=360.0
)

Settings.llm = Ollama(
    model="qwen2.5:7b",
    base_url="http://localhost:11434",
    request_timeout=360.0,
    additional_kwargs={"temperature": 0.0, "num_ctx": 2048},
)

# Load the local document
loader = PyMuPDFReader()
doc_path = Path("D:\\LlamaIndex\\LLmaRouter\\data\\flag.txt")
if not doc_path.exists():
    raise FileNotFoundError(f"File not found: {doc_path.absolute()}")

documents = loader.load(file_path=str(doc_path))
splitter = SentenceSplitter(chunk_size=1024)  # text splitter

# Build the two indexes
vector_index = VectorStoreIndex.from_documents(
    documents, transformations=[splitter], embed_model=Settings.embed_model
)  # vector index: for factual queries
summary_index = SummaryIndex.from_documents(
    documents, transformations=[splitter]
)  # summary index: for summary queries

# Answer dataclass
@dataclass
class Answer:
    choice: int  # selected index number
    reason: str  # reason for the choice

# Routing options (each maps to an index)
route_choices = [
    'Useful for specific/factual questions (e.g., "What is Llama2\'s context window?") → use VectorStoreIndex',
    'Useful for summary/general questions (e.g., "Summarize Llama2\'s key features") → use SummaryIndex',
]

# Format the options
def format_choices(choices):
    choices_str = "\n\n".join([f"{idx+1}. {c}" for idx, c in enumerate(choices)])
    print(f"Routing options: {choices_str}")
    return choices_str

# JSON output format template
JSON_FORMAT_STR = """
The output should be formatted as a JSON instance that conforms to the JSON schema below.
Do NOT add any extra text, explanation, or comments—only output the JSON string.

JSON Schema:
{
  "type": "object",
  "properties": {
    "choice": {"type": "integer"},
    "reason": {"type": "string"}
  },
  "required": ["choice", "reason"],
  "additionalProperties": false
}
"""

# Custom JSON parser
class RouterOutputParser(BaseOutputParser):
    def parse(self, output: str) -> Answer:
        try:
            output_dict = json.loads(output.strip())
            print(f"Raw LLM routing output: {output_dict}")

            # Type checks (json.loads may also return a bare number or a list)
            if not isinstance(output_dict, dict):
                raise ValueError(f"expected a JSON object, got {type(output_dict)}")
            if not isinstance(output_dict.get("choice"), int):
                raise ValueError(f"choice must be an integer, got {type(output_dict.get('choice'))}")
            if not isinstance(output_dict.get("reason"), str):
                raise ValueError(f"reason must be a string, got {type(output_dict.get('reason'))}")

            # Range check
            choice_num = output_dict["choice"]
            if choice_num < 1 or choice_num > len(route_choices):
                raise ValueError(f"Option number {choice_num} is out of range (1-{len(route_choices)})")

            return Answer(choice=choice_num, reason=output_dict["reason"])

        except (json.JSONDecodeError, KeyError, ValueError) as e:
            raise RuntimeError(f"Failed to parse LLM output: {e}; raw response: {output}")

# Custom router query engine
class CustomRouterQueryEngine(CustomQueryEngine):
    vector_index: VectorStoreIndex = Field(description="Vector index for factual queries")
    summary_index: SummaryIndex = Field(description="Summary index for summary queries")
    llm: Ollama = Field(description="Ollama LLM instance")
    parser: RouterOutputParser = Field(description="Router output parser")
    route_choices: list = Field(description="List of routing options")

    def custom_query(self, query_str: str) -> Response:
        try:
            # 1. Format the routing options
            formatted_choices = format_choices(self.route_choices)

            # 2. Build and fill the prompt ({{query_str}} survives the f-string as {query_str})
            router_prompt = PromptTemplate(
                f"""
                Your task is to choose the index that best matches the type of the user's question and return the result strictly in the JSON format below.

                [User question]: {{query_str}}
                [Available indexes]:
                {formatted_choices}

                {JSON_FORMAT_STR}
                """
            )
            filled_prompt = router_prompt.format(query_str=query_str)

            # 3. Ask the LLM for a routing decision
            llm_response = self.llm.complete(filled_prompt)
            route_answer = self.parser.parse(llm_response.text.strip())

            print("\n=== Routing decision ===")
            print(f"Selected index number: {route_answer.choice}")
            print(f"Routing reason: {route_answer.reason}")

            # 4. Pick the query engine that matches the routing result
            if route_answer.choice == 1:
                query_engine = self.vector_index.as_query_engine(llm=Settings.llm)
                used_index = "VectorStoreIndex (factual query)"
            elif route_answer.choice == 2:
                summarizer = TreeSummarize(llm=Settings.llm)
                query_engine = self.summary_index.as_query_engine(
                    response_synthesizer=summarizer,
                    llm=Settings.llm
                )
                used_index = "SummaryIndex (summary query)"
            else:
                raise ValueError(f"Invalid index number: {route_answer.choice}")

            # 5. Run the query and return the result
            final_response = query_engine.query(query_str)
            print("\n=== Query execution ===")
            print(f"Index used: {used_index}")

            return Response(
                response=str(final_response),
                metadata={
                    "route_choice": route_answer.choice,
                    "route_reason": route_answer.reason,
                    "used_index": used_index,
                    "error": False
                }
            )

        except Exception as e:
            error_msg = f"Routed query failed: {e}"
            print(error_msg)
            return Response(
                response=error_msg,
                metadata={"error": True, "error_detail": str(e)}
            )

# Main: test the router query engine
if __name__ == "__main__":
    router_parser = RouterOutputParser()
    router_query_engine = CustomRouterQueryEngine(
        vector_index=vector_index,
        summary_index=summary_index,
        llm=Settings.llm,
        parser=router_parser,
        route_choices=route_choices
    )

    # Test queries
    test_queries = [
        "What is 28.7Blog's domain? What did the site owner major in at university? What is he currently studying?",
    ]

    for idx, test_query in enumerate(test_queries, 1):
        print(f"\n===================== Test query {idx} =====================")
        print(f"Question: {test_query}")
        response = router_query_engine.query(test_query)

        print("\n=== Final answer ===")
        print(response.response)

        print("\n=== Routing metadata ===")
        print(f"Selected index: {response.metadata.get('used_index')}")
        print(f"Routing reason: {response.metadata.get('route_reason')}")
        print(f"Error: {response.metadata.get('error')}")
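Running the final version end to end requires Ollama, the embedding model, and the index build, so it can help to test the parse-and-dispatch logic offline first. A minimal sketch with stubs in place of `Settings.llm` and the two query engines; `parse_answer`, `route`, `fake_llm`, and `ENGINES` are hypothetical names that only mirror the checks in `RouterOutputParser` and the `if/elif` dispatch in `custom_query`. Unlike the parser above, it also rejects JSON `true`/`false` as a choice.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Answer:
    choice: int
    reason: str


def parse_answer(text: str, num_choices: int) -> Answer:
    """Mirror RouterOutputParser's checks: JSON object, int choice, str reason, in range."""
    data = json.loads(text.strip())
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    choice, reason = data.get("choice"), data.get("reason")
    # bool is a subclass of int in Python, so reject JSON true/false explicitly
    if not isinstance(choice, int) or isinstance(choice, bool):
        raise ValueError(f"choice must be an integer, got {choice!r}")
    if not isinstance(reason, str):
        raise ValueError(f"reason must be a string, got {reason!r}")
    if not 1 <= choice <= num_choices:
        raise ValueError(f"choice {choice} is out of range 1-{num_choices}")
    return Answer(choice, reason)


def route(query: str,
          decide: Callable[[str], str],
          engines: Dict[int, Callable[[str], str]]) -> str:
    """Get a raw routing reply from `decide`, parse it, and run the chosen engine."""
    answer = parse_answer(decide(query), len(engines))
    return engines[answer.choice](query)


# Stub standing in for Settings.llm: always picks option 1
def fake_llm(prompt: str) -> str:
    return '{"choice": 1, "reason": "asks for a specific fact"}'


ENGINES = {
    1: lambda q: f"vector:{q}",   # stands in for the VectorStoreIndex query engine
    2: lambda q: f"summary:{q}",  # stands in for the SummaryIndex + TreeSummarize engine
}

print(route("What is 28.7Blog's domain?", fake_llm, ENGINES))
```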