Ollama+LlamaIndex 实战：打造具备 RAG 增强能力的智能渗透测试 Agent

28.7的博客2026-01-112026-01-11

LLM 工具调用实践指南

目的

核心技术价值，是通过LLM 工具调用 + RAG 检索增强 + 多轮对话记忆，搭建一个可自主决策、精准执行的智能 Agent；而我做这一切的最终目的，是借助 LLM 的博闻强理解特征，打破传统漏洞检索依赖人工规则、效率低的局限，构建一套能智能匹配漏洞知识库、调用扫描工具、记忆上下文需求的智能漏洞检索系统，当然作为前置知识，这是我们必须的第一步

概述

本教程旨在学习最简单的 LLM 工具调用，让 AI 从只会纸上谈兵进化到拥有”脑子和手”，其中大模型充当大脑，我们定义的代码充当手。

掌握 LLM 调用自定义工具的基础逻辑（为调用漏洞扫描工具打基础）
集成 RAG 检索增强，实现对漏洞知识库的精准查询（替代人工翻查漏洞文档）
支持多轮对话记忆，让 Agent 能记住用户的漏洞检索上下文（如先问漏洞影响，再问修复方案，无需重复输入）
最终落地：将技术能力迁移到漏洞检索场景，构建一个 “理解需求→调用工具→整合结果” 的智能 Agent

术语介绍

这是一份快速指南，介绍了在构建 LLM 应用时会频繁遇到的高级概念。

大语言模型 (LLMs)

LLMs 是 LlamaIndex 诞生的根本创新。它们是一种人工智能 (AI) 计算机系统，能够理解、生成和处理自然语言，包括根据其训练数据或在查询时提供给它们的数据回答问题。

代理应用

当 LLM 在应用程序中使用时，它通常用于做出决策、采取行动和/或与世界交互。这是代理应用的核心定义。

尽管代理应用的定义很广泛，但有几个关键特征：

LLM 增强：LLM 通过工具（即代码中任意可调用的函数）、内存和/或动态提示进行增强。
提示链：使用多个相互构建的 LLM 调用，一个 LLM 调用的输出用作下一个调用的输入。
路由：LLM 用于将应用程序路由到应用程序中的下一个适当的步骤或状态。
并行性：应用程序可以并行执行多个步骤或操作。
编排：使用 LLM 的层级结构来编排较低级别的操作和 LLM。
反思：LLM 用于反思和验证前一步骤或 LLM 调用的输出，这可以用来指导应用程序进入下一个适当的步骤或状态。

代理

我们将代理定义为”代理应用”的一个具体实例。代理是一种软件，通过将 LLMs 与其他工具和内存结合，在推理循环中自主地执行任务，该循环决定接下来使用哪个工具（如果需要）。

这在实践中意味着：

代理接收用户消息
代理使用 LLM，结合先前的聊天历史、工具和最新的用户消息来确定要采取的下一个适当行动
代理可能会调用一个或多个工具来协助处理用户的请求
如果使用了工具，代理将解释工具输出并用其指导下一个行动
一旦代理停止采取行动，它会将最终输出返回给用户

检索增强生成 (RAG)

检索增强生成 (RAG) 是使用 LlamaIndex 构建数据支持 LLM 应用的核心技术。它通过在查询时将您的私有数据提供给 LLM，而不是在您的数据上训练 LLM，从而使 LLMs 能够回答关于您私有数据的问题。为了避免每次都将所有数据发送给 LLM，RAG 会索引您的数据，并仅选择性地将相关部分与您的查询一起发送。

基础 Demo

模型	用途
llama3.1	单模型

本次使用 llama 作为 LLM 构建的基础框架。

框架图

graph TD 
A[工具注册：把add/multiply封装成LLM能识别的描述] --> 
B[用户提问：7+8和9*3] 
B --> C[LLM解析：需要调用add(7,8)和multiply(9,3)]
C --> D[框架执行：调用本地add/multiply函数]
D --> E[LLM整合：把函数返回值转成自然语言回复] 
E --> F[输出结果：7+8=15，9*3=27]

第一步：处理提示词以及 Ollama 部分

定义 Agent 模型：
这一步我们同时定义了四个方法，即加减乘除，这是必须的，我们需要让框架理解这些工具。

agent = FunctionAgent(
    tools=[multiply, add, subtract, divide],
    llm=Ollama(model="llama3.1", request_timeout=360.0),
    system_prompt="You are a math assistant and must prioritize using the provided tool functions (add/subtract/multiply/divide) to complete calculations. Direct calculation using your own abilities is prohibited. For each calculation request, you must call the corresponding tool function and then return a clear calculation result.",
)

框架如何理解方法

优先读取 docstring 作为工具的功能描述
如果没有 docstring，则使用函数名或自动生成描述
好的 docstring 能让 LLM 更准确地理解何时调用该工具

当开发者注册完成方法以后，Llama 优先从函数的 docstring 作为工具描述。当然如果没写 docstring 的话，通常框架会通过优先级进行定义，这也是许多框架为 Agent 开发者做的便利之一，所以一个好的 docstring 描述能让 AI 以及框架更好的理解函数。

第一阶段完整实验

在这个例子中，我们通过方法注册，让 AI 认识到我们有四种工具可选，并且让 AI 调用工具函数完成任务。

import asyncio
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.ollama import Ollama

# 让AI调用multiply方法，但是未使用历史功能
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    print(f"multiply方法执行({a}, {b})")
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    print(f"add方法执行({a}, {b})")
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtract the second number from the first and return the difference."""
    print(f"subtract方法执行({a}, {b})")
    return a - b

def divide(a: float, b: float) -> float:
    """Divide the first number by the second and return the quotient.
    Raises ValueError if the divisor (b) is zero.
    """
    print(f"divide方法执行({a}, {b})")
    if b == 0:
        raise ValueError("除法运算中，除数不能为0！")
    return a / b

agent = FunctionAgent(
    tools=[multiply, add, subtract, divide],
    llm=Ollama(model="llama3.1", request_timeout=360.0),
    system_prompt="You are a math assistant and must prioritize using the provided tool functions (add/subtract/multiply/divide) to complete calculations. Direct calculation using your own abilities is prohibited. For each calculation request, you must call the corresponding tool function and then return a clear calculation result.",
)

async def main():
    response = await agent.run("7+8 and 9*3")
    print(str(response))

if __name__ == "__main__":
    asyncio.run(main())

添加检索模型

模型	用途
llama3.1	生成器
nomic-embed-text:latest	检索器

使用 ollama 专用检索模型 nomic-embed-text。

为什么需要专门的嵌入模型？ 因为模型本身的定位，让模型具有各种擅长的能力：

llama3.1：擅长推理、创作、解答复杂问题，适合作为生成器
nomic-embed-text：擅长文本检索，将文本转化为数学向量，用于检索、聚类、比较，适合作为检索器

目录结构：

2026/01/11  17:07    <DIR>          data
2026/01/11  20:22             1,621 llmaAgentBase.py
2026/01/11  20:46             1,994 llmaAgentContextBase.py
2026/01/11  18:00                 0 llmaAgentLocalContextBase.py

确保 data 目录下有正确的检索数据，例如：

1	作者在大学期间主修计算机科学，同时参与了校园编程竞赛，还创办了一个技术社团，课余时间自学了Python和JavaScript。

测试目标：

1	query = "作者大学学的什么专业？另外，7*8等于多少？"

Demo 2 代码：

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
import asyncio
import os

if os.name == 'nt':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text:latest",
    base_url="http://localhost:11434",
)

Settings.llm = Ollama(
    model="llama3.1",
    base_url="http://localhost:11434",
    request_timeout=360.0,
    verbose=True  
)

data_dir = "data"
if not os.path.exists(data_dir):
    print(f"{data_dir}directory no found")
    exit(1)

documents = SimpleDirectoryReader(data_dir).load_data()
print(documents)
print(f"加载 {len(documents)} 个文档")
if len(documents) == 0:
    print("No available information detected in the data folder")
    exit(1)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    top_k=3,  
    verbose=True  
)

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    print(f"multiply方法执行({a}, {b})")
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    print(f"add方法执行({a}, {b})")
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtract the second number from the first and return the difference."""
    print(f"subtract方法执行({a}, {b})")
    return a - b

def divide(a: float, b: float) -> float:
    """Divide the first number by the second and return the quotient.
    Raises ValueError if the divisor (b) is zero.
    """
    print(f"divide方法执行({a}, {b})")
    if b == 0:
        raise ValueError("除法运算中，除数不能为0！")
    return a / b

async def search_documents(query: str) -> str:
    """Search documents and return the results."""
    print(f"search_documents function called-> {query}") 
    response = await query_engine.aquery(query)
    return f"文档检索结果：{str(response)}"

agent = AgentWorkflow.from_tools_or_functions(
    [multiply, search_documents, add, subtract, divide],
    llm=Settings.llm,
    system_prompt="""
    You are a helpful assistant that can perform calculations
    and search through documents to answer questions.
    """, 
    verbose=True  
)

async def main():
    query = "作者大学学的什么专业？另外，7*8等于多少？"
    print(f"提问：{query}")
    response = await agent.run(query)
    print("\n===== 结果 =====")
    print(response)

if __name__ == "__main__":
    asyncio.run(main())

实验结果分析：

通过嵌入模型，llama3.1 总结出了第一个问题的答案
嵌入数据未影响到第二个完全不相干任务的结果
五个方法都正确被识别、理解、调用

测试边界情况：

1	query = "作者的高中母校是哪里？另外，计算8除以0"

问题发现：当模型发现需要检索一个文档中不存在的信息时，它没有选择”不调用检索”或”传入原问题”，而是可能自己生成了一个混乱的文本作为查询词，产生了幻觉。

解决方案：严格限制并更显式地提示，对于检索数据不存在的、错误的明确返回”无结果”。

优化后的提示词：

You are a rigorous document retrieval and mathematical computation assistant, and you must strictly follow the following rules: 
1. Tool usage rules:
- To answer document-related questions: only use the search_documents tool, providing precise Chinese query terms (e.g., "author's high school alma mater");
- To answer mathematical calculation questions: only use the corresponding math tool (add/subtract/multiply/divide);
2. Document retrieval rules:
- If search_documents returns "No relevant information found in the document.", reply directly with "The document does not mention relevant information" and do not fabricate or guess any content;
- Only use the information returned by search_documents to answer, and do not produce unrelated hallucinations (such as self-harm, illegal content, etc.);
3. Mathematical computation rules:
- When using the division tool, if the "divisor cannot be 0" exception is triggered, return the exception message directly without further explanation;
- All calculations must use the tool; self-calculation is prohibited;
4. Handling multiple intents:
- If the user's question contains multiple intents (e.g., "check high school alma mater and calculate 8 ÷ 0"), handle them separately by calling the corresponding tools and then integrate the results.

User's question: {input}

进一步优化：即使更换了提示词，AI 仍然可能出现幻觉，需要细化 search_documents 方法的注释，这可能与 llama3.1 版本太低对工具调用的支持不稳定有关。

记住连续对话

AgentWorkflow 也能够记住之前的消息。这些消息包含在 AgentWorkflow 的 Context 中。

测试场景：四次对话

告诉 AI 我的名字
让 AI 回复我的名字
让 AI 检索文本
让 AI 记忆第一个问题的答案

问题发现：虽然提示词让 AI 记住之前所有对话内容，但出现了边界问题。”My name is Logan” 是一个答案，而非问题，但 AI 却调用了 search_documents 方法，从而产生错误幻觉。

解决方案：明确哪些内容需要调用 search_documents 方法。

更严格的提示词：

YOU MUST OBEY THESE RULES WITH NO EXCEPTIONS (CODE-LEVEL CONSTRAINT):

1. MEMORY TASK (绝对禁止调用任何工具):
   - Questions like "My name is X", "What is my name?", "remember my name" → 
     ONLY use Context, NO tool calls (search_documents/add/multiply etc.).
   - "remember my name" is a memory request, NOT a math calculation.

2. DOCUMENT TASK (仅调用search_documents，且仅查author相关):
   - Only call when query has "author"/"作者" → e.g., "author's high school".
   - If query has no "author"/"作者", DO NOT call search_documents.

3. MATH TASK (仅调用数学工具，且参数必须是数字):
   - Only call add/subtract/multiply/divide for number calculations (e.g., 8/0).
   - NEVER pass non-numeric parameters (e.g., "Lihua") to math tools.

4. ERROR HANDLING:
   - Divide by 0 → return "【除法异常】除数不能为0，无法计算8÷0！".
   - Non-numeric math params → return "数学工具仅接受数字参数！".

数学工具参数校验优化：

def divide(a: float, b: float) -> float:
    """[Mathematical Tool] Division operation, only accepts integer/float parameters, divisor cannot be 0"""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError(f"Division tool only accepts numeric parameters! Invalid current parameters: a={a}, b={b}")
    print(f"divide方法执行({a}, {b})")
    if b == 0:
        raise ValueError("【Division Exception】Divisor cannot be 0, cannot calculate 8÷0!")
    return a / b

最终优化版本

通过完善提示词、所有工具函数以及 docstring 描述，解决以下问题：

Context 记忆载体的定位模糊问题
Prompt 规则的一致性约束不足导致记忆任务偶尔回退到工具调用问题
检索词简略化、计算逻辑跑偏的问题

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core.workflow import Context
from llama_index.core.tools import FunctionTool  # 显式封装工具，精准控用途
import asyncio
import os

if os.name == 'nt':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# 模型配置
Settings.embed_model = OllamaEmbedding(
    model_name="nomic-embed-text:latest",
    base_url="http://localhost:11434",
)

Settings.llm = Ollama(
    model="llama3.1",
    base_url="http://localhost:11434",
    request_timeout=360.0,
    verbose=True  
)

# 文档加载
data_dir = "data"
if not os.path.exists(data_dir):
    print(f"{data_dir} directory not found")
    exit(1)

documents = SimpleDirectoryReader(data_dir).load_data()
print(f"加载 {len(documents)} 个文档")
if len(documents) == 0:
    print("No available information detected in the data folder")
    exit(1)

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(top_k=3, verbose=True)

def add(a: float, b: float) -> float:
    """[Mathematical Tool] Addition operation, only accepts integer/float parameters"""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError(f"The addition tool only accepts numeric parameters! The current parameter is invalid: a={a}, b={b}")
    print(f"add方法执行({a}, {b})")
    return a + b

def subtract(a: float, b: float) -> float:
    """[Mathematical Tool] Subtraction operation, calculate the difference of a-b, where parameter a is the minuend and b is the subtrahend."""
    print(f"subtract方法执行({a}, {b})")
    return a - b

def multiply(a: float, b: float) -> float:
    """[Mathematical Tool] Multiplication operation, calculate the product of a*b, where parameter a is the multiplicand and b is the multiplier."""
    print(f"multiply方法执行({a}, {b})")
    return a * b

def divide(a: float, b: float) -> float:
    """[Mathematical Tool] Division operation, only accepts integer/float parameters, divisor cannot be 0"""
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError(f"Division tool only accepts numeric parameters! Invalid current parameters: a={a}, b={b}")
    print(f"divide方法执行({a}, {b})")
    if b == 0:
        raise ValueError("【Division Exception】Divisor cannot be 0, cannot calculate 8÷0!")
    return a / b

async def search_documents(query: str) -> str:
    """[Document Tool] Used only for retrieving information from local documents, such as 'the author's university major.'  
    This tool must not be used for non-document questions (such as names or calculations)!  
    If there are no results, return the standardized prompt.
    """
    print(f"search_documents function called-> {query}") 
    response = await query_engine.aquery(query)
    if not str(response).strip() or "未找到" in str(response):
        return "【Document Exception】No relevant information found in the documents."
    return f"文档检索结果：{str(response)}"

# ========== 2. 显式封装工具：给LLM明确的工具用途标签 ==========
math_tools = [
    FunctionTool.from_defaults(fn=add),
    FunctionTool.from_defaults(fn=subtract),
    FunctionTool.from_defaults(fn=multiply),
    FunctionTool.from_defaults(fn=divide)
]
doc_tool = [FunctionTool.from_defaults(fn=search_documents)]

# ========== 3. 核心：Prompt 精准划分任务边界（重点！） ==========
agent = AgentWorkflow.from_tools_or_functions(
    math_tools + doc_tool,  # 数学工具放前面，优先调用
    llm=Settings.llm,
    system_prompt="""
You are a [Document Retrieval & Math Calculation] assistant with memory capabilities. Follow the clear steps below and do not make comments about rules:

1. Memory Tasks (use your brain directly, do not use any tools)
- Applicable scenarios: storing names, asking for names, instructions like "remember my name";
- Operation: read the Context (this is your memory carrier, not a tool), respond directly, do not mention "violating rules";
- Prohibited: calling any tools like search_documents/add.

2. Document Retrieval Tasks (must be precise, no shortcuts)
- Applicable scenarios: questions like "author's high school alma mater" or anything related to the author;
- Operation: the search term must **exactly match** the user's question (for example, if the user asks "Where is the author's high school alma mater," use this as the search term);
- Prohibited: simplifying the search term (e.g., just using author), or guessing if the document has no information.

3. Math Calculation Tasks (must use tools, do not calculate manually)
- Applicable scenarios: addition, subtraction, multiplication, division (e.g., 8÷0);
- Operation: strictly use the corresponding tools (for division, use divide), if the tool throws an exception, return the exception message directly;
- Prohibited: do not calculate manually and respond with "undefined."

4. Multi-Intent Questions: split according to the above three types of tasks, handle separately, and then integrate the results.
""",
    verbose=True  
)

# ========== 4. 多轮对话测试（全中文提问，减少解析混乱） ==========
async def main():
    ctx = Context(agent)  # 记忆容器

    # 第一轮：告知名字（记忆类，不调用工具）
    print("===== 第一轮对话 =====")
    query1 = "My name is Lihua"
    print(f"提问：{query1}")
    response1 = await agent.run(query1, ctx=ctx)
    print(f"回复：{response1}\n")

    # 第二轮：问名字（记忆类，不调用工具）
    print("===== 第二轮对话 =====")
    query2 = "What is my name?"
    print(f"提问：{query2}")
    response2 = await agent.run(query2, ctx=ctx)
    print(f"回复：{response2}\n")

    # 第三轮：文档+计算（调用对应工具，同时记忆名字）
    print("===== 第三轮对话 =====")
    query3 = "Where is the author's high school alma mater? Also, calculate 8 divided by 0, and remember my name."
    print(f"提问：{query3}")
    response3 = await agent.run(query3, ctx=ctx)
    print(f"回复：{response3}\n")

if __name__ == "__main__":
    asyncio.run(main())