
https://bambooai.org
BambooAI is an open-source library that uses Large Language Models (LLMs) to enable natural-language data analysis. It works with local datasets and can also source data from external data sources and APIs.
BambooAI is an experimental tool that makes data analysis more accessible by letting users interact with their data through natural-language conversation.
A demo of building a machine learning model to predict Titanic passenger survival:
https://github.com/user-attachments/assets/59ef810c-80d8-4ef1-8edf-82ba64178b85
Examples of various sports data analysis queries:
https://github.com/user-attachments/assets/7b9c9cd6-56e3-46ee-a6c6-c32324a0c5ef
```bash
pip install bambooai
```
Or clone the repository and install the dependencies:
```bash
git clone https://github.com/pgalko/BambooAI.git
cd BambooAI
pip install -r requirements.txt
```
Install BambooAI:
```bash
pip install bambooai
```
Configure the environment:
```bash
cp .env.example .env
# Edit the .env file with your settings
```
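The `.env` file needs to contain the API keys for the providers you configure. A minimal sanity check is sketched below; the key names (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`) are illustrative assumptions, not names confirmed by this document:

```python
import os

# Hypothetical helper: check that the provider keys referenced by your
# LLM_CONFIG.json are actually present in the environment loaded from .env.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY"]

def missing_keys(env=None):
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]

print(missing_keys({"OPENAI_API_KEY": "sk-..."}))
# → ['ANTHROPIC_API_KEY', 'GEMINI_API_KEY']
```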
Configure the agents/models:
```bash
cp LLM_CONFIG_sample.json LLM_CONFIG.json
# Edit LLM_CONFIG.json with your desired combination of agents, models and parameters
```
Run:
```python
import pandas as pd
from bambooai import BambooAI
import plotly.io as pio
pio.renderers.default = 'jupyterlab'
df = pd.read_csv('titanic.csv')
bamboo = BambooAI(df=df, planning=True, vector_db=False, search_tool=True)
bamboo.pd_agent_converse()
```
BambooAI operates through six key steps:
Initiation
- Starts with a user question or prompt
- Continues in a conversational loop until exit
Task Routing
- Uses an LLM to classify the question
- Routes it to the appropriate handler (text response or code generation)
User Feedback
- If the instructions are vague or unclear, the model pauses and asks the user for feedback
- If the model encounters any ambiguity while solving the task, it pauses and offers several options to seek direction
Dynamic Prompt Building
- Assesses data requirements
- Requests feedback or uses tools if more context is needed
- Formulates an analysis plan
- Performs a semantic search for similar past questions
- Generates code using the selected LLM
Debugging and Execution
- Executes the generated code
- Handles errors with LLM-based correction
- Retries until success or a retry limit is reached
Results and Knowledge Base
- Ranks the quality of the answer
- Stores high-quality solutions in a vector database
- Presents formatted results or visualisations
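The six-step loop above can be sketched in outline. This is a conceptual illustration of the control flow only, not BambooAI's actual implementation; the keyword-based classifier and the retry logic are stand-ins:

```python
def classify(question):
    # Task-routing stand-in: route analysis-style questions to code
    # generation, everything else to a plain text response.
    code_words = ("plot", "predict", "calculate", "analyse")
    return "code" if any(w in question.lower() for w in code_words) else "text"

def run_pipeline(question, max_retries=3):
    route = classify(question)
    if route == "text":
        return {"route": "text", "answer": f"Explanation for: {question}"}
    # Dynamic prompt building -> code generation -> debug/execute loop
    for attempt in range(1, max_retries + 1):
        code = f"# generated code for: {question} (attempt {attempt})"
        try:
            exec(code)   # execution step; a failure would trigger LLM-based correction
            break        # success: fall through to ranking and storage
        except Exception:
            continue     # retry until success or the limit is reached
    return {"route": "code", "attempts": attempt}

print(run_pipeline("Plot survival rate by passenger class"))
# → {'route': 'code', 'attempts': 1}
```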

BambooAI accepts the following initialisation parameters:
```python
bamboo = BambooAI(
    df=None,                   # DataFrame to analyse
    auxiliary_datasets=None,   # List of paths to auxiliary datasets
    max_conversations=4,       # Number of conversation pairs kept in memory
    search_tool=False,         # Enable internet search capability
    planning=False,            # Enable the planning agent for complex tasks
    webui=False,               # Run in web-app mode
    vector_db=False,           # Enable the vector database for knowledge storage
    df_ontology=False,         # Use a custom dataframe ontology
    exploratory=True,          # Enable expert selection for query handling
    custom_prompt_file=None    # Use custom/modified prompt templates
)
```
df (pd.DataFrame, optional)
If not provided, BambooAI will attempt to source data from the internet or from auxiliary data sources
auxiliary_datasets (list, default=None)
Used to supplement the primary dataframe
max_conversations (int, default=4)
Affects the context window and token usage
search_tool (bool, default=False)
Requires the corresponding API keys when enabled
planning (bool, default=False)
Improves solution quality for complex queries
webui (bool, default=False)
Serves a web interface via a Flask API
vector_db (bool, default=False)
Supports two embedding models: text-embedding-3-small (OpenAI) and all-MiniLM-L6-v2 (HF)
df_ontology (string, default=None)
Accepts the path to a .ttl (Turtle) ontology file. Significantly improves solution quality
exploratory (bool, default=True)
Chooses between the Research Specialist and Data Analyst roles
custom_prompt_file (string, default=None)
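To illustrate what `max_conversations` controls, here is a toy rolling buffer that keeps only the last N question/answer pairs. It is a sketch of the idea (bounding the context window and token usage), not BambooAI's internal memory implementation:

```python
from collections import deque

class ConversationMemory:
    """Keep only the most recent N question/answer pairs in context."""
    def __init__(self, max_conversations=4):
        self.pairs = deque(maxlen=max_conversations)

    def add(self, question, answer):
        self.pairs.append((question, answer))

    def context(self):
        # Older pairs are dropped automatically, bounding token usage
        return list(self.pairs)

mem = ConversationMemory(max_conversations=2)
for i in range(4):
    mem.add(f"q{i}", f"a{i}")
print(mem.context())  # → [('q2', 'a2'), ('q3', 'a3')]
```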
BambooAI uses a multi-agent system in which different specialised agents handle specific aspects of the data analysis process. Each agent can be configured with a different LLM model and parameters to suit its particular needs.
The LLM configuration is stored in LLM_CONFIG.json. The full configuration structure is shown below:
```json
{
"agent_configs": [
{"agent": "Expert Selector", "details": {"model": "gpt-4.1", "provider":"openai","max_tokens": 2000, "temperature": 0}},
{"agent": "Analyst Selector", "details": {"model": "claude-3-7-sonnet-20250219", "provider":"anthropic","max_tokens": 2000, "temperature": 0}},
{"agent": "Theorist", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Dataframe Inspector", "details": {"model": "gemini-2.0-flash", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Planner", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Code Generator", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Error Corrector", "details": {"model": "claude-3-5-sonnet-20241022", "provider":"anthropic","max_tokens": 8000, "temperature": 0}},
{"agent": "Reviewer", "details": {"model": "gemini-2.5-pro-preview-03-25", "provider":"gemini","max_tokens": 8000, "temperature": 0}},
{"agent": "Solution Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Executor", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}},
{"agent": "Google Search Summarizer", "details": {"model": "gemini-2.5-flash-preview-04-17", "provider":"gemini","max_tokens": 4000, "temperature": 0}}
],
"model_properties": {
"gpt-4o": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.0025, "completion_tokens": 0.010},
"gpt-4.1": {"capability":"base","multimodal":"true", "templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.008},
"gpt-4o-mini": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00015, "completion_tokens": 0.0006},
"gpt-4.1-mini": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0004, "completion_tokens": 0.0016},
"o1-mini": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.003, "completion_tokens": 0.012},
"o3-mini": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0011, "completion_tokens": 0.0044},
"o1": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.015, "completion_tokens": 0.06},
"gemini-2.0-flash": {"capability":"base", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0001, "completion_tokens": 0.0004},
"gemini-2.5-flash-preview-04-17": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00015, "completion_tokens": 0.0035},
"gemini-2.0-flash-thinking-exp-01-21": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0, "completion_tokens": 0.0},
"gemini-2.5-pro-exp-03-25": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.0, "completion_tokens": 0.0},
"gemini-2.5-pro-preview-03-25": {"capability":"reasoning", "multimodal":"true","templ_formating":"text", "prompt_tokens": 0.00125, "completion_tokens": 0.01},
"claude-3-5-haiku-20241022": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.0008, "completion_tokens": 0.004},
"claude-3-5-sonnet-20241022": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.003, "completion_tokens": 0.015},
"claude-3-7-sonnet-20250219": {"capability":"base", "multimodal":"true","templ_formating":"xml", "prompt_tokens": 0.003, "completion_tokens": 0.015},
"open-mixtral-8x7b": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.0007, "completion_tokens": 0.0007},
"mistral-small-latest": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.003},
"codestral-latest": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.001, "completion_tokens": 0.003},
"open-mixtral-8x22b": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.002, "completion_tokens": 0.006},
"mistral-large-2407": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.003, "completion_tokens": 0.009},
"deepseek-chat": {"capability":"base", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00014, "completion_tokens": 0.00028},
"deepseek-reasoner": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00055, "completion_tokens": 0.00219},
"/mnt/c/Users/pgalk/vllm/models/DeepSeek-R1-Distill-Qwen-14B": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-r1-distill-llama-70b": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-r1:32b": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00},
"deepseek-ai/deepseek-r1": {"capability":"reasoning", "multimodal":"false","templ_formating":"text", "prompt_tokens": 0.00, "completion_tokens": 0.00}
}
}
```
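A small helper can read this structure to look up an agent's model and estimate a per-request cost. The assumption that `prompt_tokens`/`completion_tokens` are USD per 1K tokens is an interpretation of the file, not something it states, and `model_for_agent`/`estimate_cost` are hypothetical helpers, not part of BambooAI:

```python
import json

# A trimmed-down LLM_CONFIG.json with a single agent and model entry
config = json.loads("""
{
  "agent_configs": [
    {"agent": "Code Generator",
     "details": {"model": "claude-3-5-sonnet-20241022",
                 "provider": "anthropic", "max_tokens": 8000, "temperature": 0}}
  ],
  "model_properties": {
    "claude-3-5-sonnet-20241022": {"capability": "base", "multimodal": "true",
      "templ_formating": "xml", "prompt_tokens": 0.003, "completion_tokens": 0.015}
  }
}
""")

def model_for_agent(cfg, agent):
    # Find the model assigned to a named agent in agent_configs
    for entry in cfg["agent_configs"]:
        if entry["agent"] == agent:
            return entry["details"]["model"]
    raise KeyError(agent)

def estimate_cost(cfg, model, prompt_toks, completion_toks):
    # Assumes the prices in model_properties are USD per 1K tokens
    props = cfg["model_properties"][model]
    return (prompt_toks / 1000 * props["prompt_tokens"]
            + completion_toks / 1000 * props["completion_tokens"])

model = model_for_agent(config, "Code Generator")
print(model)                                               # → claude-3-5-sonnet-20241022
print(round(estimate_cost(config, model, 2000, 1000), 4))  # → 0.021
```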
The LLM_CONFIG.json configuration file must be located in BambooAI's working directory, e.g. /Users/palogalko/AI_Experiments/Bamboo_AI/web_app/LLM_CONFIG.json, and the API keys for all of the specified models must also be present in a .env file in the same working directory.
Based on our testing with sports and performance datasets as of 22 April 2025, the agent/model combinations above performed best. We strongly encourage you to experiment with these settings to find the combination that works best for your particular use case.