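Private Knowledge Base Setup Notes
- Reference video: 从零搭建基于本地大语言模型构建的私有知识库系统(下)软件篇 ("Building a Private Knowledge Base System on a Local Large Language Model from Scratch, Part 2: Software"), Bilibili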
I. Choosing the private knowledge base stack:
- Main program + front end:
  - Text Generation WebUI: oobabooga/text-generation-webui (GitHub). A Gradio web UI for large language models; supports transformers, GPTQ, llama.cpp (ggml), and Llama models.
  - Langchain-Chatchat: chatchat-space/Langchain-Chatchat (GitHub). Formerly langchain-ChatGLM; a local-knowledge-base QA app built on LangChain and LLMs such as ChatGLM.
  - LangChain + Gradio UI (custom development)
- Vector database:
  - pgvector
  - Milvus
  - Faiss
- Models and knowledge base:
  - ChatGLM2-6B-32K: THUDM/chatglm2-6b and THUDM/chatglm2-6b-32k (Hugging Face)
  - LLaMA 2: meta-llama/Llama-2-7b (Hugging Face)
  - LLM + fine-tuning
  - Local knowledge base
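The components above combine into retrieval-augmented generation: documents are turned into vectors by an embedding model, stored in the vector database, and at question time the closest chunks are retrieved and pasted into the LLM prompt. The stdlib-only sketch below shows that flow with hand-made 3-dimensional toy vectors standing in for real embeddings (a real setup would get them from a model such as m3e-base and search with pgvector, Milvus, or Faiss). The function names and the prompt template are hypothetical, not Langchain-Chatchat's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Brute-force nearest-neighbour search: what a vector DB does, minus the index."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question: str, chunks: List[str]) -> str:
    """Stuff the retrieved chunks into the prompt sent to the LLM (hypothetical template)."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


# Toy "embeddings" (hypothetical values); a real system gets these from an embedding model.
STORE = [
    ("pgvector adds a vector column type to PostgreSQL", [0.9, 0.1, 0.0]),
    ("Faiss is a library for similarity search", [0.7, 0.3, 0.1]),
    ("ChatGLM2-6B-32K supports a 32K context window", [0.0, 0.2, 0.9]),
]
```

For example, top_k([1.0, 0.0, 0.0], STORE) ranks the pgvector and Faiss entries first. A real vector database replaces the linear scan in top_k with an (often approximate) index, but the ranking idea is the same.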
II. Installation notes
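- Download resources:
    # Enable Git LFS once; otherwise the Hugging Face clones contain only pointer files instead of the model weights
    git lfs install
    git clone https://huggingface.co/THUDM/chatglm2-6b-32k
    git clone https://huggingface.co/moka-ai/m3e-base
    git clone https://github.com/chatchat-space/Langchain-Chatchat.git
    cd Langchain-Chatchat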
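Hugging Face model repositories store weight files in Git LFS. If LFS was not active during the clone, the weight files are small text pointers and model loading fails later with confusing errors. A stdlib sketch that finds leftover pointer files, assuming the standard LFS pointer header `version https://git-lfs.github.com/spec/v1` and the spec's rule that pointer files are under 1024 bytes; the function name is hypothetical.

```python
import os
from typing import List

# First line of every Git LFS pointer file (per the Git LFS spec).
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str, max_size: int = 1024) -> List[str]:
    """Return files under model_dir that are still Git LFS pointers, not real content."""
    pointers = []
    for root, dirs, files in os.walk(model_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip Git's own metadata
        for name in files:
            path = os.path.join(root, name)
            # Pointer files are tiny; real weight shards are hundreds of MB.
            if os.path.getsize(path) >= max_size:
                continue
            with open(path, "rb") as fh:
                if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                    pointers.append(path)
    return sorted(pointers)
```

find_lfs_pointers("chatglm2-6b-32k") should return an empty list. If it does not, running `git lfs pull` inside that repository fetches the actual files.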
- Conda environment:
    conda create -n chatchat python=3.10
    conda activate chatchat
    pip install --upgrade pip
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    # Alternative: nightly build for CUDA 12.1
    # pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
    conda install spacy
    pip install cchardet
    pip install accelerate
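Before building Chatchat it can help to confirm the environment matches these steps: Python 3.10, the packages above importable, and PyTorch able to see the GPU. A sketch using only the standard library (torch is imported lazily); the package list simply mirrors the commands above, and the function names are hypothetical.

```python
import importlib.util
import sys
from typing import Dict, List, Tuple

# pip package name -> import name; kept as a dict because the two can differ
# (e.g. the psycopg2-binary wheel is imported as psycopg2).
REQUIRED: Dict[str, str] = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "spacy": "spacy",
    "cchardet": "cchardet",
    "accelerate": "accelerate",
}


def missing_modules(required: Dict[str, str]) -> List[str]:
    """Return the pip names whose import name cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


def python_ok(expected: Tuple[int, int] = (3, 10)) -> bool:
    """True if the running interpreter has the major.minor version the notes use."""
    return sys.version_info[:2] == expected


def cuda_status() -> str:
    """Report whether the installed PyTorch build can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so this file runs without torch
    if torch.cuda.is_available():
        return f"CUDA {torch.version.cuda}: {torch.cuda.get_device_name(0)}"
    return "torch installed, but CUDA is not available (CPU-only build or driver issue)"
```

Inside the chatchat environment, missing_modules(REQUIRED) should be empty and cuda_status() should name the GPU; a "CUDA is not available" result usually means the CPU-only wheel was installed instead of the cu118 one.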
- Building Chatchat:
    pip install -r requirements.txt
    cd configs
    cp ./model_config.py.example ./model_config.py
    # In model_config.py, point the models at the local copies:
    # in embedding_model_dict:
    #   "m3e-base": "D:\Files\projects\chatchat\models\m3e-base"
    # in llm_model_dict:
    #   "local_model_path": "D:\Files\projects\chatchat\models\chatglm2-6b-32k"
    cp ./server_config.py.example ./server_config.py
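The two commented entries above are Python dict entries inside model_config.py. A sketch of what they might look like once filled in, plus a check that the directories exist. The nested "chatglm2-6b-32k" key and the surrounding dict layout are assumptions (the file's structure differs between Langchain-Chatchat versions), and the function name is hypothetical. Raw strings (r"...") are used so Windows backslashes are never read as escape sequences; in a normal string, a folder such as D:\models\new-model would silently contain a newline.

```python
import os
from typing import Dict, List

# Hypothetical filled-in entries; compare with the layout of your own model_config.py.
embedding_model_dict: Dict[str, str] = {
    "m3e-base": r"D:\Files\projects\chatchat\models\m3e-base",
}
llm_model_dict: Dict[str, Dict[str, str]] = {
    "chatglm2-6b-32k": {
        "local_model_path": r"D:\Files\projects\chatchat\models\chatglm2-6b-32k",
    },
}


def missing_model_paths(embeddings: Dict[str, str],
                        llms: Dict[str, Dict[str, str]]) -> List[str]:
    """List configured local model directories that do not exist on disk."""
    paths = list(embeddings.values())
    paths += [cfg["local_model_path"] for cfg in llms.values() if "local_model_path" in cfg]
    return [p for p in paths if not os.path.isdir(p)]
```

missing_model_paths(embedding_model_dict, llm_model_dict) returning an empty list means both paths point at real directories.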
- Vector database configuration (PostgreSQL + pgvector):
    git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git
    cd pgvector
    # Download and install PostgreSQL 15:
    # https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
    # Then run the following in cmd (installing into Program Files requires an Administrator prompt):
    set PGROOT=C:\Program Files\PostgreSQL\15
    call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
    nmake /F Makefile.win
    nmake /F Makefile.win install
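After `nmake /F Makefile.win install`, the extension files should sit under the PostgreSQL installation. A quick check, assuming the standard PostgreSQL layout on Windows where the extension library goes to %PGROOT%\lib and the control file and SQL scripts to %PGROOT%\share\extension; the function name is hypothetical.

```python
import glob
import os
from typing import List


def pgvector_install_problems(pgroot: str) -> List[str]:
    """Return the expected pgvector files that are missing under a PostgreSQL root."""
    problems = []
    if not os.path.isfile(os.path.join(pgroot, "lib", "vector.dll")):
        problems.append("lib/vector.dll not found")
    ext_dir = os.path.join(pgroot, "share", "extension")
    if not os.path.isfile(os.path.join(ext_dir, "vector.control")):
        problems.append("share/extension/vector.control not found")
    if not glob.glob(os.path.join(ext_dir, "vector--*.sql")):
        problems.append("no share/extension/vector--*.sql scripts found")
    return problems
```

pgvector_install_problems(r"C:\Program Files\PostgreSQL\15") should return an empty list; missing files usually mean the install step ran without Administrator rights.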
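    # Log in as the postgres superuser (psql.exe is in C:\Program Files\PostgreSQL\15\bin):
    .\psql.exe --username=postgres
    -- then, at the psql prompt:
    CREATE DATABASE TEST;
    -- Unquoted names are folded to lower case, so this database is called "test".
    -- Switch to it first; otherwise the extension is created in the "postgres" database:
    \c test
    CREATE EXTENSION IF NOT EXISTS vector;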
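Once the vector extension exists in the database created above (TEST, which PostgreSQL stores as test), similarity search is an ordinary SQL query using pgvector's distance operators: `<->` for L2 distance, `<=>` for cosine distance, `<#>` for negative inner product. A sketch of building such a query from Python; the table name docs, its columns, and the vector(768) dimension (assumed to match m3e-base's output size) are illustrative assumptions, not the schema Langchain-Chatchat creates.

```python
from typing import Sequence, Tuple

# Hypothetical schema, e.g.:
#   CREATE TABLE docs (id bigserial PRIMARY KEY, content text, embedding vector(768));
OPERATORS = {"l2": "<->", "cosine": "<=>", "inner_product": "<#>"}


def to_vector_literal(vec: Sequence[float]) -> str:
    """Format a Python sequence as pgvector's text input, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def knn_query(vec: Sequence[float], k: int = 3, metric: str = "cosine") -> Tuple[str, tuple]:
    """Build a parameterized nearest-neighbour query using %s placeholders (psycopg2 style)."""
    op = OPERATORS[metric]
    sql = (f"SELECT content, embedding {op} %s::vector AS distance "
           f"FROM docs ORDER BY embedding {op} %s::vector LIMIT %s")
    literal = to_vector_literal(vec)
    return sql, (literal, literal, int(k))
```

With psycopg2 the pair is passed straight to a cursor as cur.execute(sql, params); ordering by the distance operator is what lets pgvector use an index on the embedding column when one exists.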
    python -m spacy download en_core_web_sm
    python -m spacy download zh_core_web_sm
    pip install psycopg2 pgvector flask-mysqldb protobuf==3.20 filemagic
    pip install -r requirements.txt
    # pgvector troubleshooting:
    # Error: KeyError: 'answer'
    # Langchain-Chatchat/server/knowledge_base/kb_service/base.py
    # line 119: docs = self.do_search(query, top_k, embeddings)
    python init_database.py
    # To rebuild the vector store (e.g. after switching vector store type or embedding model):
    # python init_database.py --recreate-vs
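For init_database.py to write into PostgreSQL rather than the default Faiss store, Chatchat must be told which vector store to use and how to connect. Langchain-Chatchat 0.2.x appears to use DEFAULT_VS_TYPE = "pg" plus a kbs_config["pg"]["connection_uri"] entry (in model_config.py or kb_config.py depending on the release); treat those names as assumptions and verify them in your own configs. The sketch builds the URI with the password percent-encoded so characters such as @ or : do not break it; the helper name and the password are hypothetical, and the database name test follows from the CREATE DATABASE TEST step.

```python
from urllib.parse import quote


def pg_connection_uri(user: str, password: str, dbname: str,
                      host: str = "127.0.0.1", port: int = 5432) -> str:
    """Build a postgresql:// URI, percent-encoding credentials and the database name."""
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(dbname, safe='')}")


# Hypothetical config fragment; check the key names against your Chatchat version.
DEFAULT_VS_TYPE = "pg"
kbs_config = {
    "pg": {
        "connection_uri": pg_connection_uri("postgres", "your-password", "test"),
    },
}
```

After changing the vector store type, init_database.py --recreate-vs rebuilds the store in the new backend.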
- Start:
    python startup.py --all-webui
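startup.py --all-webui launches several processes (model worker, API server, web UI), and the UI only works once all of them are listening. A stdlib sketch that polls a TCP port until it accepts connections; the function name is hypothetical, and the ports in the usage note (API 7861, WebUI 8501) are the defaults server_config.py.example appears to use in 0.2.x, so confirm them in your own server_config.py.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 7861) followed by wait_for_port("127.0.0.1", 8501) returning True means both the API and the WebUI are up, and the UI should load at http://127.0.0.1:8501.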