Notes on Building a Private Knowledge Base

NoobSir / 2023-08-27 / Original post

  • Resource: Building a Private Knowledge Base System on a Local Large Language Model from Scratch (Part 2: Software) (Bilibili video)

I. Component selection for the private knowledge base:

  • Main program + front end:
    • Text-Generation-WebUI: oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, llama.cpp (ggml), Llama models. (github.com)
    • Langchain-Chatchat: chatchat-space/Langchain-Chatchat: Langchain-Chatchat (formerly langchain-ChatGLM), a local-knowledge-base question-answering app built on LangChain and language models such as ChatGLM. (github.com)
    • LangChain + custom Gradio UI development
  • Vector database:
    • PGvector
    • Milvus
    • Faiss
  • Models and knowledge base:
    • ChatGLM2-6B-32K: THUDM/chatglm2-6b · Hugging Face, THUDM/chatglm2-6b-32k · Hugging Face
    • LLaMA2: meta-llama/Llama-2-7b · Hugging Face
    • LLM + fine-tuning
    • Local knowledge base
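
Whichever vector database is chosen, its role in this stack is the same: store one embedding vector per document chunk and return the chunks closest to the query embedding. A minimal sketch of that idea in plain Python follows; the three-dimensional vectors and chunk names are made up for illustration (real embeddings from a model such as m3e-base have hundreds of dimensions), and real databases use indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Return the k chunk names whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Toy "vector store": chunk name -> made-up embedding.
store = {
    "install notes": [0.9, 0.1, 0.0],
    "vector database": [0.1, 0.9, 0.1],
    "model download": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, store))
```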

II. Installation notes

  • Download the resources:

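    # The Hugging Face repos store model weights with Git LFS, so Git LFS must be installed before cloning
    git clone https://huggingface.co/THUDM/chatglm2-6b-32k
    git clone https://huggingface.co/moka-ai/m3e-base
    git clone https://github.com/chatchat-space/Langchain-Chatchat.git
    cd Langchain-Chatchat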
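
If Git LFS is missing when the model repositories are cloned, the weight files end up as small text stubs instead of the real data, and the model fails to load later. A minimal sketch for spotting this, assuming the script is run from the directory that contains the two cloned model folders; the function names are my own.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub rather than real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER

def find_lfs_pointers(model_dir):
    """List files under model_dir (outside .git) that are still LFS pointers."""
    return sorted(
        str(p)
        for p in Path(model_dir).rglob("*")
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )

if __name__ == "__main__":
    for name in ("chatglm2-6b-32k", "m3e-base"):
        if Path(name).is_dir():
            print(name, find_lfs_pointers(name) or "no LFS pointer stubs found")
```

Any file it lists still needs to be fetched, for example with `git lfs pull` inside that repository.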
    
  • Conda environment

    conda create -n chatchat python=3.10
    conda activate chatchat
    pip install --upgrade pip
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    # Alternative: nightly build for CUDA 12.1
    # pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
    conda install spacy
    pip install cchardet
    pip install accelerate
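
After the installs above, it is worth confirming that the CUDA build of PyTorch was actually picked up, since a CPU-only wheel installs without complaint. A small sketch, run inside the `chatchat` environment; the helper name is my own.

```python
import importlib.util

def torch_cuda_report():
    """Describe the installed torch build and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but CUDA is not available"
    return (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )

print(torch_cuda_report())
```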
    
  • Building Chatchat

    pip install -r requirements.txt
    cd configs
    cp ./model_config.py.example ./model_config.py
    # In embedding_model_dict:
    # "m3e-base":"D:\Files\projects\chatchat\models\m3e-base"
    # In llm_model_dict:
    # "local_model_path":"D:\Files\projects\chatchat\models\chatglm2-6b-32k"
    cp ./server_config.py.example ./server_config.py
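
Because `model_config.py` is ordinary Python, Windows paths are safest written as raw strings (`r"..."`), which keep every backslash literal; a path containing `\n` or `\t` would otherwise be silently mangled. A sketch of the two edits, assuming that `llm_model_dict` is keyed by model name (the key `"chatglm2-6b-32k"` is my guess; use whatever key the example file provides) and adding a hypothetical helper to check that the folders exist.

```python
from pathlib import Path

embedding_model_dict = {
    # Raw string: backslashes are not treated as escape sequences.
    "m3e-base": r"D:\Files\projects\chatchat\models\m3e-base",
}

llm_model_dict = {
    "chatglm2-6b-32k": {  # assumed entry name, not taken from the notes
        "local_model_path": r"D:\Files\projects\chatchat\models\chatglm2-6b-32k",
    },
}

def missing_paths(paths):
    """Return the configured paths that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]

configured = [
    embedding_model_dict["m3e-base"],
    llm_model_dict["chatglm2-6b-32k"]["local_model_path"],
]
print("missing model folders:", missing_paths(configured))
```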
    
  • Vector database configuration

    git clone --branch v0.4.4 https://github.com/pgvector/pgvector.git
    cd pgvector
    
    # PostgreSQL + pgvector
    # https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
    # Download and install PostgreSQL 15
    # Run the following commands in cmd
    set PGROOT=C:\Program Files\PostgreSQL\15
    call "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
    nmake /F Makefile.win
    nmake /F Makefile.win install
    
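    -- Log in as the postgres superuser: .\psql.exe --username=postgres
    CREATE DATABASE TEST;
    -- CREATE EXTENSION only affects the current database, so connect to the new one first
    -- (the unquoted name TEST is folded to lowercase "test")
    \c test
    CREATE EXTENSION IF NOT EXISTS vector;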
    
    python -m spacy download en_core_web_sm
    python -m spacy download zh_core_web_sm
    pip install psycopg2 pgvector flask-mysqldb protobuf==3.20 filemagic
    pip install -r requirements.txt
    # Handling a pgvector error:
    # Error: KeyError: 'answer'
    # Langchain-Chatchat/server/knowledge_base/kb_service/base.py
    # Line 119: docs = self.do_search(query, top_k, embeddings)
    python init_database.py
    # python init_database.py --recreate-vs
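
If `init_database.py` fails against PostgreSQL, one thing worth ruling out is that the `vector` extension was created in a different database from the one Chatchat connects to. A minimal sketch using `psycopg2` (installed above); the function name is my own, and the connection parameters in the commented usage are placeholders to adapt.

```python
def vector_extension_version(conn):
    """Return the pgvector version installed in this connection's database, or None."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT extversion FROM pg_extension WHERE extname = %s",
            ("vector",),
        )
        row = cur.fetchone()
    return row[0] if row else None

# Example usage (needs a running server; credentials are placeholders):
# import psycopg2
# conn = psycopg2.connect(host="127.0.0.1", dbname="test", user="postgres", password="...")
# print(vector_extension_version(conn) or "vector extension is not installed here")
# conn.close()
```

A result of `None` means `CREATE EXTENSION` has not been run in that particular database.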
    
  • Start and run:

    python startup.py --all-webui
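
Once `startup.py` is running, a quick way to see which services came up is to probe their TCP ports. A sketch under the assumption that the API server and Web UI use ports 7861 and 8501; these numbers are not from the notes above, so check `configs/server_config.py` for the values actually in use.

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Assumed default ports; adjust to match configs/server_config.py.
ASSUMED_PORTS = {"API server": 7861, "Web UI": 8501}

for name, port in ASSUMED_PORTS.items():
    state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{name} (port {port}): {state}")
```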