Setting up a chatglm3-6B environment on Ubuntu 22.04

RakanLiu / 2024-11-07

Downloading chatglm3-6B

I recommend downloading the model to a local directory from https://modelscope.cn/models/ZhipuAI/chatglm3-6b.

Other reference documents:
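One way to script the download is the snapshot_download function from the modelscope package (installed in the environment setup step of this post). The sketch below is only an illustration: the download_model helper and the "./models" target directory are hypothetical names, not something the original post uses, and the call fetches several gigabytes of weights.

```python
from pathlib import Path

# Model ID taken from the ModelScope URL above.
MODEL_ID = "ZhipuAI/chatglm3-6b"


def download_model(cache_dir: str) -> str:
    """Download chatglm3-6b into cache_dir and return the local model directory.

    The import is done lazily so this file can be loaded without modelscope
    installed; the function itself needs `pip install modelscope`.
    """
    from modelscope import snapshot_download

    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    # snapshot_download returns the directory that holds the model files.
    return snapshot_download(MODEL_ID, cache_dir=cache_dir)


# Example call (hypothetical target directory; downloads the full weights):
# model_dir = download_model("./models")
# print(model_dir)
```

Use the path returned by the call as the model directory when loading the model, rather than guessing the layout under the cache directory.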
https://zhipu-ai.feishu.cn/wiki/WvQbwIJ9tiPAxGk8ywDck6yfnof
https://zhipu-ai.feishu.cn/wiki/OwCDwJkKbidEL8kpfhPcTGHcnVc

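Environment setup

Ubuntu 22.04
For inference, my machine has 24 GB of system (CPU) memory and 24 GB of GPU memory, and I run the model in half precision.
I don't think the system memory could go any lower; 16 GB of GPU memory would probably be enough (an estimate, not tested).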
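The estimate above can be sanity-checked with a back-of-the-envelope calculation of the weight size. The sketch below assumes a nominal 6 billion parameters (the "6B" in the model name taken at face value) and counts weights only; activations, the KV cache and framework overhead come on top, so treat the numbers as a lower bound.

```python
# Assumption: "6B" is taken at face value as 6e9 parameters.
NOMINAL_PARAMS = 6e9

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gib(n_params: float, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
# float32: 22.4 GiB
# float16: 11.2 GiB
```

Roughly 11 GiB of half-precision weights is consistent with the guess that a 16 GB GPU could work. One possible explanation for the tight system memory is that the weights are first loaded on the CPU in full precision before half() is applied, which would need about 22 GiB; this is an assumption, not something measured in this post.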

conda create -n chatglm3 python=3.11
conda activate chatglm3

pip install modelscope

# pip install protobuf 'transformers>=4.30.2' cpm_kernels 'torch>=2.0' gradio mdtex2html sentencepiece accelerate

pip install protobuf 'transformers==4.41.2' cpm_kernels 'torch>=2.0' gradio mdtex2html sentencepiece accelerate

If you do not use ipynb (Jupyter notebooks), you can skip the commands below.

pip install --user ipykernel
python -m ipykernel install --user --name=chatglm3

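Fixing environment bugs
If the unpinned install (the commented-out pip command above) leads to errors when loading or running the model, install transformers version 4.41.2, as in the pinned pip command.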
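To confirm that the active environment really has the pinned version, a small check with the standard library's importlib.metadata can help. This is a minimal sketch; the helper names matches_pin and installed_transformers_version are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version pinned in this post.
PINNED_TRANSFORMERS = "4.41.2"


def matches_pin(installed: str, pinned: str = PINNED_TRANSFORMERS) -> bool:
    """Return True if the installed version string equals the pinned one."""
    return installed.strip() == pinned


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is missing."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


found = installed_transformers_version()
if found is None:
    print("transformers is not installed in this environment")
elif matches_pin(found):
    print(f"transformers {found} matches the pinned version")
else:
    print(f"transformers {found} found, expected {PINNED_TRANSFORMERS}")
```

Run it inside the activated chatglm3 conda environment so that it inspects the right set of packages.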

Running (inference only)

from modelscope import AutoTokenizer, AutoModel

# Path to the local model directory
model_dir = "****/ZhipuAI/chatglm3-6b"

tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()  # half() casts to half precision; cuda() moves the model to the GPU
# model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half()
model = model.eval()

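response, history = model.chat(tokenizer, "你好", history=[])  # prompt: "Hello"
print(response)

# Passing the returned history back in continues the same conversation
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)  # prompt: "What should I do if I can't sleep at night?"
print(response)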
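The two calls above follow one pattern: each turn passes in the history returned by the previous turn. That pattern can be wrapped in a small loop, sketched below. The run_turns helper and the chat_fn callable are hypothetical names introduced for illustration; the history is treated as opaque, since its element format is defined by the model's own code.

```python
from typing import Callable, Iterable, List, Tuple

# A chat function takes (prompt, history) and returns (response, new_history).
ChatFn = Callable[[str, list], Tuple[str, list]]


def run_turns(chat_fn: ChatFn, prompts: Iterable[str]) -> Tuple[List[str], list]:
    """Send prompts one by one, threading the history through every turn."""
    history: list = []
    responses: List[str] = []
    for prompt in prompts:
        response, history = chat_fn(prompt, history)
        responses.append(response)
    return responses, history


# With the model and tokenizer loaded as above, the call could look like this:
# responses, history = run_turns(
#     lambda query, hist: model.chat(tokenizer, query, history=hist),
#     ["你好", "晚上睡不着应该怎么办"],
# )
```

Taking the chat function as a parameter keeps the loop independent of the model, so it can be tried with a stand-in function before loading the real weights.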