Deploying DeepSeek Distilled Models with Ollama on a Single Machine (Multi-GPU or No GPU) to Build a Local Knowledge Base

Published 2025-04-22, updated 2025-10-31

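This article walks through deploying a distilled DeepSeek model with Ollama and building a local knowledge base on top of it with FastGPT. It is aimed at individuals who want to try a private, self-hosted deployment.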

Environment

Hardware	Operating system	IP address
2 CPU cores, 8 GB RAM, no GPU	CentOS 7.5	10.211.55.5

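Install Ollama

Note
Starting with version 0.5.13, Ollama requires a relatively recent glibc. Upgrading glibc is complicated and risky, so unless you have a specific requirement, install version 0.5.12 or earlier. A related issue has already been filed upstream; whether this will be improved remains to be seen.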

Online installation

yum install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh

Note
To install a specific version, use the following commands instead

yum install pciutils -y
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.12 sh

Offline installation

Download the archive for the version you need from https://github.com/ollama/ollama/releases, upload it to /root, and extract it with the following commands

cd /root/
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

Create the Ollama user and group

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)

Create the service file

cat > /etc/systemd/system/ollama.service << EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=multi-user.target
EOF

Start Ollama and enable it at boot

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama

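Configure Ollama environment variables

By default Ollama listens on the loopback address 127.0.0.1, so the service is reachable only from the local machine. To allow access from other hosts, set the environment variable OLLAMA_HOST to 0.0.0.0 so that the service listens on all available network interfaces. The table below lists some commonly used environment variables

Environment variable	Description	Example
OLLAMA_HOST	Address and port the API service listens on; `0.0.0.0` accepts connections on every interface	`0.0.0.0:11434`
OLLAMA_ORIGINS	List of origins allowed to make cross-origin requests; `*` is a wildcard	`*`
OLLAMA_MODELS	Custom model storage path, useful for keeping models off the system disk	`/usr/share/ollama/`
OLLAMA_KEEP_ALIVE	How long a model stays loaded in memory, which reduces repeated load overhead	`24h` (24 hours)
OLLAMA_DEBUG	Enables debug logging for troubleshooting service problems	`1` (on)
OLLAMA_FLASH_ATTENTION	Enables Flash Attention	`1` (on)
OLLAMA_NUM_PARALLEL	Number of requests processed in parallel, which raises throughput under concurrent load	`2`
OLLAMA_GPU_OVERHEAD	VRAM to reserve on each GPU, in bytes; layers that no longer fit on the GPU are loaded into system RAM (the value has to be calculated by hand)	`81920000000` (about 82 GB)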
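
OLLAMA_GPU_OVERHEAD is written as a raw byte count, which is easy to get wrong by a zero or two. The helper below is a minimal sketch for converting whole GiB into bytes; the function name gib_to_bytes is hypothetical and not part of Ollama.

```shell
# gib_to_bytes N: print N GiB expressed in bytes (N must be a whole number).
# Hypothetical helper for illustration; it is not an Ollama command.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}
```

For example, `gib_to_bytes 2` prints 2147483648, which could then be used in an Environment="OLLAMA_GPU_OVERHEAD=2147483648" line. The 2 GiB figure is an arbitrary illustration, not a recommended value.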

mkdir -p /etc/systemd/system/ollama.service.d

cat > /etc/systemd/system/ollama.service.d/override.conf << EOF
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_NUM_PARALLEL=2"
EOF

sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama
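
To confirm that the new listen address took effect, the API can be queried from another machine. The sketch below assumes curl is installed and relies on Ollama's GET /api/version endpoint; the function name ollama_api_version is hypothetical.

```shell
# ollama_api_version BASE_URL: query GET /api/version and print the version string.
# Returns non-zero if the request fails or the reply has no "version" field.
ollama_api_version() {
    local body
    body=$(curl -fsS --max-time 5 "$1/api/version") || return 1
    printf '%s\n' "$body" |
        sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
        grep .
}
```

With the environment used in this article, the call would be `ollama_api_version http://10.211.55.5:11434`. A non-zero exit status usually points at the listen address or a firewall rule rather than at Ollama itself.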

Load the DeepSeek model

Note
This article uses the 1.5B model as an example; choose a different parameter size to match your hardware

Pull online

ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b

Offline import

Search for the model's GGUF file on ModelScope (魔搭社区), download it, and upload it to /root
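
Large uploads are sometimes truncated or replaced by an error page, so it can be worth checking the file before importing it. GGUF files begin with the four ASCII bytes GGUF; the sketch below checks only that signature and does not validate the rest of the file. The function name is_gguf is hypothetical.

```shell
# is_gguf FILE: succeed if FILE exists and starts with the 4-byte magic "GGUF".
# This is a quick sanity check only; it does not verify the file's integrity.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}
```

A possible use with the file name from this article: `is_gguf /root/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf && echo "GGUF header found"`.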

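Create the Modelfile

Note
Change the path after FROM to the actual location of the GGUF file
The Modelfile below was obtained by running ollama show --modelfile deepseek-r1:1.5b against the model pulled online and then editing the output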

cat > /root/Modelfile << EOF
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM deepseek-r1:1.5b

FROM /root/DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range \$i, \$_ := .Messages }}
{{- \$last := eq (len (slice $.Messages \$i)) 1}}
{{- if eq .Role "user" }}<｜User｜>{{ .Content }}
{{- else if eq .Role "assistant" }}<｜Assistant｜>{{ .Content }}{{- if not \$last }}<｜end▁of▁sentence｜>{{- end }}
{{- end }}
{{- if and \$last (ne .Role "assistant") }}<｜Assistant｜>{{- end }}
{{- end }}"""
PARAMETER stop <｜begin▁of▁sentence｜>
PARAMETER stop <｜end▁of▁sentence｜>
PARAMETER stop <｜User｜>
PARAMETER stop <｜Assistant｜>
LICENSE """MIT License

Copyright (c) 2023 DeepSeek

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""
EOF

Import the model

ollama create deepseek-r1:1.5b -f /root/Modelfile
ollama run deepseek-r1:1.5b
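
Besides the interactive ollama run session, a loaded model can be exercised over Ollama's REST API, which is also how other services reach it. The following is a minimal sketch using POST /api/generate with streaming turned off; the function names are hypothetical, and the payload builder does no JSON escaping, so it only suits simple prompts.

```shell
# generate_payload MODEL PROMPT: print a JSON body for POST /api/generate.
# Sketch only: PROMPT must not contain double quotes, backslashes or newlines,
# because no JSON escaping is performed here.
generate_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# ollama_generate BASE_URL MODEL PROMPT: send the request and print the raw JSON reply.
ollama_generate() {
    curl -fsS "$1/api/generate" \
        -H 'Content-Type: application/json' \
        -d "$(generate_payload "$2" "$3")"
}
```

For instance, `ollama_generate http://127.0.0.1:11434 deepseek-r1:1.5b "Hello"` should return a single JSON object whose response field holds the generated text.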

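Install Docker

Binary package download URL: https://download.docker.com/linux/static/stable/

Download the Docker binary package for your platform and version from the matching directory and upload it to /root (this article uses 28.0.4 on x86 as the example), then run the following commands to install Docker

# Extract the archive and copy the binaries into place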
cd /root/
tar zxf docker-28.0.4.tgz
chmod 755 docker/*
cp -a docker/* /usr/bin/

# Create the Docker service file
cat > /usr/lib/systemd/system/docker.service << EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF

# Configure the cgroup driver and a registry mirror
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://lerc8rqe.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

# Start Docker and enable it at boot
systemctl daemon-reload
systemctl start docker
systemctl enable docker
systemctl status docker
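
Since daemon.json sets the cgroup driver explicitly, it may be worth confirming that the running daemon actually picked it up. The sketch below reads the value through docker info with a Go template; the function names are hypothetical.

```shell
# docker_cgroup_driver: print the cgroup driver reported by the running daemon.
docker_cgroup_driver() {
    docker info --format '{{.CgroupDriver}}'
}

# expect_cgroup_driver NAME: succeed only if the daemon reports driver NAME.
expect_cgroup_driver() {
    [ "$(docker_cgroup_driver 2>/dev/null)" = "$1" ]
}
```

A possible check after starting Docker: `expect_cgroup_driver systemd || echo "daemon.json was not applied"`.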

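Install Docker Compose

Note
FastGPT recommends Docker Compose 2.17 or later

Binary package download URL: https://github.com/docker/compose/releases

Download the Docker Compose binary for the version you need and upload it to /root (this article uses v2.34.0 on x86 as the example), then run the following commands to install Docker Compose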

cd /root/
mv docker-compose-linux-x86_64 docker-compose
mkdir -p /usr/libexec/docker/cli-plugins
cp -a docker-compose /usr/libexec/docker/cli-plugins/
chmod +x /usr/libexec/docker/cli-plugins/docker-compose
docker compose version
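
The 2.17 recommendation can be checked mechanically once the plugin is installed. The sketch below compares version strings with GNU sort -V and reads the installed version from docker compose version --short; both function names are hypothetical.

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# compose_version: print the installed Compose version without a leading "v".
compose_version() {
    local v
    v=$(docker compose version --short) || return 1
    echo "${v#v}"
}
```

One way to use it: `version_ge "$(compose_version)" 2.17 && echo "Compose is recent enough for FastGPT"`.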

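Install FastGPT

Environment	Minimum configuration (single node)	Recommended configuration
Testing (the number of compute processes can be reduced)	2c4g	2c8g
1 million vector groups	4c8g 50GB	4c16g 50GB
5 million vector groups	8c32g 200GB	16c64g 200GB

Environment	Minimum configuration (single node)	Recommended configuration
Testing	2c8g	4c16g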

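Installing FastGPT

Project: https://github.com/labring/FastGPT
Development and deployment guide: https://doc.tryfastgpt.ai/docs/development/docker/
Release downloads: https://github.com/labring/FastGPT/releases

Note
The method described here installs whatever version is latest at the time you run it (4.9.6 when this article was published). To install a different version, download the matching source package from https://github.com/labring/FastGPT/releases. The configuration file config.json is in the projects/app/data/ directory, and the docker-compose.yml templates for each version are in the deploy/docker/ directory

Create the FastGPT installation directory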

cd /root/
# mkdir -p FastGPT-<version>
mkdir -p FastGPT

Prepare the configuration file and the docker-compose.yml file for your version

cd FastGPT
curl -O https://raw.githubusercontent.com/labring/FastGPT/main/projects/app/data/config.json
# pgvector version (recommended for testing; simple and quick)
curl -o docker-compose.yml https://raw.githubusercontent.com/labring/FastGPT/main/deploy/docker/docker-compose-pgvector.yml
# milvus version
# curl -o docker-compose.yml https://raw.githubusercontent.com/labring/FastGPT/main/deploy/docker/docker-compose-milvus.yml
# zilliz version
# curl -o docker-compose.yml https://raw.githubusercontent.com/labring/FastGPT/main/deploy/docker/docker-compose-zilliz.yml
# oceanbase version (init.sql and docker-compose.yml must sit in the same directory so they can be mounted)
# curl -o docker-compose.yml https://raw.githubusercontent.com/labring/FastGPT/main/deploy/docker/docker-compose-oceanbase/docker-compose.yml
# curl -o init.sql https://raw.githubusercontent.com/labring/FastGPT/main/deploy/docker/docker-compose-oceanbase/init.sql

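Edit docker-compose.yml and switch the image addresses to the Alibaba Cloud registry

Note
FastGPT 4.8.23 introduced AI Proxy to simplify model configuration, and from FastGPT 4.9.0 onward it is the default way to connect models.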

...
    # image: pgvector/pgvector:0.8.0-pg15 # docker hub
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.8.0-pg15 # Alibaba Cloud
    # image: mongo:5.0.18 # dockerhub
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18 # Alibaba Cloud
    # image: mongo:4.4.29 # use this when the CPU does not support AVX
    image: redis:7.2-alpine
    # image: ghcr.io/labring/fastgpt-sandbox:v4.9.6 # git
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:v4.9.6 # Alibaba Cloud
    # image: ghcr.io/labring/fastgpt-mcp_server:v4.9.6 # git
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-mcp_server:v4.9.6 # Alibaba Cloud
    # image: ghcr.io/labring/fastgpt:v4.9.6 # git
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.9.6 # Alibaba Cloud
    # image: ghcr.io/labring/aiproxy:v0.1.7
    image: registry.cn-hangzhou.aliyuncs.com/labring/aiproxy:v0.1.7 # Alibaba Cloud
    # image: pgvector/pgvector:0.8.0-pg15 # docker hub
    image: registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.8.0-pg15 # Alibaba Cloud
...

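Run the following command in the directory that contains docker-compose.yml to start the containers

docker compose up -d

FastGPT is then reachable at http://ip:3000 (make sure the firewall allows the port). Log in as user root; the password is the value of DEFAULT_ROOT_PSW in the environment section of docker-compose.yml, which defaults to 1234.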
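
The containers can take a while to become ready after docker compose up -d, and the port has to be opened before a browser on another machine can connect. The sketch below assumes firewalld is the active firewall (the CentOS 7 default) and that curl is installed; both function names and the default retry values are hypothetical.

```shell
# open_tcp_port PORT: permanently allow PORT/tcp in firewalld and reload the rules.
open_tcp_port() {
    firewall-cmd --permanent --add-port="$1/tcp" && firewall-cmd --reload
}

# wait_for_http URL [TRIES] [DELAY]: poll URL until one request succeeds.
# curl -f treats HTTP 4xx/5xx replies as failures. Gives up after TRIES
# attempts (default 30), sleeping DELAY seconds (default 2) between them.
wait_for_http() {
    local url tries delay i
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        curl -fsS -o /dev/null --max-time 5 "$url" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

With this article's host, that could look like `open_tcp_port 3000` followed by `wait_for_http http://10.211.55.5:3000 && echo "FastGPT is up"`. If the wait times out, `docker compose ps` and `docker compose logs` are the usual next places to look.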

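Configure FastGPT

Note
The system needs at least one language model and one index model to work properly.

Configure the language model

Go to "Account" - "Model Providers" - "Model Channels" to open the channel configuration page, then click "Add Channel" in the upper-right corner

Taking the deepseek-r1:1.5b model installed earlier as an example, fill in the fields as follows

Channel name: the name shown for the channel; it serves only as a label;
Vendor: the vendor the model belongs to; each vendor has its own default address and API key format;
Models: the models this channel can use. The system ships with the mainstream models built in; if the one you need is not in the drop-down list, click "Add Model" to define a custom model;
Model mapping: maps the model FastGPT requests onto the model the channel actually provides; click the ⍰ icon for details;
Proxy address: the address requests are sent to. Each mainstream channel has a default address, which can be left unchanged if no adjustment is needed;
API key: the API credential obtained from the model vendor. Some vendors require a combination of several keys; enter them as prompted;

[Screenshot: configuration for adding the custom deepseek-r1:1.5b model]

Once added, the new channel appears under "Model Channels"

Click "..." - "Model Test" to batch-test every model in the channel and confirm that the configured models work

When the test finishes, it reports the result and request latency for each model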

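Deploy and configure the index model

Deploy the index model

FastGPT uses OpenAI's embedding model by default. For a private deployment it can be replaced with the M3E embedding model. M3E is a small model with modest resource requirements and also runs on CPU. The steps below are based on an image provided by community member "睡大觉".

Image: stawky/m3e-large-api:latest
China mirror: registry.cn-hangzhou.aliyuncs.com/fastgpt_docker/m3e-large-api:latest

Note
Port: 6008
Environment variable: sk-key, default value sk-aaabbbcccdddeeefffggghhhiiijjjkkk; a custom value can be supplied through a Docker environment variable.

Run the following command to start the M3E embedding model

docker run -d --name m3e --restart=always -p 6008:6008 -e sk-key="sk-aaabbbcccdddeeefffggghhhiiijjjkkk" registry.cn-hangzhou.aliyuncs.com/fastgpt_docker/m3e-large-api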

Test that the API responds correctly

Note
Replace <host> with the host IP
Replace <sk-key> with the sk-key value you supplied

curl --location --request POST 'http://<host>:6008/v1/embeddings' \
--header 'Authorization: Bearer <sk-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "model": "m3e",
  "input": ["laf是什么"]
}'