A Complete Guide to Deploying DeepSeek Locally (Linux, Windows, and macOS)_F11

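I. Deployment on Linux

1. Preparation

Hardware requirements: The server needs ample compute resources. An NVIDIA GPU such as the A100 or V100 is recommended to speed up model inference. Use at least 32 GB of RAM and a fast solid-state drive (SSD) for storage so that data reads and writes stay efficient.
Software environment: Install a Linux distribution such as Ubuntu 20.04, along with Python 3.8 or later and the required libraries, such as PyTorch and transformers. Taking CUDA 11.7 as an example, install PyTorch with:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

Install the transformers library:

pip install transformers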
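Before moving on, it can help to confirm that the interpreter and the two libraries are actually visible from the environment you will run the scripts in. The following is a minimal sketch using only the standard library; the function name check_environment is an assumption made for illustration, not part of any DeepSeek tooling.

```python
import sys
from importlib import metadata


def check_environment(packages=("torch", "transformers"), min_python=(3, 8)):
    """Report whether the interpreter and the listed packages look ready."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        try:
            # metadata.version returns the installed distribution's version string
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of None for torch or transformers means the corresponding pip install step has not taken effect in the active environment.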

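2. Downloading the DeepSeek Model
Go to the official DeepSeek model download page and choose a model version that fits your needs. DeepSeek currently offers models at several parameter scales, such as DeepSeek-7B and DeepSeek-13B.
Download the model file with wget, for example:

wget https://download.deepseek.com/DeepSeek-7B.tar.gz

When the download finishes, extract the archive:

tar -zxvf DeepSeek-7B.tar.gz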
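When choosing among parameter scales, a rough estimate of the memory needed for the weights alone is the parameter count multiplied by the bytes per parameter. The sketch below assumes a round 7e9 parameters for a "7B" model (the real count differs somewhat) and ignores activations and the attention cache, so treat the result as a lower bound. The function name weight_memory_gib is hypothetical.

```python
# Bytes used to store one parameter in each numeric format
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float16"):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # 7e9 is an assumed round figure for a "7B" model
    for dtype in BYTES_PER_PARAM:
        print(f"7B parameters in {dtype}: {weight_memory_gib(7e9, dtype):.1f} GiB")
```

Under these assumptions a 7B model in float16 needs roughly 13 GiB for its weights, which is why the GPU's memory size matters as much as its speed.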

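3. Deployment Steps
Create a project directory: Create a new local directory to hold the deployment files and scripts.

mkdir deepseek_deployment
cd deepseek_deployment

Write an inference script: Write a Python inference script, for example inference.py, that imports the required libraries, loads the DeepSeek model and tokenizer, and runs inference. Sample code: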

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (half precision, moved to the GPU)
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).cuda()

# Inference function: beam search with 5 beams, stopping once all beams finish
def generate_text(prompt, max_length=100):
    input_ids = tokenizer.encode(prompt, return_tensors='pt').cuda()
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please describe the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)

Replace path/to/DeepSeek-7B with the actual path to the model directory.
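One detail of the script worth spelling out: for a causal language model, the max_length argument of generate caps the total sequence, prompt tokens included, so a long prompt leaves fewer tokens for the answer. The helper below is a hypothetical illustration of that arithmetic, not a transformers function; transformers also accepts max_new_tokens if you prefer to bound only the generated part.

```python
def new_token_budget(prompt_token_count, max_length=100):
    """Tokens left for generation when max_length caps prompt plus output."""
    return max(0, max_length - prompt_token_count)


if __name__ == "__main__":
    # A 30-token prompt with the script's default max_length of 100
    print(new_token_budget(30))
```

In the script, the prompt's token count is the length of the input_ids tensor along its last dimension.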

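Start a service: To deploy the model as a service, you can use a framework such as FastAPI. First install FastAPI and uvicorn:

pip install fastapi uvicorn

Then write a service script, for example app.py: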

from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load the tokenizer and the model once, at startup
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).cuda()

# Request body: the prompt plus an optional length limit
class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer.encode(request.prompt, return_tensors='pt').cuda()
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

As before, replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn app:app --host 0.0.0.0 --port 8000
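Once the service is running, the /generate endpoint expects a JSON body with the same fields as PromptRequest. The following is a minimal client sketch using only the standard library; the function names and the base URL http://127.0.0.1:8000 are assumptions that match the start command above when the client runs on the same machine.

```python
import json
import urllib.request


def build_generate_request(prompt, max_length=100, base_url="http://127.0.0.1:8000"):
    """Build a POST request whose JSON body matches the PromptRequest fields."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(prompt, **kwargs):
    """Send the request and return the generated_text field of the response."""
    request = build_generate_request(prompt, **kwargs)
    # Generation can be slow, so allow a generous timeout
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]
```

With the server up, calling call_generate with a prompt string returns the generated text. Building the request in its own function keeps it easy to inspect without a running server.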

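II. Deployment on Windows

1. Preparation

Hardware requirements: As on Linux, an NVIDIA GPU is recommended, such as the RTX 30 series or newer, for good inference performance. Use 32 GB of RAM or more and a fast SSD for storage.
Software environment: Install Python 3.8 or later using the installer from the official Python website, and check the "Add Python to PATH" option during installation so the command line can find it. Then install PyTorch and transformers. Because setting up CUDA on Windows is relatively involved, managing the environment with conda is recommended. Install Anaconda first, then create a new conda environment and install the dependencies: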

conda create -n deepseek_env python=3.8
conda activate deepseek_env
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install transformers
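After the conda environment is set up, a quick way to confirm that PyTorch can see the GPU is to ask it directly. This is a minimal sketch: the function name describe_torch is hypothetical, and it returns a plain dictionary rather than raising when torch is not installed, so it is safe to run in any environment.

```python
def describe_torch():
    """Summarize the PyTorch install without raising if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "cuda": False, "device": None}
    cuda = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": cuda,
        # get_device_name needs at least one visible CUDA device
        "device": torch.cuda.get_device_name(0) if cuda else None,
    }


if __name__ == "__main__":
    print(describe_torch())
```

If cuda comes back False on a machine with an NVIDIA GPU, the usual suspects are a CPU-only PyTorch build or an outdated driver.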

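2. Downloading the DeepSeek Model

Go to the official DeepSeek model download page and choose a suitable model version.
You can download the model file directly in a browser, or from the command line with wget (which must be installed first) or curl. In Windows PowerShell, type curl.exe instead of curl, because curl there is an alias for Invoke-WebRequest. For example, to download the DeepSeek-7B model with curl:

curl -O https://download.deepseek.com/DeepSeek-7B.tar.gz

When the download finishes, extract the archive with a tool such as 7-Zip.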
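If you would rather not install a separate extraction tool, Python's standard tarfile module can unpack a .tar.gz archive on any platform. The sketch below is one possible approach, with a hypothetical function name. It rejects archive members whose paths would land outside the destination folder, and it uses the stricter "data" extraction filter when the running Python version provides it.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir, refusing members that escape it."""
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Reject paths such as ../../somewhere that point outside dest
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer a built-in safety filter
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

Calling extract_archive("DeepSeek-7B.tar.gz", "deepseek_deployment") would unpack the archive named in the download step into the project folder.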

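3. Deployment Steps

Create a project directory: In File Explorer, create a new folder, for example "deepseek_deployment", to hold the deployment files.
Write an inference script: In a text editor (such as Notepad++ or VS Code), write the Python inference script inference.py. Its content is similar to the Linux version: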

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model (half precision, moved to the GPU)
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to('cuda')

# Inference function: beam search with 5 beams, stopping once all beams finish
def generate_text(prompt, max_length=100):
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please describe the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)

Replace path/to/DeepSeek-7B with the actual path to the model directory.
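A wrong model path is a common source of confusing errors, especially on Windows where backslashes in ordinary string literals can be read as escape sequences. The sketch below, with a hypothetical function name, normalizes the path with pathlib and checks for config.json, which from_pretrained needs when loading from a local directory.

```python
from pathlib import Path


def resolve_model_dir(path):
    """Return an absolute model directory, or raise if it does not look loadable."""
    model_dir = Path(path).expanduser().resolve()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    # Loading from a local directory requires a config.json inside it
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(f"config.json missing in: {model_dir}")
    return model_dir
```

On Windows, pass the location as a raw string such as r"D:\models\DeepSeek-7B" (an illustrative path, not one the download step creates) and hand the returned value to from_pretrained via str().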

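Start a service: To deploy as a service, FastAPI and uvicorn work here as well. Activate the conda environment on the command line, then install the libraries:

pip install fastapi uvicorn

Write the service script app.py. Its content is similar to the Linux version: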

from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Load the tokenizer and the model once, at startup
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=torch.float16).to('cuda')

# Request body: the prompt plus an optional length limit
class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer.encode(request.prompt, return_tensors='pt').to('cuda')
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

Replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn app:app --host 0.0.0.0 --port 8000
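Binding to 0.0.0.0 makes the service reachable from other machines on the network, so some form of access check is worth considering. One simple option, sketched here as an assumption rather than a complete security design, is a shared API key compared in constant time. The environment variable name DEEPSEEK_API_KEY and the function name are hypothetical.

```python
import hmac
import os


def is_authorized(presented_key, expected_key=None):
    """Return True only when the caller's key matches the configured one."""
    if expected_key is None:
        # DEEPSEEK_API_KEY is a hypothetical variable name chosen for this sketch
        expected_key = os.environ.get("DEEPSEEK_API_KEY", "")
    if not expected_key or not presented_key:
        return False  # no key configured or none presented: deny by default
    # compare_digest takes the same time whether or not the keys match early on
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```

The /generate handler could call this with a key read from a request header and reject the request when it returns False.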

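III. Deployment on macOS

1. Preparation

Hardware requirements: On a Mac with an M1 or M2 chip, you can take advantage of the chip's strong compute capability for deployment. On an Intel Mac, a capable graphics card is recommended if the machine has a discrete one. Use at least 16 GB of RAM and a fast SSD for storage.
Software environment: Install Python 3.8 or later, which you can do through Homebrew. Install Homebrew first, then Python and the required libraries: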

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install python
pip install torch torchvision torchaudio
pip install transformers

On a Mac with an M1 or M2 chip, make sure you install a PyTorch build for the ARM architecture:

pip install torch torchvision torchaudio -f https://download.pytorch.org/whl/torch_stable.html

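2. Downloading the DeepSeek Model

Go to the official DeepSeek model download page and choose a suitable model version.
Download the model file with curl, for example:

curl -O https://download.deepseek.com/DeepSeek-7B.tar.gz

When the download finishes, extract the archive:

tar -zxvf DeepSeek-7B.tar.gz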

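3. Deployment Steps

Create a project directory: In the terminal, create the project directory with:

mkdir deepseek_deployment
cd deepseek_deployment

Write an inference script: In a text editor (such as TextEdit or VS Code), write the Python inference script inference.py. Its content is similar to the earlier versions: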

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Pick the best available device: Apple GPU (MPS), then CUDA, then CPU
if torch.backends.mps.is_available():
    device = 'mps'
elif torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

# Use float16 on GPU backends; fall back to float32 on CPU, where half precision is often slow or unsupported
dtype = torch.float32 if device == 'cpu' else torch.float16

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=dtype).to(device)

# Inference function: inputs go to whichever device the model is on
def generate_text(prompt, max_length=100):
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)
    output = model.generate(input_ids, max_length=max_length, num_beams=5, early_stopping=True)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = "Please describe the development trends of artificial intelligence"
generated_text = generate_text(prompt)
print(generated_text)

Replace path/to/DeepSeek-7B with the actual path to the model directory.

Start a service: To deploy as a service, install FastAPI and uvicorn:

pip install fastapi uvicorn

Write the service script app.py. Its content is similar to the earlier versions:

from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

app = FastAPI()

# Pick the best available device: Apple GPU (MPS), then CUDA, then CPU
if torch.backends.mps.is_available():
    device = 'mps'
elif torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

# Use float16 on GPU backends; fall back to float32 on CPU, where half precision is often slow or unsupported
dtype = torch.float32 if device == 'cpu' else torch.float16

# Load the tokenizer and the model once, at startup
tokenizer = AutoTokenizer.from_pretrained("path/to/DeepSeek-7B")
model = AutoModelForCausalLM.from_pretrained("path/to/DeepSeek-7B", torch_dtype=dtype).to(device)

# Request body: the prompt plus an optional length limit
class PromptRequest(BaseModel):
    prompt: str
    max_length: int = 100

@app.post("/generate")
def generate_text(request: PromptRequest):
    input_ids = tokenizer.encode(request.prompt, return_tensors='pt').to(model.device)
    output = model.generate(input_ids, max_length=request.max_length, num_beams=5, early_stopping=True)
    return {"generated_text": tokenizer.decode(output[0], skip_special_tokens=True)}

Replace path/to/DeepSeek-7B with the actual path.
Start the service:

uvicorn app:app --host 0.0.0.0 --port 8000
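If uvicorn fails to start because the address is already in use, another process is probably listening on port 8000. A small standard-library check like the one below can confirm that before you launch, or verify afterwards that the service is accepting connections. The function name port_in_use is hypothetical.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 in use:", port_in_use(8000))
```

If the port is taken, either stop the other process or pass a different value to --port.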

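IV. Optimization and Notes

Model quantization: To reduce memory usage and speed up inference, you can quantize the model, for example to INT8.
Security settings: When deploying the service, set appropriate access permissions and security policies to prevent the model from being called maliciously.
Performance monitoring: On Linux and Windows, use the NVIDIA System Management Interface (nvidia-smi) to monitor GPU usage. On macOS with an M1/M2 chip, use tools such as the top command to monitor system resource usage and confirm that the model is running well.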
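For the nvidia-smi monitoring mentioned above, the tool's CSV query mode is convenient to read from a script. The sketch below is one way to do it, with hypothetical function names: it asks for GPU utilization and memory figures and turns each output line into a dictionary. It assumes every queried field comes back as a number, which may not hold on all GPUs.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse csv,noheader,nounits output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({"util_percent": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return parsed stats; needs an NVIDIA driver installed."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() periodically while the service handles requests shows whether GPU memory is close to its limit.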