vLLM DeepSeek-V3.1 Usage Guide

NWI · Posted on 2025-8-25 09:23:10


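Key point: DeepSeek-V3.1 is a hybrid model that can switch between a thinking mode and a non-thinking mode. This article explains how to switch between the two modes dynamically in vLLM.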

Introduction

DeepSeek-V3.1 is a hybrid model that supports both a thinking mode and a non-thinking mode. This guide explains in detail how to switch between the two modes dynamically in vLLM.

Installing vLLM

First, create a virtual environment and install vLLM:
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto

Notes:

- Make sure the uv package manager is installed.
- Using a virtual environment is recommended to isolate dependencies.
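
Following the recommendation to isolate dependencies, here is a minimal sketch (standard library only; the helper name in_virtualenv is hypothetical) that checks whether the current Python interpreter runs inside a virtual environment such as the .venv created by uv venv. It relies on the fact that inside a venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtual environment.

    Inside a venv (including one created by `uv venv`), sys.prefix points at
    the environment directory, while sys.base_prefix points at the base install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"OK: running inside {sys.prefix}")
    else:
        print("Warning: not in a virtual environment; run `source .venv/bin/activate` first")
```

Run it with the same python you will use for vLLM; a warning means the activate step above was skipped.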

Serving DeepSeek-V3.1

Deploying on 8x H200 (or H20) GPUs (141 GB × 8)
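
As a rough sanity check on why eight 141 GB GPUs are used, the sketch below does the back-of-the-envelope arithmetic. It assumes about 671B total parameters (the published size of the DeepSeek-V3 architecture that V3.1 shares) stored in FP8 at 1 byte per parameter, and it ignores activations, CUDA graphs and framework overhead. The function name is hypothetical.

```python
def per_gpu_memory_gb(total_params_b: float, bytes_per_param: float,
                      num_gpus: int, gpu_mem_gb: float) -> tuple[float, float]:
    """Split model weights evenly across GPUs and report what is left.

    Returns (weights_per_gpu_gb, remaining_per_gpu_gb). Uses 1 GB = 1e9 bytes
    and assumes the weights are sharded evenly across all GPUs.
    """
    total_weight_gb = total_params_b * bytes_per_param  # billions of params * bytes -> GB
    weights_per_gpu = total_weight_gb / num_gpus
    return weights_per_gpu, gpu_mem_gb - weights_per_gpu


# Assumption: ~671B parameters in FP8 (1 byte each) on 8 x 141 GB GPUs
weights, remaining = per_gpu_memory_gb(671, 1.0, 8, 141)
print(f"weights per GPU: {weights:.1f} GB, remaining: {remaining:.1f} GB")
```

Under these assumptions roughly 84 GB of each GPU holds weights, leaving about 57 GB for the KV cache and runtime overhead; real figures depend on the checkpoint and on vLLM's --gpu-memory-utilization setting.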

vllm serve deepseek-ai/DeepSeek-V3.1 \
  --enable-expert-parallel \
  --tensor-parallel-size 8 \
  --served-model-name ds31

Parameters:

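- --enable-expert-parallel: enables expert parallelism
- --tensor-parallel-size 8: sets the tensor-parallel size to 8
- --served-model-name ds31: sets the model name the server exposes to clients (ds31)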
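
To see how the three flags fit together, here is a small sketch that assembles the same vllm serve command as an argument list (the form subprocess.run expects) without launching anything. The helper build_serve_cmd is hypothetical; only the flags themselves come from the command above.

```python
import shlex


def build_serve_cmd(model: str, tp_size: int, served_name: str,
                    expert_parallel: bool = True) -> list[str]:
    """Build the argv for `vllm serve` using the flags described above."""
    cmd = ["vllm", "serve", model]
    if expert_parallel:
        cmd.append("--enable-expert-parallel")  # expert parallelism for the MoE layers
    cmd += ["--tensor-parallel-size", str(tp_size)]  # shard the model across tp_size GPUs
    cmd += ["--served-model-name", served_name]  # name clients pass as "model"
    return cmd


cmd = build_serve_cmd("deepseek-ai/DeepSeek-V3.1", 8, "ds31")
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting issues when it is launched from a script.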

Using the model

OpenAI client example

You can call the model with the OpenAI client. Thinking mode is controlled through extra_body={"chat_template_kwargs": {"thinking": False}}: True enables thinking mode, and False disables it (non-thinking mode).
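
The switch is just a nested dictionary that the OpenAI client merges into the JSON request body. A minimal sketch, with a hypothetical helper name, that builds it for either mode and prints the JSON fragment the server receives:

```python
import json


def thinking_extra_body(thinking: bool) -> dict:
    """Return the extra_body that toggles DeepSeek-V3.1 thinking mode.

    vLLM forwards chat_template_kwargs to the chat template, whose `thinking`
    variable selects thinking (True) or non-thinking (False) mode.
    """
    return {"chat_template_kwargs": {"thinking": thinking}}


# The OpenAI client merges extra_body into the top-level JSON request body,
# so the server sees the same field used in the curl request later in this guide.
for mode in (True, False):
    print(json.dumps(thinking_extra_body(mode)))
```

Pass the result as extra_body=thinking_extra_body(True) (or False) to client.chat.completions.create(...), as in the complete client example that follows: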
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Use the first model served by the endpoint (here: ds31)
models = client.models.list()
model = models.data[0].id

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "Hmm I am DeepSeek"},
    {"role": "user", "content": "9.11 and 9.8, which is greater?"},
]

# Set the thinking mode: True enables it, False disables it
extra_body = {"chat_template_kwargs": {"thinking": False}}
response = client.chat.completions.create(
    model=model, messages=messages, extra_body=extra_body
)
content = response.choices[0].message.content
print("content:\n", content)

Output examples

thinking=True (thinking mode enabled)

With thinking mode enabled, the output includes the model's reasoning process:

Hmm, the user is asking which number is greater between 9.11 and 9.8. This seems straightforward, but I should be careful because decimals can sometimes confuse people.

I recall that comparing decimals involves looking at each digit from left to right. Both numbers have the same whole number part (9), so I need to compare the decimal parts. 0.11 is greater than 0.8 because 0.11 is equivalent to 0.110 and 0.8 is 0.800, so 110 thousandths is greater than 800 thousandths? Wait no, that's wrong.

Actually, 0.8 is the same as 0.80, and 0.11 is less than 0.80. So 9.11 is actually less than 9.8. I should double-check that. Yes, 9.8 is larger because 0.8 > 0.11.

I'll explain it clearly by comparing the tenths place: 9.8 has 8 tenths, while 9.11 has 1 tenth and 1 hundredth, so 8 tenths is indeed larger.

The answer is 9.8 is greater. I'll state it confidently and offer further help if needed.

9.8 is greater than 9.11.

To compare them:

- 9.8 is equivalent to 9.80
- 9.80 has 8 tenths, while 9.11 has only 1 tenth
- Since 8 tenths (0.8) is greater than 1 tenth (0.1), 9.8 > 9.11

Let me know if you need further clarification! 😊

thinking=False (thinking mode disabled)

With thinking mode disabled, the model answers directly, without a reasoning preamble:

The number 9.11 is greater than 9.8.

To compare them:

- 9.11 = 9 + 11/100
- 9.8 = 9 + 80/100

Since 11/100 (0.11) is less than 80/100 (0.80), 9.11 is actually smaller than 9.8. Wait, let me correct that:

Actually, 9.8 is greater than 9.11.

- 9.8 can be thought of as 9.80
- Comparing 9.80 and 9.11: 80 hundredths is greater than 11 hundredths.

So, 9.8 > 9.11.

Apologies for the initial confusion! 😅
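
Both sample answers end up at 9.8 > 9.11, even though the non-thinking sample starts out wrong. The place-value comparison they describe can be checked with Python's decimal module, which compares exact decimal values; the helper tenths_digit is introduced here only for illustration.

```python
from decimal import Decimal


def tenths_digit(d: Decimal) -> int:
    """Return the digit in the tenths place of a non-negative decimal."""
    return int(d * 10) % 10


a, b = Decimal("9.11"), Decimal("9.8")

# Same integer part (9), so the tenths place decides: 1 vs 8.
print(tenths_digit(a), tenths_digit(b))  # 1 8
print(max(a, b))                         # 9.8
print(b - a)                             # 0.69
```

Using Decimal rather than float keeps 9.11 and 9.8 exact, so the printed difference is exactly 0.69.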

Calling the model with curl

You can also call the model with a curl command:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ds31",
    "messages": [
      {
        "role": "user",
        "content": "9.11 and 9.8, which is greater?"
      }
    ],
    "chat_template_kwargs": {
      "thinking": true
    }
  }'
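
Where curl is not available, the same request can be built with Python's standard library. This is a minimal sketch that assumes the server from this guide listens on localhost:8000 under the served name ds31; the helper build_chat_request is hypothetical, and the block only builds and prints the request without sending it.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt: str, thinking: bool,
                       model: str = "ds31") -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Top-level field that vLLM forwards to the chat template
        "chat_template_kwargs": {"thinking": thinking},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("9.11 and 9.8, which is greater?", thinking=True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To actually send it against a running server, open it with urllib.request.urlopen(req) and read json.load(resp)["choices"][0]["message"]["content"].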