ZyphrZero committed on
Commit
02896ca
·
1 Parent(s): ee092c1

✨ feat(API): implement OpenAI function call support


- implemented full OpenAI-compatible Function Call support for chat completions
- **Tool Prompt Injection**: dynamically generates and injects tool definitions into the system prompt, enabling the model to understand and utilize available functions
- **Tool Call Extraction**: developed robust regex-based logic (`extract_tool_invocations`) to parse tool call JSON from model responses, handling both fenced code blocks and inline JSON
- **Streaming Response Handling**: enhanced `StreamResponseHandler` to buffer content and extract tool calls at the end of the stream, ensuring proper `tool_calls` and `finish_reason` in streamed chunks
- **Non-Streaming Response Handling**: updated `NonStreamResponseHandler` to process full responses, extract tool calls, and format the final JSON response according to OpenAI specifications (null content when `tool_calls` are present)
- **Message Model Enhancements**: added `tool_calls` field to `Message` and `Delta` models, and `tools`, `tool_choice` to `OpenAIRequest` to support OpenAI tool specifications
- **Error Handling**: wrapped the main `chat_completions` logic in a `try-except` block for better error management
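
The extraction and formatting steps listed above can be sketched roughly as follows. This is a minimal illustration and not the commit's `extract_tool_invocations`: the names `iter_json_objects`, `extract_tool_calls_sketch`, and `to_openai_message` are hypothetical, and the inline pass uses a brace-balancing scan as a swapped-in alternative to a regex so that nested `arguments` objects are captured whole.

```python
import json
import re
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Fenced JSON block: three backticks + "json", an object, three backticks.
FENCE_RE = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)


def iter_json_objects(text: str) -> Iterator[str]:
    """Yield each top-level {...} substring, ignoring braces inside JSON strings."""
    depth, start = 0, -1
    in_string = escaped = False
    for i, ch in enumerate(text):
        if depth and in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif depth and ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


def extract_tool_calls_sketch(text: str) -> Optional[List[Dict[str, Any]]]:
    """Try fenced blocks first, then inline objects; return the first tool_calls list."""
    candidates = FENCE_RE.findall(text) + list(iter_json_objects(text))
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            calls = data.get("tool_calls")
            if isinstance(calls, list) and calls:
                return calls
    return None


def to_openai_message(text: str) -> Tuple[Dict[str, Any], str]:
    """Build an assistant message and finish_reason from the raw model text."""
    calls = extract_tool_calls_sketch(text)
    if not calls:
        return {"role": "assistant", "content": text}, "stop"
    normalized = []
    for call in calls:
        function = dict(call.get("function") or {})
        arguments = function.get("arguments", {})
        # OpenAI clients expect `arguments` as a JSON-encoded string.
        if not isinstance(arguments, str):
            function["arguments"] = json.dumps(arguments, ensure_ascii=False)
        normalized.append({**call, "function": function})
    # Content is null when tool_calls are present.
    return {"role": "assistant", "content": None, "tool_calls": normalized}, "tool_calls"
```

A streaming handler could apply the same `to_openai_message` step to its buffered text once the upstream stream ends, which matches the "buffer, then extract at the end" behaviour described in the commit message.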

✅ test(tooling): add weather query function call test

- added `test_weather_query.py` to demonstrate and verify the functionality of OpenAI-compatible function calls
- the test sends a chat completion request with a `get_weather` tool definition and asserts the model's response for tool invocation
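
The test file's contents are not shown in this view, so the following is only a sketch of the kind of check it describes. `build_weather_request` and `check_tool_invocation` are hypothetical helper names; the payload mirrors the `get_weather` tool from the README example, and the response is validated as a plain dictionary so that the sketch does not depend on a running server.

```python
import json
from typing import Any, Dict


def build_weather_request(model: str = "GLM-4.5", stream: bool = False) -> Dict[str, Any]:
    """Build a chat completion payload that offers a single get_weather tool."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": "What's the weather like in Beijing today?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather information for a given city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string", "description": "City name"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",
    }


def check_tool_invocation(response: Dict[str, Any]) -> Dict[str, Any]:
    """Assert that a non-streaming response carries a get_weather tool call; return its arguments."""
    choice = response["choices"][0]
    assert choice["finish_reason"] == "tool_calls"
    message = choice["message"]
    assert message.get("content") is None  # content is null when tool_calls are present
    call = message["tool_calls"][0]
    assert call["type"] == "function"
    assert call["function"]["name"] == "get_weather"
    arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    assert "city" in arguments
    return arguments
```

In a live test, the payload would be posted to the server's chat completions endpoint and the decoded JSON body passed to `check_tool_invocation`.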

Files changed (4)
  1. .gitignore +23 -29
  2. README.md +215 -0
  3. main.py +483 -152
  4. tests/test_weather.py +70 -0
.gitignore CHANGED
@@ -1,3 +1,25 @@
1
  # Byte-compiled / optimized / DLL files
2
  __pycache__/
3
  *.py[cod]
@@ -94,12 +116,6 @@ ipython_config.py
94
  # install all needed dependencies.
95
  #Pipfile.lock
96
 
97
- # UV
98
- # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
99
- # This is especially recommended for binary packages to ensure reproducibility, and is more
100
- # commonly ignored for libraries.
101
- #uv.lock
102
-
103
  # poetry
104
  # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105
  # This is especially recommended for binary packages to ensure reproducibility, and is more
@@ -112,10 +128,8 @@ ipython_config.py
112
  #pdm.lock
113
  # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
114
  # in version control.
115
- # https://pdm.fming.dev/latest/usage/project/#working-with-version-control
116
  .pdm.toml
117
- .pdm-python
118
- .pdm-build/
119
 
120
  # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121
  __pypackages__/
@@ -159,23 +173,3 @@ dmypy.json
159
 
160
  # Cython debug symbols
161
  cython_debug/
162
-
163
- # PyCharm
164
- # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
165
- # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
166
- # and can be added to the global gitignore or merged into this file. For a more nuclear
167
- # option (not recommended) you can uncomment the following to ignore the entire idea folder.
168
- #.idea/
169
-
170
- # Ruff stuff:
171
- .ruff_cache/
172
-
173
- # PyPI configuration file
174
- .pypirc
175
-
176
- # Cursor
177
- # Cursor is an AI-powered code editor.`.cursorignore` specifies files/directories to
178
- # exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
179
- # refer to https://docs.cursor.com/context/ignore-files
180
- .cursorignore
181
- .cursorindexingignore
 
1
+ # Custom
2
+ .vs/
3
+ .vscode/
4
+ .idea/
5
+ .conda/
6
+ *.zip
7
+ *.txt
8
+ docs/
9
+ output/
10
+ main.build/
11
+ main.dist/
12
+ main.onefile-build/
13
+ *report.xml
14
+ *.yaml
15
+ logs/
16
+
17
+ # AI Toolset
18
+ .augment/
19
+ .cursor/
20
+ .claude/
21
+ CLAUDE.md
22
+
23
  # Byte-compiled / optimized / DLL files
24
  __pycache__/
25
  *.py[cod]
 
116
  # install all needed dependencies.
117
  #Pipfile.lock
118
 
 
 
 
 
 
 
119
  # poetry
120
  # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
121
  # This is especially recommended for binary packages to ensure reproducibility, and is more
 
128
  #pdm.lock
129
  # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
130
  # in version control.
131
+ # https://pdm.fming.dev/#use-with-ide
132
  .pdm.toml
 
 
133
 
134
  # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
135
  __pypackages__/
 
173
 
174
  # Cython debug symbols
175
  cython_debug/
README.md CHANGED
@@ -12,6 +12,7 @@
12
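  - **Multiple model support**: supports the GLM-4.5 base, thinking, and search variants
13
  - **Debug mode**: detailed request/response logging for development and debugging
14
  - **CORS support**: built-in cross-origin resource sharing
 
15
 
16
  ## Use cases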
17
 
@@ -94,6 +95,219 @@
94
 
95
  Note: replace `api_key` with the `DEFAULT_KEY` value configured in `main.py`.
96
 
97
  ### Using Docker Compose
98
 
99
  1. Start the service:
@@ -139,6 +353,7 @@
139
  | `DEBUG_MODE` | Debug mode switch | `true` |
140
  | `THINK_TAGS_MODE` | Thinking content handling strategy | `think` (options: `strip`, `raw`) |
141
  | `ANON_TOKEN_ENABLED` | Whether to use an anonymous token | `true` |
 
142
 
143
  ### Thinking content handling strategies
144
 
 
12
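  - **Multiple model support**: supports the GLM-4.5 base, thinking, and search variants
13
  - **Debug mode**: detailed request/response logging for development and debugging
14
  - **CORS support**: built-in cross-origin resource sharing
15
+ - **Function Call support**: full support for OpenAI-format tool calling, implemented through prompt injection, with tool-call buffering for streaming responses
16
 
17
  ## Use cases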
18
 
 
95
 
96
  Note: replace `api_key` with the `DEFAULT_KEY` value configured in `main.py`.
97
 
+ ### Function Call usage examples
+
+ This project fully supports OpenAI-format tool calling for both streaming and non-streaming responses. It works by converting the OpenAI tool definitions into a special system prompt so that the model understands them and emits tool calls in the expected format.
+
+ #### Basic tool call
+
+ ```python
+ import openai
+
+ # Initialize the client
+ client = openai.OpenAI(
+     base_url="http://localhost:8080/v1",
+     api_key="sk-tbkFoKzk9a531YyUNNF5"
+ )
+
+ # Define the weather query tool
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_weather",
+             "description": "Get weather information for a given city",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "city": {
+                         "type": "string",
+                         "description": "City name"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": ["celsius", "fahrenheit"],
+                         "description": "Temperature unit",
+                         "default": "celsius"
+                     }
+                 },
+                 "required": ["city"]
+             }
+         }
+     }
+ ]
+
+ # Make a request with tools enabled
+ response = client.chat.completions.create(
+     model="GLM-4.5",
+     messages=[{"role": "user", "content": "What's the weather like in Beijing today?"}],
+     tools=tools,
+     tool_choice="auto"
+ )
+
+ message = response.choices[0].message
+ if message.tool_calls:
+     print("The model requested tool calls:")
+     for tool_call in message.tool_calls:
+         print(f"Tool name: {tool_call.function.name}")
+         print(f"Arguments: {tool_call.function.arguments}")
+         print(f"Call ID: {tool_call.id}")
+ else:
+     print(f"Reply: {message.content}")
+ ```
+
+ #### Streaming tool call
+
+ ```python
+ # Streaming tool call example
+ response = client.chat.completions.create(
+     model="GLM-4.5",
+     messages=[{"role": "user", "content": "Calculate 2 to the power of 10 for me"}],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "calculate",
+             "description": "Perform a mathematical calculation",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "expression": {
+                         "type": "string",
+                         "description": "Mathematical expression"
+                     }
+                 },
+                 "required": ["expression"]
+             }
+         }
+     }],
+     stream=True
+ )
+
+ # Note: in tool-call mode, the streaming response buffers all content
+ # and returns the tool call information in one piece at the end
+ tool_calls = None
+ content = ""
+
+ for chunk in response:
+     delta = chunk.choices[0].delta
+     if delta.tool_calls:
+         tool_calls = delta.tool_calls
+     if delta.content:
+         content += delta.content
+
+ if tool_calls:
+     print("Tool calls:")
+     for tool_call in tool_calls:
+         print(f"Function: {tool_call.function.name}")
+         print(f"Arguments: {tool_call.function.arguments}")
+ else:
+     print("Reply:", content)
+ ```
+
+ #### Forcing a specific tool
+
+ ```python
+ # Force the use of a specific tool
+ response = client.chat.completions.create(
+     model="GLM-4.5",
+     messages=[{"role": "user", "content": "What day is it today?"}],
+     tools=[{
+         "type": "function",
+         "function": {
+             "name": "get_current_date",
+             "description": "Get the current date and time",
+             "parameters": {
+                 "type": "object",
+                 "properties": {},
+                 "required": []
+             }
+         }
+     }],
+     tool_choice={"type": "function", "function": {"name": "get_current_date"}}
+ )
+
+ message = response.choices[0].message
+ print(f"Finish reason: {response.choices[0].finish_reason}")  # tool_calls
+ if message.tool_calls:
+     print("Tool call result:", message.tool_calls[0].function.arguments)
+ ```
+
+ #### Multiple tools working together
+
+ ```python
+ # Define multiple tools
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "search_web",
+             "description": "Search the web for information",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "query": {
+                         "type": "string",
+                         "description": "Search keywords"
+                     }
+                 },
+                 "required": ["query"]
+             }
+         }
+     },
+     {
+         "type": "function",
+         "function": {
+             "name": "summarize_text",
+             "description": "Summarize text content",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "text": {
+                         "type": "string",
+                         "description": "Text to summarize"
+                     },
+                     "max_length": {
+                         "type": "integer",
+                         "description": "Maximum length",
+                         "default": 100
+                     }
+                 },
+                 "required": ["text"]
+             }
+         }
+     }
+ ]
+
+ # Use multiple tools
+ response = client.chat.completions.create(
+     model="GLM-4.5",
+     messages=[{"role": "user", "content": "Search for the latest AI news and summarize it"}],
+     tools=tools,
+     tool_choice="auto"
+ )
+
+ message = response.choices[0].message
+ if message.tool_calls:
+     for tool_call in message.tool_calls:
+         print(f"Calling tool: {tool_call.function.name}")
+         # In a real application, execute the corresponding function here
+         # and return the result to the model in a tool message
+ ```
+
+ ### Running the Function Call demo
+
+ The project includes a complete Function Call demo script:
+
+ ```bash
+ python function_call_demo.py
+ ```
+
+ The script demonstrates:
+ 1. Basic tool calls
+ 2. A math calculation tool
+ 3. Forcing a specific tool
+ 4. Streaming tool-call responses
+
311
  ### Using Docker Compose
312
 
313
  1. Start the service:
 
353
  | `DEBUG_MODE` | Debug mode switch | `true` |
354
  | `THINK_TAGS_MODE` | Thinking content handling strategy | `think` (options: `strip`, `raw`) |
355
  | `ANON_TOKEN_ENABLED` | Whether to use an anonymous token | `true` |
356
+ | `FUNCTION_CALL_ENABLED` | Whether to enable Function Call support | `true` |
357
 
358
  ### Thinking content handling strategies
359
 
main.py CHANGED
@@ -24,34 +24,41 @@ from pydantic import BaseModel, Field
24
  # Configuration Constants
25
  # =============================================================================
26
 
27
- class Config:
28
- """Centralized configuration constants"""
29
 
30
  # API Configuration
31
- UPSTREAM_URL: str = "https://chat.z.ai/api/chat/completions"
32
- DEFAULT_KEY: str = "sk-tbkFoKzk9a531YyUNNF5"
33
- UPSTREAM_TOKEN: str = "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjMxNmJjYjQ4LWZmMmYtNGExNS04NTNkLWYyYTI5YjY3ZmYwZiIsImVtYWlsIjoiR3Vlc3QtMTc1NTg0ODU4ODc4OEBndWVzdC5jb20ifQ.PktllDySS3trlyuFpTeIZf-7hl8Qu1qYF3BxjgIul0BrNux2nX9hVzIjthLXKMWAf9V0qM8Vm_iyDqkjPGsaiQ"
34
 
35
  # Model Configuration
36
- DEFAULT_MODEL_NAME: str = "GLM-4.5"
37
- THINKING_MODEL_NAME: str = "GLM-4.5-Thinking"
38
- SEARCH_MODEL_NAME: str = "GLM-4.5-Search"
39
 
40
  # Server Configuration
41
- PORT: int = 8080
42
- DEBUG_MODE: bool = True
43
 
44
  # Feature Configuration
45
- THINK_TAGS_MODE: str = "think" # strip: remove <details> tags; think: convert to <think> tags; raw: keep as-is
46
- ANON_TOKEN_ENABLED: bool = True
 
 
47
 
48
  # Browser Headers
49
- X_FE_VERSION: str = "prod-fe-1.0.70"
50
- BROWSER_UA: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 Edg/139.0.0.0"
51
- SEC_CH_UA: str = '"Not;A=Brand";v="99", "Microsoft Edge";v="139", "Chromium";v="139"'
52
- SEC_CH_UA_MOB: str = "?0"
53
- SEC_CH_UA_PLAT: str = '"Windows"'
54
- ORIGIN_BASE: str = "https://chat.z.ai"
 
 
 
 
 
55
 
56
 
57
  # =============================================================================
@@ -61,8 +68,9 @@ class Config:
61
  class Message(BaseModel):
62
  """Chat message model"""
63
  role: str
64
- content: str
65
  reasoning_content: Optional[str] = None
 
66
 
67
 
68
  class OpenAIRequest(BaseModel):
@@ -72,6 +80,8 @@ class OpenAIRequest(BaseModel):
72
  stream: Optional[bool] = False
73
  temperature: Optional[float] = None
74
  max_tokens: Optional[int] = None
 
 
75
 
76
 
77
  class ModelItem(BaseModel):
@@ -103,6 +113,7 @@ class Delta(BaseModel):
103
  role: Optional[str] = None
104
  content: Optional[str] = None
105
  reasoning_content: Optional[str] = None
 
106
 
107
 
108
  class Choice(BaseModel):
@@ -195,7 +206,10 @@ class SSEParser:
195
  def debug_log(self, format_str: str, *args) -> None:
196
  """Log debug message if debug mode is enabled"""
197
  if self.debug_mode:
198
- print(f"[SSE_PARSER] {format_str % args}")
 
 
 
199
 
200
  def iter_events(self) -> Generator[Dict[str, Any], None, None]:
201
  """Iterate over SSE events
@@ -307,14 +321,236 @@ class SSEParser:
307
  self.close()
308
 
309
 
310
  # =============================================================================
311
  # Utility Functions
312
  # =============================================================================
313
 
314
  def debug_log(message: str, *args) -> None:
315
  """Log debug message if debug mode is enabled"""
316
- if Config.DEBUG_MODE:
317
- print(f"[DEBUG] {message % args}")
 
 
 
318
 
319
 
320
  def generate_request_ids() -> Tuple[str, str]:
@@ -327,20 +563,10 @@ def generate_request_ids() -> Tuple[str, str]:
327
 
328
  def get_browser_headers(referer_chat_id: str = "") -> Dict[str, str]:
329
  """Get browser headers for API requests"""
330
- headers = {
331
- "Content-Type": "application/json",
332
- "Accept": "application/json, text/event-stream",
333
- "User-Agent": Config.BROWSER_UA,
334
- "Accept-Language": "zh-CN",
335
- "sec-ch-ua": Config.SEC_CH_UA,
336
- "sec-ch-ua-mobile": Config.SEC_CH_UA_MOB,
337
- "sec-ch-ua-platform": Config.SEC_CH_UA_PLAT,
338
- "X-FE-Version": Config.X_FE_VERSION,
339
- "Origin": Config.ORIGIN_BASE,
340
- }
341
 
342
  if referer_chat_id:
343
- headers["Referer"] = f"{Config.ORIGIN_BASE}/c/{referer_chat_id}"
344
 
345
  return headers
346
 
@@ -351,12 +577,12 @@ def get_anonymous_token() -> str:
351
  headers.update({
352
  "Accept": "*/*",
353
  "Accept-Language": "zh-CN,zh;q=0.9",
354
- "Referer": f"{Config.ORIGIN_BASE}/",
355
  })
356
 
357
  try:
358
  response = requests.get(
359
- f"{Config.ORIGIN_BASE}/api/v1/auths/",
360
  headers=headers,
361
  timeout=10.0
362
  )
@@ -377,7 +603,7 @@ def get_anonymous_token() -> str:
377
 
378
  def get_auth_token() -> str:
379
  """Get authentication token (anonymous or fixed)"""
380
- if Config.ANON_TOKEN_ENABLED:
381
  try:
382
  token = get_anonymous_token()
383
  debug_log(f"匿名token获取成功: {token[:10]}...")
@@ -385,7 +611,7 @@ def get_auth_token() -> str:
385
  except Exception as e:
386
  debug_log(f"匿名token获取失败,回退固定token: {e}")
387
 
388
- return Config.UPSTREAM_TOKEN
389
 
390
 
391
  def transform_thinking_content(content: str) -> str:
@@ -396,10 +622,10 @@ def transform_thinking_content(content: str) -> str:
396
  content = content.replace("</thinking>", "").replace("<Full>", "").replace("</Full>", "")
397
  content = content.strip()
398
 
399
- if Config.THINK_TAGS_MODE == "think":
400
  content = re.sub(r'<details[^>]*>', '<think>', content)
401
  content = content.replace("</details>", "</think>")
402
- elif Config.THINK_TAGS_MODE == "strip":
403
  content = re.sub(r'<details[^>]*>', '', content)
404
  content = content.replace("</details>", "")
405
 
@@ -435,7 +661,7 @@ def handle_upstream_error(error: UpstreamError) -> Generator[str, None, None]:
435
 
436
  # Send end chunk
437
  end_chunk = create_openai_response_chunk(
438
- model=Config.DEFAULT_MODEL_NAME,
439
  finish_reason="stop"
440
  )
441
  yield f"data: {end_chunk.model_dump_json()}\n\n"
@@ -451,11 +677,11 @@ def call_upstream_api(
451
  headers = get_browser_headers(chat_id)
452
  headers["Authorization"] = f"Bearer {auth_token}"
453
 
454
- debug_log(f"调用上游API: {Config.UPSTREAM_URL}")
455
  debug_log(f"上游请求体: {upstream_req.model_dump_json()}")
456
 
457
  response = requests.post(
458
- Config.UPSTREAM_URL,
459
  json=upstream_req.model_dump(exclude_none=True),
460
  headers=headers,
461
  timeout=60.0,
@@ -489,13 +715,19 @@ class ResponseHandler:
489
  def _handle_upstream_error(self, response: requests.Response) -> None:
490
  """Handle upstream error response"""
491
  debug_log(f"上游返回错误状态: {response.status_code}")
492
- if Config.DEBUG_MODE:
493
  debug_log(f"上游错误响应: {response.text}")
494
 
495
 
496
  class StreamResponseHandler(ResponseHandler):
497
  """Handler for streaming responses"""
498
 
 
 
 
 
 
 
499
  def handle(self) -> Generator[str, None, None]:
500
  """Handle streaming response"""
501
  debug_log(f"开始处理流式响应 (chat_id={self.chat_id})")
@@ -513,7 +745,7 @@ class StreamResponseHandler(ResponseHandler):
513
 
514
  # Send initial role chunk
515
  first_chunk = create_openai_response_chunk(
516
- model=Config.DEFAULT_MODEL_NAME,
517
  delta=Delta(role="assistant")
518
  )
519
  yield f"data: {first_chunk.model_dump_json()}\n\n"
@@ -522,7 +754,7 @@ class StreamResponseHandler(ResponseHandler):
522
  debug_log("开始读取上游SSE流")
523
  sent_initial_answer = False
524
 
525
- with SSEParser(response, debug_mode=Config.DEBUG_MODE) as parser:
526
  for event in parser.iter_json_data(UpstreamData):
527
  upstream_data = event['data']
528
 
@@ -566,42 +798,50 @@ class StreamResponseHandler(ResponseHandler):
566
  sent_initial_answer: bool
567
  ) -> Generator[str, None, None]:
568
  """Process content from upstream data"""
569
- # Handle initial answer content
570
- if (not sent_initial_answer and
571
- upstream_data.data.edit_content and
572
- upstream_data.data.phase == "answer"):
573
-
574
- content = self._extract_edit_content(upstream_data.data.edit_content)
575
- if content:
576
- debug_log(f"发送普通内容: {content}")
577
- chunk = create_openai_response_chunk(
578
- model=Config.DEFAULT_MODEL_NAME,
579
- delta=Delta(content=content)
580
- )
581
- yield f"data: {chunk.model_dump_json()}\n\n"
582
- sent_initial_answer = True
583
 
584
- # Handle delta content
585
- if upstream_data.data.delta_content:
586
- content = upstream_data.data.delta_content
587
-
588
- if upstream_data.data.phase == "thinking":
589
- content = transform_thinking_content(content)
590
- if content:
591
- debug_log(f"发送思考内容: {content}")
592
- chunk = create_openai_response_chunk(
593
- model=Config.DEFAULT_MODEL_NAME,
594
- delta=Delta(reasoning_content=content)
595
- )
596
- yield f"data: {chunk.model_dump_json()}\n\n"
597
- else:
 
 
 
598
  if content:
599
  debug_log(f"发送普通内容: {content}")
600
  chunk = create_openai_response_chunk(
601
- model=Config.DEFAULT_MODEL_NAME,
602
  delta=Delta(content=content)
603
  )
604
  yield f"data: {chunk.model_dump_json()}\n\n"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
605
 
606
  def _extract_edit_content(self, edit_content: str) -> str:
607
  """Extract content from edit_content field"""
@@ -610,9 +850,44 @@ class StreamResponseHandler(ResponseHandler):
610
 
611
  def _send_end_chunk(self) -> Generator[str, None, None]:
612
  """Send end chunk and DONE signal"""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
613
  end_chunk = create_openai_response_chunk(
614
- model=Config.DEFAULT_MODEL_NAME,
615
- finish_reason="stop"
616
  )
617
  yield f"data: {end_chunk.model_dump_json()}\n\n"
618
  yield "data: [DONE]\n\n"
@@ -622,6 +897,10 @@ class StreamResponseHandler(ResponseHandler):
622
  class NonStreamResponseHandler(ResponseHandler):
623
  """Handler for non-streaming responses"""
624
 
 
 
 
 
625
  def handle(self) -> JSONResponse:
626
  """Handle non-streaming response"""
627
  debug_log(f"开始处理非流式响应 (chat_id={self.chat_id})")
@@ -640,7 +919,7 @@ class NonStreamResponseHandler(ResponseHandler):
640
  full_content = []
641
  debug_log("开始收集完整响应内容")
642
 
643
- with SSEParser(response, debug_mode=Config.DEBUG_MODE) as parser:
644
  for event in parser.iter_json_data(UpstreamData):
645
  upstream_data = event['data']
646
 
@@ -660,19 +939,35 @@ class NonStreamResponseHandler(ResponseHandler):
660
  final_content = "".join(full_content)
661
  debug_log(f"内容收集完成,最终长度: {len(final_content)}")
662
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
663
  # Build response
664
  response_data = OpenAIResponse(
665
  id=f"chatcmpl-{int(time.time())}",
666
  object="chat.completion",
667
  created=int(time.time()),
668
- model=Config.DEFAULT_MODEL_NAME,
669
  choices=[Choice(
670
  index=0,
671
  message=Message(
672
  role="assistant",
673
- content=final_content
 
674
  ),
675
- finish_reason="stop"
676
  )],
677
  usage=Usage()
678
  )
@@ -729,17 +1024,17 @@ async def list_models():
729
  response = ModelsResponse(
730
  data=[
731
  Model(
732
- id=Config.DEFAULT_MODEL_NAME,
733
  created=current_time,
734
  owned_by="z.ai"
735
  ),
736
  Model(
737
- id=Config.THINKING_MODEL_NAME,
738
  created=current_time,
739
  owned_by="z.ai"
740
  ),
741
  Model(
742
- id=Config.SEARCH_MODEL_NAME,
743
  created=current_time,
744
  owned_by="z.ai"
745
  ),
@@ -756,75 +1051,111 @@ async def chat_completions(
756
  """Handle chat completion requests"""
757
  debug_log("收到chat completions请求")
758
 
759
- # Validate API key
760
- if not authorization.startswith("Bearer "):
761
- debug_log("缺少或无效的Authorization头")
762
- raise HTTPException(status_code=401, detail="Missing or invalid Authorization header")
763
-
764
- api_key = authorization[7:]
765
- if api_key != Config.DEFAULT_KEY:
766
- debug_log(f"无效的API key: {api_key}")
767
- raise HTTPException(status_code=401, detail="Invalid API key")
768
-
769
- debug_log("API key验证通过")
770
- debug_log(f"请求解析成功 - 模型: {request.model}, 流式: {request.stream}, 消息数: {len(request.messages)}")
771
-
772
- # Generate IDs
773
- chat_id, msg_id = generate_request_ids()
774
-
775
- # Determine model features
776
- is_thinking = request.model == Config.THINKING_MODEL_NAME
777
- is_search = request.model == Config.SEARCH_MODEL_NAME
778
- search_mcp = "deep-web-search" if is_search else ""
779
-
780
- # Build upstream request
781
- upstream_req = UpstreamRequest(
782
- stream=True, # Always use streaming from upstream
783
- chat_id=chat_id,
784
- id=msg_id,
785
- model="0727-360B-API", # Actual upstream model ID
786
- messages=request.messages,
787
- params={},
788
- features={
789
- "enable_thinking": is_thinking,
790
- "web_search": is_search,
791
- "auto_web_search": is_search,
792
- },
793
- background_tasks={
794
- "title_generation": False,
795
- "tags_generation": False,
796
- },
797
- mcp_servers=[search_mcp] if search_mcp else [],
798
- model_item=ModelItem(
799
- id="0727-360B-API",
800
- name="GLM-4.5",
801
- owned_by="openai"
802
- ),
803
- tool_servers=[],
804
- variables={
805
- "{{USER_NAME}}": "User",
806
- "{{USER_LOCATION}}": "Unknown",
807
- "{{CURRENT_DATETIME}}": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
808
- }
809
- )
810
-
811
- # Get authentication token
812
- auth_token = get_auth_token()
813
-
814
- # Handle response based on stream flag
815
- if request.stream:
816
- handler = StreamResponseHandler(upstream_req, chat_id, auth_token)
817
- return StreamingResponse(
818
- handler.handle(),
819
- media_type="text/event-stream",
820
- headers={
821
- "Cache-Control": "no-cache",
822
- "Connection": "keep-alive",
 
 
 
 
 
 
 
823
  }
824
  )
825
- else:
826
- handler = NonStreamResponseHandler(upstream_req, chat_id, auth_token)
827
- return handler.handle()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
828
 
829
 
830
  # =============================================================================
@@ -833,4 +1164,4 @@ async def chat_completions(
833
 
834
  if __name__ == "__main__":
835
  import uvicorn
836
- uvicorn.run("main:app", host="0.0.0.0", port=Config.PORT, reload=True)
 
24
  # Configuration Constants
25
  # =============================================================================
26
 
27
+ class ServerConfig:
28
+ """Centralized server configuration"""
29
 
30
  # API Configuration
31
+ API_ENDPOINT: str = "https://chat.z.ai/api/chat/completions"
32
+ AUTH_TOKEN: str = "sk-tbkFoKzk9a531YyUNNF5"
33
+ BACKUP_TOKEN: str = "eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjMxNmJjYjQ4LWZmMmYtNGExNS04NTNkLWYyYTI5YjY3ZmYwZiIsImVtYWlsIjoiR3Vlc3QtMTc1NTg0ODU4ODc4OEBndWVzdC5jb20ifQ.PktllDySS3trlyuFpTeIZf-7hl8Qu1qYF3BxjgIul0BrNux2nX9hVzIjthLXKMWAf9V0qM8Vm_iyDqkjPGsaiQ"
34
 
35
  # Model Configuration
36
+ PRIMARY_MODEL: str = "GLM-4.5"
37
+ THINKING_MODEL: str = "GLM-4.5-Thinking"
38
+ SEARCH_MODEL: str = "GLM-4.5-Search"
39
 
40
  # Server Configuration
41
+ LISTEN_PORT: int = 8080
42
+ DEBUG_LOGGING: bool = True
43
 
44
  # Feature Configuration
45
+ THINKING_PROCESSING: str = "think" # strip: remove <details> tags; think: convert to <think> tags; raw: keep as-is
46
+ ANONYMOUS_MODE: bool = True
47
+ TOOL_SUPPORT: bool = True
48
+ SCAN_LIMIT: int = 200000
49
 
50
  # Browser Headers
51
+ CLIENT_HEADERS: Dict[str, str] = {
52
+ "Content-Type": "application/json",
53
+ "Accept": "application/json, text/event-stream",
54
+ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 Edg/139.0.0.0",
55
+ "Accept-Language": "zh-CN",
56
+ "sec-ch-ua": '"Not;A=Brand";v="99", "Microsoft Edge";v="139", "Chromium";v="139"',
57
+ "sec-ch-ua-mobile": "?0",
58
+ "sec-ch-ua-platform": '"Windows"',
59
+ "X-FE-Version": "prod-fe-1.0.70",
60
+ "Origin": "https://chat.z.ai",
61
+ }
62
 
63
 
64
  # =============================================================================
 
68
  class Message(BaseModel):
69
  """Chat message model"""
70
  role: str
71
+ content: Optional[str] = None
72
  reasoning_content: Optional[str] = None
73
+ tool_calls: Optional[List[Dict[str, Any]]] = None
74
 
75
 
76
  class OpenAIRequest(BaseModel):
 
80
  stream: Optional[bool] = False
81
  temperature: Optional[float] = None
82
  max_tokens: Optional[int] = None
83
+ tools: Optional[List[Dict[str, Any]]] = None
84
+ tool_choice: Optional[Any] = None
85
 
86
 
87
  class ModelItem(BaseModel):
 
113
  role: Optional[str] = None
114
  content: Optional[str] = None
115
  reasoning_content: Optional[str] = None
116
+ tool_calls: Optional[List[Dict[str, Any]]] = None
117
 
118
 
119
  class Choice(BaseModel):
 
206
  def debug_log(self, format_str: str, *args) -> None:
207
  """Log debug message if debug mode is enabled"""
208
  if self.debug_mode:
209
+ if args:
210
+ print(f"[SSE_PARSER] {format_str % args}")
211
+ else:
212
+ print(f"[SSE_PARSER] {format_str}")
213
 
214
  def iter_events(self) -> Generator[Dict[str, Any], None, None]:
215
  """Iterate over SSE events
 
321
  self.close()
322
 
323
 
+ # =============================================================================
+ # Function Call Utilities
+ # =============================================================================
+
+ def generate_tool_prompt(tools: List[Dict[str, Any]]) -> str:
+     """Generate tool injection prompt with enhanced formatting"""
+     if not tools:
+         return ""
+
+     tool_definitions = []
+     for tool in tools:
+         if tool.get("type") != "function":
+             continue
+
+         function_spec = tool.get("function", {}) or {}
+         function_name = function_spec.get("name", "unknown")
+         function_description = function_spec.get("description", "")
+         parameters = function_spec.get("parameters", {}) or {}
+
+         # Create structured tool definition
+         tool_info = [f"## {function_name}", f"**Purpose**: {function_description}"]
+
+         # Add parameter details
+         parameter_properties = parameters.get("properties", {}) or {}
+         required_parameters = set(parameters.get("required", []) or [])
+
+         if parameter_properties:
+             tool_info.append("**Parameters**:")
+             for param_name, param_details in parameter_properties.items():
+                 param_type = (param_details or {}).get("type", "any")
+                 param_desc = (param_details or {}).get("description", "")
+                 requirement_flag = "**Required**" if param_name in required_parameters else "*Optional*"
+                 tool_info.append(f"- `{param_name}` ({param_type}) - {requirement_flag}: {param_desc}")
+
+         tool_definitions.append("\n".join(tool_info))
+
+     if not tool_definitions:
+         return ""
+
+     # Build comprehensive tool prompt
+     prompt_template = (
+         "\n\n# AVAILABLE FUNCTIONS\n" +
+         "\n\n---\n".join(tool_definitions) +
+         "\n\n# USAGE INSTRUCTIONS\n"
+         "When you need to execute a function, respond ONLY with a JSON object containing tool_calls:\n"
+         "```json\n"
+         "{\n"
+         ' "tool_calls": [\n'
+         " {\n"
+         ' "id": "call_" + unique_id,\n'
+         ' "type": "function",\n'
+         ' "function": {\n'
+         ' "name": "function_name",\n'
+         ' "arguments": {\n'
+         ' "param1": "value1"\n'
+         ' }\n'
+         " }\n"
+         " }\n"
+         " ]\n"
+         "}\n"
+         "```\n"
+         "Important: No explanatory text before or after the JSON.\n"
+     )
+
+     return prompt_template
+
+
391
+ def process_messages_with_tools(
392
+ messages: List[Dict[str, Any]],
393
+ tools: Optional[List[Dict[str, Any]]] = None,
394
+ tool_choice: Optional[Any] = None
395
+ ) -> List[Dict[str, Any]]:
396
+ """Process messages and inject tool prompts"""
397
+ processed: List[Dict[str, Any]] = []
398
+
399
+ if tools and ServerConfig.TOOL_SUPPORT and (tool_choice != "none"):
400
+ tools_prompt = generate_tool_prompt(tools)
401
+ has_system = any(m.get("role") == "system" for m in messages)
402
+
403
+ if has_system:
404
+ for m in messages:
405
+ if m.get("role") == "system":
406
+ mm = dict(m)
407
+ content = mm.get("content", "")
408
+ if content is None:
409
+ content = ""
410
+ mm["content"] = content + tools_prompt
411
+ processed.append(mm)
412
+ else:
413
+ processed.append(m)
414
+ else:
415
+ processed = [{"role": "system", "content": "你是一个有用的助手。" + tools_prompt}] + messages
416
+
417
+ # Add tool choice hints
418
+ if tool_choice in ("required", "auto"):
419
+ if processed and processed[-1].get("role") == "user":
420
+ last = dict(processed[-1])
421
+ content = last.get("content", "")
422
+ if content is None:
423
+ content = ""
424
+ last["content"] = content + "\n\n请根据需要使用提供的工具函数。"
425
+ processed[-1] = last
426
+ elif isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
427
+ fname = (tool_choice.get("function") or {}).get("name")
428
+ if fname and processed and processed[-1].get("role") == "user":
429
+ last = dict(processed[-1])
430
+ content = last.get("content", "")
431
+ if content is None:
432
+ content = ""
433
+ last["content"] = content + f"\n\nUse the {fname} function to handle this request."
434
+ processed[-1] = last
435
+ else:
436
+ processed = list(messages)
437
+
438
+ # Handle tool/function messages
439
+ final_msgs: List[Dict[str, Any]] = []
440
+ for m in processed:
441
+ role = m.get("role")
442
+ if role in ("tool", "function"):
443
+ tool_name = m.get("name", "unknown")
444
+ tool_content = m.get("content", "")
445
+ if isinstance(tool_content, dict):
446
+ tool_content = json.dumps(tool_content, ensure_ascii=False)
447
+ elif tool_content is None:
448
+ tool_content = ""
449
+
450
+ # Ensure the content is non-empty and contains no None
451
+ content = f"Tool {tool_name} returned:\n```json\n{tool_content}\n```"
452
+ if not content.strip():
453
+ content = f"Tool {tool_name} finished executing"
454
+
455
+ final_msgs.append({
456
+ "role": "assistant",
457
+ "content": content,
458
+ })
459
+ else:
460
+ final_msgs.append(m)
461
+
462
+ return final_msgs
463
+
464
+
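Note on the tool-message branch above: it labels results with `m.get("name", "unknown")`, but OpenAI-style `tool` messages usually carry a `tool_call_id` rather than a `name`, so the label may often fall back to "unknown". The following is a minimal sketch, assuming the matching `tool_calls` entry appears in an earlier assistant message; `resolve_tool_name` is a hypothetical helper and not part of this commit.

```python
from typing import Any, Dict, List


def resolve_tool_name(messages: List[Dict[str, Any]], tool_msg: Dict[str, Any]) -> str:
    """Hypothetical helper: find the function name for a tool-result message."""
    # Prefer an explicit name when the client supplied one.
    if tool_msg.get("name"):
        return tool_msg["name"]
    call_id = tool_msg.get("tool_call_id")
    if call_id:
        # Search assistant messages for a tool call with the same id.
        for m in messages:
            if m.get("role") != "assistant":
                continue
            for tc in m.get("tool_calls") or []:
                if tc.get("id") == call_id:
                    return (tc.get("function") or {}).get("name") or "unknown"
    return "unknown"


# Illustrative conversation history (made-up data).
history = [
    {"role": "user", "content": "Weather in Shanghai?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Shanghai\"}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"temp_c\": 28}"},
]
resolved = resolve_tool_name(history, history[2])
```

With a lookup like this, the rewritten assistant message could name the actual function instead of "unknown" when the client omits `name`.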
465
+ # Tool Extraction Patterns
466
+ TOOL_CALL_FENCE_PATTERN = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)
467
+ TOOL_CALL_INLINE_PATTERN = re.compile(r"(\{[^{}]{0,10000}\"tool_calls\".*?\})", re.DOTALL)
468
+ FUNCTION_CALL_PATTERN = re.compile(r"调用函数\s*[::]\s*([\w\-\.]+)\s*(?:参数|arguments)[::]\s*(\{.*?\})", re.DOTALL)
469
+
470
+
471
+ def extract_tool_invocations(text: str) -> Optional[List[Dict[str, Any]]]:
472
+ """Extract tool invocations from response text"""
473
+ if not text:
474
+ return None
475
+
476
+ # Limit scan size for performance
477
+ scannable_text = text[:ServerConfig.SCAN_LIMIT]
478
+
479
+ # Attempt 1: Extract from JSON code blocks
480
+ json_blocks = TOOL_CALL_FENCE_PATTERN.findall(scannable_text)
481
+ for json_block in json_blocks:
482
+ try:
483
+ parsed_data = json.loads(json_block)
484
+ tool_calls = parsed_data.get("tool_calls")
485
+ if tool_calls and isinstance(tool_calls, list):
486
+ return tool_calls
487
+ except (json.JSONDecodeError, AttributeError):
488
+ continue
489
+
490
+ # Attempt 2: Extract inline JSON objects
491
+ inline_match = TOOL_CALL_INLINE_PATTERN.search(scannable_text)
492
+ if inline_match:
493
+ try:
494
+ inline_json = inline_match.group(1)
495
+ parsed_data = json.loads(inline_json)
496
+ tool_calls = parsed_data.get("tool_calls")
497
+ if tool_calls and isinstance(tool_calls, list):
498
+ return tool_calls
499
+ except (json.JSONDecodeError, AttributeError):
500
+ pass
501
+
502
+ # Attempt 3: Parse natural language function calls
503
+ natural_lang_match = FUNCTION_CALL_PATTERN.search(scannable_text)
504
+ if natural_lang_match:
505
+ function_name = natural_lang_match.group(1).strip()
506
+ arguments_str = natural_lang_match.group(2).strip()
507
+ try:
508
+ # Validate JSON format
509
+ json.loads(arguments_str)
510
+ return [{
511
+ "id": f"invoke_{int(time.time() * 1000000)}",
512
+ "type": "function",
513
+ "function": {
514
+ "name": function_name,
515
+ "arguments": arguments_str
516
+ }
517
+ }]
518
+ except json.JSONDecodeError:
519
+ return None
520
+
521
+ return None
522
+
523
+
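Note on the inline extraction above: `TOOL_CALL_INLINE_PATTERN` ends with a non-greedy `.*?\}`, so it stops at the first closing brace and may hand `json.loads` a truncated object whenever the tool call contains nested objects. One possible alternative, sketched here under the assumption that the response contains a complete JSON object, is to let the standard library's `json.JSONDecoder.raw_decode` find where each object ends. `find_tool_calls_json` is a hypothetical name, not a function from this commit.

```python
import json
from typing import Any, Dict, List, Optional


def find_tool_calls_json(text: str) -> Optional[List[Dict[str, Any]]]:
    """Return the first non-empty "tool_calls" list found in any JSON object in text."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one complete JSON value starting at pos,
            # so nested braces are balanced by the JSON parser itself.
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            calls = obj.get("tool_calls")
            if isinstance(calls, list) and calls:
                return calls
        # Not a tool-call object here; try the next opening brace.
        pos = text.find("{", pos + 1)
    return None


sample_text = (
    'Sure. {"tool_calls": [{"id": "call_1", "type": "function", '
    '"function": {"name": "get_weather", "arguments": {"city": "Shanghai"}}}]} done'
)
found = find_tool_calls_json(sample_text)
```

The scan retries at every `{`, so it is quadratic in the worst case; applying it to the same `SCAN_LIMIT`-bounded prefix used above would keep that cost bounded.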
524
+ def remove_tool_json_content(text: str) -> str:
525
+ """Remove tool JSON content from response text"""
526
+ def remove_tool_call_block(match: re.Match) -> str:
527
+ json_content = match.group(1)
528
+ try:
529
+ parsed_data = json.loads(json_content)
530
+ if "tool_calls" in parsed_data:
531
+ return ""
532
+ except (json.JSONDecodeError, AttributeError):
533
+ pass
534
+ return match.group(0)
535
+
536
+ # Remove fenced tool JSON blocks
537
+ cleaned_text = TOOL_CALL_FENCE_PATTERN.sub(remove_tool_call_block, text)
538
+ # Remove inline tool JSON
539
+ cleaned_text = TOOL_CALL_INLINE_PATTERN.sub("", cleaned_text)
540
+ return cleaned_text.strip()
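Note on the inline removal above: because the inline pattern stops at the first `}`, substituting it away can leave trailing braces behind when the tool call has nested objects. A rough sketch of a span-based removal follows, assuming the tool JSON is a complete object; `strip_tool_calls_json` is a hypothetical helper, not part of this commit, and it relies only on `json.JSONDecoder.raw_decode` from the standard library.

```python
import json


def strip_tool_calls_json(text: str) -> str:
    """Remove every complete JSON object that has a top-level "tool_calls" key."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next one.
            pos = text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and "tool_calls" in obj:
            # Cut the whole object, from its opening to its closing brace.
            text = text[:pos] + text[end:]
            pos = text.find("{", pos)
        else:
            pos = text.find("{", pos + 1)
    return text.strip()


cleaned = strip_tool_calls_json(
    'Hi {"tool_calls": [{"function": {"name": "f", "arguments": {}}}]} there'
)
```

Ordinary JSON in the answer (objects without a `tool_calls` key) is left untouched by this sketch.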
541
+
542
+
543
  # =============================================================================
544
  # Utility Functions
545
  # =============================================================================
546
 
547
  def debug_log(message: str, *args) -> None:
548
  """Log debug message if debug mode is enabled"""
549
+ if ServerConfig.DEBUG_LOGGING:
550
+ if args:
551
+ print(f"[DEBUG] {message % args}")
552
+ else:
553
+ print(f"[DEBUG] {message}")
554
 
555
 
556
  def generate_request_ids() -> Tuple[str, str]:
 
563
 
564
  def get_browser_headers(referer_chat_id: str = "") -> Dict[str, str]:
565
  """Get browser headers for API requests"""
566
+ headers = ServerConfig.CLIENT_HEADERS.copy()
 
567
 
568
  if referer_chat_id:
569
+ headers["Referer"] = f"{ServerConfig.CLIENT_HEADERS['Origin']}/c/{referer_chat_id}"
570
 
571
  return headers
572
 
 
577
  headers.update({
578
  "Accept": "*/*",
579
  "Accept-Language": "zh-CN,zh;q=0.9",
580
+ "Referer": f"{ServerConfig.CLIENT_HEADERS['Origin']}/",
581
  })
582
 
583
  try:
584
  response = requests.get(
585
+ f"{ServerConfig.CLIENT_HEADERS['Origin']}/api/v1/auths/",
586
  headers=headers,
587
  timeout=10.0
588
  )
 
603
 
604
  def get_auth_token() -> str:
605
  """Get authentication token (anonymous or fixed)"""
606
+ if ServerConfig.ANONYMOUS_MODE:
607
  try:
608
  token = get_anonymous_token()
609
  debug_log(f"Anonymous token obtained: {token[:10]}...")
 
611
  except Exception as e:
612
  debug_log(f"Failed to obtain anonymous token, falling back to fixed token: {e}")
613
 
614
+ return ServerConfig.BACKUP_TOKEN
615
 
616
 
617
  def transform_thinking_content(content: str) -> str:
 
622
  content = content.replace("</thinking>", "").replace("<Full>", "").replace("</Full>", "")
623
  content = content.strip()
624
 
625
+ if ServerConfig.THINKING_PROCESSING == "think":
626
  content = re.sub(r'<details[^>]*>', '<think>', content)
627
  content = content.replace("</details>", "</think>")
628
+ elif ServerConfig.THINKING_PROCESSING == "strip":
629
  content = re.sub(r'<details[^>]*>', '', content)
630
  content = content.replace("</details>", "")
631
 
 
661
 
662
  # Send end chunk
663
  end_chunk = create_openai_response_chunk(
664
+ model=ServerConfig.PRIMARY_MODEL,
665
  finish_reason="stop"
666
  )
667
  yield f"data: {end_chunk.model_dump_json()}\n\n"
 
677
  headers = get_browser_headers(chat_id)
678
  headers["Authorization"] = f"Bearer {auth_token}"
679
 
680
+ debug_log(f"Calling upstream API: {ServerConfig.API_ENDPOINT}")
681
  debug_log(f"Upstream request body: {upstream_req.model_dump_json()}")
682
 
683
  response = requests.post(
684
+ ServerConfig.API_ENDPOINT,
685
  json=upstream_req.model_dump(exclude_none=True),
686
  headers=headers,
687
  timeout=60.0,
 
715
  def _handle_upstream_error(self, response: requests.Response) -> None:
716
  """Handle upstream error response"""
717
  debug_log(f"Upstream returned error status: {response.status_code}")
718
+ if ServerConfig.DEBUG_LOGGING:
719
  debug_log(f"Upstream error response: {response.text}")
720
 
721
 
722
  class StreamResponseHandler(ResponseHandler):
723
  """Handler for streaming responses"""
724
 
725
+ def __init__(self, upstream_req: UpstreamRequest, chat_id: str, auth_token: str, has_tools: bool = False):
726
+ super().__init__(upstream_req, chat_id, auth_token)
727
+ self.has_tools = has_tools
728
+ self.buffered_content = ""
729
+ self.tool_calls = None
730
+
731
  def handle(self) -> Generator[str, None, None]:
732
  """Handle streaming response"""
733
  debug_log(f"Start handling streaming response (chat_id={self.chat_id})")
 
745
 
746
  # Send initial role chunk
747
  first_chunk = create_openai_response_chunk(
748
+ model=ServerConfig.PRIMARY_MODEL,
749
  delta=Delta(role="assistant")
750
  )
751
  yield f"data: {first_chunk.model_dump_json()}\n\n"
 
754
  debug_log("Start reading upstream SSE stream")
755
  sent_initial_answer = False
756
 
757
+ with SSEParser(response, debug_mode=ServerConfig.DEBUG_LOGGING) as parser:
758
  for event in parser.iter_json_data(UpstreamData):
759
  upstream_data = event['data']
760
 
 
798
  sent_initial_answer: bool
799
  ) -> Generator[str, None, None]:
800
  """Process content from upstream data"""
801
+ content = upstream_data.data.delta_content or upstream_data.data.edit_content
 
802
 
803
+ if not content:
804
+ return
805
+
806
+ # Transform thinking content
807
+ if upstream_data.data.phase == "thinking":
808
+ content = transform_thinking_content(content)
809
+
810
+ # Buffer content if tools are enabled
811
+ if self.has_tools:
812
+ self.buffered_content += content
813
+ else:
814
+ # Handle initial answer content
815
+ if (not sent_initial_answer and
816
+ upstream_data.data.edit_content and
817
+ upstream_data.data.phase == "answer"):
818
+
819
+ content = self._extract_edit_content(upstream_data.data.edit_content)
820
  if content:
821
  debug_log(f"Sending regular content: {content}")
822
  chunk = create_openai_response_chunk(
823
+ model=ServerConfig.PRIMARY_MODEL,
824
  delta=Delta(content=content)
825
  )
826
  yield f"data: {chunk.model_dump_json()}\n\n"
827
+ sent_initial_answer = True
828
+
829
+ # Handle delta content
830
+ if upstream_data.data.delta_content:
831
+ if content:
832
+ if upstream_data.data.phase == "thinking":
833
+ debug_log(f"Sending thinking content: {content}")
834
+ chunk = create_openai_response_chunk(
835
+ model=ServerConfig.PRIMARY_MODEL,
836
+ delta=Delta(reasoning_content=content)
837
+ )
838
+ else:
839
+ debug_log(f"Sending regular content: {content}")
840
+ chunk = create_openai_response_chunk(
841
+ model=ServerConfig.PRIMARY_MODEL,
842
+ delta=Delta(content=content)
843
+ )
844
+ yield f"data: {chunk.model_dump_json()}\n\n"
845
 
846
  def _extract_edit_content(self, edit_content: str) -> str:
847
  """Extract content from edit_content field"""
 
850
 
851
  def _send_end_chunk(self) -> Generator[str, None, None]:
852
  """Send end chunk and DONE signal"""
853
+ if self.has_tools:
854
+ # Try to extract tool calls from buffered content
855
+ self.tool_calls = extract_tool_invocations(self.buffered_content)
856
+
857
+ if self.tool_calls:
858
+ # Send tool calls
859
+ tool_calls_list = []
860
+ for i, tc in enumerate(self.tool_calls):
861
+ tool_calls_list.append({
862
+ "index": i,
863
+ "id": tc.get("id"),
864
+ "type": tc.get("type", "function"),
865
+ "function": tc.get("function", {}),
866
+ })
867
+
868
+ out_chunk = create_openai_response_chunk(
869
+ model=ServerConfig.PRIMARY_MODEL,
870
+ delta=Delta(tool_calls=tool_calls_list)
871
+ )
872
+ yield f"data: {out_chunk.model_dump_json()}\n\n"
873
+ finish_reason = "tool_calls"
874
+ else:
875
+ # Send regular content
876
+ trimmed_content = remove_tool_json_content(self.buffered_content)
877
+ if trimmed_content:
878
+ content_chunk = create_openai_response_chunk(
879
+ model=ServerConfig.PRIMARY_MODEL,
880
+ delta=Delta(content=trimmed_content)
881
+ )
882
+ yield f"data: {content_chunk.model_dump_json()}\n\n"
883
+ finish_reason = "stop"
884
+ else:
885
+ finish_reason = "stop"
886
+
887
+ # Send final chunk
888
  end_chunk = create_openai_response_chunk(
889
+ model=ServerConfig.PRIMARY_MODEL,
890
+ finish_reason=finish_reason
891
  )
892
  yield f"data: {end_chunk.model_dump_json()}\n\n"
893
  yield "data: [DONE]\n\n"
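Note on the tool-call chunk built in `_send_end_chunk`: the prompt template asks the model for `arguments` as a JSON object, whereas OpenAI-compatible clients generally expect `function.arguments` to be a JSON-encoded string, and `tc.get("id")` may be `None` if the model omits it. A minimal sketch of a normalization step, assuming it would run on the list returned by `extract_tool_invocations`; `normalize_tool_calls` and the `call_` id format are illustrative assumptions, not part of this commit.

```python
import json
import uuid
from typing import Any, Dict, List


def normalize_tool_calls(raw_calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: coerce extracted tool calls into an OpenAI-style shape."""
    normalized = []
    for index, tc in enumerate(raw_calls):
        fn = tc.get("function") or {}
        args = fn.get("arguments", {})
        # Serialize object arguments; leave already-encoded strings alone.
        if not isinstance(args, str):
            args = json.dumps(args, ensure_ascii=False)
        normalized.append({
            "index": index,
            # Fall back to a generated id when the model did not supply one.
            "id": tc.get("id") or f"call_{uuid.uuid4().hex[:24]}",
            "type": "function",
            "function": {"name": fn.get("name", ""), "arguments": args},
        })
    return normalized


calls = normalize_tool_calls(
    [{"function": {"name": "get_weather", "arguments": {"city": "Shanghai"}}}]
)
```

The same helper could serve the non-streaming handler if the `index` key is dropped there.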
 
897
  class NonStreamResponseHandler(ResponseHandler):
898
  """Handler for non-streaming responses"""
899
 
900
+ def __init__(self, upstream_req: UpstreamRequest, chat_id: str, auth_token: str, has_tools: bool = False):
901
+ super().__init__(upstream_req, chat_id, auth_token)
902
+ self.has_tools = has_tools
903
+
904
  def handle(self) -> JSONResponse:
905
  """Handle non-streaming response"""
906
  debug_log(f"Start handling non-streaming response (chat_id={self.chat_id})")

919
  full_content = []
920
  debug_log("Start collecting full response content")
921
 
922
+ with SSEParser(response, debug_mode=ServerConfig.DEBUG_LOGGING) as parser:
923
  for event in parser.iter_json_data(UpstreamData):
924
  upstream_data = event['data']
925
 
 
939
  final_content = "".join(full_content)
940
  debug_log(f"Content collected, final length: {len(final_content)}")
941
 
942
+ # Handle tool calls for non-streaming
943
+ tool_calls = None
944
+ finish_reason = "stop"
945
+ message_content = final_content
946
+
947
+ if self.has_tools:
948
+ tool_calls = extract_tool_invocations(final_content)
949
+ if tool_calls:
950
+ # Content must be null when tool_calls are present (OpenAI spec)
951
+ message_content = None
952
+ finish_reason = "tool_calls"
953
+ else:
954
+ # Remove tool JSON from content
955
+ message_content = remove_tool_json_content(final_content)
956
+
957
  # Build response
958
  response_data = OpenAIResponse(
959
  id=f"chatcmpl-{int(time.time())}",
960
  object="chat.completion",
961
  created=int(time.time()),
962
+ model=ServerConfig.PRIMARY_MODEL,
963
  choices=[Choice(
964
  index=0,
965
  message=Message(
966
  role="assistant",
967
+ content=message_content,
968
+ tool_calls=tool_calls
969
  ),
970
+ finish_reason=finish_reason
971
  )],
972
  usage=Usage()
973
  )
 
1024
  response = ModelsResponse(
1025
  data=[
1026
  Model(
1027
+ id=ServerConfig.PRIMARY_MODEL,
1028
  created=current_time,
1029
  owned_by="z.ai"
1030
  ),
1031
  Model(
1032
+ id=ServerConfig.THINKING_MODEL,
1033
  created=current_time,
1034
  owned_by="z.ai"
1035
  ),
1036
  Model(
1037
+ id=ServerConfig.SEARCH_MODEL,
1038
  created=current_time,
1039
  owned_by="z.ai"
1040
  ),
 
1051
  """Handle chat completion requests"""
1052
  debug_log("Received chat completions request")
1053
 
1054
+ try:
1055
+ # Validate API key
1056
+ if not authorization.startswith("Bearer "):
1057
+ debug_log("Missing or invalid Authorization")
1058
+ raise HTTPException(status_code=401, detail="Missing or invalid Authorization header")
1059
+
1060
+ api_key = authorization[7:]
1061
+ if api_key != ServerConfig.AUTH_TOKEN:
1062
+ debug_log("Invalid API key (value not logged)")
1063
+ raise HTTPException(status_code=401, detail="Invalid API key")
1064
+
1065
+ debug_log("API key validated")
1066
+ debug_log(f"Request parsed - model: {request.model}, stream: {request.stream}, messages: {len(request.messages)}")
1067
+
1068
+ # Generate IDs
1069
+ chat_id, msg_id = generate_request_ids()
1070
+
1071
+ # Process messages with tools
1072
+ processed_messages = process_messages_with_tools(
1073
+ [m.model_dump() for m in request.messages],
1074
+ request.tools,
1075
+ request.tool_choice
1076
+ )
1077
+
1078
+ # Convert back to Message objects
1079
+ upstream_messages = []
1080
+ for msg in processed_messages:
1081
+ content = msg.get("content")
1082
+ # Ensure content is not None for Message model
1083
+ if content is None:
1084
+ content = ""
1085
+
1086
+ upstream_messages.append(Message(
1087
+ role=msg["role"],
1088
+ content=content,
1089
+ reasoning_content=msg.get("reasoning_content")
1090
+ ))
1091
+
1092
+ # Determine model features
1093
+ is_thinking = request.model == ServerConfig.THINKING_MODEL
1094
+ is_search = request.model == ServerConfig.SEARCH_MODEL
1095
+ search_mcp = "deep-web-search" if is_search else ""
1096
+
1097
+ # Build upstream request
1098
+ upstream_req = UpstreamRequest(
1099
+ stream=True, # Always use streaming from upstream
1100
+ chat_id=chat_id,
1101
+ id=msg_id,
1102
+ model="0727-360B-API", # Actual upstream model ID
1103
+ messages=upstream_messages,
1104
+ params={},
1105
+ features={
1106
+ "enable_thinking": is_thinking,
1107
+ "web_search": is_search,
1108
+ "auto_web_search": is_search,
1109
+ },
1110
+ background_tasks={
1111
+ "title_generation": False,
1112
+ "tags_generation": False,
1113
+ },
1114
+ mcp_servers=[search_mcp] if search_mcp else [],
1115
+ model_item=ModelItem(
1116
+ id="0727-360B-API",
1117
+ name="GLM-4.5",
1118
+ owned_by="openai"
1119
+ ),
1120
+ tool_servers=[],
1121
+ variables={
1122
+ "{{USER_NAME}}": "User",
1123
+ "{{USER_LOCATION}}": "Unknown",
1124
+ "{{CURRENT_DATETIME}}": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
1125
  }
1126
  )
1127
+
1128
+ # Get authentication token
1129
+ auth_token = get_auth_token()
1130
+
1131
+ # Check if tools are enabled and present
1132
+ has_tools = bool(ServerConfig.TOOL_SUPPORT and
1133
+ request.tools and
1134
+ len(request.tools) > 0 and
1135
+ request.tool_choice != "none")
1136
+
1137
+ # Handle response based on stream flag
1138
+ if request.stream:
1139
+ handler = StreamResponseHandler(upstream_req, chat_id, auth_token, has_tools)
1140
+ return StreamingResponse(
1141
+ handler.handle(),
1142
+ media_type="text/event-stream",
1143
+ headers={
1144
+ "Cache-Control": "no-cache",
1145
+ "Connection": "keep-alive",
1146
+ }
1147
+ )
1148
+ else:
1149
+ handler = NonStreamResponseHandler(upstream_req, chat_id, auth_token, has_tools)
1150
+ return handler.handle()
1151
+
1152
+ except HTTPException:
1153
+ raise
1154
+ except Exception as e:
1155
+ debug_log(f"Error while handling request: {str(e)}")
1156
+ import traceback
1157
+ debug_log(f"Stack trace: {traceback.format_exc()}")
1158
+ raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
1159
 
1160
 
1161
  # =============================================================================
 
1164
 
1165
  if __name__ == "__main__":
1166
  import uvicorn
1167
+ uvicorn.run("main:app", host="0.0.0.0", port=ServerConfig.LISTEN_PORT, reload=True)
tests/test_weather.py ADDED
@@ -0,0 +1,70 @@
 
1
+ # -*- coding: utf-8 -*-
2
+
3
+ import json
4
+ import requests
5
+
6
+ # API configuration
7
+ API_BASE = "http://localhost:8080"
8
+ API_KEY = "sk-tbkFoKzk9a531YyUNNF5"
9
+
10
+ def test_weather_query():
11
+ """Test a weather query"""
12
+ print("=" * 50)
13
+ print("Shanghai weather query test")
14
+ print("=" * 50)
15
+
16
+ # Tool definition
17
+ tool = {
18
+ "type": "function",
19
+ "function": {
20
+ "name": "get_weather",
21
+ "description": "Look up weather information for a given city",
22
+ "parameters": {
23
+ "type": "object",
24
+ "properties": {
25
+ "city": {"type": "string", "description": "City name"},
26
+ "date": {"type": "string", "description": "Date to query (optional)"}
27
+ },
28
+ "required": ["city"]
29
+ }
30
+ }
31
+ }
32
+
33
+ # Send the request
34
+ headers = {
35
+ "Content-Type": "application/json",
36
+ "Authorization": f"Bearer {API_KEY}"
37
+ }
38
+
39
+ data = {
40
+ "model": "GLM-4.5",
41
+ "messages": [
42
+ {"role": "user", "content": "Look up the weather in Shanghai on September 3, 2025"}
43
+ ],
44
+ "tools": [tool]
45
+ }
46
+
47
+ print("\nSending request...")
48
+ response = requests.post(f"{API_BASE}/v1/chat/completions",
49
+ headers=headers,
50
+ json=data)
51
+
52
+ if response.status_code == 200:
53
+ result = response.json()
54
+ message = result["choices"][0]["message"]
55
+
56
+ print("\nModel response:")
57
+ if message.get("tool_calls"):
58
+ print("Tool call detected:")
59
+ for tc in message["tool_calls"]:
60
+ print(f" - Tool: {tc['function']['name']}")
61
+ print(f" - Arguments: {tc['function']['arguments']}")
62
+ else:
63
+ print("No tool call detected")
64
+ print(f"Content: {(message.get('content') or 'no content')[:100]}...")
65
+ else:
66
+ print(f"Request failed: {response.status_code}")
67
+ print(f"Error message: {response.text}")
68
+
69
+ if __name__ == "__main__":
70
+ test_weather_query()
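Note on the test above: it prints the outcome rather than asserting on it, so it cannot fail when no tool call comes back. A small sketch of an assertion helper that a pytest-style test could apply to `response.json()` follows; `check_tool_call` and the sample payload are hypothetical and only illustrate the expected response shape.

```python
import json
from typing import Any, Dict


def check_tool_call(result: Dict[str, Any], expected_name: str) -> Dict[str, Any]:
    """Hypothetical helper: assert a call to expected_name and return its arguments."""
    choice = result["choices"][0]
    tool_calls = choice["message"].get("tool_calls") or []
    assert tool_calls, "expected at least one tool call"
    assert choice.get("finish_reason") == "tool_calls"
    fn = tool_calls[0]["function"]
    assert fn["name"] == expected_name
    args = fn["arguments"]
    # Arguments may arrive as a JSON string or as an already-parsed object.
    return json.loads(args) if isinstance(args, str) else args


# Canned response used only to illustrate the shape (made-up data).
sample = {
    "choices": [{
        "index": 0,
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Shanghai\"}"},
            }],
        },
    }]
}
parsed_args = check_tool_call(sample, "get_weather")
```

In a live test the API key could also be read from an environment variable instead of being hardcoded in the file.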