职贝云数AI新零售门户
Title: Local AI large model with web search
Author: NWI
Posted: 2025-2-10 10:04
Knowledge base construction

Recently many people have become interested in AI and want to build a knowledge base for themselves or their company. The workflow is roughly:

1. Import Word documents and extract the data (chunk slicing) for RAG
2. Compute embeddings for the chunks and store the vectors in a vector database
3. Take the user's request, compute its embedding, and query the vector database for the top-K matches
4. Feed the retrieved data, together with the prompt, to the AI model to generate the answer

Web search

Here we explain how to add live web search. I originally wanted to implement it as a tool call, but the locally deployed DeepSeek build does not support that, so I simply made a conventional version. The rough flow: take the user's request, run a search, keep the top N relevant results, crawl them, strip out the irrelevant parts of each page, keep the useful data, and feed it to the model together with the prompt for analysis and output. You can also embed this data and cache it in the vector database, so that the next time a user asks a similar question you just repeat the knowledge-base steps above; you only need to make sure the cached data stays fresh. Here I deployed SearXNG as the search engine: one query fans out to multiple search engines, and from the returned results I pick one link to crawl.

import ollama
from openai import OpenAI
import requests
import sys
import io
from bs4 import BeautifulSoup as BS

api_key = "ollama"
openai = OpenAI(api_key=api_key, base_url='http://192.168.214.1:11434/v1')
ollama_client = ollama.Client(host="http://192.168.214.1:11434")
# Force UTF-8 output (e.g. on Windows consoles)
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

# Query SearXNG and parse the result URLs
def searxng_search(prompt):
    response = requests.get(
        "http://192.168.214.1:9876/search",
        params={"q": prompt, "language": "zh-CN", "time_range": "",
                "safesearch": 0, "categories": "general"},
    )
    soup = BS(response.content, 'html.parser')
    mainpage = soup.body.main
    urls_result = []
    for item in mainpage.find_all('a', class_='url_header'):
        urls_result.append(item['href'])
    # Crawl the first result that looks like a plain HTML page
    for item in urls_result:
        if str(item).endswith('htm') or str(item).endswith('html'):
            return spider(item)

# Crawl the page at the given URL
def spider(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    proxies = {
        "http": "http://127.0.0.1:7897",
        "https": "http://127.0.0.1:7897",
    }
    response = requests.get(url, headers=headers, proxies=proxies)
    soup = BS(response.content, 'html.parser')
    # Drop the tags we don't need
    for tag in soup(["script", "source", "style", "head", "img", "svg", "a", "form", "link", "iframe"]):
        tag.decompose()
    # Strip the data-* attributes from every element
    for element in soup.find_all():
        attrs = list(element.attrs.keys())  # copy, since we delete while iterating
        for attr in attrs:
            if attr.startswith("data-"):
                del element[attr]
    # Extract the main text content
    return soup.get_text(separator="", strip=True).replace('\\', '').replace('/', '')

def oneline_search(prompt):
    data = searxng_search(prompt)
    # Put the crawled page text in front of the user's question
    messages = [{'role': 'user', 'content': f'{data}'},
                {'role': 'user', 'content': f'{prompt}'}]
    # The same model is called twice, once through each client, to show both styles
    final_response = ollama_client.chat(
        model="deepseek-r1:32b",
        messages=messages
    )
    print('ollama call')
    print(final_response.message.content)
    final_response = openai.chat.completions.create(
        model="deepseek-r1:32b",
        messages=messages)
    print('openai call')
    print(final_response.choices[0].message.content)

if __name__ == '__main__':
    # Demo query: "How did the Taiwanese actress Big S die?"
    oneline_search(prompt="请问台湾演员大S是怎样死的")
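As a companion to the four knowledge-base steps above, here is a minimal sketch of that pipeline in the same style as the search code. It assumes chromadb as the vector database and bge-m3 as the Ollama-served embedding model; the collection name, chunk size, and helper functions are illustrative assumptions, not from the original post.

import ollama
import chromadb

ollama_client = ollama.Client(host="http://192.168.214.1:11434")
chroma = chromadb.Client()
collection = chroma.create_collection(name="kb_demo")  # illustrative name

# Step 2: compute an embedding vector with an Ollama-served model (bge-m3 assumed)
def embed(text):
    return ollama_client.embeddings(model="bge-m3", prompt=text)["embedding"]

# Steps 1-2: naive fixed-size chunking, then embed and store each chunk
def build_kb(document, chunk_size=500):
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    collection.add(
        ids=[str(i) for i in range(len(chunks))],
        embeddings=[embed(c) for c in chunks],
        documents=chunks,
    )

# Steps 3-4: embed the user request, fetch the top-K chunks, feed them to the model
def answer(question, k=3):
    hits = collection.query(query_embeddings=[embed(question)], n_results=k)
    context = "\n".join(hits["documents"][0])
    response = ollama_client.chat(
        model="deepseek-r1:32b",
        messages=[
            {'role': 'user', 'content': f'Answer based on this reference material:\n{context}'},
            {'role': 'user', 'content': question},
        ],
    )
    return response.message.content

The same collection.add path could also cache crawled search results, as suggested above, provided stale entries are refreshed to keep them current.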
Let's all look forward to the full-strength version of DeepSeek; it should be frighteningly good. Don't be quick to trust the "register and get free tokens" offers floating around online; a self-hosted, cut-down version debugs just the same. Let's wait for the official service to open up.