Building a Knowledge Base

Lately a lot of people interested in AI want to build a knowledge base for themselves or their company. Here is a rough outline of the process:

1. Import Word documents and extract the data (chunk slicing) for RAG.
2. Compute embedding vectors and store them in a vector database.
3. On a user query, compute the query's embedding and fetch the top-K matches from the vector database.
4. Feed the retrieved data to the AI model, which generates an answer from the provided context and prompt.

Online Search

Here we look at how to add online search. I originally wanted to implement it as a tool call, but the locally deployed DeepSeek build doesn't support that, so I went with a plain implementation instead. The rough flow: take the user's query, run a search with it, grab the top N relevant results, crawl one of them, strip out the noise, keep the useful text, and feed it to the model together with the prompt for analysis and output. You can also embed the crawled data and cache it in the vector database, so the next time a user asks a similar question you just repeat the knowledge-base steps above, but you have to keep the cache fresh.

I deployed searxng as the search engine: one search fans out to multiple engines, and from the returned results we pick one link and crawl it.

```python
import ollama
from openai import OpenAI
import requests
import sys
import io
from bs4 import BeautifulSoup as BS

api_key = "ollama"
openai = OpenAI(api_key=api_key, base_url='http://192.168.214.1:11434/v1')
ollama_client = ollama.Client(host="http://192.168.214.1:11434")
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

# Query searxng and parse the result URLs
def searxng_search(prompt):
    response = requests.get(
        f"http://192.168.214.1:9876/search?q={prompt}&language=zh-CN&time_range=&safesearch=0&categories=general")
    soup = BS(response.content, 'html.parser')
    mainpage = soup.body.main
    urls_result = [item['href'] for item in mainpage.find_all('a', class_='url_header')]
    # Crawl the first plain web page in the result list
    for item in urls_result:
        if str(item).endswith('htm') or str(item).endswith('html'):
            return spider(item)
    return ''  # no crawlable result found

# Crawl the given URL
def spider(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    }
    proxies = {
        "http": "http://127.0.0.1:7897",
        "https": "http://127.0.0.1:7897",
    }
    response = requests.get(url, headers=headers, proxies=proxies)
    soup = BS(response.content, 'html.parser')
    # Remove tags we don't need
    for tag in soup(["script", "source", "style", "head", "img", "svg", "a", "form", "link", "iframe"]):
        tag.decompose()
    # Remove all data-* attributes from every element
    for element in soup.find_all():
        for attr in list(element.attrs.keys()):
            if attr.startswith("data-"):
                del element[attr]
    # Extract the main text content
    return soup.get_text(separator="", strip=True).replace('\\', '').replace('/', '')

def oneline_search(prompt):
    data = searxng_search(prompt)
    # Prepend the crawled data as context, then the user's question
    messages = [
        {'role': 'user', 'content': f'{data}'},
        {'role': 'user', 'content': f'{prompt}'},
    ]
    final_response = ollama_client.chat(
        model="deepseek-r1:32b",
        messages=messages
    )
    print('ollama call')
    print(final_response.message.content)
    final_response = openai.chat.completions.create(
        model="deepseek-r1:32b",
        messages=messages)
    print('openai call')
    print(final_response.choices[0].message.content)

if __name__ == '__main__':
    oneline_search(prompt="请问台湾演员大S是如何逝世的")
```
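The caching idea mentioned above (embed the crawled answer, and on a similar later question serve it from the cache while guarding freshness) can be sketched in memory without a real vector database. Everything here is my own illustration, not part of the original code: the `EmbeddingCache` name, the similarity `threshold`, and the TTL are assumptions; in a real setup `embed_fn` would call an embedding model (e.g. via ollama) and the entries would live in a vector store.

```python
import math
import time

class EmbeddingCache:
    """Minimal in-memory answer cache keyed by query embedding (illustrative sketch).

    embed_fn: callable mapping text -> list[float]
    ttl_seconds: entries older than this are treated as stale (freshness guard)
    threshold: minimum cosine similarity to count as "the same question"
    """

    def __init__(self, embed_fn, ttl_seconds=3600, threshold=0.9):
        self.embed_fn = embed_fn
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.entries = []  # list of (vector, cached_text, timestamp)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def put(self, query, text):
        self.entries.append((self.embed_fn(query), text, time.time()))

    def get(self, query):
        now = time.time()
        # Drop stale entries first so answers stay fresh
        self.entries = [e for e in self.entries if now - e[2] < self.ttl]
        qv = self.embed_fn(query)
        best, best_sim = None, 0.0
        for vec, text, _ in self.entries:
            sim = self._cosine(qv, vec)
            if sim >= self.threshold and sim > best_sim:
                best, best_sim = text, sim
        return best  # None on a cache miss
```

Usage would wrap `searxng_search`: check `cache.get(prompt)` first, and only crawl (then `cache.put`) on a miss.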
Let's all look forward to the full-strength version of DeepSeek; it should be frighteningly strong. Don't be too quick to trust the "sign up and get free tokens" offers floating around online. Deploy the cut-down version yourself for consistent debugging in the meantime, and wait for the official site to open up.