这个对应lean声明(formal state)就是import Mathlib.Data.Nat.Basicimport Mathlib.Algebra.BigOperators.Basic
open BigOperators
/--The sum of the first `n` odd numbers equals `n^2`:`∑_{k = 0}^{n-1} (2k+1) = n^2`.-/theorem odd_sum_square (n : ℕ) : ∑ k in Finset.range n, (2 * k + 1) = n ^ 2 := by -- proof to be filled admit
然后把这个东西改成prompts发给prover2里的V3(prover2是两个模型组成的一个V3,一个7B的子处理器)V3接到一个这样的prompts[System] You are a mathematical reasoning expert and Lean 4 proof designer.Given a math problem and its Lean formal statement, you should:
1. Explain the solution in natural language (chain-of-thought).2. Decompose the proof into clear subgoals.3. Produce a Lean 4 proof skeleton which follows your plan and contains `sorry` placeholders for subgoals.
[User]Problem (informal):Prove that the sum of the first n odd numbers equals n^2.
Formal statement (Lean 4):theorem odd_sum_square (n : ℕ) : ∑ k in Finset.range n, (2 * k + 1) = n ^ 2 := by ?_
处理完了,就成了一个COT的解题的视角+lean的骨架了,紧紧差一些细节的subgoal没去证明[DeepSeek-V3]
(1) Natural-language chain-of-thought (high-level plan):
We prove by induction on n.
Base case: n = 0. The sum over an empty range is 0, and 0^2 = 0.
Inductive step: assume that for some n we have ∑_{k < n} (2k + 1) = n^2.Then ∑_{k < n+1} (2k + 1)= (∑_{k < n} (2k + 1)) + (2n + 1)= n^2 + (2n + 1)= (n + 1)^2.Thus, by induction, the equality holds for all n.
(2) Subgoal decomposition (informal):
Subgoal 1 (base case): Show: ∑_{k < 0} (2k+1) = 0^2.
Subgoal 2 (range decomposition): For all n, ∑_{k < n+1} (2k+1) = (∑_{k < n} (2k+1)) + (2n+1).
Subgoal 3 (inductive hypothesis usage): Assume IH: ∑_{k < n} (2k+1) = n^2. Show: ∑_{k < n+1} (2k+1) = (n+1)^2.
Subgoal 4 (algebraic identity): Show: n^2 + (2n+1) = (n+1)^2.
(3) Lean 4 proof skeleton with `sorry`:
import Mathlib.Data.Nat.Basicimport Mathlib.Algebra.BigOperators.Basic
open BigOperators
theorem odd_sum_square (n : ℕ) : ∑ k in Finset.range n, (2 * k + 1) = n ^ 2 := by -- induction on n induction' n with n ih · -- Subgoal 1: base case n = 0 -- ∑_{k < 0} (2k+1) = 0^2 -- (use simp to evaluate the empty sum) sorry · -- Subgoal 2 + 3 + 4: inductive step -- We rewrite the sum over `range n.succ`. have h_decompose : ∑ k in Finset.range n.succ, (2 * k + 1) = (∑ k in Finset.range n, (2 * k + 1)) + (2 * n + 1) := by -- Subgoal 2: range decomposition sorry
-- Use the induction hypothesis to replace the partial sum by n^2. have h_using_IH : (∑ k in Finset.range n, (2 * k + 1)) + (2 * n + 1) = n ^ 2 + (2 * n + 1) := by -- Subgoal 3: apply IH sorry
-- Now prove the algebraic identity n^2 + (2n+1) = (n+1)^2. have h_algebra : n ^ 2 + (2 * n + 1) = (Nat.succ n) ^ 2 := by -- Subgoal 4: algebraic manipulation sorry
-- Combine the equalities to close the goal. -- target: ∑_{k < n.succ} (2k+1) = (n.succ)^2 -- by transitivity using h_decompose, h_using_IH, h_algebra sorry
下面open Bigoperators的就是lean4的基本骨架了,然后每一个-- 是一个subgoal,sorry留着当占位符,交给7b的小模型去证明了举一个子目的的例子:[System] You are a Lean 4 theorem prover. Your task is to fill in the proofof the given lemma, without changing its statement.
[User]Context:
import Mathlib.Data.Nat.Basicimport Mathlib.Algebra.BigOperators.Basic
open BigOperators
-- `odd_sum_square` is being proved by induction on `n`.-- In the inductive step, we have the following subgoal:
theorem range_decompose_subgoal (n : ℕ) : ∑ k in Finset.range n.succ, (2 * k + 1) = (∑ k in Finset.range n, (2 * k + 1)) + (2 * n + 1) := by ?_
Please fill in the proof in Lean 4.
7B的就把这个给解了,解完之后的被带入到原来的V3的结果了,sorry被交换成这个结果[Prover-7B]
theorem range_decompose_subgoal (n : ℕ) : ∑ k in Finset.range n.succ, (2 * k + 1) = (∑ k in Finset.range n, (2 * k + 1)) + (2 * n + 1) := by -- Use the standard lemma `Finset.range_succ`. -- It tells us: -- range (n.succ) = range n ∪ {n}, -- and the sum over `range (n.succ)` splits accordingly. simpa [Finset.range_succ, Nat.succ_eq_add_one] using Finset.sum_range_succ (fun k => (2 * k + 1 : ℕ)) n
那么prm是不是个好的思绪,但是prm难训啊。Deepseek math V2的做法是什么?其实可以总结为笼统PRM不同于每一步都要验证正确与否的PRM,你针对输入中间结果一致确认个好坏,这个还是可以完成的吧!所以思绪就来了首先,我要有一个小镇做题家,一个给做题家批卷子的教师,但是这个教师能够本身就不咋滴,所以我还要找个教委的来监督教师的程度。
于是就有了下面这3个模型
1. 证明生成器 ,做题家(Proof Generator) (πθ)
这是担任实践生成数学证明的言语模型。
• 功能: 根据给定的成绩 (X) 生成证明 (Y)。
• 训练目的: 运用训练好的验证器作为奖励模型 (RY) 对其停止强化学习训练,以最大化证明质量。
• 自我验证: 为了加强推理才能,生成器还被激励产生证明 Y 之后,紧接着停止自我分析 (Z),即自我评价和纠错。这种自我评价的准确性也遭到奖励,简单说就是你想高考获得高分,光做选择题和填空是不行的,你得做大题证明题,而且你的证明步骤得写清楚,这部分会有额外的分数。