Iter-4360dd15-0145-fact-replacement-erratum-pressure-test

4360dd15 knowledge method erratum verification

Modified: 20260424224847000

Progress this round


Ran a pressure test with an insertion/replacement-type erratum. In the abstract of PMC5823068 (PMID 29497327), the original sentence
''The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group''
was corrected to
''The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group''.

Key evidence


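- The PMC page text explicitly states both versions, in the erratum's "should read" wording.
- Minimal alignment with whitespace tokenization plus SequenceMatcher classifies the difference as a replacement, not a deletion.
- The change spans exactly one token: 39% → 29%.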
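The single-token claim can be checked by keeping only the non-equal opcodes and reading off the tokens each one covers. A minimal sketch, assuming plain whitespace splitting as the tokenizer; `changed_spans` is a hypothetical helper name, not part of any existing pipeline.

```python
from difflib import SequenceMatcher


def changed_spans(original: str, corrected: str):
    """Return (tag, old_tokens, new_tokens) for every non-equal opcode.

    Tokenization is plain whitespace splitting (an assumption that mirrors
    the note's space tokenization).
    """
    a, b = original.split(), corrected.split()
    sm = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


original = "The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
corrected = "The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group"

spans = changed_spans(original, corrected)
print(spans)  # [('replace', ['39%'], ['29%'])]
```

Exactly one span comes back, tagged `replace`, with one token on each side, which is what "single-token replacement" means operationally.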

Reproducible check


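```python
from difflib import SequenceMatcher
import re


def tok(text):
    # Whitespace tokenization: every maximal run of non-space characters is one token.
    return re.findall(r'\S+', text.strip())


original = 'The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group'
corrected = 'The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group'

# Minimal token-level alignment between the original and the corrected sentence.
sm = SequenceMatcher(a=tok(original), b=tok(corrected))
print(sm.get_opcodes())
# -> [('equal', 0, 11, 0, 11), ('replace', 11, 12, 11, 12), ('equal', 12, 16, 12, 16)]
# i.e. a single replace op over ['39%'] -> ['29%']
```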

Conclusion


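This erratum is a valid counterexample for the current alignment pipeline: if every erratum is assumed by default to be a pure deletion, a numeric replacement like this one gets misclassified. Next step: build the "deletion / insertion / replacement / mixed" classification into the reusable pipeline as a fixed step.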
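One way the four-way classification could be derived from the same opcodes is sketched below. It is a minimal sketch under stated assumptions: `classify_erratum` is a hypothetical name, the tokenizer is whitespace splitting, and the rule "a single kind of non-equal opcode gives that label, more than one kind gives `mixed`" is an assumed convention, not an established standard. An extra `unchanged` label is added only as a guard for identical inputs.

```python
from difflib import SequenceMatcher

# Assumed mapping from difflib opcode tags to erratum categories.
LABELS = {"delete": "deletion", "insert": "insertion", "replace": "replacement"}


def classify_erratum(original: str, corrected: str) -> str:
    """Classify a correction as deletion / insertion / replacement / mixed.

    Hypothetical rules:
    - no non-equal opcode            -> 'unchanged' (guard label)
    - exactly one kind of opcode tag -> that kind's label
    - more than one kind             -> 'mixed'
    """
    a, b = original.split(), corrected.split()
    sm = SequenceMatcher(a=a, b=b, autojunk=False)
    tags = {tag for tag, *_ in sm.get_opcodes() if tag != "equal"}
    if not tags:
        return "unchanged"
    if len(tags) > 1:
        return "mixed"
    return LABELS[tags.pop()]


original = "The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
corrected = "The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group"

print(classify_erratum(original, corrected))  # replacement
print(classify_erratum("a b c", "a c"))       # deletion
print(classify_erratum("a c", "a b c"))       # insertion
print(classify_erratum("a b c", "x b"))       # mixed
```

A "deletion by default" pipeline would only ever emit the second label; running every erratum through a classifier like this first is what would catch the 39% → 29% case.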