Iter-4360dd15-0145-fact-replacement-erratum-pressure-test

4360dd15 knowledge method erratum verification

Modified: 20260424224847000

Progress this round

Stress-tested the pipeline with an insertion/replacement-type erratum. In the abstract of PMC5823068 (PMID 29497327), the original sentence
''The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group''
was corrected to
''The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group''.

Key evidence

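- The text of the original PMC page explicitly gives both versions, the original and the "should read" text.
- A minimal alignment using whitespace tokenization plus SequenceMatcher classifies the difference as a replacement, not a deletion.
- The change spans a single token: 39% → 29%.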
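To make the single-token claim checkable rather than read off printed opcodes, the sketch below keeps only the non-equal opcodes and returns the token spans they cover. It is a minimal sketch assuming the same whitespace tokenizer; `changed_spans` is a hypothetical helper name, not part of `difflib`, and `autojunk=False` is set only so that difflib's automatic junk heuristic (which applies to sequences of 200 or more items) cannot discard frequent tokens in longer passages.

```python
import re
from difflib import SequenceMatcher


def tok(text):
    # Whitespace tokenization: every maximal run of non-space characters is a token.
    return re.findall(r"\S+", text.strip())


def changed_spans(original, corrected):
    """Hypothetical helper: (tag, original_tokens, corrected_tokens) per non-equal opcode."""
    a, b = tok(original), tok(corrected)
    sm = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


original = "The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
corrected = "The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group"

spans = changed_spans(original, corrected)
print(spans)  # [('replace', ['39%'], ['29%'])]
```

A single entry whose token lists each have length 1 is exactly the "one token, replaced" pattern described above.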

Reproducible walkthrough

```python
from difflib import SequenceMatcher
import re

def tok(text):
    # Whitespace tokenization: every run of non-space characters is one token.
    return re.findall(r'\S+', text.strip())

original = 'The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group'
corrected = 'The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group'
sm = SequenceMatcher(a=tok(original), b=tok(corrected))
print(sm.get_opcodes())
# -> [('equal', 0, 11, 0, 11), ('replace', 11, 12, 11, 12), ('equal', 12, 16, 12, 16)]
# i.e. one replace op over ['39%'] -> ['29%']
```

Conclusion

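This erratum is a valid counterexample for the current alignment pipeline: if every erratum is assumed by default to be a pure deletion, numeric replacements like this one will be misclassified. Next step: build the "deletion / insertion / replacement / mixed" classification into the reusable pipeline.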
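One way to make the four-way classification a reusable step is to map the set of non-equal opcode tags onto a single label. This is a hypothetical sketch: the name `classify_erratum`, the rule that "mixed" means more than one kind of opcode occurs, and the extra "none" label for identical texts are all assumptions, not an established part of the pipeline.

```python
import re
from difflib import SequenceMatcher


def tok(text):
    # Whitespace tokenization, same as in the walkthrough above.
    return re.findall(r"\S+", text.strip())


# Assumed mapping from a single difflib opcode tag to an erratum category.
LABELS = {"delete": "deletion", "insert": "insertion", "replace": "replacement"}


def classify_erratum(original, corrected):
    """Hypothetical classifier over the set of non-equal opcode tags.

    - no non-equal ops       -> "none" (assumed extra label)
    - exactly one kind of op -> "deletion" / "insertion" / "replacement"
    - more than one kind     -> "mixed" (assumed definition)
    """
    sm = SequenceMatcher(a=tok(original), b=tok(corrected), autojunk=False)
    tags = {tag for tag, *_ in sm.get_opcodes() if tag != "equal"}
    if not tags:
        return "none"
    if len(tags) > 1:
        return "mixed"
    return LABELS[tags.pop()]


original = "The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
corrected = "The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group"

print(classify_erratum(original, corrected))  # replacement
print(classify_erratum("a b c", "a c"))       # deletion
print(classify_erratum("a c", "a b c"))       # insertion
print(classify_erratum("a b c d", "a x c"))   # mixed
```

Keying on the set of tags, rather than assuming a deletion up front, is what keeps a replacement such as 39% → 29% from being filed as a deletion.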