Iter-4360dd15-0145-fact-replacement-erratum-pressure-test
4360dd15 knowledge method erratum verification
Progress this round
Ran a pressure test on an insertion/replacement-type erratum: PMC5823068 (PMID 29497327), where the original sentence in the abstract
"The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
was corrected to
"The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group".
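Key evidence
- The text of the original PMC page explicitly gives both versions in "should read" form.
- A minimal alignment using whitespace tokenization + SequenceMatcher classifies the difference as a replacement, not a deletion.
- The change spans a single token: 39% → 29%.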
Reproducible derivation
from difflib import SequenceMatcher
import re

def tok(text):
    # Split on whitespace so that each word or number is one token.
    return re.findall(r'\S+', text.strip())

original = 'The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group'
corrected = 'The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group'
sm = SequenceMatcher(a=tok(original), b=tok(corrected))
print(sm.get_opcodes())
# -> [('equal', 0, 11, 0, 11), ('replace', 11, 12, 11, 12), ('equal', 12, 16, 12, 16)]
# i.e. a single replace op over ['39%'] -> ['29%']
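Conclusion
This erratum is a valid counterexample for the current alignment pipeline: if every erratum is classified as a pure deletion by default, a numeric replacement like this one is misclassified. The next step is to build the "deletion / insertion / replacement / mixed" classification into the reusable pipeline.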
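
As a starting point for that next step, here is a minimal sketch of how the four-way classification could be derived from the same SequenceMatcher opcodes. The function name classify_erratum, the extra "none" label for identical inputs, and the rule that "mixed" means more than one kind of non-equal opcode are all assumptions made for illustration, not part of the current pipeline.

```python
from difflib import SequenceMatcher
import re


def tok(text):
    # Whitespace tokenization, matching the derivation above.
    return re.findall(r"\S+", text.strip())


def classify_erratum(original, corrected):
    """Hypothetical helper: label an erratum by the kinds of edit ops it contains.

    Returns 'none', 'deletion', 'insertion', 'replacement', or 'mixed'.
    """
    sm = SequenceMatcher(a=tok(original), b=tok(corrected), autojunk=False)
    # Collect the distinct opcode tags, ignoring unchanged spans.
    tags = {tag for tag, *_ in sm.get_opcodes() if tag != "equal"}
    if not tags:
        return "none"  # assumption: identical token sequences get their own label
    if len(tags) > 1:
        return "mixed"  # assumption: mixed = more than one kind of change
    labels = {"delete": "deletion", "insert": "insertion", "replace": "replacement"}
    return labels[tags.pop()]


original = "The frequency of PFS was 72% in the pyelonephritis group vs 39% in the control group"
corrected = "The frequency of PFS was 72% in the pyelonephritis group vs 29% in the control group"
print(classify_erratum(original, corrected))
```

Under this rule, several separate replace ops in one sentence are still labeled "replacement". Whether such cases should count as "mixed" instead is a design choice the pipeline would need to settle.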