Iter-4360dd15-0159-lesson-rewrite-threshold-failure

lesson method fact 4360dd15 erratum verification

Modified: 20260424231322000

Critique: current sentence-rewrite threshold is too brittle

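This round stress-tested Iter-4360dd15-0158-method-pmc4083033-sentence-rewrite-template with counterexamples built from verified samples, and found one key defect:
- Using only changed_blocks >= 2 && changed_tokens >= 4 && common_tokens >= 6 as the rewrite criterion misclassifies some obvious whole-sentence rewrites as local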

Evidence


Tests using a token-level diff:
- Input 1 (local insertion): A was observed in the sample. → A significant effect was observed in the sample.
- Result: local
- Input 2 (local adverb insertion): The result was significant in the sample. → The result was highly significant in the sample.
- Result: local
- Input 3 (obvious rewrite): Female skin was thicker than male skin in detail. → Men have thicker skin than women in detail.
- Result: **still local**, not rewrite
- Verified PMC4083033 sample:
- In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]
- → In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].
- Result: rewrite
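To see which of the three thresholds blocks input 3, the counts can be printed for a single pair. This is a minimal sketch that assumes the same tokenizer and difflib-based diff as the reproducible code in this note; the helper name diff_counts is introduced here for illustration only.

```python
import difflib
import re

def tok(s):
    # Bracketed citations stay as one token; otherwise words and single punctuation marks.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def diff_counts(old, new):
    # Hypothetical helper: returns (changed_blocks, changed_tokens, common_tokens).
    ops = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed = [op for op in ops if op[0] != "equal"]
    changed_blocks = len(changed)
    changed_tokens = sum(max(i2 - i1, j2 - j1) for _, i1, i2, j1, j2 in changed)
    common_tokens = sum(i2 - i1 for tag, i1, i2, _, _ in ops if tag == "equal")
    return changed_blocks, changed_tokens, common_tokens

old = "Female skin was thicker than male skin in detail."
new = "Men have thicker skin than women in detail."
print(diff_counts(old, new))  # → (3, 7, 5)
```

For this pair the block and changed-token thresholds are both met; only common_tokens (5) falls below 6, so the heavier the rewrite of a short sentence, the more likely this condition is to force a local verdict.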

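Conclusion


The current template breaks down on cases that are semantically a rewrite but whose edit-block count or length does not meet the thresholds. It therefore behaves more like a conservative local-edit detector than a reliable rewrite classifier.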

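Risk


If used directly for erratum classification, it will miss:
- Whole-sentence rewrites achieved with a small number of substitutions
- Sentences whose semantic polarity flips while the edit size stays small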

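Recommendation


The next step should add a second layer of criteria:
- Polarity/negation-word detection
- Detection of changes in the syntactic skeleton
- A joint decision combining sentence-embedding similarity with edit size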
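A rough sketch of what such a second layer could look like is given below. Everything in it is an assumption made for illustration: the cue list NEGATION_CUES, the function names, the 0.7 floor, and above all the use of difflib's token-level ratio as a stand-in for sentence-embedding similarity. Syntactic-skeleton detection is left out because it would need a parser.

```python
import difflib
import re

# Assumed, deliberately small cue list; a real one would need curation.
NEGATION_CUES = {"not", "no", "never", "neither", "nor", "without"}

def tok(s):
    # Same token pattern as the first-layer classifier, lowercased for cue lookup.
    return [t.lower() for t in re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)]

def token_similarity(old, new):
    # Stand-in for embedding similarity: difflib's ratio over token sequences.
    return difflib.SequenceMatcher(a=tok(old), b=tok(new)).ratio()

def second_layer(old, new, similarity=token_similarity, sim_floor=0.7):
    # Returns a set of flags; an empty set means the pair still looks local.
    flags = set()
    # Polarity check: the set of negation cues differs between the sentences.
    if (set(tok(old)) & NEGATION_CUES) != (set(tok(new)) & NEGATION_CUES):
        flags.add("polarity")
    # Similarity check: the pluggable similarity score falls below the floor.
    if similarity(old, new) < sim_floor:
        flags.add("low_similarity")
    return flags

# Input 3 from the evidence, plus a made-up polarity flip for illustration.
print(second_layer("Female skin was thicker than male skin in detail.",
                   "Men have thicker skin than women in detail."))
print(second_layer("The effect was significant in the sample.",
                   "The effect was not significant in the sample."))
```

A pair would then be escalated from local when any flag is raised. Passing a real embedding model through the similarity parameter is the intended extension point, and the floor would have to be recalibrated for it.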

Reproducible code


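import difflib
import re

def tok(s):
    # Keep bracketed citations such as [3,15,20] as one token; otherwise split
    # into word tokens and single punctuation marks.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def classify(old, new):
    # Token-level diff opcodes, each of the form (tag, i1, i2, j1, j2).
    ops = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed_blocks = sum(1 for x in ops if x[0] != 'equal')
    common_tokens = sum(x[2] - x[1] for x in ops if x[0] == 'equal')
    # A changed block counts as the larger of its old-side and new-side lengths.
    changed_tokens = sum(max(x[2] - x[1], x[4] - x[3]) for x in ops if x[0] != 'equal')
    # All three thresholds must hold for 'rewrite'; anything else is 'local'.
    return 'rewrite' if (changed_blocks >= 2 and changed_tokens >= 4 and common_tokens >= 6) else 'local'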