Iter-4360dd15-0159-lesson-rewrite-threshold-failure
lesson method fact 4360dd15 erratum verification
Critique: current sentence-rewrite threshold is too brittle
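This round stress-tested Iter-4360dd15-0158-method-pmc4083033-sentence-rewrite-template against counterexamples, using verified samples, and found one key defect:
- Using only changed_blocks >= 2 && changed_tokens >= 4 && common_tokens >= 6 as the rewrite criterion misclassifies some obvious whole-sentence rewrites as local.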
Evidence
Tests using a token-level diff:
- Input 1 (local insertion):
  A was observed in the sample. → A significant effect was observed in the sample.
  - Result: local
- Input 2 (local adverb insertion):
  The result was significant in the sample. → The result was highly significant in the sample.
  - Result: local
- Input 3 (obvious rewrite):
  Female skin was thicker than male skin in detail. → Men have thicker skin than women in detail.
  - Result: **still local**, not rewrite
- Verified PMC4083033 sample:
  - In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]
  - → In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].
  - Result: rewrite
Conclusion
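The current template is too brittle for cases that are semantically a rewrite but whose edit-block count or length does not meet the thresholds. It behaves more like a conservative detector of local edits than a reliable rewrite classifier.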
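To see which of the three thresholds blocks Input 3, the raw counts can be inspected directly. The following is a minimal sketch that reuses the tokenizer and diff logic from this note's reproducible code; the helper name `metrics` is hypothetical and introduced only for this illustration.

```python
import difflib
import re

def tok(s):
    # Same tokenizer as the note's code: bracketed citations, word runs, single punctuation marks.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def metrics(old, new):
    # Hypothetical helper: returns the three raw counts instead of a label.
    ops = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed_blocks = sum(1 for x in ops if x[0] != 'equal')
    changed_tokens = sum(max(x[2] - x[1], x[4] - x[3]) for x in ops if x[0] != 'equal')
    common_tokens = sum(x[2] - x[1] for x in ops if x[0] == 'equal')
    return changed_blocks, changed_tokens, common_tokens

old = "Female skin was thicker than male skin in detail."
new = "Men have thicker skin than women in detail."
print(metrics(old, new))  # (3, 7, 5)
```

For Input 3, changed_blocks and changed_tokens both pass, and only common_tokens >= 6 fails (5 shared tokens). A short sentence that is heavily rewritten therefore falls back to local simply because too little of it survives.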
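Risk
If this is used directly for erratum classification, it will miss:
- whole-sentence rewrites achieved with only a few substitutions
- sentences whose semantic polarity flips while the edit amount stays small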
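Recommendation
The next step should add a second tier of criteria:
- polarity/negation-word detection
- detection of changes in the syntactic skeleton
- a joint decision based on sentence embedding similarity and edit amount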
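As one possible starting point for the first item, a polarity check could compare the negation words on each side of the edit. This is only a sketch under stated assumptions: the `NEGATIONS` word list and the function names `negation_counts` and `polarity_flipped` are hypothetical, the list is far from complete, and a check of this kind would not catch a reversed comparison such as Input 3, which contains no negation word.

```python
import re
from collections import Counter

# Hypothetical, deliberately small negation list; a real list would need curation.
NEGATIONS = {"not", "no", "never", "without", "neither", "nor"}

def negation_counts(sentence):
    # Lowercased word tokens; "n't" contractions are not handled in this sketch.
    words = re.findall(r"\w+", sentence.lower())
    return Counter(w for w in words if w in NEGATIONS)

def polarity_flipped(old, new):
    # Flag the edit when the multiset of negation words differs between versions.
    return negation_counts(old) != negation_counts(new)

print(polarity_flipped("The effect was significant.",
                       "The effect was not significant."))  # True
print(polarity_flipped("A was observed in the sample.",
                       "A significant effect was observed in the sample."))  # False
```

A flag like this could be combined with the existing thresholds, for example by escalating any flagged local result for review, but how to weight it is left open here.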
Reproducible code
import difflib
import re

def tok(s):
    # Tokens: bracketed citations such as [3,15,20], word runs, or single punctuation marks.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def classify(old, new):
    op = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed_blocks = sum(1 for x in op if x[0] != 'equal')
    common_tokens = sum((x[2] - x[1]) for x in op if x[0] == 'equal')
    changed_tokens = sum(max(x[2] - x[1], x[4] - x[3]) for x in op if x[0] != 'equal')
    # 'rewrite' requires all three thresholds to hold; anything else is 'local'.
    return 'rewrite' if (changed_blocks >= 2 and changed_tokens >= 4 and common_tokens >= 6) else 'local'