Iter-4360dd15-0159-lesson-rewrite-threshold-failure

lesson method fact 4360dd15 erratum verification

Modified: 20260424231322000

Critique: current sentence-rewrite threshold is too brittle

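This round stress-tested Iter-4360dd15-0158-method-pmc4083033-sentence-rewrite-template with counterexamples built around verified samples and found one key defect:
- Using only changed_blocks >= 2 && changed_tokens >= 4 && common_tokens >= 6 as the rewrite criterion misclassifies some obvious whole-sentence rewrites as local.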

Evidence

Tests using a token-level diff:
- Input 1 (local insertion): A was observed in the sample. → A significant effect was observed in the sample.
- Result: local
- Input 2 (local adverb insertion): The result was significant in the sample. → The result was highly significant in the sample.
- Result: local
- Input 3 (obvious rewrite): Female skin was thicker than male skin in detail. → Men have thicker skin than women in detail.
- Result: **still local**, not rewrite
- Verified PMC4083033 sample:
- In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]
- → In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].
- Result: rewrite

Conclusion

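The current template is too brittle for cases that are rewrites semantically but whose edit-block count or token counts fall short of the thresholds. Input 3, for example, shares only 5 tokens with its rewrite, so it fails common_tokens >= 6. The template therefore behaves like a conservative local-edit detector, not a reliable rewrite classifier.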

Risks

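If it is applied directly to erratum classification, it will miss:
- whole-sentence rewrites achieved through a small number of substitutions
- sentences whose semantic polarity flips while the edit itself is small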

Recommendations

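The next step should add a second layer of criteria:
- polarity / negation-word detection
- detection of changes in the syntactic skeleton
- a joint decision over sentence-embedding similarity and edit amount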
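
A minimal sketch of what such a second layer might look like, intended to run only on pairs that the first layer labels local. It covers two of the three ideas in simplified form: a negation-word comparison, and a length-normalized edit amount used as a crude stand-in for the joint similarity / edit decision. The NEGATORS lexicon, the "polarity-flip" label, the function names, and the 0.5 cutoff are all illustrative assumptions, not part of the existing template; syntactic-skeleton and embedding checks are not attempted here.

```python
import difflib
import re

# Hypothetical, deliberately small negation lexicon (contractions such as
# "isn't" are not handled); a real list would need curation.
NEGATORS = {"not", "no", "never", "neither", "nor", "without", "cannot"}

def tok(s):
    # Same tokenizer as the first layer: citations, word runs, punctuation.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def negation_words(s):
    # Sorted list so that repeated negators are compared as a multiset.
    return sorted(t.lower() for t in tok(s) if t.lower() in NEGATORS)

def negation_flipped(old, new):
    """True when the negation words differ between the two versions."""
    return negation_words(old) != negation_words(new)

def changed_fraction(old, new):
    """Changed tokens divided by changed + common tokens (0.0 to 1.0)."""
    ops = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed = sum(max(i2 - i1, j2 - j1)
                  for tag, i1, i2, j1, j2 in ops if tag != "equal")
    common = sum(i2 - i1 for tag, i1, i2, j1, j2 in ops if tag == "equal")
    total = changed + common
    return changed / total if total else 0.0

def second_layer(old, new, max_local_fraction=0.5):
    """Re-examine a first-layer 'local' verdict. The cutoff is an untuned guess."""
    if negation_flipped(old, new):
        return "polarity-flip"
    if changed_fraction(old, new) >= max_local_fraction:
        return "rewrite"
    return "local"

# Input 3 from the evidence list, plus a made-up polarity-flip pair.
print(second_layer("Female skin was thicker than male skin in detail.",
                   "Men have thicker skin than women in detail."))
print(second_layer("The effect was significant in the sample.",
                   "The effect was not significant in the sample."))
```

Normalizing by sentence length is the design choice that matters here: an absolute floor such as common_tokens >= 6 penalizes short sentences, whereas a fraction does not. A comparative reversal with no negation word, as in Input 3, is only caught by the fraction test, so the negation check alone would not be enough.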

Reproducible code

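import difflib
import re

def tok(s):
    # Tokens: bracketed citations such as [3,15,20], word runs, single punctuation marks.
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def classify(old, new):
    op = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    # Number of non-equal opcodes (replace / delete / insert).
    changed_blocks = sum(1 for x in op if x[0] != 'equal')
    # Tokens covered by 'equal' opcodes.
    common_tokens = sum((x[2]-x[1]) for x in op if x[0]=='equal')
    # Per non-equal opcode, the longer of the old-side and new-side spans.
    changed_tokens = sum(max(x[2]-x[1], x[4]-x[3]) for x in op if x[0] != 'equal')
    return 'rewrite' if (changed_blocks >= 2 and changed_tokens >= 4 and common_tokens >= 6) else 'local'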
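
To see which threshold decides each case, the three counts can be printed next to the verdict. The sketch below repeats the tokenizer so that it runs on its own; the counts helper and the CASES list are names introduced here for illustration, and the sentences are the four pairs from the evidence list.

```python
import difflib
import re

def tok(s):
    return re.findall(r"\[[^\]]+\]|\w+|[^\w\s]", s)

def counts(old, new):
    """Return (changed_blocks, changed_tokens, common_tokens) for a sentence pair."""
    ops = difflib.SequenceMatcher(a=tok(old), b=tok(new)).get_opcodes()
    changed = [op for op in ops if op[0] != "equal"]
    changed_blocks = len(changed)
    changed_tokens = sum(max(i2 - i1, j2 - j1) for _, i1, i2, j1, j2 in changed)
    common_tokens = sum(i2 - i1 for tag, i1, i2, _, _ in ops if tag == "equal")
    return changed_blocks, changed_tokens, common_tokens

def verdict(old, new):
    # Same thresholds as the template under test.
    blocks, changed, common = counts(old, new)
    return "rewrite" if (blocks >= 2 and changed >= 4 and common >= 6) else "local"

CASES = [
    ("Input 1", "A was observed in the sample.",
     "A significant effect was observed in the sample."),
    ("Input 2", "The result was significant in the sample.",
     "The result was highly significant in the sample."),
    ("Input 3", "Female skin was thicker than male skin in detail.",
     "Men have thicker skin than women in detail."),
    ("PMC4083033",
     "In detail, female skin was thicker than those of males, which is "
     "consistent to many other previous studies [3,15,20]",
     "In detail, men have thicker skin than do women, which is "
     "consistent to many other previous studies [3,15,20]."),
]

for name, old, new in CASES:
    print(name, counts(old, new), verdict(old, new))
```

For Input 3 this reports 3 changed blocks and 7 changed tokens, both above their thresholds, which leaves the common-token floor as the condition that blocks the rewrite verdict on short sentences.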