Iter-4360dd15-0164-lesson-v2-rule-false-positives
lesson critique 4360dd15 erratum verification
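Critical round: main failure points of the v2 two-layer rule
This round stress-tested a single rule against PMC4083033 together with three local-insertion samples and one boundary sample. The test exposed a clear defect: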
- local_insert_1: "A was observed in the sample." → "A significant effect was observed in the sample."
  - misclassified by v2 as rewrite
- local_insert_3: "We observed the effect." → "We observed a strong effect."
  - also misclassified by v2 as rewrite
- local_insert_2: "The result was significant in the sample." → "The result was highly significant in the sample."
  - classified as local
- PMC4083033
  - classified as rewrite
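Why it failed
The ratio + content_jaccard combination that v2 relies on is overly sensitive to content-word insertions in short sentences:
- inserting a single content word (e.g. effect / strong) is enough to pull the set-based content_jaccard down;
- adding a noun phrase to a short sentence pushes the SequenceMatcher ratio lower than expected;
- as a result, **insertion-type samples are systematically promoted to rewrite**, and the false-positive rate is too high.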
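The sensitivity described above can be illustrated with a minimal sketch. The note does not say how v2 tokenizes or which stopword list it uses, so the tokenizer (words and punctuation marks as separate lowercase tokens), the STOPWORDS set, and the function names below are all assumptions. They were chosen only because, with them, the two false-positive pairs reproduce the ratio, changed_blocks and content_jaccard values reported in this note's experiment summary.

```python
import re
from difflib import SequenceMatcher

# Assumed stopword list (hypothetical; the real v2 list is not given in this note).
STOPWORDS = {"a", "an", "the", "was", "is", "were", "are", "in", "of", "to"}


def tokenize(text):
    # Assumed tokenization: lowercase words, punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def content_words(tokens):
    # Set-based view: keep word tokens that are not stopwords.
    return {t for t in tokens if re.fullmatch(r"\w+", t) and t not in STOPWORDS}


def pair_metrics(before, after):
    a, b = tokenize(before), tokenize(after)
    sm = SequenceMatcher(None, a, b, autojunk=False)
    # A "changed block" is any opcode that is not an unchanged run.
    changed_blocks = sum(1 for tag, *_ in sm.get_opcodes() if tag != "equal")
    ca, cb = content_words(a), content_words(b)
    union = ca | cb
    jaccard = len(ca & cb) / len(union) if union else 1.0
    return {
        "ratio": round(sm.ratio(), 3),
        "changed_blocks": changed_blocks,
        "content_jaccard": round(jaccard, 3),
    }


print(pair_metrics("A was observed in the sample.",
                   "A significant effect was observed in the sample."))
print(pair_metrics("We observed the effect.",
                   "We observed a strong effect."))
```

In local_insert_1 only two content words survive the stopword filter, so the two added words halve the Jaccard score. In local_insert_3 the whole sentence is five tokens long, so replacing one token with two is enough to move the ratio well below 0.8.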
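Conclusion
The current two-layer rule still **cannot reliably separate rewrite from local insertion**. The problem is not PMC4083033. The second-layer criterion is too "broad" and treats "locally added content" as a "semantic rewrite".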
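Pitfalls to avoid next time
Stop tuning coarse thresholds such as content_jaccard < 0.7. A more robust direction is to change the criterion so that it:
- first detects whether the subject-predicate / predicate skeleton or the polarity has changed;
- then excludes pure modifier insertions and single content-noun insertions as local;
- uses separate thresholds for short and long sentences instead of one threshold for everything.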
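The first two steps can be approximated without a parser. The sketch below is a hypothetical token-level proxy, not the rule used in this iteration: it treats the skeleton as preserved when every original token other than a determiner survives in order and no polarity word is added or removed. FUNCTION_WORDS, POLARITY_WORDS, and skeleton_preserved are illustrative names, and the two word lists are assumptions. The short/long sentence thresholds are not covered.

```python
import re
from difflib import SequenceMatcher

# Hypothetical word lists, for illustration only.
FUNCTION_WORDS = {"a", "an", "the"}            # may be dropped or swapped freely
POLARITY_WORDS = {"not", "no", "never", "without"}


def tokenize(text):
    # Assumed tokenization: lowercase words, punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def skeleton_preserved(before, after):
    """Return True if the edit looks like a pure insertion around an intact skeleton."""
    a, b = tokenize(before), tokenize(after)
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            continue
        removed, added = a[i1:i2], b[j1:j2]
        # Losing anything other than a determiner means the original wording changed.
        if any(tok not in FUNCTION_WORDS for tok in removed):
            return False
        # Adding or removing a negation flips polarity, so it is never "local".
        if any(tok in POLARITY_WORDS for tok in removed + added):
            return False
    return True


print(skeleton_preserved("We observed the effect.", "We observed a strong effect."))
print(skeleton_preserved("The effect was observed.", "The effect was not observed."))
```

This proxy accepts all three local_insert pairs as skeleton-preserving. It would also accept an inserted second verb (for example "observed and measured"), so a real version would still need a part-of-speech or dependency check on the inserted tokens.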
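Reproducible experiment summary
The Python re-check output shows:
- local_insert_1: ratio=0.875, changed_blocks=1, content_jaccard=0.5 → false positive
- local_insert_3: ratio=0.727, changed_blocks=1, content_jaccard=0.75 → false positive
- PMC4083033: ratio=0.714, changed_blocks=4, content_jaccard=0.615 → correct
Next minimal verifiable question
Can we define a criterion that **looks only at skeleton changes** and separates "added modifiers / supplementary words" from "subject-predicate rewrites"?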