Iter-4360dd15-0157-fact-pmc4083033-word-diff

fact [[erratum verification ]] 4360dd15 method

Modified: 20260424231031000

PMC4083033 erratum: word-level minimal edit script verified

Original sentence:
"In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]"

Corrected sentence:
"In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20]."

Word-level diff result (generated with difflib.SequenceMatcher)


- equal: In detail ,
- replace: female → men have thicker
- equal: skin
- delete: was thicker
- equal: than
- replace: those of males → do women
- equal: , which is consistent to many other previous studies [3,15,20]
- insert: .

Verification conclusions


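- This is not a purely insertion-type erratum.
- Nor is it a "local patch with the original sentence largely preserved".
- The minimal script consists of several token-level replacements, deletions, and insertions, but semantically it amounts to a rewrite of the whole sentence: the direction of the comparison is reversed.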

Reproducible code essentials


import difflib
import re

old = 'In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]'
new = 'In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].'
# Tokenize into bracketed citations, words, and single punctuation marks.
token_re = r"\[[^\]]+\]|\w+|[^\w\s]"
old_t = re.findall(token_re, old)
new_t = re.findall(token_re, new)
# Each opcode is (tag, i1, i2, j1, j2), indexing into old_t and new_t.
print(difflib.SequenceMatcher(a=old_t, b=new_t).get_opcodes())
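
The raw opcodes are index tuples, so a small rendering step is needed to get the readable listing shown under the diff result. A minimal sketch of that step follows; the helper name `render_opcodes` and the arrow notation for replacements are assumptions made for illustration, not part of the original note.

```python
import difflib
import re

# Same tokenizer as in the note: bracketed citations, words, single punctuation marks.
TOKEN_RE = re.compile(r"\[[^\]]+\]|\w+|[^\w\s]")


def render_opcodes(old: str, new: str) -> list[str]:
    """Render SequenceMatcher opcodes as one readable line per operation.

    Hypothetical helper: the name and output format are illustrative.
    """
    old_t = TOKEN_RE.findall(old)
    new_t = TOKEN_RE.findall(new)
    lines = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=old_t, b=new_t).get_opcodes():
        a_text = " ".join(old_t[i1:i2])
        b_text = " ".join(new_t[j1:j2])
        if tag == "replace":
            lines.append(f"- replace: {a_text} → {b_text}")
        elif tag == "insert":
            lines.append(f"- insert: {b_text}")
        else:
            # "equal" and "delete" are described by the old-side tokens.
            lines.append(f"- {tag}: {a_text}")
    return lines


old = 'In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]'
new = 'In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].'
print("\n".join(render_opcodes(old, new)))
```

Tokens are joined with single spaces, which is why punctuation appears detached (for example `In detail ,`).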

Notes


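This round also surfaced a requirement for a reusable word-level diff wrapper: an "erratum minimal edit script generator" could be built as a tool and applied in batch to classify errata for other PMID/PMC records.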
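
One possible shape for such a wrapper is sketched below. The function names (`summarize_erratum`, `summarize_batch`), the label strings, and the labeling rules are all hypothetical, chosen only to mirror the distinctions drawn in the conclusions above. The labels describe the surface form of the edit script; deciding that an erratum is semantically a full rewrite would still need a human check.

```python
import difflib
import re
from collections import Counter

TOKEN_RE = re.compile(r"\[[^\]]+\]|\w+|[^\w\s]")


def summarize_erratum(old: str, new: str) -> dict:
    """Summarize the word-level edit script between two sentences.

    Hypothetical helper: labels and rules are illustrative assumptions.
    """
    old_t = TOKEN_RE.findall(old)
    new_t = TOKEN_RE.findall(new)
    ops = difflib.SequenceMatcher(a=old_t, b=new_t).get_opcodes()
    # Count non-equal operations by type.
    counts = Counter(tag for tag, *_ in ops if tag != "equal")
    # Number of original tokens that were replaced or deleted.
    old_changed = sum(i2 - i1 for tag, i1, i2, _, _ in ops
                      if tag in ("replace", "delete"))
    if not counts:
        label = "identical"
    elif set(counts) == {"insert"}:
        label = "insertion-only"
    elif sum(counts.values()) == 1:
        label = "single-site patch"
    else:
        label = "multi-site edit"  # candidate rewrite, needs manual review
    return {
        "label": label,
        "op_counts": dict(counts),
        "old_tokens_changed": old_changed,
        "old_tokens_total": len(old_t),
    }


def summarize_batch(pairs: dict[str, tuple[str, str]]) -> dict[str, dict]:
    """Apply summarize_erratum to a mapping of record ID -> (old, new)."""
    return {rec_id: summarize_erratum(o, n) for rec_id, (o, n) in pairs.items()}


old = 'In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]'
new = 'In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].'
print(summarize_batch({"PMC4083033": (old, new)}))
```

For the PMC4083033 pair this reports two replacements, one deletion, and one insertion, which agrees with the listing above.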