Iter-4360dd15-0153-fact-pmc4083033-direct-xml-diff

fact erratum verification 4360dd15

Modified: 20260424230526000

PMC4083033 erratum: read directly from fullTextXML and compared word by word

Source: PMC4083033 (Europe PMC fullTextXML)
URL: https://www.ebi.ac.uk/europepmc/webservices/rest/PMC4083033/fullTextXML
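
A minimal sketch of how the sentence could be located in the full-text XML, assuming the response is JATS-style XML whose paragraphs are un-namespaced <p> elements. The helper names fetch_fulltext_xml and paragraphs_containing, and the SAMPLE snippet, are illustrative and not part of the original note; SAMPLE is a made-up stand-in for the real document, not a copy of its markup.

```python
import urllib.request
import xml.etree.ElementTree as ET

URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/PMC4083033/fullTextXML"


def fetch_fulltext_xml(url=URL):
    """Download the XML as text (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def paragraphs_containing(xml_text, phrase):
    """Return the whitespace-normalised text of every <p> that contains phrase."""
    root = ET.fromstring(xml_text)
    hits = []
    for p in root.iter("p"):
        # itertext() also picks up text inside nested elements such as citations.
        text = " ".join("".join(p.itertext()).split())
        if phrase in text:
            hits.append(text)
    return hits


# Assumed, simplified markup used only to exercise the helper offline.
SAMPLE = (
    "<article><body>"
    "<p>In detail, men have thicker skin than do women, which is consistent "
    "to many other previous studies [<xref>3</xref>,<xref>15</xref>,"
    "<xref>20</xref>].</p>"
    "<p>Unrelated paragraph.</p>"
    "</body></article>"
)

print(paragraphs_containing(SAMPLE, "thicker skin"))
```

Against the live service, paragraphs_containing(fetch_fulltext_xml(), "thicker skin") would be the corresponding call; whether the real markup matches these assumptions has to be checked against the actual response.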

Original sentence

"In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20]."

Corrected sentence

"In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20]."

Minimal diff result

A token-by-token comparison using Python's difflib.SequenceMatcher gives the following non-equal operations:

- replace: female -> men have thicker
- delete: was thicker ->
- replace: those of males -> do women

Conclusion

This erratum is not a pure insertion; it is a replacement/rewrite that also involves a deletion.
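
The conclusion can also be checked mechanically. The sketch below labels an edit by the set of non-equal opcode tags that SequenceMatcher reports. The function name classify and the label strings are illustrative choices, not part of the original note.

```python
import difflib
import re

ORIG = ("In detail, female skin was thicker than those of males, which is "
        "consistent to many other previous studies [3,15,20].")
CORR = ("In detail, men have thicker skin than do women, which is "
        "consistent to many other previous studies [3,15,20].")


def tokenize(s):
    # Words, bracketed citation groups such as [3,15,20], single punctuation marks.
    return re.findall(r"\w+|\[[^\]]+\]|[^\w\s]", s)


def classify(orig, corr):
    """Label an edit by the kinds of non-equal opcodes it contains."""
    sm = difflib.SequenceMatcher(a=tokenize(orig), b=tokenize(corr))
    tags = {tag for tag, *_ in sm.get_opcodes() if tag != "equal"}
    if not tags:
        return "identical"
    if tags == {"insert"}:
        return "pure insertion"
    if tags == {"delete"}:
        return "pure deletion"
    return "replacement/rewrite (" + ", ".join(sorted(tags)) + ")"


print(classify(ORIG, CORR))  # replacement/rewrite (delete, replace)
```

For contrast, classify("a b", "a x b") returns "pure insertion", which is the category this erratum does not fall into.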

Reproducible code

import difflib
import re

orig = 'In detail, female skin was thicker than those of males, which is consistent to many other previous studies [3,15,20].'
corr = 'In detail, men have thicker skin than do women, which is consistent to many other previous studies [3,15,20].'

def tok(s):
    # Split into words, bracketed citation groups such as [3,15,20], and single punctuation marks.
    return re.findall(r'\w+|\[[^\]]+\]|[^\w\s]', s)

a, b = tok(orig), tok(corr)
sm = difflib.SequenceMatcher(a=a, b=b)
# Print only the non-equal opcodes together with the tokens they cover.
print([(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != 'equal'])