Plagiarism Detection in arXiv
Abstract
We describe a large-scale application of methods for
finding plagiarism in research document collections. The
methods are applied to a collection of 284,834 documents
collected by arXiv.org over a 14-year period, covering a
few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors,
and heuristics are developed to reduce the number of false
positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many
times larger.
Description
plagiarism articles
Keywords
plagiarism