This paper describes the main types of algorithms for detecting borrowed text and presents the results of a study of the TF, LongSent, and Winnowing algorithms for finding near-duplicates on the Internet. The distinguishing feature of the study is that the algorithms are applied to each paragraph of the input document's text separately. The quality of the algorithms was evaluated using three metrics: precision, recall, and F-measure.
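duplicate, algorithm, shingle, similarity
"Poshuk zapozychenoi informatsii v Interneti, vykorystovuiuchy alhorytmy: TF, LongSent, Winnowing" [Searching for borrowed information on the Internet using the TF, LongSent, and Winnowing algorithms],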
Information Processing Systems,