trigram -> 5-gram

Ben Kurtovic 8 years ago
1 changed file with 4 additions and 4 deletions
_posts/

@@ -34,10 +34,10 @@ Sources are fetched and then parsed differently depending on the document type
handled by [pdfminer](, and
normalized to a plain text form. We then create multiple
[Markov chains]( – the *article
-chain* is built from word trigrams from the article text, and a *source chain*
-is built from each source text. A *delta chain* is created for each source
-chain, representing the intersection of it and the article chain by examining
-which nodes are shared.
+chain* is built from word [5-grams]( from
+the article text, and a *source chain* is built from each source text. A *delta
+chain* is created for each source chain, representing the intersection of it
+and the article chain by examining which nodes are shared.

But how do we use these chains to decide whether a violation is present?
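
For readers of this change, the chain construction described in the edited paragraph can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes a chain can be approximated as a count of word 5-grams, that text is tokenized by lowercasing and splitting on whitespace, and that the "intersection" keeps the smaller count of each shared node. The names `ngrams`, `build_chain`, and `delta_chain` are hypothetical.

```python
from collections import Counter


def ngrams(text, n=5):
    """Return the word n-grams of `text` as tuples (assumed tokenization:
    lowercase, split on whitespace)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_chain(text, n=5):
    """Approximate a chain as a multiset of n-gram nodes (an assumption,
    not the real data structure)."""
    return Counter(ngrams(text, n))


def delta_chain(article_chain, source_chain):
    """Keep only nodes present in both chains, with the smaller count."""
    return article_chain & source_chain


article = build_chain("the quick brown fox jumps over the lazy dog")
source = build_chain("a quick brown fox jumps over the fence")
delta = delta_chain(article, source)
print(sum(delta.values()))  # 2 shared 5-grams
```

With trigrams (`n=3`) the same two sentences would share more nodes, since shorter n-grams match more easily; moving to 5-grams makes a shared node stronger evidence of copied text.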
