How the Detector Works
The Alignica AI text detector is the first version of our recently developed text detector. It combines two different models to estimate the probability that a text was written by an AI.
The Two Models
The first model is a so-called “encoder” model. It has been trained specifically to estimate whether a text was written by an AI. It can read up to 500 words at a time and bases its prediction on those, so a text longer than 500 words is divided into smaller parts.
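The splitting step can be sketched roughly as follows. This is a minimal illustration, not the detector's actual code: the function name `split_into_chunks` is hypothetical, and it assumes that words are separated by whitespace and that parts are cut at fixed 500-word boundaries. The real detector may split differently, for example at sentence boundaries.

```python
def split_into_chunks(text, max_words=500):
    """Split text into consecutive parts of at most max_words words.

    Assumption: a "word" is any run of characters separated by whitespace.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]
```

Under these assumptions, a 1,200-word text would yield three parts of 500, 500, and 200 words.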
The second model is the GPT-2 base model. It reads through the text and, for each word it reads, gives probabilities for what the next word should be. We can use those probabilities to see how closely the written text matches what the model would predict. If the match is very close, the text is likely AI-generated.
How well the GPT-2 model can predict the text is measured by what is called “perplexity”. Lower perplexity means the model predicted the text well. Higher perplexity means that the text seems more random and unpredictable, which is common for human writing.
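To make the idea concrete, perplexity is commonly computed as the exponential of the average negative log-probability that the model assigned to the words that actually appeared. The sketch below assumes those per-word probabilities have already been obtained from the model. The function name `perplexity` and the example probability lists are illustrative, and the detector's exact computation may differ.

```python
import math

def perplexity(probabilities):
    """Perplexity from the probabilities the model gave to the actual words.

    Assumption: each value is in (0, 1] and corresponds to one word of the text.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    avg_neg_log = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(avg_neg_log)

# Made-up numbers for illustration only, not real model output.
predictable = perplexity([0.5, 0.5, 0.5, 0.5])
surprising = perplexity([0.02, 0.05, 0.01, 0.04])
```

If the model gives every word a probability of 0.5, the perplexity is 2, as if the model were choosing between two equally likely options each time. The lower the probabilities the model assigned to the actual words, the higher the perplexity.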
As a rule of thumb, AI-generated texts usually have a perplexity of around 4 to 15, while human-written texts mostly fall between 15 and 60.
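Written as code, the rule of thumb might look like the sketch below. The function name `rule_of_thumb` and the labels are hypothetical, and the choice to count a perplexity of exactly 15 as human-written is an arbitrary assumption, since the two ranges meet at that value. The detector itself combines perplexity with the encoder model's prediction rather than relying on these thresholds alone.

```python
def rule_of_thumb(ppl):
    """Map a perplexity value to the rough ranges described above.

    Assumption: the boundary value 15 is treated as human-written.
    """
    if 4 <= ppl < 15:
        return "likely AI-generated"
    if 15 <= ppl <= 60:
        return "likely human-written"
    return "outside the typical ranges"
```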