- Title:
- StylOch at PAN: gradient-boosted trees with frequency-based stylometric features
- Authors:
-
Walkowiak, Tomasz
Ochab, Jeremi
Boba, Tymoteusz
Matias, Mateusz
- Description:
- This submission to the binary AI detection task is based on a modular stylometric pipeline. Public spaCy models are used for text preprocessing (tokenisation, named entity recognition, dependency parsing, part-of-speech tagging, and morphology annotation) and for extracting several thousand features (frequencies of n-grams of the above linguistic annotations); light gradient-boosting machines are used as the classifier. We collect a large corpus of more than 500 000 machine-generated texts for the classifier’s training. We explore several parameter options to increase the classifier’s capacity and take advantage of that training set. Our method follows the non-neural, computationally inexpensive, yet explainable approach previously found effective.
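The feature-extraction step named in the description (frequencies of n-grams over linguistic annotations) can be illustrated with a minimal sketch. It is not the authors' code: the function name `ngram_frequencies`, the feature-key format, and the hard-coded tag sequence are assumptions made for illustration. In the described pipeline the tags would come from a spaCy model and the resulting feature vectors would be passed to a gradient-boosted tree classifier.

```python
from collections import Counter


def ngram_frequencies(tags, n_max=3):
    """Return relative frequencies of all 1..n_max-grams in one tag sequence.

    `tags` is a list of annotation labels (e.g. part-of-speech tags) for a
    single text. Keys are prefixed with the n-gram order so that unigram and
    bigram features never collide.
    """
    features = {}
    for n in range(1, n_max + 1):
        grams = [" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]
        total = len(grams)
        # Counter is empty when the text is shorter than n, so no division by zero.
        for gram, count in Counter(grams).items():
            features[f"pos_{n}:{gram}"] = count / total
    return features


# Hypothetical part-of-speech tags for one short text (assumed, not real data).
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"]
features = ngram_frequencies(tags, n_max=2)

for name in sorted(features):
    print(f"{name}: {features[name]:.3f}")
```

Normalising counts by the number of n-grams in the text keeps features comparable across texts of different lengths; the same function could be applied to other annotation layers such as dependency labels or named-entity types.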
- Content provider:
- Repozytorium Uniwersytetu Jagiellońskiego
Other