Stylometric Fingerprinting

Last generated: 2026-06-25 22:20:50 AEST. Detector: translation_guidance_scan_v4.

This page tracks exploratory stylometric work for Stephanos, especially whether formula, gloss, grammar, and source-form features can expose different epitomising layers. The first implemented feature family is the existing translation-guidance recogniser output: each entry gets a vector of rule hits and non-hits, then the vectors are embedded and clustered.

3,570 lemmas with at least one active guidance scan
1,516 lemmas in the broad formula-vector UMAP slice
320 Kappa entries in that UMAP slice
13 Parisinus/non-epitomised entries in that UMAP slice
formula: 32, gloss: 70, proper_noun: 75 (active recogniser rules by kind)
0.056 KMeans silhouette on broad formula vectors
Current interpretation: the broad formula-vector slice shows an apparent Kappa signal (0.739 balanced accuracy), but the complete-formula slice does not (0.554). All 320 Kappa entries in the broad slice fall within the 370 rows that have every one of the 32 formula rules checked, so in the broad slice Kappa membership is confounded with scan history and rule coverage. Treat this as a progress marker, not as a demonstrated epitomiser fingerprint.

UMAP

The plotted vectors use formula recogniser rows only, with occurrence counts capped at 5 and transformed by log1p. The broad slice requires at least 21 of the 32 active formula rules to have been checked. Embedding method: UMAP.
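The transform described above can be sketched as follows. This is a minimal illustration, not the page's actual pipeline: the function name, the array layout, and the choice to treat unchecked rules as zero hits are assumptions. The UMAP step itself is left out, since its parameters are not stated here.

```python
import numpy as np

N_FORMULA_RULES = 32     # active formula rules, as stated on this page
MIN_RULES_CHECKED = 21   # broad-slice threshold, as stated on this page
COUNT_CAP = 5            # occurrence cap, as stated on this page


def build_formula_vectors(counts, checked):
    """Return log1p-transformed, capped vectors for lemmas in the broad slice.

    counts  : (n_lemmas, n_rules) occurrence counts per formula rule
    checked : boolean array of the same shape, True where the rule was scanned
    """
    counts = np.asarray(counts, dtype=float)
    checked = np.asarray(checked, dtype=bool)
    in_slice = checked.sum(axis=1) >= MIN_RULES_CHECKED
    # Assumption: a rule that was never checked contributes zero hits.
    capped = np.minimum(np.where(checked, counts, 0.0), COUNT_CAP)
    return np.log1p(capped)[in_slice], in_slice


# Toy input (invented numbers, not corpus data).
rng = np.random.default_rng(0)
counts = rng.integers(0, 12, size=(6, N_FORMULA_RULES))
checked = np.ones((6, N_FORMULA_RULES), dtype=bool)
checked[0, 20:] = False  # 20 rules checked: below the threshold, dropped
checked[1, 21:] = False  # exactly 21 rules checked: kept
vectors, in_slice = build_formula_vectors(counts, checked)
print(vectors.shape, in_slice)
```

Note that treating unchecked rules as zeros lets coverage leak into the vector, which is one way the coverage confound discussed on this page could arise.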

Cluster Summary

Cluster | Rows | Kappa | Parisinus | Median entry | Mean matched rules | Top over-represented formulae
4 | 699 | 125 (17.9%) | 1 | 73 | 2.8 | διὰ τοῦ + «X» (GREEK LETTER) + [FORM OF γράφειν] (+0.00); Y (EPITHET) + X (DEITY) (+0.00)
1 | 215 | 32 (14.9%) | 4 | 81 | 4.7 | X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.74); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.45); X (AUTHOR NAME) + ἐν + Y (dative BOOK NAME) (+0.35); X (AUTHOR NAME) + Y (NUMERAL) (+0.06)
0 | 211 | 35 (16.6%) | 3 | 80 | 6.4 | X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.87); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.55); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.42); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.30)
7 | 178 | 47 (26.4%) | 3 | 80 | 6.7 | X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.78); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.74); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.22); ὡς + X (AUTHOR NAME) (+0.21)
2 | 100 | 24 (24.0%) | 0 | 78 | 8.0 | ὡς + X (definite ARTICLE + ETYMON) + Y (nominative DERIVED NOUN) (+0.99); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.64); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.59); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.25)
3 | 67 | 22 (32.8%) | 1 | 63 | 5.6 | X... πλησίον + Y. (genitive) (+1.00); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.10); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.01); X (AUTHOR NAME) + Y (NUMERAL) (+0.01)
8 | 22 | 18 (81.8%) | 0 | 114 | 11.8 | ὡς + X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+1.00); ὡς + X (AUTHOR NAME) (+0.71); X (AUTHOR NAME) + Y (NUMERAL) (+0.58); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.37)
5 | 10 | 10 (100.0%) | 0 | 120 | 13.4 | X (NEUTER ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); X (MASCULINE ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); [NO ANTECEDENT] + ἀφ' οὗ + Y (DERIVED NOUN) (+0.90); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.64)
6 | 7 | 5 (71.4%) | 0 | 217 | 10.3 | εἰς + «X» (GREEK LETTER) (+1.00); ὡς + X (AUTHOR NAME) (+0.36); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.31); X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.23)
9 | 7 | 2 (28.6%) | 1 | 107 | 12.6 | ἐκάλειτο X (nominative PROPER NOUN)... + κέκληται Y (nominative PROPER NOUN) (+1.00); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.35); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.33); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.32)
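For readers who want to reproduce a summary of this kind, the sketch below clusters a vector matrix with KMeans and reports the silhouette score. The default of ten clusters matches the ten rows in the table; the seed, `n_init`, and the helper names are assumptions. The page does not define its over-representation score, so `over_representation` is only one hypothetical reading (hit rate inside the cluster minus hit rate outside it) and may not be the measure used here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def cluster_and_score(vectors, n_clusters=10, seed=0):
    """Fit KMeans and return (labels, silhouette score)."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(vectors)
    return labels, silhouette_score(vectors, labels)


def over_representation(vectors, labels, cluster):
    """Hypothetical score: per-rule hit rate in the cluster minus the rate elsewhere."""
    hits = np.asarray(vectors) > 0
    inside = labels == cluster
    return hits[inside].mean(axis=0) - hits[~inside].mean(axis=0)


# Toy input: two well-separated blobs (invented, not corpus data).
rng = np.random.default_rng(0)
toy = np.vstack([
    rng.normal(0.0, 0.1, size=(30, 4)),
    rng.normal(3.0, 0.1, size=(30, 4)),
])
toy_labels, toy_silhouette = cluster_and_score(toy, n_clusters=2)
toy_overrep = over_representation(toy, toy_labels, cluster=toy_labels[-1])
print(round(float(toy_silhouette), 3), toy_overrep.round(2))
```

Silhouette values range from -1 to 1. The toy blobs score close to 1, whereas the 0.056 reported for the broad formula vectors indicates clusters that overlap heavily.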

Separation Checks

Check | Rows | Positive class | Result | Interpretation
Kappa vs non-Kappa, broad formula slice | 1,516 | 320 Kappa | 0.739 balanced accuracy across folds [0.761, 0.758, 0.701, 0.708, 0.766] | Useful as a warning flag, but confounded by recogniser coverage and rule history.
Kappa vs non-Kappa, complete formula slice | 370 | 320 Kappa | 0.554 balanced accuracy across folds [0.655, 0.492, 0.602, 0.608, 0.416] | The near-baseline result is evidence against claiming a robust Kappa fingerprint from current formula hits alone.
Parisinus/non-epitomised comparison | 370 complete formula rows | 1 Parisinus | descriptive only | The current non-epitomised sample is too small for a serious classifier; use it as a qualitative control set.
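A check of this shape can be sketched with stratified five-fold cross-validation scored by balanced accuracy. The classifier behind the figures above is not stated on this page, so `LogisticRegression` is a stand-in, and the function name and seed are assumptions. The toy input deliberately uses a single feature, the number of formula rules checked, built from the row counts reported on this page. It illustrates how far coverage alone could separate Kappa in the broad slice. It does not reproduce the reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def fold_balanced_accuracy(X, y, n_splits=5, seed=0):
    """Per-fold balanced accuracy for a binary Kappa vs non-Kappa label."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Stand-in classifier (assumption); class weights offset the 320 vs 1,196 imbalance.
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    return cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")


# Coverage-only feature for the broad slice: 1,136 rows with 21 rules checked,
# 10 rows with 29, and 370 rows with 32 (counts taken from this page).
rules_checked = np.repeat([21, 29, 32], [1136, 10, 370])
is_kappa = np.zeros(rules_checked.size, dtype=int)
is_kappa[-320:] = 1  # all 320 Kappa rows sit among the 370 complete rows
X_coverage = ((rules_checked - 21) / 11.0).reshape(-1, 1)

scores = fold_balanced_accuracy(X_coverage, is_kappa)
print(scores.round(3), round(float(scores.mean()), 3))
```

If a coverage-only feature scores well here, any broad-slice result has to be read against that baseline before it is attributed to formula usage.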

Coverage

Corpus Versions

Version | Rows | Usable | With Greek text
epitome | 3,551 | 3,551 | 3,551
parisinus | 19 | 19 | 19

Guidance Scan Counts

Kind | Scan rows | Lemmas checked | Rules checked | Lemmas matched | Occurrences
formula | 46,220 | 1,517 | 32 | 1,442 | 8,099
gloss | 35,262 | 3,570 | 70 | 728 | 1,445
proper_noun5715712

Formula Coverage Distribution

Formula rules checked | Lemmas
0 | 2,053
17 | 1
21 | 1,136
29 | 10
32 | 370
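The slice sizes quoted elsewhere on this page follow directly from this distribution. The short check below uses only the counts in the table; the variable names are illustrative.

```python
# Lemmas per number of formula rules checked (from the table above).
coverage = {0: 2053, 17: 1, 21: 1136, 29: 10, 32: 370}

total_lemmas = sum(coverage.values())
formula_checked = sum(n for rules, n in coverage.items() if rules > 0)
broad_slice = sum(n for rules, n in coverage.items() if rules >= 21)
complete_slice = coverage[32]

print(total_lemmas)     # 3570 lemmas with at least one guidance scan
print(formula_checked)  # 1517 lemmas checked against formula rules
print(broad_slice)      # 1516 rows with at least 21 of 32 rules checked
print(complete_slice)   # 370 rows with all 32 rules checked
```

Three quarters of the broad slice (1,136 of 1,516 rows) sits at exactly 21 rules checked, which is why a coverage-balanced matrix is listed as the first next step.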

Sentence Grammar Feature Tables

Table | Rows
sentence_grammar_runs | 8
sentence_grammar_evaluations | 111
sentence_grammar_tokens | 495

Next Implementation Steps

  1. Freeze a coverage-balanced formula/gloss feature matrix and rerun the UMAP after every nightly guidance scan.
  2. Add non-recogniser stylometric baselines: character n-grams, function-word rates, particles, clause connectors, entry length, and normalised type-token measures.
  3. Populate the sentence-grammar tables over a coverage-balanced sample, then test morphosyntactic vectors separately from formula vectors.
  4. Expand the non-epitomised control set beyond the current Parisinus rows before making any claim about epitomiser layers.
  5. Validate clusters by close reading: each cluster needs formula examples and counterexamples before it becomes an argument.
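As a starting point for the non-recogniser baselines in step 2, the sketch below computes a few of the listed measures for a single entry using only the standard library. The particle list, the accent-stripping normalisation, and the function names are all assumptions, and the sample string is invented for the example, not quoted from the corpus.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical particle/function-word list; a real list needs philological review.
PARTICLES = ("καί", "δέ", "μέν", "γάρ", "ὡς")


def strip_accents(text):
    """Lowercase and drop breathings and accents so that καὶ and καί compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def baseline_features(text, n=3):
    """Entry length, raw type-token ratio, particle rate, and char n-gram rates."""
    plain = strip_accents(text)
    tokens = re.findall(r"\w+", plain)
    particles = {strip_accents(p) for p in PARTICLES}
    # N-grams run over the normalised string, spaces and punctuation included.
    grams = Counter(plain[i:i + n] for i in range(len(plain) - n + 1))
    total = sum(grams.values()) or 1
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        "particle_rate": sum(t in particles for t in tokens) / n_tokens if n_tokens else 0.0,
        "char_ngrams": {g: c / total for g, c in grams.items()},
    }


sample = "πόλις καὶ νῆσος. τὸ ἐθνικὸν δὲ καὶ τὸ κτητικόν."  # invented sample
features = baseline_features(sample)
print(features["n_tokens"], features["type_token_ratio"], features["particle_rate"])
```

The raw type-token ratio falls as entries get longer, so a length-normalised variant would be needed before comparing short and long entries.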