Stylometric Fingerprinting
Last generated: 2026-06-25 22:20:50 AEST. Detector: translation_guidance_scan_v4.
This page tracks exploratory stylometric work for Stephanos, especially whether formula,
gloss, grammar, and source-form features can expose different epitomising layers. The first
implemented feature family is the existing translation-guidance recogniser output: each entry
gets a vector of rule hits and non-hits, then the vectors are embedded and clustered.
- 3,570 lemmas with at least one active guidance scan
- 1,516 lemmas in the broad formula-vector UMAP slice
- 320 Kappa entries in that UMAP slice
- 13 Parisinus/non-epitomised entries in that UMAP slice
- Active recogniser rules by kind: formula 32, gloss 70, proper_noun 75
- 0.056 KMeans silhouette on broad formula vectors
Current interpretation: the broad formula-vector slice shows an apparent Kappa
signal, but the complete-formula slice does not. The first visible separation is therefore
probably confounded by scan-history and rule-coverage effects. Treat this as a progress marker,
not as a demonstrated epitomiser fingerprint.
UMAP
The plotted vectors use formula recogniser rows only, with occurrence counts capped at 5 and
transformed by log1p. The broad slice requires at least 21 of the 32
active formula rules to have been checked. Embedding method: UMAP.
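The vector preparation described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the dictionary input format, and the choice to encode unchecked rules as 0.0 are all assumptions made for the example. Only the cap of 5, the log1p transform, and the 21-of-32 threshold come from this page.

```python
import math

N_FORMULA_RULES = 32   # active formula rules, per the summary on this page
MIN_CHECKED = 21       # broad-slice threshold
COUNT_CAP = 5          # occurrence cap applied before the log transform


def transform_count(count: int) -> float:
    """Cap an occurrence count at COUNT_CAP, then apply log1p."""
    return math.log1p(min(count, COUNT_CAP))


def broad_slice_vector(counts):
    """Return the transformed vector, or None if too few rules were checked.

    `counts` (hypothetical format) maps rule index -> occurrence count for
    every rule that was checked on this lemma, with 0 for a checked non-hit.
    Rules missing from the mapping were never checked; encoding them as 0.0
    is an assumption of this sketch.
    """
    if len(counts) < MIN_CHECKED:
        return None
    return [transform_count(counts.get(i, 0)) for i in range(N_FORMULA_RULES)]
```

Under this encoding an unchecked rule is indistinguishable from a checked non-hit, which is one way the coverage effects mentioned in the interpretation could leak into the embedding.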
Cluster Summary
| Cluster | Rows | Kappa | Parisinus | Median entry | Mean matched rules | Top over-represented formulae |
|---|---|---|---|---|---|---|
| 4 | 699 | 125 (17.9%) | 1 | 73 | 2.8 | διὰ τοῦ + «X» (GREEK LETTER) + [FORM OF γράφειν] (+0.00); Y (EPITHET) + X (DEITY) (+0.00) |
| 1 | 215 | 32 (14.9%) | 4 | 81 | 4.7 | X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.74); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.45); X (AUTHOR NAME) + ἐν + Y (dative BOOK NAME) (+0.35); X (AUTHOR NAME) + Y (NUMERAL) (+0.06) |
| 0 | 211 | 35 (16.6%) | 3 | 80 | 6.4 | X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.87); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.55); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.42); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.30) |
| 7 | 178 | 47 (26.4%) | 3 | 80 | 6.7 | X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.78); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.74); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.22); ὡς + X (AUTHOR NAME) (+0.21) |
| 2 | 100 | 24 (24.0%) | 0 | 78 | 8.0 | ὡς + X (definite ARTICLE + ETYMON) + Y (nominative DERIVED NOUN) (+0.99); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.64); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.59); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.25) |
| 3 | 67 | 22 (32.8%) | 1 | 63 | 5.6 | X... πλησίον + Y. (genitive) (+1.00); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.10); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.01); X (AUTHOR NAME) + Y (NUMERAL) (+0.01) |
| 8 | 22 | 18 (81.8%) | 0 | 114 | 11.8 | ὡς + X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+1.00); ὡς + X (AUTHOR NAME) (+0.71); X (AUTHOR NAME) + Y (NUMERAL) (+0.58); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.37) |
| 5 | 10 | 10 (100.0%) | 0 | 120 | 13.4 | X (NEUTER ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); X (MASCULINE ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); [NO ANTECEDENT] + ἀφ' οὗ + Y (DERIVED NOUN) (+0.90); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.64) |
| 6 | 7 | 5 (71.4%) | 0 | 217 | 10.3 | εἰς + «X» (GREEK LETTER) (+1.00); ὡς + X (AUTHOR NAME) (+0.36); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.31); X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.23) |
| 9 | 7 | 2 (28.6%) | 1 | 107 | 12.6 | ἐκάλειτο X (nominative PROPER NOUN)... + κέκληται Y (nominative PROPER NOUN) (+1.00); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.35); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.33); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.32) |
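To read the Kappa percentages against the slice-wide base rate, the row and Kappa counts from the table can be turned into a simple lift ratio. The sketch below uses only the counts printed above; the `kappa_lift` helper and the threshold of 2 are illustrative choices, not part of the report's method.

```python
# (cluster id, rows, Kappa rows), copied from the Cluster Summary table.
clusters = [
    (4, 699, 125), (1, 215, 32), (0, 211, 35), (7, 178, 47), (2, 100, 24),
    (3, 67, 22), (8, 22, 18), (5, 10, 10), (6, 7, 5), (9, 7, 2),
]

total_rows = sum(rows for _, rows, _ in clusters)
total_kappa = sum(kappa for _, _, kappa in clusters)
base_rate = total_kappa / total_rows  # Kappa share of the whole broad slice


def kappa_lift(rows: int, kappa: int) -> float:
    """Cluster Kappa share divided by the slice-wide Kappa share."""
    return (kappa / rows) / base_rate


# Clusters whose Kappa share is more than twice the base rate
# (the threshold of 2 is arbitrary and only for illustration).
enriched = [cid for cid, rows, kappa in clusters if kappa_lift(rows, kappa) > 2]
enriched_rows = sum(rows for cid, rows, _ in clusters if cid in enriched)
```

With these counts the enriched clusters are 8, 5, and 6, which together contain 39 of the 1,516 rows, so the strongest Kappa concentrations sit in very small clusters.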
Separation Checks
| Check | Rows | Positive class | Result | Interpretation |
|---|---|---|---|---|
| Kappa vs non-Kappa, broad formula slice | 1,516 | 320 Kappa | 0.739 balanced accuracy across folds [0.761, 0.758, 0.701, 0.708, 0.766] | Useful as a warning flag, but confounded by recogniser coverage and rule history. |
| Kappa vs non-Kappa, complete formula slice | 370 | 320 Kappa | 0.554 balanced accuracy across folds [0.655, 0.492, 0.602, 0.608, 0.416] | The near-baseline result is evidence against claiming a robust Kappa fingerprint from current formula hits alone. |
| Parisinus/non-epitomised comparison | 370 complete formula rows | 1 Parisinus | descriptive only | The current non-epitomised sample is too small for a serious classifier; use it as a qualitative control set. |
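For reference, balanced accuracy is the mean of per-class recall, so a classifier that always predicts the majority class scores 0.5 regardless of class imbalance. The sketch below defines it in plain Python and averages the fold scores reported above. The function is a generic definition, not the report's own evaluation code, and the classifier that produced the folds is not specified on this page.

```python
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = Kappa, 0 = other)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"class {cls} absent from y_true")
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


# Fold scores as reported in the Separation Checks table.
broad_folds = [0.761, 0.758, 0.701, 0.708, 0.766]
complete_folds = [0.655, 0.492, 0.602, 0.608, 0.416]

broad_mean = mean(broad_folds)
complete_mean = mean(complete_folds)
```

Averaging the printed fold scores reproduces the reported values to within rounding of the per-fold figures. The complete-slice folds range from 0.416 to 0.655 and straddle the 0.5 baseline, which is why that result is read as near-baseline.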
Coverage
Corpus Versions
| Version | Rows | Usable | With Greek text |
|---|---|---|---|
| epitome | 3,551 | 3,551 | 3,551 |
| parisinus | 19 | 19 | 19 |
Guidance Scan Counts
| Kind | Scan rows | Lemmas checked | Rules checked | Lemmas matched | Occurrences |
|---|---|---|---|---|---|
| formula | 46,220 | 1,517 | 32 | 1,442 | 8,099 |
| gloss | 35,262 | 3,570 | 70 | 728 | 1,445 |
| proper_noun | 57 | 1 | 57 | 1 | 2 |
Formula Coverage Distribution
| Formula rules checked | Lemmas |
|---|---|
| 0 | 2,053 |
| 17 | 1 |
| 21 | 1,136 |
| 29 | 10 |
| 32 | 370 |
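The slice sizes quoted elsewhere on this page can be recovered directly from this distribution. A short check, using only the counts in the table (the variable names are illustrative):

```python
# Formula coverage distribution, copied from the table above:
# number of formula rules checked -> number of lemmas.
coverage = {0: 2053, 17: 1, 21: 1136, 29: 10, 32: 370}

BROAD_MIN = 21   # broad slice: at least 21 of the 32 rules checked
COMPLETE = 32    # complete slice: all 32 rules checked

total = sum(coverage.values())
broad = sum(n for checked, n in coverage.items() if checked >= BROAD_MIN)
complete = coverage[COMPLETE]
any_formula = sum(n for checked, n in coverage.items() if checked > 0)

print(total, broad, complete, any_formula)  # 3570 1516 370 1517
```

The single lemma with 17 rules checked accounts for the difference between the 1,517 lemmas with any formula scan and the 1,516 lemmas in the broad slice.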
Sentence Grammar Feature Tables
| Table | Rows |
|---|---|
| sentence_grammar_runs | 8 |
| sentence_grammar_evaluations | 111 |
| sentence_grammar_tokens | 495 |
Next Implementation Steps
- Freeze a coverage-balanced formula/gloss feature matrix and rerun the UMAP after every nightly guidance scan.
- Add non-recogniser stylometric baselines: character n-grams, function-word rates, particles, clause connectors, entry length, and normalised type-token measures.
- Populate the sentence-grammar tables over a coverage-balanced sample, then test morphosyntactic vectors separately from formula vectors.
- Expand the non-epitomised control set beyond the current Parisinus rows before making any claim about epitomiser layers.
- Validate clusters by close reading: each cluster needs formula examples and counterexamples before it becomes an argument.
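As a starting point for the non-recogniser baselines listed above, the following sketch computes character n-grams, function-word rates, and two type-token measures for one entry's Greek text. Everything here is an assumption made for illustration: the function names, the short `FUNCTION_WORDS` list, the decision to strip diacritics, and the use of the root type-token ratio (types divided by the square root of tokens) as the length-normalised measure. None of it is existing project code.

```python
import math
import re
import unicodedata
from collections import Counter

# Illustrative particle/connector list; a real study needs a curated one.
FUNCTION_WORDS = ("καί", "δέ", "μέν", "γάρ", "ὡς")


def normalise(text: str) -> str:
    """Lowercase and strip diacritics so accent variants count together."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)


def tokens(text: str) -> list:
    """Split normalised text into word tokens."""
    return re.findall(r"\w+", normalise(text))


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams over the space-joined token stream."""
    s = " ".join(tokens(text))
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def function_word_rates(text: str) -> dict:
    """Relative frequency of each listed function word in the entry."""
    toks = tokens(text)
    counts = Counter(toks)
    total = len(toks) or 1
    return {normalise(w): counts[normalise(w)] / total for w in FUNCTION_WORDS}


def type_token_measures(text: str) -> dict:
    """Raw type-token ratio plus root TTR, which is less length-sensitive."""
    toks = tokens(text)
    if not toks:
        return {"ttr": 0.0, "root_ttr": 0.0}
    types = len(set(toks))
    return {"ttr": types / len(toks), "root_ttr": types / math.sqrt(len(toks))}
```

Because these features do not depend on which recogniser rules were run, they are not affected by scan coverage in the way the formula vectors are. Entry length still needs controlling for, since short entries give unstable rates.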