Stylometric Fingerprinting

Last generated: 2026-06-25 22:20:50 AEST. Detector: translation_guidance_scan_v4.

This page tracks exploratory stylometric work for Stephanos, especially whether formula, gloss, grammar, and source-form features can expose different epitomising layers. The first implemented feature family is the existing translation-guidance recogniser output: each entry gets a vector of rule hits and non-hits, then the vectors are embedded and clustered.

3,570 lemmas with at least one active guidance scan

1,516 lemmas in the broad formula-vector UMAP slice

320 Kappa entries in that UMAP slice

13 Parisinus/non-epitomised entries in that UMAP slice

formula: 32, gloss: 70, proper_noun: 75 (active recogniser rules by kind)

0.056 KMeans silhouette on broad formula vectors

Current interpretation: the broad formula-vector slice shows an apparent Kappa signal (0.739 balanced accuracy), but the complete-formula slice does not (0.554, close to the 0.5 chance level for a two-class problem). The separation visible in the broad slice is therefore probably confounded with scan-history and rule-coverage effects rather than reflecting style alone. Treat this as a progress marker, not as a demonstrated epitomiser fingerprint.

UMAP

The plotted vectors use formula recogniser rows only, with occurrence counts capped at 5 and transformed by log1p. The broad slice requires at least 21 of the 32 active formula rules to have been checked. Embedding method: UMAP.
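
The transform described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names and the input layout (a dict mapping each checked rule to its occurrence count, with unchecked rules absent) are assumptions. Only the cap of 5, the log1p step, and the 21-of-32 threshold come from the text.

```python
import math

CAP = 5                  # occurrence counts are capped at 5 (from the text)
MIN_RULES_CHECKED = 21   # broad-slice threshold (from the text)


def transform_count(count: int) -> float:
    """Cap a raw occurrence count at CAP, then apply log1p."""
    return math.log1p(min(count, CAP))


def in_broad_slice(checked: dict[str, int]) -> bool:
    """An entry qualifies when at least MIN_RULES_CHECKED rules were checked."""
    return len(checked) >= MIN_RULES_CHECKED


def formula_vector(checked: dict[str, int], rule_order: list[str]) -> list[float]:
    """Build one fixed-order vector for an entry.

    Assumption: a rule that was never checked is encoded as 0.0,
    the same value as a checked rule with no hits.
    """
    return [transform_count(checked.get(rule, 0)) for rule in rule_order]
```

Under that encoding assumption, an unchecked rule is indistinguishable from a checked non-hit, which is one way coverage differences could leak into the embedding.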

Cluster Summary

Cluster	Rows	Kappa	Parisinus	Median entry	Mean matched rules	Top over-represented formulae
4	699	125 (17.9%)	1	73	2.8	διὰ τοῦ + «X» (GREEK LETTER) + [FORM OF γράφειν] (+0.00); Y (EPITHET) + X (DEITY) (+0.00)
1	215	32 (14.9%)	4	81	4.7	X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.74); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.45); X (AUTHOR NAME) + ἐν + Y (dative BOOK NAME) (+0.35); X (AUTHOR NAME) + Y (NUMERAL) (+0.06)
0	211	35 (16.6%)	3	80	6.4	X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.87); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.55); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.42); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.30)
7	178	47 (26.4%)	3	80	6.7	X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.78); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.74); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.22); ὡς + X (AUTHOR NAME) (+0.21)
2	100	24 (24.0%)	0	78	8.0	ὡς + X (definite ARTICLE + ETYMON) + Y (nominative DERIVED NOUN) (+0.99); ὡς + X (nominative ETYMON) + Y (nominative DERIVED NOUN) (+0.64); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.59); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.25)
3	67	22 (32.8%)	1	63	5.6	X... πλησίον + Y. (genitive) (+1.00); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.10); X (nominative PROPER NOUN)... + ἀπό + Y (genitive ARTICLE + genitive ETYMON) (+0.01); X (AUTHOR NAME) + Y (NUMERAL) (+0.01)
8	22	18 (81.8%)	0	114	11.8	ὡς + X (AUTHOR NAME) + Y (dative NUMBER) + Z (genitive BOOK NAME) (+1.00); ὡς + X (AUTHOR NAME) (+0.71); X (AUTHOR NAME) + Y (NUMERAL) (+0.58); X (nominative) + ὡς + Y (nominative HOMOMORPH) (+0.37)
5	10	10 (100.0%)	0	120	13.4	X (NEUTER ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); X (MASCULINE ETYMON) + ἀφ' οὗ + Y (DERIVED NOUN) (+1.00); [NO ANTECEDENT] + ἀφ' οὗ + Y (DERIVED NOUN) (+0.90); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.64)
6	7	5 (71.4%)	0	217	10.3	εἰς + «X» (GREEK LETTER) (+1.00); ὡς + X (AUTHOR NAME) (+0.36); X (genitive ETYMON) + Y (nominative DERIVED NOUN) (+0.31); X (nominative PROPER NOUN) ... + ἀπό + Y (genitive ETYMON) (+0.23)
9	7	2 (28.6%)	1	107	12.6	ἐκάλειτο X (nominative PROPER NOUN)... + κέκληται Y (nominative PROPER NOUN) (+1.00); X (nominative DERIVED NOUN) + Y (nominative ETYMON) (+0.35); Χ (AUTHOR NAME) + ἐν + Y (dative NUMBER) + Z (genitive BOOK NAME) (+0.33); τὸ ἐθνικὸν + X (nominative ETHNONYM) (+0.32)
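
The Kappa percentages in the table can be rechecked against the slice-wide Kappa share. The sketch below copies the Rows and Kappa columns from the table; the variable and function names are illustrative only.

```python
# (cluster id, rows, Kappa rows) copied from the Cluster Summary table.
CLUSTERS = [
    (4, 699, 125), (1, 215, 32), (0, 211, 35), (7, 178, 47), (2, 100, 24),
    (3, 67, 22), (8, 22, 18), (5, 10, 10), (6, 7, 5), (9, 7, 2),
]

total_rows = sum(rows for _, rows, _ in CLUSTERS)
total_kappa = sum(kappa for _, _, kappa in CLUSTERS)
baseline = 100 * total_kappa / total_rows  # Kappa share of the whole slice, in percent


def kappa_share(rows: int, kappa: int) -> float:
    """Kappa share of one cluster, in percent, rounded as in the table."""
    return round(100 * kappa / rows, 1)


# Clusters whose Kappa share exceeds the slice-wide baseline, in table order.
above_baseline = [
    (cid, rows, kappa_share(rows, kappa))
    for cid, rows, kappa in CLUSTERS
    if 100 * kappa / rows > baseline
]
```

The three most Kappa-heavy clusters (8, 5, and 6) together hold only 39 of the 1,516 rows, so their high percentages rest on very few entries.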

Separation Checks

Check	Rows	Positive class	Result	Interpretation
Kappa vs non-Kappa, broad formula slice	1,516	320 Kappa	0.739 balanced accuracy across folds [0.761, 0.758, 0.701, 0.708, 0.766]	Useful as a warning flag, but confounded by recogniser coverage and rule history.
Kappa vs non-Kappa, complete formula slice	370	320 Kappa	0.554 balanced accuracy across folds [0.655, 0.492, 0.602, 0.608, 0.416]	The near-baseline result is evidence against claiming a robust Kappa fingerprint from current formula hits alone.
Parisinus/non-epitomised comparison	370 complete formula rows	1 Parisinus	descriptive only	The current non-epitomised sample is too small for a serious classifier; use it as a qualitative control set.
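
For reference, balanced accuracy is the mean of per-class recall, so a classifier that labels every row Kappa scores 0.5 regardless of class sizes. The sketch below defines the metric in plain Python and summarises the fold scores copied from the table; the names are illustrative, and the report's own classifier and fold assignment are not shown here.

```python
from statistics import mean, pstdev


def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean of per-class recall for binary labels (1 = Kappa, 0 = non-Kappa)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"class {cls} has no rows")
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


# Fold scores copied from the Separation Checks table.
BROAD_FOLDS = [0.761, 0.758, 0.701, 0.708, 0.766]
COMPLETE_FOLDS = [0.655, 0.492, 0.602, 0.608, 0.416]

broad_mean, broad_spread = mean(BROAD_FOLDS), pstdev(BROAD_FOLDS)
complete_mean, complete_spread = mean(COMPLETE_FOLDS), pstdev(COMPLETE_FOLDS)
```

Two of the five complete-slice folds fall below 0.5, and the fold-to-fold spread is larger than in the broad slice, which is consistent with reading that result as near-baseline.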

Coverage

Corpus Versions

Version	Rows	Usable	With Greek text
epitome	3,551	3,551	3,551
parisinus	19	19	19

Guidance Scan Counts

Kind	Scan rows	Lemmas checked	Rules checked	Lemmas matched	Occurrences
formula	46,220	1,517	32	1,442	8,099
gloss	35,262	3,570	70	728	1,445
proper_noun	57	1	57	1	2

Formula Coverage Distribution

Formula rules checked	Lemmas
0	2,053
17	1
21	1,136
29	10
32	370
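
The slice sizes quoted elsewhere on this page follow directly from this distribution. A short check, with the table copied in as a dict and illustrative variable names:

```python
# Formula rules checked -> number of lemmas, copied from the table above.
COVERAGE = {0: 2053, 17: 1, 21: 1136, 29: 10, 32: 370}

ACTIVE_FORMULA_RULES = 32
BROAD_MIN = 21  # broad slice: at least 21 of the 32 rules checked

all_lemmas = sum(COVERAGE.values())
checked_lemmas = sum(n for k, n in COVERAGE.items() if k > 0)
broad_slice = sum(n for k, n in COVERAGE.items() if k >= BROAD_MIN)
complete_slice = COVERAGE[ACTIVE_FORMULA_RULES]
partial_in_broad = broad_slice - complete_slice
```

The totals agree with the headline figures (3,570 lemmas, 1,517 with formula rules checked, 1,516 in the broad slice, 370 in the complete slice). About three quarters of the broad slice (1,146 rows) has only partial rule coverage.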

Sentence Grammar Feature Tables

Table	Rows
sentence_grammar_runs	8
sentence_grammar_evaluations	111
sentence_grammar_tokens	495

Next Implementation Steps

Freeze a coverage-balanced formula/gloss feature matrix and rerun the UMAP after every nightly guidance scan.
Add non-recogniser stylometric baselines: character n-grams, function-word rates, particles, clause connectors, entry length, and normalised type-token measures.
Populate the sentence-grammar tables over a coverage-balanced sample, then test morphosyntactic vectors separately from formula vectors.
Expand the non-epitomised control set beyond the current Parisinus rows before making any claim about epitomiser layers.
Validate clusters by close reading: each cluster needs formula examples and counterexamples before it becomes an argument.
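
As a starting point for the non-recogniser baselines listed above, a few of the simpler features can be sketched with the standard library alone. Everything here is an assumption for illustration: the function names, whitespace tokenisation, the small particle list, and the made-up sample string are not taken from the project.

```python
import unicodedata
from collections import Counter


def tokenize(text: str) -> list[str]:
    """NFC-normalise, lowercase, and split on whitespace (a deliberately crude tokeniser)."""
    return unicodedata.normalize("NFC", text).lower().split()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in NFC-normalised text."""
    text = unicodedata.normalize("NFC", text)
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct tokens divided by total tokens; 0.0 for an empty entry."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def rate_per_token(tokens: list[str], vocabulary: set[str]) -> float:
    """Share of tokens that belong to a given word list."""
    return sum(t in vocabulary for t in tokens) / len(tokens) if tokens else 0.0


# Illustrative particle list, not the project's own.
PARTICLES = {unicodedata.normalize("NFC", w) for w in ("καὶ", "δὲ", "γὰρ", "μὲν")}

# Made-up sample string, used only to exercise the functions.
sample_tokens = tokenize("πόλις Θρᾴκης καὶ πόλις Καρίας")
sample_ttr = type_token_ratio(sample_tokens)
sample_particle_rate = rate_per_token(sample_tokens, PARTICLES)
```

A raw type-token ratio falls as entries get longer, so some length normalisation would be needed before comparing short and long entries.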