Vocabulary Signatures

This page reports database-backed vocabulary profiles computed over the current public Greek source text. The current run uses the Meineke word-lemma index. The printed-edition segment tests are exploratory controls only; the printed volumes are not the hypothesized textual reduction units.

Current Run

3480 indexed entries
95523 tokens
40 segments
12651 feature rows
3616 stored tests
40 cluster rows

Run key: vocabulary-signatures-v1:meineke:lemma:w100
Completed: 2026-06-25 20:39:55.852289+10:00
Notes: run_daily_pipeline.sh word-lemma vocabulary signatures

Printed-Edition Control Profiles

Segment | Letters | Entries | Tokens | Types | Hapax | Entropy | Top 10 Mass | Zipf Slope
Printed volume 1 | alpha-gamma | 899 | 31002 | 6794 | 4568 | 6.310 | 35.3% | -1.101
Printed volume 2 | delta-iota | 575 | 18246 | 4203 | 2560 | 6.199 | 33.9% | -1.042
Printed volume 3 | kappa-omicron | 873 | 19491 | 4777 | 3339 | 6.167 | 35.4% | -1.047
Printed volume 4 | pi-upsilon | 965 | 20239 | 5148 | 3644 | 6.159 | 36.5% | -1.089
Printed volume 5 | phi-omega | 147 | 6046 | 1636 | 1005 | 5.731 | 35.1% | -1.026
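
The report does not define how the profile columns are computed, so the following is only a minimal sketch under stated assumptions: types are distinct lemmas, hapax are lemmas occurring once, entropy is Shannon entropy (natural log here, since the log base is not stated), top-N mass is the token share of the N most frequent lemmas, and the Zipf slope is an ordinary least-squares fit of log frequency on log rank. The function name `vocabulary_profile` and the sample tokens are hypothetical.

```python
import math
from collections import Counter

def vocabulary_profile(tokens, top_n=10):
    """Summarise a non-empty list of lemma tokens (illustrative, not the pipeline's code)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)

    # Shannon entropy in nats; the log base is an assumption.
    entropy = -sum((c / total) * math.log(c / total) for c in ranked)

    # Least-squares slope of log(frequency) against log(rank).
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    return {
        "tokens": total,
        "types": len(counts),
        "hapax": sum(1 for c in ranked if c == 1),
        "entropy": entropy,
        "top_mass": sum(ranked[:top_n]) / total,
        "zipf_slope": sxy / sxx if sxx else float("nan"),
    }

# Toy input: eight tokens, four distinct lemmas.
sample = ["καί"] * 4 + ["δέ"] * 2 + ["πόλις", "ἔθνος"]
profile = vocabulary_profile(sample, top_n=2)
```

On real segments the hapax share and slope depend heavily on how lemmas are normalised, so figures from a sketch like this should not be expected to match the table exactly.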

Core Feature Rates

Segment | Feature | Count | Per 1k Tokens | Entry Count
Printed volume 1 | γάρ | 191 | 6.161 | 140
Printed volume 1 | δέ | 900 | 29.030 | 380
Printed volume 1 | ἔθνος | 135 | 4.355 | 122
Printed volume 1 | εἰς | 183 | 5.903 | 143
Printed volume 1 | ἐκ | 134 | 4.322 | 104
Printed volume 1 | ἐν | 573 | 18.483 | 345
Printed volume 1 | καί | 1909 | 61.577 | 520
Printed volume 1 | πλησίον | 40 | 1.290 | 38
Printed volume 1 | πόλις | 762 | 24.579 | 560
Printed volume 1 | πρός | 116 | 3.742 | 98
Printed volume 1 | χώρα | 107 | 3.451 | 94
Printed volume 2 | γάρ | 74 | 4.056 | 59
Printed volume 2 | δέ | 536 | 29.376 | 193
Printed volume 2 | ἔθνος | 79 | 4.330 | 77
Printed volume 2 | εἰς | 133 | 7.289 | 71
Printed volume 2 | ἐκ | 76 | 4.165 | 55
Printed volume 2 | ἐν | 379 | 20.772 | 182
Printed volume 2 | καί | 1148 | 62.918 | 306
Printed volume 2 | πλησίον | 28 | 1.535 | 26
Printed volume 2 | πόλις | 432 | 23.676 | 329
Printed volume 2 | πρός | 71 | 3.891 | 53
Printed volume 2 | χώρα | 63 | 3.453 | 50
Printed volume 3 | γάρ | 72 | 3.694 | 59
Printed volume 3 | δέ | 424 | 21.754 | 245
Printed volume 3 | ἔθνος | 78 | 4.002 | 76
Printed volume 3 | εἰς | 81 | 4.156 | 65
Printed volume 3 | ἐκ | 51 | 2.617 | 46
Printed volume 3 | ἐν | 356 | 18.265 | 265
Printed volume 3 | καί | 1277 | 65.517 | 473
Printed volume 3 | πλησίον | 39 | 2.001 | 39
Printed volume 3 | πόλις | 649 | 33.297 | 541
Printed volume 3 | πρός | 65 | 3.335 | 63
Printed volume 3 | χώρα | 84 | 4.310 | 77
Printed volume 4 | γάρ | 59 | 2.915 | 54
Printed volume 4 | δέ | 548 | 27.076 | 287
Printed volume 4 | ἔθνος | 107 | 5.287 | 102
Printed volume 4 | εἰς | 131 | 6.473 | 95
Printed volume 4 | ἐκ | 60 | 2.965 | 53
Printed volume 4 | ἐν | 346 | 17.096 | 261
Printed volume 4 | καί | 1285 | 63.491 | 492
Printed volume 4 | πλησίον | 38 | 1.878 | 37
Printed volume 4 | πόλις | 696 | 34.389 | 607
Printed volume 4 | πρός | 63 | 3.113 | 57
Printed volume 4 | χώρα | 73 | 3.607 | 68
Printed volume 5 | γάρ | 28 | 4.631 | 24
Printed volume 5 | δέ | 183 | 30.268 | 81
Printed volume 5 | ἔθνος | 15 | 2.481 | 12
Printed volume 5 | εἰς | 36 | 5.954 | 30
Printed volume 5 | ἐκ | 31 | 5.127 | 22
Printed volume 5 | ἐν | 179 | 29.606 | 88
Printed volume 5 | καί | 374 | 61.859 | 109
Printed volume 5 | πλησίον | 7 | 1.158 | 7
Printed volume 5 | πόλις | 152 | 25.141 | 87
Printed volume 5 | πρός | 31 | 5.127 | 26
Printed volume 5 | χώρα | 27 | 4.466 | 19
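
The Per 1k Tokens column appears to be the feature count divided by the segment's token total, scaled by 1000. That reading is inferred from the printed figures, not stated by the report. A short check against the γάρ row for printed volume 1, using the hypothetical helper `rate_per_thousand`:

```python
def rate_per_thousand(count, segment_tokens):
    """Occurrences of a feature per 1000 tokens of its segment."""
    return 1000 * count / segment_tokens

# γάρ in printed volume 1: 191 occurrences among 31002 tokens.
gar_rate = rate_per_thousand(191, 31002)
print(round(gar_rate, 3))  # 6.161, matching the table row
```

Entry Count, by contrast, counts entries containing the feature at least once, so it cannot be derived from these two numbers.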

Stored Statistical Tests

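For feature tests, a positive log2 effect means the named feature is more frequent in the focal segment than in the comparison groups, and a negative one means it is less frequent. Adjusted p-values (BH p) are Benjamini-Hochberg corrections applied within each test family. Values printed as 0.0000 have been rounded to four decimal places; they indicate p < 0.00005, not exactly zero.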
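
For readers unfamiliar with the Benjamini-Hochberg adjustment, here is a minimal pure-Python sketch of the standard step-up procedure. It is not the pipeline's own code; the function name `benjamini_hochberg` and the example p-values are hypothetical, and the input is assumed to be the raw p-values of a single test family.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
# Approximately [0.004, 0.04, 0.04, 0.5]
```

Because the correction scales with the number of tests in the family, the same raw p-value can receive different adjusted values in the heterogeneity family and in a volume-versus-rest family.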

Heterogeneity rows (chi2_contingency_feature_vs_other_tokens) test whether the feature rate differs across printed-edition volume groups. Volume-versus-rest rows (fisher_exact_feature_vs_other_tokens) compare one printed-volume group with the other indexed groups; a positive effect means the feature is more frequent in the named group.

Comparison | Feature | Method | log2 Effect | Statistic | p | BH p
Printed-volume heterogeneity | δωδωνη | chi2_contingency_feature_vs_other_tokens | - | 148.263 | 0.0000 | 0.0000
Printed-volume heterogeneity | χαλκισ | chi2_contingency_feature_vs_other_tokens | - | 145.068 | 0.0000 | 0.0000
Printed-volume heterogeneity | δωροσ | chi2_contingency_feature_vs_other_tokens | - | 144.070 | 0.0000 | 0.0000
Printed-volume heterogeneity | εκαταιοσ | chi2_contingency_feature_vs_other_tokens | - | 136.318 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωδωνη | fisher_exact_feature_vs_other_tokens | 6.018 | 80.116 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωροσ | fisher_exact_feature_vs_other_tokens | 5.980 | 78.003 | 0.0000 | 0.0000
Printed volume 3 vs other printed-volume groups | εκαταιοσ | fisher_exact_feature_vs_other_tokens | 1.680 | 3.216 | 0.0000 | 0.0000
Printed-volume heterogeneity | εθνικοσ | chi2_contingency_feature_vs_other_tokens | - | 92.657 | 0.0000 | 0.0000
Printed-volume heterogeneity | δωδωναιοσ | chi2_contingency_feature_vs_other_tokens | - | 91.010 | 0.0000 | 0.0000
Printed volume 1 vs other printed-volume groups | εκαταιοσ | fisher_exact_feature_vs_other_tokens | -1.918 | 0.260 | 0.0000 | 0.0000
Printed-volume heterogeneity | πόλις | chi2_contingency_feature_vs_other_tokens | - | 76.925 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωδωναιοσ | fisher_exact_feature_vs_other_tokens | 6.043 | 96.903 | 0.0000 | 0.0000
Printed-volume heterogeneity | δωτιον | chi2_contingency_feature_vs_other_tokens | - | 69.984 | 0.0000 | 0.0000
Printed-volume heterogeneity | δωριευσ | chi2_contingency_feature_vs_other_tokens | - | 68.852 | 0.0000 | 0.0000
Printed-volume heterogeneity | δωρα | chi2_contingency_feature_vs_other_tokens | - | 67.338 | 0.0000 | 0.0000
Printed-volume heterogeneity | δυμαιοσ | chi2_contingency_feature_vs_other_tokens | - | 64.770 | 0.0000 | 0.0000
Printed volume 5 vs other printed-volume groups | χαλκισ | fisher_exact_feature_vs_other_tokens | 5.278 | 41.300 | 0.0000 | 0.0000
Printed-volume heterogeneity | ωσ | chi2_contingency_feature_vs_other_tokens | - | 59.085 | 0.0000 | 0.0000
Printed-volume heterogeneity | ευρωπη | chi2_contingency_feature_vs_other_tokens | - | 58.928 | 0.0000 | 0.0000
Printed-volume heterogeneity | πολιτησ | chi2_contingency_feature_vs_other_tokens | - | 58.719 | 0.0000 | 0.0000
Printed-volume heterogeneity | επιδαμνοσ | chi2_contingency_feature_vs_other_tokens | - | 57.438 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωτιον | fisher_exact_feature_vs_other_tokens | 5.698 | 75.817 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωρα | fisher_exact_feature_vs_other_tokens | 7.117 | Infinity | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δωριευσ | fisher_exact_feature_vs_other_tokens | 3.781 | 14.445 | 0.0000 | 0.0000
Printed-volume heterogeneity | ουτε | chi2_contingency_feature_vs_other_tokens | - | 53.018 | 0.0000 | 0.0000
Printed-volume heterogeneity | χαλκιδευσ | chi2_contingency_feature_vs_other_tokens | - | 52.334 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | δυμαιοσ | fisher_exact_feature_vs_other_tokens | 4.961 | 37.908 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | ωσ | fisher_exact_feature_vs_other_tokens | -0.585 | 0.660 | 0.0000 | 0.0000
Printed-volume heterogeneity | χιοσ | chi2_contingency_feature_vs_other_tokens | - | 48.233 | 0.0000 | 0.0000
Printed-volume heterogeneity | ασια | chi2_contingency_feature_vs_other_tokens | - | 46.937 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | επιδαμνοσ | fisher_exact_feature_vs_other_tokens | 5.442 | 63.170 | 0.0000 | 0.0000
Printed volume 4 vs other printed-volume groups | εθνικοσ | fisher_exact_feature_vs_other_tokens | 0.465 | 1.390 | 0.0000 | 0.0000
Printed volume 1 vs other printed-volume groups | ευρωπη | fisher_exact_feature_vs_other_tokens | -1.981 | 0.246 | 0.0000 | 0.0000
Printed volume 3 vs other printed-volume groups | μέν | fisher_exact_feature_vs_other_tokens | -1.934 | 0.253 | 0.0000 | 0.0000
Printed volume 4 vs other printed-volume groups | πολιτησ | fisher_exact_feature_vs_other_tokens | 0.765 | 1.704 | 0.0000 | 0.0000
Printed-volume heterogeneity | ἐν | chi2_contingency_feature_vs_other_tokens | - | 43.433 | 0.0000 | 0.0000
Printed volume 2 vs other printed-volume groups | ουτε | fisher_exact_feature_vs_other_tokens | 4.032 | 17.899 | 0.0000 | 0.0000
Printed volume 1 vs other printed-volume groups | ωσ | fisher_exact_feature_vs_other_tokens | 0.375 | 1.305 | 0.0000 | 0.0000
Printed volume 4 vs other printed-volume groups | πόλις | fisher_exact_feature_vs_other_tokens | 0.367 | 1.299 | 0.0000 | 0.0000
Printed volume 3 vs other printed-volume groups | ασια | fisher_exact_feature_vs_other_tokens | 1.592 | 3.013 | 0.0000 | 0.0000
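
The heterogeneity statistic can be illustrated with the πόλις counts and per-volume token totals printed earlier in this report. The sketch below assumes a 2 x 5 contingency table (feature tokens versus all other tokens, one column per printed volume) and the ordinary Pearson chi-square without continuity correction. The method name hints at SciPy's `chi2_contingency`, but that is an inference; the function `chi2_feature_vs_other` is a hypothetical pure-Python stand-in.

```python
def chi2_feature_vs_other(feature_counts, token_totals):
    """Pearson chi-square for a 2 x k table: feature tokens vs all other tokens."""
    other_counts = [t - f for f, t in zip(feature_counts, token_totals)]
    grand_total = sum(token_totals)
    rows = (feature_counts, other_counts)
    statistic = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, column_total in zip(row, token_totals):
            # Expected count under a common feature rate across volumes.
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# πόλις counts and token totals for printed volumes 1-5, taken from the tables above.
polis_counts = [762, 432, 649, 696, 152]
volume_tokens = [31002, 18246, 19491, 20239, 6046]
polis_chi2 = chi2_feature_vs_other(polis_counts, volume_tokens)
```

With these inputs the statistic comes out at roughly 76.93, in line with the stored value of 76.925 for πόλις. The Infinity statistic in the δωρα row would be consistent with an odds ratio where the feature never occurs outside the focal group, though the report does not say how that column is defined for the Fisher rows.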

Unsupervised Distance Checks

Jensen-Shannon distances compare selected vocabulary distributions; higher values indicate greater profile separation.

Comparison | JSD
Printed volume 3 vs Printed volume 5 | 0.2553
Printed volume 2 vs Printed volume 5 | 0.2499
Printed volume 4 vs Printed volume 5 | 0.2438
Printed volume 1 vs Printed volume 5 | 0.2323
Printed volume 2 vs Printed volume 3 | 0.2084
Printed volume 2 vs Printed volume 4 | 0.2001
Printed volume 1 vs Printed volume 3 | 0.1774
Printed volume 3 vs Printed volume 4 | 0.1735
Printed volume 1 vs Printed volume 2 | 0.1733
Printed volume 1 vs Printed volume 4 | 0.1588
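
As a reference point for reading these figures, the following sketch computes a Jensen-Shannon distance between two count dictionaries. It assumes the distance is the square root of the Jensen-Shannon divergence with base-2 logarithms, which bounds it between 0 and 1; the report does not state its log base or which features were selected, so the function `jensen_shannon_distance` and its toy inputs are illustrative only.

```python
import math

def jensen_shannon_distance(counts_a, counts_b):
    """Square root of the base-2 Jensen-Shannon divergence of two count dicts."""
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    divergence = 0.0
    for feature in vocab:
        p = counts_a.get(feature, 0) / total_a
        q = counts_b.get(feature, 0) / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))

same = jensen_shannon_distance({"καί": 3, "δέ": 1}, {"καί": 6, "δέ": 2})
disjoint = jensen_shannon_distance({"καί": 1}, {"δέ": 1})
```

Identical distributions give 0 and fully disjoint ones give 1 under this convention. Note that four of the five largest distances in the table involve printed volume 5, which is also by far the smallest segment (6046 tokens), so sample size may contribute to its apparent separation.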

Sliding-Window Clustering

KMeans selected 2 clusters over the sliding windows, with a silhouette score of 0.140. Treat this as a worklist generator, not as proof of epitomiser identity.
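
Silhouette scores range from -1 to 1, and values near 0 indicate overlapping clusters, so 0.140 describes weak separation. The sketch below shows how a mean silhouette score is computed for a given labelling, using Euclidean distance. It is a pure-Python illustration under those assumptions, not the pipeline's code; the function name `silhouette_score` and the toy points are hypothetical, and the window features and distance metric used in the actual run are not stated.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated toy clusters on a line.
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_score = silhouette_score(toy_points, [0, 0, 1, 1])
```

The toy clusters score close to 0.9, which gives a sense of how far 0.140 is from a clean partition.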