This section reports database-backed vocabulary profiles computed over the current public Greek source text. The current run uses the Meineke word-lemma index; the printed-edition segment tests are exploratory controls, not the hypothesized textual reduction units.
Run key: vocabulary-signatures-v1:meineke:lemma:w100. Completed: 2026-06-25 20:39:55.852289+10:00. Notes: run_daily_pipeline.sh word-lemma vocabulary signatures
| Segment | Letters | Entries | Tokens | Types | Hapax | Entropy | Top 10 Mass | Zipf Slope |
|---|---|---|---|---|---|---|---|---|
| Printed volume 1 | alpha-gamma | 899 | 31002 | 6794 | 4568 | 6.310 | 35.3% | -1.101 |
| Printed volume 2 | delta-iota | 575 | 18246 | 4203 | 2560 | 6.199 | 33.9% | -1.042 |
| Printed volume 3 | kappa-omicron | 873 | 19491 | 4777 | 3339 | 6.167 | 35.4% | -1.047 |
| Printed volume 4 | pi-upsilon | 965 | 20239 | 5148 | 3644 | 6.159 | 36.5% | -1.089 |
| Printed volume 5 | phi-omega | 147 | 6046 | 1636 | 1005 | 5.731 | 35.1% | -1.026 |
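
The profile columns above can be sketched from a flat list of lemma tokens. This is a minimal illustration, not the pipeline's code: the function name `vocabulary_profile` is hypothetical, and it assumes entropy in bits (base 2), Top 10 Mass as the token share of the ten most frequent lemmas, and Zipf Slope as an ordinary least-squares fit of log frequency on log rank. The report does not state which of these conventions the run actually uses.

```python
from collections import Counter
import math


def vocabulary_profile(lemmas):
    """Summarise one segment's lemma tokens (hypothetical helper, assumed conventions)."""
    counts = Counter(lemmas)
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        raise ValueError("segment has no tokens")
    freqs = sorted(counts.values(), reverse=True)

    # Shannon entropy of the lemma distribution, in bits (assumed base 2).
    entropy = -sum((c / n_tokens) * math.log2(c / n_tokens) for c in freqs)

    # Hapax legomena: lemmas that occur exactly once in the segment.
    hapax = sum(1 for c in freqs if c == 1)

    # Share of all tokens covered by the ten most frequent lemmas.
    top10_mass = sum(freqs[:10]) / n_tokens

    # Zipf slope: least-squares slope of log(frequency) against log(rank).
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(c) for c in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    zipf_slope = sxy / sxx if sxx > 0 else float("nan")

    return {
        "tokens": n_tokens,
        "types": len(freqs),
        "hapax": hapax,
        "entropy": entropy,
        "top10_mass": top10_mass,
        "zipf_slope": zipf_slope,
    }
```

Because Types and Hapax grow with segment size, the much smaller volume 5 group (6046 tokens) is not directly comparable to the others on those raw counts.
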
| Segment | Feature | Count | Per 1k Tokens | Entry Count |
|---|---|---|---|---|
| Printed volume 1 | γάρ | 191 | 6.161 | 140 |
| Printed volume 1 | δέ | 900 | 29.030 | 380 |
| Printed volume 1 | ἔθνος | 135 | 4.355 | 122 |
| Printed volume 1 | εἰς | 183 | 5.903 | 143 |
| Printed volume 1 | ἐκ | 134 | 4.322 | 104 |
| Printed volume 1 | ἐν | 573 | 18.483 | 345 |
| Printed volume 1 | καί | 1909 | 61.577 | 520 |
| Printed volume 1 | πλησίον | 40 | 1.290 | 38 |
| Printed volume 1 | πόλις | 762 | 24.579 | 560 |
| Printed volume 1 | πρός | 116 | 3.742 | 98 |
| Printed volume 1 | χώρα | 107 | 3.451 | 94 |
| Printed volume 2 | γάρ | 74 | 4.056 | 59 |
| Printed volume 2 | δέ | 536 | 29.376 | 193 |
| Printed volume 2 | ἔθνος | 79 | 4.330 | 77 |
| Printed volume 2 | εἰς | 133 | 7.289 | 71 |
| Printed volume 2 | ἐκ | 76 | 4.165 | 55 |
| Printed volume 2 | ἐν | 379 | 20.772 | 182 |
| Printed volume 2 | καί | 1148 | 62.918 | 306 |
| Printed volume 2 | πλησίον | 28 | 1.535 | 26 |
| Printed volume 2 | πόλις | 432 | 23.676 | 329 |
| Printed volume 2 | πρός | 71 | 3.891 | 53 |
| Printed volume 2 | χώρα | 63 | 3.453 | 50 |
| Printed volume 3 | γάρ | 72 | 3.694 | 59 |
| Printed volume 3 | δέ | 424 | 21.754 | 245 |
| Printed volume 3 | ἔθνος | 78 | 4.002 | 76 |
| Printed volume 3 | εἰς | 81 | 4.156 | 65 |
| Printed volume 3 | ἐκ | 51 | 2.617 | 46 |
| Printed volume 3 | ἐν | 356 | 18.265 | 265 |
| Printed volume 3 | καί | 1277 | 65.517 | 473 |
| Printed volume 3 | πλησίον | 39 | 2.001 | 39 |
| Printed volume 3 | πόλις | 649 | 33.297 | 541 |
| Printed volume 3 | πρός | 65 | 3.335 | 63 |
| Printed volume 3 | χώρα | 84 | 4.310 | 77 |
| Printed volume 4 | γάρ | 59 | 2.915 | 54 |
| Printed volume 4 | δέ | 548 | 27.076 | 287 |
| Printed volume 4 | ἔθνος | 107 | 5.287 | 102 |
| Printed volume 4 | εἰς | 131 | 6.473 | 95 |
| Printed volume 4 | ἐκ | 60 | 2.965 | 53 |
| Printed volume 4 | ἐν | 346 | 17.096 | 261 |
| Printed volume 4 | καί | 1285 | 63.491 | 492 |
| Printed volume 4 | πλησίον | 38 | 1.878 | 37 |
| Printed volume 4 | πόλις | 696 | 34.389 | 607 |
| Printed volume 4 | πρός | 63 | 3.113 | 57 |
| Printed volume 4 | χώρα | 73 | 3.607 | 68 |
| Printed volume 5 | γάρ | 28 | 4.631 | 24 |
| Printed volume 5 | δέ | 183 | 30.268 | 81 |
| Printed volume 5 | ἔθνος | 15 | 2.481 | 12 |
| Printed volume 5 | εἰς | 36 | 5.954 | 30 |
| Printed volume 5 | ἐκ | 31 | 5.127 | 22 |
| Printed volume 5 | ἐν | 179 | 29.606 | 88 |
| Printed volume 5 | καί | 374 | 61.859 | 109 |
| Printed volume 5 | πλησίον | 7 | 1.158 | 7 |
| Printed volume 5 | πόλις | 152 | 25.141 | 87 |
| Printed volume 5 | πρός | 31 | 5.127 | 26 |
| Printed volume 5 | χώρα | 27 | 4.466 | 19 |
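
The Per 1k Tokens column is consistent with the feature count divided by the segment's token total from the first table, scaled by 1000. A small sketch follows; the helper name `rate_per_1k` is an assumption for illustration.

```python
def rate_per_1k(count, segment_tokens):
    """Occurrences of a feature per 1000 tokens of its segment."""
    if segment_tokens <= 0:
        raise ValueError("segment_tokens must be positive")
    return 1000.0 * count / segment_tokens


# Printed volume 1 has 31002 tokens; γάρ occurs 191 times and δέ 900 times.
gar_rate = round(rate_per_1k(191, 31002), 3)
de_rate = round(rate_per_1k(900, 31002), 3)
```

With these inputs the rounded rates agree with the tabulated 6.161 and 29.030.
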
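For feature tests, a positive log2 effect means the named feature is more frequent in the focal segment than in the comparison groups; a negative effect means it is less frequent. Adjusted p-values (BH p) are Benjamini-Hochberg corrections within each test family. Values shown as 0.0000 are below 0.00005 after rounding to four decimals, not exactly zero.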
| Comparison | Feature | Method | log2 Effect | Statistic | p | BH p | Notes |
|---|---|---|---|---|---|---|---|
| Printed-volume heterogeneity | δωδωνη | chi2_contingency_feature_vs_other_tokens | - | 148.263 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | χαλκισ | chi2_contingency_feature_vs_other_tokens | - | 145.068 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | δωροσ | chi2_contingency_feature_vs_other_tokens | - | 144.070 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | εκαταιοσ | chi2_contingency_feature_vs_other_tokens | - | 136.318 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | δωδωνη | fisher_exact_feature_vs_other_tokens | 6.018 | 80.116 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 2 vs other printed-volume groups | δωροσ | fisher_exact_feature_vs_other_tokens | 5.980 | 78.003 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 3 vs other printed-volume groups | εκαταιοσ | fisher_exact_feature_vs_other_tokens | 1.680 | 3.216 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | εθνικοσ | chi2_contingency_feature_vs_other_tokens | - | 92.657 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | δωδωναιοσ | chi2_contingency_feature_vs_other_tokens | - | 91.010 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 1 vs other printed-volume groups | εκαταιοσ | fisher_exact_feature_vs_other_tokens | -1.918 | 0.260 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | πόλις | chi2_contingency_feature_vs_other_tokens | - | 76.925 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | δωδωναιοσ | fisher_exact_feature_vs_other_tokens | 6.043 | 96.903 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | δωτιον | chi2_contingency_feature_vs_other_tokens | - | 69.984 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | δωριευσ | chi2_contingency_feature_vs_other_tokens | - | 68.852 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | δωρα | chi2_contingency_feature_vs_other_tokens | - | 67.338 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | δυμαιοσ | chi2_contingency_feature_vs_other_tokens | - | 64.770 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 5 vs other printed-volume groups | χαλκισ | fisher_exact_feature_vs_other_tokens | 5.278 | 41.300 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | ωσ | chi2_contingency_feature_vs_other_tokens | - | 59.085 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | ευρωπη | chi2_contingency_feature_vs_other_tokens | - | 58.928 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | πολιτησ | chi2_contingency_feature_vs_other_tokens | - | 58.719 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | επιδαμνοσ | chi2_contingency_feature_vs_other_tokens | - | 57.438 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | δωτιον | fisher_exact_feature_vs_other_tokens | 5.698 | 75.817 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 2 vs other printed-volume groups | δωρα | fisher_exact_feature_vs_other_tokens | 7.117 | Infinity | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 2 vs other printed-volume groups | δωριευσ | fisher_exact_feature_vs_other_tokens | 3.781 | 14.445 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | ουτε | chi2_contingency_feature_vs_other_tokens | - | 53.018 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | χαλκιδευσ | chi2_contingency_feature_vs_other_tokens | - | 52.334 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | δυμαιοσ | fisher_exact_feature_vs_other_tokens | 4.961 | 37.908 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 2 vs other printed-volume groups | ωσ | fisher_exact_feature_vs_other_tokens | -0.585 | 0.660 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | χιοσ | chi2_contingency_feature_vs_other_tokens | - | 48.233 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed-volume heterogeneity | ασια | chi2_contingency_feature_vs_other_tokens | - | 46.937 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | επιδαμνοσ | fisher_exact_feature_vs_other_tokens | 5.442 | 63.170 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 4 vs other printed-volume groups | εθνικοσ | fisher_exact_feature_vs_other_tokens | 0.465 | 1.390 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 1 vs other printed-volume groups | ευρωπη | fisher_exact_feature_vs_other_tokens | -1.981 | 0.246 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 3 vs other printed-volume groups | μέν | fisher_exact_feature_vs_other_tokens | -1.934 | 0.253 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 4 vs other printed-volume groups | πολιτησ | fisher_exact_feature_vs_other_tokens | 0.765 | 1.704 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed-volume heterogeneity | ἐν | chi2_contingency_feature_vs_other_tokens | - | 43.433 | 0.0000 | 0.0000 | Tests whether the feature rate differs across printed-edition volume groups. |
| Printed volume 2 vs other printed-volume groups | ουτε | fisher_exact_feature_vs_other_tokens | 4.032 | 17.899 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 1 vs other printed-volume groups | ωσ | fisher_exact_feature_vs_other_tokens | 0.375 | 1.305 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 4 vs other printed-volume groups | πόλις | fisher_exact_feature_vs_other_tokens | 0.367 | 1.299 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
| Printed volume 3 vs other printed-volume groups | ασια | fisher_exact_feature_vs_other_tokens | 1.592 | 3.013 | 0.0000 | 0.0000 | Positive effect means the feature is more frequent in this printed-volume group than in the other indexed groups. |
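
The test rows can be partly re-derived from the count tables. The sketch below is an assumption-laden illustration, not the pipeline's code: it computes a Pearson chi-square on a 2 x k table (feature tokens vs all other tokens, one column per volume group), a sample odds ratio for one focal group against the rest, and a Benjamini-Hochberg adjustment. All three function names are hypothetical. For πόλις, the chi-square comes out close to the reported 76.925, and the volume 4 odds ratio is close to the 1.299 shown in the Statistic column, which suggests that column holds an odds ratio for the Fisher rows; the report does not confirm this.

```python
def chi2_feature_vs_other(feature_counts, token_totals):
    """Pearson chi-square for a 2 x k table: feature tokens vs other tokens."""
    total_feature = sum(feature_counts)
    total_tokens = sum(token_totals)
    total_other = total_tokens - total_feature
    if total_feature == 0 or total_other == 0:
        raise ValueError("both rows of the table need at least one token")
    stat = 0.0
    for count, tokens in zip(feature_counts, token_totals):
        cells = ((count, total_feature), (tokens - count, total_other))
        for observed, row_total in cells:
            expected = row_total * tokens / total_tokens
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio(focal_count, focal_tokens, rest_count, rest_tokens):
    """Sample odds ratio; infinite when the feature never occurs elsewhere."""
    focal_other = focal_tokens - focal_count
    rest_other = rest_tokens - rest_count
    if rest_count == 0 or focal_other == 0:
        return float("inf")
    return (focal_count * rest_other) / (focal_other * rest_count)


def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# πόλις counts and token totals for printed volumes 1-5, from the tables above.
polis_counts = [762, 432, 649, 696, 152]
volume_tokens = [31002, 18246, 19491, 20239, 6046]
polis_chi2 = chi2_feature_vs_other(polis_counts, volume_tokens)

# Volume 4 vs the other four groups combined.
polis_v4_or = odds_ratio(696, 20239, sum(polis_counts) - 696,
                         sum(volume_tokens) - 20239)
```

An infinite odds ratio, as in the δωρα row, arises under this reading when the feature has no occurrences outside the focal group. The many strong effects for delta-initial lemmas in volume 2 are what alphabetical headword grouping would produce, which is consistent with treating these segments as controls.
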
Jensen-Shannon distances compare selected vocabulary distributions; higher values indicate greater profile separation.
| Comparison | JSD |
|---|---|
| Printed volume 3 vs Printed volume 5 | 0.2553 |
| Printed volume 2 vs Printed volume 5 | 0.2499 |
| Printed volume 4 vs Printed volume 5 | 0.2438 |
| Printed volume 1 vs Printed volume 5 | 0.2323 |
| Printed volume 2 vs Printed volume 3 | 0.2084 |
| Printed volume 2 vs Printed volume 4 | 0.2001 |
| Printed volume 1 vs Printed volume 3 | 0.1774 |
| Printed volume 3 vs Printed volume 4 | 0.1735 |
| Printed volume 1 vs Printed volume 2 | 0.1733 |
| Printed volume 1 vs Printed volume 4 | 0.1588 |
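
For orientation, a Jensen-Shannon distance between two lemma-count profiles can be sketched as below. This assumes the common definition (square root of the Jensen-Shannon divergence) with base-2 logarithms, which bounds the distance to [0, 1]; the report does not state the base or which features were selected, so the function `jensen_shannon_distance` is a hypothetical illustration and will not reproduce the tabulated values without the run's actual feature set.

```python
import math


def jensen_shannon_distance(p_counts, q_counts):
    """Jensen-Shannon distance between two count dictionaries (base 2 assumed)."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both profiles need at least one token")
    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

Identical profiles give 0 and profiles with no shared vocabulary give 1 under this convention.
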
KMeans over the sliding windows selected 2 clusters (silhouette 0.140). A silhouette this low indicates weak cluster separation, so treat the clustering as a worklist generator, not proof of epitomiser identity.
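
To make the silhouette figure concrete, here is a minimal sketch of the mean silhouette coefficient for a given labelling, using Euclidean distance. It assumes the windows are represented as numeric feature vectors; the function name `silhouette_score` and the toy points are illustrative only, and the report does not say how the run chose the cluster count or which features it used.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_distance(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_distance(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_distance(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

Scores near 1 mean tight, well-separated clusters, scores near 0 mean overlapping clusters, and negative scores mean points sit closer to another cluster than to their own.
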