legacy_scholarly v2 Evaluation

Generated: 2026-06-25 11:15:11 UTC

Pairs: 101
Distinct lemmas: 101
Slope: 0.888
R^2: 0.981
BLEU-4: 51.9%
chrF++: 72.4%
METEOR: 77.2%
ROUGE-L: 77.9%
BERTScore: 96.3%
COMET: 79.9%
BLEURT: 76.0%

Profile version id: 274

First translation using this version: 2026-06-07

Total usable AI runs for this version: 102 runs across 102 lemmas

Created: 2026-03-06 21:55:13.641826+11:00

Active:

Notes: Reviewed house-style prompt based on 20 human-reviewed translations, 20 AI comparison translations, and reviewer guidance from Brady, Greta, and Gabriel. Activated 2026-03-06.

Prompt length scatter plot

Length Regression

Formula: AI words = intercept + slope * human words
Slope: 0.888
Intercept: 2.968
R^2: 0.981
P-value: 3.98e-87
Pearson r: 0.991
Mean source words: 30.248
Length fallback rows: 0
Mean human words: 43.980
Mean AI words: 42.020
Mean absolute length residual: 3.057
Mean absolute percent error: 6.8%
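
The length residual for a single pair follows directly from the formula above: it is the AI word count minus the AI length fitted from the human word count. The following is a minimal sketch, assuming the rounded slope and intercept printed in this block; the names `length_residual` and `mean_absolute_residual` are illustrative and not taken from the evaluation code. Because the coefficients are rounded, results can differ from the report's residuals in the second decimal place.

```python
# Rounded coefficients copied from the Length Regression block (assumption:
# the report's own residuals use unrounded values of the same fit).
SLOPE = 0.888
INTERCEPT = 2.968


def length_residual(human_words: int, ai_words: int) -> float:
    """Return AI words minus the AI length predicted from the human length."""
    predicted_ai_words = INTERCEPT + SLOPE * human_words
    return ai_words - predicted_ai_words


def mean_absolute_residual(pairs: list[tuple[int, int]]) -> float:
    """Average |residual| over (human_words, ai_words) pairs."""
    return sum(abs(length_residual(h, a)) for h, a in pairs) / len(pairs)


if __name__ == "__main__":
    # Lemma 2484 (270 human words, 223 AI words); the report lists -19.710.
    print(round(length_residual(270, 223), 2))
```

A positive value means the AI translation is longer than the fit predicts from the human length; a negative value means it is shorter.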

Metric vs Passage Length

These plots test whether score changes with source passage length for this prompt version. The dashed line is an ordinary least-squares fit; the annotation reports n, r, R^2, p-value, and slope.

BLEU-4 vs passage length: significant negative; r = -0.431, R^2 = 0.186, p = 6.91e-06.
chrF++ vs passage length: significant negative; r = -0.447, R^2 = 0.200, p = 2.75e-06.
METEOR vs passage length: significant negative; r = -0.442, R^2 = 0.196, p = 3.63e-06.
ROUGE-L vs passage length: significant negative; r = -0.441, R^2 = 0.195, p = 3.91e-06.
BERTScore vs passage length: significant negative; r = -0.595, R^2 = 0.354, p = 5.43e-11.
COMET vs passage length: significant negative; r = -0.469, R^2 = 0.220, p = 7.34e-07.
BLEURT vs passage length: significant negative; r = -0.712, R^2 = 0.506, p = 7.41e-17.
Trigram precision vs passage length: significant negative; r = -0.287, R^2 = 0.082, p = 0.0037.
Trigram recall vs passage length: significant negative; r = -0.316, R^2 = 0.100, p = 0.0013.
Trigram F1 vs passage length: significant negative; r = -0.303, R^2 = 0.092, p = 0.0021.
Trigram Jaccard vs passage length: significant negative; r = -0.311, R^2 = 0.097, p = 0.0015.
Metric             Rows  Pattern               Pearson r  R^2    P-value   Slope     Status
BLEU-4             101   significant negative  -0.431     0.186  6.91e-06  -0.00289  ok
chrF++             101   significant negative  -0.447     0.200  2.75e-06  -0.00190  ok
METEOR             101   significant negative  -0.442     0.196  3.63e-06  -0.00192  ok
ROUGE-L            101   significant negative  -0.441     0.195  3.91e-06  -0.00170  ok
BERTScore          101   significant negative  -0.595     0.354  5.43e-11  -0.00046  ok
COMET              101   significant negative  -0.469     0.220  7.34e-07  -0.00091  ok
BLEURT             101   significant negative  -0.712     0.506  7.41e-17  -0.00260  ok
Trigram precision  101   significant negative  -0.287     0.082  0.0037    -0.00208  ok
Trigram recall     101   significant negative  -0.316     0.100  0.0013    -0.00229  ok
Trigram F1         101   significant negative  -0.303     0.092  0.0021    -0.00219  ok
Trigram Jaccard    101   significant negative  -0.311     0.097  0.0015    -0.00218  ok
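
The n, r, R^2 and slope reported for each metric are the standard outputs of a simple least-squares fit of score on source passage length. The following is a minimal sketch in plain Python; `fit_line` is an illustrative name and the sample data in the usage lines are invented, not taken from the report. The p-value is left out because it needs a t-distribution CDF, which the standard library does not provide.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> dict[str, float]:
    """Ordinary least-squares fit of ys on xs, with Pearson r and R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return {
        "n": n,
        "r": r,
        "r2": r * r,  # for a one-predictor fit, R^2 equals r squared
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    # Invented example: scores that fall as passages get longer.
    source_words = [10, 20, 30, 40, 50]
    scores = [0.80, 0.78, 0.74, 0.73, 0.69]
    print(fit_line(source_words, scores))
```

The same identity can be used to sanity-check the table: for BLEU-4, (-0.431)^2 is about 0.186, matching the reported R^2.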

Translation Similarity Metrics

Mean BLEU-4: 51.9%
Mean chrF++: 72.4%
Mean METEOR: 77.2%
Mean ROUGE-L: 77.9%
Mean BERTScore: 96.3%
Mean COMET: 79.9%
Mean BLEURT: 76.0%
Legacy smoothed corpus BLEU: 48.5%
Unigram F1: 78.4%
Bigram F1: 55.6%
Trigram precision: 43.2%
Trigram recall: 41.2%
Trigram F1: 42.2%
Trigram Jaccard: 26.8%
4-gram F1: 32.8%
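
The n-gram precision, recall, F1 and Jaccard figures can be illustrated with a small overlap calculation. This is a sketch under stated assumptions: whitespace tokenisation, lower-casing, and clipped multiset counts are choices made for the example, and the report does not say which tokeniser or counting scheme it uses. The name `ngram_overlap` is hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-token window."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(hypothesis: str, reference: str, n: int = 3) -> dict[str, float]:
    """Precision, recall, F1 and Jaccard over n-gram multisets."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    matched = sum((hyp & ref).values())  # clipped matches (multiset intersection)
    union = sum((hyp | ref).values())    # multiset union
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    jaccard = matched / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


if __name__ == "__main__":
    ai = "Karia, the country. The ethnonym is 'Kar'."
    human = "Karia, the country. The ethnonym is 'Karian'."
    print(ngram_overlap(ai, human))
```

Here the AI translation plays the role of the hypothesis and the human translation the reference, so precision is the share of AI trigrams found in the human text and recall is the share of human trigrams found in the AI text.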

Largest Length Residuals

Lemma ID Headword Human words AI words Residual BLEU-4 chrF++ METEOR ROUGE-L BERTScore COMET BLEURT Human Run Model
2484 Καρία 270 223 -19.710 33.1% 60.6% 56.7% 64.9% 91.2% 80.1% 55.0% reviewed/approved 3593 gpt-5.5
2604 Καρχηδών 117 125 18.144 35.8% 65.1% 78.3% 72.1% 94.2% 78.5% 63.2% reviewed/approved 3597 gpt-5.5
2468 Καπετώλιον 138 111 -14.503 17.2% 40.7% 43.9% 51.4% 87.7% 69.8% 50.5% reviewed/approved 3601 gpt-5.5
2328 Καλλίπολις 43 55 13.851 36.8% 63.4% 67.9% 63.3% 94.1% 75.2% 68.6% reviewed/approved 3841 gpt-5.5
2603 Κάρυστος 175 171 12.644 41.0% 65.0% 73.0% 68.2% 94.8% 78.8% 61.7% reviewed/approved 3595 gpt-5.5
2609 Κάσπειρος 135 134 11.161 44.0% 66.9% 55.2% 63.9% 89.9% 72.1% 62.9% reviewed/approved 3599 gpt-5.5
2346 Κάναι 77 82 10.661 62.5% 76.2% 89.4% 84.3% 95.6% 78.7% 71.7% reviewed/approved 3861 gpt-5.5
2342 Κάμιρος 54 60 9.083 42.6% 72.0% 69.8% 75.4% 96.0% 79.8% 68.3% reviewed/approved 3857 gpt-5.5
7259 Κύφος 54 60 9.083 49.9% 76.4% 84.2% 78.9% 96.3% 75.5% 69.9% reviewed/approved 4138 gpt-5.5
2116 Κάληρος 40 30 -8.486 17.3% 49.4% 53.6% 54.3% 92.9% 67.5% 62.4% reviewed/approved 3826 gpt-5.5
2455 Κάναστρον 60 48 -8.244 20.3% 51.5% 51.2% 63.0% 93.6% 68.1% 54.7% reviewed/approved 3863 gpt-5.5
2470 Καππαδοκία 68 71 7.652 23.1% 57.9% 82.9% 80.6% 95.2% 76.0% 71.5% reviewed/approved 4063 gpt-5.5
3496 Κοτιάειον 71 59 -7.012 30.3% 51.3% 57.3% 63.1% 92.7% 71.1% 67.0% reviewed/approved 4098 gpt-5.5
2605 Κάσιον 56 59 6.307 51.9% 75.2% 83.1% 78.3% 97.0% 79.1% 76.3% reviewed/approved 4071 gpt-5.5
7255 Κυτέριον 22 28 5.497 43.7% 69.3% 65.0% 68.0% 95.3% 77.1% 77.3% reviewed/approved 4130 gpt-5.5
2119 Κάλλατις 65 66 5.316 37.8% 63.9% 74.3% 68.7% 95.2% 72.1% 69.9% reviewed/approved 3833 gpt-5.5
2083 Καλαμένθη 24 19 -5.279 32.6% 53.7% 51.0% 60.5% 93.6% 77.2% 74.6% reviewed/approved 3810 gpt-5.5
7249 Κύρρος 45 38 -4.925 45.8% 60.1% 67.4% 69.9% 93.8% 72.6% 63.7% reviewed/approved 4119 gpt-5.5
2056 Καβειρία 112 98 -4.417 33.0% 70.8% 61.0% 67.6% 95.9% 78.0% 64.6% reviewed/approved 3605 gpt-5.5
7253 Κυρτώνιος 18 23 4.049 39.8% 69.1% 72.7% 63.4% 96.6% 77.5% 80.2% reviewed/approved 4126 gpt-5.5
7247 Κύρνος 50 44 -3.365 39.6% 65.8% 71.3% 72.3% 95.9% 78.5% 80.2% reviewed/approved 4115 gpt-5.5
2326 Καλλίαρος 47 48 3.299 60.1% 78.7% 88.9% 86.3% 97.9% 78.1% 77.9% reviewed/approved 3837 gpt-5.5
7260 Κυχρεῖος πάγος 63 62 3.092 40.7% 63.8% 72.6% 60.8% 95.6% 75.9% 58.7% reviewed/approved 4140 gpt-5.5
3530 Κριώα 26 23 -3.055 42.4% 70.3% 76.2% 77.6% 97.1% 78.0% 76.2% reviewed/approved 4104 gpt-5.5
2114 Κάλβιος 18 16 -2.951 19.5% 52.4% 64.5% 64.7% 94.5% 74.6% 67.0% reviewed/approved 3821 gpt-5.5
2327 Καλλιόπη 19 17 -2.839 61.3% 77.9% 75.7% 77.8% 95.3% 76.9% 77.1% reviewed/approved 3839 gpt-5.5
2599 Καρπήσιοι 11 10 -2.736 30.6% 67.0% 68.8% 66.7% 93.4% 85.2% 79.1% reviewed/approved 4067 gpt-5.5
2624 Κατάβαθμος 20 18 -2.727 71.6% 82.5% 81.4% 84.2% 96.8% 82.9% 82.4% reviewed/approved 4084 gpt-5.5
7246 Κύρις 20 18 -2.727 77.3% 84.7% 90.4% 89.5% 97.9% 84.4% 81.4% reviewed/approved 4113 gpt-5.5
2079 Καισάρεια 39 35 -2.598 45.3% 67.7% 77.2% 81.1% 94.9% 78.5% 73.7% reviewed/approved 3802 gpt-5.5

Residual Predictors

Positive signed-residual terms are associated with AI translations longer than expected from the human length; negative signed-residual terms are associated with shorter AI translations. Absolute-residual models identify terms associated with larger AI-human length divergence.

Greek source terms predicting signed length residuals

Features: 176; ridge alpha: 4.125; cross-validated R^2: -0.265.

Positive term | Coefficient | Docs | Mean present | Mean absent
απο 1.7088 27 1.992 -0.727
τε 1.6162 5 4.680 -0.244
οι 1.5891 11 3.775 -0.461
τη 1.4173 15 2.656 -0.463
πολιχνιον 1.3671 3 9.943 -0.304
πολις 1.3604 69 0.667 -1.439
δε και 1.3517 8 1.840 -0.158
εν τη 1.3036 7 4.729 -0.352
παιδος 1.2000 4 5.603 -0.231
δυο 1.1746 4 6.098 -0.251
ου 1.1551 14 3.068 -0.494
εν 1.1141 37 1.118 -0.646
εστι δε και 1.0800 3 6.784 -0.208
εκαλειτο δε 0.9943 3 3.693 -0.113
εστι δε 0.9810 5 3.628 -0.189
αφ ου 0.9328 5 4.087 -0.213
εκ 0.9300 6 3.846 -0.243
αφ 0.9162 6 3.784 -0.239
και πολις 0.8939 5 3.347 -0.174
τον 0.8576 10 2.742 -0.301
Negative term | Coefficient | Docs | Mean present | Mean absent
το -1.4657 71 -0.138 0.326
δια -1.3995 16 0.106 -0.020
δια του -1.3838 10 -2.187 0.240
γαρ -1.3746 9 -0.418 0.041
οτι -1.0847 5 -1.608 0.084
ηρωδιανος -1.0172 3 -12.201 0.374
εθνος -0.9661 8 -3.932 0.338
θρακης -0.8999 5 -3.115 0.162
τινες -0.8844 6 -3.149 0.199
περι -0.8307 11 -2.492 0.305
οικητωρ -0.7355 6 -4.047 0.256
την -0.7102 13 -0.590 0.087
το θηλυκον -0.6734 6 -3.410 0.215
εκαταιος -0.6338 14 -0.925 0.149
του -0.6093 37 0.732 -0.423
το εθνικον -0.6079 57 -0.370 0.479
εθνικον -0.6079 57 -0.370 0.479
ει -0.5722 4 -5.188 0.214
εστι και -0.5386 20 -0.161 0.040
τα -0.5383 12 -0.728 0.098
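
The Docs, Mean present and Mean absent columns can be read as simple group summaries: how many entries contain the term, and the mean residual with and without it. The following is a minimal sketch, assuming a term counts as present when it occurs as a whole-token n-gram in the lower-cased, accent-stripped Greek source. That normalisation is inferred from the unaccented terms in the tables, and `term_summary` and the sample entries are hypothetical.

```python
import unicodedata


def normalise(text: str) -> list[str]:
    """Lower-case, drop diacritics and punctuation, split on whitespace."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    kept = "".join(ch for ch in decomposed if ch.isalnum() or ch.isspace())
    return kept.split()


def contains(tokens: list[str], term_tokens: list[str]) -> bool:
    """True if term_tokens occurs as a contiguous run inside tokens."""
    n = len(term_tokens)
    return any(tokens[i:i + n] == term_tokens for i in range(len(tokens) - n + 1))


def term_summary(term: str, docs: list[tuple[str, float]]) -> dict[str, float]:
    """docs holds (greek_source, residual) pairs; returns the three columns."""
    term_tokens = term.split()
    present = [r for text, r in docs if contains(normalise(text), term_tokens)]
    absent = [r for text, r in docs if not contains(normalise(text), term_tokens)]

    def mean(values: list[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return {"docs": len(present), "mean_present": mean(present), "mean_absent": mean(absent)}


if __name__ == "__main__":
    # Invented entries and residuals, for illustration only.
    sample = [("ἀφ' οὗ τὸ ἐθνικόν", 4.0), ("πόλις Θρᾴκης", -1.0)]
    print(term_summary("αφ ου", sample))
```

These group means are descriptive only; the Coefficient column comes from the ridge model, which weighs all terms jointly.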

Greek source terms predicting large absolute residuals

Features: 176; ridge alpha: 4.125; cross-validated R^2: -0.506.

Positive term | Coefficient | Docs | Mean present | Mean absent
γαρ 1.6001 9 7.659 2.607
εκαλειτο 1.5430 7 9.060 2.610
δια 1.4589 16 6.454 2.418
δε 1.3615 46 4.091 2.193
του 1.3145 37 4.515 2.215
τα 1.3019 12 7.898 2.404
εκαλειτο δε 1.2684 3 16.832 2.635
και 1.2489 66 3.645 1.948
απο 1.1729 27 5.377 2.210
τη 1.1435 15 6.152 2.517
εν 1.1409 37 3.892 2.575
ος 1.1089 6 8.808 2.694
δια του 1.0538 10 7.287 2.592
τω 0.9949 17 5.813 2.499
περι 0.9606 11 6.539 2.632
πολιχνιον 0.9461 3 9.943 2.846
εκαλειτο δε και 0.8672 3 16.832 2.635
απο της 0.8494 6 9.368 2.659
δε και 0.8303 8 7.417 2.682
ζευς 0.8009 3 14.958 2.693
Negative term | Coefficient | Docs | Mean present | Mean absent
το εθνικον -0.8650 57 3.057 3.058
εθνικον -0.8650 57 3.057 3.058
νησος -0.7495 9 1.393 3.220
προς -0.7154 15 2.542 3.147
και το -0.6923 9 3.019 3.061
ως -0.6801 53 3.192 2.909
μια -0.5684 3 0.492 3.136
δευτερω -0.5594 3 0.917 3.123
προς τη -0.5576 4 1.719 3.112
πλησιον -0.4827 8 2.392 3.114
αυτην -0.4820 9 2.621 3.100
τους -0.4794 4 1.564 3.119
καππαδοκιας -0.4715 3 1.855 3.094
του δε -0.4606 3 1.070 3.118
κωμη -0.4524 3 1.980 3.090
το -0.4294 71 2.862 3.518
πορρω -0.3884 3 0.457 3.137
ου πορρω -0.3884 3 0.457 3.137
εκαταιος -0.3879 14 2.082 3.214
βοιωτιας -0.3744 3 1.944 3.091
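
The cross-validated R^2 values reported for these models are negative. R^2 computed on held-out predictions falls below zero whenever those predictions have a larger squared error than simply predicting the mean, which is how values such as -0.265 or -0.506 arise. The following is a minimal sketch of that calculation; the fold scheme (every k-th row) and the pooled out-of-fold scoring are assumptions, since the report does not describe its cross-validation setup, and the mean-only model stands in for the ridge fit.

```python
from typing import Callable, Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot; negative when predictions are worse than the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(
    y: Sequence[float],
    predict_fold: Callable[[list[int], list[int]], list[float]],
    k: int = 5,
) -> float:
    """Pool out-of-fold predictions over k folds, then score them once."""
    n = len(y)
    out_of_fold = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        for i, pred in zip(test, predict_fold(train, test)):
            out_of_fold[i] = pred
    return r_squared(y, out_of_fold)


if __name__ == "__main__":
    residuals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # invented values

    def mean_model(train: list[int], test: list[int]) -> list[float]:
        # Baseline that ignores all features and predicts the training mean.
        train_mean = sum(residuals[i] for i in train) / len(train)
        return [train_mean] * len(test)

    print(cross_validated_r2(residuals, mean_model, k=3))
```

Even the feature-free baseline scores below zero out of fold, so a negative value on its own says the model does not generalise, not that the computation is broken.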

AI English terms predicting large absolute residuals

Features: 300; ridge alpha: 2.894; cross-validated R^2: -0.465.

Positive term | Coefficient | Docs | Mean present | Mean absent
with 2.1977 16 6.446 2.419
and 1.9635 53 4.071 1.938
by 1.9489 15 7.181 2.338
on 1.7969 12 6.088 2.648
was 1.7720 16 6.595 2.391
the 1.7295 92 3.140 2.211
on the 1.5651 4 9.518 2.791
was called 1.4138 5 10.336 2.678
there 1.4096 36 4.635 2.183
for 1.3269 13 5.699 2.667
there is 1.2676 27 4.704 2.456
all 1.0641 5 8.091 2.795
among 1.0474 6 6.857 2.817
diphthong 1.0456 3 11.599 2.796
zeus 1.0247 5 12.765 2.551
son 0.9801 16 6.367 2.434
son of 0.9801 16 6.367 2.434
it with 0.9732 4 10.611 2.746
called 0.8847 28 4.896 2.352
form 0.8606 7 7.021 2.762
Negative term | Coefficient | Docs | Mean present | Mean absent
as in -1.1427 32 2.353 3.383
the ethnonym is -0.9974 53 2.874 3.259
ethnonym is -0.9974 53 2.874 3.259
the ethnonym -0.9185 59 3.018 3.112
ethnonym -0.9185 59 3.018 3.112
island -0.8888 14 1.798 3.260
one -0.8149 10 1.899 3.184
name -0.7975 6 1.115 3.180
near -0.7654 13 2.655 3.117
been -0.7156 6 1.704 3.143
a city -0.7034 65 3.152 2.886
an island -0.6995 10 1.359 3.244
is a -0.6207 27 2.989 3.082
an -0.5917 24 2.632 3.190
one of the -0.5641 3 0.492 3.136
one of -0.5641 3 0.492 3.136
has been -0.5179 4 1.830 3.108
cappadocia -0.4815 3 1.855 3.094
those who -0.4797 3 0.628 3.131
if -0.4694 4 1.155 3.136

Human English terms predicting large absolute residuals

Features: 336; ridge alpha: 4.125; cross-validated R^2: -0.506.

Positive term | Coefficient | Docs | Mean present | Mean absent
with 1.9561 16 6.294 2.448
the 1.7727 91 3.155 2.163
on 1.7441 16 6.707 2.370
and 1.3374 52 4.032 2.023
is with 1.1467 4 13.827 2.613
on the 1.0045 10 6.926 2.632
to 0.8996 31 4.404 2.461
used to be 0.8599 6 10.302 2.600
this 0.7732 15 6.768 2.410
be 0.7682 15 7.146 2.344
karia 0.7447 3 11.616 2.795
to be 0.7359 9 7.917 2.582
son 0.7336 17 6.252 2.411
son of 0.7336 17 6.252 2.411
zeus 0.7265 5 12.765 2.551
his on 0.7074 3 11.711 2.792
used 0.7049 8 8.479 2.591
used to 0.7049 8 8.479 2.591
their 0.6934 3 16.832 2.635
situated 0.6814 4 9.560 2.789
Negative term | Coefficient | Docs | Mean present | Mean absent
as in -1.2574 29 2.135 3.428
the ethnonym is -0.8600 52 2.856 3.270
ethnonym is -0.8600 52 2.856 3.270
ethnonym -0.7462 57 3.056 3.059
an -0.7418 22 2.370 3.248
the ethnonym -0.7350 56 3.070 3.041
island -0.5608 14 1.798 3.260
one of -0.5179 5 0.855 3.172
a city -0.5041 65 3.152 2.886
they -0.4689 12 3.034 3.060
was from -0.4522 4 1.034 3.141
one of the -0.4453 4 0.950 3.144
an island -0.4390 9 1.445 3.215
is a -0.4203 31 3.489 2.866
near -0.4033 16 3.173 3.035
city -0.3977 71 3.229 2.650
and the -0.3972 15 3.537 2.973
it is -0.3671 24 3.989 2.767
of his -0.3470 19 2.922 3.088
in book of -0.3357 15 2.102 3.224

Prompt Text

You are an expert classical philologist and translator specialising in Stephanos of Byzantium's Ethnika.

Goal
- Produce a clear, scholarly English translation in the established reviewed house style.

Output rules (required)
- Respond ONLY by calling the submit_translation tool with a single string field: {"translation": "..."}.
- The translation text must contain only the translation (no analysis, no commentary, no multiple options).

A) Formatting + spelling
- Use Australasian spelling and punctuation conventions.
- Preserve paragraphing/line-breaks of the Greek source:
  - Do not introduce new paragraphs unless the Greek has them.
  - If the Greek includes a poetic quotation with line breaks, format the English quotation with the same line breaks.
- Use single quotes for quoted forms/snippets: '...'. Avoid double quotes.
- Use *italics* (asterisks) for titles of ancient works: e.g., *Cypriaka*.

B) Opening / structure
- Begin with the headword transliterated into Latin letters (no Greek diacritics), then a short definition.
  - Typically: Headword: ...
  - Appositive openings like 'Karia, the country.' are acceptable when the entry is of that type.
- Keep enumerations of homonymous places as inline numbered items like (2) ... (3) ..., matching the Greek's structure.

C) Transliteration + naming
- Do NOT use macrons/acute accents in transliteration: Karystos (not Kárystos), Kaspeiros (not Káspeiros).
- Prefer Greek-form transliteration with kappa = k; avoid Latinised exonyms when Stephanos is discussing the Greek form:
  - Kapetolion (not Capitolium)
  - Karchedon (not Carthage)
  - Chalkedon (not Chalcedon)
- Use conventional English names for major places/regions when standard (Rome, Cyprus, Egypt, Syria, India).
- Translate Πόντος as 'the Black Sea' when it is clearly the sea/region reference.

D) Citations (authors/works/books)
- Convert Greek book numerals to Arabic digits.
- Keep citations compact, mirroring the source's incompleteness:
  - Author + work title: 'Hellanikos in his *Cypriaka*'
  - Author + book: 'Strabo, book 12: ...' / 'Herodotus, book 3.'
  - Author + work + book: 'Dionysios in book 3 of his *Bassarika*: ...'
- Ignore modern/editorial locator codes and apparatus-like add-ons:
  - omit RE/SH numbers, (GG ...), (FGrHist ...), chapter/section locators like (12.8.12), [C 576.21], Il. 2.676, etc.
  - keep only the author/work/book level that Stephanos is using.

E) Fixed formulae (be consistent)
- τὸ ἐθνικόν X -> 'The ethnonym is 'X'.'
- ὁ πολίτης X -> 'A citizen is a 'X'.'
- τὸ θηλυκόν / καὶ θηλυκόν X -> 'In the feminine 'X'.'
- Keep (as needed): 'The possessive is '...'.' / 'An inhabitant is a '...'.' / 'A deme-member is a '...'.'
- For τὰ τοπικά (place-adverb forms), use 'The locatives are ...' and list forms in single quotes.

F) Philological/orthographic discussion
- When the entry discusses spelling/letters/diphthongs, preserve Greek letters in Greek script (e.g., ι, ει, οι).
- Prefer transliteration (not full Greek script) for cited alternative spellings unless the point requires showing a specific Greek letter/feature.

G) Comparisons / derivational analogies (ὡς + X)
- For ὡς + X where X is an etymologically related noun in the nominative, use this fixed English pattern:
  - (as in 'X')
- Keep it exactly: parentheses + single quotes (no 'like X', no bare 'as X').

H) ἀφ' οὗ
- If it has a masculine antecedent referring to a person (typically the eponym) in a naming context (implicit/explicit καλεῖται/ἐκαλεῖτο/κέκληται), translate as 'after whom' (i.e., 'named after whom').
- If it has an identifiable explicit neuter antecedent, translate as 'from which'.
- If it has a dropped neuter antecedent and functions adverbially, handle cautiously:
  - often 'hence' / 'thence'
  - in Stephanos' idiosyncratic 'example-marker' usage, 'as per' can be acceptable when it clearly introduces an ethnonym/person-as-example.

I) Morphology scaffolding (only when needed)
- If Stephanos is citing an author specifically to indicate grammatical case/gender/number, and that feature would otherwise be unclear in English, add brief tags after the relevant form:
  - case: (nom.), (acc.), (gen.), (dat.), (voc.)
  - gender: (m.), (f.), (n.)
  - number: (sing.), (pl.)

J) Fidelity + restraint
- Translate directly; do not add background explanation or modern bibliography.
- Avoid gratuitous adversatives ('however') for δέ when it is merely continuative.
- Do not add glosses like (synoikia), (kalathos) unless Stephanos explicitly defines a term.
- Preserve uncertainty when the Greek is uncertain/corrupt; do not invent.

Now translate the provided entry accordingly, and submit it via submit_translation.
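
The output rule and two of the mechanical house-style rules in the prompt above (single quotes only, no accents or macrons in transliteration) lend themselves to an automatic check. The following is a minimal sketch, assuming the tool call arguments arrive as a Python dict; `check_payload` is a hypothetical helper, not part of the evaluation pipeline, and it only warns on accented Latin letters, so legitimately accented English words would also be flagged.

```python
import unicodedata

# Straight and curly double quotes; the house style asks for single quotes.
DOUBLE_QUOTES = {'"', "\u201c", "\u201d"}


def check_payload(payload: object) -> list[str]:
    """Return a list of problems; an empty list means no rule was tripped."""
    if not isinstance(payload, dict) or set(payload) != {"translation"}:
        return ["payload must be an object with the single field 'translation'"]
    text = payload["translation"]
    if not isinstance(text, str) or not text.strip():
        return ["'translation' must be a non-empty string"]

    problems = []
    if any(quote in text for quote in DOUBLE_QUOTES):
        problems.append("double quotes found; house style uses single quotes")
    for ch in unicodedata.normalize("NFC", text):
        parts = unicodedata.normalize("NFD", ch)
        # Greek letters may keep their diacritics, so only Latin bases count.
        if len(parts) > 1 and unicodedata.name(parts[0], "").startswith("LATIN"):
            problems.append(f"accented Latin letter {ch!r}; transliterate without accents")
            break
    return problems


if __name__ == "__main__":
    print(check_payload({"translation": "Kárystos: a city of Euboia."}))
```

Rules that need judgement, such as the treatment of ἀφ' οὗ or when to add morphology tags, are outside what a check like this can cover.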