This project originally consisted of the conversion into MediaWiki format of Liddell, Scott and Jones' A Greek–English Lexicon, more commonly known as LSJ. The data were provided by the Perseus Project under a Creative Commons Attribution / Non-Commercial / ShareAlike license. The site was launched in February 2013.
Since then, a number of other sources (Ancient Greek/Latin to and from other languages) have been added, for example:
- Diccionario Griego-Español (DGE)
DGE is an Ancient Greek to Spanish dictionary produced at the Instituto de Lenguas y Culturas del Mediterráneo y Oriente Próximo (ILC) of the Centro de Ciencias Humanas y Sociales (CCHS) of the CSIC (Madrid) under the direction of Francisco R. Adrados and Juan Rodríguez Somolinos. The online version (about 60,000 entries) contains lemmata from α through ἔξαυος and is the work of this amazing team. Work on this dictionary has been sponsored by, among others, the Greek Leventis Foundation, and it is offered under a non-commercial Creative Commons license.
- Gaffiot 2016, Dictionnaire Illustré Latin-Français; the data in this dictionary come from Gaffiot 2016 compiled under the direction of Gérard Gréco, with the assistance of Mark De Wilde, Bernard Maréchal and Katsuhiko Okubo. The database used is gaffiot-a-z-komarov-1.1-20160502.tex.
- Excerpts from Wikipedia and Wiktionary articles
For the proper display of complex precombined characters (the "full diacritics" variant in the forms table), e.g. alpha with acute and brachy as in ἐπιρρᾰ́πιξις, you will need to install the New Athena Unicode font. After installation, a system restart may be necessary for the correct display.
Apart from making a variety of sources accessible, the objective is to massively improve upon them. In what way, you may ask?
- Many of the works are old and written in antiquated language (e.g. shew instead of show; to-morrow instead of tomorrow; bee-master instead of beekeeper; leathern instead of leather; mediciner instead of doctor; poulp instead of octopus; divers instead of diverse; calamary instead of squid; see the paper on LSJ below for more), which needs to be brought up to date. A spicy detail on this one: most of the old works use Latin euphemisms for "naughty" words. For example, any word signifying the privy parts of the human anatomy would be translated as pudendum (muliebre or virile); an erection would be erectio penis; a fart would be described as crepitus ventris; βινητιάω (be gagging for sex) as coïre cupio. You skipped Latin class at school? Not the lexicographer's problem... And while we are at it, take καταδακτυλίζω, translated as "feel with the finger" with the comment "sens. obsc." (i.e. used in an obscene sense): can you come up with something more up to date, truly "obscene" and less long-winded? What about fingerbang? Translations are also often clumsy and stilted: for δυσωδέω, for example, LSJ gives "be ill-smelling", when "stink" would be one's first guess.
- The original design for a book format, with all its associated conventions, abbreviations, references (e.g. v. sq., v. infr., v. foreg., cf. sq., q.v., qq.v., which do not point to specific entries in a non-book format, since the reader is not looking at a printed page) and (lexical, grammatical, syntactic) ambiguities, needs to be adapted: abbreviations expanded, ambiguities resolved with extra context, etc.
- Specific works, and I am talking about you, LSJ, despite having a wealth of examples, use abbreviated forms and lack translations for most examples, or the translations are partial. Often not only the examples but the main entries lack translations: for example, in the LSJ entry ὀρσιπετής we have "ὑψοῦ πετόμενος, Hsch." instead of the actual translation, which would be "high-flying". Not so friendly for users who are not advanced in Greek.
- They contain numerous errors. This is true not only of LSJ, which has the added burden of OCR-related errors in the original, hardly-ever-proofread Perseus version, and of Woodhouse, who has trouble, among other things, getting his accents right (Ἀσκλήπιος instead of Ἀσκληπιός; Αἴσχυλος instead of Αἰσχύλος? C'mon now, you can do better than that: the original does not imitate the stress of the translation). Major modern lexicographical works are not exempt from errors either (see for example some DGE and Brill issues).
- Instead of equivalent terms, many entries only have descriptions/definitions: for example, the LSJ entry for Ἰνδοσκυθία contains "the country on the banks of the Indus" instead of Indoscythia.
- Adding a layer of multilinguality. This could more appropriately be described as an enhancement, since the project is multilingual by conception. Here it is about adding translations not for just 5 or 6 languages per Greek term, but potentially for hundreds. See for example the Translations section at the end of the entries πῦρ (467 languages) or ἀθάνατος (59 languages), or περισσός, with different groups of translations depending on sense.
- Making the text more amenable to an electronic experience in every other possible way, which obviously includes adding links for the appropriate words/phrases, given that interlinking is a primary feature of a wiki-based project. If you think this is easy, think again. Given the complexity of Greek morphology (which becomes even more complex and unpredictable once the dialects are taken into account), character variations (e.g. the use of oxia or tonos, or identical words whose last character has either an acute or a grave accent) and the fact that linking goes both ways, i.e. from Greek to English entries and from English entries to Greek ones, this can become overwhelmingly complex if it is not well thought out.
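To give an idea of the character-level part of the problem, here is a minimal Python sketch (not the site's actual code) that reduces a word form to a lookup key, so that oxia/tonos spellings and a final grave versus acute point to the same page. The function name `lookup_key` is hypothetical, and morphology and dialect forms, the genuinely hard part, are not addressed at all.

```python
import unicodedata

GRAVE = "\u0300"  # combining grave accent (varia)
ACUTE = "\u0301"  # combining acute accent (oxia/tonos)


def lookup_key(word: str) -> str:
    """Normalize a Greek word form into a key for link/lemma lookup (sketch)."""
    # NFD splits letters into base + combining marks; an oxia letter such as
    # U+1F71 decomposes to the same sequence as its tonos twin U+03AC.
    decomposed = unicodedata.normalize("NFD", word)
    # A final acute turns grave in running text; map it back for lookup
    # (assumes graves only occur on the last syllable, as in standard orthography).
    decomposed = decomposed.replace(GRAVE, ACUTE)
    # Recompose; NFC never produces the oxia codepoints, only the tonos ones.
    return unicodedata.normalize("NFC", decomposed)


oxia_form = "\u1f00\u03b3\u1f71\u03c0\u03b7"   # ἀγάπη typed with oxia (U+1F71)
tonos_form = "\u1f00\u03b3\u03ac\u03c0\u03b7"  # ἀγάπη typed with tonos (U+03AC)
print(lookup_key(oxia_form) == lookup_key(tonos_form))  # True
print(lookup_key("καλὸς") == lookup_key("καλός"))       # True
```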
Another objective is to produce reverse language versions, for example from English, German, French, Italian and Russian to Ancient Greek. This is an extremely complex task which requires ongoing work, as the structure of the original source material is hardly amenable to a fully automated approach, and a human in the loop is necessary to increase quality and coverage.
For example, in most works adverbs are indicated with an -ως or -ῶς ending, or there is a description of the adverbial form such as "neut. pl. as adverb", or there is no indication of the adverbial form whatsoever. So if one needs to reverse the lemma and include the adverbs, the right form(s) should be indicated and their appropriate translations inserted or researched. Similarly, some adjectives become nouns in the neuter form and/or acquire extra meanings, for example δοῦλος ("slave", "servile") and τὸ δοῦλον ("slavery" or "the slaves" collectively). In most sources (there are many instances of this in DGE and LSJ) these are indicated in a clipped form such as "τὸ δ.".
Anaphora is a device to save space in a print dictionary, but it is counter-productive when it comes to reversal. For example, in the LSJ entry ἀκρωτηριάζω we read "form a promontory, jut out like one". The first sense is easy to reverse, but reversing "jut out like one" as ἀκρωτηριάζω is hardly ideal. Similarly, the lemma δελφίνιον reads "temple of Apollo Delphinios, esp. at Athens, τὸ ἐπὶ Δελφινίῳ δικαστήριον the law-court there". "The law-court there" (where?) is not a very good candidate for reversal. The entry κιρσουλκός has "instrument for this purpose" instead of "instrument for operating on varicose veins".
Old works tend to include Latin translations (often when no other translation is provided). In a reversal, should these be treated no differently from any other terms, or should we somehow indicate that they are Latin? For example, when reversing Ancient Greek to English, should the English to Ancient Greek language pair include those Latin terms, or should we create a Latin to Ancient Greek language pair? The same of course goes for reversing German/French/you name it to Ancient Greek. In all those reversals, Latin terms should ideally be identified, deduplicated, processed, edited and imported as a different language pair.
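Clipped forms can at least be located automatically before a human expands them. The sketch below (hypothetical names; the article list is an assumption and not exhaustive) finds an optional article followed by the headword's first letter and a full stop, such as "τὸ δ."; deciding that this stands for τὸ δοῦλον, and what it means, is left to the editor.

```python
import re
import unicodedata

# Hypothetical list of article forms that may precede a clipped headword.
ARTICLES = ["ὁ", "ἡ", "τό", "τὸ", "οἱ", "αἱ", "τά", "τὰ", "τοῦ", "τῆς", "τῷ", "τῇ"]


def find_clipped_forms(entry_text: str, headword: str) -> list[str]:
    """Return clipped references to the headword (e.g. "τὸ δ.") for human review."""
    text = unicodedata.normalize("NFC", entry_text)
    initial = unicodedata.normalize("NFC", headword)[0]
    articles = "|".join(re.escape(unicodedata.normalize("NFC", a)) for a in ARTICLES)
    # Optional article, then the headword's first letter and a full stop; the
    # lookbehind stops a word that merely ends in that letter from matching.
    pattern = rf"(?<!\w)(?:(?:{articles})\s+)?{re.escape(initial)}\."
    return [m.group(0) for m in re.finditer(pattern, text)]


entry = "δοῦλος, η, ον, slave, servile; τὸ δ. slavery, the slaves collectively"
print(find_clipped_forms(entry, "δοῦλος"))  # ['τὸ δ.']
```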
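A crude way to route such senses to a human is to split the gloss on commas and look for cue phrases. The cue list and function name below are hypothetical, seeded only from the examples above; the check will both miss cases and produce false positives (a literal "there", for instance).

```python
import re

# Hypothetical cue phrases that usually point back to something outside the sense.
ANAPHORA_CUES = re.compile(
    r"\b(?:like one|this purpose|this operation|from it|of it|there)\b",
    re.IGNORECASE,
)


def flag_anaphoric_senses(gloss: str) -> list[tuple[str, list[str]]]:
    """Split a gloss on commas and return the senses that contain an anaphora cue."""
    flagged = []
    for sense in (part.strip() for part in gloss.split(",")):
        cues = ANAPHORA_CUES.findall(sense)
        if cues:
            flagged.append((sense, cues))
    return flagged


print(flag_anaphoric_senses("form a promontory, jut out like one"))
# [('jut out like one', ['like one'])]
print(flag_anaphoric_senses("instrument for this purpose"))
# [('instrument for this purpose', ['this purpose'])]
```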
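One possible way to separate the Latin glosses, sketched under loud assumptions: the tiny `LATIN_WORDS` set stands in for a real Latin word list or language identifier, the bucket names are hypothetical, and a gloss goes to the Latin pair only if every word in it is found in that set. Words shared by both languages (e.g. "penis") show why the result still needs human review.

```python
import unicodedata

# Hypothetical stand-in for a real Latin lexicon or language identifier.
LATIN_WORDS = {unicodedata.normalize("NFC", w) for w in (
    "crepitus", "ventris", "pudendum", "muliebre", "virile",
    "erectio", "penis", "coïre", "cupio",
)}


def split_by_language(pairs: list[tuple[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Route (gloss, Greek term) pairs to an English or a Latin to Ancient Greek bucket."""
    buckets = {"en-grc": [], "la-grc": []}
    for gloss, greek in pairs:
        words = unicodedata.normalize("NFC", gloss).lower().split()
        is_latin = bool(words) and all(w in LATIN_WORDS for w in words)
        buckets["la-grc" if is_latin else "en-grc"].append((gloss, greek))
    return buckets


print(split_by_language([("be gagging for sex", "βινητιάω"), ("coïre cupio", "βινητιάω")]))
```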
- Variant forms: Should we use Λακωνικός or λακωνικός; λακωνισμός or Λακωνισμός (the former in Brill, the latter in Cambridge and LSJ, including the 1996 version); ἀγαλματοποιία or ἀγαλματοποιΐα (the former in DGE, Brill, Cambridge and Rocci, the latter in LSJ, including the 1996 version); διαῤῥήγνυμι or διαρρήγνυμι; ῥάθυμος or ῥᾴθυμος; θνῄσκω or θνήσκω; ἀγγήϊον or ἀγγήιον; ἀϋπνία or ἀυπνία? Should we use oxia or tonos (they look the same, but they result in the creation of different lemmas)? Should we use the old German orthography, the new one, or both? The same goes for American and British English: harbour or harbor; colour or color? And it is not a "potayto, potahto" type of dilemma, as it is quite important that users find the same content no matter which variant they look up.
- Irregular accentuation: An example of non-standard forms adding noise to a resource that combines multiple sources is ὄρνῦμι (and similarly accented entries) in Autenrieth, instead of ὄρνυμι. There is some method in the madness here: it was meant to be ὄρνῡμι (with a macron instead of a circumflex), but apparently the way the resource was encoded did not take into account the difference between a circumflex and a macron.
- Non-Unicode representations: Despite the existence of Unicode, some major resources still use non-Unicode representations at their core, for example Beta Code, on which many of the Perseus projects are based. As a result, some Greek phrases or words in Perseus LSJ, for example, still appear in a Beta Code transcription rather than in Greek.
- Irregular characters or character combinations: Some resources use non-standard Greek characters, for example ϑ instead of θ, ϰ instead of κ, or the micro sign (µ) instead of the standard μ (they look the same, but they are different characters). Notable is the use by many French resources of two forms of beta: β at the beginning of words and ϐ elsewhere, resulting in such barbaric constructs as βάρϐαρος. Such idiosyncratic (or idiolectic) approaches belong in the realm of calligraphy, not of reference resources (especially electronic ones). Another example is the use of "σς" instead of "σσ" in resources like Slater, e.g. the erroneous forms πάσςαλος, τέσςαρες, Θεσςαλία, Θεσςαλοί, Θεσςαλός.
- Non-combined diacritics: Diacritics can exist as independent characters, without their associated letters (usually vowels in Ancient Greek), but they are meant to be combined. However, some sources, including some developed and maintained (?) by prestigious institutions, such as the Lexicon of Greek Personal Names by Oxford University, do not combine them, for some reason that is quite unfathomable to me. For example, you get ῞Αβρων (that is, ῞ followed by Αβρων) instead of Ἅβρων.
- Erroneous case in binomial nomenclature: LSJ is a major (but not the only) culprit when it comes to disregarding the case conventions of binomial nomenclature. For example, as the Latin name for κληματίς we see Clematis Vitalba instead of Clematis vitalba. This type of error persists even in the 1996 edition.
To summarize, various sources use different conventions and approaches. Some are more idiosyncratic than others, and some are outright non-standard or erroneous. When combining those sources, a certain degree of standardization and/or cross-referencing must take place in order to avoid chaos and provide a user-friendly experience.
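Several of the issues above are mechanical enough to script, provided the text has already been identified as Greek (a micro sign is legitimate in µm) or as a scientific name (blindly fixing case would also hit "Apollo Delphinios"). The Python sketch below collects one small fixer per issue; all function names are hypothetical. Assumptions: the circumflex rule is a heuristic relying on a lemma normally carrying a single accent (enclitic double accents in running text, and words with two circumflexes, are out of scope); the Beta Code converter handles lowercase letters only (no `*` capitals, no numbered sigmas); and a spacing psili is only combined when a letter immediately follows, since it is sometimes misused as an apostrophe. Unicode's NFKC normalization would also fold the symbol variants, but it changes many other characters too, hence the explicit table.

```python
import re
import unicodedata

# --- Irregular accentuation: a circumflex that stands for a macron ---
ACCENTS = ("\u0301", "\u0300", "\u0342")  # acute, grave, circumflex
CIRCUMFLEX, MACRON = "\u0342", "\u0304"


def fix_circumflex_for_macron(headword: str) -> str:
    """With more than one accent and exactly one circumflex, read the circumflex as a macron."""
    s = unicodedata.normalize("NFD", headword)
    if sum(s.count(a) for a in ACCENTS) > 1 and s.count(CIRCUMFLEX) == 1:
        s = s.replace(CIRCUMFLEX, MACRON)
    return unicodedata.normalize("NFC", s)


# --- Non-Unicode leftovers: lowercase Beta Code to Unicode ---
BETA_LETTERS = dict(zip("abgdezhqiklmncoprstufxywv", "αβγδεζηθικλμνξοπρστυφχψωϝ"))
BETA_MARKS = {")": "\u0313", "(": "\u0314", "/": "\u0301", "\\": "\u0300",
              "=": "\u0342", "+": "\u0308", "|": "\u0345"}


def beta_to_unicode(beta: str) -> str:
    """Convert lowercase Beta Code to NFC polytonic Greek."""
    greek = "".join(BETA_LETTERS.get(ch, BETA_MARKS.get(ch, ch)) for ch in beta)
    greek = re.sub(r"σ(?!\w)", "ς", greek)  # sigma at the end of a word becomes final sigma
    # Lowercase Beta Code puts diacritics after their letter, like combining marks.
    return unicodedata.normalize("NFC", greek)


# --- Irregular characters: look-alike symbols to standard Greek letters ---
LOOKALIKES = str.maketrans({
    "\u03d1": "\u03b8",  # GREEK THETA SYMBOL to theta
    "\u03f0": "\u03ba",  # GREEK KAPPA SYMBOL to kappa
    "\u00b5": "\u03bc",  # MICRO SIGN to mu
    "\u03d0": "\u03b2",  # GREEK BETA SYMBOL (curled beta) to beta
})


def fix_irregular_chars(greek: str) -> str:
    """Replace look-alike letters and word-internal final sigmas (πάσςαλος to πάσσαλος)."""
    return re.sub(r"ς(?=\w)", "σ", greek.translate(LOOKALIKES))


# --- Non-combined diacritics: spacing breathings/accents to combining marks ---
SPACING_TO_COMBINING = {
    "\u1fbf": "\u0313", "\u1ffe": "\u0314",  # psili, dasia
    "\u1fcd": "\u0313\u0300", "\u1fce": "\u0313\u0301", "\u1fcf": "\u0313\u0342",  # psili + varia/oxia/perispomeni
    "\u1fdd": "\u0314\u0300", "\u1fde": "\u0314\u0301", "\u1fdf": "\u0314\u0342",  # dasia + varia/oxia/perispomeni
}
SPACING = re.compile("([" + "".join(SPACING_TO_COMBINING) + r"])(\w)")


def combine_diacritics(text: str) -> str:
    """Move a stand-alone diacritic onto the following letter, then recompose."""
    moved = SPACING.sub(lambda m: m.group(2) + SPACING_TO_COMBINING[m.group(1)], text)
    return unicodedata.normalize("NFC", moved)


# --- Binomial nomenclature: capitalized genus, lowercase epithet(s) ---
def fix_binomial_case(name: str) -> str:
    """Capitalize the genus and lowercase the epithet(s) of a scientific name."""
    genus, *epithets = name.split()
    return " ".join([genus[:1].upper() + genus[1:].lower(), *(e.lower() for e in epithets)])


print(fix_circumflex_for_macron("ὄρνῦμι"))        # ὄρνῡμι
print(beta_to_unicode("a)ga/ph"))                 # ἀγάπη
print(fix_irregular_chars("βάρϐαρος πάσςαλος"))  # βάρβαρος πάσσαλος
print(combine_diacritics("῞Αβρων"))               # Ἅβρων
print(fix_binomial_case("Clematis Vitalba"))      # Clematis vitalba
```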
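On the cross-referencing side, one option for the Greek variant forms listed above is to compute a shared key and use it to propose redirects between variant pages. The sketch below (hypothetical `variant_key`) covers only those variant types. Since a diaeresis or an iota subscript can be distinctive, the key should flag candidates for review rather than merge pages automatically; the German and English spelling variants would need word lists instead.

```python
import re
import unicodedata

DIAERESIS = "\u0308"
IOTA_SUBSCRIPT = "\u0345"  # ypogegrammeni
# Rho followed by a breathing, anywhere except at the very start of the word.
MEDIAL_RHO_BREATHING = re.compile("(?<=.)\u03c1[\u0313\u0314]")


def variant_key(word: str) -> str:
    """Collapse common Greek orthographic variants into one key (sketch)."""
    s = unicodedata.normalize("NFD", word)
    s = s.replace(DIAERESIS, "")               # ἀγαλματοποιΐα ~ ἀγαλματοποιία, ἀϋπνία ~ ἀυπνία
    s = s.replace(IOTA_SUBSCRIPT, "")          # ῥᾴθυμος ~ ῥάθυμος, θνῄσκω ~ θνήσκω
    s = MEDIAL_RHO_BREATHING.sub("\u03c1", s)  # διαῤῥήγνυμι ~ διαρρήγνυμι
    # NFC also maps oxia to tonos; lower() handles Λακωνικός ~ λακωνικός.
    return unicodedata.normalize("NFC", s).lower()


print(variant_key("διαῤῥήγνυμι") == variant_key("διαρρήγνυμι"))  # True
```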
Spiros read English at Manchester Metropolitan University. His postgraduate studies include Machine Translation at UMIST and IoL's Diploma in Translation. He has been working as a translator since 1995, initially as a literary translator and then as a technical translator specializing in software and IT.
He has a keen interest in translation technologies and has been teaching translation tools and localization at the meta|φραση School of Translation Studies since 2003. His research interests and skills include multilingual website development, online terminology management systems, and wiki and forum software. In 2001 he created translatum.gr, a Greek translation portal providing, among other things, terminological assistance in a customized version of an open source forum platform. He is a member of the Hellenic Society of Terminology. He is passionate about QA and improving lexicographical resources, so LSJ did seem to fit the bill. As did IATE. And Woodhouse. And...
His help and comments have been instrumental in resolving Ancient Greek lexicographical issues and ambiguities. You can read much of his feedback publicly on the English to Ancient Greek and Ancient Greek to English forum of Translatum.gr, where he was moderator for Classics.
William has taught Classics at UCLA, Stanford University and Atenisi University in the Kingdom of Tonga. He has translated widely from ancient, biblical, medieval and modern Greek into English, and from English into Greek. His books include Early Virgil, Philogelos: The Laugh Addict and a translation of Yiannis Ritsos' Romiosini. He has completed a vernacular translation of the New Testament, among other early Christian works.
According to the above license, if you copy text from this site you are required to provide attribution with a link to the page you used. To be clear as to what attribution means, you have to:
Hyperlink directly to the original page on the source site of the specific article you quote from (e.g. ἀγάπη)
Wikifying the LSJ
2013. A paper (in Greek, with an English abstract) on the development of this site (in its earlier iteration on lsj.translatum.gr; there have been considerable changes and additions since then), entitled Wikifying the LSJ, was presented at the 9th Conference "Hellenic Language and Terminology" (Athens, 7-9 November 2013) (presentation in Greek-English bilingual PDF and PowerPoint format). The abstract:
This paper describes the implementation of the Ancient Greek to English dictionary Liddell, Scott, Jones (LSJ) in MediaWiki format (https://lsj.gr). The original XML file was processed and converted (using regular expressions) into a file suitable for use in MediaWiki. The main features of this implementation were: a) transcription of headwords in various forms and transliterations (polytonic with vrachy/macron, polytonic without vrachy/macron, monotonic, all caps, Latin characters with accents, Latin characters without accents, greeklish, Beta Code); b) case-insensitive and diacritics-insensitive autocomplete search suggestions for Greek and Latin characters; c) CSS styles to modify the look and feel; d) collection of Ancient Greek quotes and development of a MediaWiki random quote extension; e) fine-tuning MediaWiki in a way that is appropriate for a lexicographic work of this nature; f) creation of an import template that supports Semantic MediaWiki functionality; and g) creation of indexes for each form and transliteration.
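As an illustration of points a) and b), here is a minimal sketch deriving a few of these forms from a polytonic headword with Python's `unicodedata` module. It is not the site's actual code (the abstract mentions regular expressions), the function and key names are hypothetical, the monotonic conversion is rough (monosyllables, for instance, keep their accent), and the Latin-character, greeklish and Beta Code transliterations are omitted.

```python
import unicodedata

LENGTH_MARKS = {"\u0306", "\u0304"}              # vrachy (breve), macron
MONOTONIC_DROP = {"\u0313", "\u0314", "\u0345"}  # breathings, iota subscript
DIAERESIS = "\u0308"


def _nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)


def _nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def headword_forms(word: str) -> dict[str, str]:
    """Derive a few alternative spellings of a polytonic headword (sketch)."""
    without_length = _nfc("".join(c for c in _nfd(word) if c not in LENGTH_MARKS))
    # Rough monotonic form: drop breathings and iota subscript,
    # and turn grave and circumflex into the single acute (tonos).
    mono = "".join(c for c in _nfd(without_length) if c not in MONOTONIC_DROP)
    mono = _nfc(mono.replace("\u0300", "\u0301").replace("\u0342", "\u0301"))
    # All caps are conventionally written without accents; keep the diaeresis.
    caps = _nfc("".join(c for c in _nfd(word)
                        if not unicodedata.combining(c) or c == DIAERESIS)).upper()
    # Diacritics- and case-insensitive key, e.g. for autocomplete matching.
    search_key = "".join(c for c in _nfd(word) if not unicodedata.combining(c)).lower()
    return {
        "polytonic": _nfc(word),
        "without_length_marks": without_length,
        "monotonic": mono,
        "all_caps": caps,
        "search_key": search_key,
    }


print(headword_forms("ὄρνῡμι")["monotonic"])  # όρνυμι
```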
Rev(er|i)sing the LSJ
2019. A paper (in English) on the challenges of reversing and revising the LSJ, so that an English to Ancient Greek version could be produced and imported into this site, was presented at the 12th Conference "Hellenic Language and Terminology". Download the presentation and full paper. The abstract:
Liddell-Scott-Jones (LSJ) is a standard lexicographical work of the Ancient Greek language available online in a number of different incarnations. Its directionality is from Ancient Greek to English. What if one wants to search from English to Ancient Greek? The Perseus Project, a seminal and authoritative electronic source, provides a functionality whereby a reverse search is possible, based on a simple term-to-translation(s) logic, devoid of any further processing.
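The simple term-to-translation(s) logic amounts to little more than inverting a mapping. A minimal sketch with hypothetical names (it is not a description of how Perseus actually implements its reverse search), which shows how fragments such as "jut out like one" or "commander of a" end up as English headwords:

```python
from collections import defaultdict


def naive_reverse(lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert headword -> translations into translation -> headwords, with no further processing."""
    reversed_lexicon = defaultdict(list)
    for headword, translations in lexicon.items():
        for translation in translations:
            reversed_lexicon[translation].append(headword)
    return dict(reversed_lexicon)


reversed_lsj = naive_reverse({
    "ἀκρωτηριάζω": ["form a promontory", "jut out like one"],
    "τελάρχης": ["commander of a"],
})
print(sorted(reversed_lsj))  # ['commander of a', 'form a promontory', 'jut out like one']
```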
The above approach is far from satisfactory and is subject to a number of pitfalls, which this paper aims to explore, providing a framework for their remediation at the linguistic and computational levels. Some of the types of issues identified:
- Missing term elements (“commander of a” for “τελάρχης”)
- Missing Greek-derived equivalents (no “cephalalgia” in “κεφαλαλγία” and no “pankration” in “παγκράτιον”)
- Use of Greek as part of the translation (for example “παρανυμφεύω” rendered as “act as παράνυμφος”)
- Use of anaphora (“σκοτωματικός” as “causing dizziness | suffering from it”)
- Use of Latin instead of English, especially for taboo words (“crepitus ventris” for “τλήμων γαστρὸς ἔριθος”)
- Use of old English (“shew” instead of “show”, “connexion” instead of “connection”)
- Use of a hyphen between words (“to-morrow”)
- Abbreviated forms of the headword in phrases resulting in inflectional ambiguity (“κλυτὰ δ. βένθεσι λίμνης”)
- Incomplete example phrases with mid-phrase ellipsis (“τὸ ὕδωρ… αὐ. μὲν οὔκ ἐστι”)
- Typos and linguistic errors (“ἐντονία” instead of “εὐτονία”)
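Some of these categories can be pre-flagged mechanically before the manual revision. A rough sketch, assuming hypothetical word lists seeded only from the examples above; anaphora, Latin glosses, clipped headwords and typos need the kinds of lexicons and human judgment discussed earlier.

```python
import re

GREEK = re.compile(r"[\u0370-\u03ff\u1f00-\u1fff]")  # Greek and Greek Extended blocks
# Hypothetical word lists, seeded from the examples above.
ARCHAIC = {"shew", "shewn", "connexion"}
DATED_HYPHENATION = {"to-morrow", "to-day", "to-night"}
DANGLING_END = re.compile(r"\b(?:of|for|to|with)\s+(?:a|an|the)$")


def categorize(gloss: str) -> list[str]:
    """Return issue labels for one English gloss (a rough first pass before human review)."""
    labels = set()
    words = re.findall(r"[\w-]+", gloss.lower())
    if GREEK.search(gloss):
        labels.add("greek_in_translation")
    if any(w in ARCHAIC for w in words):
        labels.add("archaic_spelling")
    if any(w in DATED_HYPHENATION for w in words):
        labels.add("dated_hyphenation")
    if DANGLING_END.search(gloss.strip()):
        labels.add("missing_term_element")
    return sorted(labels)


print(categorize("act as παράνυμφος"))  # ['greek_in_translation']
print(categorize("commander of a"))     # ['missing_term_element']
```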
A script was created to extract the term/translation equivalents from the XML file. Phase I consisted of a) analysis of the output in Ancient Greek to English format, b) identification and categorization of the issues and c) a plan for their remediation. Phase II was an analysis and further revision of the reversed material. Phase III was the preparation and publication of the output in wiki format as an interactive supplemental resource to LSJ proper.
- For example, LSJ for English, Pape for German, Bailly for French and Dvoretsky for Russian
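The extraction step could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The element names (`entryFree`, `orth`, `tr`) and the inline sample are assumptions about a simplified TEI-style layout, not a description of the actual Perseus file or of the script used in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry for illustration only.
SAMPLE = """<body>
<entryFree key="akrwthriazw">
  <orth>ἀκρωτηριάζω</orth>
  <sense><tr>form a promontory</tr>, <tr>jut out like one</tr></sense>
</entryFree>
</body>"""


def extract_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (headword, translation) pairs from entryFree/tr elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("entryFree"):
        orth = entry.find("orth")
        if orth is not None and orth.text:
            headword = orth.text.strip()
        else:
            headword = entry.get("key", "")  # fall back to the key attribute
        for tr in entry.iter("tr"):
            if tr.text and tr.text.strip():
                pairs.append((headword, tr.text.strip()))
    return pairs


print(extract_pairs(SAMPLE))
```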
- Do we still need this if the word referred to is linked?
- Many references to other lemmas display Arabic numerals instead of Roman ones: for example, the reference in the lemma ὑποδύω "cf. ὑποδέω 111.1" should be "cf. ὑποδέω III.1". Moreover, what we get in Perseus is a link to an irrelevant text rather than a link to the lemma.
- You can even find multiple errors that probably resulted from regexes gone wrong, as in the entry "ἀπάτερθε: before a vowel ἀπάσχολ-θεν" (!!!) (instead of ἀπάτερθεν)
- Yes, I know it is listed as an alternative form in DGE, but by no means should it be the main or the only one listed in a dictionary
- As Samuel Johnson said: "Every other author may aspire to praise; the lexicographer can only hope to escape reproach, and even this negative recompense has been yet granted to very few."
- Dvoretsky's Russian Lexicon is a notable exception: adverbs are fully listed in different lemmas thus facilitating their reversal.
- Anaphora is the use of a pronoun or other linguistic unit to refer back to another word or phrase.
- In the print dictionary, two lemmas further up, one can find κιρσουλκέω, which provides the missing information. The same issue affects the lemma κιρσουλκία = "this operation".
- In Ancient Greek there were only capitals, hence we may infer that some case-use preferences have been carried over from other languages, e.g. English rules regarding the capitalization of adjectives derived from proper nouns: Laconic is capitalized in English, but not in Modern Greek or French
- In modern usage, the first letter of the generic name is always capitalized in writing, while that of the specific epithet is not, even when derived from a proper noun such as the name of a person or place.