Corpus of Czech Verse

The Corpus of Czech Verse (CCV) is a lemmatized corpus with phonetic, morphological, metrical and strophic annotation.*
Each lexical unit is annotated with its base form (lemma), phonetic transcription and grammatical categories. Each verse line is annotated with its metre (iamb, trochee, etc.), its length in feet, its line ending (masculine, feminine, etc.) and its metrical pattern. (Currently, only syllabotonic verse lines carry metrical annotation.) At higher levels, rhyme pairs (or larger rhyme groups) and fixed forms (sonnet, rondel, etc.) are annotated. The metrical and strophic annotation can be searched through the Database of Czech Metres; the lemmatization level is partly accessible through the Frequency lists; rhyme pairs can be searched in the Gunstick application.
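The annotation layers described above can be pictured as a small data model. The following is only a sketch: the class names, field names and sample values are invented for illustration and do not reflect the actual CCV storage format or tag set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    form: str       # word as it appears in the line
    lemma: str      # base form
    phonetic: str   # phonetic transcription (placeholder notation)
    tag: str        # grammatical categories (placeholder notation)


@dataclass
class VerseLine:
    tokens: List[Token]
    metre: Optional[str] = None    # e.g. "iamb"; None if not metrically annotated
    feet: Optional[int] = None     # length of the line in feet
    ending: Optional[str] = None   # e.g. "masculine", "feminine"
    pattern: Optional[str] = None  # metrical pattern (placeholder notation)


def lines_matching(lines, metre, feet):
    """Return the lines annotated with the given metre and length in feet."""
    return [ln for ln in lines if ln.metre == metre and ln.feet == feet]


# Invented sample lines; tokens are left empty to keep the sketch short.
sample = [
    VerseLine(tokens=[], metre="iamb", feet=4, ending="masculine"),
    VerseLine(tokens=[], metre="trochee", feet=4, ending="feminine"),
    VerseLine(tokens=[]),  # a line without metrical annotation
]
iambic_tetrameters = lines_matching(sample, "iamb", 4)
```

Lines without metrical annotation simply keep `None` in the metrical fields, so they never match a metre query.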
CCV is based on texts from the Czech Electronic Library, which contains a number of duplicates (i.e. poems that recur in different editions of a collection or in an author's collected works). To avoid skewing the statistics, we included in CCV only the oldest occurrence of each poem (see the inventory of discarded poems). Poems were matched on the basis of their phonetic transcription. As a result, the selection should not be affected by variations in punctuation, while reprints in which changes (even minor ones) were made to the text should not be eliminated.
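The deduplication rule can be sketched as follows. This is not the CCV implementation: `phonetic_key` is a hypothetical stand-in that merely lowercases the text and keeps letters only, whereas the corpus compares real phonetic transcriptions. The poems and years are invented toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poem:
    title: str
    year: int   # year of the edition the text is taken from
    text: str


def phonetic_key(text: str) -> str:
    """Hypothetical stand-in for a phonetic transcription.

    Lowercases the text and drops every non-letter character, so
    differences in punctuation and spacing disappear while any change
    in wording still yields a different key.
    """
    return "".join(ch for ch in text.lower() if ch.isalpha())


def keep_oldest(poems):
    """Split poems into (kept, discarded), keeping the oldest per key."""
    oldest = {}
    discarded = []
    for poem in sorted(poems, key=lambda p: p.year):
        key = phonetic_key(poem.text)
        if key in oldest:
            discarded.append(poem)   # later reprint of an identical text
        else:
            oldest[key] = poem       # first (oldest) occurrence
    return list(oldest.values()), discarded


poems = [
    Poem("Noc", 1862, "Tichý les - tichá noc"),    # reprint, punctuation differs
    Poem("Noc", 1836, "Tichý les, tichá noc."),    # oldest occurrence
    Poem("Noc", 1900, "Tichý les, temná noc."),    # reworded, so it is kept
]
kept, discarded = keep_oldest(poems)
```

Here the 1862 reprint is discarded because it differs from the 1836 text only in punctuation, while the 1900 version survives because one word has changed.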
* Lemmatization and morphological annotation were carried out by researchers at the Institute of Theoretical and Computational Linguistics FA CU (Hana Skoumalová, Milena Hnátková, Tomáš Jelínek and Vladimír Petkevič) in cooperation with researchers at the Institute of Formal and Applied Linguistics FMP CU (Jan Hajič, Jaroslava Hlaváčová).

Basic characteristics of the Corpus of Czech Verse

  • 1 689 poetry collections
  • 76 699 poems
  • 2 664 989 verse lines
  • 14 592 037 words
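For orientation, a few averages can be derived from the totals above. The short sketch below only does the arithmetic; the variable names are chosen here for illustration, and the averages say nothing about how sizes are actually distributed across collections or poems.

```python
# Totals as listed above.
n_collections = 1_689
n_poems = 76_699
n_lines = 2_664_989
n_words = 14_592_037

poems_per_collection = n_poems / n_collections
lines_per_poem = n_lines / n_poems
words_per_line = n_words / n_lines

print(f"poems per collection: {poems_per_collection:.1f}")
print(f"lines per poem:       {lines_per_poem:.1f}")
print(f"words per line:       {words_per_line:.2f}")
```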

The structure of the Corpus of Czech Verse

[Interactive chart: number of poems, number of lines, or number of words, plotted by date of publication or by date of author's birth]

Na Florenci 3/1420, 110 00 Praha 1
+420 222 828 148
Versification Research Group profile


The development of the tools available on this website, the development of the website itself, and its English (Gabriela Brůhova) and Russian (Evgenia Tumanova) translations were supported by the Czech Science Foundation (P406/11/1825) and by institutional support for the long-term conceptual development of a research institution (68378068).
© 2014 Petr Plecháč