Changes

From Nordan Symposia
1,842 bytes removed, 07:05, 4 August 2007
no edit summary
Line 36:
     * Neo-Assyrian Text Corpus Project
 
     * Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.)
  −
Other languages:
  −
  −
    * Leeds collection of Web-derived Corpora of 100-200 million words for English, Chinese, Finnish, French, German, Italian, Polish, Portuguese, Russian and Spanish
  −
    * Leipzig Corpus of 15 languages with collocation statistics
  −
    * Red iberoamericana de terminología
  −
    * Red panlatina de terminología
  −
    * Corpus diacrónico del español (CORDE)
  −
    * Corpus de Referencia del Español Actual (CREA)
  −
    * Croatian National Corpus
  −
    * Czech National Corpus
  −
    * Slovak National Corpus
  −
    * Hungarian National Corpus
  −
    * The IPI PAN Corpus of Polish
  −
    * Corpus of Slovenian Language
  −
    * Bank of Swedish
  −
    * Spoken Dutch Corpus
  −
    * Balanced Corpus of Modern Chinese
  −
    * Persian Today Corpus
  −
    * METU Turkish Corpus
  −
    * Hellenic National Corpus
  −
    * Greek corpus from journalistic and high educational discourse
  −
    * Portuguese Corpora by Linguateca
  −
    * Russian National Corpus
  −
  −
Bilingual corpora:
  −
  −
    * Evrokorpus English-Slovene parallel corpus
  −
    * COMPARA Portuguese-English parallel corpus
  −
    * EuroParl parallel corpora, covering 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish. One of the most widely used corpora in natural language processing.
  −
    * JRC-Acquis Multilingual Parallel Corpus, which includes Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Slovak, Slovene and Swedish.
  −
  −
  −
== See also ==
  −
  −
  −
    * concordance
  −
    * corpus linguistics
  −
    * Linguistic Data Consortium
  −
    * natural language processing
  −
    * Natural Language Toolkit
  −
    * parallel text alignment
  −
    * Search engines: they access the "web corpus".
  −
    * translation memory
  −
    * treebank
 
