Line 36: |
Line 36: |
| * Neo-Assyrian Text Corpus Project | | * Neo-Assyrian Text Corpus Project |
| * Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.) | | * Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.) |
− |
| |
− | Other languages:
| |
− |
| |
− | * Leeds collection of Web-derived Corpora of 100-200 million words for English, Chinese, Finnish, French, German, Italian, Polish, Portuguese, Russian and Spanish
| |
− | * Leipzig Corpus of 15 languages with collocation statistics
| |
− | * Red iberoamericana de terminología
| |
− | * Red panlatina de terminología
| |
− | * Corpus diacrónico del español (CORDE)
| |
− | * Corpus de Referencia del Español Actual (CREA)
| |
− | * Croatian National Corpus
| |
− | * Czech National Corpus
| |
− | * Slovak National Corpus
| |
− | * Hungarian National Corpus
| |
− | * The IPI PAN Corpus of Polish
| |
− | * Corpus of Slovenian Language
| |
− | * Bank of Swedish
| |
− | * Spoken Dutch Corpus
| |
− | * Balanced Corpus of Modern Chinese
| |
− | * Persian Today Corpus
| |
− | * METU Turkish Corpus
| |
− | * Hellenic National Corpus
| |
− | * Greek corpus of journalistic and higher-education discourse
| |
− | * Portuguese Corpora by Linguateca
| |
− | * Russian National Corpus
| |
− |
| |
− | Bilingual corpora:
| |
− |
| |
− | * Evrokorpus English-Slovene parallel corpus
| |
− | * COMPARA Portuguese-English parallel corpus
| |
− | * EuroParl parallel corpora covering 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish. One of the most widely used corpora in natural language processing.
| |
− | * JRC-Acquis: the JRC-Acquis Multilingual Parallel Corpus, which includes Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Slovak, Slovene and Swedish.
| |
− |
| |
− |
| |
− | == See also ==
| |
− |
| |
− |
| |
− | * concordance
| |
− | * corpus linguistics
| |
− | * Linguistic Data Consortium
| |
− | * natural language processing
| |
− | * Natural Language Toolkit
| |
− | * parallel text alignment
| |
− | * Search engines: they access the "web corpus".
| |
− | * translation memory
| |
− | * treebank
| |