Changes

Corpora (view source)

Revision as of 07:05, 4 August 2007

1,842 bytes removed, 07:05, 4 August 2007

no edit summary

Line 36:

* Neo-Assyrian Text Corpus Project

* Amarna letters (for Akkadian, Egyptian, Sumerograms, etc.)

−

~~Other languages:~~

−

* Leeds collection of Web-derived Corpora of 100-200 million words for English, Chinese, Finnish, French, German, Italian, Polish, Portuguese, Russian and Spanish

−

* Leipzig Corpus of 15 languages with collocation statistics

−

* Red iberoamericana de terminología

−

* Red panlatina de terminología

−

* Corpus diacrónico del español (CORDE)

−

* Corpus de Referencia del Español Actual (CREA)

−

* Croatian National Corpus

−

* Czech National Corpus

−

* Slovak National Corpus

−

* Hungarian National Corpus

−

* The IPI PAN Corpus of Polish

−

* Corpus of Slovenian Language

−

* Bank of Swedish

−

* Spoken Dutch Corpus

−

* Balanced Corpus of Modern Chinese

−

* Persian Today Corpus

−

* METU Turkish Corpus

−

* Hellenic National Corpus

−

* Greek corpus from journalistic and high educational discourse

−

* Portuguese Corpora by Linguateca

−

* Russian National Corpus

−

~~Bilingual corpora:~~

−

* Evrokorpus English-Slovene parallel corpus

−

* COMPARA Portuguese-English parallel corpus

−

* EuroParl: parallel corpora covering 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish. One of the most widely used corpora in natural language processing.

−

* JRC-Acquis: the JRC-Acquis Multilingual Parallel Corpus, which includes Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Slovak, Slovene and Swedish.

−

~~== See also ==~~

−

* concordance

−

* corpus linguistics

−

* Linguistic Data Consortium

−

* natural language processing

−

* Natural Language Toolkit

−

* parallel text alignment

−

* Search engines: they access the "web corpus".

−

* translation memory

−

* treebank

Rdavis (Bureaucrats, Administrators; 102,807 edits)
