Leipzig Corpora Collection

Search in 1056 Corpus-Based Monolingual Dictionaries for 290 Languages.

More information about: Portuguese Newscrawl 2011 (Brazil)

The corpus por-br_newscrawl_2011 is a Portuguese news corpus (Brazil) based on material crawled in 2011. It contains 25,008,883 sentences and 486,724,987 tokens.

Description

Portuguese news corpus (Brazil) based on material crawled in 2011

Details

Name	por-br_newscrawl_2011
Language	Portuguese
Genre	Newscrawl
Year	2011
Location	Brazil
Sentences	25,008,883
Types	2,087,177
Tokens	486,724,987
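
The counts in the Details table allow a couple of rough size ratios to be derived. The following is a minimal sketch, not part of the corpus tooling: the function name corpus_ratios is hypothetical, and the only inputs are the sentence, type, and token counts stated above.

```python
# Counts copied from the Details table of por-br_newscrawl_2011.
SENTENCES = 25_008_883
TYPES = 2_087_177
TOKENS = 486_724_987


def corpus_ratios(sentences: int, types: int, tokens: int) -> dict:
    """Return simple size ratios for a corpus (hypothetical helper)."""
    return {
        # Average sentence length, measured in tokens.
        "tokens_per_sentence": tokens / sentences,
        # Distinct word forms divided by running words.
        "type_token_ratio": types / tokens,
    }


ratios = corpus_ratios(SENTENCES, TYPES, TOKENS)
print(f"tokens per sentence: {ratios['tokens_per_sentence']:.2f}")
print(f"type-token ratio:    {ratios['type_token_ratio']:.5f}")
```

With these figures the average sentence comes out at roughly 19.5 tokens. The type-token ratio is very small, as expected for a corpus of this size, so it is mainly useful for comparison with corpora of similar scale.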

Link to the corpus

https://corpora.wortschatz-leipzig.de?corpusId=por-br_newscrawl_2011
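
The corpus identifier is carried in the corpusId query parameter of the link above. As a rough sketch, it can be extracted with the Python standard library and split into its parts. The layout <language>-<region>_<genre>_<year> is an assumption inferred from this one identifier and the Details table, not a documented naming scheme, and both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

LINK = "https://corpora.wortschatz-leipzig.de?corpusId=por-br_newscrawl_2011"


def corpus_id_from_link(link: str) -> str:
    """Read the corpusId query parameter from a corpus link."""
    query = parse_qs(urlsplit(link).query)
    return query["corpusId"][0]


def split_corpus_id(corpus_id: str) -> dict:
    """Split an identifier, ASSUMING <language>-<region>_<genre>_<year>."""
    lang_region, genre, year = corpus_id.split("_")
    # The region part is optional in this sketch; partition leaves it empty.
    language, _, region = lang_region.partition("-")
    return {
        "language": language,
        "region": region,
        "genre": genre,
        "year": int(year),
    }


corpus_id = corpus_id_from_link(LINK)
parts = split_corpus_id(corpus_id)
print(corpus_id, parts)
```

Identifiers that do not follow the assumed three-part layout would raise a ValueError in split_corpus_id, which is a deliberate choice here so that unexpected names are not silently misread.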

Annotations

coocSim
GDEX
POS (OpenNLP - unknown)
wordsLevenshteinSim

Cite this corpus

Leipzig Corpora Collection: Portuguese news corpus (Brazil) based on material crawled in 2011. Leipzig Corpora Collection. Dataset. https://corpora.wortschatz-leipzig.de?corpusId=por-br_newscrawl_2011.

BibTeX

@misc{por-br_newscrawl_2011,
    author = {Leipzig Corpora Collection},
    title = {Portuguese news corpus (Brazil) based on material crawled in 2011},
    howpublished = {https://corpora.wortschatz-leipzig.de?corpusId=por-br_newscrawl_2011},
    note = {Accessed: 2025-12-15}
}