The corpus ben_newscrawl_2014_300K is a Bengali news subcorpus based on material crawled in 2014.
It contains 300,000 sentences and 4,043,381 tokens.
Details
Leipzig Corpora Collection: Bengali news subcorpus based on material crawled in 2014 (300,000 sentences). Leipzig Corpora Collection. Dataset. https://corpora.wortschatz-leipzig.de?corpusId=ben_newscrawl_2014_300K.
BibTeX
@misc{ben_newscrawl_2014_300K,
author = {{Leipzig Corpora Collection}},
title = {Bengali news subcorpus based on material crawled in 2014 (300,000 sentences)},
howpublished = {https://corpora.wortschatz-leipzig.de?corpusId=ben_newscrawl_2014_300K},
note = {Accessed: 2024-10-06}
}