What does this tool do
The Word Cloud Generator visualizes word frequency as a cloud. Paste your text, click Generate Word Cloud, and see which words appear most often — larger words indicate higher frequency. Use Max words and Min count to filter the display. Word counting reuses the same tokenization as the Text Tokenizer: whitespace split, punctuation stripped from boundaries.
How to use it
- Choose input mode — From text tokenizes pasted text automatically (same as Text Tokenizer). From list lets you enter words and frequencies manually, one per line (e.g. "word 10" or "word<Tab>10").
- Enter or paste — For text mode: type or paste text; use Generate dummy text to quickly fill. For list mode: enter one "word frequency" pair per line.
- Click Generate Word Cloud — The tool processes your input and renders a word cloud.
- Adjust options — Set Max words (default 80) to limit how many words appear, and Min count to exclude low-frequency words.
- Hover for counts — Hover over any word to see its frequency count in a tooltip.
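The list input format and the two filter options can be sketched in a few lines. This is a minimal Python illustration, not the tool's actual browser code: the function names parse_word_list and select_words are hypothetical, skipping malformed lines is an assumption, and the Min count default of 1 is assumed (only the Max words default of 80 comes from the description above).

```python
import re
from typing import List, Tuple


def parse_word_list(raw: str) -> List[Tuple[str, int]]:
    """Parse lines such as 'word 10' or 'word<Tab>10' into (word, count) pairs."""
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the final field, separated by spaces or tabs.
        match = re.fullmatch(r"(.+?)[ \t]+(\d+)", line)
        if match is None:
            continue  # assumption: lines without a trailing count are skipped
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


def select_words(
    pairs: List[Tuple[str, int]], max_words: int = 80, min_count: int = 1
) -> List[Tuple[str, int]]:
    """Drop words below min_count, then keep the max_words most frequent."""
    kept = [(word, count) for word, count in pairs if count >= min_count]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return kept[:max_words]
```

For example, select_words(parse_word_list("alpha 10\nbeta\t3\ngamma 7"), max_words=2) keeps alpha and gamma and drops beta.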
How it works
Word counting is delegated to the Text Tokenizer:
- Text is split on whitespace.
- Leading and trailing punctuation (Unicode \p{P}) is stripped from each word.
- Empty strings are filtered.
- Frequency is computed and sorted by count descending.
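The steps above can be sketched as follows. This is an illustrative Python version, not the Text Tokenizer's own code: Python's re module has no \p{P} class, so the sketch approximates it by checking for a Unicode general category starting with "P". The function names are hypothetical, and the sketch does not change letter case because the description does not mention case folding.

```python
import unicodedata
from collections import Counter
from typing import List, Tuple


def strip_boundary_punctuation(word: str) -> str:
    """Remove punctuation from both ends of a word, leaving the inside untouched."""
    def is_punct(ch: str) -> bool:
        # Categories Pc, Pd, Ps, Pe, Pi, Pf, Po together correspond to \p{P}.
        return unicodedata.category(ch).startswith("P")

    start, end = 0, len(word)
    while start < end and is_punct(word[start]):
        start += 1
    while end > start and is_punct(word[end - 1]):
        end -= 1
    return word[start:end]


def tokenize(text: str) -> List[str]:
    """Split on whitespace, strip boundary punctuation, drop empty strings."""
    stripped = (strip_boundary_punctuation(word) for word in text.split())
    return [word for word in stripped if word]


def word_frequencies(text: str) -> List[Tuple[str, int]]:
    """Count tokens and return (word, count) pairs sorted by count, descending."""
    return Counter(tokenize(text)).most_common()
```

Punctuation inside a word is kept, so "(don't)" becomes "don't", while a token made only of punctuation, such as "--", is dropped as an empty string.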
The cloud layout uses d3-cloud to pack words without overlap. Font size scales with frequency. All computation runs entirely in your browser.
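The description says only that font size scales with frequency, not which scale is used. As one possible reading, the sketch below maps counts linearly onto a pixel range in Python. The function name font_sizes and the 12 to 64 pixel range are assumptions made for illustration, and the tool may use a different curve (for example square root or logarithmic).

```python
from typing import Dict, List, Tuple


def font_sizes(
    frequencies: List[Tuple[str, int]], min_px: float = 12.0, max_px: float = 64.0
) -> Dict[str, float]:
    """Map each word's count linearly onto [min_px, max_px] (assumed scale)."""
    if not frequencies:
        return {}
    counts = [count for _, count in frequencies]
    lowest, highest = min(counts), max(counts)
    span = highest - lowest
    sizes = {}
    for word, count in frequencies:
        # If every word has the same count, draw them all at the largest size.
        fraction = (count - lowest) / span if span else 1.0
        sizes[word] = min_px + fraction * (max_px - min_px)
    return sizes
```

With the pairs ("hello", 2) and ("world", 1), this gives "hello" the largest size and "world" the smallest. A layout library such as d3-cloud would then take sizes like these as input when placing the words.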
Use cases & examples
- Quick overview — See at a glance which terms dominate a document or transcript.
- Presentations — Create a visual summary of key topics from meeting notes or articles.
- Content analysis — Identify recurring themes in blog posts or customer feedback.
- Education — Illustrate word frequency and vocabulary distribution in texts.
Example
For the input "hello world hello.":
- Tokens: hello, world, hello
- Cloud: "hello" appears larger than "world" because it occurs twice.
Limitations & known constraints
- Input cap — Maximum 512KB (~512,000 characters). Larger input returns an error.
- Client-side only — No server; processing runs in the browser. Very large inputs may cause brief UI lag.
- Simple tokenization — Same as Text Tokenizer: whitespace split only; no stemming, lemmatization, or language-specific tokenization.
- For detailed analysis — Use the Text Tokenizer for frequency tables, copy output, and "Analyze in Statistics" integration.