@matrix matrix commented Sep 7, 2025

Hi,

ngramX is a small utility that reads a text file, splits it into words, and generates n-grams of a given size.
It preserves punctuation (only removing newline characters) and supports n-grams that continue across line boundaries.
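For readers who want to see the idea without reading the C source, here is a rough Python sketch of the behavior described above. It is not the code in this PR: the function name `ngrams` is made up for illustration, and it assumes words are separated by any run of whitespace, which may differ from how ngramX treats repeated spaces.

```python
from collections import deque
from typing import Iterable, Iterator


def ngrams(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield space-joined n-grams over whitespace-separated words.

    Hypothetical helper for illustration only; not part of ngramX.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    # Sliding window holding the last n words seen. It is never reset,
    # so n-grams continue across line boundaries.
    window = deque(maxlen=n)
    for line in lines:
        # split() drops the newline and collapses whitespace runs
        # (an assumption), but leaves punctuation attached to words.
        for word in line.split():
            window.append(word)
            if len(window) == n:
                yield " ".join(window)


# First lines of the sample input used in the POC.
sample = [
    "TITLE: Alice's Adventures in Wonderland\n",
    "AUTHOR: Lewis Carroll\n",
    "\n",
    "= CHAPTER I = \n",
]
for gram in ngrams(sample, 3):
    print(gram)
```

Because the window is carried over from one line to the next, blank lines contribute nothing and an n-gram such as `in Wonderland AUTHOR:` spans two input lines, as in the POC output below.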

POC:

$ head alice_in_wonderland.txt 
TITLE: Alice's Adventures in Wonderland
AUTHOR: Lewis Carroll


= CHAPTER I = 
=( Down the Rabbit-Hole )=

  Alice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do:  once or twice she had
peeped into the book her sister was reading, but it had no
$ src/ngramX.bin alice_in_wonderland.txt 3 | head
TITLE: Alice's Adventures
Alice's Adventures in
Adventures in Wonderland
in Wonderland AUTHOR:
Wonderland AUTHOR: Lewis
AUTHOR: Lewis Carroll
Lewis Carroll =
Carroll = CHAPTER
= CHAPTER I
CHAPTER I =

Thanks

@jsteube jsteube merged commit 4c7362d into hashcat:master Sep 8, 2025
3 checks passed