The word cruncher program (i.e. client.rb) reads in a given dictionary and spits out two output files: sequences and words. Sequences will contain every sequence of four letters that appears in exactly one word of the dictionary, one sequence per line. Words will contain the corresponding words that contain the sequences, in the same order.
├── lib
│ ├── ext
│ │ └── string.rb
│ ├── output_strategy
│ │ ├── array.rb
│ │ ├── file.rb
│ │ └── tempfile.rb
│ ├── output_strategy.rb
│ └── word_cruncher.rb
├── spec
│ ├── ext
│ │ └── string_spec.rb
│ ├── spec_helper.rb
│ ├── output_strategy_spec.rb
│ ├── support
│ │ └── dictionary.txt
│ └── word_cruncher_spec.rb
├── client.rb
├── dictionary.txt
├── sequences
└── words
lib/ contains the library classes that are used by client.rb.
spec/ contains test classes for the library.
client.rb makes use of WordCruncher class to process the dictionary and write data output to files.
dictionary.txt is the dictionary to be processed.
sequences and words are the output files generated when the program run.
NOTE: There is no need to unit test client.rb because it contains only setup code and relies entirely on the library code.
WordCruncher class accepts sequences and words to which the outputs will be written and s_length for customizing sequence length (default to 4) for processing.
I use Strategy pattern initially to avoid tests from writing output files when run. Now it also helps improve the flexibility of the program when it need to support additional output types. There are currently three supported strategies:
- Array strategy for storing the output in array in memory
- File strategy for writing the output to file
- Tempfile strategy for writing the output to temporary file
lib/ext/string.rb contains extensions for String class:
each_cons(n)for enumerating list of consecutive n-character sequences of a string
Clone the repo:
git clone https://github.com/lchanmann/word_cruncher.git
cd word_cruncher
Install dependencies
bundle install
To run the program:
ruby client.rb
The program assumes that the dictionary file exists and named dictionary.txt under the same directory as client.rb. Sequences and Words will be written to files named sequences and words.
However, dictionary, sequences and words files can be customized by using environment variables DICTIONARY, SEQUENCES and WORDS. E.g.:
$ DICTIONARY=my_dictionary.txt ruby client.rb
To run tests:
rspec