Commit 476ab1f

Merge pull request #43 from cardmagic/refactor/early-returns
Refactor nested conditionals to early returns
2 parents 5ceb182 + 2522a35 commit 476ab1f

5 files changed (+78, -27 lines)

CLAUDE.md

Lines changed: 67 additions & 0 deletions
```diff
@@ -0,0 +1,67 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+Ruby gem providing text classification via two algorithms:
+- **Bayes** (`Classifier::Bayes`) - Naive Bayesian classification
+- **LSI** (`Classifier::LSI`) - Latent Semantic Indexing for semantic classification, clustering, and search
+
+## Common Commands
+
+```bash
+# Run all tests
+rake test
+
+# Run a single test file
+ruby -Ilib test/bayes/bayesian_test.rb
+ruby -Ilib test/lsi/lsi_test.rb
+
+# Run tests with native Ruby vector (without GSL)
+NATIVE_VECTOR=true rake test
+
+# Interactive console
+rake console
+
+# Generate documentation
+rake doc
+```
+
+## Architecture
+
+### Core Components
+
+**Bayesian Classifier** (`lib/classifier/bayes.rb`)
+- Train with `train(category, text)` or dynamic methods like `train_spam(text)`
+- Classify with `classify(text)` returning the best category
+- Uses log probabilities for numerical stability
+
+**LSI Classifier** (`lib/classifier/lsi.rb`)
+- Uses Singular Value Decomposition (SVD) for semantic analysis
+- Optional GSL gem for 10x faster matrix operations; falls back to pure Ruby SVD
+- Key operations: `add_item`, `classify`, `find_related`, `search`
+- `auto_rebuild` option controls automatic index rebuilding after changes
+
+**String Extensions** (`lib/classifier/extensions/word_hash.rb`)
+- `word_hash` / `clean_word_hash` - tokenize text to stemmed word frequencies
+- `CORPUS_SKIP_WORDS` - stopwords filtered during tokenization
+- Uses `fast-stemmer` gem for Porter stemming
+
+**Vector Extensions** (`lib/classifier/extensions/vector.rb`)
+- Pure Ruby SVD implementation (`Matrix#SV_decomp`)
+- Vector normalization and magnitude calculations
+
+### GSL Integration
+
+LSI checks for the `gsl` gem at load time. When available:
+- Uses `GSL::Matrix` and `GSL::Vector` for faster operations
+- Serialization handled via `vector_serialize.rb`
+- Test without GSL: `NATIVE_VECTOR=true rake test`
+
+### Content Nodes (`lib/classifier/lsi/content_node.rb`)
+
+Internal data structure storing:
+- `word_hash` - term frequencies
+- `raw_vector` / `raw_norm` - initial vector representation
+- `lsi_vector` / `lsi_norm` - reduced dimensionality representation after SVD
```
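The CLAUDE.md added in this commit says the Bayesian classifier trains per category, classifies to the best category, and uses log probabilities for numerical stability, with `word_hash` turning text into stopword-filtered word frequencies. A minimal, self-contained toy version of those ideas is sketched below. It is not the gem's code: `ToyBayes`, `toy_word_hash`, and `TOY_STOPWORDS` are hypothetical names, add-one (Laplace) smoothing is an assumption of the sketch, and no stemming is done (the gem uses `fast-stemmer` for that).

```ruby
# Hypothetical stopword list; the gem's CORPUS_SKIP_WORDS is far larger.
TOY_STOPWORDS = %w[a an and the is of to in].freeze

# Simplified stand-in for String#word_hash: lowercase, split into letter runs,
# drop stopwords and tokens shorter than 3 chars, count what is left.
# No stemming is done here.
def toy_word_hash(text)
  text.downcase.scan(/[a-z']+/)
      .reject { |w| w.length < 3 || TOY_STOPWORDS.include?(w) }
      .tally
end

# Toy naive Bayes classifier (hypothetical; not Classifier::Bayes).
class ToyBayes
  def initialize(*categories)
    @words  = categories.to_h { |c| [c, Hash.new(0)] } # word counts per category
    @totals = categories.to_h { |c| [c, 0] }           # token totals per category
    @docs   = categories.to_h { |c| [c, 0] }           # training docs per category
  end

  def train(category, text)
    toy_word_hash(text).each do |word, count|
      @words[category][word] += count
      @totals[category] += count
    end
    @docs[category] += 1
  end

  # Returns { category => log score }. Adding logs replaces multiplying
  # probabilities, which would underflow to 0.0 on long texts.
  # Add-one (Laplace) smoothing is an assumption of this sketch.
  def classifications(text)
    vocab = @words.values.flat_map(&:keys).uniq.size
    doc_total = @docs.values.sum
    tokens = toy_word_hash(text)
    @words.keys.to_h do |cat|
      score = Math.log((@docs[cat] + 1.0) / (doc_total + @docs.size))
      tokens.each do |word, count|
        score += count * Math.log((@words[cat][word] + 1.0) / (@totals[cat] + vocab))
      end
      [cat, score]
    end
  end

  # Best category = highest log score.
  def classify(text)
    classifications(text).max_by { |_cat, score| score }.first
  end
end

b = ToyBayes.new(:spam, :ham)
b.train(:spam, 'Buy cheap pills now, cheap offer')
b.train(:ham, 'Meeting notes for the project review')
b.classify('cheap pills offer') # => :spam
```

Summing logarithms keeps the scores in a comfortable floating-point range; multiplying hundreds of small per-word probabilities directly would underflow to `0.0` and make every category tie.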

Gemfile.lock

Lines changed: 1 addition & 2 deletions
```diff
@@ -3,7 +3,6 @@ PATH
   specs:
     classifier (1.4.4)
       fast-stemmer (~> 1.0)
-      matrix
       mutex_m (~> 0.2)
       rake
 
@@ -24,7 +23,7 @@ GEM
 PLATFORMS
   arm64-darwin-22
   arm64-darwin-23
-  arm64-darwin-24
+  arm64-darwin-25
   x86_64-linux
 
 DEPENDENCIES
```

lib/classifier/bayes.rb

Lines changed: 6 additions & 13 deletions
```diff
@@ -100,20 +100,13 @@ def classify(text)
     # b.untrain_that "That text"
     # b.train_the_other "The other text"
     def method_missing(name, *args)
+      return super unless name.to_s =~ /(un)?train_(\w+)/
+
       category = name.to_s.gsub(/(un)?train_(\w+)/, '\2').prepare_category_name
-      if @categories.key?(category)
-        args.each do |text|
-          if name.to_s.start_with?('untrain_')
-            untrain(category, text)
-          else
-            train(category, text)
-          end
-        end
-      elsif name.to_s =~ /(un)?train_(\w+)/
-        raise StandardError, "No such category: #{category}"
-      else
-        super
-      end
+      raise StandardError, "No such category: #{category}" unless @categories.key?(category)
+
+      method = name.to_s.start_with?('untrain_') ? :untrain : :train
+      args.each { |text| send(method, category, text) }
     end
 
     #
```
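The refactored `method_missing` in `lib/classifier/bayes.rb` replaces an `if/elsif/else` ladder with guard clauses: hand unrelated names to `super`, raise for an unknown category, then dispatch. The sketch below reproduces that control flow on a hypothetical `TinyTrainer` class (not part of the gem) that only records calls, so all three paths can be exercised without the gem installed. Its `normalize` helper is a simplified stand-in for the gem's `prepare_category_name`, assumed here to capitalize the name and turn it into a symbol.

```ruby
# Hypothetical stand-in for Classifier::Bayes that only records calls, so the
# guard-clause version of method_missing can run without the gem installed.
class TinyTrainer
  attr_reader :log

  def initialize(*categories)
    @categories = categories.to_h { |c| [normalize(c), true] }
    @log = []
  end

  def train(category, text)
    @log << [:train, category, text]
  end

  def untrain(category, text)
    @log << [:untrain, category, text]
  end

  def method_missing(name, *args)
    # Guard 1: anything that is not (un)train_* goes to the normal lookup chain.
    return super unless name.to_s =~ /(un)?train_(\w+)/

    category = normalize(name.to_s.gsub(/(un)?train_(\w+)/, '\2'))
    # Guard 2: a well-formed name with an unknown category is an error.
    raise StandardError, "No such category: #{category}" unless @categories.key?(category)

    # Happy path: choose the target once, then apply it to every text.
    method = name.to_s.start_with?('untrain_') ? :untrain : :train
    args.each { |text| send(method, category, text) }
  end

  private

  # Simplified category normalization (assumption): :spam -> :Spam
  def normalize(name)
    name.to_s.capitalize.to_sym
  end
end

t = TinyTrainer.new(:spam, :ham)
t.train_spam 'cheap pills', 'buy now'
t.untrain_ham 'meeting notes'
t.log
# => [[:train, :Spam, "cheap pills"], [:train, :Spam, "buy now"], [:untrain, :Ham, "meeting notes"]]
```

Putting the regex check first also changes one edge case. In the sketch, `t.spam('x')` fails the first guard and raises `NoMethodError`. In the removed nested version, the category lookup ran before the regex check, and a bare name such as `spam` normalizes to the same key as `train_spam`, so that call would have been dispatched to `train`.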

lib/classifier/extensions/vector.rb

Lines changed: 2 additions & 5 deletions
```diff
@@ -8,12 +8,9 @@
 class Array
   def sum_with_identity(identity = 0.0, &block)
     return identity unless size.to_i.positive?
+    return map(&block).sum_with_identity(identity) if block_given?
 
-    if block_given?
-      map(&block).sum_with_identity(identity)
-    else
-      compact.reduce(:+).to_f || identity.to_f
-    end
+    compact.reduce(:+).to_f || identity.to_f
   end
 end
 
```
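A guard-clause refactor like the `sum_with_identity` change is only safe if behavior is preserved, and a quick way to check is to run the old and new bodies side by side. The sketch below copies both bodies from the diff into two standalone methods, `sum_before` and `sum_after` (hypothetical names that take the array as an argument instead of reopening `Array`), and compares them on an empty array, plain numbers, an array with `nil`s, a block, and an all-`nil` array.

```ruby
# Pre-refactor body from the diff, rewritten as a standalone method that takes
# the array as an argument (hypothetical name; the gem reopens Array instead).
def sum_before(arr, identity = 0.0, &block)
  return identity unless arr.size.to_i.positive?

  if block_given?
    sum_before(arr.map(&block), identity)
  else
    arr.compact.reduce(:+).to_f || identity.to_f
  end
end

# Post-refactor body: the block case becomes a second early return.
def sum_after(arr, identity = 0.0, &block)
  return identity unless arr.size.to_i.positive?
  return sum_after(arr.map(&block), identity) if block_given?

  arr.compact.reduce(:+).to_f || identity.to_f
end

square = ->(x) { x * x }
cases = [[[], nil], [[1, 2, 3], nil], [[1, nil, 2.5], nil], [[1, 2, 3], square], [[nil, nil], nil]]

cases.each do |arr, fn|
  # &nil passes no block, so the same call covers both the block and no-block cases.
  before = sum_before(arr, 5.0, &fn)
  after = sum_after(arr, 5.0, &fn)
  puts "#{arr.inspect}: before=#{before} after=#{after} same=#{before == after}"
end
```

Both versions agree on every case. The last case also shows a quirk the refactor carries over unchanged: `to_f` never returns `nil` (`nil.to_f` is `0.0`), so the `|| identity.to_f` fallback is unreachable and an all-`nil` array sums to `0.0` rather than to `identity`.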
lib/classifier/lsi.rb

Lines changed: 2 additions & 7 deletions
```diff
@@ -332,13 +332,8 @@ def node_for_content(item, &block)
       return @items[item] if @items[item]
 
       clean_word_hash = block ? block.call(item).clean_word_hash : item.to_s.clean_word_hash
-
-      cn = ContentNode.new(clean_word_hash, &block) # make the node and extract the data
-
-      unless needs_rebuild?
-        cn.raw_vector_with(@word_list) # make the lsi raw and norm vectors
-      end
-
+      cn = ContentNode.new(clean_word_hash, &block)
+      cn.raw_vector_with(@word_list) unless needs_rebuild?
       cn
     end
 
```
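After the refactor, `node_for_content` builds a `ContentNode` and computes its raw vector only when the index does not need rebuilding (`cn.raw_vector_with(@word_list) unless needs_rebuild?`). The sketch below illustrates what "raw vector against a word list" can mean, using a hypothetical `ToyNode` whose `raw_vector_with` maps a word-frequency hash onto a fixed vocabulary and also stores a unit-length copy, in the spirit of the `raw_vector` / `raw_norm` fields described in CLAUDE.md. The word list is assumed to be a plain `Hash` from word to column index, and `needs_rebuild` is a hypothetical local flag. The gem's real `ContentNode` may apply additional term weighting before normalizing.

```ruby
# Hypothetical stand-in for ContentNode: only the raw-vector step is sketched.
class ToyNode
  attr_reader :word_hash, :raw_vector, :raw_norm

  def initialize(word_hash)
    @word_hash = word_hash
  end

  # word_list: assumed Hash of word => column index shared by all nodes.
  # Words missing from the list are ignored. No extra term weighting here.
  def raw_vector_with(word_list)
    vec = Array.new(word_list.size, 0.0)
    @word_hash.each do |word, count|
      idx = word_list[word]
      vec[idx] = count.to_f if idx
    end
    magnitude = Math.sqrt(vec.sum { |v| v * v })
    @raw_vector = vec
    # Unit-length copy; an all-zero vector is kept as-is to avoid dividing by 0.
    @raw_norm = magnitude.zero? ? vec.dup : vec.map { |v| v / magnitude }
  end
end

word_list = { 'cat' => 0, 'dog' => 1, 'fish' => 2 }
node = ToyNode.new({ 'cat' => 3, 'dog' => 4, 'bird' => 1 })
needs_rebuild = false # hypothetical flag standing in for needs_rebuild?
node.raw_vector_with(word_list) unless needs_rebuild
p node.raw_vector # => [3.0, 4.0, 0.0]
p node.raw_norm   # => [0.6, 0.8, 0.0]
```

When a rebuild is pending, the vector is skipped here because the vocabulary (and therefore every column index) is about to change, so computing it now would be wasted work.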