Commit 545348d

Merge pull request #229 from ClojureCivitas/tensor-images-3
tensor-image wip
2 parents 6944084 + 7e72039 commit 545348d

1 file changed

+98
-77
lines changed

src/dtype_next/image_processing_with_tensors.clj

Lines changed: 98 additions & 77 deletions
Original file line number | Diff line number | Diff line change
@@ -111,6 +111,29 @@ original-img
111111
(def original-tensor
112112
(bufimg/as-ubyte-tensor original-img))
113113

114+
;; ## ⚠️ Important: Understanding Channel Order
115+
;;
116+
;; BufferedImage can use different pixel formats (RGB, BGR, ARGB, etc.). The specific
117+
;; format depends on the image type and how it was loaded. Our image uses **BGR** order:
118+
119+
(bufimg/image-type original-img)
120+
121+
;; `:byte-bgr` means this image stores colors in BGR (Blue-Green-Red) order, not RGB.
122+
;; The `bufimg/as-ubyte-tensor` function preserves whatever order BufferedImage uses.
123+
;;
124+
;; For this tutorial's BGR images, the channels are:
125+
;; - **Channel 0 = Blue**
126+
;; - **Channel 1 = Green**
127+
;; - **Channel 2 = Red**
128+
;;
129+
;; Always check `bufimg/image-type` to confirm your image's channel order before
130+
;; processing. We'll be explicit about BGR ordering throughout this tutorial.
131+
;;
132+
;; **Why the round-trip works:** `bufimg/tensor->image` defaults to creating BGR
133+
;; BufferedImages. So our workflow maintains BGR throughout: load BGR → process as
134+
;; BGR tensor → create BGR image. If you had an RGB tensor, you'd need to either
135+
;; swap channels or use `(bufimg/tensor->image rgb-tensor {:img-type :int-rgb})`.
136+
114137
original-tensor
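
The channel-order check described above can be tried on a tiny synthetic image. This is a minimal sketch, assuming dtype-next is on the classpath and that the `bufimg`, `tensor`, and `dtype` aliases refer to `tech.v3.libs.buffered-image`, `tech.v3.tensor`, and `tech.v3.datatype` (the namespace form is not shown in this diff). `tiny-img` and `tiny-tensor` are hypothetical names, not part of the tutorial.

```clojure
(require '[tech.v3.libs.buffered-image :as bufimg]
         '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype :as dtype])
(import '[java.awt.image BufferedImage])

;; Hypothetical test image: width 4, height 2, stored in BGR byte order.
(def tiny-img (BufferedImage. 4 2 BufferedImage/TYPE_3BYTE_BGR))

;; Paint the top-left pixel pure red (packed as 0xRRGGBB).
(.setRGB tiny-img 0 0 (int 0xFF0000))

(def tiny-tensor (bufimg/as-ubyte-tensor tiny-img))

(bufimg/image-type tiny-img)      ; reports the storage format keyword
(dtype/shape tiny-tensor)         ; [height width channels]
(tensor/mget tiny-tensor 0 0 0)   ; channel 0 (blue) of the red pixel
(tensor/mget tiny-tensor 0 0 2)   ; channel 2 (red) of the red pixel
```

If the image really is BGR, the red pixel should show its full intensity in channel 2 and nothing in channel 0.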
115138

116139
;; ## Understanding Tensor Shape
@@ -126,19 +149,6 @@ original-tensor
126149

127150
;; This is `[height width channels]` — our image has 3 color channels.
128151

129-
;; **Channel ordering**: Java's BufferedImage uses BGR order internally:
130-
131-
(bufimg/image-type original-img)
132-
133-
;; `:byte-bgr` confirms BGR byte order. The `bufimg/as-ubyte-tensor` function
134-
;; preserves this ordering, so our tensor channels are:
135-
;; - **Channel 0 = Blue**
136-
;; - **Channel 1 = Green**
137-
;; - **Channel 2 = Red**
138-
;;
139-
;; This is the opposite of the more common RGB convention. Throughout this tutorial,
140-
;; we'll work with BGR order and be explicit about it in our code.
141-
142152
(def height
143153
(first (dtype/shape original-tensor)))
144154

@@ -219,7 +229,7 @@ original-tensor
219229
tc/dataset
220230
ds-tensor/dataset->tensor
221231
(tensor/reshape [height width 3])
222-
bufimg/tensor->image)
232+
bufimg/tensor->image) ; Creates BGR BufferedImage by default
223233

224234
;; This round-trip demonstrates the seamless interop between tensors and datasets,
225235
;; useful for combining spatial operations (tensors) with statistical analysis (datasets).
@@ -534,27 +544,26 @@ flat-tensor
534544

535545
;; ## Extracting Color Channels
536546

537-
;; Use `tensor/select` to slice out individual channels (zero-copy views):
538-
539-
(defn extract-channels
540-
"Extract B, G, R channels from BGR tensor using tensor/select.
541-
Takes: [H W 3] tensor (BGR order)
542-
Returns: map with :blue, :green, :red tensors (each [H W])
543-
544-
Note: For simply extracting all channels, tensor/slice-right is cleaner:
545-
(let [[b g r] (tensor/slice-right img 1)] {:blue b :green g :red r})"
546-
[img-tensor]
547-
{:blue (tensor/select img-tensor :all :all 0)
548-
:green (tensor/select img-tensor :all :all 1)
549-
:red (tensor/select img-tensor :all :all 2)})
547+
;; There are two main approaches for extracting channels:
548+
;;
549+
;; 1. **tensor/select** — Extract specific channels by index
550+
;; 2. **tensor/slice-right** — Iterate over all channels cleanly
551+
;;
552+
;; We'll use `tensor/slice-right` to destructure all three channels at once:
550553

551-
(def channels (extract-channels original-tensor))
554+
(def channels
555+
(let [[blue green red] (tensor/slice-right original-tensor 1)]
556+
{:blue blue :green green :red red}))
552557

553558
;; Each channel is now `[H W]` instead of `[H W C]`:
554559

555-
(dtype/shape (:red channels))
560+
(:red channels)
556561

557-
;; **Key insight**: These are **views** into the original tensor—no copying.
562+
;; **Key insight**: These are **zero-copy views** into the original tensor—no data is copied.
563+
;;
564+
;; **Alternative with tensor/select**:
565+
;; Red channel (index 2 in BGR order):
566+
(tensor/select original-tensor :all :all 2)
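
To see that the two extraction approaches agree, here is a small sketch on a hand-built tensor. It assumes dtype-next is on the classpath; `tiny-bgr`, `sliced`, and `red-by-select` are hypothetical names introduced only for illustration.

```clojure
(require '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype :as dtype])

;; Hypothetical 2x2 BGR image: each innermost vector is [blue green red].
(def tiny-bgr
  (tensor/->tensor [[[1 2 3] [4 5 6]]
                    [[7 8 9] [10 11 12]]]
                   :datatype :uint8))

;; slice-right with 1 peels off the rightmost (channel) dimension,
;; yielding one [H W] tensor per channel in storage order.
(def sliced
  (let [[blue green red] (tensor/slice-right tiny-bgr 1)]
    {:blue blue :green green :red red}))

;; select picks a single channel by its index.
(def red-by-select (tensor/select tiny-bgr :all :all 2))

(dtype/shape (:red sliced))        ; [H W], the channel axis is gone
(tensor/mget (:red sliced) 1 0)    ; red value at row 1, column 0
(tensor/mget red-by-select 1 0)    ; the same element via select
```

Both routes read the same element because channel 2 holds red in BGR storage order.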
558567

559568
;; ## Channel Statistics
560569

@@ -615,7 +624,8 @@ flat-tensor
615624
"Convert BGR [H W 3] to grayscale [H W].
616625
Standard formula: 0.299*R + 0.587*G + 0.114*B
617626
Takes BGR tensor, extracts channels correctly.
618-
Returns float32 tensor (use dtype/elemwise-cast for uint8)."
627+
Returns float64 tensor (dfn/* operates on floats for precision).
628+
Use dtype/elemwise-cast :uint8 when you need integer values for display."
619629
[img-tensor]
620630
(let [b (tensor/select img-tensor :all :all 0) ; Blue is channel 0
621631
g (tensor/select img-tensor :all :all 1) ; Green is channel 1
@@ -626,18 +636,17 @@ flat-tensor
626636

627637
(def grayscale (to-grayscale original-tensor))
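
The weighted sum in the docstring can be checked by hand on a single pixel. This is a sketch only: the body of `to-grayscale` is not shown in this diff, so the arithmetic below follows the documented formula rather than the author's exact code. `one-pixel` and `gray-pixel` are hypothetical names, and dtype-next is assumed to be on the classpath.

```clojure
(require '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical 1x1 BGR image: blue=10, green=20, red=30.
(def one-pixel (tensor/->tensor [[[10 20 30]]] :datatype :uint8))

(def gray-pixel
  (let [b (tensor/select one-pixel :all :all 0)   ; blue is channel 0
        g (tensor/select one-pixel :all :all 1)   ; green is channel 1
        r (tensor/select one-pixel :all :all 2)]  ; red is channel 2
    (dfn/+ (dfn/* r 0.299)
           (dfn/* g 0.587)
           (dfn/* b 0.114))))

;; By hand: 0.299*30 + 0.587*20 + 0.114*10 = 8.97 + 11.74 + 1.14 = 21.85
(tensor/mget gray-pixel 0 0)
```

Swapping the channel indices for `b` and `r` would give 0.299*10 + 0.114*30 instead, which is why the BGR order matters here.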
628638

629-
;; **Note on types**: `to-grayscale` returns float32 because `dfn/*` operates on
630-
;; floats for precision. When visualizing, `bufimg/tensor->image` automatically
631-
;; handles the float→uint8 conversion.
632-
633639
;; **Grayscale statistics**:
634640

635641
(tc/dataset (channel-stats grayscale))
636642

637-
;; Visualize grayscale (automatic float→uint8 conversion):
643+
;; Visualize grayscale:
638644

639645
(bufimg/tensor->image grayscale)
640646

647+
;; **Note**: `bufimg/tensor->image` automatically handles float64→uint8 conversion
648+
;; and interprets single-channel tensors as grayscale images.
649+
641650
;; ## Histograms
642651

643652
;; A [histogram](https://en.wikipedia.org/wiki/Image_histogram) shows the distribution
@@ -659,25 +668,7 @@ flat-tensor
659668
(plotly/layer-histogram {:=x :green
660669
:=mark-color "green"}))
661670

662-
;; **Approach 2**: Separate histograms using `dtype/as-reader` for direct tensor access:
663-
664-
(require '[scicloj.kindly.v4.kind :as kind])
665-
(require '[scicloj.tableplot.v1.plotly :as plotly])
666-
667-
;; The `scicloj.kindly.v4.kind` namespace provides visualization directives for Clay.
668-
;; The `scicloj.tableplot.v1.plotly` namespace enables Plotly-based charting.
669-
670-
(->> (assoc channels :gray grayscale)
671-
(map (fn [[k v]]
672-
(-> (tc/dataset {:x (dtype/as-reader v)})
673-
(plotly/base {:=title k
674-
:=height 200
675-
:=width 600})
676-
(plotly/layer-histogram {:=histogram-nbins 30
677-
:=mark-color k}))))
678-
kind/fragment)
679-
680-
;; **Approach 3**: Using `slice-right` for cleaner per-channel histograms:
671+
;; **Per-channel histograms** using `slice-right` for clean iteration:
681672

682673
(kind/fragment
683674
(mapv (fn [color channel]
@@ -746,7 +737,7 @@ flat-tensor
746737
Takes: [H W] tensor
747738
Returns: [H W-1] tensor"
748739
[tensor-2d]
749-
(let [[h w] (dtype/shape tensor-2d)]
740+
(let [[_ w] (dtype/shape tensor-2d)]
750741
(dfn/- (tensor/select tensor-2d :all (range 1 w))
751742
(tensor/select tensor-2d :all (range 0 (dec w))))))
752743

@@ -755,7 +746,7 @@ flat-tensor
755746
Takes: [H W] tensor
756747
Returns: [H-1 W] tensor"
757748
[tensor-2d]
758-
(let [[h w] (dtype/shape tensor-2d)]
749+
(let [[h _] (dtype/shape tensor-2d)]
759750
(dfn/- (tensor/select tensor-2d (range 1 h) :all)
760751
(tensor/select tensor-2d (range 0 (dec h)) :all))))
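
The shifted-view subtraction used by both gradient functions can be traced on a small patch. This is a sketch under the assumption that dtype-next is on the classpath; `patch` and `patch-gx` are hypothetical names, and the input is float64 so that negative differences are representable.

```clojure
(require '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype :as dtype]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical 2x3 grayscale patch.
(def patch (tensor/->tensor [[1.0 3.0 6.0]
                             [2.0 2.0 5.0]]
                            :datatype :float64))

;; Horizontal forward difference: column j+1 minus column j.
(def patch-gx
  (let [[_ w] (dtype/shape patch)]
    (dfn/- (tensor/select patch :all (range 1 w))
           (tensor/select patch :all (range 0 (dec w))))))

(dtype/shape patch-gx)        ; one column narrower than the input
(tensor/mget patch-gx 0 0)    ; 3.0 - 1.0
(tensor/mget patch-gx 0 1)    ; 6.0 - 3.0
(tensor/mget patch-gx 1 0)    ; 2.0 - 2.0, a flat region gives zero
```

The vertical case is the same idea with the row and column arguments to `tensor/select` exchanged.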
761752

@@ -778,8 +769,8 @@ gy
778769
Takes: gx [H W-1], gy [H-1 W]
779770
Returns: [H-1 W-1] (trimmed to common size)"
780771
[gx gy]
781-
(let [[h-gx w-gx] (dtype/shape gx)
782-
[h-gy w-gy] (dtype/shape gy)
772+
(let [[_ w-gx] (dtype/shape gx)
773+
[h-gy _] (dtype/shape gy)
783774
;; Trim to common dimensions: gx loses 1 row, gy loses 1 column
784775
gx-trimmed (tensor/select gx (range 0 h-gy) :all)
785776
gy-trimmed (tensor/select gy :all (range 0 w-gx))]
@@ -797,6 +788,8 @@ edges
797788
(dfn/* (/ 255.0 (max 1.0 (dfn/reduce-max edges))))
798789
(dtype/elemwise-cast :uint8)))
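
The scaling step above stretches the strongest edge to 255 before casting. A minimal sketch on three hypothetical edge magnitudes, assuming dtype-next is on the classpath (`raw-edges`, `scale`, and `display-edges` are illustrative names):

```clojure
(require '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype :as dtype]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical edge magnitudes; the largest is 5.0.
(def raw-edges (tensor/->tensor [0.0 1.0 5.0] :datatype :float64))

;; (max 1.0 ...) guards against dividing by zero on a completely flat image.
(def scale (/ 255.0 (max 1.0 (dfn/reduce-max raw-edges))))   ; 255 / 5 = 51

(def display-edges
  (-> raw-edges
      (dfn/* scale)
      (dtype/elemwise-cast :uint8)))

(mapv long (dtype/->reader display-edges))
```

With a maximum of 5.0 the scaled values are 0, 51, and 255, so the full display range is used.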
799790

791+
;; **Note**: Grayscale (single-channel) tensors are rendered as grayscale images.
792+
800793
;; ## Sharpness Metric
801794

802795
;; Measure image sharpness by averaging edge magnitude—higher = sharper:
@@ -818,7 +811,7 @@ edges
818811

819812
;; ---
820813

821-
;; # Advanced Tensor Operations — Rows, Columns, and Regions
814+
;; # Spatial Profiling — Row and Column Analysis
822815

823816
;; We've seen how to extract channels and compute global statistics. Now let's
824817
;; explore **row-wise and column-wise analysis** using `tensor/slice`, `tensor/transpose`,
@@ -909,7 +902,6 @@ edges
909902
;; ## Efficient Aggregation with reduce-axis
910903

911904
;; For statistics without explicit iteration, use `tensor/reduce-axis`.
912-
;; **Important**: Specify result dtype to avoid truncation!
913905

914906
;; Compute row brightness using reduce-axis:
915907

@@ -923,9 +915,10 @@ edges
923915

924916
(take 10 (dtype/as-reader row-means-fast))
925917

926-
;; **Note**: Specifying `:float64` explicitly ensures the result type. In some cases,
927-
;; omitting the dtype can lead to unexpected type coercion, so it's good practice to
928-
;; specify the desired output type when precision matters.
918+
;; **Why specify `:float64`?** Without it, dtype-next might infer the output type from
919+
;; the input (`:uint8`), which would truncate decimal values from the mean operation.
920+
;; For example, a mean of 127.8 would become 127. Always specify the output datatype
921+
;; when reducing to ensure you get the precision you need.
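
The 127.8 versus 127 example can be reproduced with a five-pixel row. This sketch does not call `tensor/reduce-axis`; it only illustrates what is lost when a fractional mean is forced into an integer type. `row`, `exact-mean`, and `truncated-mean` are hypothetical names, and dtype-next is assumed to be on the classpath.

```clojure
(require '[tech.v3.tensor :as tensor]
         '[tech.v3.datatype.functional :as dfn])

;; Hypothetical row of uint8 pixel values summing to 639.
(def row (tensor/->tensor [127 128 128 128 128] :datatype :uint8))

;; dfn/mean returns a double: 639 / 5 = 127.8
(def exact-mean (dfn/mean row))

;; Forcing the result into an integer drops the fractional part.
(def truncated-mean (long exact-mean))
```

The difference of 0.8 is small for one row, but it is a systematic downward bias when applied to every row of an image.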
929922

930923
;; ## Block-Based Region Analysis
931924

@@ -979,9 +972,10 @@ edges
979972

980973
;; # Enhancement Pipeline
981974

982-
;; With analysis tools in place, let's build functions that *improve* images.
983-
;; We'll create composable transformations for white balance and contrast,
984-
;; each verifiable through numeric properties we can check in the REPL.
975+
;; We've explored analyzing image properties—now let's actively *transform* them.
976+
;; With analysis tools in place, we'll build functions that improve images through
977+
;; white balance and contrast adjustment. Each transformation is composable and
978+
;; verifiable through numeric properties we can check in the REPL.
985979

986980
;; ## Auto White Balance
987981

@@ -1031,7 +1025,9 @@ edges
10311025
[original-img
10321026
(-> original-tensor
10331027
auto-white-balance
1034-
bufimg/tensor->image)]])
1028+
bufimg/tensor->image)]]) ; BGR tensor → BGR image
1029+
1030+
;; **Note**: Our BGR tensor flows seamlessly to BGR BufferedImage.
10351031

10361032
;; ## Contrast Enhancement
10371033

@@ -1075,7 +1071,7 @@ edges
10751071
[original-img
10761072
(-> original-tensor
10771073
(enhance-contrast 1.5)
1078-
bufimg/tensor->image)
1074+
bufimg/tensor->image) ; BGR → BGR
10791075
(-> original-tensor
10801076
(enhance-contrast 3)
10811077
bufimg/tensor->image)]])
@@ -1175,6 +1171,8 @@ edges
11751171
(bufimg/tensor->image (simulate-color-blindness original-tensor :deuteranopia))
11761172
(bufimg/tensor->image (simulate-color-blindness original-tensor :tritanopia))]])
11771173

1174+
;; All color blindness transformations maintain BGR order throughout.
1175+
11781176
;; ---
11791177

11801178
;; # Convolution & Filtering
@@ -1226,7 +1224,11 @@ kernel-3x3
12261224
(defn convolve-2d
12271225
"Apply 2D convolution to grayscale image [H W].
12281226
kernel: [kh kw] float tensor
1229-
Returns [H W] float32 tensor (zero-padded edges)."
1227+
Returns [H W] float32 tensor (zero-padded edges).
1228+
1229+
Note: This implementation prioritizes clarity over performance.
1230+
For production use, consider tech.v3.libs.opencv or specialized
1231+
convolution libraries for better performance on large images."
12301232
[img-2d kernel]
12311233
(let [[h w] (dtype/shape img-2d)
12321234
[kh kw] (dtype/shape kernel)
@@ -1266,6 +1268,8 @@ kernel-3x3
12661268
[(bufimg/tensor->image grayscale)
12671269
(bufimg/tensor->image (dtype/elemwise-cast blurred-gray :uint8))]])
12681270

1271+
;; Grayscale tensors (2D) are automatically rendered as grayscale images.
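
Since only part of `convolve-2d` is visible in this diff, the following is an independent sketch of zero-padded 2D filtering written with plain nested vectors rather than dtype-next tensors. It is not the author's implementation. `convolve-2d-sketch`, `box-3x3`, and `tiny-gray` are hypothetical names, and the kernel is applied without flipping (strictly a correlation, which is identical for symmetric kernels like a box blur).

```clojure
(defn convolve-2d-sketch
  "Zero-padded filtering of a nested-vector image with a nested-vector kernel.
  Returns a nested vector with the same height and width as img."
  [img kernel]
  (let [h     (count img)
        w     (count (first img))
        kh    (count kernel)
        kw    (count (first kernel))
        pad-y (quot kh 2)
        pad-x (quot kw 2)]
    (vec
     (for [y (range h)]
       (vec
        (for [x (range w)]
          (reduce +
                  (for [ky (range kh)
                        kx (range kw)]
                    (let [iy (+ y (- ky pad-y))
                          ix (+ x (- kx pad-x))]
                      ;; Samples outside the image count as zero.
                      (* (get-in kernel [ky kx])
                         (get-in img [iy ix] 0.0)))))))))))

;; Hypothetical 3x3 box-blur kernel and 3x3 image.
(def box-3x3 (vec (repeat 3 (vec (repeat 3 (/ 1.0 9.0))))))
(def tiny-gray [[1.0 2.0 3.0]
                [4.0 5.0 6.0]
                [7.0 8.0 9.0]])

(def tiny-blurred (convolve-2d-sketch tiny-gray box-3x3))

(get-in tiny-blurred [1 1])   ; mean of all nine values, about 5.0
(get-in tiny-blurred [0 0])   ; only four in-bounds neighbours: 12/9
```

The corner value is darker than the true local mean because the five out-of-bounds samples contribute zero, which is the usual visual cost of zero padding.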
1272+
12691273
;; ## Gaussian Blur
12701274

12711275
;; [Gaussian blur](https://en.wikipedia.org/wiki/Gaussian_blur) uses a kernel based
@@ -1401,12 +1405,13 @@ gaussian-5x5
14011405
(dfn/* (/ 255.0 (max 1.0 (dfn/reduce-max sobel-edges))))
14021406
(dtype/elemwise-cast :uint8)))
14031407

1404-
;; **Comparison**: Simple gradient vs Sobel
1408+
;; Single-channel tensors display as grayscale images.
1409+
1410+
;; **Comparison**: Simple gradient (from Spatial Analysis section) vs Sobel
14051411

1406-
(let [simple-edges (edge-magnitude (gradient-x grayscale) (gradient-y grayscale))]
1407-
{:simple-mean (dfn/mean simple-edges)
1408-
:sobel-mean (dfn/mean sobel-edges)
1409-
:sobel-smoother? true})
1412+
{:simple-mean (dfn/mean edges) ; reuse edges computed earlier
1413+
:sobel-mean (dfn/mean sobel-edges)
1414+
:sobel-smoother? true}
14101415

14111416
;; Sobel produces smoother, more robust edge detection.
14121417

@@ -1450,7 +1455,7 @@ gaussian-5x5
14501455
[(bufimg/tensor->image grayscale)
14511456
(bufimg/tensor->image downsampled-gray)]])
14521457

1453-
;; **Verification**: Downsampled image is exactly half the size in each dimension.
1458+
;; Both grayscale tensors render as grayscale images.
14541459

14551460
;; ## Image Pyramid
14561461

@@ -1486,6 +1491,8 @@ gaussian-5x5
14861491
(kind/fragment
14871492
(mapv bufimg/tensor->image gray-pyramid))
14881493

1494+
;; Each grayscale tensor at different scales renders as a grayscale image.
1495+
14891496
;; **Use case**: Multi-scale edge detection for finding features at different sizes.
14901497

14911498
;; ## Block Average Downsampling
@@ -1518,6 +1525,8 @@ gaussian-5x5
15181525
[(bufimg/tensor->image downsampled-gray)
15191526
(bufimg/tensor->image avg-downsampled)]])
15201527

1528+
;; Both are grayscale. `tensor->image` handles float32 → uint8 conversion automatically.
1529+
15211530
;; Average downsampling produces smoother results with less aliasing.
15221531

15231532
;; **Verification**: Both produce same shape, but averaging reduces noise
@@ -1531,7 +1540,19 @@ gaussian-5x5
15311540

15321541
;; # Conclusion: The dtype-next Pattern
15331542

1534-
;; We've built a complete image analysis toolkit demonstrating core dtype-next concepts:
1543+
;; We started with a simple question: **Why use dtype-next for image processing?**
1544+
;;
1545+
;; Through building a complete analysis toolkit—from channel statistics to edge detection
1546+
;; to convolution—we've seen the answer in action:
1547+
;;
1548+
;; - **Efficient typed arrays** replace boxed sequences, saving memory and enabling SIMD
1549+
;; - **Zero-copy views** let us slice and transform without allocation overhead
1550+
;; - **Functional composition** keeps operations pure and composable
1551+
;; - **Immediate visual feedback** makes abstract tensor operations concrete and verifiable
1552+
;;
1553+
;; Images provided the perfect learning vehicle: every transformation has visible results
1554+
;; we can inspect in the REPL. The patterns we've practiced transfer directly to any
1555+
;; domain requiring efficient numerical computing.
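
The zero-copy claim in the list above can be observed directly: if a selected channel is a view, a write to the original tensor should be visible through it. This is a minimal sketch, assuming dtype-next is on the classpath and that `tensor/select` returns a view for this kind of index (as the tutorial states). `base`, `red-view`, `before`, and `after` are hypothetical names.

```clojure
(require '[tech.v3.tensor :as tensor])

;; Hypothetical 1x2 BGR image.
(def base (tensor/->tensor [[[1 2 3] [4 5 6]]] :datatype :uint8))

;; Select the red channel (index 2); no pixel data should be copied.
(def red-view (tensor/select base :all :all 2))

(def before (tensor/mget red-view 0 1))   ; red value of the second pixel

;; Write through the original tensor: row 0, column 1, channel 2.
(tensor/mset! base 0 1 2 200)

(def after (tensor/mget red-view 0 1))    ; the view reflects the write
```

The same property means that mutating a view also changes the source image, so copy explicitly when an independent buffer is needed.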
15351556

15361557
;; ## Key Patterns
15371558
