This repository contains code for the texture patch classification research project.
From all given input images, texture patches are extracted and stored into TFRecord files
for later processing; this is explained below.
The following experiments can be found:
- som.py contains a Self-Organizing Maps (Kohonen Maps) clustering approach.
- ae.py contains a (non-convolutional) tied-weights autoencoder that treats the inputs as an n-dimensional vector and compresses them to some m < n dimensions using standard MLP-like TanH activated axpy operations.
- cae.py contains a convolutional autoencoder that attempts to do the same using 2D convolutions (sigmoid activation) on 2D RGB images, rather than flat vectors.
- cae_tw.py: like cae.py, but using a tied-weights approach.
- show_energy.py and show_entropy.py demonstrate energy and entropy based cropping, respectively, and are showcased below.
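To make the tied-weights idea behind ae.py concrete, here is a minimal NumPy sketch of the forward pass only. It is not the repository's implementation: the class and attribute names are hypothetical, and applying tanh to the decoder output is an assumption.

```python
import numpy as np


class TiedAutoencoder:
    """Single-layer autoencoder whose decoder reuses the encoder weights."""

    def __init__(self, n, m, rng):
        # One shared weight matrix of shape (m, n), with m < n.
        self.W = rng.normal(scale=0.1, size=(m, n))
        self.b_enc = np.zeros(m)
        self.b_dec = np.zeros(n)

    def encode(self, x):
        # Compress an n-dimensional vector to m dimensions.
        return np.tanh(self.W @ x + self.b_enc)

    def decode(self, h):
        # Tied weights: the decoder uses the transpose of the encoder matrix.
        return np.tanh(self.W.T @ h + self.b_dec)


rng = np.random.default_rng(0)
model = TiedAutoencoder(n=12, m=4, rng=rng)
x = rng.uniform(-1.0, 1.0, size=12)
code = model.encode(x)
reconstruction = model.decode(code)
```

Because the decoder has no weight matrix of its own, only W and the two bias vectors would need to be trained.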
show_energy.py performs energy-based cropping of an image. The following
image shows the energy based on a Scharr edge detector.
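A minimal NumPy sketch of energy-based cropping, not the code in show_energy.py: it computes the Scharr gradient magnitude of a grayscale image and returns the bounding box of the pixels whose energy exceeds a threshold. The function names and the relative threshold of 10% of the maximum energy are assumptions made for illustration.

```python
import numpy as np

# Scharr kernels for the horizontal and vertical derivatives.
SCHARR_X = np.array([[-3, 0, 3],
                     [-10, 0, 10],
                     [-3, 0, 3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T


def filter3x3(gray, kernel):
    """Correlate a 2D array with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def scharr_energy(gray):
    """Gradient magnitude of a grayscale image under the Scharr operator."""
    return np.hypot(filter3x3(gray, SCHARR_X), filter3x3(gray, SCHARR_Y))


def energy_crop_box(gray, rel_threshold=0.1):
    """Bounding box (top, bottom, left, right) of the high-energy pixels.

    rel_threshold is a hypothetical parameter: the fraction of the maximum
    energy that a pixel must exceed to count as content.
    """
    energy = scharr_energy(gray)
    mask = energy > rel_threshold * energy.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:  # flat image: keep everything
        return 0, gray.shape[0], 0, gray.shape[1]
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1
```

The returned box can be applied as `image[top:bottom, left:right]`. Since the kernel spans three pixels, the box extends one pixel beyond a sharp object boundary.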
Alternatively, show_entropy.py determines the local entropy given a window size
and crops based on this. The following image shows the entropy given a window
of 10 pixels.
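The local entropy can be sketched in plain Python. This is an illustration under assumptions, not the contents of show_entropy.py: the function names are hypothetical, the window is a square of `window` pixels per side placed around each pixel and clipped at the image border, and the input is a grayscale image given as a list of rows of integer intensities.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_entropy(gray, window=10):
    """Entropy of the window x window neighbourhood around each pixel."""
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        # Row range of the window, clipped to the image.
        y0, y1 = max(0, y - half), min(height, y - half + window)
        row = []
        for x in range(width):
            # Column range of the window, clipped to the image.
            x0, x1 = max(0, x - half), min(width, x - half + window)
            patch = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(shannon_entropy(patch))
        result.append(row)
    return result
```

Uniform regions yield an entropy of zero, so a crop could keep only the rows and columns where the entropy map exceeds some threshold.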
The extract_random_patches.py program loops through all files in the product-pictures
bucket of the Google Cloud project identified by google-credentials.json.
It then assigns an int64 ID number ("iid") to each file and writes the mapping to the data/index.tsv
file.
The index is a tab-separated plain-text file with the ID in the first column and the file name within the bucket in the second column, e.g.
0 000/571e174e9fcd67000f55d000.jpg
1 000/571e28bc9fcd67000f55e000.jpg
2 000/571e74e39fcd67000f562000.jpg
3 000/571e976b9fcd67000f564000.jpg
...
368442 fff/5852c2efb2f46e00010dafff.jpg
368443 fff/5852ec9ab2f46e00010dbfff.jpg
368444 fff/58535a17b2f46e00010ddfff.jpg
368445 socks.jpg
Each of these files is downloaded, cropped using an auto-determined energy level threshold,
and then run through patch extraction.
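The index format described above can be read back with the standard csv module. A minimal sketch, assuming the file has no header row and exactly two tab-separated columns; read_index is a hypothetical helper name, not a function from this repository.

```python
import csv
import io


def read_index(lines):
    """Map each integer image ID ("iid") to its file name within the bucket.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    index = {}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        iid, name = row
        index[int(iid)] = name
    return index


# Two rows in the format shown above, held in memory for the example.
sample = io.StringIO("0\t000/571e174e9fcd67000f55d000.jpg\n368445\tsocks.jpg\n")
index = read_index(sample)
```

For the real file, pass `open("data/index.tsv", newline="")` instead of the in-memory sample.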
The algorithm randomly samples RGB patches of sizes 32x32, 64x64 and 128x128,
resamples them to 32x32 and discards patches that are not distinct (per batch).
Each remaining patch is assigned an increasing int64 ID ("pid") in order to
identify the images that contained a specific patch.
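The sampling step can be sketched as follows. This is an illustration under assumptions rather than the code in extract_random_patches.py: the function names are hypothetical, downsampling uses simple block averaging (the actual resampling filter is not stated), and "not distinct" is interpreted as byte-identical patches within one batch.

```python
import numpy as np

PATCH_SIZES = (32, 64, 128)
TARGET = 32


def downsample(patch, target=TARGET):
    """Shrink a square HxWx3 patch to target x target by block averaging."""
    factor = patch.shape[0] // target
    blocks = patch.reshape(target, factor, target, factor, 3)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)


def sample_patches(image, count, rng):
    """Draw `count` random RGB patches and resample each to 32x32."""
    height, width, _ = image.shape
    # Only use patch sizes that fit inside the image.
    sizes = [s for s in PATCH_SIZES if s <= min(height, width)]
    if not sizes:
        return []
    patches = []
    for _ in range(count):
        size = int(rng.choice(sizes))
        top = int(rng.integers(0, height - size + 1))
        left = int(rng.integers(0, width - size + 1))
        patches.append(downsample(image[top:top + size, left:left + size]))
    return patches


def distinct(patches):
    """Drop byte-identical patches, keeping the first occurrence."""
    seen, unique = set(), []
    for patch in patches:
        key = patch.tobytes()
        if key not in seen:
            seen.add(key)
            unique.append(patch)
    return unique
```

A batch would then be produced with something like `distinct(sample_patches(image, 64, np.random.default_rng()))`, where the batch size of 64 is only an example value.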
The patches are then formatted as TFExample shaped protocol buffers using the
following structure:
iid: int64 the image ID
pid: int64 the patch ID
raw: [uint8] the pixel values in columnwise RGB order
The TFExamples are collected and written to gzipped TFRecord files of 100000
example entries each (resulting in approx. 150 MB per file) and stored
as data/[0-9]{5,}\.tfrecord\.gz (e.g. data/00000.tfrecord.gz).
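The shard naming and the size estimate can be checked with a few lines of Python. This sketch only illustrates the naming scheme and the arithmetic; shard_path is a hypothetical helper, and the assumption that shards are numbered consecutively from zero and filled in pid order is inferred from the example file name.

```python
import re

EXAMPLES_PER_SHARD = 100_000
SHARD_PATTERN = re.compile(r"data/[0-9]{5,}\.tfrecord\.gz")


def shard_path(pid):
    """File that would hold patch `pid` if shards are filled in pid order."""
    return f"data/{pid // EXAMPLES_PER_SHARD:05d}.tfrecord.gz"


# Raw pixel payload per shard before compression:
# 32 * 32 pixels * 3 channels * 1 byte, times 100000 patches.
raw_bytes_per_shard = 32 * 32 * 3 * EXAMPLES_PER_SHARD  # 307,200,000 bytes
```

The uncompressed pixel data alone is about 307 MB per shard, so the stated size of approx. 150 MB suggests that gzip roughly halves it.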
The load_patches.py file demonstrates loading the examples back in.
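load_patches.py remains the reference for reading the data with TensorFlow. As a complement, the sketch below iterates over the serialized records of a gzipped TFRecord file using only the standard library. It relies on the documented TFRecord framing: an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. It skips CRC verification and does not decode the Example protocol buffers; read_records is a hypothetical helper name.

```python
import gzip
import struct


def read_records(path):
    """Yield the serialized payload of every record in a gzipped TFRecord file.

    The CRC fields are read but not verified.
    """
    with gzip.open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise EOFError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # uint32 CRC of the payload
            if len(payload) < length or len(footer) < 4:
                raise EOFError("truncated record")
            yield payload
```

Each yielded payload is a serialized Example. Decoding it into the iid, pid and raw fields still requires TensorFlow or the protobuf definitions, for instance via `tf.train.Example.FromString(payload)`.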

