PixPlot for Python 3.10

PixPlot Installation Guide

This guide will help you install PixPlot for Python 3.10 environments. PixPlot is a tool for visualizing large image collections using WebGL, machine learning, and dimensionality reduction techniques.

Prerequisites

Python 3.10 (required)
Conda or Miniconda (recommended for environment management)

Installation Steps

1. Create a Python 3.10 Environment

First, create a dedicated Python 3.10 environment:

Using Conda:

conda create -n pixplot python=3.10
conda activate pixplot

Using venv (if not using Conda):

python -m venv pixplot-env
# On Windows
pixplot-env\Scripts\activate
# On macOS/Linux
source pixplot-env/bin/activate
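To confirm that the environment you just activated is the one you intended, you can run a short check from inside it. This is a minimal sketch, not part of PixPlot. The helper names are hypothetical, and it assumes conda sets the `CONDA_PREFIX` variable for an activated environment, which standard `conda activate` does.

```python
import os
import sys


def is_python_310(version_info=sys.version_info):
    """Return True if the given version tuple is 3.10.x."""
    return tuple(version_info[:2]) == (3, 10)


def active_environment():
    """Best-effort guess at which isolated environment is active, if any."""
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        return "conda: " + conda_prefix
    # In a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python it was created from.
    if sys.prefix != sys.base_prefix:
        return "venv: " + sys.prefix
    return None


if __name__ == "__main__":
    status = "OK" if is_python_310() else "expected 3.10.x"
    print(f"Python {sys.version.split()[0]}: {status}")
    print("Environment:", active_environment() or "none detected (system Python?)")
```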

2. Clone the Repository

git clone https://github.com/XLabCU/pix-plot.git
cd pix-plot

3. Run the Installation Script

Run the installation script, which installs all required dependencies:

python install.py

This script will:

Install numpy 1.22.4
Install TensorFlow 2.13.0
Install critical dependencies (scipy, matplotlib, scikit-learn, umap-learn, etc.)
Install the Yale fork of rasterfairy
Install MulticoreTSNE (if conda is available)
Install PixPlot itself

4. Fix the Rasterfairy Module

After installation, replace the installed rasterfairy.py file with the patched version included in this repository, which makes rasterfairy compatible with Python 3.10:

Locate the rasterfairy module in your environment:

For Conda environments:

# Find the path to your environment
conda env list

# Navigate to the site-packages directory
cd /path/to/your/conda/envs/pixplot/lib/python3.10/site-packages/rasterfairy

For venv environments:

# Navigate to the site-packages directory
cd pixplot-env/lib/python3.10/site-packages/rasterfairy
# On Windows, the equivalent path is pixplot-env\Lib\site-packages\rasterfairy

Replace the content of the rasterfairy.py file with the provided fix:
- Copy the entire content from rasterfairy.py in this repository
- Paste it into the rasterfairy.py file, replacing all existing content
You can do this in a text editor, or from a shell with the following command:
```
# Copy the fixed version over the original
# Assuming you're in this repository folder
cp rasterfairy.py /path/to/site-packages/rasterfairy/rasterfairy.py
```
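If you would rather not look up the site-packages path by hand, a short Python script run from this repository folder (with the PixPlot environment activated) can locate the installed rasterfairy package and copy the fixed file over it. This is a sketch, not part of the repository: the helper names are hypothetical, it assumes the installed package keeps its code in rasterfairy/rasterfairy.py as described above, and it also keeps a .bak copy of the original file, which the manual steps do not. It relies on `importlib.util.find_spec`, which locates a top-level package without importing it, so it works even while importing rasterfairy fails.

```python
import importlib.util
import shutil
import sys
from pathlib import Path


def find_package_dir(name):
    """Return the directory of an installed package without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def replace_module_file(fixed_file, package_dir, filename="rasterfairy.py"):
    """Back up the installed module file, then overwrite it with the fixed copy."""
    target = Path(package_dir) / filename
    if target.exists():
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(fixed_file, target)
    return target


if __name__ == "__main__":
    pkg_dir = find_package_dir("rasterfairy")
    if pkg_dir is None:
        sys.exit("rasterfairy is not installed in this environment")
    # Assumes the script runs from the repository folder containing the fix.
    print("Patched", replace_module_file("rasterfairy.py", pkg_dir))
```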

5. Verify the Installation

To verify the installation was successful:

python -c "import pixplot; print('PixPlot successfully installed')"

If no errors appear and you see the success message, the installation is complete.

Usage

You can now use PixPlot with Python 3.10. Basic usage:

# Process a folder of images
pixplot --images "path/to/images/*.jpg"

# With network visualization:
pixplot --images "path/to/images/*.jpg" --network_n_neighbors 5 --network_edge_threshold 0.7 --network_layout_iterations 100

# Run a local web server to view the visualization
pixplot --serve
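The network flags above suggest a k-nearest-neighbor graph: each image is linked to up to --network_n_neighbors similar images, and links weaker than --network_edge_threshold are dropped. We have not confirmed how PixPlot builds this graph internally, so treat the following as a sketch under stated assumptions: each image is represented by a feature vector, similarity is cosine similarity, and the parameter names only mirror the command-line flags. The layout step controlled by --network_layout_iterations is not modeled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_edges(vectors, n_neighbors=5, edge_threshold=0.7):
    """Link each vector to its n_neighbors most similar vectors.

    Returns undirected edges as (i, j, weight) with i < j, keeping only
    pairs whose similarity is at least edge_threshold.
    """
    edges = {}
    for i, vi in enumerate(vectors):
        ranked = sorted(
            ((cosine_similarity(vi, vj), j) for j, vj in enumerate(vectors) if j != i),
            reverse=True,
        )
        for sim, j in ranked[:n_neighbors]:
            if sim >= edge_threshold:
                # The pair may be found from both ends; its weight is the same.
                edges[(min(i, j), max(i, j))] = sim
    return [(i, j, w) for (i, j), w in sorted(edges.items())]
```

For example, `knn_edges([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], n_neighbors=1)` returns a single edge between images 0 and 1 (weight about 0.994). Image 2's best match scores only about 0.11, below the 0.7 threshold, so it stays unconnected.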

A note on the network features

You can use the stand-alone script, pixplot_network_export.py (see pixplot-network-tool-README.md), to create network files for your own analysis and visualization.

The network view built into the PixPlot viewer uses these conventions:

Weight-based coloring: edges are colored on a gradient according to their weights.
- Stronger connections (higher weights) appear whiter and brighter.
- Weaker connections appear bluer.
- The result is a visual hierarchy in which the strongest connections stand out.
Thicker lines: the edge linewidth is set to 2 (instead of 1) to make edges easier to see. WebGL does not honor linewidth values greater than 1 on all platforms, so on some systems edges will still be drawn 1 pixel wide.
Vertex colors: colors are assigned per vertex, which allows gradient effects along each edge.
Weight normalization: all weights are normalized against the maximum weight, so the color scale stays consistent regardless of the actual range of weights.
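The normalization and coloring conventions above can be sketched as follows. This is an illustration, not the viewer's code: the exact blue and white values PixPlot uses are not given here, so `BLUE` and `WHITE` are assumptions, and the gradient is a plain linear interpolation. The resulting RGB triples are the kind of values one would place in a WebGL vertex-color buffer.

```python
BLUE = (0.0, 0.0, 1.0)   # assumed color for the weakest edges
WHITE = (1.0, 1.0, 1.0)  # assumed color for the strongest edges


def normalize_weights(weights):
    """Divide every weight by the maximum so the strongest edge maps to 1.0."""
    max_w = max(weights, default=0.0)
    if max_w <= 0:
        return [0.0 for _ in weights]
    return [w / max_w for w in weights]


def edge_color(t, low=BLUE, high=WHITE):
    """Linearly interpolate between low (t=0) and high (t=1), as RGB floats."""
    t = min(max(t, 0.0), 1.0)
    return tuple(a + (b - a) * t for a, b in zip(low, high))


if __name__ == "__main__":
    for w in normalize_weights([1, 2, 4]):
        print(w, edge_color(w))
```

Because the weights are divided by their maximum, the strongest edge is always drawn white whether the raw weights range up to 4 or up to 4,000.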

Troubleshooting

If you encounter errors related to rasterfairy after following these steps, verify that:

You've properly replaced the rasterfairy.py file with the provided fix
Your Python environment is exactly 3.10.x
You have the correct numpy version (1.22.4)
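The version checks above can be scripted. The sketch below compares the running Python version and the installed numpy and TensorFlow versions against 1.22.4 and 2.13.0, the versions this guide says install.py installs. The function names are hypothetical, and it assumes TensorFlow is installed under the distribution name `tensorflow`, which may differ on some platforms (for example `tensorflow-macos`).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions this guide expects install.py to have installed.
EXPECTED = {"numpy": "1.22.4", "tensorflow": "2.13.0"}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_problems(python_version, installed, expected=EXPECTED):
    """List human-readable mismatches between installed and expected versions."""
    problems = []
    if tuple(python_version[:2]) != (3, 10):
        found = ".".join(str(part) for part in python_version[:3])
        problems.append(f"Python is {found}, expected 3.10.x")
    for name, want in expected.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name} is not installed (expected {want})")
        elif have != want:
            problems.append(f"{name} is {have}, expected {want}")
    return problems


if __name__ == "__main__":
    found = {name: installed_version(name) for name in EXPECTED}
    for line in find_problems(sys.version_info, found) or ["No version problems found."]:
        print(line)
```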

(Original PixPlot README follows.)

PixPlot

This repository contains code that can be used to visualize tens of thousands of images in a two-dimensional projection within which similar images are clustered together. The image analysis uses TensorFlow's Inception bindings, and the visualization layer uses a custom WebGL viewer.

See the change log for recent updates.

Installation & Dependencies

We maintain several platform-specific installation cookbooks online.

Broadly speaking, to install the Python dependencies, we recommend you install Anaconda and then create a conda environment with a Python 3.7 runtime:

conda create --name=3.7 python=3.7
conda activate 3.7

Then you can install the dependencies by running:

pip install https://github.com/yaledhlab/pix-plot/archive/master.zip

The website that PixPlot eventually creates requires a WebGL-enabled browser.

Quickstart

If you have a WebGL-enabled browser and a directory full of images to process, you can prepare the data for the viewer by installing the dependencies above and then running:

pixplot --images "path/to/images/*.jpg"

To see the results of this process, you can start a web server by running:

# for python 3.x
python -m http.server 5000

# for python 2.x
python -m SimpleHTTPServer 5000

The visualization will then be available at http://localhost:5000/output.

Sample Data

To acquire some sample data with which to build a plot, feel free to use some data prepared by Yale's DHLab:

pip install image_datasets

Then in a Python script:

import image_datasets
image_datasets.oslomini.download()

The .download() command will make a directory named datasets in your current working directory. That datasets directory will contain a subdirectory named 'oslomini', which contains a directory of images and another directory with a CSV file of image metadata. Using that data, we can next build a plot:

pixplot --images "datasets/oslomini/images/*" --metadata "datasets/oslomini/metadata/metadata.csv"

Creating Massive Plots

If you need to plot more than 100,000 images but don't have an expensive graphics card with which to visualize huge WebGL displays, you might want to specify a smaller "cell_size" parameter when building your plot. The "cell_size" argument controls how large each image is in the atlas files; smaller values require fewer textures to be rendered, which decreases the GPU RAM required to view a plot:

pixplot --images "path/to/images/*.jpg" --cell_size 10
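Some rough arithmetic shows why a smaller cell size helps. The sketch assumes square 2048 x 2048 atlas textures, one image per cell, and uncompressed RGBA pixels (4 bytes each); these are illustrative assumptions, not values read from PixPlot's source. A cell size of 32 is used only as a point of comparison.

```python
import math

ATLAS_SIZE = 2048  # assumed atlas width/height in pixels (illustrative only)


def atlases_needed(n_images, cell_size, atlas_size=ATLAS_SIZE):
    """Number of square atlas textures needed if each image fills one cell."""
    cells_per_side = atlas_size // cell_size
    return math.ceil(n_images / cells_per_side ** 2)


def atlas_mebibytes(n_atlases, atlas_size=ATLAS_SIZE, bytes_per_pixel=4):
    """Uncompressed GPU memory for n_atlases RGBA textures, in MiB."""
    return n_atlases * atlas_size * atlas_size * bytes_per_pixel / 2 ** 20


for cell in (32, 10):
    n = atlases_needed(100_000, cell)
    print(f"cell_size={cell}: {n} atlases, about {atlas_mebibytes(n):.0f} MiB")
# cell_size=32: 25 atlases, about 400 MiB
# cell_size=10: 3 atlases, about 48 MiB
```

Under these assumptions, shrinking the cells from 32 to 10 pixels packs about ten times as many images into each texture, which is what lowers the GPU memory needed to view a 100,000-image plot.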

Controlling UMAP Layout

The UMAP algorithm is particularly sensitive to three hyperparameters:

--min_dist: determines the minimum distance between points in the embedding
--n_neighbors: determines the tradeoff between local and global clusters
--metric: determines the distance metric to use when positioning points

UMAP's creator, Leland McInnes, has written up a helpful overview of these hyperparameters. To specify the value for one or more of these hyperparameters when building a plot, one may use the flags above, e.g.:

pixplot --images "path/to/images/*.jpg" --n_neighbors 2

Curating Automatic Hotspots

If it is installed and available, PixPlot uses HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise), a refinement of the earlier DBSCAN algorithm, to find hotspots in the visualization. You may be interested in consulting this explanation of how HDBSCAN works.

Tip: If you are using HDBSCAN and find that PixPlot creates too few (or only one) 'automatic hotspots', try lowering --min_cluster_size from its default of 20. This often happens with smaller datasets (fewer than a few thousand images).

If HDBSCAN is not available, PixPlot will fall back to scikit-learn's implementation of KMeans.
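The fallback described above follows a common optional-dependency pattern. The sketch below illustrates it with the real `hdbscan.HDBSCAN` and `sklearn.cluster.KMeans` classes; it is not PixPlot's own clustering code. The function name is hypothetical, `min_cluster_size=20` mirrors the default mentioned in the tip above, and `n_clusters` (compare the `--n_clusters` flag used in the IIIF example) is used only by the KMeans fallback.

```python
def cluster_points(points, n_clusters, min_cluster_size=20):
    """Cluster 2-D positions with HDBSCAN if it is installed, else KMeans."""
    try:
        import hdbscan  # optional dependency
    except ImportError:
        hdbscan = None

    if hdbscan is not None:
        # Label -1 marks points HDBSCAN treats as noise (not in any hotspot).
        return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(points)

    from sklearn.cluster import KMeans  # scikit-learn fallback
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
```

Importing inside the `try` and fitting outside it keeps an `ImportError` raised during fitting from being mistaken for a missing package. Note that HDBSCAN chooses the number of clusters itself, while KMeans must be told how many to find.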

Adding Metadata

If you have metadata associated with each of your images, you can pass in that metadata when running the data processing script. Doing so will allow the PixPlot viewer to display the metadata associated with an image when a user clicks on that image.

To specify the metadata for your image collection, you can add --metadata=path/to/metadata.csv to the command you use to call the processing script. For example, you might specify:

pixplot --images "path/to/images/*.jpg" --metadata "path/to/metadata.csv"

Metadata should be in a comma-separated value file, should contain one row for each input image, and should contain headers specifying the column order. Here is a sample metadata file:

filename,category,tags,description,permalink,Year
bees.jpg,yellow,a|b|c,bees' knees,https://...,1776
cats.jpg,dangerous,b|c|d,cats' pajamas,https://...,1972

The following column labels are accepted:

Column	Description
filename	the filename of the image
category	a categorical label for the image
tags	a pipe-delimited list of categorical tags for the image
description	a plaintext description of the image's contents
permalink	a link to the image hosted on another domain
year	a year timestamp for the image (should be an integer)
label	a categorical label used for supervised UMAP projection
lat	the latitudinal position of the image
lng	the longitudinal position of the image
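To check a metadata file before building a plot, you can parse it with Python's standard csv module. This is a hypothetical helper, not PixPlot's own reader. It splits the pipe-delimited tags column and converts the year to an integer, as the table above describes. It also lowercases column headers so that the sample's Year header matches the documented year label; whether PixPlot itself does this is an assumption.

```python
import csv
import io


def read_metadata(csv_text):
    """Parse metadata rows into dicts keyed by filename."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize headers; skip the None key csv uses for surplus fields.
        record = {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # tags are pipe-delimited: "a|b|c" -> ["a", "b", "c"]
        record["tags"] = [t for t in record.get("tags", "").split("|") if t]
        if record.get("year"):
            record["year"] = int(record["year"])
        rows[record["filename"]] = record
    return rows
```

To check a file on disk, pass its contents, e.g. `read_metadata(Path("path/to/metadata.csv").read_text())` with `from pathlib import Path`.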

IIIF Images

If you would like to process images that are hosted on a IIIF server, you can specify a newline-delimited list of IIIF image manifests as the --images argument. For example, the following could be saved as manifest.txt:

https://manifests.britishart.yale.edu/manifest/40005
https://manifests.britishart.yale.edu/manifest/40006
https://manifests.britishart.yale.edu/manifest/40007
https://manifests.britishart.yale.edu/manifest/40008
https://manifests.britishart.yale.edu/manifest/40009

One could then specify these images as input by running:

pixplot --images manifest.txt --n_clusters 2

Demonstrations (Developed with PixPlot 2.0 codebase)

Link	Image Count	Collection Info	Browse Images	Download for PixPlot
NewsPlot: 1910-1912	24,026	George Grantham Bain Collection	News in the 1910s	Images, Metadata
Bildefelt i Oslo	31,097	oslobilder	Advanced search, 1860-1924	Images, Metadata

Acknowledgements

The DHLab would like to thank Cyril Diagne and Nicolas Barradeau, lead developers of the spectacular Google Arts Experiments TSNE viewer, for generously sharing ideas on optimization techniques used in this viewer, and Lillianna Marie for naming this viewer PixPlot.
