This repository provides a PyTorch reference implementation of the main models and training procedures described in our paper:
Yu Liang*, Zhongjin Zhang*, Yuxuan Zhu, Kerui Zhang, Zhiluohan Guo, Wenhang Zhou, Zonqi Yang, Kangle Wu, Yabo Ni, Anxiang Zeng, Cong Fu, Jianxin Wang, and Jiazhi Xia. Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs.
We propose ReSID, a recommendation-native, principled SID framework that rethinks representation learning and quantization from the perspective of information preservation and sequential predictability, without relying on LLMs. ReSID consists of two components: (i) Field-Aware Masked Auto-Encoding (FAMAE), which learns predictive-sufficient item representations from structured features, and (ii) Globally Aligned Orthogonal Quantization (GAOQ), which produces compact and predictable SID sequences by jointly reducing semantic ambiguity and prefix-conditional uncertainty.
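For intuition, below is a minimal, illustrative PyTorch sketch of the general idea behind field-aware masked auto-encoding: per-field embeddings of an item's structured features are masked at the field level, and an encoder is trained to reconstruct the masked fields. All module names, dimensions, and the reconstruction loss here are placeholders for illustration only and do not correspond to the actual implementation in `model/`.

```python
import torch
import torch.nn as nn


class FieldMaskedAutoEncoderSketch(nn.Module):
    """Illustrative sketch only: field-level masking + reconstruction.

    Assumes each item is described by `num_fields` categorical features
    (e.g. category, brand, price bucket); the real feature handling differs.
    """

    def __init__(self, vocab_sizes, dim=64, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        # one embedding table per field (placeholder vocab sizes)
        self.field_embeds = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # per-field heads that predict the original field token id
        self.decoders = nn.ModuleList(nn.Linear(dim, v) for v in vocab_sizes)

    def forward(self, field_ids):
        # field_ids: (batch, num_fields) integer ids, one per field
        tokens = torch.stack(
            [emb(field_ids[:, i]) for i, emb in enumerate(self.field_embeds)], dim=1
        )  # (batch, num_fields, dim)
        # randomly mask whole fields and replace them with a learned mask token
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < self.mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        hidden = self.encoder(tokens)
        # reconstruct only the masked fields
        loss = tokens.new_zeros(())
        for i, dec in enumerate(self.decoders):
            if mask[:, i].any():
                logits = dec(hidden[mask[:, i], i])
                loss = loss + nn.functional.cross_entropy(logits, field_ids[mask[:, i], i])
        return loss
```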
The structure of this repository is as follows:
.
├── config/ # All *.yaml configuration files for the pipeline
├── dataset/ # Amazon-2023 review dataset processing code
├── model/ # Model implementations
├── logger.py # Logging utilities for printing runtime outputs
├── main.py # Main entry point for training and evaluation
├── metrics.py # Evaluation-related code
├── requirements.txt # List of required Python packages and dependencies
├── run_pipelines.py # One-click script to run the full ReSID pipeline
├── trainer.py # Training script
├── utils.py # Training utilities, mainly for data loading
└── README.md # This file

We recommend installing dependencies using requirements.txt. This setup has been tested on Ubuntu 18.04, CUDA 12.4, and Python 3.12.
pip3 install -r requirements.txt
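Optionally, you can sanity-check that PyTorch sees your GPU before running the pipeline (the exact PyTorch/CUDA versions pinned in requirements.txt may differ from this example):

```python
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```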
Download the ten Amazon-2023 review subsets used in our experiments by running:
bash dataset/download_amazon_2023.sh
bash dataset/download_amazon_2023_statistics.sh
Preprocess the downloaded data:
python dataset/data_process.py
To run ReSID, use the following command:
python run_pipelines.py --dataset Musical_Instruments --device cuda:0
Set --dataset to the name of the dataset you want to run.
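To run several of the Amazon-2023 subsets back to back, a simple wrapper such as the one below can be used; the subset names listed here (other than Musical_Instruments) are placeholders, so replace them with the subsets you actually downloaded and preprocessed.

```python
import subprocess

# Placeholder subset names; substitute the Amazon-2023 subsets you downloaded.
datasets = ["Musical_Instruments", "Video_Games", "Baby_Products"]

for name in datasets:
    subprocess.run(
        ["python", "run_pipelines.py", "--dataset", name, "--device", "cuda:0"],
        check=True,
    )
```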
If you find this repository helpful, please consider citing our paper:
@misc{ReSID,
title={Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs},
author={Yu Liang and Zhongjin Zhang and Yuxuan Zhu and Kerui Zhang and Zhiluohan Guo and Wenhang Zhou and Zonqi Yang and Kangle Wu and Yabo Ni and Anxiang Zeng and Cong Fu and Jianxin Wang and Jiazhi Xia},
year={2026},
eprint={2602.02338},
archivePrefix={arXiv},
primaryClass={cs.IR},
url={https://arxiv.org/abs/2602.02338},
}
