Business attributes in the Yelp Open Dataset are stored as dicts. However, some scenarios, such as training deep learning models, require the attributes in vectorized form. This script performs that conversion.
Pass the path to your Yelp business dataset file as input. The script outputs two pickle files: a NumPy array containing all the vectorized attributes, and a Python dict recording how the string attribute values are mapped to integers.
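The kind of transformation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual code of `attrivec.py`. It assumes one integer column per attribute key, per-key integer codes starting at 1 for each distinct string value, and 0 for "attribute missing". Nested attributes (for example `Ambience`, which Yelp stores as a string that looks like a dict) are treated as opaque strings. The helper names `vectorize_attributes` and `save_outputs` and the output filenames are hypothetical.

```python
import pickle
from typing import Any, Dict, List, Tuple

import numpy as np


def vectorize_attributes(
    records: List[Any],
) -> Tuple[np.ndarray, Dict[str, Dict[str, int]]]:
    """Hypothetical sketch: turn per-business attribute dicts into an int matrix.

    Column j holds attribute key keys[j]; each distinct string value of that key
    gets an integer code starting at 1, and 0 means the attribute is missing.
    """
    # Businesses without attributes may appear as None (or NaN after pandas),
    # so only real dicts are considered.
    dicts = [rec if isinstance(rec, dict) else {} for rec in records]
    # Every attribute key seen in any business, in a stable (sorted) order.
    keys = sorted({key for rec in dicts for key in rec})
    mapping: Dict[str, Dict[str, int]] = {key: {} for key in keys}
    matrix = np.zeros((len(dicts), len(keys)), dtype=np.int64)
    for i, rec in enumerate(dicts):
        for j, key in enumerate(keys):
            if key not in rec:
                continue  # leave 0 for a missing attribute
            value = str(rec[key])
            codes = mapping[key]
            if value not in codes:
                codes[value] = len(codes) + 1  # 0 is reserved for "missing"
            matrix[i, j] = codes[value]
    return matrix, mapping


def save_outputs(
    matrix: np.ndarray,
    mapping: Dict[str, Dict[str, int]],
    array_path: str = "attributes.pkl",  # hypothetical filename
    mapping_path: str = "attribute_mapping.pkl",  # hypothetical filename
) -> None:
    """Write the array and the value-to-integer mapping as two pickle files."""
    with open(array_path, "wb") as f:
        pickle.dump(matrix, f)
    with open(mapping_path, "wb") as f:
        pickle.dump(mapping, f)
```

The Yelp business file contains one JSON object per line, so one way to obtain `records` is `pandas.read_json(path, lines=True)["attributes"].tolist()`. Reserving 0 for missing values keeps "attribute not present" distinct from every observed value, which is convenient when the columns are later fed to embedding layers.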
python attrivec.py dataset/yelp_academic_dataset_business.json
numpy = 1.24.3
pandas = 1.5.3
python = 3.9.15
The development and maintenance of this project have no relationship with Yelp. The mention of "Yelp" in this project only indicates that it works with the Yelp Open Dataset; the project is not affiliated with or endorsed by Yelp.