Yelp Open Dataset business attributes vectorize tool.

zhwang3/attrivec

Yelp Open Dataset business attributes vectorize tool

About

Business attributes in the Yelp Open Dataset are stored as dicts. Some scenarios, such as training deep learning models, require vectorized attributes instead. This script performs that conversion.

Input and Output

You need to indicate where your Yelp business dataset is stored as input. The script outputs two pickle files: a numpy array that holds all the vectorized attributes, and a Python dict that records how the string attributes are transformed into integers.
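To make the two outputs concrete, here is a minimal sketch of one way such an encoding could work. It is not the code in attrivec.py: the function name `vectorize_attributes`, the sample records, and the choice of reserving 0 for a missing attribute are all assumptions made for illustration.

```python
import numpy as np


def vectorize_attributes(businesses):
    """Encode each business's `attributes` dict as a row of integers.

    Hypothetical helper for illustration only; the real script may
    order columns and assign codes differently.
    """
    # Collect every attribute name seen in the data, in a stable order.
    names = sorted({k for b in businesses for k in (b.get("attributes") or {})})
    # mapping[name][value] -> integer code; 0 is reserved for "missing".
    mapping = {name: {} for name in names}
    matrix = np.zeros((len(businesses), len(names)), dtype=np.int64)

    for i, business in enumerate(businesses):
        attrs = business.get("attributes") or {}
        for j, name in enumerate(names):
            if name not in attrs:
                continue  # leave the cell at 0 (missing)
            value = str(attrs[name])
            codes = mapping[name]
            # Reuse the existing code, or assign the next unused positive integer.
            matrix[i, j] = codes.setdefault(value, len(codes) + 1)
    return matrix, mapping


# Made-up records shaped like entries of the business file.
sample = [
    {"business_id": "a", "attributes": {"WiFi": "free", "BikeParking": "True"}},
    {"business_id": "b", "attributes": {"WiFi": "no"}},
    {"business_id": "c", "attributes": None},
]
matrix, mapping = vectorize_attributes(sample)
```

In this sketch each row of `matrix` corresponds to one business and each column to one attribute name, while `mapping` lets you translate an integer code back to the original string value.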

Running

python attrivec.py dataset/yelp_academic_dataset_business.json .
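Once the script has run, the two pickle files can be read back with the standard `pickle` module. The sketch below assumes the second command-line argument is the output directory, and the file names `attributes.pkl` and `mapping.pkl` are placeholders, since the README does not state the actual names. To stay runnable on its own, it first writes two small stand-in files to a temporary directory.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def load_outputs(out_dir, array_name="attributes.pkl", mapping_name="mapping.pkl"):
    """Load the array and the mapping dict from `out_dir`.

    Both file names are hypothetical defaults; replace them with the
    names the script actually writes.
    """
    out_dir = Path(out_dir)
    with open(out_dir / array_name, "rb") as f:
        matrix = pickle.load(f)
    with open(out_dir / mapping_name, "rb") as f:
        mapping = pickle.load(f)
    return matrix, mapping


# Round-trip demo with stand-in data instead of real script output.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    with open(tmp / "attributes.pkl", "wb") as f:
        pickle.dump(np.array([[1, 0], [2, 1]]), f)
    with open(tmp / "mapping.pkl", "wb") as f:
        pickle.dump({"WiFi": {"free": 1, "no": 2}}, f)
    loaded_matrix, loaded_mapping = load_outputs(tmp)
```

Only unpickle files you produced yourself or otherwise trust, because loading a pickle can execute arbitrary code.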

Dependencies

numpy = 1.24.3
pandas = 1.5.3
python = 3.9.15

Disclaimer

The development and maintenance of this project have no relation to Yelp. In this project, the mention of "Yelp" only indicates that the project works with the Yelp Open Dataset; the project is not affiliated with or endorsed by Yelp.
