This repository was archived by the owner on Jan 26, 2021. It is now read-only.
8 changes: 4 additions & 4 deletions example/README.md
@@ -1,4 +1,4 @@
#LightLDA usage
# LightLDA usage

Running ```lightlda --help``` gives the usage information.

@@ -26,23 +26,23 @@ LightLDA usage:
-alias_capacity <arg> Memory pool size(MB) for alias table
-delta_capacity <arg> Memory pool size(MB) for local delta cache
```
#Note on the input data
# Note on the input data

The input data is placed in a folder specified by the command-line argument ```input_dir```.

This folder should contain files named ```block.id``` and ```vocab.id```. The ```id``` ranges from 0 to N-1, where ```N``` is the number of data blocks.
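The expected layout can be checked with a small script. This is a sketch only: ```check_input_dir``` is a hypothetical helper, not part of LightLDA.

```python
# Sketch: verify that input_dir follows the block.id / vocab.id naming
# convention described above. check_input_dir is a hypothetical helper.
import os

def check_input_dir(input_dir, num_blocks):
    """Return the expected files that are missing from input_dir."""
    expected = []
    for i in range(num_blocks):
        expected.append("block.%d" % i)
        expected.append("vocab.%d" % i)
    return [f for f in expected
            if not os.path.exists(os.path.join(input_dir, f))]
```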

The input data should be generated by the tool ```dump_binary``` (released along with LightLDA), which converts the libsvm format into a binary format. This is done for training efficiency.
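For reference, a libsvm-style document line is typically ```<doc_id> <word_id>:<count> ...```; the exact layout ```dump_binary``` expects is an assumption here, and ```parse_libsvm_line``` is a hypothetical illustration, not LightLDA code.

```python
# Sketch: parse one libsvm-style document line into (doc_id, [(word_id, count), ...]).
# The exact input layout dump_binary expects is an assumption.
def parse_libsvm_line(line):
    parts = line.split()
    doc_id = int(parts[0])
    pairs = []
    for token in parts[1:]:
        word_id, count = token.split(":")
        pairs.append((int(word_id), int(count)))
    return doc_id, pairs
```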

#Note on the arguments about capacity
# Note on the arguments about capacity

In LightLDA, almost all memory chunks are pre-allocated. LightLDA uses this fixed-capacity memory as a memory pool.

For data capacity, you should assign a value at least as large as the largest of your binary training block files (generated by ```dump_binary```; see the note on input data above).
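A minimal sketch of picking that lower bound, assuming block files follow the ```block.id``` convention above; ```min_data_capacity_mb``` is a hypothetical helper, not part of LightLDA.

```python
# Sketch: compute the smallest data capacity (in MB) that still fits the
# largest block.* file in input_dir. Hypothetical helper for illustration.
import glob
import math
import os

def min_data_capacity_mb(input_dir):
    sizes = [os.path.getsize(p)
             for p in glob.glob(os.path.join(input_dir, "block.*"))]
    return int(math.ceil(max(sizes) / float(1024 * 1024)))
```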

For ```model/alias/delta capacity```, you can assign any value. LightLDA handles the big-model challenge under limited memory by model scheduling: it loads only a slice of the needed parameters that fits into the pre-allocated memory and schedules only the related tokens for training. To reduce wait time, the next slice is prefetched in the background. Empirically, ```model capacity``` and ```alias capacity``` are of the same order, while ```delta capacity``` can be much smaller than both. Logs give the actual memory sizes used at the beginning of the program; you can use this information to adjust these arguments for better computation/memory efficiency.

#Note on distirubted running
# Note on distributed running

Data should be distributed across different nodes.
