From 304d42c558ab91ad9f4a2946d68023f5e71ca41a Mon Sep 17 00:00:00 2001
From: Rongzhong Lian
Date: Sun, 5 May 2019 11:48:31 +0800
Subject: [PATCH] Update README.md

---
 example/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/example/README.md b/example/README.md
index 119ddef..e7c8f27 100644
--- a/example/README.md
+++ b/example/README.md
@@ -1,4 +1,4 @@
-#LightLDA usage
+# LightLDA usage
 
 Running ```lightlda --help``` gives the usage information.
 
@@ -26,7 +26,7 @@ LightLDA usage:
 	-alias_capacity   Memory pool size(MB) for alias table
 	-delta_capacity   Memory pool size(MB) for local delta cache
 ```
-#Note on the input data
+# Note on the input data
 
 The input data is placed in a folder, which is specified by the command line argument ```input_dir```.
 
@@ -34,7 +34,7 @@ This folder should contains files named as ```block.id```, ```vocab.id```. The ```
 
 The input data should be generated by the tool ```dump_binary```(released along with LightLDA), which convert the libsvm format in a binary format. This is for training efficiency consideration.
 
-#Note on the arguments about capacity
+# Note on the arguments about capacity
 
 In LightLDA, almost all the memory chunk is pre-allocated. LightLDA uses these fixed-capacity memory as memory pool.
@@ -42,7 +42,7 @@ For data capacity, you should assign a value at least larger than the largest si
 For ```model/alias/delta capacity```, you can assign any value. LightLDA handles big model challenge under limited memory condition by model scheduling, which loads only a slice of needed parameters that can fit into the pre-allocated memory and schedules only related tokens to train. To reduce the wait time, the next slice is prefetched in the background. Empirically, ```model capacity``` and ```alias capacity``` are in same order. ```delta capacity``` can be much smaller than model/alias capacity. Logs will gives the actually memory size used at the beggning of program. 
 You can use this information to adjust these arguments to achieve better computation/memory efficiency.
 
-#Note on distirubted running
+# Note on distributed running
 
 Data should be distributed into different nodes.
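
The capacity notes in the patched README can be sanity-checked with a rough sizing sketch. This is a hypothetical helper, not part of LightLDA: it assumes a dense word-topic table of int32 counters, whereas LightLDA stores the model sparsely, so treat the result only as an upper bound when picking ```model_capacity```.

```python
# Back-of-envelope upper bound (in MB) for the model memory pool,
# assuming a dense num_vocab x num_topics table of int32 counters.
# LightLDA's sparse storage typically needs much less than this.
def dense_model_mb(num_vocab, num_topics, bytes_per_count=4):
    return num_vocab * num_topics * bytes_per_count / (1024 * 1024)

# e.g. a 100k-word vocabulary with 1k topics
print(round(dense_model_mb(100_000, 1_000), 1))  # prints 381.5
```

The logged actual memory usage mentioned in the README remains the authoritative number; this bound only helps choose a safe starting value.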