Ted96/kmeans_LSH_cpp
------------------------------------------------------
Theodoros Anagnostopoulos / sdi1400009 / project2
------------------------------------------------------
(notes originally written in Greeklish)
---------------- HOW TO RUN ----------------------
{{compilation}}:
$make
{{run}}:
$./cluster -h (show help)
$./cluster -i xxx -d xxx -c xxx -a
NOTE: the -complete option is given as -a
------------------ INPUT FILES ---------------------------
Global variables live in the global .h/.cpp files
Dimension = 203
Config file :
it is optional
EXTRA options:
max kmeans iterations (n)
check the g()'s in LSH? (0/1)
----------------- CODE DESIGN 1/2 --------------------
------- ASSUMPTIONS - WARNINGS ----------
Break criteria:
the loop stops on a small total change of the centers,
a small number of vectors changing cluster,
or on reaching max iterations (default --> globals.cpp).
I have tried to avoid the infinite loops of LSH
by checking number_of_changes against its values from previous iterations.
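A minimal sketch of how the three break criteria and the oscillation check could fit together. The struct, the function name, the threshold values and the "same count three times in a row" rule are all assumptions made for illustration; the real defaults are the ones in globals.cpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical thresholds, not the project's actual defaults.
struct StopConfig {
    double min_center_shift = 1e-3;    // total movement of all centers
    std::size_t min_changes = 1;       // datapoints that switched cluster
    std::size_t max_iterations = 50;   // hard upper bound on iterations
};

// Returns true when the k-means loop should stop.
// changes_history holds number_of_changes for every finished iteration,
// most recent last.
bool should_stop(const StopConfig& cfg, std::size_t iteration,
                 double total_center_shift,
                 const std::vector<std::size_t>& changes_history) {
    if (iteration >= cfg.max_iterations) return true;
    if (total_center_shift < cfg.min_center_shift) return true;
    if (changes_history.empty()) return false;

    const std::size_t last = changes_history.back();
    if (last < cfg.min_changes) return true;

    // Oscillation guard (assumed rule): the same number of changes in three
    // consecutive iterations suggests the assignment is cycling.
    const std::size_t n = changes_history.size();
    if (n >= 3 && changes_history[n - 2] == last && changes_history[n - 3] == last)
        return true;
    return false;
}
```

The history check only compares counts, so it can stop a run that is still making slow progress; a stricter guard would compare the assignments themselves.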
Radius LSH assign search:
I hardcode a cap on R once it gets very close to the max_distance between
datapoints. E.g. when R = 1 it makes no sense to go on to R = 2,
because too many datapoints would be returned for every center. Also, from
the printouts I made (y_cluster_changes_per_iteration),
the datapoints that move to a different cluster in the R = 2 search
are not many more than those for R = 1.
So stopping early saves some time.
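The capped radius idea can be sketched as a small schedule generator. Doubling R between rounds and the margin parameter are assumptions for illustration; the notes only say that the cap is hardcoded near max_distance.

```cpp
#include <vector>

// Radii to try during range-search assignment: start at r0 and double,
// but never use a radius larger than margin * max_dist.
// radius_schedule and margin are hypothetical names, not project code.
std::vector<double> radius_schedule(double r0, double max_dist, double margin = 0.5) {
    std::vector<double> radii;
    if (r0 <= 0.0) return radii;  // invalid start radius: nothing to search
    for (double r = r0; r <= margin * max_dist; r *= 2.0) radii.push_back(r);
    if (radii.empty()) radii.push_back(r0);  // always search at least once
    return radii;
}
```

With r0 = 1 and max_dist = 3 the schedule contains only R = 1, which matches the case described above where going on to R = 2 is not worth it.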
hash_euclidean_cube():
probes vertices up to and including Hamming distance = 2
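To illustrate what probing up to Hamming distance 2 means, here is a sketch that lists the hypercube vertices to visit. The function name and the bitmask representation of a vertex are assumptions, not the actual code of hash_euclidean_cube().

```cpp
#include <cstdint>
#include <vector>

// All vertices of a d-bit hypercube within Hamming distance <= 2 of `vertex`,
// ordered by distance (0, then 1, then 2). Assumes d < 32.
std::vector<std::uint32_t> probes_within_two(std::uint32_t vertex, unsigned d) {
    std::vector<std::uint32_t> out;
    out.push_back(vertex);                        // distance 0
    for (unsigned i = 0; i < d; ++i)              // distance 1: flip one bit
        out.push_back(vertex ^ (1u << i));
    for (unsigned i = 0; i < d; ++i)              // distance 2: flip two bits
        for (unsigned j = i + 1; j < d; ++j)
            out.push_back(vertex ^ (1u << i) ^ (1u << j));
    return out;
}
```

The number of probes is 1 + d + d(d-1)/2, so it grows quadratically with the cube dimension.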
----------------- CODE DESIGN 2/2 --------------------
silhouette + medoids: an unordered_map is used to cache the distances
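A minimal sketch of such a distance cache. The class name, the 64-bit key built from the two point ids, and the use of Euclidean distance are assumptions; the project may key and store its map differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using Point = std::vector<double>;

// Caches pairwise distances so that silhouette and medoid updates do not
// recompute the same pair twice. Hypothetical helper, not project code.
class DistanceCache {
public:
    explicit DistanceCache(const std::vector<Point>& data) : data_(data) {}

    double get(std::uint32_t a, std::uint32_t b) {
        if (a == b) return 0.0;
        // Order the ids so that (a, b) and (b, a) share one entry.
        const std::uint64_t key =
            (static_cast<std::uint64_t>(std::min(a, b)) << 32) | std::max(a, b);
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        ++computed_;
        const double d = euclidean(data_[a], data_[b]);
        cache_.emplace(key, d);
        return d;
    }

    // Number of distances actually computed (cache misses).
    std::size_t computed() const { return computed_; }

private:
    static double euclidean(const Point& p, const Point& q) {
        double s = 0.0;
        for (std::size_t i = 0; i < p.size(); ++i) s += (p[i] - q[i]) * (p[i] - q[i]);
        return std::sqrt(s);
    }

    const std::vector<Point>& data_;  // the cache does not own the dataset
    std::unordered_map<std::uint64_t, double> cache_;
    std::size_t computed_ = 0;
};
```

The cache holds a reference to the dataset, so the dataset must outlive it.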
I check G(query) by default, but I believe we should not do this
for the radius search, so I added an option for it in the config file (c_file)
The LSH/cube search lives in a single function, which is called as
lsh_search( query , radius , dataset , mode=1 or 2 )
The other functions do not see it, since their implementation
(project 1) is packaged inside a library (.a / .so)
The various distance metrics are in distance_fucntions.cpp
Parsing of the files for main is done in main's function.
kmeans has one general function run()
and several variants of init() , update() , assign() ::
kmeans.cpp::
<<< general functions , silhouette , constructors >>>
kmeans_functions.cpp::
<<< main project >>>
Quite a few global variables are used.
Why not?
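The "one general run() with interchangeable init() / update() / assign()" structure can be sketched with std::function members. All type and function names here are hypothetical, and the Lloyd-style steps (first-k init, brute-force assign, mean update) are only one possible combination, not the project's code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;

struct State {
    std::vector<Point> centers;
    std::vector<std::size_t> labels;  // cluster index of every datapoint
};

// The three interchangeable steps; assign returns how many labels changed.
struct KMeansSteps {
    std::function<void(const std::vector<Point>&, std::size_t, State&)> init;
    std::function<std::size_t(const std::vector<Point>&, State&)> assign;
    std::function<void(const std::vector<Point>&, State&)> update;
};

// Generic driver: stops when nothing changes or after max_iter iterations.
State run(const std::vector<Point>& data, std::size_t k,
          const KMeansSteps& steps, std::size_t max_iter) {
    State s;
    s.labels.assign(data.size(), 0);
    steps.init(data, k, s);
    for (std::size_t it = 0; it < max_iter; ++it) {
        const std::size_t changed = steps.assign(data, s);
        steps.update(data, s);
        if (changed == 0) break;
    }
    return s;
}

double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// One concrete variant: first-k init, brute-force assign, mean update.
KMeansSteps lloyd_steps() {
    KMeansSteps st;
    st.init = [](const std::vector<Point>& data, std::size_t k, State& s) {
        s.centers.assign(data.begin(), data.begin() + k);  // assumes k <= data.size()
    };
    st.assign = [](const std::vector<Point>& data, State& s) {
        std::size_t changed = 0;
        for (std::size_t i = 0; i < data.size(); ++i) {
            std::size_t best = 0;
            for (std::size_t c = 1; c < s.centers.size(); ++c)
                if (sq_dist(data[i], s.centers[c]) < sq_dist(data[i], s.centers[best]))
                    best = c;
            if (best != s.labels[i]) { s.labels[i] = best; ++changed; }
        }
        return changed;
    };
    st.update = [](const std::vector<Point>& data, State& s) {
        for (std::size_t c = 0; c < s.centers.size(); ++c) {
            Point sum(data[0].size(), 0.0);
            std::size_t n = 0;
            for (std::size_t i = 0; i < data.size(); ++i) {
                if (s.labels[i] != c) continue;
                for (std::size_t j = 0; j < sum.size(); ++j) sum[j] += data[i][j];
                ++n;
            }
            if (n == 0) continue;  // leave the center of an empty cluster unchanged
            for (double& v : sum) v /= static_cast<double>(n);
            s.centers[c] = sum;
        }
    };
    return st;
}
```

Swapping in an LSH-based assign or a medoid update then only means filling the same three slots with different callables.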
--------------- PARAMETERS / MEASUREMENTS ------------------
> see results2.txt
good w: ~1.5 - 2.0
good range for ri: usually small values
~
w must lie in a range such that a large
percentage of a query's nearest neighbors
are found at distance = [3w , 4w]
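For context on what w controls, here is a sketch of the standard Euclidean LSH hash h(p) = floor((p . v + t) / w), where v has Gaussian components and t is uniform in [0, w). The struct name and layout are assumptions; the project's own hash functions come from the project 1 library.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One Euclidean LSH hash function. Hypothetical helper for illustration.
struct EuclideanHash {
    std::vector<double> v;  // random direction, components drawn from N(0, 1)
    double t;               // random shift, uniform in [0, w)
    double w;               // window (bucket) width

    EuclideanHash(std::size_t dim, double w_, std::mt19937& rng)
        : v(dim), t(0.0), w(w_) {
        std::normal_distribution<double> gauss(0.0, 1.0);
        std::uniform_real_distribution<double> shift(0.0, w_);
        for (double& x : v) x = gauss(rng);
        t = shift(rng);
    }

    // Project p onto v, shift by t, and return the index of the window hit.
    long long operator()(const std::vector<double>& p) const {
        double dot = 0.0;
        for (std::size_t i = 0; i < v.size(); ++i) dot += p[i] * v[i];
        return static_cast<long long>(std::floor((dot + t) / w));
    }
};
```

A larger w puts more points into the same window, giving more candidates per bucket and fewer missed neighbors at the cost of slower searches.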
for k=20:: average S = 0.06
~
The classic runs, e.g. Lloyd and medoid,
never make assignment mistakes, since they use brute-force search,
so Si purely reflects the choice of K
and perhaps slightly the initial init of the centers.
In general they appear more stable than LSH.
When we compare a K1-means Lloyd run with a K2-means Lloyd run,
we are essentially deciding how many clusters it is better to have.
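Since S is the number used to compare runs, here is a brute-force sketch of the mean silhouette, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the point's own cluster and b(i) the smallest mean distance to any other cluster. The function names are hypothetical, Euclidean distance is assumed, and points in singleton clusters are given s(i) = 0.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Mean silhouette over all points; labels[i] must be in [0, k).
double mean_silhouette(const std::vector<Point>& data,
                       const std::vector<std::size_t>& labels, std::size_t k) {
    if (data.empty()) return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        // Sum of distances from point i to every cluster, and cluster sizes.
        std::vector<double> sum(k, 0.0);
        std::vector<std::size_t> count(k, 0);
        for (std::size_t j = 0; j < data.size(); ++j) {
            if (j == i) continue;
            sum[labels[j]] += euclidean(data[i], data[j]);
            ++count[labels[j]];
        }
        const std::size_t own = labels[i];
        if (count[own] == 0) continue;  // singleton cluster: s(i) = 0
        const double a = sum[own] / static_cast<double>(count[own]);
        double b = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < k; ++c)
            if (c != own && count[c] > 0)
                b = std::min(b, sum[c] / static_cast<double>(count[c]));
        if (std::isinf(b)) continue;    // no other non-empty cluster
        const double denom = std::max(a, b);
        if (denom > 0.0) total += (b - a) / denom;
    }
    return total / static_cast<double>(data.size());
}
```

This is O(n^2) distance computations, which is why caching the distances pays off.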
~
The medoids break the LSH / cube search: infinite loop with the centers alternating.
medoids + LSH or cube is not recommended.
The medoids converge very quickly in number of iterations,
but take much more time overall, so I do not know if they are worth it compared
with Lloyd. Perhaps if the dataset had noise they would give a better S;
for the moment they are much worse (maybe I made a mistake?)
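To show where the extra time of a medoid update comes from, here is a sketch that picks the member of a cluster with the smallest total distance to the other members. The function name is hypothetical and Euclidean distance is assumed; the project's update variant may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

double euclidean(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Returns the index (into data) of the medoid of the cluster whose
// datapoint indices are listed in `members`. Ties go to the earlier member.
std::size_t cluster_medoid(const std::vector<Point>& data,
                           const std::vector<std::size_t>& members) {
    assert(!members.empty());
    std::size_t best = members.front();
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t cand : members) {
        double cost = 0.0;  // total distance from cand to all members
        for (std::size_t other : members) cost += euclidean(data[cand], data[other]);
        if (cost < best_cost) { best_cost = cost; best = cand; }
    }
    return best;
}
```

Each update is quadratic in the cluster size, whereas a mean update is linear, which is consistent with fewer iterations but a longer total running time.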
~
capacity_percent (e.g. n/8 , n/4) makes almost no difference to Si.
LSH was, as expected, quite fast, but fails show up far from rarely::
1 fail = 1 vector was not assigned to its nearest neighbor,
so this affects the Si values considerably.
~
Range assign: it can be quite fast but also
extremely slow, because it falls into infinite loops and I terminate it with
convergence conditions.
~
Hypercube does not do very well.
One run gives S = 0.03 and time = 76", and another
gives S = 0.06 and time = 17".
I did not look into this much. The cause may be
probes = (check all vertices at Hamming dist = 2)
~
In the lsh_assign radius search::
when R gets very close to the max_distance between
datapoints (e.g. when R = 1) it makes no sense to go on to R = 2, because:
too many datapoints would be returned for every center;
from the printouts I made (y_cluster_changes_per_iteration),
the datapoints that move to a different cluster in the R = 2 search
are not many more than those for R = 1.
About
C++ Implementation of Kmeans algorithm , to cluster high dimensional data