Copied (and adapted) existing k-means test cases to new k-means #3
fschopp wants to merge 5 commits into madlib:master
|
@huor It seems that |
|
@huor Are there assumptions on the names in |
|
The algorithms/methods defined in algorithms.xml and the datasets defined in dataset.xml are used in kmeans.xml.
|
The information in dataset.xml and kmeans.xml is not redundant. Each dataset defined in dataset.xml is a kind of algorithm-parameter combination, which makes it easier for us to write cases in kmeans.xml.
|
So, you are saying that, e.g., the fact that test suite |
|
Regarding the naming conventions: Yes, I see that some names are references. But I am wondering whether you also make the assumption that names are composed of different substrings. Do you ever concatenate strings and then assume that there is a certain algorithm/test suite/method with that composed name?
testspec/casespec/kmeans.xml
Outdated
I found that I had to explicitly include the madlib schema, e.g.: madlib.squared_dist_norm1
(Similar change had to be made in the algorithmspec.xml file for madlib.avg)
|
The MADmark Installation Guide that Jiali sent explains everything pretty well. Here's a summary of how to run selected test cases:
After updating the three XML files accordingly, run 'cd $MADMARK_HOME/bin; python run.py -g' to generate all your test cases. So, if kmeans.xml has a test_suite named "kmeans_baseline" with a total of 4 parameter combinations to test, you'll generate four case files in $MADMARK_HOME/testcase:
To run selected test cases:
|
…ml, to track changes compared to old kmeans test cases.
…d. Now use CTAS workaround to be compatible with older versions of Greenplum.
testspec/metadata/algorithmspec.xml
Outdated
I think we still need the "DROP TABLE {table_name};" in order to be able to run different tests using the same dataset (otherwise, after using the dataset table_name once, all other tests that use the same dataset ERROR out since the table_name already exists).
I think a teardown section should be used for that. This is what I did.
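To make the setup/teardown point concrete, here is a minimal sketch of the pattern in Python. It is not MADmark code: `run_case` and the table name `kmeans_points` are hypothetical, and the in-memory SQLite database only stands in for Greenplum/PostgreSQL. It just shows why dropping the table in a teardown step lets several cases reuse the same table name.

```python
import sqlite3


def run_case(conn, table_name, rows):
    """Run one hypothetical test case: setup, test body, teardown."""
    cur = conn.cursor()
    # Setup: this CREATE fails if an earlier case left the table behind.
    cur.execute(f"CREATE TABLE {table_name} (x REAL, y REAL)")
    try:
        # Test body: load the dataset and compute something from it.
        cur.executemany(f"INSERT INTO {table_name} VALUES (?, ?)", rows)
        cur.execute(f"SELECT count(*) FROM {table_name}")
        return cur.fetchall()[0][0]
    finally:
        # Teardown: always drop the table so the next case can reuse the name.
        cur.execute(f"DROP TABLE IF EXISTS {table_name}")


conn = sqlite3.connect(":memory:")  # stand-in for the real database
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

# Two cases using the same dataset table name, run back to back.
first = run_case(conn, "kmeans_points", points)
second = run_case(conn, "kmeans_points", points)
```

Without the `finally` block, the second call would fail on `CREATE TABLE` because the table already exists, which is the error described above.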
…a pair. Added declarations for kmeanspp_seeding and kmeans_random_seeding.
More complex arguments may contain quotes, e.g., ARRAY['madlib.squared_dist_norm2','madlib.dist_norm2']. Previously, quotes did not pass through the shell invocation but caused errors.
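For illustration, here is a small Python sketch of the quoting problem. How this pull request actually passes the arguments is not shown in the conversation; `build_command` and the program name `run_case` are hypothetical, and `shlex.quote` is only one common way to get quotes through a shell invocation. `shlex.split` is used here to mimic POSIX shell word splitting and quote removal without spawning a shell.

```python
import shlex


def build_command(program, argument, quote=True):
    """Build a shell command line for a hypothetical test runner."""
    rendered = shlex.quote(argument) if quote else argument
    return f"{program} {rendered}"


dist_arg = "ARRAY['madlib.squared_dist_norm2','madlib.dist_norm2']"

# Unquoted: the shell removes the single quotes, so the SQL array literal
# arrives without its string delimiters.
unquoted = shlex.split(build_command("run_case", dist_arg, quote=False))

# Quoted: the argument arrives exactly as written.
quoted = shlex.split(build_command("run_case", dist_arg))
```

In the unquoted case the program receives `ARRAY[madlib.squared_dist_norm2,madlib.dist_norm2]`, which is no longer an array of string literals once it reaches SQL.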
This is still work in progress. Pull request opened to facilitate discussion. Please only merge when requested.