Skip to content

patch to loomcat#247

Open
Schaechtle wants to merge 2 commits intomasterfrom
20180813-schaechtle-patch-loomcat
Open

patch to loomcat#247
Schaechtle wants to merge 2 commits intomasterfrom
20180813-schaechtle-patch-loomcat

Conversation

@Schaechtle
Copy link
Collaborator

This patch is needed since the column names I am getting from loom while calling _update_state
(and _retrieve_featureid_to_cgpm, specifically) don't follow the assumptions made in the code. In fact, the column names I am getting from loom are the actual natural language column names which seems correct (thus patching cgpm).

I have a test case for this patch which depends on loom and bql -- I initially planned to include the
test case here (i.e the cgpm repo). Looking at the test suite however, I am not sure anymore if this makes sense (not much loom and BQL around in cgpm/test); I am now planning to include this test in another repo focussed only on exporting models from loom. Open for suggestions here.

Tested using the probcomp dev image as folllows:

 2018-08-13 14:47:24 ⌚  ulli-notebook in ~/cgpm
○ → pytest tests/

Result:

454 passed, 8 skipped, 2 xfailed in 363.28 seconds

@Schaechtle Schaechtle requested a review from fsaad August 13, 2018 14:57
@fsaad
Copy link
Collaborator

fsaad commented Aug 13, 2018

The loomcat conversion is assuming that the loom project on disk was created using loomcat.initialize(state), in which case the loom column names are of the form c%05d. This encoding scheme is needed since cgpm does not have a notion of natural language column names, only column numbers.

When the loom project on disk was created using BayesDB's loom_backend class instead of cgpm's loomcat.initialize method, the loom column names are the natural language column names since these are known to BayesDB.

The assumptions in the code should be further loosened by:

  • Making colname_colno_mapping a required argument of loomcat._retrieve_featureid_to_cgpm and loomcat._update_state.
  • Updating loomcat.transition and loomcat.transition_engine to use the output mapping given by _generate_column_names and pass this mapping into _update_state.

Copy link
Collaborator

@fsaad fsaad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #247 (comment) for requested change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants