20 seems pretty small; I'd think we'd want to allow at least 100 or 200 here.
@avibryant So, this has some of the stuff I've been doing (still very WIP), but the important bits are:
There is an implementation, The main goal of the |
  a <- this.uniques
  b <- that.uniques
  c = a ++ b
  if (c.size < 20)
As @avibryant mentioned earlier, this should probably be bumped up or made configurable.
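For what it's worth, here is a rough sketch of what "configurable" could look like. Everything in it is an assumption for illustration: `UniqueTracker` and `maxUniques` are hypothetical names, and I'm assuming `uniques` is an `Option[Set[A]]` that becomes `None` once the cap is hit. The actual types in this PR may differ.

```scala
// Hypothetical sketch: `UniqueTracker` and `maxUniques` are illustrative names,
// not identifiers from this PR.
final case class UniqueTracker[A](uniques: Option[Set[A]], maxUniques: Int = 200) {

  // Merge two trackers. The result stops tracking (None) as soon as either
  // side has already given up or the combined set reaches the cap.
  def merge(that: UniqueTracker[A]): UniqueTracker[A] = {
    // Use the stricter of the two caps so the merge is symmetric.
    val cap = math.min(this.maxUniques, that.maxUniques)
    val merged = for {
      a <- this.uniques
      b <- that.uniques
      c = a ++ b
      if c.size < cap
    } yield c
    UniqueTracker(merged, cap)
  }
}
```

The only change from the hard-coded version is that the literal `20` becomes a field with a larger default, so callers can tune it per job.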
import com.twitter.algebird.{ Aggregator, Semigroup, Monoid }

case class DispatchedFeatureEncoding[L](encoder: DispatchedFeatureEncoder)
  extends FeatureEncoding[String, FeatureValue, Map[L, Long]] with Defaults {
Sorry for the terrible name collisions here, but FeatureValue is a type alias for Dispatched[Double, String, Double, String].
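To make the alias a bit more concrete, here is a simplified stand-in. The `Dispatched` trait and the four case names below (`Ordinal`, `Nominal`, `Continuous`, `Sparse`) are assumptions for illustration only, and the real definitions in the codebase may look different. The point is just that each of the four type parameters corresponds to one kind of feature value.

```scala
// Simplified stand-in for illustration only; the real Dispatched type in the
// codebase may be defined differently.
sealed trait Dispatched[+A, +B, +C, +D]
final case class Ordinal[A](value: A) extends Dispatched[A, Nothing, Nothing, Nothing]
final case class Nominal[B](value: B) extends Dispatched[Nothing, B, Nothing, Nothing]
final case class Continuous[C](value: C) extends Dispatched[Nothing, Nothing, C, Nothing]
final case class Sparse[D](value: D) extends Dispatched[Nothing, Nothing, Nothing, D]

object FeatureValueSketch {
  // Same shape as the alias described above.
  type FeatureValue = Dispatched[Double, String, Double, String]

  // Name the kind of feature a value was dispatched to.
  def kind(v: FeatureValue): String = v match {
    case Ordinal(_)    => "ordinal"
    case Nominal(_)    => "nominal"
    case Continuous(_) => "continuous"
    case Sparse(_)     => "sparse"
  }
}
```

With this shape, `FeatureValueSketch.kind(Nominal("red"))` yields `"nominal"`, and both a `Double` and a `String` can sit in the same feature map without losing track of which kind each one is.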
This is super early work, but we have a CsvTrainerJob, which can run on ~arbitrary CSVs, with the labels provided by the user. The actual types of the values will be inferred before training.