rethink Evaluator by avibryant · Pull Request #78 · stripe-archive/brushfire

avibryant · 2015-12-28T23:01:09Z

This makes a few related changes to Evaluator:

- It decouples Evaluator from Split and has it operate directly on Iterable[T], which was the only part of a Split it ever cared about anyway.
- This makes it possible to apply an Evaluator to a tree or sub-tree, which could be useful in the future (e.g., if we want to change Splitter to produce candidate sub-trees instead of splits).
- In this context the Iterable[T] represents the leaves of the sub-tree, so it's now labeled as such.
- Rather than using Infinity to represent an unacceptable level of error, it now returns Option[Double], so you can return None to signal that.
- The sign has been reversed (i.e., this is now an error value, not a "goodness" value).
- It doesn't return a new Split, just the error.
- The root T is now also provided. None of the evaluators makes use of it, but there are at least two potential uses I have in mind:
  - considering a split unacceptable based on some ratio of the leaf T to the root T (e.g., "all leaves must contain at least X% of the input Ys")
  - using more global optimization criteria (this comes up in particular in the "explanation tree" use case, which I won't explain in more detail here)
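
To make the first potential use of the root concrete, here is a minimal sketch of a ratio-based rejection. Everything in it is an assumption for illustration: the `Counts` alias standing in for T, the two-argument `trainingError` shape, and the `MinLeafFractionErrors` name are hypothetical and not taken from the brushfire code.

```scala
object RootRatioSketch {
  // Hypothetical target type: label -> instance count, standing in for T.
  type Counts = Map[String, Long]

  /** Assumed shape: the root plus the leaves of a candidate (sub-)tree. */
  trait Evaluator[T] {
    def trainingError(root: T, leaves: Iterable[T]): Option[Double]
  }

  /**
   * Returns None if any leaf holds less than `minFraction` of the root's
   * instances; otherwise returns the number of misclassified instances
   * (an error value, so lower is better).
   */
  final case class MinLeafFractionErrors(minFraction: Double) extends Evaluator[Counts] {
    private def total(c: Counts): Long = c.values.sum

    // Instances outside a leaf's majority class count as errors.
    private def errors(c: Counts): Long =
      if (c.isEmpty) 0L else total(c) - c.values.max

    def trainingError(root: Counts, leaves: Iterable[Counts]): Option[Double] = {
      val rootTotal = total(root).toDouble
      val tooSmall = leaves.exists(leaf => total(leaf) < minFraction * rootTotal)
      if (tooSmall) None
      else Some(leaves.iterator.map(errors).sum.toDouble)
    }
  }
}
```

With `minFraction = 0.2` and a root of 10 instances, any candidate with a leaf of fewer than 2 instances yields None rather than a large or infinite number.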

This is WIP because I haven't fully updated the training code to make use of it.

avibryant · 2015-12-29T04:18:46Z

@tixxit ok I think this is ready now for review

avibryant · 2015-12-29T04:19:46Z

brushfire-core/src/main/scala/com/stripe/brushfire/Brushfire.scala

It seems a bit arbitrary to make this be a Double - it could be some generic F:Ordering, but then again I've got a bit of type param exhaustion.

I also wonder whether the Option[Double] is actually confounding two concepts: "unacceptable" should really be something that Stopper determines separately. That is, Evaluator would always return Double but Stopper would include both isWorthTryingToSplit and isAcceptableSplitOutcome.
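
A rough sketch of how that separation might look. The two method names come from the comment above; their signatures, the `MinSizeStopper` example, and the `score` helper are assumptions and do not reflect brushfire's actual Stopper.

```scala
object StopperSketch {
  /** Assumed: the evaluator only scores, always returning a Double (lower is better). */
  trait Evaluator[T] {
    def trainingError(leaves: Iterable[T]): Double
  }

  /** Assumed: the stopper owns both acceptability decisions. */
  trait Stopper[T] {
    def isWorthTryingToSplit(node: T): Boolean
    def isAcceptableSplitOutcome(leaves: Iterable[T]): Boolean
  }

  // Hypothetical illustration with T = number of instances at a node.
  final case class MinSizeStopper(minNode: Long, minLeaf: Long) extends Stopper[Long] {
    def isWorthTryingToSplit(node: Long): Boolean = node >= minNode
    def isAcceptableSplitOutcome(leaves: Iterable[Long]): Boolean = leaves.forall(_ >= minLeaf)
  }

  /** Scores a candidate only when the stopper accepts both the node and the outcome. */
  def score[T](stopper: Stopper[T], evaluator: Evaluator[T])(node: T, leaves: Iterable[T]): Option[Double] =
    if (stopper.isWorthTryingToSplit(node) && stopper.isAcceptableSplitOutcome(leaves))
      Some(evaluator.trainingError(leaves))
    else None
}
```

The Option still appears, but only where the two concerns are combined; the evaluator itself never has to encode "unacceptable".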

erik-stripe · 2016-01-04

So, I think we should stick with Double instead of Option[Double]. My guess is that a NaN value would be totally fine to use for this purpose (it certainly won't be a valid score).

The idea of using some Double => Boolean values to test criteria seems fine to me. I just think we don't need to create more boxes in this case.
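
For illustration, a minimal sketch of the NaN convention with a Double => Boolean test. The names `Unacceptable`, `isAcceptable`, and `best` are hypothetical. One caveat worth keeping in mind: every comparison involving NaN is false, so NaN scores have to be filtered out with `isNaN` before ordering rather than compared directly.

```scala
object NaNSentinelSketch {
  /** Assumed convention: Double.NaN marks an unacceptable candidate. */
  val Unacceptable: Double = Double.NaN

  /** A Double => Boolean criterion; NaN never equals itself, so use isNaN. */
  def isAcceptable(error: Double): Boolean = !error.isNaN

  /** Picks the candidate with the lowest non-NaN error, if there is one. */
  def best[A](candidates: Seq[(A, Double)]): Option[(A, Double)] = {
    val ok = candidates.filter { case (_, e) => isAcceptable(e) }
    if (ok.isEmpty) None else Some(ok.minBy(_._2))
  }
}
```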

tixxit · 2016-01-04

(Sorry I am late to this party.)

Re: Double vs new type param - I'm definitely pro-Double until we reach a point where we have an Evaluator where Double won't work anymore.


avibryant · 2016-02-04T00:33:57Z

Got rid of the root part, and merged with #77 which should now be landed first.

tixxit · 2016-02-04T15:12:26Z

brushfire-training/src/main/scala/com/stripe/brushfire/training/Brushfire.scala

-  def evaluate(split: Split[V, T]): Option[(Split[V, T], Double)]
+trait Evaluator[T] {
+  /** returns an overall numeric training error for a tree or split, or None for infinite/unacceptable error */
+  def trainingError(leaves: Iterable[T]): Option[Double]


👍 on the rename!

(Also, 1 less type param!)
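
As a sketch of how a caller might use the single-argument signature from the diff above: score each candidate by its leaves, drop the ones that come back as None, and keep the lowest error. The trait is re-declared so the snippet stands alone; `ErrorCount` and `bestCandidate` are hypothetical names, not part of this PR.

```scala
object TrainingErrorSketch {
  /** Same shape as the trait in the diff, re-declared to keep the sketch self-contained. */
  trait Evaluator[T] {
    def trainingError(leaves: Iterable[T]): Option[Double]
  }

  // Hypothetical evaluator over label -> count maps: misclassified instances,
  // or None if any leaf is empty.
  object ErrorCount extends Evaluator[Map[String, Long]] {
    def trainingError(leaves: Iterable[Map[String, Long]]): Option[Double] =
      if (leaves.exists(_.isEmpty)) None
      else Some(leaves.iterator.map(c => c.values.sum - c.values.max).sum.toDouble)
  }

  /** Lowest-error candidate; candidates scored as None are dropped. */
  def bestCandidate[T, C](ev: Evaluator[T], candidates: Seq[C])(leavesOf: C => Iterable[T]): Option[(C, Double)] = {
    val scored = candidates.flatMap { c =>
      ev.trainingError(leavesOf(c)).map(e => (c, e)).toList
    }
    if (scored.isEmpty) None else Some(scored.minBy(_._2))
  }
}
```

Because the value is now an error rather than a "goodness" score, selection uses `minBy` instead of `maxBy`.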

erik-stripe · 2016-02-04T19:19:34Z

We should probably remove all the commented code in this PR once we agree it is really gone.

avi-stripe added 10 commits December 20, 2015 16:13

working through a TrainingStep abstraction

963e9d6

add validation step, don't use distributed stopping criterion

5bcace3

split out brushfire-training module

a89cf33

move TrainingStep into training package

981a721

moving stuff

940e7eb

reorganize to separate training out better

d77003b

rethink evaluator to operate on root:T, leaves:Iterable[T]

14e55be

make the local trainer build

af81e73

update scalding trainer

e77e51f

update example scripts

23a4517

avibryant changed the title ~~WIP: rethink Evaluator~~ rethink Evaluator Dec 29, 2015

avibryant reviewed Dec 29, 2015

avi-stripe added 6 commits January 26, 2016 14:43

use Erik's Lines.scala to actually stream over local input in the example

ce97a1f

update to new master

7001c76

builds post-merge

6c1a3f4

update versions in iris scripts

6908ef6

fix merge conflicts

ba72df1

fix merge errors

8efd0aa

tixxit reviewed Feb 4, 2016

avibryant wants to merge 16 commits into master from avi-new-evaluator
