Add fingerprint support #7

alexanderjeurissen · 2020-12-23T22:54:41Z

Summary of changes

This PR adds a new method to the CukeModeler::Model class:

fingerprint

This method generates a Digest::MD5 hex digest of the to_s return value of a given model.
A block can be provided to control which attribute or value is used to generate the fingerprint.
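
As a rough illustration of the behaviour described above, here is a minimal sketch. It uses a hypothetical stand-in class (`FakeModel`, with made-up `name` and `steps` attributes) rather than the actual CukeModeler::Model code from this PR, so treat the details as assumptions.

```ruby
require "digest/md5"

# Hypothetical stand-in for a model; not the real CukeModeler::Model.
class FakeModel
  attr_reader :name, :steps

  def initialize(name, steps)
    @name = name
    @steps = steps
  end

  def to_s
    "Scenario: #{name}\n" + steps.map { |step| "  #{step}" }.join("\n")
  end

  # Without a block, digest the to_s output. With a block, digest
  # whatever value the block returns for this model.
  def fingerprint
    value = block_given? ? yield(self) : to_s
    Digest::MD5.hexdigest(value.to_s)
  end
end

a = FakeModel.new("Login", ["Given a user", "When they log in"])
b = FakeModel.new("Sign in", ["Given a user", "When they log in"])

# The names differ, so the default fingerprints differ.
same_scenario = a.fingerprint == b.fingerprint

# Fingerprinting only the steps shows that both use the same steps.
steps_a = a.fingerprint { |model| model.steps.join("\n") }
steps_b = b.fingerprint { |model| model.steps.join("\n") }
same_steps = steps_a == steps_b
```

Here `same_scenario` is false and `same_steps` is true, which is the kind of comparison the use-cases in this PR rely on.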

Use-cases / context

The use-case for this new method is easy comparison between models.

For instance, using the fingerprint method one can confirm if two Scenarios are exactly the same or if two scenarios use the exact same steps.

Below is an example cuke_linter linter that could be built with these changes merged in.

# frozen_string_literal: true

require "cuke_linter"

module CukeLinter
  module Linters
    # A linter that detects duplicate Scenarios 

    class ScenarioWithDuplicateSteps < Linter

      # The rule used to determine if a model has a problem
      def rule(model)
        return false unless model.is_a?(CukeModeler::Scenario)

        @scenario_fingerprints ||= {}

        fingerprint = model.fingerprint 

        @scenario_name = model.name
        @previous_fingerprint = @scenario_fingerprints[fingerprint]

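        # Kept in an instance variable so that #message can report it
        @scenario_path = "#{model.path}:#{model.source_line}"

        # Remember where this fingerprint was first seen
        @scenario_fingerprints[fingerprint] = @scenario_path if @previous_fingerprint.nil?

        # A problem exists only if the fingerprint was seen before
        !@previous_fingerprint.nil?
      end

      # The message used to describe the problem that has been found
      def message
        "Scenario '#{@scenario_name}' defined at `#{@scenario_path}` is an exact duplicate of `#{@previous_fingerprint}`"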
      end
    end
  end
end

coveralls · 2020-12-24T01:20:55Z

Coverage increased (+0.008%) to 99.54% when pulling c4a6a13 on alexanderjeurissen:add_fingerprint_support_to_models into 52a7ea4 on enkessler:master.

lib/cuke_modeler/models/row.rb

alexanderjeurissen · 2020-12-28T22:05:49Z

@enkessler thanks for fixing this upstream. I just rebased; are you open to merging the remainder of the changes?

enkessler · 2020-12-28T23:06:09Z

@alexanderjeurissen Admittedly, I've looked at this PR the least because it has actually significant enhancements and I've been running off to do the other quick/intriguing stuff instead (not that this stuff isn't still useful). My mind may or may not be overwhelmed by other things this week but I will get back to this in the near future.

enkessler · 2021-01-01T02:40:17Z

Out of curiosity, what is the main benefit of having a fingerprint method over just the == method for each model?

alexanderjeurissen · 2021-01-01T15:35:16Z

> Out of curiosity, what is the main benefit of having a fingerprint method over just the == method for each model?

There are two reasons: performance, and the additional use-cases that a fingerprint enables.

Performance

Comparing two fingerprint strings is considerably faster than comparing two model objects with ==.

I created an integration spec with Benchmark to demonstrate this:

This performance difference is even more significant when taking into account that strings allow for memoization and thus faster subsequent comparisons.

With the same spec as above, with N = 10_000_000:

Additional use-cases

Fingerprint strings allow for easier and faster lookup when traversing a directory, feature_file, or feature. Fingerprints are strings and thus can be used as hash keys, allowing for O(1) average lookup time vs include? on an array, which is O(n).
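
A small sketch of the hash-based lookup described above. The locations and scenario texts are made up for illustration, and the digest is computed directly from strings here instead of calling the fingerprint method on real models.

```ruby
require "digest/md5"

# Hypothetical scenario texts standing in for model #to_s output.
scenarios = {
  "features/a.feature:3" => "Given a user\nWhen they log in",
  "features/b.feature:7" => "Given an admin\nWhen they log out",
  "features/c.feature:12" => "Given a user\nWhen they log in"
}

seen = {}        # fingerprint => location where it was first seen
duplicates = {}  # duplicate location => original location

scenarios.each do |location, text|
  fingerprint = Digest::MD5.hexdigest(text)

  if seen.key?(fingerprint) # hash lookup instead of an array scan
    duplicates[location] = seen[fingerprint]
  else
    seen[fingerprint] = location
  end
end
```

After the loop, `duplicates` maps `features/c.feature:12` to `features/a.feature:3`, found in a single pass over the scenarios.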

alexanderjeurissen · 2021-01-01T15:58:31Z

It might be interesting to refactor the == methods to use the memoized fingerprint to improve == performance in a follow-up PR.
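
One way such a refactor could look, as a sketch only. `FakeModel` is a hypothetical stand-in, not the real model class. The memoization and the recalculation when a block is given follow the commit messages in this PR, but the exact implementation is an assumption.

```ruby
require "digest/md5"

# Hypothetical stand-in for a model; not the real CukeModeler::Model.
class FakeModel
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def to_s
    text
  end

  # Memoize the default fingerprint. Recompute whenever a block is given,
  # since the block may select a different value on each call.
  def fingerprint
    return Digest::MD5.hexdigest(yield(self).to_s) if block_given?

    @fingerprint ||= Digest::MD5.hexdigest(to_s)
  end

  # Equality delegated to the memoized fingerprint.
  def ==(other)
    other.is_a?(FakeModel) && fingerprint == other.fingerprint
  end
end

a = FakeModel.new("Given a user")
b = FakeModel.new("Given a user")
c = FakeModel.new("Given an admin")

equal_models = a == b
different_models = a != c
```

One caveat with this design: a memoized fingerprint goes stale if the model is modified after the first call, so a real implementation would need to invalidate the cached value on change.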

alexanderjeurissen · 2021-01-06T01:03:01Z

@enkessler does the above answer your question?

enkessler · 2021-01-06T04:52:25Z

So #fingerprint is essentially just a custom implementation for #hash instead of using Object#hash but they serve the same purpose? I would say to just implement #hash instead of making a separate method but that would technically not be backwards compatible. So adding #fingerprint is fine for now and we can make it an alias for #hash on the next major version release.

Regarding the calculation of the hash value, it would certainly be easy on the development side to just base it off of #to_s but, due to formatting/whitespace, models can have different string representations while still being an essentially equivalent model. Also, FeatureFile models don't even include their comments in their #to_s output. For that reason, I'd rather do it the long way and calculate the hash value based on the various properties of the model.

That reminds me, I still need to get around to having the string form of FeatureFile models be the text content of all of its child models plus comments, instead of just its file path. Doing that requires models to have associated line numbers so that the comments can be placed in roughly the correct spots in the output. Given that models can be created out of thin air with no properties at all, it might become a large pain, so I've been putting it off.
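
To illustrate the difference, here is a sketch with a hypothetical `FakeStep` struct. The `source_text` attribute is an assumption used to simulate a formatting-sensitive string form; it is not a claim about how the real models build their #to_s output.

```ruby
require "digest/md5"

# Hypothetical stand-in for a step model with a keyword and text.
FakeStep = Struct.new(:keyword, :text, :source_text) do
  # Based on the raw string form: sensitive to formatting and whitespace.
  def string_fingerprint
    Digest::MD5.hexdigest(source_text)
  end

  # Based on the model's properties: formatting differences do not matter.
  # Array#inspect keeps the boundary between the two values unambiguous.
  def attribute_fingerprint
    Digest::MD5.hexdigest([keyword, text].inspect)
  end
end

a = FakeStep.new("Given", "a user", "Given a user")
b = FakeStep.new("Given", "a user", "    Given   a user")

string_match = a.string_fingerprint == b.string_fingerprint
attribute_match = a.attribute_fingerprint == b.attribute_fingerprint
```

Here `string_match` is false while `attribute_match` is true: the two steps are equivalent as models even though their string forms differ.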

enkessler · 2021-01-19T19:16:17Z

@alexanderjeurissen poke

In summary: I'm okay with this as long as the fingerprint is based on the various attributes of the model instead of being based on the string output.

alexanderjeurissen changed the title ~~Add fingerprint support~~ WIP: Add fingerprint support Dec 23, 2020

alexanderjeurissen changed the title ~~WIP: Add fingerprint support~~ Add fingerprint support Dec 24, 2020

enkessler reviewed Dec 24, 2020

lib/cuke_modeler/models/row.rb (resolved)

alexanderjeurissen added 6 commits December 28, 2020 23:04

add fingerprint support

66ee501

Add support for nil values

3d84d0a

remove unneeded model specs

b8c484f

split up methods fingerprint + fingerprint_children

177abd1

remove fingerprint_children as it was too specific

ef03868

Fix specs

b2fdbb6

alexanderjeurissen force-pushed the add_fingerprint_support_to_models branch from 89b67df to b2fdbb6 Compare December 28, 2020 22:05

alexanderjeurissen added 2 commits January 1, 2021 16:39

memoize fingerprint, and add performance specs

580cec0

recalculate when a block is given

c4a6a13

enkessler added the enhancement label Feb 18, 2021


3 participants
