Skip to content

Conversation

@alexanderjeurissen
Copy link

@alexanderjeurissen alexanderjeurissen commented Dec 23, 2020

Summary of changes

This PR adds a new methods to the CukeModeler::Model class:

  • fingerprint

This methods generate a Digest::MD5 hex digest of the to_s return value of a given model.
A block can be provided to control what attribute or value is used to generate the fingerprint

Use-cases / context

The use-case for this new method is easy comparison between models.

For instance, using the fingerprint method one can confirm if two Scenarios are exactly the same or if two scenarios use the exact same steps.

Below is an example cuke_linter that could be build with these changes merged in.

# frozen_string_literal: true

require "cuke_linter"

module CukeLinter
  module Linters
    # A linter that detects duplicate Scenarios 

    class ScenarioWithDuplicateSteps < Linter

      # The rule used to determine if a model has a problem
      def rule(model)
        return false unless model.is_a?(CukeModeler::Scenario)

        @scenario_fingerprints ||= {}

        fingerprint = model.fingerprint 

        @scenario_name = model.name
        @previous_fingerprint = @scenario_fingerprints[fingerprint]

        path = "#{model.path}:#{model.source_line}"

        if @previous_fingerprint.nil?
          @scenario_fingerprints[fingerprint] = path
        end

        @previous_fingerprint.present?
      end

      # The message used to describe the problem that has been found
      def message
        "Scenario '#{@name}' defined at `#{path}` is a exact duplicate of  `#{@previous_fingerprint}`"
      end
    end
  end
end

@alexanderjeurissen alexanderjeurissen changed the title Add fingerprint support WIP: Add fingerprint support Dec 23, 2020
@alexanderjeurissen alexanderjeurissen changed the title WIP: Add fingerprint support Add fingerprint support Dec 24, 2020
@coveralls
Copy link

coveralls commented Dec 24, 2020

Coverage Status

Coverage increased (+0.008%) to 99.54% when pulling c4a6a13 on alexanderjeurissen:add_fingerprint_support_to_models into 52a7ea4 on enkessler:master.

@alexanderjeurissen alexanderjeurissen force-pushed the add_fingerprint_support_to_models branch from 89b67df to b2fdbb6 Compare December 28, 2020 22:05
@alexanderjeurissen
Copy link
Author

@enkessler thanks for fixing this upstream, I just rebased are you open to merging the remainder of the changes in ?

@enkessler
Copy link
Owner

@alexanderjeurissen Admittedly, I've looked at this PR the least because it has actually significant enhancements and I've been running off to do the other quick/intriguing stuff instead (not that this stuff isn't still useful). My mind may or may not be overwhelmed by other things this week but I will get back to this in the near future.

@enkessler
Copy link
Owner

Out of curiosity, what is the main benefit of having a fingerprint method over just the == method for each model?

@alexanderjeurissen
Copy link
Author

alexanderjeurissen commented Jan 1, 2021

Out of curiosity, what is the main benefit of having a fingerprint method over just the == method for each model?

For two reasons, one is performance, the other is additional use-cases that a fingerprint enables.

Performance

String comparison is way more performant than object comparison.

I created a integration spec with Benchmark to demonstrate this:
Screen Shot 2021-01-01 at 4 23 15 PM
image

This performance difference is even more significant when taking into account that strings allow for memoization and thus faster subsequent comparisons.

with the same spec as above, with N = 10_000_000:

image

Additional use-cases

fingerprint strings allow for easier and faster lookup when traversing a directory, feature_file, or feature. Fingerprints are strings and thus can be stored in a hash, allowing for O(1) lookup time vs includes? in an array which is O(n) lookup time.

@alexanderjeurissen
Copy link
Author

It might be interesting to refactor the == methods to utilize the memoized fingerprint as to improve == performance in a followup PR.

@alexanderjeurissen
Copy link
Author

@enkessler does the above answer your question ?

@enkessler
Copy link
Owner

So #fingerprint is essentially just a custom implementation for #hash instead of using Object#hash but they serve the same purpose? I would say to just implement #hash instead of making a separate method but that would technically not be backwards compatible. So adding #fingerprint is fine for now and we can make it an alias for #hash on the next major version release.

Regarding the calculation of the hash value, it would certainly be easy on the development side to just base it off of #to_s but, due to formatting/whitespace, models can have different string representations while still being an essentially equivalent model. Also, FeatureFile models don't even include their comments in their #to_s output. For that reason, I'd rather do the long way and calculate the hash value based on the various properties of the model.

That reminds me, I still need to get around to having the string form of FeatureFile models be the text content of all of its child models plus comments instead of just being its file path but doing that requires models having associated line numbers so that the comments can be placed in the relatively correct spots in the output and, given that models can be created out of thin air with no properties at all, it might become a large pain and so I've been putting it off.

@enkessler
Copy link
Owner

@alexanderjeurissen poke

In summary: I'm okay with this as long as the fingerprint is based on the various attributes of the model instead of being based on the string output.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants