@davidanthoff
Member

This is probably a horrible idea and most likely I won't merge it. But hey, let's think about it for a while. It is the only way I can think of to make the last query in queryverse/Query.jl#134 work out of the gate without having to think about missing values... Maybe that goal is mistaken in the first place, though.

Essentially what this would do is treat a missing string passed to split as equivalent to an empty string...
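
A minimal sketch of that behavior, not the code in this PR: it assumes Base's `missing` as a stand-in for a missing DataValue, and `lifted_split` is a hypothetical name chosen for illustration.

```julia
# Hypothetical helper: a present string is split as usual.
lifted_split(s::AbstractString, dlm) = split(s, dlm)

# A missing input is treated exactly like the empty string "".
lifted_split(::Missing, dlm) = split("", dlm)

present = lifted_split("a b", " ")    # two pieces: "a" and "b"
absent  = lifted_split(missing, " ")  # same result as split("", " ")
```

With an explicit delimiter, `split("", " ")` yields a one-element array holding `""`, so under this sketch a missing input does not come back as a zero-length array.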

@codecov-io
codecov-io commented Nov 24, 2017

Codecov Report

Merging #34 into master will decrease coverage by 1.06%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
- Coverage   84.95%   83.89%   -1.07%     
==========================================
  Files          11       12       +1     
  Lines         472      478       +6     
==========================================
  Hits          401      401              
- Misses         71       77       +6
Impacted Files Coverage Δ
src/scalar/strings.jl 0% <0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 45e1c18...a711151.

@tcovert
tcovert commented Nov 24, 2017

I sorta think that if you go down this path, you ought to add lifted versions of many other string functions in the standard library: length, sizeof, isvalid, all the regex stuff, etc. I would totally be a fan of this.

@davidanthoff
Member Author

Oh, I definitely want to add all those lifted versions!

The question is what they should return when they encounter an NA. There is the philosophy that they should always propagate NA. That is nice, consistent and predictable. But I'm not sure it is really helpful for something like split, where the return value would then be either an array or NA. But maybe that is what it should do... I'm just not sure.
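
To make the trade-off concrete, here is a rough sketch of the always-propagate option. It again assumes Base's `missing` in place of NA, and `propagating_split` and `count_pieces` are hypothetical names, not anything in this PR.

```julia
# Hypothetical helper: a present string is split as usual.
propagating_split(s::AbstractString, dlm) = split(s, dlm)

# A missing input propagates: the result is `missing`, not an array.
propagating_split(::Missing, dlm) = missing

# The result is now "array or missing", so every caller has to branch
# before it can index or iterate.
function count_pieces(s, dlm)
    parts = propagating_split(s, dlm)
    return ismissing(parts) ? missing : length(parts)
end
```

The rule is predictable, but the cost shows up in `count_pieces`: the missing check moves from the string function into each piece of code that consumes its result.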

@tcovert
tcovert commented Nov 24, 2017

Ah good point. Is there any common theme across base string functions that return scalars vs arrays? In the array case (i.e. ‘split’) where the correct return value for a non-null input is, say, an empty array, returning the same thing for a null input seems reasonable. For the scalar case maybe null propagation makes more sense?

@tcovert
tcovert commented Nov 24, 2017

Here's another thought, at least based on the split case. When you split a string on a pattern that isn't found, split seems to give you an array containing just the original string:

julia> split("abc", " ")
1-element Array{SubString{String},1}:
 "abc"

The closest analog to this that I can think of for the null case would be a single-element array containing a DataValue{String}().
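
A sketch of that idea, with assumptions: Base's `missing` stands in for `DataValue{String}()`, the function name `split_keep_missing` is hypothetical, and the element type is widened so that both branches return the same kind of array.

```julia
# Element type shared by both branches: a piece of the string, or missing.
const Piece = Union{Missing,SubString{String}}

# Hypothetical helper: split a present string, widening the element type.
split_keep_missing(s::String, dlm) = convert(Vector{Piece}, split(s, dlm))

# A missing input becomes a one-element array holding the missing value,
# mirroring how an unmatched pattern returns the original string.
split_keep_missing(::Missing, dlm) = Piece[missing]

unmatched = split_keep_missing("abc", " ")   # one element: "abc"
absent    = split_keep_missing(missing, " ") # one element: missing
```

The result is always an array, so callers can iterate without a separate check, but each element may now be missing and has to be handled as such.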
