
Selection using Base functions and possibly missing values #134

@tcovert


Suppose I have a DataFrame with two fields: idx and date. The date field has missing values (in the DataFrames sense) and is currently stored in the DataFrame as a string. Is there a query statement that I can write which parses the string into a date? I tried something like this:

```julia
df2 = @from i in df begin
       @select {i.idx, date = Date.(i.date, "mm/dd/yyyy")}
       @collect DataFrame
       end
```

but got an error like this:

```
ERROR: type UnionAll has no field parameters
Stacktrace:
 [1] column_types at /Users/tcovert/.julia/v0.6/IterableTables/src/utilities.jl:20 [inlined]
 [2] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:105
 [3] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [4] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},_} where _,Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##11#13}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2
```

I also tried a version with no dot-broadcasting:

```julia
df2 = @from i in df begin
       @select {i.idx, date = Date(i.date, "mm/dd/yyyy")}
       @collect DataFrame
       end
```

and got this error:

```
ERROR: MethodError: Cannot `convert` an object of type DataValues.DataValue{String} to an object of type Int64
This may have arisen from a call to the constructor Int64(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] next at /Users/tcovert/.julia/v0.6/Query/src/enumerable/enumerable_select.jl:41 [inlined]
 [2] macro expansion at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:91 [inlined]
 [3] _filldf(::Tuple{DataArrays.DataArray{Int64,1},Array{Date,1}}, ::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:79
 [4] _DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:119
 [5] DataFrames.DataFrame(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes.jl:128
 [6] collect(::Query.EnumerableSelect{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},Date},Query.EnumerableIterable{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},IterableTables.DataFrameIterator{NamedTuples._NT_idx_date{DataValues.DataValue{Int64},DataValues.DataValue{String}},Tuple{DataArrays.DataArray{Int64,1},DataArrays.DataArray{String,1}}}},##15#16}, ::Type{DataFrames.DataFrame}) at /Users/tcovert/.julia/v0.6/Query/src/sinks/sink_type.jl:2
```

Is what I am trying to do possible? If so, what am I doing wrong?

Thanks in advance for any suggestions you can offer.

Here is some example data to apply the code above to: https://www.dropbox.com/s/kgiicawhegmtavc/query_example.csv?dl=0
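A possible direction, offered as a sketch rather than a confirmed fix: the second error message reports that a `DataValue{String}` could not be converted to an `Int64`, which suggests `Date` is receiving the wrapped value instead of a plain `String`, so no string-parsing method matches. One way around this is to handle the missing case explicitly and call `Date` only on an actual string. The block below shows that pattern in present-day Julia (1.x), where missing values are represented by `missing`. It does not use Query.jl, DataValues, or the Julia 0.6 types from the tracebacks, and `parse_date` and `DATE_FMT` are hypothetical names introduced for illustration.

```julia
using Dates

# Hypothetical format constant (not from the original post): month/day/year.
const DATE_FMT = dateformat"mm/dd/yyyy"

# Hypothetical helper: parse a real string into a Date...
parse_date(s::AbstractString) = Date(s, DATE_FMT)
# ...and pass missing values through untouched instead of trying to parse them.
parse_date(::Missing) = missing

# Example column with a missing entry, standing in for the CSV's date field.
raw = ["01/15/2017", missing, "12/31/2016"]

# Broadcasting applies parse_date element-wise; the result is a
# Vector{Union{Missing, Date}}.
dates = parse_date.(raw)
```

Inside a Query `@select`, the same idea would mean checking whether `i.date` holds a value before parsing it. The exact DataValues calls for that are not shown here and would need to be checked against the documentation for the installed Query.jl version.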
