Add Haskell implementation to benchmark #144
* Add Haskell dataframe benchmark entry --------- Co-authored-by: Claude <noreply@anthropic.com>
Tmonster left a comment
Thank you for this! It looks pretty good to me. I tried to test it on my own c6id.4xlarge instance and got errors at the build step. Perhaps the dataframe library was updated recently?
This is the error I saw:
Downloading the latest package list from hackage.haskell.org
Package list of hackage.haskell.org is up to date.
The index-state is set to 2026-01-02T18:38:39Z.
Build profile: -w ghc-9.4.7 -O2
In order, the following will be built (use -v for more details):
- haskell-benchmark-0.1.0.0 (exe:groupby-haskell) (first run)
- haskell-benchmark-0.1.0.0 (exe:join-haskell) (first run)
Preprocessing executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Preprocessing executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
[1 of 1] Compiling Main ( groupby-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/groupby-haskell/opt/build/groupby-haskell/groupby-haskell-tmp/Main.o )
[1 of 1] Compiling Main ( join-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/join-haskell/opt/build/join-haskell/join-haskell-tmp/Main.o )
join-haskell.hs:126:34: error:
• Couldn't match expected type ‘D.Expr Double’
with actual type ‘T.Text’
• In the first argument of ‘D.columnAsDoubleVector’, namely
‘(T.pack name)’
In the expression: D.columnAsDoubleVector (T.pack name) df
In the expression:
case D.columnAsDoubleVector (T.pack name) df of
Right vec -> VU.sum vec
Left _ -> 0.0
|
126 | case D.columnAsDoubleVector (T.pack name) df of
| ^^^^^^^^^^^
groupby-haskell.hs:175:31: error:
• Couldn't match expected type ‘D.Expr Int’ with actual type ‘Text’
• In the first argument of ‘D.columnAsIntVector’, namely
‘(T.pack col)’
In the expression: D.columnAsIntVector (T.pack col) df
In the expression:
case D.columnAsIntVector (T.pack col) df of
Right vec -> fromIntegral $ VU.sum vec
Left _ -> 0.0
|
175 | case D.columnAsIntVector (T.pack col) df of
| ^^^^^^^^^^
groupby-haskell.hs:181:34: error:
• Couldn't match expected type ‘D.Expr Double’
with actual type ‘Text’
• In the first argument of ‘D.columnAsDoubleVector’, namely
‘(T.pack col)’
In the expression: D.columnAsDoubleVector (T.pack col) df
In the expression:
case D.columnAsDoubleVector (T.pack col) df of
Right vec -> VU.sum vec
Left _ -> 0.0
|
181 | case D.columnAsDoubleVector (T.pack col) df of
| ^^^^^^^^^^
Error: [Cabal-7125]
@Tmonster I updated the implementation and pinned the dataframe library to a major version so that future releases don't break the build.
Hi @mchav, it seems another package was updated, which caused the regression tests to start failing. I'll try to fix that first and then merge this. Also, the DuckDB release was pushed back a week, so the results will be about a week later as well.
@Tmonster Alright. I noticed the failures in the last CI check were caused by trailing commas I had left in some R files, so I fixed those as well.
Hi @mchav, thanks. I was going to mention an issue with … It seems like something is wrong with how the join data file names are read or parsed?
@Tmonster It was a small bug in inferring how to replace the NA. It should be fixed now.
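The NA fix mentioned above concerns how the join script derives the names of its right-hand tables from the data name. A minimal sketch, assuming a naming convention that the thread does not spell out: a name such as J1_1e7_NA_0_0 carries an NA placeholder, and it is replaced by the row counts x/1e6, x/1e3 and x of the left table (here 1e1, 1e4 and 1e7). `splitOn`, `parseSci` and `joinToTables` are hypothetical helpers, not the PR's code, and only `base` is used.

```haskell
module Main (main) where

import Data.Char (isDigit)
import Data.List (intercalate)

-- Hypothetical helper: split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Parse a size token such as "1e7" into its mantissa ("1") and exponent (7).
parseSci :: String -> Maybe (String, Int)
parseSci s = case break (== 'e') s of
  (m@(_ : _), _ : e)
    | all isDigit m, not (null e), all isDigit e -> Just (m, read e)
  _ -> Nothing

-- Assumed convention: the second token is the left table size x, and the "NA"
-- token is replaced by x/1e6, x/1e3 and x to name the three right-hand tables.
joinToTables :: String -> Maybe [String]
joinToTables name =
  case splitOn '_' name of
    toks@(_ : size : _) -> do
      (mantissa, expo) <- parseSci size
      let ySizes = [mantissa ++ "e" ++ show (expo - k) | k <- [6, 3, 0]]
      pure [intercalate "_" (map (subst y) toks) | y <- ySizes]
    _ -> Nothing
  where
    -- Swap only the exact "NA" token; every other token is kept unchanged.
    subst y t = if t == "NA" then y else t

main :: IO ()
main = print (joinToTables "J1_1e7_NA_0_0")
```

Running it prints `Just ["J1_1e7_1e1_0_0","J1_1e7_1e4_0_0","J1_1e7_1e7_0_0"]`. Working on the exponent directly avoids round-tripping through `Double` and reformatting numbers, which is one easy place for this kind of inference to go wrong.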
@Tmonster Can we run the testing workflow? I'm fairly sure it works now, but I haven't been able to run the exact presubmit workflow locally.
Hi @mchav, sorry for disappearing; I got busy with some other tasks. I ran the regression tests again. Is there anything Haskell-specific that should be installed in the setup_small.sh script? That script also runs in regression.yml.
Also adds some apt installations for robustness.
Thanks. I assumed that green ticks in the stages meant that they ran successfully. The debug logs were extremely helpful. |
@Tmonster Things seem to be working. Apparently the implementation still isn't writing to the log.csv file, which I must have missed in the reference implementations. Working on that.
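Appending one timing row per run to log.csv could look roughly like the following. This is an illustration under stated assumptions. The real log.csv schema is defined by the benchmark's shared helpers and has more columns than the hypothetical `TimingRow` record here. `toCsvLine`, `csvField` and `appendTiming` are likewise hypothetical names, and the values in `main` are only examples.

```haskell
module Main (main) where

import Data.List (intercalate)

-- Hypothetical subset of timing fields; the real log.csv has more columns,
-- defined elsewhere in the benchmark, and they are NOT reproduced here.
data TimingRow = TimingRow
  { solution :: String
  , task     :: String
  , dataName :: String
  , question :: String
  , run      :: Int
  , timeSec  :: Double
  }

-- Quote a field only when it contains a comma, quote, or newline (RFC 4180 style),
-- doubling any embedded quotes.
csvField :: String -> String
csvField s
  | any (`elem` ",\"\n") s = "\"" ++ concatMap esc s ++ "\""
  | otherwise              = s
  where
    esc '"' = "\"\""
    esc c   = [c]

-- Render one row as a CSV line, without the trailing newline.
toCsvLine :: TimingRow -> String
toCsvLine r = intercalate ","
  [ csvField (solution r), csvField (task r), csvField (dataName r)
  , csvField (question r), show (run r), show (timeSec r) ]

-- Append one row to the log; appendFile creates the file if it does not exist.
appendTiming :: FilePath -> TimingRow -> IO ()
appendTiming path r = appendFile path (toCsvLine r ++ "\n")

main :: IO ()
main = putStrLn (toCsvLine (TimingRow "haskell" "groupby" "G1_1e7_1e2_0_0" "sum v1 by id1" 1 0.25))
```

`main` only prints the rendered line (`haskell,groupby,G1_1e7_1e2_0_0,sum v1 by id1,1,0.25`). A real script would call `appendTiming "log.csv"` once per question and run, after it has timed the query.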

Also tested out that this works end to end on a c6id.4xlarge instance.