
Add Haskell implementation to benchmark#144

Open
mchav wants to merge 23 commits into duckdblabs:main from mchav:main

Conversation

@mchav

mchav commented Nov 20, 2025

Also tested out that this works end to end on a c6id.4xlarge instance.

Collaborator

Tmonster left a comment

Thank you for this! Looks pretty good to me. I tried to test it on my own c6id.4xlarge instance and was getting errors at the build step. Potentially the dataframe library has been updated recently?

This is the error I saw

Downloading the latest package list from hackage.haskell.org
Package list of hackage.haskell.org is up to date.
The index-state is set to 2026-01-02T18:38:39Z.
Build profile: -w ghc-9.4.7 -O2
In order, the following will be built (use -v for more details):
 - haskell-benchmark-0.1.0.0 (exe:groupby-haskell) (first run)
 - haskell-benchmark-0.1.0.0 (exe:join-haskell) (first run)
Preprocessing executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Preprocessing executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'join-haskell' for haskell-benchmark-0.1.0.0...
Building executable 'groupby-haskell' for haskell-benchmark-0.1.0.0...
[1 of 1] Compiling Main             ( groupby-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/groupby-haskell/opt/build/groupby-haskell/groupby-haskell-tmp/Main.o )
[1 of 1] Compiling Main             ( join-haskell.hs, /var/lib/mount/db-benchmark-metal/haskell/dist-newstyle/build/x86_64-linux/ghc-9.4.7/haskell-benchmark-0.1.0.0/x/join-haskell/opt/build/join-haskell/join-haskell-tmp/Main.o )

join-haskell.hs:126:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘T.Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack name)’
      In the expression: D.columnAsDoubleVector (T.pack name) df
      In the expression:
        case D.columnAsDoubleVector (T.pack name) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
126 |     case D.columnAsDoubleVector (T.pack name) df of
    |                                  ^^^^^^^^^^^

groupby-haskell.hs:175:31: error:
    • Couldn't match expected type ‘D.Expr Int’ with actual type ‘Text’
    • In the first argument of ‘D.columnAsIntVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsIntVector (T.pack col) df
      In the expression:
        case D.columnAsIntVector (T.pack col) df of
          Right vec -> fromIntegral $ VU.sum vec
          Left _ -> 0.0
    |
175 |     case D.columnAsIntVector (T.pack col) df of
    |                               ^^^^^^^^^^

groupby-haskell.hs:181:34: error:
    • Couldn't match expected type ‘D.Expr Double’
                  with actual type ‘Text’
    • In the first argument of ‘D.columnAsDoubleVector’, namely
        ‘(T.pack col)’
      In the expression: D.columnAsDoubleVector (T.pack col) df
      In the expression:
        case D.columnAsDoubleVector (T.pack col) df of
          Right vec -> VU.sum vec
          Left _ -> 0.0
    |
181 |     case D.columnAsDoubleVector (T.pack col) df of
    |                                  ^^^^^^^^^^
Error: [Cabal-7125]
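
Note on the errors above: all three have the same shape. The accessor now expects a typed column expression (`D.Expr Double` or `D.Expr Int`) where the benchmark code still passes a `Text` column name. The sketch below illustrates that kind of migration with hypothetical stand-ins only: `Expr`, `Frame`, `col`, and this `columnAsDoubleVector` are defined locally for illustration and are not the dataframe library's definitions, whose actual constructor for column expressions should be taken from its documentation.

```haskell
-- Hypothetical stand-in for a typed column expression. The phantom type
-- parameter records the element type the caller expects to read.
newtype Expr a = Col String

-- Hypothetical stand-in for a data frame: column names paired with values.
type Frame = [(String, [Double])]

-- Build a typed column reference from a plain column name.
col :: String -> Expr a
col = Col

-- Same shape as in the error message: the first argument is an
-- 'Expr Double', so passing a bare name no longer type-checks.
columnAsDoubleVector :: Expr Double -> Frame -> Either String [Double]
columnAsDoubleVector (Col name) df =
  maybe (Left ("column not found: " ++ name)) Right (lookup name df)

-- Call site adapted from the log: wrap the name before passing it on.
sumColumn :: String -> Frame -> Double
sumColumn name df =
  case columnAsDoubleVector (col name) df of
    Right vec -> sum vec
    Left _    -> 0.0
```

With these stand-ins, `sumColumn "v1" [("v1", [1.0, 2.0, 3.5])]` sums the column, and a missing column falls back to `0.0`, as in the original `case` expression.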

@mchav
Author

mchav commented Jan 3, 2026

@Tmonster updated the implementation. I pinned it to a major version so it doesn't get broken by version updates.

@Tmonster
Collaborator

Hi @mchav, seems like some other package got updated, causing the regression tests to start failing. I'm gonna try and fix that first, then I'll go ahead and merge this. Also, the DuckDB release was pushed back a week, so the results will also be about a week later.

@mchav
Author

mchav commented Jan 19, 2026

@Tmonster alright. I noticed the failures in the last CI check were about trailing commas I had left in some R files. I made sure to fix those as well.

@Tmonster
Collaborator

Hi @mchav,

Thanks, was going to mention an issue with ver-haskell.hs but looks like you solved it. I think something may be wrong with the join script though? I ran it myself and got the following errors.

Seems like something is wrong with how the join data file names are read/parsed?

ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ cat out/run_haskell_join_J1_1e7_NA_0_0.err
join-haskell: ./data/J1_1e7_10e4_0_0.csv: openBinaryFile: does not exist (No such file or directory)
ubuntu@ip-172-31-22-80:/var/lib/mount/db-benchmark-metal$ ls data
G1_1e7_1e2_0_0.csv  J1_1e7_1e1_0_0.csv  J1_1e7_1e4_0_0.csv  J1_1e7_1e7_0_0.csv  J1_1e7_NA_0_0.csv

@mchav
Author

mchav commented Jan 19, 2026

@Tmonster was a small bug when inferring how to replace the NA. Should be fixed now.
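
Note: the directory listing above suggests that, for `J1_1e7_NA_0_0`, the `NA` field is replaced by `1e1`, `1e4`, and `1e7`, that is, the row-count exponent minus 6, minus 3, and unchanged. A minimal sketch of that derivation follows. It assumes names of the form `prefix_1eN_NA_a_b` with `N >= 6`; `splitOn`, `parseExponent`, and `joinTableNames` are hypothetical helpers and not the code in this PR. Working on the integer exponent is one way to avoid names like `10e4`.

```haskell
import Data.Char (isDigit)
import Data.List (intercalate)

-- Split a string on a delimiter character.
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Read the exponent N out of a size token of the form "1eN".
parseExponent :: String -> Maybe Int
parseExponent ('1' : 'e' : digits)
  | not (null digits) && all isDigit digits = Just (read digits)
parseExponent _ = Nothing

-- Derive the three join table names (small, medium, big) from the
-- benchmark data name. Assumption inferred from the directory listing:
-- the "NA" field becomes 1e(N-6), 1e(N-3) and 1eN.
joinTableNames :: String -> Maybe [String]
joinTableNames name =
  case splitOn '_' name of
    [prefix, size, "NA", a, b] -> do
      e <- parseExponent size
      if e < 6
        then Nothing -- sizes below 1e6 are outside this sketch's assumptions
        else Just [ intercalate "_" [prefix, size, "1e" ++ show (e - k), a, b]
                  | k <- [6, 3, 0] ]
    _ -> Nothing
```

For `"J1_1e7_NA_0_0"` this yields the three base names that appear in `ls data` (without the `.csv` extension). Any name that does not match the assumed pattern yields `Nothing` rather than a guessed path.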

@mchav
Author

mchav commented Feb 18, 2026

@Tmonster can we run the testing workflow? I'm sure that it works now, but I haven't been able to run the exact presubmit workflow locally.

@Tmonster
Collaborator

Hi @mchav, sorry for disappearing, got busy with some other tasks. I ran the regression tests again.
Looks like there were failures again? The haskell results are uploaded as an artifact, you can find them here

Are there any haskell specific things that should be installed in the setup_small.sh script? That also gets run in the regression.yml
This was in the stderr

Error: [Cabal-7107]
Could not resolve dependencies:
[__0] trying: haskell-benchmark-0.1.0.0 (user goal)
[__1] trying: vector-0.13.2.0 (dependency of haskell-benchmark)
[__2] trying: dataframe-0.4.1.0 (dependency of haskell-benchmark)
[__3] trying: template-haskell-2.24.0.0/installed-7190 (dependency of dataframe)
[__4] trying: ghc-boot-th-9.14.1/installed-cae0 (dependency of template-haskell)
[__5] trying: pretty-1.1.3.6/installed-7229 (dependency of ghc-boot-th)
[__6] next goal: aeson (dependency of dataframe)
[__6] rejecting: aeson-2.2.3.0 (conflict: template-haskell==2.24.0.0/installed-7190, aeson => template-haskell>=2.14.0.0 && <2.24)
[__6] skipping: aeson; 2.2.2.0, 2.2.1.0, 2.2.0.0, 2.1.2.1, 2.1.2.0, 2.1.1.0, 2.1.0.0, 2.0.3.0, 2.0.2.0, 2.0.1.0, 2.0.0.0, 1.5.6.0, 1.5.5.1, 1.5.5.0, 1.5.4.1, 1.5.4.0, 1.5.3.0, 1.5.2.0, 1.5.1.0, 1.5.0.0, 1.4.7.1, 1.4.7.0, 1.4.6.0, 1.4.5.0, 1.4.4.0, 1.4.3.0, 1.4.2.0 (has the same characteristics that caused the previous version to fail: excludes 'template-haskell' version 2.24.0.0)
[__6] rejecting: aeson; 1.4.1.0, 1.4.0.0, 1.3.1.1, 1.3.1.0, 1.3.0.0, 1.2.4.0, 1.2.3.0, 1.2.2.0 (conflict: pretty => deepseq==1.5.1.0/installed-f671, aeson => deepseq>=1.3 && <1.5)
[__6] rejecting: aeson; 1.2.1.0, 1.2.0.0, 1.1.2.0, 1.1.1.0, 1.1.0.0 (conflict: template-haskell => base==4.22.0.0/installed-fde1, aeson => base>=4.5 && <4.13)
[__6] rejecting: aeson; 1.0.2.1, 1.0.2.0, 1.0.1.0, 1.0.0.0 (conflict: pretty => deepseq==1.5.1.0/installed-f671, aeson => deepseq>=1.3 && <1.5)
[__6] rejecting: aeson; 0.11.3.0, 0.11.2.1, 0.11.2.0, 0.11.1.4, 0.11.1.3, 0.11.1.2, 0.11.1.1, 0.11.1.0, 0.11.0.0 (conflict: template-haskell => base==4.22.0.0/installed-fde1, aeson => base>=4.5 && <4.13)
[__6] rejecting: aeson-0.9.0.1 (conflict: dataframe => aeson>=0.11.0.0 && <3)
[__6] skipping: aeson; 0.9.0.0, 0.8.1.1, 0.8.1.0, 0.8.0.2, 0.7.0.6, 0.7.0.4, 0.6.2.1, 0.6.2.0, 0.6.1.0, 0.6.0.2, 0.6.0.1, 0.6.0.0, 0.5.0.0, 0.4.0.1, 0.4.0.0, 0.3.2.14, 0.3.2.13, 0.3.2.12, 0.3.2.11, 0.3.2.10, 0.3.2.9, 0.3.2.8, 0.3.2.7, 0.3.2.6, 0.3.2.5, 0.3.2.4, 0.3.2.3, 0.3.2.2, 0.3.2.1, 0.3.2.0, 0.3.1.1, 0.3.1.0, 0.3.0.0, 0.2.0.0, 0.1.0.0, 0.10.0.0, 0.8.0.1, 0.8.0.0, 0.7.0.5, 0.7.0.3, 0.7.0.2, 0.7.0.1, 0.7.0.0 (has the same characteristics that caused the previous version to fail: excluded by constraint '>=0.11.0.0 && <3' from 'dataframe')
[__6] fail (backjumping, conflict set: aeson, dataframe, pretty, template-haskell)
After searching the rest of the dependency tree exhaustively, these were the goals I've had most trouble fulfilling: template-haskell, aeson, dataframe, ghc-boot-th, base, pretty, haskell-benchmark, vector
Try running with --minimize-conflict-set to improve the error message.
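
Note on reading the solver output: the decisive line is the first rejection. The installed `template-haskell` is 2.24.0.0 (alongside `ghc-boot-th-9.14.1`, whereas the earlier build log shows `ghc-9.4.7`), and `aeson-2.2.3.0` requires `template-haskell>=2.14.0.0 && <2.24`. The sketch below replays that single bounds check with `Data.Version` from `base`. `Range`, `inRange`, and the two constants are hypothetical names for illustration; the version numbers are copied from the log.

```haskell
import Data.Version (Version, makeVersion)

-- A half-open version range: lower bound inclusive, upper bound exclusive.
data Range = Range Version Version

-- True when the version satisfies ">= lower && < upper".
inRange :: Range -> Version -> Bool
inRange (Range lo hi) v = v >= lo && v < hi

-- Bound copied from the log: template-haskell>=2.14.0.0 && <2.24
aesonTemplateHaskell :: Range
aesonTemplateHaskell = Range (makeVersion [2, 14, 0, 0]) (makeVersion [2, 24])

-- Installed version copied from the log: template-haskell-2.24.0.0
installedTemplateHaskell :: Version
installedTemplateHaskell = makeVersion [2, 24, 0, 0]
```

Here `inRange aesonTemplateHaskell installedTemplateHaskell` evaluates to `False`, because 2.24.0.0 is not below the exclusive upper bound 2.24. That matches the solver's `rejecting: aeson-2.2.3.0` line.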

Also adds some apt installations for robustness.
@mchav
Author

mchav commented Feb 18, 2026

Thanks. I assumed that green ticks in the stages meant that they ran successfully. The debug logs were extremely helpful.

@mchav
Author

mchav commented Feb 22, 2026

@Tmonster things seem to be working. Apparently it's still not writing to the log.csv file, which I must have missed in some reference implementations. Working on that.

@mchav
Author

mchav commented Feb 22, 2026

Had a successful run of the command: MACHINE_TYPE="c6id.4xlarge" ./run.sh && ./_utils/validate_no_errors.sh.
[Screenshot 2026-02-22 03 30 21]
