Strategy:
1) Export/extract the codes for the targets from set A.
2) Classify, then export/extract the codes for the predictions from set B.
3) Use standard chunk evaluation code (Hugging Face?) to calculate the metrics.
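A rough sketch of step 3, to make concrete what "chunk evaluation" would compute. It assumes the exported codes are BIO-style tags (e.g. `B-PER`, `I-PER`, `O`) aligned token by token between set A and set B; the note does not specify the encoding, so that is an assumption. The names `extract_chunks` and `chunk_metrics` are hypothetical helpers, not part of any library. An existing package (for instance the `seqeval` metric available through Hugging Face `evaluate`) would likely replace this in practice.

```python
from typing import Dict, List, Set, Tuple

# A chunk is (label, start index, end index exclusive).
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Collect labeled spans from a BIO tag sequence (assumed encoding)."""
    chunks: Set[Chunk] = set()
    label, start = None, 0
    # The trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, name = tag.partition("-")
        # An I- tag with the same label continues the open chunk.
        if prefix == "I" and name == label:
            continue
        # Anything else closes the open chunk, if there is one.
        if label is not None:
            chunks.add((label, start, i))
            label = None
        # A B- tag (or a stray I- tag, treated leniently) opens a new chunk.
        if prefix in ("B", "I") and name:
            label, start = name, i
    return chunks


def chunk_metrics(targets: List[str], predictions: List[str]) -> Dict[str, float]:
    """Chunk-level precision, recall and F1; a chunk counts only on exact match."""
    gold = extract_chunks(targets)
    pred = extract_chunks(predictions)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data standing in for the exports from set A (targets) and set B (predictions).
targets = ["B-PER", "I-PER", "O", "B-LOC"]
predictions = ["B-PER", "I-PER", "O", "O"]
metrics = chunk_metrics(targets, predictions)
```

This handles one flat tag sequence; with several documents, the metrics would need to be accumulated per sequence so that chunks do not merge across boundaries.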