Usmarc record terminator #4

Open
tsbere wants to merge 3 commits into perl4lib:master from tsbere:usmarc_record_terminator

tsbere commented Aug 9, 2016

This is an attempt to fix a problem where the USMARC record terminator also appears as a pretty quote character in some descriptions, so the affected record is split into a truncated record and a junk record. I have attached an example file, trimmed from an OCLC export, that contains only the record before, the record with the terminator inside it, and the record after.

onebadrecord.zip
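
To make the failure mode concrete, here is a minimal sketch in Python (not the library's Perl, and not the attached file) of a reader that ends a record at every terminator byte. The byte value 0x1D is the MARC 21 record terminator; the `stream` contents and the `naive_split` helper are hypothetical stand-ins, not real MARC records.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte


def naive_split(data: bytes) -> list[bytes]:
    """Split a byte stream on every terminator byte, as a simple reader would."""
    chunks = data.split(RECORD_TERMINATOR)
    # Drop the empty tail left after the final terminator.
    if chunks and chunks[-1] == b"":
        chunks.pop()
    return [chunk + RECORD_TERMINATOR for chunk in chunks]


# Hypothetical stand-ins for three records; the middle one carries a stray
# terminator byte inside its text.
stream = b"REC-ONE\x1d" + b"REC-TWO says \x1dhello\x1d" + b"REC-THREE\x1d"
pieces = naive_split(stream)

for piece in pieces:
    print(piece)
```

The middle record comes back as two pieces: a truncated record ending at the stray byte, and a junk fragment holding the remainder.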

tsbere added 3 commits August 9, 2016 11:57
The record terminator is also occasionally *in* records, so attempt to see if
what follows it is another record or a continuation of the current one. In the
latter case, keep reading the record.

Signed-off-by: Thomas Berezansky <tsbere@mvlc.org>
Signed-off-by: Thomas Berezansky <tsbere@mvlc.org>
Because we pull from after the record terminator we may pick up junk from the
end of the file, so clean up any of that we run into.

Signed-off-by: Thomas Berezansky <tsbere@mvlc.org>
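
The idea in these commit messages can be sketched as follows, in Python rather than the library's Perl. This is an illustration under stated assumptions, not the code in the patch: the check used here (a terminator ends a record only at end of input or when the next five bytes are ASCII digits, as a MARC leader's record-length field would be) is one plausible way to decide that "another record" follows. The helper names and the toy records built by `make_record` are hypothetical.

```python
RECORD_TERMINATOR = 0x1D  # MARC 21 end-of-record byte


def looks_like_record_start(data: bytes, pos: int) -> bool:
    """Assumed check: end of input, or five ASCII digits (a leader's length field)."""
    if pos >= len(data):
        return True
    head = data[pos:pos + 5]
    return len(head) == 5 and head.isdigit()


def split_with_lookahead(data: bytes) -> list[bytes]:
    """End a record at a terminator only when a new record appears to follow."""
    records = []
    start = 0
    for i, byte in enumerate(data):
        if byte == RECORD_TERMINATOR and looks_like_record_start(data, i + 1):
            records.append(data[start:i + 1])
            start = i + 1
    tail = data[start:]
    if tail:
        # Cut trailing junk (for example a newline) after the last terminator.
        end = tail.rfind(bytes([RECORD_TERMINATOR]))
        records.append(tail[:end + 1] if end != -1 else tail)
    return records


def make_record(body: bytes) -> bytes:
    """Hypothetical toy record: 5-digit length, body, terminator (no real leader)."""
    length = 5 + len(body) + 1
    return f"{length:05d}".encode("ascii") + body + b"\x1d"


rec1 = make_record(b"AAAAAA")
rec2 = make_record(b"title \x1dquoted\x1d text")  # embedded terminator bytes
rec3 = make_record(b"CCCC")
stream = rec1 + rec2 + rec3 + b"\n"  # junk newline at end of file

records = split_with_lookahead(stream)
```

With this toy input the middle record stays in one piece and the trailing newline is dropped. The heuristic is fragile: an embedded terminator that happens to be followed by five digits would still split the record.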
@gmcharlt (Member)

While a mode that can more gracefully deal with records that contain embedded record terminators would be nice, the patches at present break the test suite:

t/75.warnings.t ................... 1/21 
#   Failed test 'next() w/ strict on'
#   at t/75.warnings.t line 31.
#          got: '1'
#     expected: '2'

#   Failed test 'warnings() w/ strict off'
#   at t/75.warnings.t line 54.
#          got: '3'
#     expected: '2'

#   Failed test 'next() w/ strict off'
#   at t/75.warnings.t line 55.
#          got: '6'
#     expected: '8'
# Looks like you planned 21 tests but ran 18.
# Looks like you failed 3 tests of 18 run.

Dyrcona (Contributor) commented Jun 3, 2021

I have reviewed tsbere's code. It changes the way that records are parsed such that the assumptions of the warnings tests against the badldr.usmarc file no longer hold true. The extra record separator between the 2nd and 3rd records in the file ends up tacked on to the end of the 2nd record rather than parsing as an empty record. I would suggest that we update the tests to reflect the new output.
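
The change in behavior described here can be illustrated with a small Python sketch. It is not the library's Perl and does not use badldr.usmarc; the toy stream, the function names, and the "five ASCII digits follow" lookahead rule are all assumptions made for the illustration.

```python
import re

TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte


def split_every_terminator(data: bytes) -> list[bytes]:
    """Every terminator ends a record, so a doubled one yields an empty record."""
    return [chunk + TERMINATOR for chunk in data.split(TERMINATOR)[:-1]]


def split_when_record_follows(data: bytes) -> list[bytes]:
    """Assumed lookahead rule: split after a terminator only if five digits follow."""
    return re.split(rb"(?<=\x1d)(?=[0-9]{5})", data)


# Hypothetical stream: three toy records with an extra terminator after the 2nd.
stream = b"00008AA\x1d" + b"00008BB\x1d" + b"\x1d" + b"00008CC\x1d"

before = split_every_terminator(stream)
after = split_when_record_follows(stream)

print(len(before), before)
print(len(after), after)
```

Under the first rule the extra terminator is its own empty record, giving four records. Under the lookahead rule it is absorbed into the 2nd record, giving three, so any test that counts records or warnings per record would see different numbers.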

That said, the modified code does not seem to parse the sample file provided by tsbere correctly, nor does it parse a sample file from a bug on rt.cpan.org. I was planning to use those files to add additional tests.

I've tinkered a bit with the code, but every variation has either broken something else in the tests or failed to fix the tests broken by tsbere's changes.

At this point, I think a new approach is in order.
