Won't Read ONIX Feed #5

@acolchagoff

I've got an ONIX feed that is sent to me as a zip file in an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the XML file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>vendor_feeds@place.com</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>

Even though this file has well over 10,000 products in it, the gem won't read any of them:

reader.each do |product|
    puts product.inspect
end

The each loop does nothing. The block never fires, as if the XML file had zero products in it.
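One thing worth ruling out is a mismatch between the tag style in the file and the tag style the reader looks for. The header above uses ONIX 3.0 short tags (the `_short` DTD, an `ONIXmessage` root), so products would appear as `<product>` rather than the reference-style `<Product>`. If the reader only matches reference-style `Product` elements (an assumption, not confirmed against the gem's source), an empty loop is what you would see. The sketch below is independent of the gem: `summarize_onix` is a hypothetical helper that scans the file line by line and reports the root element plus a count of each product tag style.

```ruby
# Hypothetical diagnostic helper, not part of the onix gem.
# Scans any IO line by line (so a 100+ MB file is never loaded whole) and
# reports the root element name and the number of product start tags,
# split into reference style (<Product>) and short style (<product>).
def summarize_onix(io)
  summary = { root: nil, reference_products: 0, short_products: 0 }

  io.each_line do |line|
    # The first start tag that is not a declaration (<?xml ...?>, <!DOCTYPE ...>)
    # is taken to be the root element.
    summary[:root] ||= line[/<([A-Za-z][\w.-]*)/, 1]

    # Require whitespace or ">" after the name so that longer tags such as
    # <ProductIdentifier> or <productidentifier> are not counted.
    summary[:reference_products] += line.scan(/<Product[\s>]/).size
    summary[:short_products]     += line.scan(/<product[\s>]/).size
  end

  summary
end

# Usage against the extracted file (xml_file as in the code in this report):
#   File.open(xml_file) { |f| p summarize_onix(f) }
```

A non-zero `short_products` count alongside a zero `reference_products` count would point at the tag style rather than at the zip extraction or the DTD path.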

I've spent several days on this. Here's the entire algorithm for reference:

def self.parse_onix(publisher_id, onix_file)
    Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
        xml_file = ""
        dir = "#{Rails.root.to_s}/tmp/onix/"

        zip.each do |entry|
            next if entry.name =~ /__MACOSX/ or \
             entry.name =~ /\.DS_Store/ or !entry.file?
            logger.debug "#{entry.name}"
            puts entry.name
            FileUtils::mkdir_p(dir)
            #this_file = FileUtils.touch(dir + entry.name)
            entry.extract(dir + entry.name)

            p '--->Thing:'+entry.name.last(3)
            if entry.name.last(3) == 'xml'
                xml_file = dir + entry.name
            end
        end

        Work.fix_dtd_path(dir, xml_file)

        reader = ONIX::Reader.new(xml_file)

        puts reader.inspect

        reader.each do |product|
            puts product.inspect
        end
    end
end


def self.fix_dtd_path(dir, xml_file)
    xml = File.read(xml_file)

    # fix the path in the DOCTYPE
    dtd_file = 'ONIX_BookProduct_3.0_short.dtd'
    xml = xml.gsub(dtd_file, dir + dtd_file)
    File.delete(xml_file)
    File.open(xml_file, 'w') do |file|
        file.write(xml)
    end
end
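As a side note on `fix_dtd_path`: it reads the whole 100+ MB file into memory, and running it twice on the same file would prefix the directory twice, because the already-rewritten path still contains the bare DTD filename. A possible alternative, sketched here under the assumption that the DOCTYPE sits within the first few lines of the file (as it does in the header shown above), is to stream the file and rewrite only the header. `fix_dtd_path_streaming` and its `header_lines` parameter are hypothetical names introduced for illustration.

```ruby
require "fileutils"
require "tempfile"

# Hypothetical streaming variant of fix_dtd_path.
# Only the first `header_lines` lines are inspected; every other line is
# copied through unchanged, so memory use stays flat for large feeds.
def fix_dtd_path_streaming(dir, xml_file,
                           dtd_file: "ONIX_BookProduct_3.0_short.dtd",
                           header_lines: 10)
  absolute_dtd = File.join(dir, dtd_file)

  # Write to a temporary file next to the original, then copy it back.
  Tempfile.create("onix", File.dirname(xml_file)) do |tmp|
    File.foreach(xml_file).with_index do |line, index|
      # Skip lines that already carry the absolute path, so a second run
      # leaves the file unchanged instead of prefixing the directory again.
      if index < header_lines && !line.include?(absolute_dtd)
        line = line.sub(dtd_file, absolute_dtd)
      end
      tmp.write(line)
    end
    tmp.flush
    FileUtils.cp(tmp.path, xml_file)
  end
end

# Usage, mirroring the call in parse_onix:
#   fix_dtd_path_streaming(dir, xml_file)
```

This only changes how the DOCTYPE is rewritten. It does not by itself explain why the reader yields no products.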
