UnicodeDecodeError in serializer.to_XML() #29

@nemobis

I read some data from a CSV, write it into an empty MARCXMLRecord, and then serialize the record to XML. Is it expected for a record containing non-ASCII characters, like the one below, to fail?

OrderedDict([('040', [{'a': ['IT-MiFBE'], 'ind1': u' ', 'b': ['ita'], 'e': ['reicat'], 'ind2': u' '}]), ('041', [{'a': ['ita'], 'ind1': '0', 'ind2': u' '}]), ('044', [{'a': ['ita'], 'ind1': u' ', 'ind2': u' '}]), ('100', [{'a': ['Medici, Mario'], 'ind2': u' ', 'ind1': '1', 'd': ['1899-1979'], '4': ['aut']}]), ('240', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe']}]), ('245', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe'], 'ind1': '1', 'ind2': '0'}]), ('260', [{'a': ['Venezia'], 'c': ['1934-1935'], 'b': ['Presso la sede del Reale Istituto Veneto'], 'ind1': u' ', 'ind2': u' '}]), ('300', [{'a': ['183-190 p.'], 'ind1': u' ', 'ind2': u' '}]), ('362', [{'a': ['Tomo 94. (1934-1935)'], 'ind1': '0', 'ind2': u' '}]), ('524', [{'a': ['Contributo allo studio di macchine idrauliche a duplice funzionalit\xc3\xa0: le turbine-pompe / Mario Medici. In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], 'ind1': '8', 'ind2': u' '}]), ('690', [{'a': ['Atti di accademie italiane'], 'ind1': '0', 'ind2': u' '}, {'a': ['Istituto veneto di scienze, lettere ed arti'], 'ind1': '0', 'ind2': u' '}]), ('773', [{'ind1': '0', 't': ['Atti. Tomo 94., Parte 2., Dispense 1.-4. (1934-1935). Parte seconda, Scienze matematiche e naturali'], 'w': ['ISS-IVSLA100130'], 'ind2': u' '}]), ('856', [{'ind1': '4', 'u': ['http://atena.beic.it/webclient/DeliveryManager?pid=8028533&custom_att_2=simple_viewer&search_terms=DTL23&pds_handle='], 'ind2': u' '}]), ('887', [{'a': ['In: Atti. Parte seconda, Scienze matematiche e naturali. - Tomo 94. (1934-1935). - Venezia : Presso la sede del Reale Istituto Veneto, 1934-1935. - 183-190 p.'], '2': ['local'], 'ind1': u' ', 'ind2': u' '}])])

The exception is:

Traceback (most recent call last):
  File ..., in <module>
    xmlout.write(currentrecord.to_XML())
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 147, in to_XML
    DATA_FIELDS=self._serialize_data_fields().strip()
  File "/usr/lib/python2.7/site-packages/marcxml_parser/serializer.py", line 96, in _serialize_data_fields
    CONTENT=self._serialize_data_subfields(dict_field)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 71: ordinal not in range(128)

Reading the Template definition in serializer.py, I wonder if the template string should be a unicode string. I found other people for whom that was the culprit: https://stackoverflow.com/a/6038077/1333493
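For context, the dump above mixes unicode indicators (u' ') with byte-string subfield values containing \xc3\xa0, which is "à" encoded as UTF-8. In Python 2, combining the two triggers an implicit ASCII decode of the byte string, and that is where the 0xc3 byte fails. A possible workaround on the caller's side, sketched below, is to decode every CSV value to unicode before putting it into the record. This assumes the CSV is UTF-8 encoded; to_text and decode_field are hypothetical helpers, not part of marcxml_parser.

```python
# -*- coding: utf-8 -*-
# Minimal sketch: decode byte strings to unicode before they reach the serializer.
# to_text and decode_field are hypothetical helpers, not marcxml_parser API.


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_field(field, encoding="utf-8"):
    """Decode the indicators and subfield values of one datafield dict."""
    decoded = {}
    for key, val in field.items():
        if isinstance(val, list):
            decoded[key] = [to_text(v, encoding) for v in val]
        else:
            decoded[key] = to_text(val, encoding)
    return decoded


# The 245$a value from the dump above, as UTF-8 encoded bytes.
raw_title = (b"Contributo allo studio di macchine idrauliche a duplice "
             b"funzionalit\xc3\xa0: le turbine-pompe")

# An explicit ASCII decode fails on the same 0xc3 byte as in the traceback.
try:
    raw_title.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding as UTF-8 first yields text that is safe to mix with unicode strings.
field_245 = decode_field({"a": [raw_title], "ind1": b"1", "ind2": b"0"})
```

With every value decoded up front, no byte string with non-ASCII content is left for Python 2 to decode implicitly. Whether the library should also do this internally, or make its template unicode, is a separate question.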
