fix(ooxml): regexp does not replace the whole tag in every case #28
PandiPanda69 wants to merge 1 commit into docbleach:master from
Signed-off-by: Sébastien Mériot <sebastien.meriot@corp.ovh.com>
Thanks for the patch! Do you think this regex is "enough", or should we resort to an XML parsing library just to be sure?
The more I look at the code, the more I think it could be a nice option. My only worry is how slow it could be to parse the DOM of every single file, but we should give it a try and see. Disabling DTD/schema validation would probably keep it from being too slow. What do you think?
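For reference, here is a minimal sketch of a DOM parser configured along those lines. It assumes the JDK's built-in parser, which understands the Xerces feature URIs used below. Rejecting DOCTYPE declarations outright also assumes OOXML parts never legitimately carry one, and that assumption is worth checking. `FastDomParser` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FastDomParser {

    // Namespace-aware, non-validating parser that never fetches or processes DTDs.
    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);          // needed for getElementsByTagNameNS()
        factory.setValidating(false);             // no DTD/schema validation
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        // Assumption: OOXML parts have no DOCTYPE, so reject any document that declares one.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Never load an external DTD, even when validation is off.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder();
    }

    // Parses an in-memory XML string (UTF-8) into a DOM Document.
    static Document parse(String xml) throws Exception {
        return newBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Whether this is fast enough for every part of a large document still has to be measured. The settings only remove DTD work, not the cost of building the tree.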
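If performance really is an issue, we could first scan the text for "externalData" (that check is already in place) and only then parse:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true); // required for the *NS lookup below
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(inputFile); // parse(InputStream) also exists; using the stream *might* improve performance
// Match the local name in any namespace; getElementsByTagName("externalData") would miss "c:externalData"
NodeList nodes = doc.getElementsByTagNameNS("*", "externalData");
// The NodeList is live, so iterate backwards while removing nodes
for (int i = nodes.getLength() - 1; i >= 0; i--) {
    Node node = nodes.item(i);
    node.getParentNode().removeChild(node); // remove the element rather than just emptying it
}
```

I have never used XML APIs in Java (and did not test the code above), so I can't judge the performance impact, but this does not depend on XPath tricks and should be fast enough, right? 😕 Also, we don't have to parse every file, do we? I don't have samples to check, but I would guess "externalData" can only appear in a specific file, and we could find it using the relationships. Am I right?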
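A sketch of the cheap pre-check: only the parts whose raw text mentions "externalData" are handed to the DOM parser. It assumes the document is read as a plain ZIP stream and that its XML parts are UTF-8 encoded; a UTF-16 part would slip through the substring test. It only lists the candidate parts. Following the relationships to the right part, as suggested above, is not attempted here. `ExternalDataScanner` and `candidateParts` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ExternalDataScanner {

    // Returns the names of the .xml parts whose text contains "externalData",
    // i.e. the only parts worth passing to the (slower) DOM parser.
    static List<String> candidateParts(InputStream ooxml) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(".xml")) {
                    continue; // skips folders, .rels files and binary parts
                }
                // Assumption: the part is UTF-8 encoded.
                String text = new String(readAll(zip), StandardCharsets.UTF_8);
                // Cheap substring test; false positives are fine, the DOM pass decides.
                if (text.contains("externalData")) {
                    hits.add(entry.getName());
                }
            }
        }
        return hits;
    }

    // Reads the current ZIP entry to its end (ZipInputStream returns -1 at end of entry).
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

This still decompresses every XML part, but a substring search is much cheaper than building a DOM for each of them.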
Hey!
I just figured out that `externalData` is not a leaf element. So, in some cases, the DOM got messed up because the replacement was a big #fail: it only removed part of the element.
Tell me if you see anything wrong :-)
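For comparison, here is one way a regex could cover both shapes: a self-closing `externalData`, and one that wraps child elements such as `c:autoUpdate` in charts. This is an illustration, not necessarily the pattern in the commit. It assumes attribute values never contain `>` and that `externalData` elements are never nested inside each other. Those are exactly the assumptions a real XML parser would avoid. `ExternalDataStripper` is a hypothetical name.

```java
import java.util.regex.Pattern;

public class ExternalDataStripper {

    // Matches, with an optional namespace prefix captured in group 1:
    //   <p:externalData .../>                       (self-closing)
    //   <p:externalData ...> ... </p:externalData>  (element with children)
    // DOTALL lets ".*?" span line breaks between the opening and closing tags.
    private static final Pattern EXTERNAL_DATA = Pattern.compile(
            "<((?:[\\w.-]+:)?)externalData\\b[^>]*?(?:/>|>.*?</\\1externalData\\s*>)",
            Pattern.DOTALL);

    // Removes every externalData element, including its children, from the XML text.
    static String strip(String xml) {
        return EXTERNAL_DATA.matcher(xml).replaceAll("");
    }
}
```

For example, `strip("<c:chartSpace><c:externalData r:id=\"rId1\"><c:autoUpdate val=\"0\"/></c:externalData></c:chartSpace>")` leaves `<c:chartSpace></c:chartSpace>`, whereas a pattern that stops at the first `>` would leave the `autoUpdate` child and the closing tag behind.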