Description
Hi!
I am using version 3.4 of this library and it has been working great, but I recently started using it to parse content from a forum, and while parsing that content it throws an OutOfMemoryError.
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.base/java.util.Arrays.copyOf(Arrays.java:3745)
    at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:172)
    at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:748)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:245)
    at java.base/java.lang.StringBuilder.append(StringBuilder.java:89)
    at net.htmlparser.jericho.Renderer$Processor.append(Renderer.java:1111)
    at net.htmlparser.jericho.Renderer$Processor.appendNonPreformattedText(Renderer.java:1009)
    at net.htmlparser.jericho.Renderer$Processor.appendNonPreformattedSegment(Renderer.java:976)
    at net.htmlparser.jericho.Renderer$Processor.appendSegment(Renderer.java:941)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentRemovingTags(Renderer.java:930)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:913)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
    at net.htmlparser.jericho.Renderer$Processor.access$200(Renderer.java:828)
    at net.htmlparser.jericho.Renderer$FontStyleElementHandler.process(Renderer.java:1154)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:910)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
    at net.htmlparser.jericho.Renderer$Processor.access$200(Renderer.java:828)
    at net.htmlparser.jericho.Renderer$StandardInlineElementHandler.process(Renderer.java:1128)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:910)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
    at net.htmlparser.jericho.Renderer$Processor.access$200(Renderer.java:828)
    at net.htmlparser.jericho.Renderer$StandardInlineElementHandler.process(Renderer.java:1128)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:910)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
    at net.htmlparser.jericho.Renderer$Processor.access$200(Renderer.java:828)
    at net.htmlparser.jericho.Renderer$StandardInlineElementHandler.process(Renderer.java:1128)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:910)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
    at net.htmlparser.jericho.Renderer$Processor.access$200(Renderer.java:828)
    at net.htmlparser.jericho.Renderer$StandardInlineElementHandler.process(Renderer.java:1128)
    at net.htmlparser.jericho.Renderer$Processor.appendSegmentProcessingChildElements(Renderer.java:910)
    at net.htmlparser.jericho.Renderer$Processor.appendElementContent(Renderer.java:902)
The HTML snippet I am parsing is pretty small, so I am not sure what is happening. It does contain some Japanese characters, though, and a lot of elements.
The code I am using is pretty simple:
Source htmlSource = new Source(htmlText);
Segment htmlSeg = new Segment(htmlSource, 0, htmlSource.length());
Renderer htmlRend = new Renderer(htmlSeg);
htmlRend.setMaxLineLength(Integer.MAX_VALUE);
return htmlRend.toString();