From dee525c747e07c0323dca8ea521837d561b8d738 Mon Sep 17 00:00:00 2001
From: Lawrence Siebert
Date: Sat, 5 Apr 2014 17:29:58 -0700
Subject: [PATCH] modified: lib/HTML/Element.pm

Added delimiter option to as_text to provide for a user-supplied
delimiter. This allows users to better parse text from a website.
POD has been modified to reflect this change.
---
 lib/HTML/Element.pm | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lib/HTML/Element.pm b/lib/HTML/Element.pm
index 4976ccb..4d1141a 100644
--- a/lib/HTML/Element.pm
+++ b/lib/HTML/Element.pm
@@ -2132,10 +2132,11 @@ sub content_as_XML
 
   $s = $h->as_text();
   $s = $h->as_text(skip_dels => 1);
+  $s = $h->as_text(delimiter => "\n");
 
 Returns a string consisting of only the text parts of the element's
 descendants.  Any whitespace inside the element is included unchanged,
-but whitespace not in the tree is never added.  But remember that
+but whitespace not in the tree is never added unless a delimiter is included.  But remember that
 whitespace may be ignored or compacted by HTML::TreeBuilder during
 parsing (depending on the value of the C and
 C attributes).  Also, since whitespace is
@@ -2144,7 +2145,8 @@ never added during parsing,
   HTML::TreeBuilder->new_from_content("

a

b

")
     ->as_text;
 
-returns C<"ab">, not C<"a b"> or C<"a\nb">.
+returns C<"ab">, not C<"a b"> or C<"a\nb">,
+unless those characters are specified as a delimiter.
 
 Text under C<<