
Extract items from 10-K as HTML Snippet with Formatting Intact #57

@debalee101

I'm working with the ten_k.parse() function to extract the risk factor section from 10-K filings. However, for my analysis, I need to preserve the original HTML formatting, particularly bold and italic tags, so I can accurately identify and count individual risk factors (e.g., those introduced with formatted subheadings).

Would it be possible to support an additional output format like format="html" in get_section, which returns the section as a raw HTML snippet with the tags intact? Alternatively, is there a recommended way to recover the exact HTML corresponding to a parsed section?
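To make the use case concrete, here is a minimal sketch of the downstream step, assuming the section were already available as an HTML string. It uses only Python's standard-library `html.parser` and does not call this library. The names `EmphasisCollector` and `emphasized_runs` are hypothetical, and treating `<b>`, `<strong>`, `<i>`, and `<em>` as subheading markers is an assumption that will not hold for filings that apply bold or italics through CSS styles.

```python
from html.parser import HTMLParser

# Tags treated as "emphasis" when looking for subheadings (an assumption).
EMPHASIS_TAGS = {"b", "strong", "i", "em"}


class EmphasisCollector(HTMLParser):
    """Collect the text of each outermost bold/italic run in an HTML snippet."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # current nesting depth inside emphasis tags
        self.buffer = []  # text pieces of the run being read
        self.runs = []    # completed runs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace and store the finished run.
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.runs.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def emphasized_runs(html_snippet):
    """Return the text of every outermost bold/italic run in html_snippet."""
    parser = EmphasisCollector()
    parser.feed(html_snippet)
    parser.close()
    return parser.runs


if __name__ == "__main__":
    # Made-up sample markup, not taken from a real filing.
    sample = (
        "<p><b><i>We depend on key suppliers.</i></b> Details follow.</p>"
        "<p><b>Competition may reduce margins.</b> More details.</p>"
    )
    for heading in emphasized_runs(sample):
        print(heading)
```

Counting the returned runs gives a rough count of formatted subheadings. This only works if the tags survive extraction, which is why an HTML output option matters here.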
