Class HtmlParseNode

java.lang.Object
com.renomad.minum.htmlparsing.HtmlParseNode

public final class HtmlParseNode extends Object
Represents a node produced when parsing an HTML string. Each node has a ParseNodeType describing which of the expected kinds of thing it is: an element or plain text.

See W3.org Elements

  • Method Details

    • print

      public List<String> print()
      Returns a list of strings containing the text content of the tree.

      This method traverses the tree from this node downward, collecting text content as it goes. Its main purpose is to quickly extract all the strings from an HTML document at once.
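
      As an illustration only, the traversal described above can be modeled with a hypothetical, simplified node type. The names PrintSketch, NodeType, Node, chars, and elem are assumptions made for this sketch and are not part of Minum; it also assumes that print performs a depth-first, pre-order walk that collects the text of CHARACTERS nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class PrintSketch {

    // Hypothetical stand-in for ParseNodeType (not Minum's enum).
    enum NodeType { ELEMENT, CHARACTERS }

    // Hypothetical stand-in for HtmlParseNode: a type, text, and child nodes.
    record Node(NodeType type, String text, List<Node> children) {
        static Node chars(String s) {
            return new Node(NodeType.CHARACTERS, s, List.of());
        }

        static Node elem(Node... kids) {
            return new Node(NodeType.ELEMENT, "", List.of(kids));
        }
    }

    // Depth-first, pre-order walk collecting the text of every CHARACTERS node.
    static List<String> print(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.type() == NodeType.CHARACTERS) {
            out.add(node.text());
        }
        for (Node child : node.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Models <div>Hello <b>world</b>!</div>
        Node div = Node.elem(Node.chars("Hello "), Node.elem(Node.chars("world")), Node.chars("!"));
        System.out.println(print(div)); // prints [Hello , world, !]
    }
}
```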

    • search

      public List<HtmlParseNode> search(TagName tagName, Map<String,String> attributes)
      Returns a list of the HtmlParseNode nodes in the HTML that match the provided tag name and attributes.
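
      A simplified sketch of this kind of tree search, assuming a node matches when it has the same tag name and contains every requested attribute key/value pair (extra attributes on the node are allowed). SearchSketch and its Node record are hypothetical; a plain String stands in for the real TagName parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SearchSketch {

    // Hypothetical simplified node: a tag name (null for text), attributes, and children.
    record Node(String tag, Map<String, String> attrs, List<Node> children) {}

    // Collects every node, at any depth, whose tag equals tagName and whose
    // attributes include every key/value pair in the requested map.
    static List<Node> search(Node root, String tagName, Map<String, String> wanted) {
        List<Node> matches = new ArrayList<>();
        walk(root, tagName, wanted, matches);
        return matches;
    }

    private static void walk(Node node, String tagName, Map<String, String> wanted, List<Node> matches) {
        if (tagName.equals(node.tag()) && node.attrs().entrySet().containsAll(wanted.entrySet())) {
            matches.add(node);
        }
        for (Node child : node.children()) {
            walk(child, tagName, wanted, matches);
        }
    }

    public static void main(String[] args) {
        Node a1 = new Node("a", Map.of("class", "nav", "href", "/"), List.of());
        Node a2 = new Node("a", Map.of("href", "/about"), List.of());
        Node root = new Node("div", Map.of(), List.of(a1, a2));
        System.out.println(search(root, "a", Map.of("class", "nav")).size()); // prints 1
    }
}
```

      Passing an empty attribute map matches on tag name alone, since every node's attributes trivially contain the empty set.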
    • getType

      public ParseNodeType getType()
      Gets the type of this node: either an element, with opening and closing tags, attributes, and inner content, or plain text.
    • getTagInfo

      public TagInfo getTagInfo()
      Returns the TagInfo, which contains information such as the element's tag name (p, a, div, and so on) and its attributes (class, id, etc.).
    • getInnerContent

      public List<HtmlParseNode> getInnerContent()
      The inner content is the data between this element's opening and closing tags. It may consist of other elements, character data, or a mix of both; if there is nothing between the tags, an empty list is returned.
    • getTextContent

      public String getTextContent()
      If this node's ParseNodeType is ParseNodeType.CHARACTERS, returns its text content; otherwise, returns an empty string.
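
      The contracts of getType, getInnerContent, and getTextContent can be summarized with a small hypothetical model. NodeShapeSketch and its members are illustrative assumptions, not Minum's classes; the record accessors type() and innerContent() stand in for getType() and getInnerContent().

```java
import java.util.List;

public class NodeShapeSketch {

    // Hypothetical mirror of the two node kinds described above.
    enum NodeType { ELEMENT, CHARACTERS }

    // Hypothetical node: elements carry a tag name and inner content,
    // CHARACTERS nodes carry text. Not Minum's HtmlParseNode.
    record Node(NodeType type, String tagName, List<Node> innerContent, String text) {

        static Node element(String tagName, Node... inner) {
            return new Node(NodeType.ELEMENT, tagName, List.of(inner), "");
        }

        static Node characters(String text) {
            return new Node(NodeType.CHARACTERS, "", List.of(), text);
        }

        // Text is only reported for CHARACTERS nodes; elements yield "".
        String getTextContent() {
            return type == NodeType.CHARACTERS ? text : "";
        }
    }

    public static void main(String[] args) {
        Node p = Node.element("p", Node.characters("hi"));
        Node br = Node.element("br");

        System.out.println(p.type());                                 // prints ELEMENT
        System.out.println("[" + p.getTextContent() + "]");           // prints []
        System.out.println(p.innerContent().get(0).getTextContent()); // prints hi
        System.out.println(br.innerContent().isEmpty());              // prints true
    }
}
```
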
    • innerText

      public String innerText()
      Returns the inner text of a node.

      If this element has exactly one inner content item, and that item is a ParseNodeType.CHARACTERS node, returns its text content.

      If there is more than one inner content node, concatenates them into a single string, with each section wrapped in square brackets.
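
      A hypothetical sketch of the two cases described above. InnerTextSketch and its Node record are illustrative, not Minum's code. Behavior outside the two documented cases (no inner content, or a single non-text child) and the way element children are rendered inside the brackets (here, recursively via their own inner text) are assumptions.

```java
import java.util.List;
import java.util.stream.Collectors;

public class InnerTextSketch {

    enum NodeType { ELEMENT, CHARACTERS }

    // Hypothetical simplified node, not Minum's HtmlParseNode.
    record Node(NodeType type, String text, List<Node> innerContent) {
        static Node chars(String s) {
            return new Node(NodeType.CHARACTERS, s, List.of());
        }

        static Node elem(Node... kids) {
            return new Node(NodeType.ELEMENT, "", List.of(kids));
        }
    }

    static String innerText(Node node) {
        List<Node> inner = node.innerContent();
        // Single CHARACTERS child: return its text as-is.
        if (inner.size() == 1 && inner.get(0).type() == NodeType.CHARACTERS) {
            return inner.get(0).text();
        }
        // Otherwise wrap each child's text in brackets and concatenate.
        // Assumption: element children contribute their own innerText.
        return inner.stream()
                .map(child -> "[" + (child.type() == NodeType.CHARACTERS ? child.text() : innerText(child)) + "]")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Models <p>Hello <b>world</b></p>
        Node p = Node.elem(Node.chars("Hello "), Node.elem(Node.chars("world")));
        System.out.println(innerText(p)); // prints [Hello ][world]
    }
}
```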

    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object