g!|dZddlZddlZddlmZddlmZddlm Z ddl m Z m Z m Z  eZ ddlmZ ddlmZGd d eZ dd lmZGd d eZeZdZddZ ddZ ddZddZ ddZ!dZ"eZ#y#e$reefZYcwxYw#e$r ddlmZYmwxYw#e$r ddlmZYwwxYw#e$rY^wxYw)z? An interface to html5lib that mimics the lxml.html interface. N) HTMLParser) TreeBuilder)etree)ElementXHTML_NAMESPACE_contains_block_level_tag)urlopen)urlparseceZdZdZddZy)rz*An html5lib HTML parser with lxml as tree.c >tj|f|td|yN)stricttree) _HTMLParser__init__rselfrkwargss F/opt/hc_python/lib64/python3.12/site-packages/lxml/html/html5parser.pyrzHTMLParser.__init__sTM&{MfMNF__name__ __module__ __qualname____doc__rrrrrs 4Nrr) XHTMLParserceZdZdZddZy)rz+An html5lib XHTML Parser with lxml as tree.c >tj|f|td|yr ) _XHTMLParserrrrs rrzXHTMLParser.__init__*s  ! !$ RvK R6 RrNrrrrrrr's 9 Srrcb|j|}||S|jdtd|S)N{})findr)rtagelems r _find_tagr(0s. 99S>D  99#6 77rct|ts td|t}i}|t|trd}|||d<|j |fi|j S)z Parse a whole document into a string. If `guess_charset` is true, or if the input is not Unicode but a byte string, the `chardet` library will perform charset guessing on the string. string requiredT useChardet) isinstance_strings TypeError html_parserbytesparsegetroot)html guess_charsetparseroptionss rdocument_fromstringr77sn dH %)** ~GD%!8  -  6<< ( ( 0 0 22rc>t|ts td|t}i}|t|trd}|||d<|j |dfi|}|rFt|dtr3|r1|dj rtjd|dz|d=|S)a`Parses several HTML elements, returning a list of elements. The first item in the list may be a string. If no_leading_text is true, then it will be an error if there is leading text, and it will always be a list of only elements. If `guess_charset` is true, the `chardet` library will perform charset guessing on the string. r*Fr+divrzThere is leading text: %r) r,r-r.r/r0 parseFragmentstripr ParserError)r3no_leading_textr4r5r6childrens rfragments_fromstringr?Os dH %)** ~GD%!8  - #v##D%;7;HJx{H5 {  "''(C(0 )455 Orc6t|ts tdt|}t |||| }|rRt|tsd}t |}|r1t|dtr |d|_|d=|j||S|stjdt|dkDrtjd|d}|jr<|jjr"tjd|jzd |_ |S) aParses a single HTML element; it is an error if there is more than one element, or if anything but whitespace precedes or follows the element. If 'create_parent' is true (or is a tag name) then a parent node will be created to encapsulate the HTML in a single element. In this case, leading or trailing text is allowed. If `guess_charset` is true, the `chardet` library will perform charset guessing on the string. r*)r4r5r=r9rzNo elements foundzMultiple elements foundzElement followed by text: %rN) r,r-r.boolr?rtextextendrr<lentailr;)r3 create_parentr4r5accept_leading_textelementsnew_rootresults rfragment_fromstringrLqs dH %)**}-# M&//1H-2!M=) (1+x0 (  QK OOH %  344 8}q 9:: a[F {{v{{((* > LMMFK Mrctt|ts tdt|||}|dd}t|tr|j dd}|j j}|jds|jdr|St|d }t|r|St|d }t|d k(rW|jr|jjs1|d jr|d jjs|d St|r d|_|Sd|_|S)aParse the html, returning a single element/document. This tries to minimally parse the chunk of text, without knowing if it is a fragment or a document. 'base_url' will set the document's base_url attribute (and the tree's docinfo.URL) If `guess_charset` is true, or if the input is not Unicode but a byte string, the `chardet` library will perform charset guessing on the string. r*)r5r4N2asciireplacezrus .8IIH'&! NN !4SlS =L8300548D-237)X3l!'H l ks|H'&'&%&  sEBBB"B3 B B BB" B0/B03B;:B;