h fS@sldZddlZddlZddlZddlmZdgZejdZejdZ ejdZ ejdZ ejd Z ejd Z ejd Zejd Zejd ZejdZejdZejdejZejdejZejd ZejdZGdddeZeZGdddejZdS)zA parser for HTML and XHTML.N)unescape HTMLParserz[&<]z &[a-zA-Z#]z%&([a-zA-Z][-.a-zA-Z0-9]*)[^a-zA-Z0-9]z)&#(?:[0-9]+|[xX][0-9a-fA-F]+)[^0-9a-fA-F]z <[a-zA-Z]>z--\s*>z(([a-zA-Z][-.a-zA-Z0-9:_]*)(?:\s|/(?!>))*z$([a-zA-Z][^ />]*)(?:\s|/(?!>))*zJ\s*([a-zA-Z_][-.:a-zA-Z_0-9]*)(\s*=\s*(\'[^\']*\'|"[^"]*"|[^\s"\'=<>`]*))?z]((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*a <[a-zA-Z][-.a-zA-Z0-9:_]* # tag name (?:\s+ # whitespace before attribute name (?:[a-zA-Z_][-.:a-zA-Z0-9_]* # attribute name (?:\s*=\s* # value indicator (?:'[^']*' # LITA-enclosed value |\"[^\"]*\" # LIT-enclosed value |[^'\">\s]+ # bare value ) )? ) )* \s* # trailing whitespace aF <[a-zA-Z][^\t\n\r\f />\x00]* # tag name (?:[\s/]* # optional whitespace before attribute name (?:(?<=['"\s/])[^\s/>][^\s/=>]* # attribute name (?:\s*=+\s* # value indicator (?:'[^']*' # LITA-enclosed value |"[^"]*" # LIT-enclosed value |(?!['"])[^>\s]* # bare value ) (?:\s*,)* # possibly followed by a comma )?(?:\s|/(?!>))* )* )? \s* # trailing whitespace z#c@s1eZdZdZdddZddZdS)HTMLParseErrorz&Exception raised for all parse errors.NcCs'||_|d|_|d|_dS)Nr)msglinenooffset)selfrZpositionr 0/opt/alt/python34/lib64/python3.4/html/parser.py__init__Us  zHTMLParseError.__init__cCsW|j}|jdk r,|d|j}n|jdk rS|d|jd}n|S)Nz , at line %dz , column %dr)rrr )r resultr r r __str__[s  zHTMLParseError.__str__)NN)__name__ __module__ __qualname____doc__r rr r r r rRs rc@sfeZdZdZd;ZededdZddZd d Zd d Z d dZ dZ ddZ ddZ ddZddZddZdddZddZdd Zd!d"Zd#d$Zd%d&Zd'd(Zd)d*Zd+d,Zd-d.Zd/d0Zd1d2Zd3d4Zd5d6Zd7d8Zd9d:Z dS)'.)_HTMLParser__starttag_text)r r r r get_starttag_textszHTMLParser.get_starttag_textcCs2|j|_tjd|jtj|_dS)Nz )lowerr$recompileIr#)r elemr r r set_cdata_modeszHTMLParser.set_cdata_modecCst|_d|_dS)N)r"r#r$)r r r r clear_cdata_modes zHTMLParser.clear_cdata_modec Cs#|j}d}t|}xs||kr|jr|j r|jd|}|dkr|jdt||d}|dkrtjdj || rPn|}qn=|j j ||}|r|j }n|jrPn|}||krH|jr.|j r.|j t |||qH|j |||n|j||}||krjPn|j}|d|r_tj||r|j|} n|d|r|j|} n|d|r|j|} n|d|r |j|} ng|d |rE|jr3|j|} qp|j|} n+|d |kro|j d|d } nP| dkrJ|sPn|jr|jd n|jd |d } | dkr|jd|d } | dkr|d } qn | d 7} |jr0|j r0|j t ||| qJ|j ||| n|j|| }q|d |r;tj||}|r|jdd} |j| |j} |d| d s| d } n|j|| }qqd||dkr7|j |||d|j||d}nPq|d|rtj||}|r|jd } |j| |j} |d| d s| d } n|j|| }qnt j||}|rS|rO|j||dkrO|jr|jdqO|j} | |kr6|} n|j||d }nPq|d |kr|j d|j||d }qPqqW|r ||kr |j r |jr|j r|j t |||n|j ||||j||}n||d|_dS)Nr<&"z[\s;]z z junk characters in start tag: %rr;r;r;)rrd)r.check_for_whole_start_tagr rtagfindrFtagfind_tolerantrOrMr0r!attrfindattrfind_tolerantrappendstripr+countr<r>r-rBendswithhandle_startendtaghandle_starttagCDATA_CONTENT_ELEMENTSr5)r rSendposr attrsrFrVtagmZattrnamerestZ attrvaluerOrr r r r rGps^       00    "zHTMLParser.parse_starttagcCsk|j}|jr'tj||}ntj||}|r[|j}|||d}|dkrs|dS|dkr|jd|r|dS|jd|rd S|jr|j||d|jdn||kr|S|dSn|dkrd S|dkrd S|jr@|j|||jd n||krP|S|dSnt d dS)Nrr/z/>rzmalformed empty start tagrz6abcdefghijklmnopqrstuvwxyz=/ABCDEFGHIJKLMNOPQRSTUVWXYZzmalformed start tagzwe should not get here!r;r;r;) r rlocatestarttagendrFlocatestarttagend_tolerantrOrDrCr-AssertionError)r rSr rvrUnextr r r rgs>             z$HTMLParser.check_for_whole_start_tagcCs|j}tj||d}|s)dS|j}tj||}|s1|jdk rw|j||||S|jr|j d|||fnt j||d}|s|||ddkr|dS|j |Sn|j dj }|jd|j}|j||dS|j dj }|jdk r||jkr|j||||Sn|j|j |j|S)Nrzbad end tag: %rrrYzrr;)r endendtagr@rO endtagfindrFr$rBrr-rir\rMr0r= handle_endtagr6)r rSr rFr]Z namematchZtagnamer4r r r rHs:   !  zHTMLParser.parse_endtagcCs!|j|||j|dS)N)rqr)r rurtr r r rpszHTMLParser.handle_startendtagcCsdS)Nr )r rurtr r r rqszHTMLParser.handle_starttagcCsdS)Nr )r rur r r r szHTMLParser.handle_endtagcCsdS)Nr )r rWr r r rNszHTMLParser.handle_charrefcCsdS)Nr )r rWr r r rQszHTMLParser.handle_entityrefcCsdS)Nr )r r(r r r rBszHTMLParser.handle_datacCsdS)Nr )r r(r r r r^szHTMLParser.handle_commentcCsdS)Nr )r Zdeclr r r r[szHTMLParser.handle_declcCsdS)Nr )r r(r r r ra"szHTMLParser.handle_picCs$|jr |jd|fndS)Nzunknown declaration: %r)rr-)r r(r r r unknown_decl%s zHTMLParser.unknown_declcCs tjdtddt|S)NzZThe unescape method is deprecated and will be removed in 3.5, use html.unescape() instead.rr)rrrr)r sr r r r*s  zHTMLParser.unescape)rr)!rrrrrrrr rr)r*r-r.r/r5r6r'rKr\rJrGrgrHrprqrrNrQrBr^r[rarrr r r r rfs<         < + *          )rr1rr%Zhtmlr__all__r2r"rRrPrLrEr`Z commentcloserhrirjrkVERBOSEryrzr}r~ Exceptionrobjectrr&rr r r r s6