|
1 <?php |
|
2 /** |
|
3 * HTML API: WP_HTML_Processor class |
|
4 * |
|
5 * @package WordPress |
|
6 * @subpackage HTML-API |
|
7 * @since 6.4.0 |
|
8 */ |
|
9 |
|
10 /** |
|
11 * Core class used to safely parse and modify an HTML document. |
|
12 * |
|
13 * The HTML Processor class properly parses and modifies HTML5 documents. |
|
14 * |
|
15 * It supports a subset of the HTML5 specification, and when it encounters |
|
16 * unsupported markup, it aborts early to avoid unintentionally breaking |
|
17 * the document. The HTML Processor should never break an HTML document. |
|
18 * |
|
19 * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying |
|
20 * attributes on individual HTML tags, the HTML Processor is more capable |
|
21 * and useful for the following operations: |
|
22 * |
|
23 * - Querying based on nested HTML structure. |
|
24 * |
|
25 * Eventually the HTML Processor will also support: |
|
26 * - Wrapping a tag in surrounding HTML. |
|
27 * - Unwrapping a tag by removing its parent. |
|
28 * - Inserting and removing nodes. |
|
29 * - Reading and changing inner content. |
|
30 * - Navigating up or around HTML structure. |
|
31 * |
|
32 * ## Usage |
|
33 * |
|
34 * Use of this class requires three steps: |
|
35 * |
|
36 * 1. Call a static creator method with your input HTML document. |
|
37 * 2. Find the location in the document you are looking for. |
|
38 * 3. Request changes to the document at that location. |
|
39 * |
|
40 * Example: |
|
41 * |
|
42 * $processor = WP_HTML_Processor::create_fragment( $html ); |
|
43 * if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) { |
|
44 * $processor->add_class( 'responsive-image' ); |
|
45 * } |
|
46 * |
|
47 * #### Breadcrumbs |
|
48 * |
|
49 * Breadcrumbs represent the stack of open elements from the root |
|
50 * of the document or fragment down to the currently-matched node, |
|
51 * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs() |
|
52 * to inspect the breadcrumbs for a matched tag. |
|
53 * |
|
54 * Breadcrumbs can specify nested HTML structure and are equivalent |
|
55 * to a CSS selector comprising tag names separated by the child |
|
56 * combinator, such as "DIV > FIGURE > IMG". |
|
57 * |
|
58 * Since all elements find themselves inside a full HTML document |
|
59 * when parsed, the return value from `get_breadcrumbs()` will always |
|
60 * contain any implicit outermost elements. For example, when parsing |
|
61 * with `create_fragment()` in the `BODY` context (the default), any |
|
62 * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )` |
|
63 * in its breadcrumbs. |
|
64 * |
|
65 * Despite containing the implied outermost elements in their breadcrumbs, |
|
66 * tags may be found with the shortest-matching breadcrumb query. That is, |
|
67 * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )` |
|
68 * matches all IMG elements directly inside a P element. To ensure that no |
|
69 * partial matches erroneously match, it's possible to specify in a query |
|
70 * the full breadcrumb match all the way down from the root HTML element. |
|
71 * |
|
72 * Example: |
|
73 * |
|
74 * $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>'; |
|
75 * // ----- Matches here. |
|
76 * $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) ); |
|
77 * |
|
78 * $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>'; |
|
79 * // ---- Matches here. |
|
80 * $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) ); |
|
81 * |
|
82 * $html = '<div><img></div><img>'; |
|
83 * // ----- Matches here, because IMG must be a direct child of the implicit BODY. |
|
84 * $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) ); |
|
85 * |
|
86 * ## HTML Support |
|
87 * |
|
88 * This class implements a small part of the HTML5 specification. |
|
89 * It's designed to operate within its support and abort early whenever |
|
90 * encountering circumstances it can't properly handle. This is |
|
91 * the principal way in which this class remains as simple as possible |
|
92 * without cutting corners and breaking compliance. |
|
93 * |
|
94 * ### Supported elements |
|
95 * |
|
96 * If any unsupported element appears in the HTML input, the HTML Processor |
|
97 * will abort early and stop all processing. This draconian measure ensures |
|
98 * that the HTML Processor won't break any HTML it doesn't fully understand. |
|
99 * |
|
100 * The following list specifies the HTML tags that _are_ supported: |
|
101 * |
|
102 * - Containers: ADDRESS, BLOCKQUOTE, DETAILS, DIALOG, DIV, FOOTER, HEADER, MAIN, MENU, SPAN, SUMMARY. |
|
103 * - Custom elements: All custom elements are supported. :) |
|
104 * - Form elements: BUTTON, DATALIST, FIELDSET, INPUT, LABEL, LEGEND, METER, PROGRESS, SEARCH. |
|
105 * - Formatting elements: B, BIG, CODE, EM, FONT, I, PRE, SMALL, STRIKE, STRONG, TT, U, WBR. |
|
106 * - Heading elements: H1, H2, H3, H4, H5, H6, HGROUP. |
|
107 * - Links: A. |
|
108 * - Lists: DD, DL, DT, LI, OL, UL. |
|
109 * - Media elements: AUDIO, CANVAS, EMBED, FIGCAPTION, FIGURE, IMG, MAP, PICTURE, SOURCE, TRACK, VIDEO. |
|
110 * - Paragraph: BR, P. |
|
111 * - Phrasing elements: ABBR, AREA, BDI, BDO, CITE, DATA, DEL, DFN, INS, MARK, OUTPUT, Q, SAMP, SUB, SUP, TIME, VAR. |
|
112 * - Sectioning elements: ARTICLE, ASIDE, HR, NAV, SECTION. |
|
113 * - Templating elements: SLOT. |
|
114 * - Text decoration: RUBY. |
|
115 * - Deprecated elements: ACRONYM, BLINK, CENTER, DIR, ISINDEX, KEYGEN, LISTING, MULTICOL, NEXTID, PARAM, SPACER. |
|
116 * |
|
117 * ### Supported markup |
|
118 * |
|
119 * Some kinds of non-normative HTML involve reconstruction of formatting elements and |
|
120 * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE |
|
121 * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters |
|
122 * such a case it will stop processing. |
|
123 * |
|
124 * The following list specifies HTML markup that _is_ supported: |
|
125 * |
|
126 * - Markup involving only those tags listed above. |
|
127 * - Fully-balanced and non-overlapping tags. |
|
128 * - HTML with unexpected tag closers. |
|
129 * - Some unbalanced or overlapping tags. |
|
130 * - P tags after unclosed P tags. |
|
131 * - BUTTON tags after unclosed BUTTON tags. |
|
132 * - A tags after unclosed A tags that don't involve any active formatting elements. |
|
133 * |
|
134 * @since 6.4.0 |
|
135 * |
|
136 * @see WP_HTML_Tag_Processor |
|
137 * @see https://html.spec.whatwg.org/ |
|
138 */ |
|
139 class WP_HTML_Processor extends WP_HTML_Tag_Processor { |
|
140 /** |
|
141 * The maximum number of bookmarks allowed to exist at any given time. |
|
142 * |
|
143 * HTML processing requires more bookmarks than basic tag processing, |
|
144 * so this class constant from the Tag Processor is overridden. |
|
145 * |
|
146 * @since 6.4.0 |
|
147 * |
|
148 * @var int |
|
149 */ |
|
150 const MAX_BOOKMARKS = 100; |
|
151 |
|
152 /** |
|
153 * Holds the working state of the parser, including the stack of |
|
154 * open elements and the stack of active formatting elements. |
|
155 * |
|
156 * Initialized in the constructor. |
|
157 * |
|
158 * @since 6.4.0 |
|
159 * |
|
160 * @var WP_HTML_Processor_State |
|
161 */ |
|
162 private $state = null; |
|
163 |
|
164 /** |
|
165 * Used to create unique bookmark names. |
|
166 * |
|
167 * This class sets a bookmark for every tag in the HTML document that it encounters. |
|
168 * The bookmark name is auto-generated and increments, starting with `1`. These are |
|
169 * internal bookmarks and are automatically released when the referring WP_HTML_Token |
|
170 * goes out of scope and is garbage-collected. |
|
171 * |
|
172 * @since 6.4.0 |
|
173 * |
|
174 * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct |
|
175 * |
|
176 * @var int |
|
177 */ |
|
178 private $bookmark_counter = 0; |
|
179 |
|
180 /** |
|
181 * Stores an explanation for why something failed, if it did. |
|
182 * |
|
183 * @see self::get_last_error |
|
184 * |
|
185 * @since 6.4.0 |
|
186 * |
|
187 * @var string|null |
|
188 */ |
|
189 private $last_error = null; |
|
190 |
|
191 /** |
|
192 * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance. |
|
193 * |
|
194 * This function is created inside the class constructor so that it can be passed to |
|
195 * the stack of open elements and the stack of active formatting elements without |
|
196 * exposing it as a public method on the class. |
|
197 * |
|
198 * @since 6.4.0 |
|
199 * |
|
200 * @var Closure |
|
201 */ |
|
202 private $release_internal_bookmark_on_destruct = null; |
|
203 |
|
204 /** |
|
205 * Stores stack events which arise during parsing of the |
|
206 * HTML document, which will then supply the "match" events. |
|
207 * |
|
208 * @since 6.6.0 |
|
209 * |
|
210 * @var WP_HTML_Stack_Event[] |
|
211 */ |
|
212 private $element_queue = array(); |
|
213 |
|
214 /** |
|
215 * Current stack event, if set, representing a matched token. |
|
216 * |
|
217 * Because the parser may internally point to a place further along in a document |
|
218 * than the nodes which have already been processed (some "virtual" nodes may have |
|
219 * appeared while scanning the HTML document), this will point at the "current" node |
|
220 * being processed. It comes from the front of the element queue. |
|
221 * |
|
222 * @since 6.6.0 |
|
223 * |
|
224 * @var ?WP_HTML_Stack_Event |
|
225 */ |
|
226 private $current_element = null; |
|
227 |
|
228 /** |
|
229 * Context node if created as a fragment parser. |
|
230 * |
|
231 * @var ?WP_HTML_Token |
|
232 */ |
|
233 private $context_node = null; |
|
234 |
|
235 /** |
|
236 * Whether the parser has yet processed the context node, |
|
237 * if created as a fragment parser. |
|
238 * |
|
239 * The context node will be initially pushed onto the stack of open elements, |
|
240 * but when created as a fragment parser, this context element (and the implicit |
|
241 * HTML document node above it) should not be exposed as a matched token or node. |
|
242 * |
|
243 * This boolean indicates whether the processor should skip over the current |
|
244 * node in its initial search for the first node created from the input HTML. |
|
245 * |
|
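246 * @var bool|string `false` until the context node has been seen, then `true`; set to `'done'` once no further tokens can be read. |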
|
247 */ |
|
248 private $has_seen_context_node = false; |
|
249 |
|
250 /* |
|
251 * Public Interface Functions |
|
252 */ |
|
253 |
|
254 /** |
|
255 * Creates an HTML processor in the fragment parsing mode. |
|
256 * |
|
257 * Use this for cases where you are processing chunks of HTML that |
|
258 * will be found within a bigger HTML document, such as rendered |
|
259 * block output that exists within a post, or `the_content` inside a |
|
260 * rendered site layout. |
|
261 * |
|
262 * Fragment parsing occurs within a context, which is an HTML element |
|
263 * that the document will eventually be placed in. It becomes important |
|
264 * when special elements have different rules than others, such as inside |
|
265 * a TEXTAREA or a TITLE tag where things that look like tags are text, |
|
266 * or inside a SCRIPT tag where things that look like HTML syntax are JS. |
|
267 * |
|
268 * The context value should be a representation of the tag in which the |
|
269 * HTML is found. For most cases this will be the body element. The HTML |
|
270 * form is provided because a context element may have attributes that |
|
271 * impact the parse, such as with a SCRIPT tag and its `type` attribute. |
|
272 * |
|
273 * ## Current HTML Support |
|
274 * |
|
275 * - The only supported context is `<body>`, which is the default value. |
|
276 * - The only supported document encoding is `UTF-8`, which is the default value. |
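 *
 * Example (a sketch; `$html` is an assumed input string, and returning it
 * unchanged is only one possible fallback):
 *
 *     // Any context other than the default `<body>` is rejected with `null`.
 *     null === WP_HTML_Processor::create_fragment( $html, '<textarea>' );
 *
 *     // Guard against `null` before using the processor.
 *     $processor = WP_HTML_Processor::create_fragment( $html );
 *     if ( null === $processor ) {
 *         return $html;
 *     }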
|
277 * |
|
278 * @since 6.4.0 |
|
279 * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances. |
|
280 * |
|
281 * @param string $html Input HTML fragment to process. |
|
282 * @param string $context Context element for the fragment, must be default of `<body>`. |
|
283 * @param string $encoding Text encoding of the document; must be default of 'UTF-8'. |
|
284 * @return static|null The created processor if successful, otherwise null. |
|
285 */ |
|
286 public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) { |
|
287 if ( '<body>' !== $context || 'UTF-8' !== $encoding ) { |
|
288 return null; |
|
289 } |
|
290 |
|
291 $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE ); |
|
292 $processor->state->context_node = array( 'BODY', array() ); |
|
293 $processor->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY; |
|
294 |
|
295 // @todo Create "fake" bookmarks for non-existent but implied nodes. |
|
296 $processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 ); |
|
297 $processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 ); |
|
298 |
|
299 $processor->state->stack_of_open_elements->push( |
|
300 new WP_HTML_Token( |
|
301 'root-node', |
|
302 'HTML', |
|
303 false |
|
304 ) |
|
305 ); |
|
306 |
|
307 $context_node = new WP_HTML_Token( |
|
308 'context-node', |
|
309 $processor->state->context_node[0], |
|
310 false |
|
311 ); |
|
312 |
|
313 $processor->state->stack_of_open_elements->push( $context_node ); |
|
314 $processor->context_node = $context_node; |
|
315 |
|
316 return $processor; |
|
317 } |
|
318 |
|
319 /** |
|
320 * Constructor. |
|
321 * |
|
322 * Do not use this method. Use the static creator methods instead. |
|
323 * |
|
324 * @access private |
|
325 * |
|
326 * @since 6.4.0 |
|
327 * |
|
328 * @see WP_HTML_Processor::create_fragment() |
|
329 * |
|
330 * @param string $html HTML to process. |
|
331 * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually. |
|
332 */ |
|
333 public function __construct( $html, $use_the_static_create_methods_instead = null ) { |
|
334 parent::__construct( $html ); |
|
335 |
|
336 if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) { |
|
337 _doing_it_wrong( |
|
338 __METHOD__, |
|
339 sprintf( |
|
340 /* translators: %s: WP_HTML_Processor::create_fragment(). */ |
|
341 __( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ), |
|
342 '<code>WP_HTML_Processor::create_fragment()</code>' |
|
343 ), |
|
344 '6.4.0' |
|
345 ); |
|
346 } |
|
347 |
|
348 $this->state = new WP_HTML_Processor_State(); |
|
349 |
|
350 $this->state->stack_of_open_elements->set_push_handler( |
|
351 function ( WP_HTML_Token $token ) { |
|
352 $is_virtual = ! isset( $this->state->current_token ) || $this->is_tag_closer(); |
|
353 $same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name; |
|
354 $provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real'; |
|
355 $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance ); |
|
356 } |
|
357 ); |
|
358 |
|
359 $this->state->stack_of_open_elements->set_pop_handler( |
|
360 function ( WP_HTML_Token $token ) { |
|
361 $is_virtual = ! isset( $this->state->current_token ) || ! $this->is_tag_closer(); |
|
362 $same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name; |
|
363 $provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real'; |
|
364 $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance ); |
|
365 } |
|
366 ); |
|
367 |
|
368 /* |
|
369 * Create this wrapper so that it's possible to pass |
|
370 * a private method into WP_HTML_Token classes without |
|
371 * exposing it to any public API. |
|
372 */ |
|
373 $this->release_internal_bookmark_on_destruct = function ( $name ) { |
|
374 parent::release_bookmark( $name ); |
|
375 }; |
|
376 } |
|
377 |
|
378 /** |
|
379 * Returns the last error, if any. |
|
380 * |
|
381 * Various situations lead to parsing failure but this class will |
|
382 * return `false` in all those cases. To determine why something |
|
383 * failed it's possible to request the last error. This can be |
|
384 * helpful to know to distinguish whether a given tag couldn't |
|
385 * be found or if content in the document caused the processor |
|
386 * to give up and abort processing. |
|
387 * |
|
388 * Example: |
|
389 * |
|
390 * $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' ); |
|
391 * false === $processor->next_tag(); |
|
392 * WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error(); |
|
393 * |
|
394 * @since 6.4.0 |
|
395 * |
|
396 * @see self::ERROR_UNSUPPORTED |
|
397 * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS |
|
398 * |
|
399 * @return string|null The last error, if one exists, otherwise null. |
|
400 */ |
|
401 public function get_last_error() { |
|
402 return $this->last_error; |
|
403 } |
|
404 |
|
405 /** |
|
406 * Finds the next tag matching the $query. |
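 *
 * Example (a sketch; `$html` and the `wp-image` class name are illustrative
 * assumptions, not values used by this class):
 *
 *     $processor = WP_HTML_Processor::create_fragment( $html );
 *
 *     // Stop at the second IMG that sits directly inside a FIGURE and has the class.
 *     // As implemented below, `match_offset` is only consulted when `breadcrumbs` is given.
 *     $found = $processor->next_tag(
 *         array(
 *             'breadcrumbs'  => array( 'FIGURE', 'IMG' ),
 *             'class_name'   => 'wp-image',
 *             'match_offset' => 2,
 *         )
 *     );
 *
 *     if ( $found ) {
 *         $processor->set_attribute( 'loading', 'lazy' );
 *     }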
|
407 * |
|
408 * @todo Support matching the class name and tag name. |
|
409 * |
|
410 * @since 6.4.0 |
|
411 * @since 6.6.0 Visits all tokens, including virtual ones. |
|
412 * |
|
413 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document. |
|
414 * |
|
415 * @param array|string|null $query { |
|
416 * Optional. Which tag name to find, having which class, etc. Default is to find any tag. |
|
417 * |
|
418 * @type string|null $tag_name Which tag to find, or `null` for "any tag." |
|
419 * @type string $tag_closers 'visit' to pause at tag closers, 'skip' or unset to only visit openers. |
|
420 * @type int|null $match_offset Find the Nth tag matching all search criteria. |
|
421 * 1 for "first" tag, 3 for "third," etc. |
|
422 * Defaults to first tag. |
|
423 * @type string|null $class_name Tag must contain this whole class name to match. |
|
424 * @type string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`. |
|
425 * May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`. |
|
426 * } |
|
427 * @return bool Whether a tag was matched. |
|
428 */ |
|
429 public function next_tag( $query = null ) { |
|
430 $visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers']; |
|
431 |
|
432 if ( null === $query ) { |
|
433 while ( $this->next_token() ) { |
|
434 if ( '#tag' !== $this->get_token_type() ) { |
|
435 continue; |
|
436 } |
|
437 |
|
438 if ( ! $this->is_tag_closer() || $visit_closers ) { |
|
439 return true; |
|
440 } |
|
441 } |
|
442 |
|
443 return false; |
|
444 } |
|
445 |
|
446 if ( is_string( $query ) ) { |
|
447 $query = array( 'breadcrumbs' => array( $query ) ); |
|
448 } |
|
449 |
|
450 if ( ! is_array( $query ) ) { |
|
451 _doing_it_wrong( |
|
452 __METHOD__, |
|
453 __( 'Please pass a query array to this function.' ), |
|
454 '6.4.0' |
|
455 ); |
|
456 return false; |
|
457 } |
|
458 |
|
459 $needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) ) |
|
460 ? $query['class_name'] |
|
461 : null; |
|
462 |
|
463 if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) { |
|
464 while ( $this->next_token() ) { |
|
465 if ( '#tag' !== $this->get_token_type() ) { |
|
466 continue; |
|
467 } |
|
468 |
|
469 if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) { |
|
470 continue; |
|
471 } |
|
472 |
|
473 if ( ! $this->is_tag_closer() || $visit_closers ) { |
|
474 return true; |
|
475 } |
|
476 } |
|
477 |
|
478 return false; |
|
479 } |
|
480 |
|
481 $breadcrumbs = $query['breadcrumbs']; |
|
482 $match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1; |
|
483 |
|
484 while ( $match_offset > 0 && $this->next_token() ) { |
|
485 if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) { |
|
486 continue; |
|
487 } |
|
488 |
|
489 if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) { |
|
490 continue; |
|
491 } |
|
492 |
|
493 if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) { |
|
494 return true; |
|
495 } |
|
496 } |
|
497 |
|
498 return false; |
|
499 } |
|
500 |
|
501 /** |
|
502 * Ensures internal accounting is maintained for HTML semantic rules while |
|
503 * the underlying Tag Processor class is seeking to a bookmark. |
|
504 * |
|
505 * This doesn't currently have a way to represent non-tags and doesn't process |
|
506 * semantic rules for text nodes. For access to the raw tokens consider using |
|
507 * WP_HTML_Tag_Processor instead. |
|
508 * |
|
509 * @since 6.5.0 Added for internal support; do not use. |
|
510 * |
|
511 * @access private |
|
512 * |
|
513 * @return bool Whether a token was parsed. |
|
514 */ |
|
515 public function next_token() { |
|
516 $this->current_element = null; |
|
517 |
|
518 if ( isset( $this->last_error ) ) { |
|
519 return false; |
|
520 } |
|
521 |
|
522 if ( 'done' !== $this->has_seen_context_node && 0 === count( $this->element_queue ) && ! $this->step() ) { |
|
523 while ( 'context-node' !== $this->state->stack_of_open_elements->current_node()->bookmark_name && $this->state->stack_of_open_elements->pop() ) { |
|
524 continue; |
|
525 } |
|
526 $this->has_seen_context_node = 'done'; |
|
527 return $this->next_token(); |
|
528 } |
|
529 |
|
530 $this->current_element = array_shift( $this->element_queue ); |
|
531 while ( isset( $this->context_node ) && ! $this->has_seen_context_node ) { |
|
532 if ( isset( $this->current_element ) ) { |
|
533 if ( $this->context_node === $this->current_element->token && WP_HTML_Stack_Event::PUSH === $this->current_element->operation ) { |
|
534 $this->has_seen_context_node = true; |
|
535 return $this->next_token(); |
|
536 } |
|
537 } |
|
538 $this->current_element = array_shift( $this->element_queue ); |
|
539 } |
|
540 |
|
541 if ( ! isset( $this->current_element ) ) { |
|
542 if ( 'done' === $this->has_seen_context_node ) { |
|
543 return false; |
|
544 } else { |
|
545 return $this->next_token(); |
|
546 } |
|
547 } |
|
548 |
|
549 if ( isset( $this->context_node ) && WP_HTML_Stack_Event::POP === $this->current_element->operation && $this->context_node === $this->current_element->token ) { |
|
550 $this->element_queue = array(); |
|
551 $this->current_element = null; |
|
552 return false; |
|
553 } |
|
554 |
|
555 // Avoid sending close events for elements which don't expect a closing. |
|
556 if ( |
|
557 WP_HTML_Stack_Event::POP === $this->current_element->operation && |
|
558 ! static::expects_closer( $this->current_element->token ) |
|
559 ) { |
|
560 return $this->next_token(); |
|
561 } |
|
562 |
|
563 return true; |
|
564 } |
|
565 |
|
566 |
|
567 /** |
|
568 * Indicates if the current tag token is a tag closer. |
|
569 * |
|
570 * Example: |
|
571 * |
|
572 * $p = WP_HTML_Processor::create_fragment( '<div></div>' ); |
|
573 * $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) ); |
|
574 * $p->is_tag_closer() === false; |
|
575 * |
|
576 * $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) ); |
|
577 * $p->is_tag_closer() === true; |
|
578 * |
|
579 * @since 6.6.0 Subclassed for HTML Processor. |
|
580 * |
|
581 * @return bool Whether the current tag is a tag closer. |
|
582 */ |
|
583 public function is_tag_closer() { |
|
584 return $this->is_virtual() |
|
585 ? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() ) |
|
586 : parent::is_tag_closer(); |
|
587 } |
|
588 |
|
589 /** |
|
590 * Indicates if the currently-matched token is virtual, created by a stack operation |
|
591 * while processing HTML, rather than a token found in the HTML text itself. |
|
592 * |
|
593 * @since 6.6.0 |
|
594 * |
|
595 * @return bool Whether the current token is virtual. |
|
596 */ |
|
597 private function is_virtual() { |
|
598 return ( |
|
599 isset( $this->current_element->provenance ) && |
|
600 'virtual' === $this->current_element->provenance |
|
601 ); |
|
602 } |
|
603 |
|
604 /** |
|
605 * Indicates if the currently-matched tag matches the given breadcrumbs. |
|
606 * |
|
607 * A "*" represents a single tag wildcard, where any tag matches, but not no tags. |
|
608 * |
|
609 * At some point this function _may_ support a `**` syntax for matching any number |
|
610 * of unspecified tags in the breadcrumb stack. This has been intentionally left |
|
611 * out, however, to keep this function simple and to avoid introducing backtracking, |
|
612 * which could open up surprising performance breakdowns. |
|
613 * |
|
614 * Example: |
|
615 * |
|
616 * $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' ); |
|
617 * $processor->next_tag( 'img' ); |
|
618 * true === $processor->matches_breadcrumbs( array( 'figure', 'img' ) ); |
|
619 * true === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) ); |
|
620 * false === $processor->matches_breadcrumbs( array( 'span', 'img' ) ); |
|
621 * true === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) ); |
|
622 * |
|
623 * @since 6.4.0 |
|
624 * |
|
625 * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`. |
|
626 * May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`. |
|
627 * @return bool Whether the currently-matched tag is found at the given nested structure. |
|
628 */ |
|
629 public function matches_breadcrumbs( $breadcrumbs ) { |
|
630 // Everything matches when there are zero constraints. |
|
631 if ( 0 === count( $breadcrumbs ) ) { |
|
632 return true; |
|
633 } |
|
634 |
|
635 // Start at the last crumb. |
|
636 $crumb = end( $breadcrumbs ); |
|
637 |
|
638 if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) { |
|
639 return false; |
|
640 } |
|
641 |
|
642 foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) { |
|
643 $crumb = strtoupper( current( $breadcrumbs ) ); |
|
644 |
|
645 if ( '*' !== $crumb && $node->node_name !== $crumb ) { |
|
646 return false; |
|
647 } |
|
648 |
|
649 if ( false === prev( $breadcrumbs ) ) { |
|
650 return true; |
|
651 } |
|
652 } |
|
653 |
|
654 return false; |
|
655 } |
|
656 |
|
657 /** |
|
658 * Indicates if the currently-matched node expects a closing |
|
659 * token, or if it will self-close on the next step. |
|
660 * |
|
661 * Most HTML elements expect a closer, such as a P element or |
|
662 * a DIV element. Others, like an IMG element, are void and don't |
|
663 * have a closing tag. Special elements, such as SCRIPT and STYLE, |
|
664 * are treated just like void tags. Text nodes and self-closing |
|
665 * foreign content will also act just like a void tag, immediately |
|
666 * closing as soon as the processor advances to the next token. |
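 *
 * Example (a sketch of the behavior described above, using an assumed
 * two-element input):
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<p><img></p>' );
 *
 *     // P is an ordinary element and expects a closing tag.
 *     $processor->next_tag( 'P' );
 *     true === $processor->expects_closer();
 *
 *     // IMG is a void element and never has a closing tag.
 *     $processor->next_tag( 'IMG' );
 *     false === $processor->expects_closer();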
|
667 * |
|
668 * @since 6.6.0 |
|
669 * |
|
670 * @todo When adding support for foreign content, ensure that |
|
671 * this returns false for self-closing elements in the |
|
672 * SVG and MathML namespace. |
|
673 * |
|
674 * @param ?WP_HTML_Token $node Node to examine instead of current node, if provided. |
|
675 * @return bool|null Whether to expect a closer for the currently-matched node, |
|
676 * or `null` if not matched on any token. |
|
677 */ |
|
678 public function expects_closer( $node = null ) { |
|
679 $token_name = $node->node_name ?? $this->get_token_name(); |
|
680 if ( ! isset( $token_name ) ) { |
|
681 return null; |
|
682 } |
|
683 |
|
684 return ! ( |
|
685 // Comments, text nodes, and other atomic tokens. |
|
686 '#' === $token_name[0] || |
|
687 // Doctype declarations. |
|
688 'html' === $token_name || |
|
689 // Void elements. |
|
690 self::is_void( $token_name ) || |
|
691 // Special atomic elements. |
|
692 in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) |
|
693 ); |
|
694 } |
|
695 |
|
696 /** |
|
697 * Steps through the HTML document and stops at the next tag, if any. |
|
698 * |
|
699 * @since 6.4.0 |
|
700 * |
|
701 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document. |
|
702 * |
|
703 * @see self::PROCESS_NEXT_NODE |
|
704 * @see self::REPROCESS_CURRENT_NODE |
|
705 * |
|
706 * @param string $node_to_process Whether to parse the next node or reprocess the current node. |
|
707 * @return bool Whether a tag was matched. |
|
708 */ |
|
709 public function step( $node_to_process = self::PROCESS_NEXT_NODE ) { |
|
710 // Refuse to proceed if there was a previous error. |
|
711 if ( null !== $this->last_error ) { |
|
712 return false; |
|
713 } |
|
714 |
|
715 if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) { |
|
716 /* |
|
717 * Void elements still hop onto the stack of open elements even though |
|
718 * there's no corresponding closing tag. This is important for managing |
|
719 * stack-based operations such as "navigate to parent node" or checking |
|
720 * on an element's breadcrumbs. |
|
721 * |
|
722 * When moving on to the next node, therefore, if the bottom-most element |
|
723 * on the stack is a void element, it must be closed. |
|
724 * |
|
725 * @todo Once self-closing foreign elements and BGSOUND are supported, |
|
726 * they must also be implicitly closed here too. BGSOUND is |
|
727 * special since it's only self-closing if the self-closing flag |
|
728 * is provided in the opening tag, otherwise it expects a tag closer. |
|
729 */ |
|
730 $top_node = $this->state->stack_of_open_elements->current_node(); |
|
731 if ( isset( $top_node ) && ! static::expects_closer( $top_node ) ) { |
|
732 $this->state->stack_of_open_elements->pop(); |
|
733 } |
|
734 } |
|
735 |
|
736 if ( self::PROCESS_NEXT_NODE === $node_to_process ) { |
|
737 parent::next_token(); |
|
738 } |
|
739 |
|
740 // Finish stepping when there are no more tokens in the document. |
|
741 if ( |
|
742 WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state || |
|
743 WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state |
|
744 ) { |
|
745 return false; |
|
746 } |
|
747 |
|
748 $this->state->current_token = new WP_HTML_Token( |
|
749 $this->bookmark_token(), |
|
750 $this->get_token_name(), |
|
751 $this->has_self_closing_flag(), |
|
752 $this->release_internal_bookmark_on_destruct |
|
753 ); |
|
754 |
|
755 try { |
|
756 switch ( $this->state->insertion_mode ) { |
|
757 case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY: |
|
758 return $this->step_in_body(); |
|
759 |
|
760 default: |
|
761 $this->last_error = self::ERROR_UNSUPPORTED; |
|
762 throw new WP_HTML_Unsupported_Exception( "No support for parsing in the '{$this->state->insertion_mode}' state." ); |
|
763 } |
|
764 } catch ( WP_HTML_Unsupported_Exception $e ) { |
|
765 /* |
|
766 * Exceptions are used in this class to escape deep call stacks that |
|
767 * otherwise might involve messier calling and return conventions. |
|
768 */ |
|
769 return false; |
|
770 } |
|
771 } |
|
772 |
|
773 /** |
|
774 * Computes the HTML breadcrumbs for the currently-matched node, if matched. |
|
775 * |
|
776 * Breadcrumbs start at the outermost parent and descend toward the matched element. |
|
777 * They always include the entire path from the root HTML node to the matched element. |
|
778 * |
|
779 * @todo It could be more efficient to expose a generator-based version of this function |
|
780 * to avoid creating the array copy on tag iteration. If this is done, it would likely |
|
781 * be more useful to walk up the stack when yielding instead of starting at the top. |
|
782 * |
|
783 * Example: |
|
784 * |
|
785 * $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' ); |
|
786 * $processor->next_tag( 'IMG' ); |
|
787 * $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' ); |
|
788 * |
|
789 * @since 6.4.0 |
|
790 * |
|
791 * @return string[]|null Array of tag names representing path to matched node, if matched, otherwise NULL. |
|
792 */ |
|
	public function get_breadcrumbs() {
		$breadcrumbs = array();

		foreach ( $this->state->stack_of_open_elements->walk_down() as $stack_item ) {
			$breadcrumbs[] = $stack_item->node_name;
		}

		if ( ! $this->is_virtual() ) {
			return $breadcrumbs;
		}

		foreach ( $this->element_queue as $queue_item ) {
			if ( $this->current_element->token->bookmark_name === $queue_item->token->bookmark_name ) {
				break;
			}

			if ( 'context-node' === $queue_item->token->bookmark_name ) {
				break;
			}

			if ( 'real' === $queue_item->provenance ) {
				break;
			}

			if ( WP_HTML_Stack_Event::PUSH === $queue_item->operation ) {
				$breadcrumbs[] = $queue_item->token->node_name;
			} else {
				array_pop( $breadcrumbs );
			}
		}

		if ( null !== parent::get_token_name() && ! parent::is_tag_closer() ) {
			array_pop( $breadcrumbs );
		}

		// Add the virtual node we're at.
		if ( WP_HTML_Stack_Event::PUSH === $this->current_element->operation ) {
			$breadcrumbs[] = $this->current_element->token->node_name;
		}

		return $breadcrumbs;
	}

	/**
	 * Returns the nesting depth of the current location in the document.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div><p></p></div>' );
	 *     // The processor starts in the BODY context, meaning it has depth from the start: HTML > BODY.
	 *     2 === $processor->get_current_depth();
	 *
	 *     // Opening the DIV element increases the depth.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 *     // Opening the P element increases the depth.
	 *     $processor->next_token();
	 *     4 === $processor->get_current_depth();
	 *
	 *     // The P element is closed during `next_token()` so the depth is decreased to reflect that.
	 *     $processor->next_token();
	 *     3 === $processor->get_current_depth();
	 *
	 * @since 6.6.0
	 *
	 * @return int Nesting-depth of current location in the document.
	 */
	public function get_current_depth() {
		return $this->is_virtual()
			? count( $this->get_breadcrumbs() )
			: $this->state->stack_of_open_elements->count();
	}

	/**
	 * Parses next element in the 'in body' insertion mode.
	 *
	 * This internal function performs the 'in body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-inbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_body() {
		/*
		 * Tag tokens are dispatched on a sigil plus the tag name: "+" marks a
		 * tag opener and "-" marks a tag closer. Non-tag tokens carry no sigil
		 * and are dispatched on their token name alone, such as "#text".
		 */
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			case '#text':
				$this->reconstruct_active_formatting_elements();

				$current_token = $this->bookmarks[ $this->state->current_token->bookmark_name ];

				/*
				 * > A character token that is U+0000 NULL
				 *
				 * Any successive sequence of NULL bytes is ignored and won't
				 * trigger active format reconstruction. Therefore, if the text
				 * only comprises NULL bytes then the token should be ignored
				 * here, but if there are any other characters in the stream
				 * the active formats should be reconstructed.
				 */
				if (
					1 <= $current_token->length &&
					"\x00" === $this->html[ $current_token->start ] &&
					strspn( $this->html, "\x00", $current_token->start, $current_token->length ) === $current_token->length
				) {
					// Parse error: ignore the token.
					return $this->step();
				}

				/*
				 * Whitespace-only text does not affect the frameset-ok flag.
				 * It is probably inter-element whitespace, but it may also
				 * contain character references which decode only to whitespace.
				 */
				$text = $this->get_modifiable_text();
				if ( strlen( $text ) !== strspn( $text, " \t\n\f\r" ) ) {
					$this->state->frameset_ok = false;
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			case 'html':
				/*
				 * > A DOCTYPE token
				 * > Parse error. Ignore the token.
				 */
				return $this->step();

			/*
			 * > A start tag whose tag name is "button"
			 */
			case '+BUTTON':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					$this->generate_implied_end_tags();
					$this->state->stack_of_open_elements->pop_until( 'BUTTON' );
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				return true;

			/*
			 * > A start tag whose tag name is one of: "address", "article", "aside",
			 * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
			 * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
			 * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
			 */
			case '+ADDRESS':
			case '+ARTICLE':
			case '+ASIDE':
			case '+BLOCKQUOTE':
			case '+CENTER':
			case '+DETAILS':
			case '+DIALOG':
			case '+DIR':
			case '+DIV':
			case '+DL':
			case '+FIELDSET':
			case '+FIGCAPTION':
			case '+FIGURE':
			case '+FOOTER':
			case '+HEADER':
			case '+HGROUP':
			case '+MAIN':
			case '+MENU':
			case '+NAV':
			case '+OL':
			case '+P':
			case '+SEARCH':
			case '+SECTION':
			case '+SUMMARY':
			case '+UL':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
			 * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
			 * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
			 * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
			 */
			case '-ADDRESS':
			case '-ARTICLE':
			case '-ASIDE':
			case '-BLOCKQUOTE':
			case '-BUTTON':
			case '-CENTER':
			case '-DETAILS':
			case '-DIALOG':
			case '-DIR':
			case '-DIV':
			case '-DL':
			case '-FIELDSET':
			case '-FIGCAPTION':
			case '-FIGURE':
			case '-FOOTER':
			case '-HEADER':
			case '-HGROUP':
			case '-LISTING':
			case '-MAIN':
			case '-MENU':
			case '-NAV':
			case '-OL':
			case '-PRE':
			case '-SEARCH':
			case '-SECTION':
			case '-SUMMARY':
			case '-UL':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// @todo Report parse error.
					// Ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( $this->state->stack_of_open_elements->current_node()->node_name !== $token_name ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}
				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '+H1':
			case '+H2':
			case '+H3':
			case '+H4':
			case '+H5':
			case '+H6':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				if (
					in_array(
						$this->state->stack_of_open_elements->current_node()->node_name,
						array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
						true
					)
				) {
					// @todo Indicate a parse error once it's possible.
					$this->state->stack_of_open_elements->pop();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "pre", "listing"
			 */
			case '+PRE':
			case '+LISTING':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
			 */
			case '-H1':
			case '-H2':
			case '-H3':
			case '-H4':
			case '-H5':
			case '-H6':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
					/*
					 * This is a parse error; ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags();

				if ( $this->state->stack_of_open_elements->current_node()->node_name !== $token_name ) {
					// @todo Record parse error: this error doesn't impact parsing.
				}

				$this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
				return true;

			/*
			 * > A start tag whose tag name is "li"
			 * > A start tag whose tag name is one of: "dd", "dt"
			 */
			case '+DD':
			case '+DT':
			case '+LI':
				$this->state->frameset_ok = false;
				$node                     = $this->state->stack_of_open_elements->current_node();
				$is_li                    = 'LI' === $token_name;

				in_body_list_loop:
				/*
				 * The logic for LI and DT/DD is the same except for one point: LI elements _only_
				 * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
				 */
				if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
					$node_name = $is_li ? 'LI' : $node->node_name;
					$this->generate_implied_end_tags( $node_name );
					if ( $node_name !== $this->state->stack_of_open_elements->current_node()->node_name ) {
						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
					}

					$this->state->stack_of_open_elements->pop_until( $node_name );
					goto in_body_list_done;
				}

				if (
					'ADDRESS' !== $node->node_name &&
					'DIV' !== $node->node_name &&
					'P' !== $node->node_name &&
					$this->is_special( $node->node_name )
				) {
					/*
					 * > If node is in the special category, but is not an address, div,
					 * > or p element, then jump to the step labeled done below.
					 */
					goto in_body_list_done;
				} else {
					/*
					 * > Otherwise, set node to the previous entry in the stack of open elements
					 * > and return to the step labeled loop.
					 */
					foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
						$node = $item;
						break;
					}
					goto in_body_list_loop;
				}

				in_body_list_done:
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is "li"
			 * > An end tag whose tag name is one of: "dd", "dt"
			 */
			case '-DD':
			case '-DT':
			case '-LI':
				if (
					/*
					 * An end tag whose tag name is "li":
					 * If the stack of open elements does not have an li element in list item scope,
					 * then this is a parse error; ignore the token.
					 */
					(
						'LI' === $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
					) ||
					/*
					 * An end tag whose tag name is one of: "dd", "dt":
					 * If the stack of open elements does not have an element in scope that is an
					 * HTML element with the same tag name as that of the token, then this is a
					 * parse error; ignore the token.
					 */
					(
						'LI' !== $token_name &&
						! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
					)
				) {
					/*
					 * This is a parse error, ignore the token.
					 *
					 * @todo Indicate a parse error once it's possible.
					 */
					return $this->step();
				}

				$this->generate_implied_end_tags( $token_name );

				if ( $token_name !== $this->state->stack_of_open_elements->current_node()->node_name ) {
					// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				return true;

			/*
			 * > An end tag whose tag name is "p"
			 */
			case '-P':
				if ( ! $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->insert_html_element( $this->state->current_token );
				}

				$this->close_a_p_element();
				return true;

			// > A start tag whose tag name is "a"
			case '+A':
				/*
				 * Only the entries between the end of the list and the last marker
				 * are relevant, so the search stops at the first marker or at the
				 * first A element found; `break 2` exits the loop, not only the switch.
				 */
				foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
					switch ( $item->node_name ) {
						case 'marker':
							break 2;

						case 'A':
							$this->run_adoption_agency_algorithm();
							$this->state->active_formatting_elements->remove_node( $item );
							$this->state->stack_of_open_elements->remove_node( $item );
							break 2;
					}
				}

				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
			 * > "s", "small", "strike", "strong", "tt", "u"
			 */
			case '+B':
			case '+BIG':
			case '+CODE':
			case '+EM':
			case '+FONT':
			case '+I':
			case '+S':
			case '+SMALL':
			case '+STRIKE':
			case '+STRONG':
			case '+TT':
			case '+U':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->push( $this->state->current_token );
				return true;

			/*
			 * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
			 * > "nobr", "s", "small", "strike", "strong", "tt", "u"
			 */
			case '-A':
			case '-B':
			case '-BIG':
			case '-CODE':
			case '-EM':
			case '-FONT':
			case '-I':
			case '-S':
			case '-SMALL':
			case '-STRIKE':
			case '-STRONG':
			case '-TT':
			case '-U':
				$this->run_adoption_agency_algorithm();
				return true;

			/*
			 * > An end tag whose tag name is "br"
			 * >   Parse error. Drop the attributes from the token, and act as described in the next
			 * >   entry; i.e. act as if this was a "br" start tag token with no attributes, rather
			 * >   than the end tag token that it actually is.
			 */
			case '-BR':
				$this->last_error = self::ERROR_UNSUPPORTED;
				throw new WP_HTML_Unsupported_Exception( 'Closing BR tags require unimplemented special handling.' );

			/*
			 * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
			 */
			case '+AREA':
			case '+BR':
			case '+EMBED':
			case '+IMG':
			case '+KEYGEN':
			case '+WBR':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "input"
			 */
			case '+INPUT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$type_attribute = $this->get_attribute( 'type' );
				/*
				 * > If the token does not have an attribute with the name "type", or if it does,
				 * > but that attribute's value is not an ASCII case-insensitive match for the
				 * > string "hidden", then: set the frameset-ok flag to "not ok".
				 */
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					$this->state->frameset_ok = false;
				}
				return true;

			/*
			 * > A start tag whose tag name is "hr"
			 */
			case '+HR':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is one of: "param", "source", "track"
			 */
			case '+PARAM':
			case '+SOURCE':
			case '+TRACK':
				$this->insert_html_element( $this->state->current_token );
				return true;
		}

		/*
		 * These tags require special handling in the 'in body' insertion mode
		 * but that handling hasn't yet been implemented.
		 *
		 * As the rules for each tag are implemented, the corresponding tag
		 * name should be removed from this list. An accompanying test should
		 * help ensure this list is maintained.
		 *
		 * @see Tests_HtmlApi_WpHtmlProcessor::test_step_in_body_fails_on_unsupported_tags
		 *
		 * Since this switch structure throws a WP_HTML_Unsupported_Exception, it's
		 * possible to handle "any other start tag" and "any other end tag" below,
		 * as that guarantees execution doesn't proceed for the unimplemented tags.
		 *
		 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody
		 */
		switch ( $token_name ) {
			case 'APPLET':
			case 'BASE':
			case 'BASEFONT':
			case 'BGSOUND':
			case 'BODY':
			case 'CAPTION':
			case 'COL':
			case 'COLGROUP':
			case 'FORM':
			case 'FRAME':
			case 'FRAMESET':
			case 'HEAD':
			case 'HTML':
			case 'IFRAME':
			case 'LINK':
			case 'MARQUEE':
			case 'MATH':
			case 'META':
			case 'NOBR':
			case 'NOEMBED':
			case 'NOFRAMES':
			case 'NOSCRIPT':
			case 'OBJECT':
			case 'OPTGROUP':
			case 'OPTION':
			case 'PLAINTEXT':
			case 'RB':
			case 'RP':
			case 'RT':
			case 'RTC':
			case 'SARCASM':
			case 'SCRIPT':
			case 'SELECT':
			case 'STYLE':
			case 'SVG':
			case 'TABLE':
			case 'TBODY':
			case 'TD':
			case 'TEMPLATE':
			case 'TEXTAREA':
			case 'TFOOT':
			case 'TH':
			case 'THEAD':
			case 'TITLE':
			case 'TR':
			case 'XMP':
				$this->last_error = self::ERROR_UNSUPPORTED;
				throw new WP_HTML_Unsupported_Exception( "Cannot process {$token_name} element." );
		}

		if ( ! parent::is_tag_closer() ) {
			/*
			 * > Any other start tag
			 */
			$this->reconstruct_active_formatting_elements();
			$this->insert_html_element( $this->state->current_token );
			return true;
		} else {
			/*
			 * > Any other end tag
			 */

			/*
			 * Find the corresponding tag opener in the stack of open elements, if
			 * it exists before reaching a special element, which provides a kind
			 * of boundary in the stack. For example, a `</custom-tag>` should not
			 * close anything beyond its containing `P` or `DIV` element.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
				if ( $token_name === $node->node_name ) {
					break;
				}

				if ( self::is_special( $node->node_name ) ) {
					// This is a parse error, ignore the token.
					return $this->step();
				}
			}

			$this->generate_implied_end_tags( $token_name );
			if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
				// @todo Record parse error: this error doesn't impact parsing.
			}

			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				$this->state->stack_of_open_elements->pop();
				if ( $node === $item ) {
					return true;
				}
			}
		}
	}

	/*
	 * Internal helpers
	 */

	/**
	 * Creates a new bookmark for the currently-matched token and returns the generated name.
	 *
	 * @since 6.4.0
	 * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
	 *
	 * @throws Exception When unable to allocate requested bookmark.
	 *
	 * @return string|false Name of created bookmark, or false if unable to create.
	 */
	private function bookmark_token() {
		if ( ! parent::set_bookmark( ++$this->bookmark_counter ) ) {
			$this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
			throw new Exception( 'could not allocate bookmark' );
		}

		return "{$this->bookmark_counter}";
	}

	/*
	 * HTML semantic overrides for Tag Processor
	 */

	/**
	 * Returns the uppercase name of the matched tag.
	 *
	 * The semantic rules for HTML specify that certain tags be reprocessed
	 * with a different tag name. Because of this, the tag name presented
	 * by the HTML Processor may differ from the one reported by the HTML
	 * Tag Processor, which doesn't apply these semantic rules.
	 *
	 * Example:
	 *
	 *     $processor = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
	 *     $processor->next_tag() === true;
	 *     $processor->get_tag() === 'DIV';
	 *
	 *     $processor->next_tag() === false;
	 *     $processor->get_tag() === null;
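	 *
	 * As a sketch of the renaming described above, and assuming the
	 * fragment parser reaches the tag without aborting, an `IMAGE` tag is
	 * reported as `IMG` by the HTML Processor while the Tag Processor
	 * reports the name as written:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<image>' );
	 *     $processor->next_tag() === true;
	 *     $processor->get_tag() === 'IMG';
	 *
	 *     $tags = new WP_HTML_Tag_Processor( '<image>' );
	 *     $tags->next_tag() === true;
	 *     $tags->get_tag() === 'IMAGE';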
	 *
	 * @since 6.4.0
	 *
	 * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
	 */
	public function get_tag() {
		if ( null !== $this->last_error ) {
			return null;
		}

		if ( $this->is_virtual() ) {
			return $this->current_element->token->node_name;
		}

		$tag_name = parent::get_tag();

		switch ( $tag_name ) {
			case 'IMAGE':
				/*
				 * > A start tag whose tag name is "image"
				 * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
				 */
				return 'IMG';

			default:
				return $tag_name;
		}
	}

	/**
	 * Indicates if the currently matched tag contains the self-closing flag.
	 *
	 * No HTML elements ought to have the self-closing flag and for those, the self-closing
	 * flag will be ignored. For void elements this is benign because they "self close"
	 * automatically. For non-void HTML elements though problems will appear if someone
	 * intends to use a self-closing element in place of that element with an empty body.
	 * For HTML foreign elements and custom elements the self-closing flag determines if
	 * they self-close or not.
	 *
	 * This function does not determine if a tag is self-closing,
	 * but only if the self-closing flag is present in the syntax.
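	 *
	 * A minimal sketch of that distinction, assuming a fragment that the
	 * parser supports: the flag is reported for a DIV that carries it, and
	 * is not reported for a void IMG that omits it.
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div/><img>' );
	 *
	 *     // The DIV carries the flag in its syntax, although HTML ignores it.
	 *     $processor->next_tag() === true;
	 *     $processor->has_self_closing_flag() === true;
	 *
	 *     // The IMG is a void element but has no flag in its syntax.
	 *     $processor->next_tag() === true;
	 *     $processor->has_self_closing_flag() === false;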
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return bool Whether the currently matched tag contains the self-closing flag.
	 */
	public function has_self_closing_flag() {
		return $this->is_virtual() ? false : parent::has_self_closing_flag();
	}

	/**
	 * Returns the node name represented by the token.
	 *
	 * This matches the DOM API value `nodeName`. Some values
	 * are static, such as `#text` for a text node, while others
	 * are dynamically generated from the token itself.
	 *
	 * Dynamic names:
	 *  - Uppercase tag name for tag matches.
	 *  - `html` for DOCTYPE declarations.
	 *
	 * Note that if the Tag Processor is not matched on a token
	 * then this function will return `null`, either because it
	 * hasn't yet found a token or because it reached the end
	 * of the document without matching a token.
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null Name of the matched token.
	 */
	public function get_token_name() {
		return $this->is_virtual()
			? $this->current_element->token->node_name
			: parent::get_token_name();
	}

	/**
	 * Indicates the kind of matched token, if any.
	 *
	 * This differs from `get_token_name()` in that it always
	 * returns a static string indicating the type, whereas
	 * `get_token_name()` may return values derived from the
	 * token itself, such as a tag name or processing
	 * instruction tag.
	 *
	 * Possible values:
	 *  - `#tag` when matched on a tag.
	 *  - `#text` when matched on a text node.
	 *  - `#cdata-section` when matched on a CDATA node.
	 *  - `#comment` when matched on a comment.
	 *  - `#doctype` when matched on a DOCTYPE declaration.
	 *  - `#presumptuous-tag` when matched on an empty tag closer.
	 *  - `#funky-comment` when matched on a funky comment.
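	 *
	 * A short sketch contrasting the two methods, assuming a fragment
	 * that the parser supports:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Hello</p>' );
	 *
	 *     // On the P opener the type is static while the name is the tag name.
	 *     $processor->next_token() === true;
	 *     $processor->get_token_type() === '#tag';
	 *     $processor->get_token_name() === 'P';
	 *
	 *     // On the text node both methods report the same static value.
	 *     $processor->next_token() === true;
	 *     $processor->get_token_type() === '#text';
	 *     $processor->get_token_name() === '#text';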
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null What kind of token is matched, or null.
	 */
	public function get_token_type() {
		if ( $this->is_virtual() ) {
			/*
			 * This logic comes from the Tag Processor.
			 *
			 * @todo It would be ideal not to repeat this here, but it's not clearly
			 *       better to allow passing a token name to `get_token_type()`.
			 */
			$node_name     = $this->current_element->token->node_name;
			$starting_char = $node_name[0];
			if ( 'A' <= $starting_char && 'Z' >= $starting_char ) {
				return '#tag';
			}

			if ( 'html' === $node_name ) {
				return '#doctype';
			}

			return $node_name;
		}

		return parent::get_token_type();
	}

	/**
	 * Returns the value of a requested attribute from a matched tag opener if that attribute exists.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( '<div enabled class="test" data-test-id="14">Test</div>' );
	 *     $p->next_token() === true;
	 *     $p->get_attribute( 'data-test-id' ) === '14';
	 *     $p->get_attribute( 'enabled' ) === true;
	 *     $p->get_attribute( 'aria-label' ) === null;
	 *
	 *     $p->next_tag() === false;
	 *     $p->get_attribute( 'class' ) === null;
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @param string $name Name of attribute whose value is requested.
	 * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
	 */
	public function get_attribute( $name ) {
		return $this->is_virtual() ? null : parent::get_attribute( $name );
	}

	/**
	 * Updates or creates a new attribute on the currently matched tag with the passed value.
	 *
	 * For boolean attributes special handling is provided:
	 *  - When `true` is passed as the value, then only the attribute name is added to the tag.
	 *  - When `false` is passed, the attribute gets removed if it existed before.
	 *
	 * For string attributes, the value is escaped using the `esc_attr` function.
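	 *
	 * Example (a sketch of the three value kinds; the serialized output is
	 * left out because the placement of added attributes is not described
	 * here):
	 *
	 *     $p = WP_HTML_Processor::create_fragment( '<div hidden>Test</div>' );
	 *     $p->next_tag() === true;
	 *
	 *     $p->set_attribute( 'data-test-id', '14' ); // String value, escaped on output.
	 *     $p->set_attribute( 'inert', true );        // Adds only the attribute name.
	 *     $p->set_attribute( 'hidden', false );      // Removes the existing attribute.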
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @param string      $name  The attribute name to target.
	 * @param string|bool $value The new attribute value.
	 * @return bool Whether an attribute value was set.
	 */
	public function set_attribute( $name, $value ) {
		return $this->is_virtual() ? false : parent::set_attribute( $name, $value );
	}

	/**
	 * Removes an attribute from the currently-matched tag.
	 *
	 * @since 6.6.0 Subclassed for HTML Processor.
	 *
	 * @param string $name The attribute name to remove.
	 * @return bool Whether an attribute was removed.
	 */
	public function remove_attribute( $name ) {
		return $this->is_virtual() ? false : parent::remove_attribute( $name );
	}

	/**
	 * Gets lowercase names of all attributes matching a given prefix in the current tag.
	 *
	 * Note that matching is case-insensitive. This is in accordance with the spec:
	 *
	 * > There must never be two or more attributes on
	 * > the same start tag whose names are an ASCII
	 * > case-insensitive match for each other.
	 *     - HTML 5 spec
	 *
	 * Example:
	 *
	 *     $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
	 *     $p->next_tag( array( 'class_name' => 'test' ) ) === true;
	 *     $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );
	 *
	 *     $p->next_tag() === false;
	 *     $p->get_attribute_names_with_prefix( 'data-' ) === null;
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
	 *
	 * @param string $prefix Prefix of requested attribute names.
	 * @return array|null List of attribute names, or `null` when no tag opener is matched.
	 */
	public function get_attribute_names_with_prefix( $prefix ) {
		return $this->is_virtual() ? null : parent::get_attribute_names_with_prefix( $prefix );
	}

	/**
	 * Adds a new class name to the currently matched tag.
|
1702 * |
|
1703 * @since 6.6.0 Subclassed for the HTML Processor. |
|
1704 * |
|
1705 * @param string $class_name The class name to add. |
|
1706 * @return bool Whether the class was set to be added. |
|
1707 */ |
|
1708 public function add_class( $class_name ) { |
|
1709 return $this->is_virtual() ? false : parent::add_class( $class_name ); |
|
1710 } |
|
1711 |
|
1712 /** |
|
1713 * Removes a class name from the currently matched tag. |
|
1714 * |
|
1715 * @since 6.6.0 Subclassed for the HTML Processor. |
|
1716 * |
|
1717 * @param string $class_name The class name to remove. |
|
1718 * @return bool Whether the class was set to be removed. |
|
1719 */ |
|
1720 public function remove_class( $class_name ) { |
|
1721 return $this->is_virtual() ? false : parent::remove_class( $class_name ); |
|
1722 } |
|
1723 |
|
1724 /** |
|
	 * Returns whether a matched tag contains the given ASCII case-insensitive class name.
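	 *
	 * Example (an illustrative sketch; the markup and class names are assumptions):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p class="Lead intro">Hello</p>' );
	 *     if ( $processor && $processor->next_tag( 'P' ) && $processor->has_class( 'lead' ) ) {
	 *         // Matching is ASCII case-insensitive, so `lead` is expected to match `Lead`.
	 *         $processor->remove_class( 'intro' );
	 *     }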
|
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
	 * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
	 */
	public function has_class( $wanted_class ) {
		return $this->is_virtual() ? null : parent::has_class( $wanted_class );
	}

	/**
	 * Generator for a foreach loop to step through each class name for the matched tag.
	 *
	 * This generator function is designed to be used inside a "foreach" loop.
	 *
	 * Example:
	 *
	 *     $p = WP_HTML_Processor::create_fragment( "<div class='free <egg<\tlang-en'>" );
	 *     $p->next_tag();
	 *     foreach ( $p->class_list() as $class_name ) {
	 *         echo "{$class_name} ";
	 *     }
	 *     // Outputs: "free <egg< lang-en "
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 */
	public function class_list() {
		return $this->is_virtual() ? null : parent::class_list();
	}

	/**
	 * Returns the modifiable text for a matched token, or an empty string.
	 *
	 * Modifiable text is text content that may be read and changed without
	 * changing the HTML structure of the document around it. This includes
	 * the contents of `#text` nodes in the HTML as well as the inner
	 * contents of HTML comments, Processing Instructions, and others, even
	 * though these nodes aren't part of a parsed DOM tree. It also includes
	 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
	 * other section in an HTML document which cannot contain HTML markup (DATA).
	 *
	 * If a token has no modifiable text then an empty string is returned to
	 * avoid needless crashing or type errors. An empty string does not mean
	 * that a token has modifiable text, and a token with modifiable text may
	 * have an empty string (e.g. a comment with no contents).
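	 *
	 * Example (an illustrative sketch; the markup is an assumption):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Hello <!-- note -->world</p>' );
	 *     while ( $processor && $processor->next_token() ) {
	 *         // Only read text nodes here; other token types may return an empty string.
	 *         if ( '#text' === $processor->get_token_type() ) {
	 *             echo $processor->get_modifiable_text();
	 *         }
	 *     }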
|
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string
	 */
	public function get_modifiable_text() {
		return $this->is_virtual() ? '' : parent::get_modifiable_text();
	}

	/**
	 * Indicates what kind of comment produced the comment node.
	 *
	 * Because there are different kinds of HTML syntax which produce
	 * comments, the Tag Processor tracks and exposes this as a type
	 * for the comment. Nominally only regular HTML comments exist as
	 * they are commonly known, but a number of unrelated syntax errors
	 * also produce comments.
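	 *
	 * Example (an illustrative sketch; the markup is an assumption):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p><!-- a note --></p>' );
	 *     while ( $processor && $processor->next_token() ) {
	 *         if ( '#comment' !== $processor->get_token_type() ) {
	 *             continue;
	 *         }
	 *
	 *         if ( WP_HTML_Processor::COMMENT_AS_HTML_COMMENT === $processor->get_comment_type() ) {
	 *             // A regular `<!-- ... -->` comment, as opposed to a syntax error treated as one.
	 *         }
	 *     }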
|
	 *
	 * @see self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT
	 * @see self::COMMENT_AS_CDATA_LOOKALIKE
	 * @see self::COMMENT_AS_INVALID_HTML
	 * @see self::COMMENT_AS_HTML_COMMENT
	 * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
	 *
	 * @since 6.6.0 Subclassed for the HTML Processor.
	 *
	 * @return string|null
	 */
	public function get_comment_type() {
		return $this->is_virtual() ? null : parent::get_comment_type();
	}

	/**
	 * Removes a bookmark that is no longer needed.
	 *
	 * Releasing a bookmark frees up the small
	 * performance overhead it requires.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Name of the bookmark to remove.
	 * @return bool Whether the bookmark already existed before removal.
	 */
	public function release_bookmark( $bookmark_name ) {
		return parent::release_bookmark( "_{$bookmark_name}" );
	}

	/**
	 * Moves the internal cursor in the HTML Processor to a given bookmark's location.
	 *
	 * Be careful! Seeking backwards to a previous location resets the parser to the
	 * start of the document and reparses the entire contents up until it finds the
	 * sought-after bookmarked location.
	 *
	 * In order to prevent accidental infinite loops, there's a
	 * maximum limit on the number of times seek() can be called.
	 *
	 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
	 * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
	 */
	public function seek( $bookmark_name ) {
		// Flush any pending updates to the document before beginning.
		$this->get_updated_html();

		$actual_bookmark_name = "_{$bookmark_name}";
		$processor_started_at = $this->state->current_token
			? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
			: 0;
		$bookmark_starts_at   = $this->bookmarks[ $actual_bookmark_name ]->start;
		$bookmark_length      = $this->bookmarks[ $actual_bookmark_name ]->length;
		$direction            = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';

		/*
		 * If seeking backwards, it's possible that the sought-after bookmark exists within an element
		 * which has been closed before the current cursor; in other words, it has already been removed
		 * from the stack of open elements. This means that it's insufficient to simply pop off elements
		 * from the stack of open elements which appear after the bookmarked location and then jump to
		 * that location, as the elements which were open before won't be re-opened.
		 *
		 * In order to maintain consistency, the HTML Processor rewinds to the start of the document
		 * and reparses everything until it finds the sought-after bookmark.
		 *
		 * There are potentially better ways to do this: cache the parser state for each bookmark and
		 * restore it when seeking; store an immutable and idempotent register of where elements open
		 * and close.
		 *
		 * If caching the parser state it will be essential to properly maintain the cached stack of
		 * open elements and active formatting elements when modifying the document. This could be a
		 * tedious and time-consuming process as well, and so for now will not be performed.
		 *
		 * It may be possible to track bookmarks for where elements open and close, and in doing so
		 * be able to quickly recalculate breadcrumbs for any element in the document. It may even
		 * be possible to remove the stack of open elements and compute it on the fly this way.
		 * If doing this, the parser would need to track the opening and closing locations for all
		 * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves
		 * this list could be automatically maintained while modifying the document. Finding the
		 * breadcrumbs would then amount to traversing that list from the start until the token
		 * being inspected. Once an element closes, if there are no bookmarks pointing to locations
		 * within that element, then all of these locations may be forgotten to save on memory use
		 * and computation time.
		 */
		if ( 'backward' === $direction ) {
			/*
			 * Instead of clearing the parser state and starting fresh, calling the stack methods
			 * maintains the proper flags in the parser.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				if ( 'context-node' === $item->bookmark_name ) {
					break;
				}

				$this->state->stack_of_open_elements->remove_node( $item );
			}

			foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
				if ( 'context-node' === $item->bookmark_name ) {
					break;
				}

				$this->state->active_formatting_elements->remove_node( $item );
			}

			parent::seek( 'context-node' );
			$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
			$this->state->frameset_ok    = true;
			$this->element_queue         = array();
			$this->current_element       = null;
		}

		// When moving forwards, reparse the document until reaching the same location as the original bookmark.
		if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
			return true;
		}

		while ( $this->next_token() ) {
			if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
				while ( isset( $this->current_element ) && WP_HTML_Stack_Event::POP === $this->current_element->operation ) {
					$this->current_element = array_shift( $this->element_queue );
				}
				return true;
			}
		}

		return false;
	}

	/**
	 * Sets a bookmark in the HTML document.
	 *
	 * Bookmarks represent specific places or tokens in the HTML
	 * document, such as a tag opener or closer. When applying
	 * edits to a document, such as setting an attribute, the
	 * text offsets of that token may shift; the bookmark is
	 * kept updated with those shifts and remains stable unless
	 * the entire span of text in which the token sits is removed.
	 *
	 * Release bookmarks when they are no longer needed.
	 *
	 * Example:
	 *
	 *     <main><h2>Surprising fact you may not know!</h2></main>
	 *           ^  ^
	 *            \-|-- this `H2` opener bookmark tracks the token
	 *
	 *     <main class="clickbait"><h2>Surprising fact you may no…
	 *                             ^  ^
	 *                              \-|-- it shifts with edits
	 *
	 * Bookmarks provide the ability to seek to a previously-scanned
	 * place in the HTML document. This avoids the need to re-scan
	 * the entire document.
	 *
	 * Example:
	 *
	 *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
	 *                                 ^^^^
	 *                                 want to note this last item
	 *
	 *     $p = new WP_HTML_Tag_Processor( $html );
	 *     $in_list = false;
	 *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
	 *         if ( 'UL' === $p->get_tag() ) {
	 *             if ( $p->is_tag_closer() ) {
	 *                 $in_list = false;
	 *                 $p->set_bookmark( 'resume' );
	 *                 if ( $p->seek( 'last-li' ) ) {
	 *                     $p->add_class( 'last-li' );
	 *                 }
	 *                 $p->seek( 'resume' );
	 *                 $p->release_bookmark( 'last-li' );
	 *                 $p->release_bookmark( 'resume' );
	 *             } else {
	 *                 $in_list = true;
	 *             }
	 *         }
	 *
	 *         if ( 'LI' === $p->get_tag() ) {
	 *             $p->set_bookmark( 'last-li' );
	 *         }
	 *     }
	 *
	 * Bookmarks intentionally hide the internal string offsets
	 * to which they refer. They are maintained internally as
	 * updates are applied to the HTML document and therefore
	 * retain their "position" - the location to which they
	 * originally pointed. The inability to use bookmarks with
	 * functions like `substr` is therefore intentional to guard
	 * against accidentally breaking the HTML.
	 *
	 * Because bookmarks allocate memory and require processing
	 * for every applied update, they are limited and require
	 * a name. They should not be created with programmatically-made
	 * names, such as "li_{$index}" with some loop. As a general
	 * rule they should only be created with string-literal names
	 * like "start-of-section" or "last-paragraph".
	 *
	 * Bookmarks are a powerful tool to enable complicated behavior.
	 * Consider double-checking that you need this tool if you are
	 * reaching for it, as inappropriate use could lead to broken
	 * HTML structure or unwanted processing overhead.
	 *
	 * @since 6.4.0
	 *
	 * @param string $bookmark_name Identifies this particular bookmark.
	 * @return bool Whether the bookmark was successfully created.
	 */
	public function set_bookmark( $bookmark_name ) {
		return parent::set_bookmark( "_{$bookmark_name}" );
	}

	/**
	 * Checks whether a bookmark with the given name exists.
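	 *
	 * Example (an illustrative sketch; the markup and bookmark name are assumptions):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p><img src="a.png"><img src="b.png"></p>' );
	 *     if ( $processor && $processor->next_tag( 'IMG' ) ) {
	 *         $processor->set_bookmark( 'first-image' );
	 *     }
	 *
	 *     // Later, confirm that the bookmark exists before seeking to it.
	 *     if ( $processor && $processor->has_bookmark( 'first-image' ) ) {
	 *         $processor->seek( 'first-image' );
	 *         $processor->release_bookmark( 'first-image' );
	 *     }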
|
	 *
	 * @since 6.5.0
	 *
	 * @param string $bookmark_name Name to identify a bookmark that potentially exists.
	 * @return bool Whether that bookmark exists.
	 */
	public function has_bookmark( $bookmark_name ) {
		return parent::has_bookmark( "_{$bookmark_name}" );
	}

	/*
	 * HTML Parsing Algorithms
	 */

	/**
	 * Closes a P element.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#close-a-p-element
	 */
	private function close_a_p_element() {
		$this->generate_implied_end_tags( 'P' );
		$this->state->stack_of_open_elements->pop_until( 'P' );
	}

	/**
	 * Closes elements that have implied end tags.
	 *
	 * @since 6.4.0
	 *
	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
	 *
	 * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements.
	 */
	private function generate_implied_end_tags( $except_for_this_element = null ) {
		$elements_with_implied_end_tags = array(
			'DD',
			'DT',
			'LI',
			'P',
		);

		/*
		 * Compare by node name: the stack holds token objects, which never
		 * strictly equal the tag-name strings in the list above.
		 */
		$current_node = $this->state->stack_of_open_elements->current_node();
		while (
			$current_node && $current_node->node_name !== $except_for_this_element &&
			in_array( $current_node->node_name, $elements_with_implied_end_tags, true )
		) {
			$this->state->stack_of_open_elements->pop();
			$current_node = $this->state->stack_of_open_elements->current_node();
		}
	}

	/**
	 * Closes elements that have implied end tags, thoroughly.
	 *
	 * See the HTML specification for an explanation why this is
	 * different from generating end tags in the normal sense.
	 *
	 * @since 6.4.0
	 *
	 * @see WP_HTML_Processor::generate_implied_end_tags
	 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
	 */
	private function generate_implied_end_tags_thoroughly() {
		$elements_with_implied_end_tags = array(
			'DD',
			'DT',
			'LI',
			'P',
		);

		// Compare by node name, since the stack holds token objects rather than strings.
		$current_node = $this->state->stack_of_open_elements->current_node();
		while ( $current_node && in_array( $current_node->node_name, $elements_with_implied_end_tags, true ) ) {
			$this->state->stack_of_open_elements->pop();
			$current_node = $this->state->stack_of_open_elements->current_node();
		}
	}

	/**
	 * Reconstructs the active formatting elements.
	 *
	 * > This has the effect of reopening all the formatting elements that were opened
	 * > in the current body, cell, or caption (whichever is youngest) that haven't
	 * > been explicitly closed.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements
	 *
	 * @return bool Whether any formatting elements needed to be reconstructed.
	 */
	private function reconstruct_active_formatting_elements() {
		/*
		 * > If there are no entries in the list of active formatting elements, then there is nothing
		 * > to reconstruct; stop this algorithm.
		 */
		if ( 0 === $this->state->active_formatting_elements->count() ) {
			return false;
		}

		$last_entry = $this->state->active_formatting_elements->current_node();
		if (

			/*
			 * > If the last (most recently added) entry in the list of active formatting elements is a marker;
			 * > stop this algorithm.
			 */
			'marker' === $last_entry->node_name ||

			/*
			 * > If the last (most recently added) entry in the list of active formatting elements is an
			 * > element that is in the stack of open elements, then there is nothing to reconstruct;
			 * > stop this algorithm.
			 */
			$this->state->stack_of_open_elements->contains_node( $last_entry )
		) {
			return false;
		}

		$this->last_error = self::ERROR_UNSUPPORTED;
		throw new WP_HTML_Unsupported_Exception( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' );
	}

	/**
	 * Runs the adoption agency algorithm.
	 *
	 * @since 6.4.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#adoption-agency-algorithm
	 */
	private function run_adoption_agency_algorithm() {
		$budget       = 1000;
		$subject      = $this->get_tag();
		$current_node = $this->state->stack_of_open_elements->current_node();

		if (
			// > If the current node is an HTML element whose tag name is subject
			$current_node && $subject === $current_node->node_name &&
			// > the current node is not in the list of active formatting elements
			! $this->state->active_formatting_elements->contains_node( $current_node )
		) {
			$this->state->stack_of_open_elements->pop();
			return;
		}

		$outer_loop_counter = 0;
		while ( $budget-- > 0 ) {
			if ( $outer_loop_counter++ >= 8 ) {
				return;
			}

			/*
			 * > Let formatting element be the last element in the list of active formatting elements that:
			 * >   - is between the end of the list and the last marker in the list,
			 * >     if any, or the start of the list otherwise,
			 * >   - and has the tag name subject.
			 */
			$formatting_element = null;
			foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
				if ( 'marker' === $item->node_name ) {
					break;
				}

				if ( $subject === $item->node_name ) {
					$formatting_element = $item;
					break;
				}
			}

			// > If there is no such element, then return and instead act as described in the "any other end tag" entry above.
			if ( null === $formatting_element ) {
				$this->last_error = self::ERROR_UNSUPPORTED;
				throw new WP_HTML_Unsupported_Exception( 'Cannot run adoption agency when "any other end tag" is required.' );
			}

			// > If formatting element is not in the stack of open elements, then this is a parse error; remove the element from the list, and return.
			if ( ! $this->state->stack_of_open_elements->contains_node( $formatting_element ) ) {
				$this->state->active_formatting_elements->remove_node( $formatting_element );
				return;
			}

			// > If formatting element is in the stack of open elements, but the element is not in scope, then this is a parse error; return.
			if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $formatting_element->node_name ) ) {
				return;
			}

			/*
			 * > Let furthest block be the topmost node in the stack of open elements that is lower in the stack
			 * > than formatting element, and is an element in the special category. There might not be one.
			 */
			$is_above_formatting_element = true;
			$furthest_block              = null;
			foreach ( $this->state->stack_of_open_elements->walk_down() as $item ) {
				if ( $is_above_formatting_element && $formatting_element->bookmark_name !== $item->bookmark_name ) {
					continue;
				}

				if ( $is_above_formatting_element ) {
					$is_above_formatting_element = false;
					continue;
				}

				if ( self::is_special( $item->node_name ) ) {
					$furthest_block = $item;
					break;
				}
			}

			/*
			 * > If there is no furthest block, then the UA must first pop all the nodes from the bottom of the
			 * > stack of open elements, from the current node up to and including formatting element, then
			 * > remove formatting element from the list of active formatting elements, and finally return.
			 */
			if ( null === $furthest_block ) {
				foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
					$this->state->stack_of_open_elements->pop();

					if ( $formatting_element->bookmark_name === $item->bookmark_name ) {
						$this->state->active_formatting_elements->remove_node( $formatting_element );
						return;
					}
				}
			}

			$this->last_error = self::ERROR_UNSUPPORTED;
			throw new WP_HTML_Unsupported_Exception( 'Cannot extract common ancestor in adoption agency algorithm.' );
		}

		$this->last_error = self::ERROR_UNSUPPORTED;
		throw new WP_HTML_Unsupported_Exception( 'Cannot run adoption agency when looping required.' );
	}

	/**
	 * Inserts an HTML element on the stack of open elements.
	 *
	 * @since 6.4.0
	 *
	 * @see https://html.spec.whatwg.org/#insert-a-foreign-element
	 *
	 * @param WP_HTML_Token $token Token representing the element in the original input HTML.
	 */
	private function insert_html_element( $token ) {
		$this->state->stack_of_open_elements->push( $token );
	}

	/*
	 * HTML Specification Helpers
	 */

	/**
	 * Returns whether an element of a given name is in the HTML special category.
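	 *
	 * Example (an illustrative sketch; the tag names are chosen only for demonstration):
	 *
	 *     WP_HTML_Processor::is_special( 'div' );  // true: the name is upper-cased before comparison.
	 *     WP_HTML_Processor::is_special( 'SPAN' ); // false: SPAN is not in the special category.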
|
	 *
	 * @since 6.4.0
	 *
	 * @see https://html.spec.whatwg.org/#special
	 *
	 * @param string $tag_name Name of element to check.
	 * @return bool Whether the element of the given name is in the special category.
	 */
	public static function is_special( $tag_name ) {
		$tag_name = strtoupper( $tag_name );

		return (
			'ADDRESS' === $tag_name ||
			'APPLET' === $tag_name ||
			'AREA' === $tag_name ||
			'ARTICLE' === $tag_name ||
			'ASIDE' === $tag_name ||
			'BASE' === $tag_name ||
			'BASEFONT' === $tag_name ||
			'BGSOUND' === $tag_name ||
			'BLOCKQUOTE' === $tag_name ||
			'BODY' === $tag_name ||
			'BR' === $tag_name ||
			'BUTTON' === $tag_name ||
			'CAPTION' === $tag_name ||
			'CENTER' === $tag_name ||
			'COL' === $tag_name ||
			'COLGROUP' === $tag_name ||
			'DD' === $tag_name ||
			'DETAILS' === $tag_name ||
			'DIR' === $tag_name ||
			'DIV' === $tag_name ||
			'DL' === $tag_name ||
			'DT' === $tag_name ||
			'EMBED' === $tag_name ||
			'FIELDSET' === $tag_name ||
			'FIGCAPTION' === $tag_name ||
			'FIGURE' === $tag_name ||
			'FOOTER' === $tag_name ||
			'FORM' === $tag_name ||
			'FRAME' === $tag_name ||
			'FRAMESET' === $tag_name ||
			'H1' === $tag_name ||
			'H2' === $tag_name ||
			'H3' === $tag_name ||
			'H4' === $tag_name ||
			'H5' === $tag_name ||
			'H6' === $tag_name ||
			'HEAD' === $tag_name ||
			'HEADER' === $tag_name ||
			'HGROUP' === $tag_name ||
			'HR' === $tag_name ||
			'HTML' === $tag_name ||
			'IFRAME' === $tag_name ||
			'IMG' === $tag_name ||
			'INPUT' === $tag_name ||
			'KEYGEN' === $tag_name ||
			'LI' === $tag_name ||
			'LINK' === $tag_name ||
			'LISTING' === $tag_name ||
			'MAIN' === $tag_name ||
			'MARQUEE' === $tag_name ||
			'MENU' === $tag_name ||
			'META' === $tag_name ||
			'NAV' === $tag_name ||
			'NOEMBED' === $tag_name ||
			'NOFRAMES' === $tag_name ||
			'NOSCRIPT' === $tag_name ||
			'OBJECT' === $tag_name ||
			'OL' === $tag_name ||
			'P' === $tag_name ||
			'PARAM' === $tag_name ||
			'PLAINTEXT' === $tag_name ||
			'PRE' === $tag_name ||
			'SCRIPT' === $tag_name ||
			'SEARCH' === $tag_name ||
			'SECTION' === $tag_name ||
			'SELECT' === $tag_name ||
			'SOURCE' === $tag_name ||
			'STYLE' === $tag_name ||
			'SUMMARY' === $tag_name ||
			'TABLE' === $tag_name ||
			'TBODY' === $tag_name ||
			'TD' === $tag_name ||
			'TEMPLATE' === $tag_name ||
			'TEXTAREA' === $tag_name ||
			'TFOOT' === $tag_name ||
			'TH' === $tag_name ||
			'THEAD' === $tag_name ||
			'TITLE' === $tag_name ||
			'TR' === $tag_name ||
			'TRACK' === $tag_name ||
			'UL' === $tag_name ||
			'WBR' === $tag_name ||
			'XMP' === $tag_name ||

			// MathML.
			'MI' === $tag_name ||
			'MO' === $tag_name ||
			'MN' === $tag_name ||
			'MS' === $tag_name ||
			'MTEXT' === $tag_name ||
			'ANNOTATION-XML' === $tag_name ||

			// SVG.
			'FOREIGNOBJECT' === $tag_name ||
			'DESC' === $tag_name ||
			'TITLE' === $tag_name
		);
	}

	/**
	 * Returns whether a given element is an HTML Void Element.
	 *
	 * > area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
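	 *
	 * Example (an illustrative sketch; the tag names are chosen only for demonstration):
	 *
	 *     WP_HTML_Processor::is_void( 'img' ); // true: the name is upper-cased before comparison.
	 *     WP_HTML_Processor::is_void( 'DIV' ); // false: DIV has a closing tag.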
|
	 *
	 * @since 6.4.0
	 *
	 * @see https://html.spec.whatwg.org/#void-elements
	 *
	 * @param string $tag_name Name of HTML tag to check.
	 * @return bool Whether the given tag is an HTML Void Element.
	 */
	public static function is_void( $tag_name ) {
		$tag_name = strtoupper( $tag_name );

		return (
			'AREA' === $tag_name ||
			'BASE' === $tag_name ||
			'BASEFONT' === $tag_name || // Obsolete but still treated as void.
			'BGSOUND' === $tag_name || // Obsolete but still treated as void.
			'BR' === $tag_name ||
			'COL' === $tag_name ||
			'EMBED' === $tag_name ||
			'FRAME' === $tag_name ||
			'HR' === $tag_name ||
			'IMG' === $tag_name ||
			'INPUT' === $tag_name ||
			'KEYGEN' === $tag_name || // Obsolete but still treated as void.
			'LINK' === $tag_name ||
			'META' === $tag_name ||
			'PARAM' === $tag_name || // Obsolete but still treated as void.
			'SOURCE' === $tag_name ||
			'TRACK' === $tag_name ||
			'WBR' === $tag_name
		);
	}

	/*
	 * Constants that would pollute the top of the class if they were found there.
	 */

	/**
	 * Indicates that the next HTML token should be parsed and processed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const PROCESS_NEXT_NODE = 'process-next-node';

	/**
	 * Indicates that the current HTML token should be reprocessed in the newly-selected insertion mode.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const REPROCESS_CURRENT_NODE = 'reprocess-current-node';

	/**
	 * Indicates that the current HTML token should be processed without advancing the parser.
	 *
	 * @since 6.5.0
	 *
	 * @var string
	 */
	const PROCESS_CURRENT_NODE = 'process-current-node';

	/**
	 * Indicates that the parser encountered unsupported markup and has bailed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const ERROR_UNSUPPORTED = 'unsupported';

	/**
	 * Indicates that the parser encountered more HTML tokens than it
	 * was able to process and has bailed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';

	/**
	 * Unlock code that must be passed into the constructor to create this class.
	 *
	 * This class extends the WP_HTML_Tag_Processor, which has a public class
	 * constructor. Therefore, it's not possible to have a private constructor here.
	 *
	 * This unlock code is used to ensure that anyone calling the constructor is
	 * doing so with a full understanding that it's intended to be a private API.
	 *
	 * @access private
	 */
	const CONSTRUCTOR_UNLOCK_CODE = 'Use WP_HTML_Processor::create_fragment() instead of calling the class constructor directly.';
}