XPath (XML Path Language) is a query language for addressing the elements of an XML document. It was designed to give XSLT transformations access to parts of an XML document, and it is a W3C standard. XPath implements navigation over the DOM of an XML document and uses a compact syntax that differs from the syntax of XML itself. Version 2.0 was completed in 2007 and is now part of the XQuery 1.0 language. Development of version 2.1, which is used by XQuery 1.1, began in December 2009.
At present the most widely used version is XPath 1.0, because open-source libraries lack support for XPath 2.0. In particular, this concerns LibXML, on which XPath support depends both in browsers and in server-side interpreters.
Basics
XML has a tree structure. A standalone XML document always has exactly one root element (processing instructions are not part of the element tree), which contains a series of nested elements, some of which may in turn contain nested elements. Text nodes, comments, and processing instructions may also occur. An XML element can be thought of as holding an array of the elements nested in it and an array of attributes.
Elements in the tree have ancestor elements and descendant elements (the root element has no ancestors, and leaf elements have no children). Each element of the tree sits at a certain nesting level (hereinafter, the level). Elements are ordered as they appear in the XML text, so one can speak of preceding and following elements. This closely resembles the organization of directories in a file system.
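The "array of nested elements plus array of attributes" view can be tried out directly. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the `library` document and its names are invented for illustration and do not come from this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XML = """<library city="Oslo">
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</library>"""

root = ET.fromstring(XML)  # the single root element

# An element behaves like a list of child elements plus a dict of attributes.
children = list(root)
child_tags = [child.tag for child in children]
attributes = dict(root.attrib)

# Children keep document order, so "preceding" and "following" are well defined.
first, second = children

# A leaf element has no children of its own.
leaf = first[0]

print(root.tag, child_tags, attributes)
print(first.get("id"), second.get("id"), len(leaf))
```

Here `root` plays the role of the root element, `list(root)` is its array of nested elements, and `root.attrib` is its array of attributes.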
An XPath string describes how to select the required elements from an array of elements that may themselves contain nested elements. Selection starts from the set of elements passed in. At each step of the path, the elements that match the step's expression are selected, and the final result is the subset of elements that matches the whole path.
In the example document, the XPath path /html/body/*/span[@class] matches two elements: the first block in the third layer and the second block in the third layer.
Path steps are usually written in XPath in abbreviated form. The full form of the path above is /child::html/child::body/child::*/child::span[attribute::class]. Steps are separated by the / character, and each step consists of up to three parts:
- an axis (child:: by default, the axis of nested elements). Besides the axis of nested elements, selection can be made along various other element axes and along the attribute axis (attribute::, also written with the symbol @) (see below).
- an expression that determines which elements to select (in this example, selection is by matching document elements against the names html, body, and span, and by the symbol *, which selects all elements of the axis).
- predicates (in this example, attribute::class), which are additional selection conditions. There may be several. Each predicate is enclosed in square brackets and contains a logical expression that is tested against the selected elements. If there is no predicate, all matching elements are selected.
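The example path can be run against a concrete document. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of the abbreviated XPath syntax. The document in `DOC` is an assumption: the example document is not reproduced in this text, so it was reconstructed to be consistent with the walkthrough (three div elements, one span, and one img inside body).

```python
import xml.etree.ElementTree as ET

# Assumed example document, reconstructed to fit the description in the text.
DOC = """<html>
  <body>
    <div>First layer
      <span>text block in the first layer</span>
    </div>
    <div>Second layer</div>
    <div>Third layer
      <span class="text">first block in the third layer</span>
      <span class="text">second block in the third layer</span>
      <span>third block in the third layer</span>
    </div>
    <span>fourth layer</span>
    <img/>
  </body>
</html>"""

root = ET.fromstring(DOC)  # root is the html element

# ElementTree evaluates a path relative to the element it is called on,
# so the html step is implicit and the path starts at body.
matches = root.findall("./body/*/span[@class]")
texts = [m.text for m in matches]

# Without the predicate, every span that passes the name test is selected.
without_predicate = root.findall("./body/*/span")

print(texts)
print(len(without_predicate))
```

Dropping the `[@class]` predicate selects four span elements instead of two, which illustrates that a predicate only narrows what the axis and the name test have already selected.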
The path is evaluated from left to right. Evaluation begins in one of two contexts. It may begin in the context of the first element of the root node (in this example, the html element), in which case the child:: axis yields the elements nested in it (in this example, the single body element); this is convenient when processing an ordinary XML document with a single root node. Alternatively, if the XPath begins with the / character, evaluation begins in the context of all root elements of the XML passed in, along the child:: axis (in this example, the single html element). At each addressing step, the elements that satisfy the step's conditions are selected in the current context, and the resulting list is taken as the context for the next step or as the return value.
Thus the first step, /child::html, explicitly makes the context for the next step a list containing the single html element. The same would have happened implicitly had this step been omitted.
In the second addressing step of this example (the step child::body), the context is a list containing the single html element. The child:: axis says to look at the names of the elements nested in the current context, and the node test body says that the resulting node set should include those nodes whose name is body. The second addressing step therefore yields a node set consisting of the single body element, which becomes the context for the third step.
The third addressing step is child::*. The child:: axis contains all direct children of the body element, and the node test * says that elements of the principal node type with any name are to be included in the resulting list. This step yields a list of three div elements, one span element, and one img element: five elements in total.
The fourth addressing step is child::span[attribute::class]. Its context is a list of five elements, so the output list is built in five passes (five iterations). In the first iteration the context node is the first div. Given the child:: axis and the node test span, the set must include the direct children of that div whose name is span. There is one. In the second iteration nothing is added to the set, since the second div has no children. The third iteration finds three span elements at once. The fourth finds nothing, because the span element has no span children; the fact that it is itself a span is irrelevant, since only its children are examined. The fifth also finds nothing, since the img element has no span children either. The node test alone would therefore yield a node set of four span elements, and this would be the context for further processing if no predicate were specified at this step.
Since the fourth step does have a predicate, however, the selected elements are filtered further during each of the five passes. Here the attribute:: axis of the predicate says to examine the attributes of each selected node, and the test class says to keep only those nodes that have an attribute named class. In the first iteration the only span found fails the predicate, and in the third iteration two of the three elements pass. In the end, although filtering runs over five iterations, only two span elements make it into the final set.
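The four addressing steps described above can be reproduced by hand. The following sketch is not how any particular XPath engine is implemented; it only mirrors the walkthrough. The helper `child_step` and the document in `DOC` are hypothetical: the document was reconstructed to fit the description (three div elements, one span, and one img inside body).

```python
import xml.etree.ElementTree as ET

# Assumed example document, reconstructed to fit the description in the text.
DOC = """<html>
  <body>
    <div>First layer
      <span>text block in the first layer</span>
    </div>
    <div>Second layer</div>
    <div>Third layer
      <span class="text">first block in the third layer</span>
      <span class="text">second block in the third layer</span>
      <span>third block in the third layer</span>
    </div>
    <span>fourth layer</span>
    <img/>
  </body>
</html>"""


def child_step(context, name, has_attr=None):
    """Apply one child:: step to every node of the context list."""
    result = []
    for node in context:              # one iteration per context node
        for child in node:            # child:: axis: direct children only
            if name != "*" and child.tag != name:
                continue              # node test failed
            if has_attr is not None and has_attr not in child.attrib:
                continue              # predicate [attribute::has_attr] failed
            result.append(child)
    return result


html = ET.fromstring(DOC)
step1 = [html]                                        # /child::html
step2 = child_step(step1, "body")                     # child::body
step3 = child_step(step2, "*")                        # child::*
step4_no_predicate = child_step(step3, "span")        # child::span
step4 = child_step(step3, "span", has_attr="class")   # child::span[attribute::class]

print([node.tag for node in step3])
print(len(step4_no_predicate), len(step4))
```

The output of each step is the context of the next one, so `step3` holds the five children of body and `step4` holds the two span elements that carry a class attribute.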
Axes
- child:: contains the set of child elements (elements located one level below). This name can be abbreviated completely, that is, omitted altogether.
- descendant:: contains the complete set of descendant elements (that is, both the immediate child elements and all of their descendants).
- descendant-or-self:: contains the complete set of descendant elements together with the current element. The expression /descendant-or-self::node()/ can be abbreviated to //. With this axis, for example, the second step can select elements from any node rather than only from the root: it is enough for the first step to take all descendants of the root. For example, the path //span selects all span elements in the document regardless of their position in the hierarchy, examining both the name of the root element and the names of all its descendants to the full depth of nesting.
- ancestor:: contains the set of ancestor elements.
- ancestor-or-self:: contains the set of ancestor elements together with the current element.
- self:: contains the current element. This can be abbreviated to . (a single period).
- following:: contains the set of elements located after the current element in the tree (at all levels), excluding its own descendants.
- following-sibling:: contains the set of sibling elements at the same level that follow the current element.
- preceding:: contains the set of elements located before the current element in the tree (at all levels), excluding its ancestors.
- preceding-sibling:: contains the set of sibling elements at the same level that precede the current element.
- attribute:: contains the set of attributes of the current element. This can be abbreviated to the symbol @.
- namespace:: contains the set of elements belonging to a particular namespace (that is, those that have an xmlns attribute).
- parent:: contains the parent element, one level up. This can be abbreviated to .. (two periods).
An expression that specifies the elements to select
Within the contents of an axis, selection is performed according to an expression that determines which elements to select. The expression may be:
- a specific name, in which case the elements of the axis with that name are selected
- the symbol *, which selects all elements of the axis
- an expression composed of functions, in which case elements are selected according to the result of evaluating the expression in the context of each element
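The abbreviated axes and the name and * tests can be tried with Python's standard `xml.etree.ElementTree`. This is a sketch under two assumptions. First, the document in `DOC` is reconstructed, because the example document is not reproduced in this text. Second, ElementTree supports only part of the abbreviated syntax and has no following-sibling:: axis, so `following_siblings` is a hand-written, hypothetical helper that imitates that axis.

```python
import xml.etree.ElementTree as ET

# Assumed example document, reconstructed to fit the description in the text.
DOC = """<html>
  <body>
    <div>First layer
      <span>text block in the first layer</span>
    </div>
    <div>Second layer</div>
    <div>Third layer
      <span class="text">first block in the third layer</span>
      <span class="text">second block in the third layer</span>
      <span>third block in the third layer</span>
    </div>
    <span>fourth layer</span>
    <img/>
  </body>
</html>"""

root = ET.fromstring(DOC)

all_spans = root.findall(".//span")           # // with the name test span
body_children = root.findall("./body/*")      # child axis with the * test
self_nodes = root.findall(".")                # self, abbreviated to .
span_parents = root.findall(".//span/..")     # parent, abbreviated to ..
with_class = root.findall(".//*[@class]")     # attribute axis, abbreviated to @


def following_siblings(tree_root, node):
    """Imitate following-sibling:: for an element other than the root."""
    parent_of = {child: parent for parent in tree_root.iter() for child in parent}
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


first_div = body_children[0]
after_first_div = [n.tag for n in following_siblings(root, first_div)]

print(len(all_spans), len(body_children), len(with_class))
print(sorted(p.tag for p in span_parents))
print(after_first_div)
```

A full XPath 1.0 engine accepts the unabbreviated axis names directly, so a helper of this kind is needed only with libraries that implement a subset of the language.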