XPath (XML Path Language) is a query language for addressing the elements of an XML document. It was designed to give XSLT transformation files access to parts of an XML document, and it is a W3C standard. XPath implements navigation over the DOM of an XML document. It uses a compact syntax that differs from the syntax of XML itself. Version 2.0 was completed in 2007 and is now part of the XQuery 1.0 language. Development of version 2.1, which uses XQuery 1.1, began in December 2009.

At present XPath 1.0 remains the most widely used version, because open source libraries lack support for XPath 2.0. The main case is LibXML, which determines language support in browsers on the one hand and in server-side interpreters on the other.


Basics

XML has a tree structure. A standalone XML document always has exactly one root element (processing instructions are not part of the element tree), which contains a series of nested elements, some of which may in turn contain nested elements. Text nodes, comments and processing instructions may also occur. An XML element can be thought of as holding an array of the elements nested in it and an array of attributes.

The elements of the tree have ancestor elements and child elements (the root element has no ancestors, and the dead-end elements, the leaves of the tree, have no children). Each element of the tree sits at a certain nesting level (hereafter, the level). Elements are ordered as they appear in the XML text, so one can speak of the elements that precede and follow a given element. This is very similar to the organization of directories in a file system.

An XPath string describes how to select the desired elements from an array of elements that may themselves contain nested elements. Selection starts from the set of elements passed in; at each step of the path, the elements that match the step's expression are selected; the result is the subset of elements that match the whole path.

The XPath path /html/body/*/span[@class] matches two elements of the source document: "the first block in the third layer" and "the second block in the third layer".

Path steps are usually written in XPath in abbreviated form. The full form of the path above is /child::html/child::body/child::*/child::span[attribute::class]

The steps of a path are separated by the / character, and each step consists of three parts:

  • the axis (child:: by default, the axis of nested elements). Besides selection along the axis of nested elements, selection is possible along various other element axes and along the attribute axis (attribute::, also written with the @ symbol) (see below).
  • an expression that defines which elements to select (in this example, selection is by matching the document element names html, body and span, and by the * symbol, which selects all elements of the axis)
  • predicates (in this example, attribute::class) – additional selection conditions. There may be several. Each predicate is enclosed in square brackets and contains a logical expression used to test the selected elements. If there is no predicate, all matching elements are selected.
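
The relationship between the abbreviated and the full form of a path can be sketched in a few lines of Python. This is a toy illustration, not a real XPath parser: the helper names expand_step and expand_path are invented here, and the code assumes that predicates contain no / characters and no string literals.

```python
def expand_step(step):
    """Expand one abbreviated step into its axis::test[predicate] form."""
    if step == ".":
        return "self::node()"
    if step == "..":
        return "parent::node()"
    # Separate the node test from any trailing predicates, e.g. "span[@class]".
    test, bracket, rest = step.partition("[")
    predicates = bracket + rest
    if test.startswith("@"):
        test = "attribute::" + test[1:]
    elif "::" not in test:
        test = "child::" + test  # child:: is the default axis
    # Inside predicates only the "@" shorthand is expanded (toy simplification).
    predicates = predicates.replace("@", "attribute::")
    return test + predicates


def expand_path(path):
    """Expand every step of an abbreviated path, including the // shorthand."""
    path = path.replace("//", "/descendant-or-self::node()/")
    return "/".join(expand_step(s) if s else "" for s in path.split("/"))


print(expand_path("/html/body/*/span[@class]"))
print(expand_path("//span"))
```

For the path used in this section, the sketch prints the same full form that is given above.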

The path is analyzed from left to right. Analysis begins either in the context of the first element of the root node (in this example, the html element), in which case the child:: axis holds the elements nested in it (in this example, a single body element), which is convenient for ordinary processing of an XML document with a single root node; or, if the XPath begins with the / character, in a context whose child:: axis holds all root elements of the XML passed in (in this example, a single html element). At each addressing step, the elements of the current context that satisfy the conditions of the step are selected, and the list of them is taken as the context for the next step or as the return value.

Thus the first step, /child::html, explicitly makes the current context for the next step a list of the single html element, which would also have happened implicitly had this step been omitted.

In the second addressing step of this example (the step child::body), the context is a list of the single html element. The child:: axis says to look at the names of the elements nested in the current context, and the node test body says that the resulting set should include those nodes whose name is body. The second step therefore yields a set consisting of just one body element, which becomes the context for the third step.

The third addressing step is child::*. The child:: axis contains all direct children of the body element, and the node test * says that elements of the principal type with any name are to be included in the resulting list. This step yields a list of three div elements, one span element and one img element, five elements in total.

The fourth addressing step is child::span[@class]. Its context is a list of five elements, so the output list is built in five passes (five iterations). In the first iteration the context node is the first div. Given the child:: axis and the node test span, the set must include those direct children of this div whose name is span. There is one. In the second iteration nothing is added to the set, because the second div has no children. The third iteration finds three span elements at once. The fourth finds nothing, because the span element has no span children, and the fact that it is itself a span does not matter, since only its children are examined. The fifth finds nothing either: the img element also has no span children. Without the predicate, then, this step would produce a set of four span elements, and that set would be the context for further processing.

But since the fourth step does have a predicate, the elements selected in each of the five passes are filtered further. Here the attribute:: axis of the predicate says to check whether the selected node has attributes, and the node test class says to keep only those nodes that define an attribute named class. So the only span found in the first iteration does not pass the predicate, in the third iteration two of the three elements pass, and in the end, although filtering runs over five iterations, only two span elements make it into the final set.
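
The walkthrough above can be reproduced with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The example document is not reproduced in this text, so the DOC string below is a stand-in assembled from the description (three div elements, one span and one img under body). ElementTree does not accept absolute paths on an element, so the path is written relative to the html root.

```python
import xml.etree.ElementTree as ET

# Stand-in document built from the walkthrough's description (an assumption).
DOC = """
<html>
  <body>
    <div>First layer
      <span>text block in the first layer</span>
    </div>
    <div>Second layer</div>
    <div>Third layer
      <span class="text">first block in the third layer</span>
      <span class="text">second block in the third layer</span>
      <span>third block in the third layer</span>
    </div>
    <span>fourth layer</span>
    <img/>
  </body>
</html>
"""

root = ET.fromstring(DOC)  # root is the html element (steps 1 and 2 start here)

# Step 3: child::* under body gives the five direct children.
children = root.findall("body/*")

# Step 4 without the predicate: every span that is a child of one of them.
spans = root.findall("body/*/span")

# Step 4 with the predicate: keep only spans that define a class attribute.
matches = root.findall("body/*/span[@class]")

print([e.tag for e in children])   # tags of the five context nodes
print(len(spans))                  # spans found before filtering
print([e.text for e in matches])   # text of the spans that pass the predicate
```

Under these assumptions the three results contain five, four and two elements respectively, matching the counts in the walkthrough.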

Axes

  • child:: – contains the set of child elements (elements located one level below). This name can be abbreviated away completely, that is, omitted altogether.
  • descendant:: – contains the complete set of descendant elements (that is, both the immediate child elements and all of their descendants).
  • descendant-or-self:: – contains the complete set of descendant elements together with the current element. The expression /descendant-or-self::node()/ can be abbreviated to //. With this axis it is possible, for example, to make the second step select elements from any node rather than only from the root: it is enough for the first step to take all descendants of the root. For example, the path //span selects all span elements of the document regardless of their position in the hierarchy, checking both the name of the root and the names of all its descendants to the full depth of nesting.
  • ancestor:: – contains the set of ancestor elements.
  • ancestor-or-self:: – contains the set of ancestor elements together with the current element.
  • parent:: – contains the ancestor element one level up. This axis reference can be replaced by ..
  • self:: – contains the current element. This axis reference can be replaced by .
  • following:: – contains the set of elements located below the current element in the tree (at all levels), excluding its own descendants.
  • following-sibling:: – contains the set of sibling elements at the same level that follow the current one.
  • preceding:: – contains the set of elements located above the current element in the tree (at all levels), excluding its own ancestors.
  • preceding-sibling:: – contains the set of sibling elements at the same level that precede the current one.
  • attribute:: – contains the set of attributes of the current element. This axis reference can be replaced by the @ symbol.
  • namespace:: – contains the namespace nodes that apply to the current element (that is, those declared through xmlns attributes).
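
A few of these axes can be tried out with Python's standard xml.etree.ElementTree module. It supports only the child, descendant (.//), parent (..) and self (.) shorthands, so the ancestor and following-sibling axes are emulated below with a parent map. The sample document and the helper names ancestors and following_siblings are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

DOC = "<html><body><div><span/></div><p id='p1'/><p id='p2'/></body></html>"
root = ET.fromstring(DOC)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Emulate the ancestor:: axis, nearest ancestor first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_siblings(node):
    """Emulate the following-sibling:: axis."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


span = root.find(".//span")                  # descendant axis via the // shorthand
div = root.find("body/div")                  # child axis, written without a name

descendant_tags = [e.tag for e in root.findall(".//*")]
parent_tag = root.find(".//span/..").tag     # parent axis via ..
ancestor_tags = [e.tag for e in ancestors(span)]
sibling_ids = [e.get("id") for e in following_siblings(div)]

print(descendant_tags, parent_tag, ancestor_tags, sibling_ids)
```

A full XPath 1.0 engine evaluates all thirteen axes directly; the emulation here only shows which nodes two of the unsupported axes would contain.
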

An expression that specifies the elements to select

Within the contents of the axis, selection is performed according to the expression that defines which elements to select. This expression can be:

  • a specific name, in which case the elements of the axis with that name are selected
  • the * symbol, which selects all elements of the axis
  • an expression composed of functions, in which case selection is based on the result of evaluating the expression in the context of each element

Functions on sets of nodes

node-set node()

Returns the node itself. The * substitute is often used in place of this function, but unlike the asterisk, node() also returns text nodes.

string text()

Returns the node if it is a text node.

node-set current()

Returns a set of one element, the current one. When sets are processed with predicates, this function is the only way to reach the current element from inside the predicate. (current() is defined by XSLT rather than by XPath itself.)

number position()

Returns the position of the element within the set of elements of the axis. Works correctly only inside a loop.

number last()

Returns the number of the last element within the set of elements of the axis. Works correctly only inside a loop.

number count(node-set)

Returns the number of elements in the node-set.

string name(node-set?)

Returns the full name of the first tag in the set.

string namespace-uri(node-set?)

Returns the URI that identifies the namespace.

string local-name(node-set?)

Returns the name of the first tag in the set, without the namespace.

node-set id(object)

Finds the element with the given unique identifier.
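
Some of these functions have direct counterparts in Python's standard xml.etree.ElementTree module, which understands numeric positions and last() inside predicates. The sketch below assumes a small sample document; count(), namespace-uri() and local-name() are not supported in ElementTree paths, so they are approximated here with ordinary Python on the returned elements.

```python
import xml.etree.ElementTree as ET

# Sample document and namespace URI are assumptions made for this sketch.
root = ET.fromstring(
    "<list xmlns:x='urn:example'>"
    "<item>a</item><item>b</item><item>c</item><x:note/>"
    "</list>"
)

first = root.find("item[1]")              # same as item[position() = 1]
last = root.find("item[last()]")          # the last item among its siblings
second_last = root.find("item[last()-1]")

count = len(root.findall("item"))         # stands in for count(item)

# ElementTree stores a qualified name as "{namespace-uri}local-name".
note = root.find("{urn:example}note")
uri, _, local = note.tag[1:].partition("}")

print(first.text, last.text, second_last.text, count, uri, local)
```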

String functions

string string(object?)

Returns the text content of the element. In effect, it returns the concatenation of all text nodes nested at any depth below the element.

string concat(string, string, string*)

Concatenates the strings given as arguments.

number string-length(string?)

Returns the length of the string.

boolean contains(string, string)

Returns true if the first string contains the second, otherwise false.

string substring(string, number, number?)

Returns the substring cut from the string, starting at the given position (positions are counted from 1) and, if a second number is given, containing that many characters.

string substring-before(string, string)

If the second string is found in the first, returns the part of the first string before the first occurrence of the second.

string substring-after(string, string)

If the second string is found in the first, returns the part of the first string after the first occurrence of the second.

boolean starts-with(string, string)

Returns true if the first string begins with the second, otherwise false.

boolean ends-with(string, string)

Returns true if the first string ends with the second, otherwise false. (This function is available from XPath 2.0 and is not part of XPath 1.0.)

string normalize-space(string?)

Removes leading, trailing and repeated whitespace, including control characters such as tabs and line breaks, replacing each remaining run with a single space.

string translate(string, string, string)

Replaces the characters of the first string that occur in the second string with the characters at the corresponding positions in the third string. For example, translate("bar", "abc", "ABC") returns BAr.
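
The behaviour of several of these functions can be imitated in plain Python, which is handy for checking expectations before writing an expression. The functions below are stand-ins with invented names (xpath_translate and so on), written from the XPath 1.0 descriptions; they ignore edge cases such as NaN or infinite arguments to substring().

```python
import math
import re


def xpath_translate(s, src, dst):
    """translate(): map characters of src to dst; unmatched src chars are deleted."""
    mapping = {}
    for i, ch in enumerate(src):
        if ch not in mapping:  # the first occurrence in src wins
            mapping[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in s:
        replacement = mapping.get(ch, ch)
        if replacement is not None:
            out.append(replacement)
    return "".join(out)


def xpath_substring(s, start, length=None):
    """substring(): positions start at 1 and the arguments are rounded."""
    begin = math.floor(start + 0.5)
    end = len(s) + 1 if length is None else begin + math.floor(length + 0.5)
    return "".join(ch for pos, ch in enumerate(s, start=1) if begin <= pos < end)


def xpath_substring_before(s, t):
    idx = s.find(t)
    return s[:idx] if idx >= 0 else ""


def xpath_substring_after(s, t):
    idx = s.find(t)
    return s[idx + len(t):] if idx >= 0 else ""


def xpath_normalize_space(s):
    """normalize-space(): trim and collapse runs of space, tab, CR and LF."""
    return " ".join(part for part in re.split(r"[ \t\r\n]+", s) if part)


print(xpath_translate("bar", "abc", "ABC"))
print(xpath_substring("12345", 2, 3))
print(xpath_substring_before("1999/04/01", "/"))
print(xpath_substring_after("1999/04/01", "/"))
print(xpath_normalize_space("  a \t b\n"))
```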

Logical functions and operators

Operators:

  • or – logical "or"
  • and – logical "and"
  • = – logical "equal"
  • < (&lt;) – logical "less than"
  • > (&gt;) – logical "greater than"
  • <= (&lt;=) – logical "less than or equal"
  • >= (&gt;=) – logical "greater than or equal"

boolean boolean(object)

Converts the object to a boolean.

boolean true()

Returns true.

boolean false()

Returns false.

boolean not(boolean)

Negation: returns true if the argument is false, and vice versa.

Numeric functions and operators

Operators:

  • + – addition
  • - – subtraction
  • * – multiplication
  • div – ordinary division (not integer division!)
  • mod – remainder of division

number number(object?)

Converts the object to a number.

number sum(node-set)

Returns the sum of the set. Each node of the set is converted to a string, and a number is obtained from that string.

number floor(number)

Returns the largest integer not greater than the argument (rounding down).

number ceiling(number)

Returns the smallest integer not less than the argument (rounding up).

number round(number)

Rounds the number according to the usual mathematical rules.
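
XPath 1.0 arithmetic differs from Python's in a few places, which the following sketch makes visible. The function names are invented for this illustration, and the rules are written from the XPath 1.0 definitions as understood here: round() sends halves toward positive infinity, mod keeps the sign of the dividend, and number() accepts only plain decimal notation. Division or mod by zero, which XPath answers with Infinity or NaN, is not handled.

```python
import math
import re

# Optional whitespace, optional minus sign, then digits with an optional fraction.
_NUMBER = re.compile(r"^[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*$")


def xpath_number(text):
    """number() applied to a string: NaN unless the text is a plain decimal."""
    return float(text) if _NUMBER.match(text) else math.nan


def xpath_sum(texts):
    """sum() over the string values of a node-set (given here as a list of strings)."""
    return sum(xpath_number(t) for t in texts)


def xpath_round(x):
    """round(): nearest integer, with .5 rounded toward positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.floor(x + 0.5)


def xpath_mod(a, b):
    """mod: remainder of truncating division, so the sign follows the dividend."""
    return math.fmod(a, b)


print(7 / 2)                            # div is ordinary, not integer, division
print(xpath_mod(-5, 2), -5 % 2)         # XPath-style mod next to Python's %
print(xpath_round(2.5), round(2.5))     # XPath-style round next to Python's round
print(xpath_sum(["1", " 2.5 ", ".5"]))
```

The paired prints show where the two languages disagree, so an expression ported from Python may need adjusting around mod and round().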

System functions

Predicates

Predicates are logical expressions in square brackets, built on the same principles as the selection expression. An expression that returns an empty set of elements rather than a boolean value is considered false. An expression that returns a number is treated as a comparison of that number with position(). When there is more than one predicate, each one filters the result of filtering by the previous predicate.
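
These rules can be modelled in a short Python sketch. It is not an XPath engine: predicates are represented as hypothetical Python callables taking (node, position, size), node-sets as lists, and the names to_boolean and apply_predicates are invented for this illustration.

```python
import math


def to_boolean(value):
    """Mimic the boolean() conversion for Python stand-ins of XPath values."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    return len(value) > 0  # strings and node-sets (lists): true if non-empty


def apply_predicates(nodes, predicates):
    """Filter nodes by each predicate in turn; positions restart after every filter."""
    for predicate in predicates:
        size = len(nodes)
        kept = []
        for position, node in enumerate(nodes, start=1):
            result = predicate(node, position, size)
            is_number = isinstance(result, (int, float)) and not isinstance(result, bool)
            if is_number:
                keep = result == position   # [n] means [position() = n]
            else:
                keep = to_boolean(result)   # an empty set counts as false
            if keep:
                kept.append(node)
        nodes = kept
    return nodes


nodes = ["a", "b", "c", "d"]
not_first = lambda node, position, size: position > 1
first = lambda node, position, size: 1

print(apply_predicates(nodes, [not_first, first]))  # like [position() > 1][1]
print(apply_predicates(nodes, [first, not_first]))  # like [1][position() > 1]
```

Swapping the two predicates changes the result, because the second predicate sees positions counted within the output of the first.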