
Metadata Storage

If the content stored in a wiki page is its data, then things like the time of the last update, who made it, and the file size can all be regarded as metadata for that page. This page describes where and how such additional data is stored in DokuWiki.

Plugins can also use metadata for their own purposes. Apart from obvious page metadata, it can hold data that helps decide whether a cache can be used, or settings such as whether a certain plugin feature should be enabled on a page.

Storage

DokuWiki does not store all metadata in a central place (like a database or registry). Some metadata are simply properties of the page's data file (e.g. file size, last modified date); the rest is kept by DokuWiki in the meta directory, in the .meta file corresponding to the wiki page name. There is also an index in which selected metadata can be searched.
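As a rough illustration of "the .meta file corresponding to the wiki page name": namespaces in a page ID map to subdirectories of the meta directory. The sketch below assumes the default data/meta location and uses a hypothetical helper, meta_file_for(). DokuWiki's own metaFN() additionally cleans the ID and encodes non-ASCII file names, which is left out here.

```php
<?php
// Hypothetical helper, NOT DokuWiki's metaFN(): maps a page ID such as
// "wiki:syntax" to its metadata file, assuming the default data/meta dir.
// The ID cleaning and file name encoding done by the core are omitted.
function meta_file_for(string $id, string $ext = '.meta', string $metaDir = 'data/meta'): string
{
    // namespace separators become directory separators
    $relative = str_replace(':', '/', $id);
    return rtrim($metaDir, '/') . '/' . $relative . $ext;
}

echo meta_file_for('wiki:syntax'), "\n";             // data/meta/wiki/syntax.meta
echo meta_file_for('wiki:syntax', '.indexed'), "\n"; // data/meta/wiki/syntax.indexed
```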

Metadata Renderer

Info in the meta directory is initially written by the metadata renderer. It creates a parallel file for each page named <pageid>.meta in the meta directory. The file is a serialized multi-dimensional PHP array whose keys follow the Dublin Core element names.
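Because the file is just a serialized PHP array, it can be inspected with PHP's own serialize() and unserialize(). The sketch below writes a made-up metadata array to a temporary file and reads it back; the sample values are invented, and a real .meta file holds many more keys.

```php
<?php
// Invented sample data shaped like a .meta file (real files hold more keys).
$meta = [
    'current' => [
        'title'   => 'Metadata Storage',
        'creator' => 'Jane Doe',
        'date'    => ['created' => 1300000000, 'modified' => 1300000500],
    ],
    'persistent' => [
        'creator' => 'Jane Doe',
        'date'    => ['created' => 1300000000],
    ],
];

// Store it serialized, like the metadata renderer's output.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, serialize($meta));

// Read it back; a missing or broken file falls back to an empty array.
$raw    = @file_get_contents($file);
$loaded = $raw === false ? [] : unserialize($raw);
if (!is_array($loaded)) {
    $loaded = [];
}

echo $loaded['current']['title'], "\n"; // Metadata Storage
unlink($file);
```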

Data Structure

Currently, the following metadata is saved by the core metadata renderer:

Additionally, plugins can support more metadata elements. Currently used:

It's recommended to use keys from the Dublin Core element set for any metadata that might be interesting for external use.

For plugin internal data it is recommended to store your keys under the plugin key:

This data is stored in an associative array with two keys: 'current' for all current data (including the persistent data) and 'persistent' for data that should be kept across metadata rendering runs.
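The two-level layout can be pictured as below. The plugin entry assumes the plugin_$plugin top-level key convention recommended under Metadata and Plugins, with a hypothetical plugin called "example". The helper meta_lookup() is also hypothetical: it mimics the way p_get_metadata() addresses nested values with a space-separated key such as 'date modified', reading only from the current array.

```php
<?php
// Shape of the stored array: 'current' holds everything returned to callers,
// 'persistent' holds the subset that survives re-rendering.
// 'plugin_example' is a hypothetical plugin's private key.
$meta = [
    'current' => [
        'title'          => 'Start',
        'date'           => ['created' => 1300000000, 'modified' => 1300000500],
        'plugin_example' => ['enabled' => true],
    ],
    'persistent' => [
        'date' => ['created' => 1300000000, 'modified' => 1300000500],
    ],
];

// Hypothetical helper: resolve a key like 'date modified' against the
// current array, splitting it into at most two levels; null if absent.
function meta_lookup(array $meta, string $key)
{
    $val = $meta['current'] ?? [];
    foreach (preg_split('/\s+/', trim($key), 2) as $k) {
        if (!is_array($val) || !array_key_exists($k, $val)) {
            return null;
        }
        $val = $val[$k];
    }
    return $val;
}

var_dump(meta_lookup($meta, 'date modified'));          // int(1300000500)
var_dump(meta_lookup($meta, 'plugin_example enabled')); // bool(true)
```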

Metadata Persistence

Internally, DokuWiki maintains two arrays of metadata: current and persistent. The persistent array holds duplicates of those key/value pairs that should not be cleared during the rendering process. All requests for metadata values made with p_get_metadata() are answered from the current array.
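One way to picture "not cleared during the rendering process": each rendering run starts from a copy of the persistent array and puts freshly rendered values on top, so anything that was set only in 'current' outside the renderer disappears. The function below is a simplified model, not DokuWiki code; its name and the shape of $rendered are assumptions.

```php
<?php
// Simplified model of one metadata rendering run (hypothetical function).
// The current array is rebuilt from the persistent array, then the values
// produced by rendering the page are layered on top.
function model_render_run(array $meta, array $rendered): array
{
    $current = $meta['persistent'];          // start over from persistent data
    foreach ($rendered as $key => $value) {  // add what the renderer found
        $current[$key] = $value;
    }
    return ['current' => $current, 'persistent' => $meta['persistent']];
}

$before = [
    'current'    => ['creator' => 'Jane Doe', 'note' => 'set outside the renderer'],
    'persistent' => ['creator' => 'Jane Doe'],
];
$after = model_render_run($before, ['title' => 'Start']);
// 'note' existed only in current, so it is gone after re-rendering.
print_r($after['current']);
```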

Examples of persistent metadata keys are:

Running of metadata rendering

Metadata rendering is only started by p_get_metadata() and p_set_metadata(), which differs from the xhtml renderer. Parsing a wiki page has two stages: first the Handler generates the instructions, then a renderer produces its output (e.g. xhtml) from those instructions. Like all renderers, the metadata renderer takes the same instructions as input. Inside the metadata renderer, the metadata can be accessed directly as $renderer->meta and $renderer->persistent. Some examples and a bit of explanation can be found in the syntax plugin development documentation.
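In a syntax plugin this usually means checking the render mode and writing to the renderer's arrays when the mode is 'metadata'. To keep the sketch self-contained, a stub class stands in for DokuWiki's metadata renderer and a plain function stands in for the plugin's render() method; both, and the 'plugin_example' key, are hypothetical.

```php
<?php
// Stub standing in for DokuWiki's metadata renderer: only the two public
// arrays a syntax plugin typically touches are modelled.
class StubMetadataRenderer
{
    public array $meta = [];        // current metadata
    public array $persistent = [];  // metadata kept across rendering runs
}

// Stand-in for a syntax plugin's render() method (hypothetical plugin).
function example_render(string $mode, object $renderer, array $data): bool
{
    if ($mode !== 'metadata') {
        return false; // other modes (e.g. xhtml) are not handled in this sketch
    }
    // Plugin-private data under its own top-level key.
    $renderer->meta['plugin_example']['color'] = $data['color'];
    // Dublin Core style key that other code may read.
    $renderer->meta['subject'][] = $data['tag'];
    return true;
}

$r = new StubMetadataRenderer();
example_render('metadata', $r, ['color' => 'blue', 'tag' => 'howto']);
```

Writing to $renderer->meta rather than $renderer->persistent keeps the data non-persistent, so it is recomputed on every rendering run.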

The metadata renderer also creates a short raw-text abstract. The abstract is built from the rendered instructions by appending compact text, without HTML, to $this->doc. Use $this->capture to check whether the renderer is still collecting text for the abstract.

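// Fragment from a link-handling method of the metadata renderer;
// $linktitle and $url are set earlier in that method.
// Text is only added while the abstract is still being captured:
// the renderer switches $this->capture off, e.g. once enough text
// for the abstract has been collected.
if ($this->capture) {
    if ($linktitle) {
        // use the link title as plain text
        $this->doc .= $linktitle;
    } else {
        // no title: fall back to the URL in angle brackets
        $this->doc .= '<' . $url . '>';
    }
}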
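The capture pattern can be modelled outside DokuWiki with a small class. The default limit of 250 characters and the rule of switching capture off at the end of a paragraph are assumptions made for this sketch, as are all names; the point is to show why every output method checks $this->capture before appending to $this->doc.

```php
<?php
// Toy model of abstract capturing (not the real metadata renderer).
class AbstractCollector
{
    public string $doc = '';
    public bool $capture = true;
    private int $limit;

    public function __construct(int $limit = 250) // limit is an assumption
    {
        $this->limit = $limit;
    }

    // Plain text from the page: only kept while capturing.
    public function cdata(string $text): void
    {
        if ($this->capture) {
            $this->doc .= $text;
        }
    }

    // At the end of a paragraph, stop capturing once the abstract is long enough.
    public function p_close(): void
    {
        if (!$this->capture) {
            return;
        }
        if (strlen($this->doc) > $this->limit) {
            $this->capture = false;
        } else {
            $this->doc .= "\n";
        }
    }
}

$c = new AbstractCollector(10);
$c->cdata('First paragraph text.'); // 21 characters, over the limit of 10
$c->p_close();                      // capture switches off here
$c->cdata('Second paragraph.');     // ignored
```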

Metadata rendering therefore does not run at the same time as the xhtml renderer; instead it depends on the render flag passed to p_get_metadata() and on the cache status. The aim is to make sure the metadata renderer runs when needed, but not unnecessarily. Read more about render flags in Functions to Get and Set Metadata below.

Metadata and Plugins

There are two ways for plugins to interact with metadata rendering:

Persistent metadata can also be set at any time using the p_set_metadata() function described below. Current metadata, however, should only be set in the context of the renderer, as it will be overwritten the next time metadata is rendered.
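Whether a value set from outside the renderer survives the next rendering run depends on whether it also lands in the persistent array. The function below is a simplified, hypothetical model of that rule, assuming (as with p_set_metadata()'s default) that values are persistent unless stated otherwise. The real function also merges some nested keys instead of overwriting them, which is left out here.

```php
<?php
// Simplified model of setting metadata outside the renderer (hypothetical).
// Current always receives the value; persistent only when $persistent is true.
function model_set_metadata(array $meta, array $data, bool $persistent = true): array
{
    foreach ($data as $key => $value) {
        $meta['current'][$key] = $value;
        if ($persistent) {
            $meta['persistent'][$key] = $value;
        }
    }
    return $meta;
}

$meta = ['current' => [], 'persistent' => []];
$meta = model_set_metadata($meta, ['plugin_example' => ['rating' => 4]]);
$meta = model_set_metadata($meta, ['plugin_example_tmp' => 'x'], false);
// Only 'plugin_example' is persistent; the temporary key would be
// dropped the next time metadata is rendered.
```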

Metadata can be retrieved using the p_get_metadata function that is described below. Plugins can also add metadata to the metadata index and search the indexed metadata. This is used in the tag plugin.

Note that persistent metadata is never cleaned up and always serves as the basis for the current metadata. So when your plugin switches from persistent to non-persistent metadata, make sure you implement a cleanup routine that removes your plugin's persistent metadata wherever it exists. For this reason, non-persistent metadata should be preferred whenever possible.
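A cleanup routine of the kind described above could be as small as removing the plugin's key from the persistent array (and from current, so the change takes effect immediately). The sketch works on the plain array and the key name is hypothetical; in a plugin the array would have to be loaded and saved per page, for which inc/parserutils.php provides p_read_metadata() and p_save_metadata().

```php
<?php
// Remove a plugin's leftover metadata from one page's metadata array.
// Returns the cleaned array and whether anything had to be removed.
function cleanup_plugin_meta(array $meta, string $pluginKey): array
{
    $changed = false;
    foreach (['persistent', 'current'] as $part) {
        if (isset($meta[$part]) && array_key_exists($pluginKey, $meta[$part])) {
            unset($meta[$part][$pluginKey]);
            $changed = true;
        }
    }
    return [$meta, $changed];
}

$meta = [
    'current'    => ['title' => 'Start', 'plugin_example' => ['old' => 1]],
    'persistent' => ['plugin_example' => ['old' => 1]],
];
[$meta, $changed] = cleanup_plugin_meta($meta, 'plugin_example');
```

Only writing the page's metadata back when $changed is true avoids touching pages the plugin never used.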

To make sure your plugin's metadata doesn't interfere with other plugins or DokuWiki itself, consider using plugin_$plugin as a prefix or top-level key. This matters especially for persistent metadata; current metadata that fits the Dublin Core element set should be stored as outlined above.

It is very difficult to cleanly update persistent metadata properties that are arrays from various places: in most cases you don't know which entries are old metadata that should be cleaned up and which come from other plugins and should be kept (or not, because that plugin was disabled). In this case, consider using keys that are unique to your plugin and merging them manually into the current metadata using the PARSER_METADATA_RENDER event. That way you can, for example, store custom tags in the persistent metadata and add them to the subject metadata. Your plugin's metadata will then also stop being used when your plugin is disabled.
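The merge step could look like the function below: tags kept under a plugin-specific persistent key are added to the current 'subject' list without duplicates. It is a sketch on plain arrays; the key names are hypothetical, and hooking it into PARSER_METADATA_RENDER (including where the arrays live on the event object) is not shown.

```php
<?php
// Merge tags stored under a plugin-specific persistent key into the
// current Dublin Core 'subject' list (hypothetical key names).
function merge_plugin_tags(array $meta, string $pluginKey = 'plugin_example'): array
{
    $tags = $meta['persistent'][$pluginKey]['tags'] ?? [];
    if (!$tags) {
        return $meta; // nothing stored by this plugin
    }
    $subject = (array) ($meta['current']['subject'] ?? []);
    // array_values() re-indexes after array_unique() drops duplicates
    $meta['current']['subject'] = array_values(array_unique(array_merge($subject, $tags)));
    return $meta;
}

$meta = [
    'current'    => ['subject' => ['howto']],
    'persistent' => ['plugin_example' => ['tags' => ['howto', 'metadata']]],
];
$meta = merge_plugin_tags($meta);
```

Because the merge runs each time metadata is rendered, the tags stop appearing in 'subject' as soon as the merge no longer runs, which is the benefit described above.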

Functions to Get and Set Metadata

There are two functions in inc/parserutils.php to deal with metadata:

Metadata and caching

In general, metadata is rendered on demand when p_get_metadata() is called. This normally happens right after the redirect that follows saving a page, but also from time to time: when the cache expires, when a plugin expires it using the PARSER_CACHE_USE event, or when caching has been disabled in the renderer (but at most once per request). The metadata cache file itself stores only a timestamp. The timestamp is updated whenever metadata is rendered, but the .meta file only when the metadata actually changed. Since the xhtml cache depends on the .meta file, the xhtml is then only re-rendered when really needed.
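The rule of writing the .meta file only when something changed can be sketched as a comparison of the serialized data before writing. The function name and the byte-for-byte comparison are assumptions for this illustration, not necessarily how the core implements it.

```php
<?php
// Write serialized metadata only if it differs from what is on disk.
// Returns true when the file was (re)written. Hypothetical helper.
function save_meta_if_changed(string $file, array $meta): bool
{
    $new = serialize($meta);
    if (is_file($file) && file_get_contents($file) === $new) {
        return false; // unchanged: keep the file and its modification time
    }
    file_put_contents($file, $new);
    return true;
}

$file   = sys_get_temp_dir() . '/example_' . uniqid() . '.meta';
$first  = save_meta_if_changed($file, ['current' => ['title' => 'A']]);
$second = save_meta_if_changed($file, ['current' => ['title' => 'A']]);
$third  = save_meta_if_changed($file, ['current' => ['title' => 'B']]);
unlink($file);
```

Leaving the file untouched matters because the xhtml cache depends on it: an unchanged modification time lets that cache stay valid.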

When metadata is requested inside the cache handler, the old metadata is returned, so you can compare new data with the stored metadata to decide whether to use the cache. In the xhtml cache handler you get the new metadata, but because the xhtml cache depends on the metadata, the xhtml is updated whenever the metadata changes.

In versions prior to 2011, metadata was only rendered when the xhtml was rendered. Back then you got the old metadata in the xhtml cache handler; plugins that still rely on this need to be updated.

Metadata index

Since the 2011-05-25 (“Rincewind”) release there is an index in which metadata properties can be stored. It is organized similarly to the fulltext index and uses the same page list, but separate word indexes for each indexed metadata property, named $metaname_w.idx, $metaname_i.idx and $metaname_p.idx. DokuWiki itself currently indexes the properties relation_references and title. Plugins can add their own metadata keys, and it is also possible to add arbitrary data to the index; this is done with the INDEXER_PAGE_ADD event. Plugins need to make sure they add themselves to the indexer version using the INDEXER_VERSION_GET event: the index of a page is re-created when this version differs from the version it was previously indexed with. All metadata indexes are recorded in the metadata.idx index, so deleted pages can be removed from all of them.
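The relationship between the three files of one metadata index can be modelled in memory. In this toy version, $words plays the role of <metaname>_w.idx (distinct values), $index of <metaname>_i.idx (value to pages) and $pageValues of <metaname>_p.idx (page to values, which makes removing a page cheap). This mapping is an interpretation for illustration only; the class is hypothetical, uses page IDs instead of positions in the shared page list, and stores nothing on disk.

```php
<?php
// Toy in-memory model of a single metadata index (hypothetical class).
class ToyMetaIndex
{
    private array $words = [];      // like *_w.idx: value id => value
    private array $index = [];      // like *_i.idx: value id => [page => true]
    private array $pageValues = []; // like *_p.idx: page => [value id, ...]

    // Replace all values stored for $page with $values.
    public function setPageValues(string $page, array $values): void
    {
        $this->removePage($page);
        foreach (array_unique($values) as $value) {
            $vid = array_search($value, $this->words, true);
            if ($vid === false) {
                $vid = count($this->words); // ids are never reused
                $this->words[$vid] = $value;
            }
            $this->index[$vid][$page] = true;
            $this->pageValues[$page][] = $vid;
        }
    }

    // Drop a page using the reverse (page => values) mapping.
    public function removePage(string $page): void
    {
        foreach ($this->pageValues[$page] ?? [] as $vid) {
            unset($this->index[$vid][$page]);
        }
        unset($this->pageValues[$page]);
    }

    // Pages that have exactly $value stored for this property, sorted.
    public function pagesWith(string $value): array
    {
        $vid = array_search($value, $this->words, true);
        if ($vid === false) {
            return [];
        }
        $pages = array_keys($this->index[$vid] ?? []);
        sort($pages);
        return $pages;
    }
}

// One index per property, e.g. for relation_references:
$refs = new ToyMetaIndex();
$refs->setPageValues('wiki:syntax', ['start', 'wiki:dokuwiki']);
$refs->setPageValues('playground', ['start']);
$refs->setPageValues('wiki:syntax', ['wiki:dokuwiki']); // page edited: link to start removed
```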

The metadata index is updated right after the fulltext index, so it can be regenerated in the same way. When a plugin wants to force an update of the index of a certain page, it can delete the .indexed meta file of that page (the index is not updated automatically when metadata changes, only when the page itself changes).
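Forcing a re-index therefore comes down to deleting one file. The helper below is a hypothetical sketch that assumes the default data/meta layout and converts the ID with a plain ':' to '/' replacement; inside DokuWiki the path would rather be built with metaFN($id, '.indexed').

```php
<?php
// Delete the .indexed marker of a page so the indexer treats it as not
// yet indexed (hypothetical helper, simplified ID-to-path mapping).
function force_reindex(string $id, string $metaDir): bool
{
    $marker = rtrim($metaDir, '/') . '/' . str_replace(':', '/', $id) . '.indexed';
    if (!is_file($marker)) {
        return false; // nothing to do: page is not marked as indexed
    }
    return unlink($marker);
}

// Demo against a throwaway directory standing in for data/meta.
$metaDir = sys_get_temp_dir() . '/meta_' . uniqid();
mkdir($metaDir . '/wiki', 0777, true);
touch($metaDir . '/wiki/syntax.indexed');
$removed = force_reindex('wiki:syntax', $metaDir);
```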

The indexer object (which can be obtained by using idx_get_indexer) supports the following methods for metadata:

Example for getting the ids of all pages that link to a certain page:

$result = idx_get_indexer()->lookupKey('relation_references', $id);

(Note that this functionality, including an ACL check, is available as ft_backlinks($id).)

More advanced queries (like getting all values stored for a certain metadata property) may require accessing the index files directly using idx_getIndex(). Feel free to suggest additional features for the metadata index in the bug tracker.
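Getting all values of a property essentially means reading its _w.idx file. The sketch below reads such a file directly with PHP's file(), assuming one value per line and skipping empty lines; the directory, file name and sample content are made up for the demo, and inside DokuWiki idx_getIndex() would be the function to use.

```php
<?php
// Read every value stored for one metadata property from its *_w.idx file.
// Assumption for this sketch: one value per line, empty lines skipped.
function all_meta_values(string $indexDir, string $metaname): array
{
    $fn = rtrim($indexDir, '/') . '/' . $metaname . '_w.idx';
    if (!is_file($fn)) {
        return [];
    }
    $lines = file($fn, FILE_IGNORE_NEW_LINES);
    return array_values(array_filter($lines, fn ($line) => $line !== ''));
}

// Demo with a throwaway file standing in for data/index/relation_references_w.idx.
$dir = sys_get_temp_dir() . '/idx_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/relation_references_w.idx', "start\nwiki:syntax\n");
$values = all_meta_values($dir, 'relation_references');
```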

The tag plugin uses the metadata index: its helper component contains examples of how the index can be queried, and its action component shows how the index is written.