Copyright (C) by the contributors. Some rights reserved, license BY-SA.

Sponsored by the Wiki Symposium and the Nuveon GmbH.

This page (revision-108) was last changed on 24-Sep-2008 09:01 by

This page was created on 03-May-2007 01:01 by

In my experience, creole elements can be broken down into a reasonable set of //features// and processed case by case. The main feature is the creole //line type//; the remaining features are //open// and //closed// creole elements. If we treat every creole element as a //line type// element, an //open// element, or a //closed// element, it becomes easy to determine when to escape and when not to.
**Line type elements** describe the block and, depending on their actual type, can contain regular text and/or other open or closed creole elements.
Line type elements include: //heading//, //horizontal rule//, //lists// (ordered and unordered), //paragraph//, //placeholder//, //preformatted//, and //table//.
With line type elements, the starting characters can be escaped. Escaping the starting characters changes the line type itself (e.g. {{{~==}}} changes a heading into a paragraph). The contents of the block are escaped on a case-by-case basis. With preformatted, nothing is escaped, with one special case: a line consisting solely of one or more escape characters (i.e. tildes) followed by three curly braces. In such cases, following tradition, one tilde is dropped, which escapes the line and also makes it possible to represent any text inside the preformatted block. With placeholder, nothing is escaped. Perhaps it should have a special case like preformatted?
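A minimal Python sketch of the two rules above, under stated assumptions. The names {{{line_type}}} and {{{unescape_preformatted}}} are hypothetical, the list of line starters is simplified (it ignores leading whitespace and the bold versus list ambiguity), and "three curly braces" is assumed to mean the closing braces. It is not taken from the Ragel parser mentioned below.

```python
import re

def line_type(line):
    """Classify one line by its starting characters (simplified assumption)."""
    if line.startswith("~"):
        # An escaped starter such as "~==" demotes the line to a paragraph.
        return "paragraph"
    if line.startswith("="):
        return "heading"
    if line.startswith("----"):
        return "horizontal rule"
    if line.startswith(("*", "#")):
        return "list"
    if line.startswith("|"):
        return "table"
    return "paragraph"

# A line made only of one or more tildes followed by three closing braces.
ESCAPED_CLOSER = re.compile(r"~+\}\}\}")

def unescape_preformatted(lines):
    """Drop one tilde from each escaped closer line; leave other lines alone."""
    return [ln[1:] if ESCAPED_CLOSER.fullmatch(ln) else ln for ln in lines]
```

For example, {{{line_type("~== Title")}}} yields a paragraph, and a preformatted line holding a single tilde followed by the three braces comes out as the three braces alone.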
**Open elements** contain regular text and other open or closed creole elements.
Open elements include: //bold//, //italics//, and //table// (cell separators).
With open elements, all regular text is subject to escaping. A tilde followed by a non-alphanumeric character that is not a tab or a space (i.e. one matching {{{[^\t 0-9A-Za-z]}}}) is dropped, and the following character loses any special meaning.
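The tilde rule for open elements can be sketched with a single regular expression. This is an illustration only: the function name {{{unescape_text}}} is hypothetical, and the sketch only drops the tilde. A real parser would also have to stop the following character from being read as markup.

```python
import re

# A tilde followed by one character that is not a tab, a space,
# a digit, or an ASCII letter (the character class given above).
ESCAPE = re.compile(r"~([^\t 0-9A-Za-z])")

def unescape_text(text):
    """Drop each escaping tilde and keep the character it escapes."""
    return ESCAPE.sub(r"\1", text)
```

Under this rule a tilde before a letter, digit, tab, or space is left alone, so a name such as {{{file~1.txt}}} passes through unchanged. Note that the character class as written also matches a newline.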
**Closed elements** contain regular text, optional modifiers generally followed by regular text, and the necessary closing characters or end of line or file.
Closed elements include: //links// (regular and free standing external links), //image//, //nowiki//, and //placeholder//.
With closed elements, nothing is escaped.
When a series of closing characters exceeds the minimum requirement (e.g. {{{]]]}}}), only the final characters are used to close the element. This technique allows natural nesting of special characters to achieve results which might otherwise require escaping (e.g. {{{[[Home|[{{home.jpg|{Home!~}}}]]]}}} produces {{{<a href="Home">[<img src="home.jpg" title="{Home!}" />]</a>}}}). //Note: It was necessary to escape the example to produce the desired effect here.//
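One way to sketch the "final characters close the element" rule is a closer that must not be followed by another closing character. The helper {{{split_closed}}} below is hypothetical and simplified: it expects the element at the start of the string, splits at the first pipe, and does not handle elements closed by end of line or file.

```python
import re

def split_closed(markup, opener, closer):
    """Return (reference, modifier) for a closed element, or None.

    When a run of closing characters is longer than needed, the
    lookahead pushes the match to the last characters of the run.
    """
    last = re.escape(closer[-1])
    pattern = (re.escape(opener) + r"(.*?)" + re.escape(closer)
               + r"(?!" + last + r")")
    match = re.match(pattern, markup, flags=re.DOTALL)
    if match is None:
        return None
    # The reference comes first, so the first pipe separates the modifier.
    reference, _, modifier = match.group(1).partition("|")
    return reference, modifier

link = split_closed("[[Home|[{{home.jpg|{Home!}}}]]]", "[[", "]]")
image = split_closed("{{home.jpg|{Home!}}}", "{{", "}}")
```

Here {{{link}}} splits into the reference {{{Home}}} and a link text that still ends in its own square bracket, and {{{image}}} splits into {{{home.jpg}}} and the title {{{{Home!}}}}, which matches the HTML shown in the example above.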
I don't believe pipe is legal in filenames and URLs (please correct me if I'm wrong!), so it shouldn't present any issue here. Image sources and link references are always specified first in their respective creole elements, so pipe works as expected as the separator for the optional modifier and does not require escaping.
This escape design provides a simple, safe, and effective escaping mechanism which does not force authors to change important filenames and URLs to avoid accidentally escaping them. Of course, this idea depends a lot on the available parsing tools. It is implemented and working well in my Ragel-based creole parser.
I must admit that I tend to overcomplicate things. If there's a better way to escape without forcing authors to change important filenames and URLs, then let's do that. If I need to change my thinking around this issue, then please help me with it. //Actually, now that I've gone and written all this, it looks way too complicated! I'll respond later.//
[[MarkWharton]] 2007-06-07
|=Version|=Date Modified|=Size|=Author|=Change note|
|108|24-Sep-2008 09:01|47.251 kB| | |
|107|24-May-2008 21:26|47.247 kB| |escape characacter in url answer|
|106|23-May-2008 19:53|46.839 kB| |Escape characters in URLs considered bad|
|105|19-May-2008 14:58|46.067 kB|StephenDay|answer|
|104|18-May-2008 17:23|45.863 kB| |Added discussion on TT tag|
|103|14-May-2008 08:30|45.52 kB|BenKovitz|Thanks for the clarification|
|102|13-May-2008 20:11|45.29 kB|StephenDay|answer|
|101|12-May-2008 22:45|45.05 kB|BenKovitz|Is the example with two opening braces correct?|