Copyright (C) by the contributors. Some rights reserved, license BY-SA.

Sponsored by the Wiki Symposium and the Nuveon GmbH.

I have been thinking more about escape. I have started leaning towards the "Escape character should be core" idea (in conjunction with "Escape character should escape a single character" and "Escape character should be ~"). I think it fits well with any future expansion of Creole. However, my real concern is what to do with preformatted block content and external links containing escape characters which are not rendered in the final output. For me, escaping whole Creole markup sequences is not a broad enough solution and imposes hidden constraints and limits future expansion. It was an excellent idea, but I don't believe it's the right way to go. Either we have all or we have nothing.

In the meantime, the simple escape mechanism proposed here for preformatted blocks, I believe, is both adequate and safe. If we don't end up with an escape character then this is the best alternative, without a doubt. Having said that, I will endeavour to work through my escape thinking over the next few weeks and input into any escape arguments.

-- MarkWharton 2007-05-10

We changed the two angle brackets to three angle brackets, to make it easier to write plugin syntax (see GenericExtensionElementProposal, BlockMarkupNotionCriticism). Placeholder syntax is generated by the wiki engine, while plugin syntax has to be written by users. I think this last small change will not influence any implementation since nobody uses placeholders so far, and on the other hand will make it easier to evolve creole additions in a way that is consistent with the goals of fast to type and readable markup.

-- ChristophSauer, 2007-Jun-01 15:54 (CEST)

The escape character should become part of the core Creole 1.0. Adding a simple and generic escaping rule now isn't that big a deal. However, adding it later (or making it optional) will cause some headaches, I think...

-- OliverHorn 2007-06-05

I agree. With you now we have a majority ;-) Let's add it. Let's replace Creole1.0#section-Creole1.0-EscapingPreformattedNowiki with the more generic escape character rule from Creole Additions. Radomir? Yves?

-- ChristophSauer, 2007-Jun-06 10:19 (CEST)

I still don't like having an escape character. If we must have one, however, I prefer Radomir's mechanics, as explained on Talk.Escape Character Decision.

-- AlexSchroeder

Yes, I still agree with an escape character in Creole Core which escapes the next non-alphanumeric character only, everywhere but in inline nowiki and block preformatted. Wrt Radomir's mechanics, everything is fine, except that I wouldn't convert tilde+newline to forced newline; trailing invisible blanks would have an effect very difficult to track down for most authors. GNU make has this problem, which is a real nuisance imo.

Ideally, what's between angle brackets for plugins and placeholders should be better defined (escape, quotes, etc.), but I'm afraid we won't easily reach a consensus. I'm most probably going to have block plugins similar to block preformatted where nothing is interpreted except for the left-aligned end mark, and inline plugins where the first occurrence of right angle brackets marks their end (with "block" and "inline" I mean in the Creole text, not in the parser output); i.e. I'll keep my current implementation, adding basic support for placeholder's triple angle brackets. So the escape character wouldn't have an effect there either.

-- YvesPiguet, 2007-Jun-06

1) (view wiki source to see the correct markup :-) ) Inconsistency for "Nowiki (Preformatted)". The text says "As a block, the three curly braces should be on one line by itself to open and another line of three curly braces should be on a line by itself to close. In a block, characters are displayed in monospace. For inline nowiki text, wiki implementers can decide whether to display this text regularly or in monospace."

The example says: "Some examples of markup are: ** <i>this</i> ** "

Is it ok for inline nowiki to be formatted like this?

2) Really no definition lists? ;-)

-- MaxVoelkel, 2007-Jun-06

1) Why would it not be ok for inline nowiki to be formatted like that?

2) Definition lists can be added to Creole Additions.

-- ChuckSmith, 2007-Jun-06

2) Done.

-- YvesPiguet, 2007-Jun-06

From my experience it is possible to break creole elements down to a reasonable set of features for processing on a case by case basis. The main feature is the creole line type, the remaining features are open and closed creole elements. If we treat creole elements as either line type elements, open elements, or closed elements it then becomes possible to easily determine when to escape and when not to escape.

Line type elements describe the block and, depending on their actual type, can contain regular text and/or other open or closed creole elements.

Line type elements include: heading, horizontal rule, lists (ordered and unordered), paragraph, placeholder, preformatted, and table.

With line type elements, starting characters are escaped. Escaped line type elements change the line type itself (e.g. ~== changes from heading to paragraph). The contents of the block are escaped on a case by case basis. With preformatted, nothing is escaped. However, there is a special case for one or more escape characters (i.e. tilde) followed by three curly braces on a line by themselves. In such cases, following tradition, one tilde will be dropped to effectively escape and also allow representing any possible text inside the preformatted block. With placeholder, nothing is escaped. Perhaps it should have a special case like preformatted?
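The preformatted special case can be sketched in Python as follows. This is only an illustration, not part of any spec: the function name `read_preformatted` is hypothetical, and it assumes that the three curly braces in question are the closing `}}}` and that lines carry no trailing whitespace.

```python
import re

# One or more tildes followed by the closing marker, alone on the line.
ESCAPED_CLOSE = re.compile(r"~+\}\}\}")


def read_preformatted(lines):
    """Consume the lines that follow an opening '{{{' line.

    Returns (body_lines, remaining_lines). Nothing inside the block is
    escaped, except that a line made of tildes plus '}}}' loses one tilde.
    """
    body = []
    for i, line in enumerate(lines):
        if line == "}}}":
            # Unescaped closing marker: the block ends here.
            return body, lines[i + 1:]
        if ESCAPED_CLOSE.fullmatch(line):
            # Escaped closing marker: drop exactly one tilde, keep the rest.
            line = line[1:]
        body.append(line)
    # Assumption: an unclosed block runs to the end of the input.
    return body, []
```

With this approach a literal `}}}` line is written as `~}}}`, and a literal `~}}}` line as `~~}}}`, so any text can be represented inside the block.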

Open elements contain regular text and other open or closed creole elements.

Open elements include: bold, italics, and table (cell separators).

With open elements, all regular text is escaped. A tilde followed by a non-alphanumeric character which is not a tab or a space (e.g. [^\t 0-9A-Za-z]) will drop the tilde and remove any special meaning from the following character.
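A minimal Python sketch of this rule, using the character class given above. The names `ESCAPE` and `split_escapes` and the `"text"`/`"literal"` token labels are made up for the illustration; a real parser would feed the `"text"` pieces on to its markup recognition and emit the `"literal"` characters untouched.

```python
import re

# A tilde followed by one character that is not a tab, a space,
# a digit or an ASCII letter.
ESCAPE = re.compile(r"~([^\t 0-9A-Za-z])")


def split_escapes(text):
    """Split text into ("text", str) and ("literal", char) tokens.

    The escaping tilde is dropped; the character it protected is
    returned as a "literal" token so later stages do not treat it
    as markup.
    """
    tokens = []
    pos = 0
    for match in ESCAPE.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("literal", match.group(1)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens
```

For example, in `~**not bold` only the first asterisk becomes a literal, which is enough to break the `**` sequence, while `~user` and `a ~ b` come back unchanged because the tilde is followed by a letter or a space.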

Closed elements contain regular text, optional modifiers generally followed by regular text, and the necessary closing characters or end of line or file.

Closed elements include: links (regular and free standing external links), image, nowiki, and placeholder.

With closed elements, nothing is escaped.

When a series of closing characters exceeds the minimum requirement (e.g. ]]]), only the final characters are used to close the element. This technique allows natural nesting of special characters to achieve results which might otherwise require escaping (e.g. [[Home|[{{home.jpg|{Home!}}}]]] produces <a href="Home">[<img src="home.jpg" title="{Home!}" />]</a>). Note: It was necessary to escape the example to produce the desired effect here.
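The greedy closing rule might be sketched in Python like this. The helper name `find_close` and its parameters are assumptions made for the example, not taken from any existing parser.

```python
def find_close(text, start, ch, count):
    """Locate the end of a closed element whose content begins at `start`.

    `ch` is the closing character and `count` the minimum number needed
    (e.g. "]" and 2 for links). If the run of closing characters is longer
    than `count`, only the final `count` characters close the element and
    the surplus stays in the content.

    Returns (content, index_after_close), or None if there is no closer.
    """
    i = text.find(ch * count, start)
    if i == -1:
        return None
    end = i + count
    # Extend over any further closing characters in the same run.
    while end < len(text) and text[end] == ch:
        end += 1
    return text[start:end - count], end
```

Applied to the example above, `find_close("[[Home|[{{home.jpg|{Home!}}}]]]", 2, "]", 2)` yields the content `Home|[{{home.jpg|{Home!}}}]`, and running it again on the image part with `"}"` yields `home.jpg|{Home!}`.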

I don't believe pipe is legal in filenames and URLs (please correct me if I'm wrong!) and therefore doesn't present any issue here. Image source and link references are always specified first in their respective creole elements so pipe as an optional modifier works as expected, and does not require escaping.

This escape design provides a simple, safe and effective escaping mechanism which does not force authors to change important filenames and URLs to avoid accidentally escaping them. Of course, this idea depends a lot on the available parsing tools. It is implemented and working well in my Ragel based creole parser.

I must admit that I tend to over complicate things. If there's a better way to escape without forcing authors to change important filenames and URLs then let's do that. If I need to change my thinking around this issue then please help me with it. Actually, now that I've gone and written all this, it looks way too complicated! I'll respond later.

-- MarkWharton 2007-06-07

OK, so I have gone through the current spec and notes on escape character etc. Everything looks mostly fine with the current spec in regards to the escape character. However, I feel there are a small number of issues which still need to be clarified...

Escaping Nowiki and Preformatted

The current spec implies escaping nowiki and preformatted, however this will not work in practice, particularly with preformatted. I believe the original ideas presented in AddNoWikiEscapeProposal for preformatted are still useful. Therefore please consider the following.

Preformatted escape specification requires:

  1. a similar special case for preformatted, or
  2. to specify that preformatted cannot be escaped.
(The special case for preformatted being my preference.)

Nowiki escape specification requires:

  1. to specify greedy closure,
  2. allow full escaping as is possible elsewhere in a wiki document,
  3. allow minimum escaping of the nowiki close character and the escape character itself, or
  4. to specify that nowiki cannot be escaped.
(I have no particular preference, it just needs to be clear. Maybe 1 or 3, if I had to choose.)

Escaping Placeholders and Plugins etc.

Similar issues. Should the same rules for nowiki and preformatted apply?

Apologies for being such a late comer to the escape discussions. Obviously, I should have followed the original escape discussions more closely! It would be great to get some feedback here. ;-) A simple discussion could resolve these issues. I would change the spec directly but I feel it could lead to discontent.

-- MarkWharton 2007-06-09

I have difficulty understanding exactly where this is leading. I don't know what the more generic escape character rule from Creole Additions mentioned above by Christoph is. I thought the one-escaped-char rule would finally be adopted (I think Christoph or Chuck said so recently but I can't find it now), but it isn't in 1.0, so I don't know what will be retained in the final spec.

Basically I share all the concerns of Mark; I just didn't know I had to have such concerns. What I'd propose:

  1. tildes escape one non-alphanumeric, non-blank character outside nowiki, block preformatted, plugins and placeholders, and are rendered as tildes everywhere else
  2. greedy rule for nowiki is preserved
  3. end marker of block preformatted must be aligned to the left; so the "space escape" still works
  4. "block plugins" with lines containing only double angle brackets make it possible to embed >> in the body of the plugin.

Except for this last rule, I believed this was what Christoph and Chuck had accepted. I know I shouldn't propose the last rule here, just use it quietly in my implementation, because the time for discussion is over.

-- YvesPiguet, 2007-Jun-9

I'm confused. Can somebody rewrite the above and add some examples?

-- AlexSchroeder

Alex, on the poll page you are proposing that the escape character should have no effect inside URLs. I'm against such a proposal because a) it complicates things, and b) it is not needed.

Nearly all occurrences of tildes in URLs are at the beginning of path segments. But then they are usually followed by an alphanumeric character and are rendered as-is with the current escape rule (which escapes only when followed by a non-alphanumeric, non-whitespace character).
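To make this concrete, here is a small Python check of which tildes in a URL would actually act as escapes under that rule. The function name `escaping_tildes` and the example.org URLs are invented for the illustration.

```python
import re

# A tilde that is followed by a non-alphanumeric, non-whitespace character.
ESCAPING_TILDE = re.compile(r"~(?=[^\s0-9A-Za-z])")


def escaping_tildes(url):
    """Return the positions of tildes in `url` that would act as escapes."""
    return [match.start() for match in ESCAPING_TILDE.finditer(url)]
```

A typical home-directory URL such as `http://example.org/~user/index.html` gives an empty list, so it is rendered as-is. Only an unusual case like `http://example.org/~/foo`, where the tilde is directly followed by a slash, would be affected.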

-- OliverHorn, 2007-06-10

Well, we have the choice between two complications: only use the tilde as escape character when followed by a non-alphanumeric and non-whitespace character, or do not use the tilde as an escape character inside URLs. I think that the second option is easier to implement and easier to understand.

-- AlexSchroeder

I personally think it should be obvious to a user that he/she should not have to escape a tilde within a URL, because most likely such addresses will just be cut and pasted anyway.

-- ChuckSmith, 2007-06-11

I tried to write a grammar-based parser for the Creole 0.6 specification. Although the specification is designed for regular-expression-based translation to HTML, I ran into some issues you might not have considered, and I hope they are of interest to you. If the 1.0 specification is to remain valid for two years, you could (or should) consider that some people will want to use a scanner/parser generated from a grammar. Why not? It offers much more potential for the future. Furthermore, you could offer a grammar instead of a prose specification. Wiki engine developers could then easily use different scanner/parser generators for different target programming languages.

"The bold/italic text will end at the end of paragraphs, list items and table cells": This implies that closing bold/italic markup is optional in a grammar, and this in turn implies that the unacceptable **//bolditalic** cannot throw an error. I think a user who can handle an escape character that only escapes in front of a non-alphanumeric and non-whitespace character can also close the markup, or at least can see on the rendered page that something is wrong.

The leading spaces before list items are user-friendly, but imo not necessary, because you get an indentation from the number of asterisks/pounds. On the other hand, why aren't they allowed before a heading? (This conflicts with usability.) Using a greater look-ahead for skipping the leading spaces decreases the performance of the parser and complicates the definition of a grammar.

Paragraphs: "A list, table or preformatted block end paragraphs too." I think what is meant is that each list, table, or preformatted block is its own paragraph, isn't it? Btw, it would be much easier to parse if a blank line always separated paragraphs.

Independent of any implementation, an escape character should be context-free. I would expect this behavior, and I think it's easier for a user to remember that there is an escape character and what it affects than to be given a specification of where it works and where it does not.

Annotations for publishing a clearer specification:

Lists: "Bold, italics, links, nowiki can be used in list items". Nowiki could be nowiki inline or nowiki block. Later this is called preformatted. It becomes more confusing when "monospace" also appears (in Tables). It would be easier to use a fixed nomenclature. The example "* This is a single list item
followed by a paragraph?" does not really fit under forced linebreaks. It should be in the Lists section to clarify the end of a list.

-- Martin Junghans, 2007-06-11

About closing bold and italics, I had implicitly put in the examples that if there is an opening double slash without a closing double slash, then it would be rendered as just a double slash without italic markup. However, since then, many developers have coded the functionality that an opening double slash would just start italics and it will automatically close at the end of the paragraph. Is this worth changing? I hate to change things now so close to when we planned 1.0 to go live, but then again, it will be frozen for 2 years. I am now going back to the spec to fix the monospace, preformatted, nowiki inconsistency.

-- Chuck Smith, 2007-Jun-12

« This particular version was published on 12-Jun-2007 10:44 by ChuckSmith.