
Copyright (C) by the contributors. Some rights reserved, license BY-SA.


I have been thinking more about escape. I have started leaning towards the "Escape character should be core" idea (in conjunction with "Escape character should escape a single character" and "Escape character should be ~"). I think it fits well with any future expansion of Creole. However, my real concern is what to do with preformatted block content and external links containing escape characters which are not rendered in the final output. For me, escaping whole Creole markup sequences is not a broad enough solution and imposes hidden constraints and limits future expansion. It was an excellent idea, but I don't believe it's the right way to go. Either we have all or we have nothing.

In the meantime, the simple escape mechanism proposed here for preformatted blocks, I believe, is both adequate and safe. If we don't end up with an escape character then this is the best alternative, without a doubt. Having said that, I will endeavour to work through my escape thinking over the next few weeks and input into any escape arguments.

-- MarkWharton 2007-05-10

We changed the two angle brackets to three angle brackets, to make it easier to write plugin syntax (see GenericExtensionElementProposal, BlockMarkupNotionCriticism). Placeholder syntax is generated by the wiki engine, while plugin syntax has to be written by users. I think this last small change will not influence any implementation since nobody uses placeholders so far, and on the other hand will make it easier to evolve creole additions in a way that is consistent with the goals of fast to type and readable markup.

-- ChristophSauer, 2007-Jun-01 15:54 (CEST)

The escape character should become part of the core Creole 1.0. Adding a simple and generic escaping rule now isn't that big a deal. However, adding it later (or making it optional) will cause some headaches, I think...

-- OliverHorn 2007-06-05

I agree. With you now we have a majority ;-) Let's add it. Let's replace Creole1.0#section-Creole1.0-EscapingPreformattedNowiki with the more generic escape character rule from Creole Additions. Radomir? Yves?

-- ChristophSauer, 2007-Jun-06 10:19 (CEST)

I still don't like having an escape character. If we must have one, however, I prefer Radomir's mechanics, as explained on Talk.Escape Character Decision.

-- AlexSchroeder

Yes, I still agree with an escape character in Creole Core which escapes the next non-alphanumeric character only, everywhere but in inline nowiki and block preformatted. Wrt Radomir's mechanics, everything is fine, except that I wouldn't convert tilde+newline to forced newline; trailing invisible blanks would have an effect very difficult to track down for most authors. GNU make has this problem, which is a real nuisance imo.

Ideally, what's between angle brackets for plugins and placeholders should be better defined (escape, quotes, etc.), but I'm afraid we won't easily reach a consensus. I'm most probably going to have block plugins similar to block preformatted where nothing is interpreted except for the left-aligned end mark, and inline plugins where the first occurrence of right angle brackets marks their end (with "block" and "inline" I mean in the Creole text, not in the parser output); i.e. I'll keep my current implementation, adding basic support for placeholders' triple angle brackets. So the escape character wouldn't have an effect there either.

-- YvesPiguet, 2007-Jun-06

1) (view wiki source to see the correct markup :-) ) Inconsistency for "Nowiki (Preformatted)". The text says "As a block, the three curly braces should be on one line by itself to open and another line of three curly braces should be on a line by itself to close. In a block, characters are displayed in monospace. For inline nowiki text, wiki implementers can decide whether to display this text regularly or in monospace."

The example says: "Some examples of markup are: ** <i>this</i> ** "

Is it ok for inline nowiki to be formatted like this?

2) Really no definition lists? ;-)

-- MaxVoelkel, 2007-Jun-06

1) Why would it not be ok for inline nowiki to be formatted like that?

2) Definition lists can be added to Creole Additions.

-- ChuckSmith, 2007-Jun-06

2) Done.

-- YvesPiguet, 2007-Jun-06

From my experience it is possible to break creole elements down to a reasonable set of features for processing on a case by case basis. The main feature is the creole line type, the remaining features are open and closed creole elements. If we treat creole elements as either line type elements, open elements, or closed elements it then becomes possible to easily determine when to escape and when not to escape.

Line type elements describe the block and, depending on their actual type, can contain regular text and/or other open or closed creole elements.

Line type elements include: heading, horizontal rule, lists (ordered and unordered), paragraph, placeholder, preformatted, and table.

With line type elements, starting characters are escaped. Escaped line type elements change the line type itself (e.g. ~== changes from heading to paragraph). The contents of the block are escaped on a case by case basis. With preformatted, nothing is escaped. However, there is a special case for one or more escape characters (i.e. tilde) followed by three curly braces on a line by themselves. In such cases, following tradition, one tilde will be dropped to effectively escape and also allow representing any possible text inside the preformatted block. With placeholder, nothing is escaped. Perhaps it should have a special case like preformatted?
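As a rough illustration of the preformatted special case described above, here is a minimal Python sketch. It assumes the closing marker is exactly three braces with no surrounding whitespace; the name preformatted_line is hypothetical and does not come from any existing Creole parser.

```python
import re

# One or more tildes followed by three closing braces, alone on the line.
ESCAPED_CLOSE = re.compile(r"~+\}\}\}")

def preformatted_line(line):
    """Classify one line inside a preformatted block.

    Returns None if the line closes the block, otherwise the text to output.
    """
    if line == "}}}":
        return None          # the real end marker
    if ESCAPED_CLOSE.fullmatch(line):
        return line[1:]      # drop exactly one tilde, keep the rest verbatim
    return line              # nothing else is escaped in preformatted
```

With this rule any text can be represented inside the block: a literal line of three braces is written with one tilde in front, and a literal tilde plus three braces is written with two.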

Open elements contain regular text and other open or closed creole elements.

Open elements include: bold, italics, and table (cell separators).

With open elements, all regular text is escaped. A tilde followed by a non-alphanumeric character which is not a tab or a space (e.g. [^\t 0-9A-Za-z]) will drop the tilde and remove any special meaning from the following character.
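The rule for regular text can be sketched in a few lines of Python. This is only an illustration built on the character class quoted above; the function name strip_escapes and the idea of returning the positions of the escaped characters are assumptions made for the example.

```python
import re

# A tilde followed by one character that is not a tab, a space or alphanumeric.
ESCAPE = re.compile(r"~([^\t 0-9A-Za-z])")

def strip_escapes(text):
    """Drop escaping tildes from regular text.

    Returns the cleaned text and the set of positions (in the cleaned text)
    whose characters were escaped and must not be treated as markup.
    """
    out = []
    literal = set()
    length = 0   # length of the cleaned text produced so far
    pos = 0
    for m in ESCAPE.finditer(text):
        chunk = text[pos:m.start()]
        out.append(chunk)
        length += len(chunk)
        literal.add(length)      # the escaped character lands here
        out.append(m.group(1))
        length += 1
        pos = m.end()
    out.append(text[pos:])
    return "".join(out), literal
```

For example, strip_escapes("~**not bold** and ~user") drops only the first tilde and marks position 0 as literal, while "~user" is left alone because the tilde is followed by a letter.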

Closed elements contain regular text, optional modifiers generally followed by regular text, and the necessary closing characters or end of line or file.

Closed elements include: links (regular and free standing external links), image, nowiki, and placeholder.

With closed elements, nothing is escaped.

When a series of closing characters exceeds the minimum requirement (e.g. ]]]), only the final characters are used to close the element. This technique allows natural nesting of special characters to achieve results which might otherwise require escaping (e.g. [[Home|[{{home.jpg|{Home!}}}]]] produces <a href="Home">[<img src="home.jpg" title="{Home!}" />]</a>). Note: It was necessary to escape the example to produce the desired effect here.
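The greedy closing technique can be sketched as follows. This is an illustrative Python sketch and not the Ragel based parser mentioned in this post; the names find_greedy_close and render_inline are hypothetical, and only links and images are handled.

```python
from html import escape

def find_greedy_close(text, start, ch, count):
    """Find the closing marker of `count` characters `ch` at or after `start`.

    If the run of closing characters is longer than `count`, only its final
    `count` characters close the element. Returns the marker index, or -1.
    """
    i = text.find(ch * count, start)
    if i < 0:
        return -1
    while i + count < len(text) and text[i + count] == ch:
        i += 1
    return i

def render_inline(text):
    """Render links and images; everything else is output as plain text."""
    out = []
    pos = 0
    while pos < len(text):
        if text.startswith("[[", pos):
            end = find_greedy_close(text, pos + 2, "]", 2)
            if end >= 0:
                target, _, label = text[pos + 2:end].partition("|")
                out.append('<a href="%s">%s</a>'
                           % (escape(target), render_inline(label or target)))
                pos = end + 2
                continue
        if text.startswith("{{", pos):
            end = find_greedy_close(text, pos + 2, "}", 2)
            if end >= 0:
                src, _, title = text[pos + 2:end].partition("|")
                out.append('<img src="%s" title="%s" />'
                           % (escape(src), escape(title)))
                pos = end + 2
                continue
        out.append(escape(text[pos]))
        pos += 1
    return "".join(out)
```

Under these assumptions, the nested example from the paragraph above yields the link, brackets and image shown there, without any escape character.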

I don't believe pipe is legal in filenames and URLs (please correct me if I'm wrong!) and therefore doesn't present any issue here. Image source and link references are always specified first in their respective creole elements so pipe as an optional modifier works as expected, and does not require escaping.

This escape design provides a simple, safe and effective escaping mechanism which does not force authors to change important filenames and URLs to avoid accidentally escaping them. Of course, this idea depends a lot on the available parsing tools. It is implemented and working well in my Ragel based creole parser.

I must admit that I tend to over complicate things. If there's a better way to escape without forcing authors to change important filenames and URLs then let's do that. If I need to change my thinking around this issue then please help me with it. Actually, now that I've gone and written all this, it looks way too complicated! I'll respond later.

-- MarkWharton 2007-06-07

OK, so I have gone through the current spec and notes on escape character etc. Everything looks mostly fine with the current spec in regards to the escape character. However, I feel there are a small number of issues which still need to be clarified...

Escaping Nowiki and Preformatted

The current spec implies escaping nowiki and preformatted, however this will not work in practice, particularly with preformatted. I believe the original ideas presented in AddNoWikiEscapeProposal for preformatted are still useful. Therefore please consider the following.

Preformatted escape specification requires:

  1. a similar special case for preformatted, or
  2. to specify that preformatted cannot be escaped.
(The special case for preformatted being my preference.)

Nowiki escape specification requires:

  1. to specify greedy closure,
  2. allow full escaping as is possible elsewhere in a wiki document,
  3. allow minimum escaping of the nowiki close character and the escape character itself, or
  4. to specify that nowiki cannot be escaped.
(I have no particular preference, it just needs to be clear. Maybe 1 or 3, if I had to choose.)

Escaping Placeholders and Plugins etc.

Similar issues. Should the same rules for nowiki and preformatted apply?

Apologies for being such a late comer to the escape discussions. Obviously, I should have followed the original escape discussions more closely! It would be great to get some feedback here. ;-) A simple discussion could resolve these issues. I would change the spec directly but I feel it could lead to discontent.

-- MarkWharton 2007-06-09

I have difficulty understanding exactly where this is leading. I don't know what the more generic escape character rule from Creole Additions, mentioned above by Christoph, actually is. I thought the one-escaped-char rule would finally be adopted (I think Christoph or Chuck said so recently but I can't find it now), but it isn't in 1.0, so I don't know what will be retained in the final spec.

Basically I share all the concerns of Mark; I just didn't know I had to have such concerns. What I'd propose:

  1. tildes escape one nonalphanumeric nonblank character outside nowiki, block preformatted, plugins and placeholders and are rendered as tildes everywhere else
  2. greedy rule for nowiki is preserved
  3. end marker of block preformatted must be aligned to the left; so the "space escape" still works
  4. "block plugins" with lines containing only double angle brackets make it possible to embed >> in the body of the plugin.

Except for this last rule, I believed this was what Christoph and Chuck had accepted. I know I shouldn't propose the last rule here, just use it quietly in my implementation, because the time for discussion is over.
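To make rule 3 concrete, here is a small Python sketch of the "space escape" for block preformatted. It assumes that a line holding only three closing braces after at least one leading space belongs to the block, and that one leading space is removed on output; the function name read_block_preformatted is hypothetical.

```python
def read_block_preformatted(lines):
    """Read one preformatted block starting at lines[0].

    Returns the content lines and the number of source lines consumed.
    """
    if not lines or lines[0] != "{{{":
        raise ValueError("block must start with a left-aligned {{{ line")
    content = []
    for i in range(1, len(lines)):
        line = lines[i]
        if line == "}}}":
            # Only a left-aligned marker ends the block.
            return content, i + 1
        if line.startswith(" ") and line.lstrip(" ") == "}}}":
            # Space escape: indented closing braces are content,
            # shown with one leading space removed.
            line = line[1:]
        content.append(line)
    return content, len(lines)   # unterminated block: everything is content
```

Since tildes play no part here, code fragments that contain tildes can be pasted into the block verbatim.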

-- YvesPiguet, 2007-Jun-9

I'm confused. Can somebody rewrite the above and add some examples?

-- AlexSchroeder

Alex, on the poll page you are proposing that the escape character should have no effect inside URLs. I'm against such a proposal because a) it complicates things, and b) it is not needed.

Nearly all occurrences of tildes in URLs are at the beginning of path segments. But then they are usually followed by an alphanumeric character and are rendered as-is with the current escape rule (which escapes only when followed by a non-alphanumeric, non-whitespace character).

-- OliverHorn, 2007-06-10

Well, we have the choice between two complications: only use the tilde as escape character when followed by a non-alphanumeric and non-whitespace character, or do not use the tilde as an escape character inside URLs. I think that the second option is easier to implement and easier to understand.
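The two options can be compared with a short Python sketch. Everything here is an assumption made for illustration: the function names, the rough URL pattern, and the reading of the second option as "a tilde escapes any following non-whitespace character, except inside URLs".

```python
import re

# Very rough pattern for a free-standing URL (assumption for this sketch).
URL = re.compile(r"[a-z][a-z0-9+.-]*://\S+")

def option_one(text):
    """Tilde escapes only a non-alphanumeric, non-whitespace character."""
    return re.sub(r"~([^\sA-Za-z0-9])", r"\1", text)

def option_two(text):
    """Tilde escapes any non-whitespace character, but never inside a URL."""
    out = []
    pos = 0
    for m in URL.finditer(text):
        out.append(re.sub(r"~(\S)", r"\1", text[pos:m.start()]))
        out.append(m.group(0))   # URLs are copied through untouched
        pos = m.end()
    out.append(re.sub(r"~(\S)", r"\1", text[pos:]))
    return "".join(out)
```

Both functions leave http://example.org/~alice/ alone. They differ on the rarer http://example.org/~_tmp/, which only the second keeps intact, and on a bare ~alice outside a URL, which only the first keeps intact.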

-- AlexSchroeder

I personally think it should be obvious to a user that he/she should not have to escape a tilde within a URL, because most likely such addresses will just be cut and pasted anyway.

-- ChuckSmith, 2007-06-11

I tried to write a grammar-based parser for the Creole 0.6 specification. Although the specification is designed for regular-expression-based translation to HTML, I ran into some issues you might not have considered, and I hope they are of interest to you. If the 1.0 specification is to be valid for two years, you could (or should) consider that some people will want to use a scanner/parser generated from a grammar. Why not? It offers much more potential for the future. Furthermore, you could offer a grammar instead of a prose specification. Wiki engine developers could then easily use different scanner/parser generators for different target programming languages.
"The bold/italic text will end at the end of paragraphs, list items and table cells": This implies that closing bold/italic markup is optional in a grammar, and this in turn implies that the unacceptable **//bolditalic** cannot throw an error. I think a user who can handle an escape character that only escapes in front of a non-alphanumeric, non-whitespace character can also close the markup, or at least can see on the rendered page that something is wrong.
The leading spaces before list items are user-friendly, but imo not necessary, because you get indentation from the number of asterisks/pounds. On the other hand, why isn’t it allowed before a heading? (This conflicts with usability.) Using a greater look-ahead to skip the leading spaces decreases the performance of the parser and complicates the definition of a grammar.
Paragraphs: "A list, table or preformatted block end paragraphs too." I think it is meant that each list, table, or preformatted block is its own paragraph, isn’t it? Btw, it would be much easier to parse if a blank line always separated paragraphs.
Independent of any implementation, an escape character should be context-free. I would expect this behavior, and I think it's easier for a user to remember that there is an escape character and what it affects than to be given a specification of where it works and where it does not.
Annotations for publishing a clearer specification:
Lists: "Bold, italics, links, nowiki can be used in list items". Nowiki could be nowiki inline or nowiki block. Later this is called preformatted. It becomes more confusing when "monospaced" also appears (in Tables). It would be easier to use consistent terminology. The example "* This is a single list item
followed by a paragraph?" does not really fit under forced linebreaks. It should be in the Lists section, to clarify where a list ends.

-- Martin Junghans, 2007-06-11

About closing bold and italics, I had implicitly put in the examples that if there is an opening double slash without a closing double slash, then it would be rendered as just a double slash without italic markup. However, since then, many developers have coded the functionality that an opening double slash would just start italics and it will automatically close at the end of the paragraph. Is this worth changing? I hate to change things now so close to when we planned 1.0 to go live, but then again, it will be frozen for 2 years. I am now going back to the spec to fix the monospace, preformatted, nowiki inconsistency.

-- Chuck Smith, 2007-Jun-12

Automatic closing on paragraph end enables one-pass parsers, without the requirement to parse complete paragraphs before generating any output. That's why I think it's a good thing.
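A minimal Python sketch of such a one-pass renderer, assuming only bold and italics exist: it emits output while reading and never looks ahead for a closing marker. The name render_paragraph is hypothetical, and the sketch ignores links, nowiki and URLs (so the double slash in http:// would be taken as italics).

```python
from html import escape

TAGS = {"**": "strong", "//": "em"}

def render_paragraph(text):
    """Yield HTML fragments for one paragraph in a single left-to-right pass."""
    open_tags = []
    yield "<p>"
    pos = 0
    while pos < len(text):
        marker = text[pos:pos + 2]
        if marker in TAGS:
            tag = TAGS[marker]
            if tag in open_tags:
                # Close this tag and anything opened inside it.
                while True:
                    closed = open_tags.pop()
                    yield "</%s>" % closed
                    if closed == tag:
                        break
            else:
                open_tags.append(tag)
                yield "<%s>" % tag
            pos += 2
        else:
            yield escape(text[pos])
            pos += 1
    # End of paragraph: whatever is still open is closed automatically.
    while open_tags:
        yield "</%s>" % open_tags.pop()
    yield "</p>"
```

Because the opening tag is emitted as soon as the marker is seen, the renderer never has to buffer the paragraph to find out whether a closing marker follows.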

-- YvesPiguet, 2007-Jun-12

Many things remain unclear in the current Creole 1.0 draft:

( I have answered you as a subpoint under each appropriate item. --Chuck Smith, 2007-Jun-12 )

  • Some block markup must be aligned on the left (headings), some must not (list items), and some lack information (block preformatted); any justification?
    • Do you mean justification as in alignment of text or as in reasoning of why we did it that way? -CS
      • I think it would be easier to remember if there is a general rule, with maybe an exception with good, explicit explanation. For instance all block markup should be left-aligned, except list items where indenting makes multilevel lists easier to read and titles where the optional end tags are on the same line. In the current draft, for headings, it's written that "whitespace is not allowed before the left-side equal signs"; for lists and horizontal rules, that "whitespace is optional before and after the * or # characters"; for tables, no indication; and for block preformatted, I guess it's more or less implicit that spaces aren't permitted before braces. -YP
  • Escape character introduction should be rewritten ("it would be useful", "it would certainly never escape"...)
  • Mix between escape+nonalphanumeric ("certainly never escape if followed by an alphanumeric character"), escape+any single character (recommendation for camelCase), escape+markup ("only trigger if you use it in combination with a character that has special meaning in creole"), and ad hoc ("Also note that tildes within URLs should not be escaped")
    • Rewrote section on escape character to make it clearer. -CS
      • Could you add that a tilde does not escape alphanumeric characters and blanks (a general rule which makes the URL stuff unnecessary) except before camelCase (a useful exception for some wikis), and remove completely the part related to existing markup? Or may I do it? That's the way I interpret all the polls and discussions. -YP
  • It isn't specified if escape characters are recognized inside nowiki and preformatted (I fear they are). What about double-tilde?
    • I think we should not interpret escape characters within nowiki except for the closing triple braces. Do you agree? If so, I can change the spec to make this clearer. -CS
      • I'd prefer no exception, since then one should add double-tilde, and all fragments where a tilde appears couldn't be used verbatim. Triple closing braces in the middle of nowiki are rare enough that one can end a nowiki fragment and start a new one (like (((some))))))(((unlikely fragment))) to be rendered as some)))unlikely fragment, where I've replaced braces with parentheses to make sure this sample will always be rendered as I want it now); tildes are much more frequent. -YP
  • If tildes aren't interpreted as escape characters in preformatted/nowiki (which I'm not the only one to wish), "removed escaping closing nowiki triple curly brackets" should be reverted
    • I think it would be best if only the closing triple curly brackets could be escaped within a nowiki and that tildes cannot escape anything else in a nowiki block. What do you think? -CS
      • Same as above; if you want to use tildes to escape braces, you must also have a way to escape tildes, and it complicates the description a lot (and the implementation, but it's less of a problem). -YP
  • "An escape character can be escaped by putting a space after it, since a space cannot be escaped": wrong, this would be rendered as tilde+space instead of a single tilde for a truly escaped tilde
    • We had agreed before that a space could escape an escape character, but perhaps this is no longer a good idea since we have now chosen the tilde instead of a backslash. Do you agree? -CS
      • You mean tilde+space would be rendered as a tilde? That's backward escaping, one more exception I'd rather avoid. -YP
  • What does "added escape character to core (as requested in Creole 1.0 Poll) with exception to URLs" mean?
    • It meant that tildes would be escape characters, but would have no effect within URLs, which was unspecified before. -CS
      • But since there are other exceptions (spaces, alphanumeric, in nowiki/preformatted), it'd be clearer not to mention it here. -YP
  • May an implementation which fixes all these issues still be named "Creole"?
    • I believe so. -CS

-- YvesPiguet, 2007-Jun-12

Yves, your recommendations sound reasonable, so I would ask that you edit the spec directly with your corrections and then we can see if anyone complains. I think if I saw your recommendations in the spec, it would help to see what you want to change. There's always the revert button, but I think you'll be conservative in your edits. Thanks! Afterwards, we'll probably drop the acceptance date back another week to give others a chance to respond.

-- ChuckSmith, 2007-June-12

Ok, thanks, I've done it. I've also rephrased optional markup interpreting in headings (slightly less negative) and removed "This is the only new markup introduced in Creole" in the placeholder section (not relevant anymore, imo).

There are still the leading spaces before the four hyphens for horizontal rules which I'm not sure about.

-- YvesPiguet, 2007-June-12

I approve your changes.

-- ChuckSmith, 2007-June-12

Chuck Smith wrote: "I had implicitly put in the examples that if there is an opening double slash without a closing double slash, then it would be rendered as just a double slash without italic markup."
Okay, I didn't find it, but is it really a good idea for markup to have several meanings depending on text that follows somewhere much later? IMHO it's the worst issue of the specification. ;-)

-- Martin Junghans, 2007-06-12

Martin, thanks for your comments. Not sure if you know but there was an early attempt to provide a context-free grammar for Creole. It was before I joined the Creole community. For whatever reasons, the original author (I believe) has written it off as a "failed attempt". It was an interesting direction but was not pursued.

On the subject of compiler tools, I was able to create a reasonable state chart for an earlier version of Creole using the Ragel State Machine Compiler. IMO, it is entirely possible to generate parsers for the current Creole specification using grammar and/or state machine definitions with these types of tools.

I'd be opposed to changing the spec now to accommodate another attempt to provide a formal grammar for Creole.

-- MarkWharton 2007-06-13

Yves, thanks for all your hard work on the spec. It looks much better! The escape character description is logical and clear. It makes sense to keep things simple, but how would you feel about adding a small note describing the greedy rule for nowiki and the "space escape" for block preformatted? I'd feel more comfortable if that was specified too.

-- MarkWharton 2007-06-13

Thanks Yves.

Generally, I'd avoid this kind of sentence, as it conflicts with the Extensible by omission goal:

"Whitespace is not allowed before the left-side equal signs."

I agree syntax can be simplified making it stricter, but implementations should be free to extend it.

-- Michele Tomaiuolo, 2007-06-13

Thanks for the kind words! I've added the escaping rules of Add No Wiki Escape Proposal. The sample code isn't rendered correctly yet by the current engine.

Michele, I did it to be consistent with other places where whitespace was specified (explicitly forbidden before headings, explicitly permitted before horizontal rules and lists).

-- YvesPiguet, 2007-Jun-13

1. Escaping recommendation for WikiWords: I think this scenario should be covered. But I don't like the current recommendation because it contradicts the general escaping rule (which says that the tilde followed by a letter does not escape!). Such discrepancies in behavior limit interoperability between Creole engines: A Creole engine with WikiWord support would render these words without the tildes whereas a pure Creole engine would render them with the tilde.

2. Escaping free-standing URLs: Shouldn't it be possible to escape free-standing URLs too to prevent them from becoming a link?

3. Closing braces in (inline) nowiki: I'm not really satisfied with the current escaping rule because it only works at the end of the nowiki text. It is not possible to include (escape) three or more closing braces anywhere within the inline nowiki text.

-- OliverHorn, 2007-06-13

1,2) Very good points. I will try to add them to the spec, then you can check to see that I added them as you wish.

3) Can you see when it would be useful to escape triple curly braces in inline text? It seems like it is such a rare case that if it were that important in a specific case, users would be able to do it in block nowiki.

-- ChuckSmith, 2007-Jun-14

3) The current ways to do it inline are with two consecutive inline nowikis or with the escape character outside nowiki. If you support a more general escape mechanism in nowiki, you either need to escape the escape character itself (painful since tildes are common in many programming languages and nowiki is especially well suited to code fragments) or to have more exceptions. I much prefer the current rules.
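The "two consecutive inline nowikis" approach can be sketched in Python as a pair of functions: one writes arbitrary text as inline nowiki spans, the other reads such spans back with the greedy closing rule. Both names are hypothetical, and the sketch assumes that nothing at all is interpreted between the opening and closing braces.

```python
import re

def to_inline_nowiki(text):
    """Return Creole source whose inline nowiki spans render `text` verbatim."""
    # Cut after every run of three or more closing braces, so that such a
    # run always sits at the end of a span, where the greedy rule keeps it.
    pieces = re.findall(r".*?\}{3,}|.+", text, re.S) or [""]
    return "".join("{{{" + piece + "}}}" for piece in pieces)

def read_inline_nowiki(source):
    """Render consecutive inline nowiki spans using greedy closing."""
    out = []
    pos = 0
    while pos < len(source):
        if not source.startswith("{{{", pos):
            raise ValueError("expected an inline nowiki span")
        close = source.find("}}}", pos + 3)
        if close < 0:
            raise ValueError("unterminated inline nowiki span")
        end = close + 3
        while end < len(source) and source[end] == "}":
            end += 1                       # greedy: take the whole run
        out.append(source[pos + 3:end - 3])  # only the last three braces close
        pos = end
    return "".join(out)
```

Tildes pass through unchanged in both directions, which is the point of keeping the escape character out of nowiki.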

-- YvesPiguet, 2007-Jun-14

I don't like the way wikis which don't support camelcase autolinks should still recognize camelcase words following tildes. I suggest the following rules:

Outside nowiki, preformatted, and URL, the escape character only escapes the character immediately following it, provided that it is not a blank (space or line feed).

The escape character disables the automatic conversion of URL to links and any similar mechanism supported by the wiki engine (camelcase wikiwords, copyright sign, etc.)

Unix paths where the tilde usually represents the home directory would have to be escaped, but this is not a big problem since they're likely to be put in inline nowiki anyway. And we can't be expected to handle automatically all situations where some syntactic construct of a language or shell might collide with Creole.

-- YvesPiguet, 2007-Jun-14

Escaping an url is easy: http~://something

-- Michele Tomaiuolo, 2007-06-14


Yves, I modified your addition to the "Escaping Nowiki" section. I agree that it is useful not to use the tilde to escape in nowiki, but using a completely different mechanism now that we have agreed upon a general escaping mechanism feels odd. I think that saying "Tilde does not escape in nowiki, except when used on three closing curly braces" is much simpler to understand; furthermore it is not new: JSPWiki uses it.

-- ChristophSauer, 2007-Jun-16 14:48 (CEST)

Michele, the URL escaping via http~:// may work but it is not very intuitive. I personally like the rule that a tilde in front of a link (i.e. a URL or a WikiWord) disables the link behavior.

To address Yves' concern about the current recommendation that engines not supporting WikiWords should still recognize them: I think that engines should behave the same whether or not they support WikiWords. But maybe we could cover these cases in a general way by a slight change to the escape rule. Why not say that a tilde at the beginning of a word always escapes? If the word is a link, then the tilde additionally disables the link behavior...

-- OliverHorn, 2007-06-16 16:33


« This particular version was published on 16-Jun-2007 16:52 by 84.61.61.168.