Copyright (C) by the contributors. Some rights reserved, license BY-SA.

There was a consensus at WikiSym 06 to use dashes for unnumbered lists, because otherwise the list markup would be confused with the markup for bold (**x**). As far as I can remember, all but Chuck and me voted for dashes. We voted against them because the asterisk is simply what almost all wiki engines use for this. Eugene pointed out that he used dashes before there was any wiki markup, and as far as I remember I did the same in simple text files. One just has to look at old readme.txt files: you often find dashes for unnumbered lists there. Since I think ** is the best markup for bold, I would go for dashes as well now.

Dashes interfere with the syntax for horizontal lines, but I think lines are not that important anyway. If we leave lines out of Creole or find another syntax for them, I would not mind.

Option one with *

== Option 1

This is a **bold** text

* element 1
* element 2
* ** this is a bold text element 3**
** this is a second level element

Option two with dash -

== Option 2

This is a **bold** text

- element 1
- element 2
- ** this is a bold text element 3**
-- this is a second level element
--Christoph 28-Aug-06

Some web browsers behave somewhat weirdly (so far I have tested only the Gecko-based ones): they convert first-level bullets to literal characters when a list is copied as text. So far I've seen '*' and '#' used, and I'm not sure what it depends on. I know it might not be very important.

As for the conflicts with bold and hr, I think the hr conflict can be resolved more easily:

** is this text bold
** or are these just two second-level list items
---- This is a fourth-level list item
----
This is a horizontal rule. It must be alone on a line.
Also note that 4th-level list items are going to be rather rare...

One problem with dashes for bullets is for lists of numbers:

- 10
- 15
- -32

Are those numbers positive or negative? This can be confusing. I know that YAML also uses dashes for lists; maybe we could learn from their experience?

-- RadomirDopieralski 2006-08-30
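
The hr rule suggested above (four dashes alone on a line are a rule, dashes followed by text are a list item) can be sketched as a tiny line classifier. This is only an illustration under that assumed rule; the function name `classify_line` and the returned tuples are made up for the example and are not part of any Creole draft.

```python
import re

# Assumed rule from the comment above: exactly four dashes alone on a line
# form a horizontal rule; a run of dashes followed by text is a list item
# whose depth is the number of dashes.
HR = re.compile(r"^\s*----\s*$")
ITEM = re.compile(r"^\s*(-+)\s*(.*\S)\s*$")

def classify_line(line):
    """Classify one line as a rule, a dash list item, or plain text."""
    if HR.match(line):
        return ("hr",)
    m = ITEM.match(line)
    if m:
        return ("item", len(m.group(1)), m.group(2))
    return ("text", line.strip())

print(classify_line("----"))
print(classify_line("---- This is a fourth-level list item"))
print(classify_line("- -32"))
```

Note that `- -32` only comes out as a first-level item containing `-32` because of the space after the first dash; without it the reading depends entirely on how greedy the bullet pattern is.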

I like the idea of dashes as a separate character from asterisks for bold. As you note this allows for bold within a list easily. Hey, I'd like the idea of different characters for different actions where possible right across Creole :-) MarkGaved 31-Aug-06

I wonder what the rationale is behind forbidding whitespace before the bullets. I can guess it's about ambiguity, but I'd like to have it written down. I can tell you that I often indent lists -- just by reflex -- and many text editors also do that automatically.

-- RadomirDopieralski, 2006-09-01

The next revision of the recommendation should be explicit about multiline list items and empty lines between list items. -- AlexSchroeder

Should there be a limit to the nesting depth of the lists? It would greatly simplify writing regular-expression-based parsers. And there is a limit to what can fit on a page anyways. -- RadomirDopieralski, 2006-09-04

I like the idea of limiting it to 4 levels. Someone noted at the workshop (I think it was Janne) that more than this is considered bad style in writing anyway.

--Christoph 4-Sep-06
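
To illustrate why a depth limit helps regular-expression-based parsers: with a fixed maximum, the bullet run can be matched with a bounded quantifier, and deeper runs simply fail to match. A minimal sketch, assuming a limit of 4 and a required space after the bullets; `MAX_DEPTH` and `parse_item` are names invented for this example.

```python
import re

MAX_DEPTH = 4  # assumed limit, as floated in the discussion above

# One to MAX_DEPTH bullets, then mandatory whitespace, then the item text.
ITEM = re.compile(r"^\s*([-#]{1,%d})\s+(\S.*)$" % MAX_DEPTH)

def parse_item(line):
    """Return (depth, text) for a list item line, or None otherwise."""
    m = ITEM.match(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)

print(parse_item("--- One Two One"))
print(parse_item("----- too deep"))
```

With these assumptions a line consisting only of `----` is not a list item either, so it stays available for the horizontal rule.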


If we choose to support multi-level lists, these examples could be useful:

Good:

- One
- Two
- Three
- One
-- One One
-- One Two
--- One Two One
-Two
- One
## One One
## One Two
--- One Two One
- Two
Some paragraph text
- One
Some text
- One
- Two

- One

Bad:

-- Zero One
- One
- One
# One
- One
--- One Zero Two
 - One
   -- One Two
- One
-- One One
Some text
-- Zero One

-- Zero One

-- Anon

Christoph convinced me that allowing leading whitespace is a good thing. When I read other blog entries, however, I see that you reached a totally different conclusion:

whitespace before not allowed
* bullet list
# number list
See http://www.blogschmog.net/blog/?p=420. What's going on?

-- AlexSchroeder

Ok, one more issue encountered when implementing mixing of the list types. How should they be mixed?

{{{
- One
## Two
--- Three
}}}
{{{
- One
-# Two
-#- Three
}}}
{{{
- One
## Two
#-- Three
}}}

Right now I require the first one -- all bullets on a line the same. I can also allow the last one -- allow any of -/# and only look at the last one to determine the list type. Enforcing the second example would require some more code.

-- RadomirDopieralski, 2006-09-06
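
The three mixing policies described above can be written down as three small functions, which also shows why the second one needs more code: it has to know the enclosing lists. This is a hedged sketch, not the parser mentioned in the comment; `LIST_TYPES` and all function names are hypothetical.

```python
LIST_TYPES = {"-": "ul", "#": "ol"}

def list_type_strict(bullets):
    """First policy: all bullets on a line must be the same character."""
    if len(set(bullets)) != 1:
        return None
    return LIST_TYPES.get(bullets[0])

def list_type_path(bullets, parents):
    """Second policy: the prefix must spell out the enclosing list types."""
    if not bullets:
        return None
    prefix = [LIST_TYPES.get(b) for b in bullets[:-1]]
    if prefix != list(parents):
        return None
    return LIST_TYPES.get(bullets[-1])

def list_type_last(bullets):
    """Third policy: any mix is accepted; only the last bullet decides."""
    return LIST_TYPES.get(bullets[-1]) if bullets else None

print(list_type_strict("##"), list_type_strict("-#"))
print(list_type_path("-#-", ["ul", "ol"]))
print(list_type_last("#--"))
```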

I like to use numbered lists for step-by-step procedures. In DokuWiki I found a shortcoming of the numbered list implementation: I cannot enter anything other than inline text into such step-by-step procedures. I would love to insert a box with a command-line example or some screen output. Whenever I try this, my numbering starts again.

There might be implications for the XML representation if we introduced some optional "numbering starts here". In FrameMaker you use a special paragraph format "first step" to set numbering to 1. The normal "step" paragraph format simply increments the step counter.

In HTML I would need something like

<ol>
<li>...
<li>... <some specific element>
<li>...
</ol>

-- Alexander von Obert 13.12.06 17:45:11
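
On the HTML side, the numbering survives as long as the box stays inside the same list item, because an `<li>` may contain block content such as `<pre>`. A small sketch of a renderer producing that shape; `render_steps` and its input format (a list of text/optional-block pairs) are assumptions made for the example and say nothing about how Creole markup would express it.

```python
from html import escape

def render_steps(steps):
    """Render (text, block) pairs as one <ol>; block may be None."""
    parts = ["<ol>"]
    for text, block in steps:
        item = "<li>" + escape(text)
        if block is not None:
            # The preformatted box lives inside the <li>, so the list
            # is not interrupted and the numbering continues.
            item += "<pre>" + escape(block) + "</pre>"
        parts.append(item + "</li>")
    parts.append("</ol>")
    return "\n".join(parts)

print(render_steps([("Run the build", "$ make"), ("Check the output", None)]))
```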

Hello Alexander, I moved your post to the bottom of the page to preserve the chronological order.

The use case you mention seems to be pretty popular, but unfortunately it is very hard to do right in a language that doesn't allow arbitrary nesting. And languages with such nesting are usually pretty tricky to learn and use.

I think that adding this amount of complexity to Creole might be unwise.

As for resetting the numbering, (X)HTML makes it really, really awkward -- you'd need to fall back to the parser keeping track of the numbers -- and at that point you might as well just do it manually.

I don't want to go against your habits, but I believe there are a number of alternative ways of putting this kind of list down on a page -- the most obvious seems to be using headings, which are, after all, designed to separate the text into sections, not merely to list some points. There are wiki engines that will even let you number the headings automatically.

In addition, meaningful titles instead of numbers make more sense in a dynamic medium like a wiki -- especially if you want to make sure that the item you refer to didn't move in the meantime...

-- RadomirDopieralski, 2006-12-13

For Creole 0.4 I'd like to bring out the issue of spaces after the bullets. The current (0.3) draft and previous specs have this ugly special case:

About unordered lists and bold: a line starting with ** (including optional whitespace before and afterwards), immediately following an unordered list element a line above, will be treated as a nested unordered list element. Otherwise it will be treated as the beginning of bold text. Also note that bold and/or italics cannot span lines in  a list.

I think it's ugly and complicates the parser needlessly. Also, many wikis already have very similar list markup, just without this special case -- making them accept both Creole and native markup at the same time would require some sort of hack (I can't even imagine it currently).

One possible way of getting rid of that special case while still keeping list markup unambiguous with respect to bold markup is to require a space after the bullet.

Now, this is a different case than with space before the bullet. There are wiki engines that don't allow space before the bullet, and those that require it -- making it optional is really the only way to make them agree.

On the other hand, no wiki engine I know prohibits the space after the bullet. Some require it.

Moreover, putting a space after most punctuation characters is a tradition, and for many people -- a reflex. I can see nothing unnatural in requiring it -- and it simplifies the parsers and the specs -- making Creole both easier to implement and to teach.

By the way, there is a (pretty ugly) hack to get a bold line even if the above special case is removed (remove the single space):

{{{
{{{}} }**bold line**
}}}
-- [RadomirDopieralski], 2006-12-14

Why not accept both (asterisks and dashes)? And it goes with the unofficial [Goals] {{{Rule of least surprise}}} and some others...

-- [EricChartre], 2006-12-28

Regarding the possible ambiguity of the asterisks, there is none (for the parser anyway) if the specs do not allow bold text to span multiple lines and require that bold text end at some point with **. Also, I **don't** think that a user would ever, on purpose, do something like:

{{{
** is this text bold
** or are these just two second-level list items
}}}

meaning 

{{{
<em> is this text bold<br />
</em> or are these just two second-level list items
}}}


However, the parser must do a look-ahead or a two-level parsing...

-- [EricChartre], 2006-12-28

I don't think there is any ambiguity in the example given above. I believe the asterisks signify strong, as it seems illogical to start a sub-list directly.

And the following would be considered list items.
{{{
* List
** SubItem 1
** SubItem 2
}}}

-- [JaredWilliams], 2006-12-30

Yes, the problem is rather with these examples:
{{{
**foo**bar**baz
**one**two
}}}

They could be parsed as:
----
**foo**bar**baz**
**one**two
----
or
----
** foo**bar**baz
** one**two**
----
or
----
** foo**bar**baz
**one**two
----
You can't really decide without infinite (unbounded) lookahead -- and that's a big problem if you need to use a ready-made parsing algorithm or parser framework -- this rules out most of the extensible, plugin-based wiki engines.

You can't just make list or bold the default here -- because there are popular use cases for both:

**Paragraph titles** are often integrated in the paragraph, like in this example. They are traditionally distinguished by making them bold. Italics won't do.

* multilevel lists
** can contain **bold** fragments

Really, I think that requiring a space after the list bullets is a simple and effective solution. And it also removes the conflict with {{{#pragma}}} and {{{# numbered list}}} for many wiki engines.

-- RadomirDopieralski, 2006-12-30
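
A minimal sketch of the "space after the bullet" rule, to show that it needs neither lookahead nor state from previous lines. It assumes optional leading whitespace and `*` / `#` as bullets; `BULLET` and `split_bullet` are names made up for the illustration.

```python
import re

# Bullets only count as list markup when followed by whitespace.
BULLET = re.compile(r"^\s*([*#]+)\s+(.*)$")

def split_bullet(line):
    """Return ("item", depth, text) or ("text", line) for a single line."""
    m = BULLET.match(line)
    if m:
        return ("item", len(m.group(1)), m.group(2))
    return ("text", line)

for line in ["**foo**bar**baz", "** foo**bar**baz", "#pragma", "# numbered list"]:
    print(split_bullet(line))
```

Under this rule `**foo**bar**baz` is always left to the inline (bold) parser and `** foo**bar**baz` is always a second-level item, regardless of what the line above looked like.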

I have my parser doing this

{{{
**foo**bar**baz
**one**two
}}}
is
{{{<div><p>
<strong>foo</strong>bar<strong>baz</strong>one<strong>two</strong>
</p></div>}}}
But
{{{
*list
**foo**bar**baz
**one**two
}}}
is
{{{<div><ul><li>list<ul>
   <li>foo<strong>bar</strong>baz</li>
   <li>one<strong>two</strong></li>
</ul></li></ul></div>}}}

Which I think covers it.

-- [JaredWilliams], 2006-12-30

How does it look in regular expressions? Something like:
{{{
(?=\n\s*\*+\s*.*)\n\s*\*+\s*(.*)
}}}
as an additional rule for the lists? Or did you just write your own algorithm and remember the state between the lines?

-- RadomirDopieralski, 2006-12-30

I don't use regular expressions.

But here is the algorithm in PHP in any case, called when the parser has seen {{{\n[-*#]}}}, with $i holding the position of the {{{[-*#]}}}.

{{{
/*
 * $text is the creole text
 * $i is the current position in $text
 * $l is the strlen($text)
 * $doc is the DOM Document
 * $node is the current position in the DOM Document
 * $listMap = array('-' => 'ul', '*' => 'ul', '#' => 'ol');
 */

// Traverse up the DOM tree, from our current position, looking for open lists.
$lists = array();
for($n = $node; $n; $n = $n->parentNode)
	if ($n->nodeName == 'ol' || $n->nodeName == 'ul')
		array_unshift($lists, $n);

// See how many lists we can match... from the $text 
$j = 0;
while (isset($text[$i + $j], $lists[$j], $listMap[$text[$i + $j]])
		&& $listMap[$text[$i + $j]] == $lists[$j]->nodeName)
	++$j;

// See how many list markers left...
$k = strspn($text, '-#*', $i + $j);
switch ($k)
{
	case 1:
		// Going a level deeper..
		if (isset($lists[$j - 1]))
			$node = $lists[$j - 1]->lastChild;
		else if ($j == 0 && $node->nodeName == 'li')
			$node = $node->parentNode;

		// Create UL or OL...
		$node = $this->insertElement($node, $listMap[$text[$i + $j]]);

		$node = $node->appendChild($doc->createElement('li'));
		$i += $j + $k;
		break;

	case 0:
		// List item of the most recent open list.
		$node = $this->insertElement($lists[$j - 1], 'li');
		$i += $j;
		break;

	default:
		// Horizontal line...
		if (strspn($text, '-', $i) >= 4)
		{
			$this->insertElement($node, 'hr');
			$i += $j + $k;
		}
		break;
}
}}}

So **foo**bar**baz doesn't get recognised as a list, as $k = 2, and gets left alone for the inline parser to interpret as <strong>. But for *list\n**foo**bar**baz, $k = 1 for both lines.

-- [JaredWilliams], 2006-12-30
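
For readers who want to trace the $j / $k logic without a DOM, here is a simplified re-telling in Python that keeps only a stack of open list types. It is a sketch of the idea, not a port of the PHP above: `step`, `leading_markers` and the returned labels are invented for the example, and what happens to the open lists on a non-list line is an assumption.

```python
LIST_MAP = {"-": "ul", "*": "ul", "#": "ol"}

def leading_markers(line):
    """Return the run of list marker characters at the start of the line."""
    n = 0
    while n < len(line) and line[n] in LIST_MAP:
        n += 1
    return line[:n]

def step(open_lists, line):
    """Process one line; return (new_open_lists, kind)."""
    markers = leading_markers(line)
    # j: how many leading markers agree with the lists already open.
    j = 0
    while (j < len(markers) and j < len(open_lists)
           and LIST_MAP[markers[j]] == open_lists[j]):
        j += 1
    k = len(markers) - j  # markers left over after the match
    if k == 1:
        # One extra marker: open a new list one level deeper.
        return open_lists[:j] + [LIST_MAP[markers[j]]], "item"
    if k == 0 and j > 0:
        # No extra marker: another item in the innermost matched list.
        return open_lists[:j], "item"
    if line.startswith("----"):
        return [], "hr"
    # Assumption: anything else is left to the inline parser unchanged.
    return open_lists, "text"

state = []
for line in ["*list", "**foo**bar**baz", "**one**two"]:
    state, kind = step(state, line)
    print(kind, state)
```

Starting from an empty stack, `**foo**bar**baz` gives k = 2 and is treated as text; after `*list` the same line gives j = 1 and k = 1 and opens a nested list, which matches the behaviour described above.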

----

As I've mentioned in [Raph's 0.4 recommendations], I'm in favor of using trailing whitespace to disambiguate second level list bullets from bold. It's simple and easy to understand. I am not in favor of "magic" algorithms to resolve the ambiguity. I think that non-local algorithms are especially undesirable for bullet lists, because they're often rearranged by cutting and pasting. Requiring trailing whitespace is also NotNew.

From what I can tell in the above tangled discussion, it's also Radomir's favored solution. It seems to me we should be able to reach consensus on this issue fairly easily. Am I off base?

-- [RaphLevien], 2007-01-07

« This particular version was published on 07-Jan-2007 23:22 by RaphLevien.