I'm getting this:
Unknown option --htmlbody
Usage: /home/www/.../web/cgi-bin/creole [options]
Filter Creole stdin and renders it to another format.
--body    naked body without header and footer
--creole  Creole output
--help    this help message
--html    HTML output (default)
--latex   LaTeX output
--rtf     RTF output
--test    test input (stdin ignored)
--text    plain text output
-- Radomir Dopieralski, 2007-Mar-06
Oops, sorry. I'd forgotten to upload one of the files I'd modified. That should be fixed now.
Thanks for your feedback!
-- YvesPiguet, 2007-Mar-06
Tilde doesn't escape pipes in tables. Also, putting the tilde before closing "=" characters of a title only escapes one of them. Escaping the pipe in a link disables the whole link (the [[ and ]] and url are still consumed though).
-- Radomir Dopieralski, 2007-Mar-06
- Tilde-pipe in tables: |abc~|def produces <table><tr><td>abc|def</td></tr></table> as it should. Do you have a counter-example?
- Tilde-closing "=" in titles: the tilde escapes one character, not the whole markup. Remaining "=" are consumed as the end-title markup (the parser doesn't care if the number isn't correct)
- Tilde-pipe in link: it's what I wanted, even if it isn't what I endorsed or documented. I'm not sure it's wise either. In my parser, all Creole markup is ignored in links, including tilde. I have to check if pipes are valid in URLs. Considering that links aren't always URLs, it's probably better to recognize tildes as escape characters also there.
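To make the title behaviour described above concrete, here is a minimal sketch in C. It is not taken from the actual parser: the function name heading_text and the rule that a tilde only escapes a following non-space character are assumptions made for illustration. It shows how a tilde protects a single "=" while the remaining unescaped trailing "=" are consumed as end-title markup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: extracts the text of a Creole heading line.
   Returns the heading level (number of leading '=').
   A tilde escapes exactly one following non-space character. */
static int heading_text(const char *line, char *out, size_t cap)
{
    int level = 0;
    size_t len = 0;   /* characters written so far */
    size_t keep = 0;  /* length up to the last character that must stay */

    assert(cap > 0);
    while (*line == '=') {          /* opening markup gives the level */
        level++;
        line++;
    }
    while (*line == ' ')            /* skip spaces after the markup */
        line++;
    while (*line != '\0') {
        int escaped = 0;
        if (*line == '~' && line[1] != '\0' && line[1] != ' ') {
            line++;                 /* drop the tilde, keep the next char */
            escaped = 1;
        }
        assert(len + 1 < cap);
        out[len++] = *line;
        /* trailing unescaped '=' and spaces are closing markup */
        if (escaped || (*line != '=' && *line != ' '))
            keep = len;
        line++;
    }
    out[keep] = '\0';
    return level;
}
```

With this sketch, the line "== A ~==" gives level 2 and the text "A =": the escaped "=" survives and the last one is consumed.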
Thanks,
-- YvesPiguet, 2007-Mar-06
I'm sorry about the pipe in tables -- indeed, I cannot replicate this now. I must have left a space between the tilde and the pipe.
Great work!
By the way, what parsing technique do you use? How many passes? Do you create a document tree or generate the output immediately?
-- Radomir Dopieralski, 2007-Mar-06
Thanks!
It's a parser written in C which performs one pass and generates outputs immediately. Here is a sketch of its main loop:
set state to "between par"
while not finished
{
  read next token (single char, or markup taking context into account)
  switch state
  {
    case ...
      switch token
      {
        case char
          if start and/or end of element, write corresponding fragment
          write char, encoding it if necessary
          change state if necessary
        case some markup token
          if start and/or end of element, write corresponding fragment
          change state if necessary
        ...
      }
    ...
  }
}
write end of element corresponding to current state, if any
Styles are pushed onto a stack and popped in such a way as to always produce matching pairs in the output. I've chosen C to be able to embed it easily into different projects, some of them running on platforms with very tight resources, such as small embedded systems or PDAs.
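The style stack idea can be sketched as follows. This is only an illustration under assumptions, not the engine's code: the names style_stack, toggle_style and render_inline are made up, only bold and italic are handled, "//" is always treated as italic markup, and characters are not HTML-encoded. When a style is closed while others are still open above it, those are closed first and reopened afterwards, so the output tags always nest properly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum style { BOLD, ITALIC, STYLE_COUNT };

static const char *open_tag[STYLE_COUNT]  = { "<strong>", "<em>" };
static const char *close_tag[STYLE_COUNT] = { "</strong>", "</em>" };

#define MAX_DEPTH 8

struct style_stack {
    enum style items[MAX_DEPTH];
    int depth;
};

/* Append s to the NUL-terminated buffer out of capacity cap. */
static void emit(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    assert(used + strlen(s) < cap);
    strcpy(out + used, s);
}

/* Open the style if it is not on the stack; otherwise close it,
   closing and reopening any styles stacked above it. */
static void toggle_style(struct style_stack *st, enum style s,
                         char *out, size_t cap)
{
    enum style reopened[MAX_DEPTH];
    int i, n = 0, pos = -1;

    for (i = 0; i < st->depth; i++)
        if (st->items[i] == s)
            pos = i;
    if (pos < 0) {                      /* not open yet: push it */
        assert(st->depth < MAX_DEPTH);
        st->items[st->depth++] = s;
        emit(out, cap, open_tag[s]);
        return;
    }
    while (st->depth > pos + 1) {       /* close styles above s */
        enum style top = st->items[--st->depth];
        emit(out, cap, close_tag[top]);
        reopened[n++] = top;
    }
    st->depth--;                        /* pop and close s itself */
    emit(out, cap, close_tag[s]);
    while (n > 0) {                     /* reopen in original order */
        enum style r = reopened[--n];
        st->items[st->depth++] = r;
        emit(out, cap, open_tag[r]);
    }
}

/* One pass over src; output is written as soon as tokens are read. */
static void render_inline(const char *src, char *out, size_t cap)
{
    struct style_stack st;
    char one[2];

    st.depth = 0;
    one[1] = '\0';
    out[0] = '\0';
    while (*src != '\0') {
        if (src[0] == '*' && src[1] == '*') {
            toggle_style(&st, BOLD, out, cap);
            src += 2;
        } else if (src[0] == '/' && src[1] == '/') {
            toggle_style(&st, ITALIC, out, cap);
            src += 2;
        } else {
            one[0] = *src++;
            emit(out, cap, one);
        }
    }
    while (st.depth > 0)                /* close whatever is still open */
        emit(out, cap, close_tag[st.items[--st.depth]]);
}
```

For the overlapping input "**a //b** c//", this sketch writes <strong>a <em>b</em></strong><em> c</em>, and an unclosed "**open" is closed at the end of input.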
As you must have guessed from the error message above, for tests, I've compiled it as a stand-alone command-line app and I run it from a simple CGI script, written in -sigh- sh.
-- YvesPiguet, 2007-Mar-06
Hmm... Maybe I should try to roll my own state machine too? The built-in regexp parser is faster in Python, though, even when I do three passes -- at least on such short input as wiki pages.
-- Radomir Dopieralski, 2007-Mar-06
Do you plan to keep the source closed or would you publish your code? Looking at the code of my Regexp based parser I think it could be better to use a state machine. In the beginning I planned to use one, but I must admit that I failed. My code got a bit complicated and finally I decided just to do it with RegExp. But regular expressions have limitations, so a state machine would definitely be better.
I also have an idea right now: Assuming that the state machine solves all our parsing problems (your implementation seems to be one of the best Creole parsing implementations), and the code is easily understandable: Why not implement it for all Wiki engines? The state machine could be documented in a language-independent format (e.g. UML). Your C implementation would be the working example implementation. Then it could be reimplemented in Perl, Python, Java and so on. The Creole markup would not only have its grammar, but also its documented way of parsing it. So instead of wasting time as every implementor struggles with their own implementation, everyone could work on the same parser. The more I think about it, the more I like this.
Of course you don't have to publish your code, if you don't want to. But even in this case we should focus on building the one Creole parser that works, is documented and can be implemented for all Wiki engines with reasonable effort. I'm not sure whether this approach works as well as I currently "dream" about it, but I had this idea right now and wanted to publish it.
-- Steffen Schramm, 2007-Mar-16
I'm flattered by this request, and open to it. However, I'm not certain that what I want to do with it will suit all participants. What I list as requirements on YvesPiguet would be difficult for me to negotiate. If that doesn't match Creole evolution, I'll end up with a non-Creole parser. This is a freedom I want to keep.
My implementation is 2800 lines of ISO C (C90), very easy to compile on any platform; it doesn't rely on any library. It's documented with Doxygen comments. It's still a work in progress. I won't be able to spend much time on a long-term commitment.
I'd be curious to have more opinions.
-- YvesPiguet, 2007-Mar-20
There are several options:
- Your code could be used as a reference parser and it could be improved by anyone, and also be ported to other languages
- Your code could also just be used as an example. It could help others (e.g. me), as I'd like to see how it works.
- If you do not open-source your code, I would still propose to keep the idea of writing one working parser in such a way that it can be easily adapted for all Wiki engines.
What I am currently interested in: Your code is able to convert the Creole markup into several other languages like HTML or LaTeX. I currently assume that your parser reads in the Creole markup independently of the required output format, and what is actually written out can be easily changed. Or did you write separate parsers for each of the output languages?
It could also be that it would not be that useful for others to adopt your parser, for example because their Creole parser is integrated into their engine's existing markup parser. But for JSPWiki the Creole markup is just converted by a separate page filter to normal JSPWiki markup and then rendered by the default JSPWiki parser. Not the best way, but with a flexible output format it could also be changed to output HTML directly instead of JSPWiki markup.
Some questions:
- Do you think your code could be easily adapted for other languages (Perl, Python, Java, PHP, ...)?
- Would it make sense to do this?
- Is it easy to change the parser when the input markup changes?
- Is it easy to customize the output?
-- SteffenSchramm, 2007-Mar-20
I've added a link to the Doxygen documentation of the Nyctergatis engine interface to YvesPiguet. This will make it more difficult to retract now :-) If I open-source the engine, it should be under the BSD license.
Answers to your questions:
- Java and Python, most probably. PHP, probably easy, maybe not very efficient because the engine doesn't rely on any library, so it goes down to low-level stuff such as folding CRLF sequences. Concerning Perl, based on my ancient experience, it would be possible, but I wouldn't like to do it myself... I think Perl is more suited to tasks where some of its built-in capabilities, such as regexp, can be exploited; that would require more work.
- My approach would be either to compile the engine as a standalone command-line application (as it's done now) and call it from these languages, or maybe to compile it as an extension for these languages. Maintaining separate implementations in parallel would require much more work.
- Yes, I think so. Things which require lookahead over more than a few characters would be more difficult. The engine performs one pass, writing its output directly without intermediate storage of the document (just the state, which includes nested lists and nested styles).
- Yes, very much. Output is completely factored out from parsing. It relies mainly on strings, with a few functions for character encoding, link encoding (URL), interwiki, etc, all optional.
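As an illustration of how output can be factored out from parsing, here is a minimal sketch. It is an assumption about the general technique, not the engine's actual interface: the struct output_format, its field names and render_paragraph are hypothetical. Each format is a table of string fragments plus an optional character-encoding function, and the same parsing loop serves every format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one output format. */
struct output_format {
    const char *par_begin, *par_end;
    const char *bold_begin, *bold_end;
    /* optional: encode one character; NULL means copy it as is */
    void (*encode_char)(char c, char *out, size_t cap);
};

/* Append s to the NUL-terminated buffer out of capacity cap. */
static void append(char *out, size_t cap, const char *s)
{
    size_t used = strlen(out);
    assert(used + strlen(s) < cap);
    strcpy(out + used, s);
}

/* Character encoding for HTML output. */
static void encode_html(char c, char *out, size_t cap)
{
    char one[2];

    switch (c) {
    case '<': append(out, cap, "&lt;");  break;
    case '>': append(out, cap, "&gt;");  break;
    case '&': append(out, cap, "&amp;"); break;
    default:
        one[0] = c;
        one[1] = '\0';
        append(out, cap, one);
    }
}

static const struct output_format html_format =
    { "<p>", "</p>\n", "<strong>", "</strong>", encode_html };
static const struct output_format text_format =
    { "", "\n", "*", "*", NULL };

/* Format-independent parsing of one paragraph with ** bold markup. */
static void render_paragraph(const char *src,
                             const struct output_format *fmt,
                             char *out, size_t cap)
{
    int bold = 0;
    char one[2];

    one[1] = '\0';
    out[0] = '\0';
    append(out, cap, fmt->par_begin);
    while (*src != '\0') {
        if (src[0] == '*' && src[1] == '*') {
            append(out, cap, bold ? fmt->bold_end : fmt->bold_begin);
            bold = !bold;
            src += 2;
        } else if (fmt->encode_char != NULL) {
            fmt->encode_char(*src++, out, cap);
        } else {
            one[0] = *src++;
            append(out, cap, one);
        }
    }
    if (bold)                       /* close an unterminated bold span */
        append(out, cap, fmt->bold_end);
    append(out, cap, fmt->par_end);
}
```

Calling render_paragraph with html_format or text_format on the same input changes only the fragments written, so supporting another output format in this sketch means adding one more table.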
I've added JSPWiki to the output formats supported by my sandbox...
-- YvesPiguet, 2007-Mar-20
Ok, my engine is now opensource, under the new BSD license. The doxygen documentation covers the whole source code, including the command-line application. I'll add a downloadable archive very soon. Feedback welcome, of course.
-- YvesPiguet, 2007-Mar-20
I've renamed the library "Nyctergatis Markup Engine" (NME), made its source code available in a downloadable archive and rewritten the pages @nyctergatis.com. I hope I haven't broken anything.
-- YvesPiguet, 2007-Mar-20
Just wanted to mention that I haven't forgotten NME, but plan to test it as soon as I have time.
-- SteffenSchramm, 2007-Mar-28
No problem! I'll continue improving it, so you'd better download it right before taking a look.
-- YvesPiguet, 2007-Mar-28

