Monday, May 10, 2010

Truth and web-content markup

My preference has been to work with smart languages and smart markup.  My early bias was for Prolog and I am still hopeful of using Rebol + Curl or Oz + Curl or ObjectIcon + Curl.

But the challenge of markup is marginalia - not so much notes as my preferred vertical lines, double vertical lines and lines with a horizontal "proof" bar.  Not to mention the long curly brace pointing to a question mark.

If you have tried to convert PDF to text you will know that - especially in the case of foreign languages - Adobe is no match for any vertical line running in the margin close to the text.

But the more basic issue is markup versus "plain text".  I argue that if you look at text by a poet or philosopher, there is no "plain text".  There are sections of text. One text section may relate to another through mere allusion. Arguments are not confined by paragraph indentation. Suppressed premises are critical by their very absence.

I have not wanted to retreat to text-as-array, where the text becomes enumerated "lines" or enumerated sentences. Let me offer an example with sentences:
This is what Schopenhauer wrote. And it is not so.
Is this two sentences? Consider this variant:
This is what Schopenhauer wrote.

But it is not so.  This paragraph ...
Now I have traversed both lines and paragraphs.  Matters are worse for sentences across editions and translations.
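To make the point concrete, here is a minimal sketch of the text-as-array view applied to the two variants above. The splitting rules and the helper names (`sentences`, `lines`, `paragraphs`) are my own assumptions for illustration, not anyone's standard. The same thought gets a different address depending on which array you choose.

```python
import re

def sentences(text):
    """Naive splitter (an assumption): break after . ? or ! followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def lines(text):
    """Enumerate the non-empty physical lines."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]

def paragraphs(text):
    """Enumerate blocks separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

variant_a = "This is what Schopenhauer wrote. And it is not so."
variant_b = "This is what Schopenhauer wrote.\n\nBut it is not so.  This paragraph ..."

for name, text in (("a", variant_a), ("b", variant_b)):
    # Report how many units each enumeration scheme finds in the same text.
    print(name, len(lines(text)), len(paragraphs(text)), len(sentences(text)))
```

The objection "it is not so" sits in line 1, paragraph 1 of the first variant and in line 2, paragraph 2 of the second, yet the relation between the claim and its denial is unchanged. None of the three indices records that relation.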

I was in a graduate philosophy course taught by a Quebecois philosopher - a polyglot - who was convinced that all philosophy texts reduce to a hierarchy of numbered propositions and that our task as readers was to reproduce these propositions as a sequence of hierarchical statements.  A monomania if ever I encountered one.

Even if Carnap's Aufbau reduced to such propositions, a poem does not.

Two stanzas of a poem do not map as do two equations.  And even two equations sharing identical bracketed portions relate in a manner very different from two sequential equations sharing variables or constants.

A chapter is not a proof and not likely even a single self-contained argument (compare Arendt's use of section numbers in The Human Condition to Wollheim's use of section numbers in Art and its Objects).

I continue to look at variants of reST (reStructuredText) as an alternative to hierarchical markup.

One clue may lie in marginalia itself: to embrace this form of outer-markup.  This is non-trivial.

Consider a vertical marginal line traversing sentences A,B,C and D. Suppose the intended lines were B and C.  But the end of A is "covered" as is the beginning of D. Then there is line thickness. Pen versus pencil. Erasures.
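One way to picture the A, B, C, D case is as a stand-off annotation: the stroke is recorded by the physical lines it covers, and each sentence is then classified as fully inside the stroke or merely clipped by it. This is only a sketch under assumptions of mine. The line numbers are invented, `Span` and `classify` are hypothetical names, and nothing here addresses thickness, pen versus pencil, or erasures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    label: str
    first_line: int  # physical line on which the sentence starts
    last_line: int   # physical line on which the sentence ends

def classify(stroke_first, stroke_last, spans):
    """Split sentences into those wholly beside the stroke and those only clipped by it."""
    core, edge = [], []
    for s in spans:
        overlaps = s.first_line <= stroke_last and s.last_line >= stroke_first
        if not overlaps:
            continue
        inside = stroke_first <= s.first_line and s.last_line <= stroke_last
        (core if inside else edge).append(s.label)
    return core, edge

# Invented layout: adjacent sentences share the line on which one ends and the next begins.
page = [Span("A", 1, 3), Span("B", 3, 5), Span("C", 5, 8), Span("D", 8, 10)]

core, edge = classify(3, 8, page)
print(core, edge)
```

With this layout the stroke over lines 3 to 8 yields B and C as the core and A and D as clipped edges. Whether a clipped sentence was intended is exactly what the geometry cannot say.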

In many ways this would be more of a challenge than leaving behind the one-character-per-byte assumption in moving from ASCII text to Unicode (see my posts elsewhere on the challenge faced by SWI-Prolog, Rebol and Unicon).
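For comparison, the one-character-per-byte assumption is easy to show failing. A small sketch in Python, with sample words of my own choosing and UTF-8 assumed as the encoding:

```python
# Accented letters are written as escapes so the code points are unambiguous.
samples = ["Aufbau", "Qu\u00e9b\u00e9cois"]

# Map each word to (number of characters, number of UTF-8 bytes).
counts = {word: (len(word), len(word.encode("utf-8"))) for word in samples}

for word, (chars, size) in counts.items():
    print(word, chars, size)
```

The ASCII word has as many bytes as characters. The accented one does not, so any byte offset into it stops being a character offset.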
