Tuesday, September 28, 2010

More about UDSN

Particular Cases for UDSN

Idioms and Expressions: are treated as a group of words with a single semantic id:
56342 could mean: |listen up|_1 or simply |listen up| (idiom "listen up", meaning number 1).
Note that the use of | delimiter (or any other agreed-upon delimiter) is required for idioms and expressions in UDSN.

Quotations and Escapes: can be treated like idioms and expressions, delimited the same way, but with a meaning id of 0 (zero): |"Some quoted text that is impossible to translate to UDSN or is escaped for a while. Maybe because the original author is dead or for other reasons - like laziness."|_0.
Note that the id of 0 is also required, as it signifies that the meaning is undefined at the text's TRP.
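
The two cases above can be sketched as a small tokenizer. This is only an illustration: the token grammar (a |...| group or a plain alphabetic word, each with an optional _N suffix) and the names TOKEN and tokenize are my assumptions, not part of any defined UDSN syntax.

```python
import re

# Assumed grammar: either a delimited group such as |listen up|_1,
# or a bare alphabetic word such as law_2. A missing suffix means ordinal 1.
TOKEN = re.compile(
    r"\|(?P<group>[^|]+)\|(?:_(?P<group_ord>\d+))?"   # idiom, expression, or escape
    r"|(?P<word>[A-Za-z]+)(?:_(?P<word_ord>\d+))?"     # single word
)

def tokenize(text):
    """Return a list of (text, ordinal, kind) tuples for a readable UDSN string."""
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group("group") is not None:
            ordinal = int(m.group("group_ord") or 1)
            # Ordinal 0 marks text whose meaning is undefined at the TRP.
            kind = "escape" if ordinal == 0 else "idiom"
            tokens.append((m.group("group"), ordinal, kind))
        else:
            ordinal = int(m.group("word_ord") or 1)
            tokens.append((m.group("word"), ordinal, "word"))
    return tokens
```

For example, tokenize('|listen up| to the law_2') yields the idiom "listen up" with ordinal 1, followed by three ordinary words.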

More on TRP:

  • Any TRP should provide its meaning definitions written in UDSN.
  • Any multiple-language TRP version should have a unique semantic id for the same meaning in all languages. e.g.: word_1 in English has an id of, say, 9387, the same as mot_1 in French.
  • Any multiple-language TRP version should have the same ordinal meanings for every word in all languages. e.g.: word_x in English has the same meaning as mot_x in French.
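
A minimal sketch of how a TRP might check the last two rules. The table layout, the name misaligned_ids, and the entry 9388 are hypothetical; only id 9387 with word_1 / mot_1 comes from the example above. Keying the table by semantic id gives the shared id across languages by construction, so only the ordinals need checking.

```python
# Hypothetical multilingual TRP table: semantic id -> {language: (word, ordinal)}.
ENTRIES = {
    9387: {"en": ("word", 1), "fr": ("mot", 1)},
    9388: {"en": ("word", 2), "fr": ("mot", 3)},  # invented, deliberately inconsistent
}

def misaligned_ids(entries):
    """Return the semantic ids whose meaning ordinal differs between languages."""
    bad = []
    for semantic_id, by_language in entries.items():
        ordinals = {ordinal for _word, ordinal in by_language.values()}
        if len(ordinals) > 1:
            bad.append(semantic_id)
    return sorted(bad)
```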

Advantages of UDSN:

  • Any word, sentence, paragraph, or text should become uniquely defined as a semantic value. That is to say: without ambiguities.
  • An official/public text will increase its clarity and transparency.
  • Less conflict due to misunderstandings. Fewer jobs for lawyers ;)
  • The text will be ready to be consumed by artificial intelligence or automated processors.
  • The text will have a uniquely-defined translation into other languages. Ready for correct automated translation.

Disadvantages of UDSN:

  • Time-consuming composition and/or notation.
  • Errors may appear in the process of notation.

Sunday, September 26, 2010

Proposed Uniquely-Defined Semantic Notation (UDSN)

The law was written long, long ago.

When we consult the dictionary for a word we have read and don't understand, we are often presented with a list of choices for "disambiguation". Then we have to use heuristics to choose the definition. This method does not guarantee that we are looking at the definition the author intended...
In the following I will propose a simple semantic notation that prepares a text for automatic consumption or easy referencing.
Let us take the present-day example of reference for the word "law/Law": we can get it from http://encyclopedia.thefreedictionary.com/Law+%28disambiguation%29
and we are presented with about 40 valid choices (and counting). A heuristic search for the best-fitting definition is certainly prone to error...
The obvious solution is to write the text using not words but rather semantic values. First we should try to make such a notation human-readable, optional and terse.
Let us consider a trusted reference provider (TRP): say the site http://encyclopedia.thefreedictionary.com. The TRP has to keep the previous versions of its site available and provide direct links to the semantic values, with URLs of the following format:
http://encyclopedia.thefreedictionary.com/version_4623/semantic_id_887432
This gives us access to the semantic value (SV) of a word as it was at the time the word was used (assuming version 4623 was the one used by the text's author).
In this case, a uniquely-defined semantic sentence would look like:

TRP: http://encyclopedia.thefreedictionary.com, version 4623
3467 876545 23453 4476778 84577 2867, 2867 253534.

That is to say: a string of semantic ids with syntactic signs (or sugar).
Note that the TRP heading is necessary for shortening the full semantic coordinates ([876545, 4623, TRP] for the second SV in the sentence) and implies that the whole sentence was authored with the same TRP. We will maintain the same supposition for texts of any length.
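
As a rough sketch, expanding the short ids into full semantic coordinates and versioned URLs could look like this. The function names parse_udsn and coordinate_url are my own, and the URL pattern is simply the one proposed above, not something the site actually serves.

```python
import re

def parse_udsn(text):
    """Split a UDSN text into its TRP heading and a list of (id, version, trp) coordinates."""
    header, body = text.strip().split("\n", 1)
    m = re.fullmatch(r"TRP:\s*(\S+),\s*version\s+(\d+)", header.strip())
    if m is None:
        raise ValueError("missing or malformed TRP heading")
    trp, version = m.group(1), int(m.group(2))
    # Every run of digits in the body is taken to be a semantic id;
    # punctuation (the syntactic sugar) is ignored here.
    return [(int(sid), version, trp) for sid in re.findall(r"\d+", body)]

def coordinate_url(coordinate):
    """Build the versioned URL for one semantic coordinate (proposed format only)."""
    semantic_id, version, trp = coordinate
    return f"{trp.rstrip('/')}/version_{version}/semantic_id_{semantic_id}"

text = """TRP: http://encyclopedia.thefreedictionary.com, version 4623
3467 876545 23453 4476778 84577 2867, 2867 253534."""
coordinates = parse_udsn(text)
```

Here coordinates[1] is the full coordinate of the second SV in the sentence, and coordinate_url(coordinates[1]) gives its versioned address.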

The next step is to make the text human-readable. Since the content and the order of the meanings of every word do not change at the TRP within the same version, we should be able to find an equivalent for every semantic id of the form:
semantic id = (word, word meaning ordinal) e.g.
876545 = law, 2 - that is to say: the word "law", second meaning. In our case, considering TRP version 4623 to be the current version of http://encyclopedia.thefreedictionary.com, this would point us to http://encyclopedia.thefreedictionary.com/Religious+law.
To translate our text to this human-readable form:
3467 876545 23453 4476778 84577 2867, 2867 253534. =
The_1 law_2 was_1 written_1 sometime_2 long_4, long_4 ago_1.
For the first meaning (ordinal 1), we can drop the additional notation thus:
The law_2 was written sometime_2 long_4, long_4 ago.
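
The translation above can be sketched in a few lines. The LEXICON table stands in for a lookup against TRP version 4623 and is hypothetical; its ids and ordinals are the ones used in the example sentence, and to_readable is a name I made up.

```python
import re

# Hypothetical excerpt of the TRP at version 4623: semantic id -> (word, ordinal).
LEXICON = {
    3467: ("The", 1),
    876545: ("law", 2),
    23453: ("was", 1),
    4476778: ("written", 1),
    84577: ("sometime", 2),
    2867: ("long", 4),
    253534: ("ago", 1),
}

def to_readable(id_text, lexicon, drop_first=True):
    """Replace every semantic id with word_ordinal, keeping punctuation as is."""
    def replace(match):
        word, ordinal = lexicon[int(match.group())]
        if drop_first and ordinal == 1:
            return word  # first meaning: the suffix is optional
        return f"{word}_{ordinal}"
    return re.sub(r"\d+", replace, id_text)

ids = "3467 876545 23453 4476778 84577 2867, 2867 253534."
print(to_readable(ids, LEXICON, drop_first=False))
print(to_readable(ids, LEXICON))
```

The first call prints the fully annotated sentence and the second the shortened one.
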
And instead of underscore-number, we may adopt another style, perhaps even a stylesheet class in CSS if the text is in XML/HTML. So:
The law2 was written sometime2 long4, long4 ago.
Much easier to read. Furthermore, one should be able to define the style of that class as invisible, which brings us to the rendered form:
The law was written sometime long , long ago.
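
One possible way to produce such markup, as a sketch only: the sub element, the class name "sv", and the function names are my choices, not something UDSN prescribes. A stylesheet rule such as .sv { display: none; } would then hide the ordinals in the rendered page.

```python
import html
import re

ANNOTATED = re.compile(r"([A-Za-z]+)_(\d+)")

def to_html(readable, css_class="sv"):
    """Wrap each meaning ordinal in an element that a stylesheet can restyle or hide."""
    escaped = html.escape(readable)
    return ANNOTATED.sub(rf'\1<sub class="{css_class}">\2</sub>', escaped)

def strip_notation(readable):
    """Plain-text equivalent of hiding the class: drop the ordinals entirely."""
    return ANNOTATED.sub(r"\1", readable)

sentence = "The law_2 was written sometime_2 long_4, long_4 ago."
print(to_html(sentence))
print(strip_notation(sentence))
```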

Therefore, a uniquely-defined semantic notation text of:

TRP: http://encyclopedia.thefreedictionary.com, version 4623
3467 876545 23453 4476778 84577 2867, 2867 253534.
may become equivalent to:
TRP: http://encyclopedia.thefreedictionary.com, version 4623
The law2 was written sometime2 long4, long4 ago.
with the option of not seeing the notation, but only placeholders (the placeholders may in turn become reference links for obscure meanings):

The law was written sometime long , long ago.