Re: Fwd: In-Band Information Considered Harmful



>Rich Pasco wrote:
>> Why harmful?  Looking to see your article.
>
>For the purposes of Perl, because this:
>   <EM>H</EM>elpful info
>does not match if you search for "Helpful info" which makes
>it bad.

In fact the above is abuse of markup for typographical purposes. EM was
never intended to be used that way, if I'm not much mistaken. But of course
people can and thus will abuse markup that way.
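
To make the quoted search problem concrete, here is a minimal Python sketch.
It is only an illustration: the helper names naive_search and strip_tags are
made up for this example, and the tag stripper is deliberately crude (it
assumes no '>' ever appears inside an attribute value).

```python
import re

# The example line from the quoted message: the markup sits in-band,
# in the middle of the word.
in_band = "<EM>H</EM>elpful info"

def naive_search(haystack: str, needle: str) -> bool:
    """Plain substring search, as a markup-unaware tool would do it."""
    return needle in haystack

def strip_tags(text: str) -> str:
    """Crude tag stripper; assumes tags never contain '>' inside attributes."""
    return re.sub(r"<[^>]*>", "", text)

found_raw = naive_search(in_band, "Helpful info")
found_stripped = naive_search(strip_tags(in_band), "Helpful info")
print(found_raw, found_stripped)  # False True
```

The search only succeeds once the tool knows enough about the markup to
remove it, which is exactly the maintenance burden complained about above.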

>This can be a major problem for search engines etc.
>
>If you have the markup OUT of band (ie, as suggested, in a separate
>file, or at the end of the file, etc.) you don't get the problem.
>
>Yes, you -could- teach the search engines HTML, but then we have to
>keep updating them when HTML changes, or for a different sort of
>markup, etc.  If it's out of band, it's EASY.

Indeed, the Macintosh word processor Nisus uses out-of-band formatting
codes, so the data fork is a plain ASCII file. (The formatting is stored in
the resource fork, a special feature of MacOS files. SimpleText, which comes
with MacOS, does the same; indeed, I believe Nisus will correctly use
formatting produced by SimpleText.)
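
The idea can be sketched in a few lines of Python. The (start, end, style)
run format below is hypothetical and is not the actual resource-fork layout
used by Nisus or SimpleText; apply_runs is likewise an invented name, and it
assumes the runs do not overlap.

```python
from typing import List, Tuple

# The plain text lives in one place (the "data fork" in the analogy)...
text = "Helpful info"

# ...and the formatting lives elsewhere, as (start, end, style) runs over
# character offsets. This run format is an assumption for illustration.
runs: List[Tuple[int, int, str]] = [(0, 1, "EM")]

def apply_runs(text: str, runs: List[Tuple[int, int, str]]) -> str:
    """Merge non-overlapping runs into the text, producing in-band markup."""
    out = []
    pos = 0
    for start, end, style in sorted(runs):
        out.append(text[pos:start])          # unstyled text before the run
        out.append(f"<{style}>{text[start:end]}</{style}>")
        pos = end
    out.append(text[pos:])                   # unstyled tail
    return "".join(out)

rendered = apply_runs(text, runs)
print(rendered)                 # <EM>H</EM>elpful info
print("Helpful info" in text)   # True: the plain text stays searchable
```

A tool that knows nothing about the runs can still search or display the
text; a tool that does know about them can reconstruct the styled version.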

Another advantage of out-of-band markup is that it would simplify
MIME messages. I never understood why the MIME standard seems to recommend
sending two versions as multipart/alternative (often seen when people post
to Usenet with HTML enabled). It would be much better to send one plain-text
version plus a formatting section or layer that can be applied optionally
by readers.
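
As a rough sketch of what such a message might look like, the following uses
Python's standard email package to put the plain text in one part and the
formatting in a second part. Using multipart/mixed as the container, JSON as
the encoding of the formatting runs, and the filename style-runs.json are
all assumptions made for illustration; no such formatting layer is
standardized.

```python
import json
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "out-of-band demo"

# First part: the plain text, readable by any client as-is.
msg.set_content("Helpful info\n")

# Second part: formatting runs as [start, end, style]. The JSON encoding
# and the filename are hypothetical stand-ins for a formatting layer.
runs = [[0, 1, "EM"]]
msg.add_attachment(
    json.dumps(runs).encode("ascii"),
    maintype="application",
    subtype="json",
    filename="style-runs.json",
)

content_type = msg.get_content_type()
part_types = [part.get_content_type() for part in msg.iter_parts()]
print(content_type, part_types)
```

A client that ignores the second part still shows the complete text, and
the text is sent only once.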

It is true that it would require a reader that is capable of mapping the
markup onto the text, but the alternative is that the reader has to do the
job mentally while reading unparsed HTML/CTML (Currently Trendy Markup
Language).

Also, I believe writing a reader application that uses out-of-band markup
is at least as easy as, if not a lot easier than, writing a CTML parser.
Of course DTD-based CTMLs could in principle be parsed just by having the
DTD, but if that worked in practice we would not see all the discrepancies
between browsers that we have today.

I don't know if the MIME type multipart/mixed would cover this, or if a
new multipart/layered type would be advisable. In any case it would solve
many problems, including one which has annoyed me a lot while generating
web pages.

My problem: I want to generate a page containing (GIF) images, which are
themselves generated from a database lookup. The information from the
database lookup is used both for the HTML of the page and the generation of
the images.

Alas, I cannot generate the HTML and the GIFs in the same process, unless I
cache the images temporarily in a web-accessible location. I cannot embed
the images directly in the HTML, although they are logically a part of the
page.

My currently preferred technique (because it is the simplest) is to use the
same CGI several times:
host/cgi-bin/page -> HTML
host/cgi-bin/page?imageX -> GIF

where the HTML refers back to the same script. However, this performs the
same database lookup twice or more.
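
The scheme can be sketched as follows. This is not the actual script:
database_lookup, handle, and the data are invented stand-ins, and the "GIF"
bytes are placeholders rather than the output of a real image encoder. The
counter is there only to show the repeated lookup.

```python
from typing import Dict, Tuple

lookup_count = 0  # counts how often the stand-in database is hit

def database_lookup() -> Dict[str, int]:
    """Stand-in for the real database query; the data is made up."""
    global lookup_count
    lookup_count += 1
    return {"image1": 3, "image2": 7}

def handle(query_string: str) -> Tuple[str, bytes]:
    """Return (content type, body) for one request to the script."""
    data = database_lookup()  # runs on every request, page or image
    if query_string == "":
        # No query string: emit the page, which points back at this script.
        imgs = "".join(f'<IMG SRC="page?{name}">' for name in sorted(data))
        return "text/html", f"<HTML><BODY>{imgs}</BODY></HTML>".encode("ascii")
    # Query string names an image: emit placeholder bytes, not a real GIF.
    return "image/gif", b"GIF89a" + bytes([data[query_string]])

html_type, html_body = handle("")
gif_type, gif_body = handle("image1")
print(lookup_count)  # 2: one lookup for the page, one more for the image
```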

The ideal solution would be to generate a multipart message with HTML and
GIF data all at once, but as far as I know, this is not possible.
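
For what it is worth, MIME does define a multipart/related type in which an
HTML part can refer to sibling parts through cid: URLs, and Python's
standard email package can build such a message. The sketch below shows
only the message structure; the Content-ID value and the GIF bytes are
placeholders, build_page is an invented name, and whether a given web
browser would accept this as the response from a CGI is not something the
sketch demonstrates.

```python
from email.message import EmailMessage

def build_page(gif_bytes: bytes) -> EmailMessage:
    """Bundle an HTML page and one image into a multipart/related message."""
    msg = EmailMessage()
    # The cid: URL refers to the part whose Content-ID is
    # <image1@example.invalid> (a placeholder identifier).
    msg.set_content(
        '<HTML><BODY><IMG SRC="cid:image1@example.invalid"></BODY></HTML>',
        subtype="html",
    )
    msg.add_related(
        gif_bytes,
        maintype="image",
        subtype="gif",
        cid="<image1@example.invalid>",
    )
    return msg

# Placeholder bytes standing in for an image generated from the database.
page = build_page(b"GIF89a")
part_types = [part.get_content_type() for part in page.iter_parts()]
print(page.get_content_type(), part_types)
```

With this structure the database lookup would only need to happen once,
since the page and its images are produced in the same pass.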

Again, a multiple-band solution would solve this with far better integrity.

Of course, cascading style sheets are the currently trendy way to retrofit
out-of-band style information, so it seems things are moving in that
direction, at least a little.

-Lasse