RSS widget XML-to-HTML conversion mishandles CDATA

  • Unknown's avatar

    I believe the XML-to-HTML conversion in the RSS widget, which displays a feed on a blog page, does not handle CDATA correctly.

    The RSS/XML specifications allow ‘CDATA’ (character data) encoding of the <description></description> field of RSS item elements. One uses ‘CDATA’ to mark data that is to be treated strictly as character data, i.e., without being parsed or otherwise interpreted as markup. Such character data is to be passed through without alteration. If this were being done, then the description encapsulated in ‘CDATA’ would, after conversion to HTML, be exactly as contained in the CDATA, so that the HTML tags would “work”, i.e., carry their HTML meaning. However, looking at the source code on my blog page, it’s clear that WordPress is violating the meaning of CDATA by converting special characters to their corresponding entities, thus causing the special characters to be displayed rather than have their specified effect as tags.
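
    To make the CDATA point concrete, here is a minimal Python sketch (not the widget's own code, which I have not seen). It assumes a hypothetical one-item feed fragment built from the link used in this thread, and shows that a standard XML parser hands the CDATA content to the application as one unparsed string with the tag characters intact. What the widget does with that string afterwards is a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment for illustration; only the link comes from this thread.
ITEM_XML = (
    "<item><description><![CDATA["
    '<a href="http://www.runawaygetaway.com/destination/North-Lake-Tahoe.html">'
    "Reserve your vacation condo now</a>"
    "]]></description></item>"
)


def description_text(item_xml: str) -> str:
    """Return the character data held in the item's <description> element."""
    item = ET.fromstring(item_xml)
    return item.findtext("description")


def description_child_count(item_xml: str) -> int:
    """Count child elements of <description>; CDATA content is not parsed into elements."""
    item = ET.fromstring(item_xml)
    return len(item.find("description"))


text = description_text(ITEM_XML)
print(text)                                # the literal <a ...>...</a> string
print(description_child_count(ITEM_XML))   # 0: the <a> inside CDATA is text, not an element
```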

    For example, here’s what an excerpt of my RSS feed code looks like after the WordPress RSS widget converts the feed to HTML:

     <a href="http://www.runawaygetaway.com/destination/North-Lake-Tahoe.html">Reserve your vacation condo now</a>

    Notice all the entities. And here’s what it should look like if WP treated the description as CDATA:

    <a href="http://www.runawaygetaway.com/destination/North-Lake-Tahoe.html">Reserve your vacation condo now</a>

    You can see that the < character was replaced by its entity… < likewise for >. But since my item description was encapsulated as CDATA, those special characters should have been left as-is.

    The consequence is that wherever tags are encountered, the HTML code is displayed as plain text, because the code produced by the XML-to-HTML converter substitutes in the entities. Browsers expect a font tag, for example, to be <font>, not <font>.

  • Unknown's avatar

    What is the URL of your WordPress.com blog? Starting with http, please.

  • Unknown's avatar

    Looking at my post, I see that it’s not possible to show entities in a post to illustrate the problem. In the first <a> tag snippet, the < and > were typed in as their entities when I wrote the post. Not so in the second <a> code snippet. Likewise, there are two occurrences in the last two paragraphs where I typed in the entities but they appeared as < and >.

    So I guess this is gonna be a hard topic to discuss. Bottom line is, the widget's XML-to-HTML converter is changing special characters that are specified as CDATA to entities, and should not be doing so.
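
    Since entities do not survive in a post here, a small Python sketch may show the difference more reliably. This is only an illustration of the two behaviors, not the widget's actual code: `render_escaped` and `render_passthrough` are hypothetical names, and `html.escape` stands in for whatever escaping step the widget applies.

```python
import html

# The item description as it sits inside the CDATA section of the feed.
RAW_DESCRIPTION = (
    '<a href="http://www.runawaygetaway.com/destination/North-Lake-Tahoe.html">'
    "Reserve your vacation condo now</a>"
)


def render_escaped(description: str) -> str:
    """What the widget appears to do: turn &, < and > into entities."""
    return html.escape(description, quote=False)


def render_passthrough(description: str) -> str:
    """What I expected: emit the character data unchanged."""
    return description


print(render_escaped(RAW_DESCRIPTION))      # browser displays the tag text literally
print(render_passthrough(RAW_DESCRIPTION))  # browser renders a working link
```

    The first line printed starts with `&lt;a href=` and ends with `&lt;/a&gt;`, which is what I see in my page source. The second is the unmodified link markup.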

  • Unknown's avatar

    URL of the blog is http://roadhogblog.wordpress.com/

    See the problem in the right-hand column.

  • Unknown's avatar

    Not particularly knowledgeable about the makeup of RSS codes myself, but because of security issues here on WordPress.com, images that appear in RSS feeds are stripped out. Why the code should be appearing instead of being stripped out I can’t answer. Hopefully someone will be along soon who can.

  • Unknown's avatar

    OK, the quickest way to clean up the content is to open the RSS widget in your Dashboard’s Design>Widget options. Uncheck “Display item content?” and save your changes. All the messy code in the widget should go away.

    Maybe not the answer you were hoping for, but that’s the best I can come up with.

  • The topic ‘RSS widget XML-to-HTML conversion mishandles CDATA’ is closed to new replies.