Import of .com posts breaking due to XML escape characters

  • I am trying to migrate a client’s site from WordPress.com to a self-hosted installation. I used the Export function in WordPress to export the posts to an XML file.

    When I import the posts into my self-hosted installation, WordPress is trying to escape any single and double quotes from the XML file by adding a character, which is causing issues. You can see a screenshot of what I’m experiencing at https://imgur.com/a/dEntZ1H.

    The WordPress.org support forum tried to help (https://wordpress.org/support/topic/import-of-com-posts-breaking-due-to-xml-escape-characters/), but also suggested I check here.

  • Hi there! We use Latin-1 by default. The issue may be related to importing into a UTF-8 database instead, since the characters being escaped look like the Unicode equivalents of single and double quotes.

    You could try opening the XML file in your favorite text editor and running search/replaces for all of these characters. They may have even exported as Unicode instead. For example, should likely be ” and should likely be '.

    Hope this helps!
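
For a 71,000+ line export, the search/replace suggested above may be easier to script than to do by hand. Below is a minimal Python sketch, assuming the problem characters are the four common typographic quotes; `QUOTE_MAP`, `straighten_quotes`, and `convert_file` are hypothetical names, not part of WordPress or its importer.

```python
# Typographic ("smart") quotes mapped to plain ASCII equivalents.
# Assumption: these four are the characters causing trouble; extend as needed.
QUOTE_MAP = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def straighten_quotes(text: str) -> str:
    """Return text with typographic quotes replaced by ASCII quotes."""
    return text.translate(str.maketrans(QUOTE_MAP))


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-8 file, straighten its quotes, and write a new UTF-8 file."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(straighten_quotes(text))
```

Writing to a separate output file keeps the original export untouched. One caveat: a double quote placed inside an XML attribute value can break the markup, so it is worth re-checking that the result is still well-formed before importing it.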

  • Thank you for the suggestion. I’ve been using a sample of the XML file, and I did not find any Unicode characters in that file. However, I did find some Unicode characters in the full (71,000+ line) XML file.

    With the sample file (200 lines), I can spot-check any characters that might cause an issue. I’ve also been using Gremlins tracker for Visual Studio Code to double-check my work.

    So, we’re making progress. But I’m still experiencing the same issue.
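
Spot-checking works for a 200-line sample, but the full export is large enough that a scripted scan may be more reliable. The sketch below lists every non-ASCII character with its position and Unicode name; `find_non_ascii` is a hypothetical helper, and it assumes the file has already been read as UTF-8 text.

```python
import unicodedata


def find_non_ascii(text: str):
    """Yield (line, column, char, name) for each non-ASCII character.

    Line and column numbers are 1-indexed. Lines are split on "\n" only,
    so the line numbers should match what a text editor shows.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col_no, ch, unicodedata.name(ch, "UNKNOWN")
```

To use it, read the export with `open(path, encoding="utf-8").read()` and print each tuple the generator yields. Columns count Unicode code points, which can differ from an editor that counts bytes or UTF-16 units.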

  • @mgozdis, the exported XML file starts by declaring its encoding as UTF-8. That seems contrary to what you’re saying about Latin-1.

  • Hi there! The encoding of the file vs the charset/collation used in the database are two separate things. The file will always be UTF-8, but the data in the database will generally be Latin-1, which causes the issues you are experiencing. You would need to properly convert and search/replace any “special characters” in the XML file you are having issues with.

    When I paste your sample XML in a text editor and try viewing it in a browser, there is an error on line 41, column 110 as shown here: https://d.pr/i/D7b4iq
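
The browser check described above can also be reproduced locally. This is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; `first_xml_error` is a hypothetical helper name, and the line and column it reports come from the underlying parser, so they may not match a browser's numbers exactly.

```python
import xml.etree.ElementTree as ET


def first_xml_error(data: bytes):
    """Return None if data is well-formed XML, else (line, column, message).

    Only the first error is reported, because the parser stops there.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position  # position of the first parse error
        return line, column, str(err)
    return None
```

Reading the export in binary mode (`open(path, "rb").read()`) lets the parser honour the encoding declared at the top of the file. Since only the first problem is reported, the check needs to be re-run after each fix.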

  • The topic ‘Import of .com posts breaking due to XML escape characters’ is closed to new replies.