Importing LiveJournal: Data Massaging?

  • Unknown's avatar

    I’m about to import my LiveJournal entries into WP.com and I’m wondering what kind of data massaging I can/should do with the XML file (which I have exported via the program ljArchive) prior to the import. What happens to LJ specific tags like <lj user> and <lj-cut>? Should I replace them with something else? What about embedded YouTube/Google videos and Imeem widgets, what happens to them? Do ‘<img>’ tags get broken if I point to a picture in LJ’s scrapbook? Should I do anything to fix any internal links I have within my own LJ entries? I’m probably asking a lot, but I only want to import my entries once without having to retry again ;-) Thanks!

  • Unknown's avatar

    Sorry, I don’t know, but you can try reading the other posts on the topic to see if anything helps:

    https://en.forums.wordpress.com/search.php?q=livejournal

  • Unknown's avatar

    The one thing I would suggest is to delete any spam comments that may be sitting in your LiveJournal spam filter; it will make the file smaller.

  • Unknown's avatar

    Your embedded media will need to be replaced; we have custom code here at WP.com, and nothing but that custom code will work. Some of your media may not be in an allowable format, either.

  • Unknown's avatar

    Thanks, everyone. I think rather than importing everything at once, I’ll do it in pieces (i.e. one month at a time) just to get a feel for how things actually work. I imported a test entry just now, and found out that the <lj user> tag gets obliterated on import. So if I have Hello <lj user="drchuck"> in my XML, all I get in WP.com is Hello.

  • Unknown's avatar

    You should be able to fake lj-cuts by opening up your XML file in a text editor and replacing the lj-cut code with <!--more--> (As on LJ, you can also customise the link text). Only one per entry, mind.

    Find/replace is also a good way of dealing with lj user links. With a bit of inline styling you can get them looking exactly as they did on LJ, should you so choose.
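
    For anyone who would rather script the find/replace than do it in a text editor, here is a minimal Python sketch of the two substitutions described above. It assumes the LJ tags appear literally in the export (not escaped as &lt;lj ...&gt;), and the link format http://NAME.livejournal.com/ is an assumption modelled on the URL style used in this thread. The function names are made up for illustration.

```python
import re

# Matches <lj user="name">, <lj user='name' /> and similar variants.
LJ_USER = re.compile(r'<lj\s+user\s*=\s*["\']?([\w-]+)["\']?\s*/?>', re.IGNORECASE)
# Matches an opening <lj-cut> with an optional text="..." attribute.
LJ_CUT_OPEN = re.compile(r'<lj-cut(?:\s+text\s*=\s*"([^"]*)")?\s*>', re.IGNORECASE)
LJ_CUT_CLOSE = re.compile(r'</lj-cut\s*>', re.IGNORECASE)


def replace_lj_user(text):
    """Turn each lj user tag into a plain link (URL pattern is an assumption)."""
    return LJ_USER.sub(
        lambda m: '<a href="http://{0}.livejournal.com/">{0}</a>'.format(m.group(1)),
        text,
    )


def replace_lj_cut(text):
    """Turn the first lj-cut into a more tag and drop any remaining cut tags."""
    def to_more(match):
        label = match.group(1)
        return '<!--more {}-->'.format(label) if label else '<!--more-->'

    # Only one more tag is allowed per entry, so convert just the first cut.
    text = LJ_CUT_OPEN.sub(to_more, text, count=1)
    text = LJ_CUT_OPEN.sub('', text)
    return LJ_CUT_CLOSE.sub('', text)
```

    Running both functions over a copy of the export, then importing a single test entry, is a cheap way to check the result before committing to the whole archive.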

  • Unknown's avatar

    there’s a wiki page on converting LJ2WP:
    http://wiki.noljads.com/LJ_to_WordPress

    it’s primarily geared towards the standalone, self-hosted WP script, so the workarounds there are done through various WP plugins.

    and since your blog is on wp.com, which is essentially a very restrictive multi-blog version of the WP script mentioned above (i.e. no plugins, uneditable templates and so on), you’ll have to search and replace all the LJ-specific markup, embeds and anything else that’s banned here with the WP-specific equivalents.

  • Unknown's avatar

    I’ve been doing a find/replace for all the lj user links, and that has worked out. I’ve also found that if I replace embedded YouTube videos with the [youtube] code used on WP.com, the videos get imported right over. Unsupported media types (like imeem) will have to make do with a link. References to scrapbook pictures remain intact (they don’t show up broken), but since I may not be using LJ’s scrapbook in the near future I’ve uploaded the images to WP.com.
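
    The YouTube swap can be scripted along these lines. This is only a sketch: it assumes the videos were embedded with the old <object>/<embed> markup pointing at youtube.com/v/VIDEO_ID, and that the shortcode takes the form [youtube=http://www.youtube.com/watch?v=VIDEO_ID]. Both are assumptions to check against one of your own entries first.

```python
import re

# One <object>...</object> block that mentions youtube.com/v/<id>.
# The (?!</object>) guards keep a match from spilling across two objects.
YOUTUBE_OBJECT = re.compile(
    r'<object\b[^>]*>(?:(?!</object>).)*?youtube\.com/v/([\w-]+)'
    r'(?:(?!</object>).)*</object>',
    re.IGNORECASE | re.DOTALL,
)


def youtube_to_shortcode(text):
    """Replace each YouTube object block with a shortcode (assumed format)."""
    return YOUTUBE_OBJECT.sub(
        lambda m: '[youtube=http://www.youtube.com/watch?v={}]'.format(m.group(1)),
        text,
    )
```

    Other embedded objects (imeem and so on) are left untouched, so they can still be found and replaced with links by hand.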

    I had some deleted comments that had a date of 0001-01-01 (obviously an invalid date). I found that if I changed the date to 2001-01-01, I could easily delete them within WP.com.
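
    The date fix is a one-line substitution. A hedged sketch that also reports how many dates were changed, so the number can be compared with the count of deleted comments (the function name is made up, and it touches every occurrence of 0001-01-01 in the file, not just comment dates):

```python
import re

# Word boundaries stop this from matching inside a longer number.
INVALID_DATE = re.compile(r'\b0001-01-01\b')


def fix_invalid_dates(xml_text, placeholder='2001-01-01'):
    """Return (new_text, number_of_replacements)."""
    return INVALID_DATE.subn(placeholder, xml_text)
```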

  • Unknown's avatar

    @drchuck
    Thanks for keeping us updated on what you are finding and doing. It will make it much easier for people switching over in the future.

  • Unknown's avatar

    @options
    Thanks for the links, yes they are mostly applicable to self-hosted WP

    @thesacredpath
    I’d like to think of it as my way of contributing to open source software ;-)

    Just to update on my progress: I’ve had to search/replace both lj user and div class="ljuser", because I had alternated between their HTML and Rich Text editors (I had used their Rich Text editor for several months). I’ve replaced my lj-cut tags with <!--more-->. The few lj-raw tags that were present were ignored, as they should have been, since they only mark HTML code inside the tags. I had one LJ poll; no way is that coming over.

    In recent months, LJ has been using the lj embed tag, so that any type of embeddable object can be used in a post. I have things like <lj-embed id="1"> which could be a YouTube video, Google video, imeem object, or anything else embeddable. This requires a trip back to LJ to see what the heck I embedded in the first place. I have 17 of them, and I’ve done 5 so far.
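
    To get a checklist of which embeds still need looking up, something like the following could scan the export. It is a rough sketch that assumes the tags look like <lj-embed id="1"> and appear unescaped; it reports the line in the file, not the entry, so you still have to look at the surrounding XML to see which post each one belongs to.

```python
import re

LJ_EMBED = re.compile(r'<lj-embed\s+id\s*=\s*"(\d+)"', re.IGNORECASE)


def find_lj_embeds(xml_text):
    """Return (line_number, embed_id) pairs for every lj-embed tag found."""
    found = []
    for line_number, line in enumerate(xml_text.splitlines(), start=1):
        for match in LJ_EMBED.finditer(line):
            found.append((line_number, int(match.group(1))))
    return found
```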

  • Unknown's avatar

    hello drchuck,

    thanks for keeping us updated, good work, keep it up.

    just for the record, in case you might be interested:

    while LJ used to be a community-driven, user-run service based on a truly open source project (actually it still is, as its source is still available),

    wordpress.com is in no way open source software and never was. it’s a closed-source commercial project run by the for-profit Automattic Inc., so the wp.com userbase has no means to contribute to the project or influence its development in any way.

  • Unknown's avatar

    <off-topic> didn’t Six Apart close a fair bit of the livejournal code, or at any rate drag their heels about keeping the OS stuff up to date? And selected portions of the wordpress.com code do find their way back into MU and .org, though it’s true that users can’t contribute to .com. I’d say Six Apart and Automattic were pretty much morally equivalent: releasing enough to keep the OS enthusiasts quiet, but not so much that they’re uncompetitive.

  • Unknown's avatar

    I’ve finally reached the end of my LJ archive, thanks to everyone for their input. I’m glad I went through the archive one month at a time, rather than doing the whole thing at once. It took longer, what with the searching/replacing I had to do for each month, but it was worth it.

    The internal links to my own LJ were just a little tricky. I had full URLs to some of my prior posts (I had to do it that way so they would work in my friends’ friend pages), and I had to manually figure out what the URL would be in WP.com, as the URL naming scheme here is different. For example, http://username.livejournal.com/1234.html would have to be turned into /2005/10/13/subject-line-here/. If I was linking back to something older than one month, I could go to WP.com and see what the URL was. If it was within the same month, I had to figure out the URL from the date and subject.
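
    The link rewriting could be sketched like this. The slug rule here is only an approximation (lowercase, runs of anything other than letters and digits become one hyphen); the real WP.com rules may differ for punctuation and non-ASCII subjects, so a guessed path should be checked against the live URL where possible. The function names and the id-to-path mapping are hypothetical.

```python
import re
from datetime import date


def guess_wp_path(posted, subject):
    """Guess a /YYYY/MM/DD/slug/ path from the post date and subject line."""
    slug = re.sub(r'[^a-z0-9]+', '-', subject.lower()).strip('-')
    return '/{:04d}/{:02d}/{:02d}/{}/'.format(
        posted.year, posted.month, posted.day, slug
    )


def rewrite_internal_links(text, username, paths_by_id):
    """Swap LJ entry URLs for WP paths, using a {lj_entry_id: wp_path} dict."""
    pattern = re.compile(
        r'http://{}\.livejournal\.com/(\d+)\.html'.format(re.escape(username))
    )

    def swap(match):
        # Leave the link alone when the entry id is not in the mapping.
        return paths_by_id.get(int(match.group(1)), match.group(0))

    return pattern.sub(swap, text)
```

    For instance, guess_wp_path(date(2005, 10, 13), 'Subject Line Here') gives /2005/10/13/subject-line-here/, which can then go into the mapping under entry id 1234.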

  • The topic ‘Importing LiveJournal: Data Massaging?’ is closed to new replies.