KeystoneJS: Migrating from WordPress (Prototype)

April 26, 2017 admin

Since I put some effort into looking at possible blogging solutions with Node.js and MongoDB as the underlying platform, I am documenting a test I did with KeystoneJS. Its website describes it as a Node.js CMS & Web Application Platform.

KeystoneJS needs Node.js 0.10+ and MongoDB v2.4+. The high-level steps on Ubuntu 17.04 are as follows:
1. Set up MongoDB from package management.
2. Install a current version of Node.js in /usr/local, rather than the version from the Ubuntu repository.
3. Install the KeystoneJS generator (using npm) and run the generator (using yo).
4. Export the WordPress XML.
5. Convert the WordPress XML to MongoDB import format (JSON). Used Python.
6. Run mongoimport on the JSON.

STEP 1: Set up MongoDB from package management.

# apt-get install -y mongodb-org
# systemctl start mongod
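
Before moving on, it can be worth confirming that mongod is actually accepting connections. The following is a minimal sketch, assuming a local server on MongoDB's default port 27017; the helper name port_is_open is made up for this example and is not part of any MongoDB tooling.

```python
import socket

def port_is_open(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("mongod reachable:", port_is_open())
```

This only shows that something is listening on the port, not that it is a healthy MongoDB instance.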

STEP 2: Install a current version of Node.js in /usr/local, rather than the version from the Ubuntu repository.
Downloaded from here: https://nodejs.org/dist/v7.9.0/node-v7.9.0-linux-x64.tar.xz

$ tar -xf node-v7.9.0-linux-x64.tar.xz
$ sudo mv node-v7.9.0-linux-x64/ /usr/local/
$ export PATH=$PATH:/usr/local/node-v7.9.0-linux-x64/bin

$ npm version
{ npm: '4.2.0',
<snip>

$ node -v
v7.9.0

STEP 3: Install the KeystoneJS generator (using npm) and run the generator (using yo).

$ npm install -g generator-keystone
$ npm install -g yo

$ cd test-project/
$ yo keystone
Your KeystoneJS project is ready to go!
<snip>

$ cd my-site/
$ node keystone
Applying update 0.0.1-admins...
Successfully created:
*   1 User
KeystoneJS Started:
My Site is ready on http://0.0.0.0:3000
------------------------------------------------

STEP 4: Export the WordPress XML.

Exported from the WordPress admin under Tools > Export, which downloads an XML (WXR) file.
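
A WordPress export puts posts, pages and attachments all into item elements, and the conversion script at the end of this post loops over every item without filtering. A quick summary of the export shows what is actually in there before converting. This is a rough Python 3 sketch that assumes a WXR 1.2 export (the namespace URI that appears in the conversion script); summarize_export is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI for the wp: prefix; assumes a WXR 1.2 export.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}

def summarize_export(source):
    """Count <item> elements by (post_type, status).

    source is a file path or a file object containing the export XML.
    """
    root = ET.parse(source).getroot()
    counts = Counter()
    for item in root.findall("./channel/item"):
        post_type = item.findtext("wp:post_type", default="?", namespaces=WP_NS)
        status = item.findtext("wp:status", default="?", namespaces=WP_NS)
        counts[(post_type, status)] += 1
    return counts
```

Calling summarize_export('riaan039ssysadminblog.wordpress.2017-04-25.xml') and printing the result gives one count per combination, which makes it easy to decide whether drafts, pages or attachments should be skipped during conversion.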

STEP 5: Convert the WordPress XML to MongoDB import format (JSON). Used Python (the example code at the end of this post uses xml.etree.ElementTree) with the input file riaan039ssysadminblog.wordpress.2017-04-25.xml.

$ python xmlconvert6.py > 6.json

STEP 6: Run mongoimport on the JSON.

$ mongoimport --db my-site --collection posts --drop --file 6.json
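
Because the conversion script builds each line by string formatting, a stray quote or backslash in a post can produce a line that is not valid JSON. The sketch below is a rough pre-check and not a replacement for mongoimport's own parser: it rewrites ISODate("...") into the Extended JSON form {"$date": "..."} and then tries to parse each line with Python's json module. The function name find_bad_lines is hypothetical.

```python
import json
import re

# Matches the shell-style date the conversion script writes, e.g. ISODate("2017-04-25T05:00:00Z")
ISODATE = re.compile(r'ISODate\("([^"]*)"\)')

def find_bad_lines(lines):
    """Return (line_number, error_message) for every line that fails to parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        # Plain JSON has no ISODate(...), so swap in the Extended JSON equivalent first.
        candidate = ISODATE.sub(r'{"$date": "\1"}', line)
        try:
            json.loads(candidate)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            bad.append((number, str(exc)))
    return bad
```

Passing an open file object for 6.json works, since iterating over a file yields its lines; an empty result means every line parsed.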

As mentioned, this is a prototype and not really an option for me at this point. The xmlconvert.py script will need a lot of work to convert the posts more cleanly and preserve their formatting. Also, KeystoneJS is not a full-featured blog application like WordPress but rather a framework to build on. To replace my blogs I would still be missing things like code highlighting, permalinks, etc.
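
On the permalink point, the prototype script simply numbers the slugs (slug1, slug2, ...). One possible improvement, sketched here in Python 3, is to reuse the slug WordPress already stores in the export's wp:post_name element and fall back to a slug built from the title. The helper names slugify and choose_slug are made up for this example, and whether the resulting URLs match the old permalinks depends on how the WordPress site was configured.

```python
import re
import unicodedata

def slugify(text):
    """Lowercase, drop accents, and join alphanumeric runs with hyphens."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def choose_slug(post_name, title, index):
    """Prefer the WordPress slug, then a slug from the title, then slug<index>."""
    return (post_name or "").strip() or slugify(title or "") or "slug{}".format(index)
```

In the conversion loop this could replace the slug assignment, with post_name read from the item's post_name element once the namespaces have been stripped.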

Example code for converting the WordPress XML to the MongoDB import JSON format:

import xml.etree.ElementTree as ET
from datetime import datetime
#https://docs.python.org/2/library/xml.etree.elementtree.html

#### Get to this kind of format for mongo import
#### { "slug" : "post-2", "title" : "Post 2", "categories" : [ ], "state" : "published", "__v" : 0, "content" : { "brief" : "<p>Blah 2</p>", "extended" : "" }, "publishedDate" : ISODate("2017-04-25T05:00:00Z") }

it = ET.iterparse('riaan039ssysadminblog.wordpress.2017-04-25.xml')
for _, el in it:
  #if "{http://wordpress.org/export/1.2/}" in el.tag:
  if "{" in el.tag:
    el.tag = el.tag.split('}', 1)[1]  # strip all namespaces

tree = it.root

channel = tree.find("channel")

i=0

for item in channel.findall('item'):
  i = i + 1
  slug = 'slug' + str(i)
  title = item.find('title').text
  if not title:
    title = 'no title'
  title = title.replace('"','')
  #<category domain="category" nicename="kvm"><![CDATA[KVM]]></category>
  #<category domain="category" nicename="lvm"><![CDATA[LVM]]></category>
  ## TBD: Convert categories just leaving empty for now
  categories=[]

  state=item.find('status').text
  if state == "publish":
    state = "published"
  v=0

  ## "content" : { "brief" : "<p>Test1 Brief</p>", "extended" : "<p>Test1 Extended</p>" }
  # Note: from here on the script is written for Python 3 (text is already
  # Unicode str, so no explicit encode is needed, and print is a function).
  content=item.find('encoded').text
  if not content:
    content=" "
  content=content.replace('\n', '<br>')
  content=content.replace('\t', ' ')
  #content=content.replace('<', ' ')
  #content=content.replace('>', ' ')
  #content=content.replace('</', '<//')
  content=content.replace('"', ' ')
  content=content.replace("\\", "/")   # This is a junk cheat for now.  Need a better way for C:\Program Files\...

  content_brief=content[:30]
  content_extended=content

  ## publishedDate needs to look like this: ISODate("2017-04-25T05:00:00Z")
  # pubDate from the WordPress xml (e.g. Tue, 06 Nov 2012 08:49:21 +0000) had some odd dates
  # that could not be used with strptime, so using post_date instead.
  #<wp:post_date><![CDATA[2012-11-06 20:28:06]]></wp:post_date>
  d1 = item.find('post_date').text
  d = datetime.strptime(d1, '%Y-%m-%d %H:%M:%S')
  publishedDate = d.strftime('%Y-%m-%dT%H:%M:%SZ')

  print(' {{ "slug": "{}", "title": "{}", "categories": {}, "state": "{}", "__v": {}, "content": {{ "brief":"<p>{}</p>", "extended":"<p>{}</p>" }}, "publishedDate": ISODate("{}") }}'.format(slug, title, categories, state, v, content_brief, content_extended, publishedDate))
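
The quote and backslash replacements in the script exist only because the output line is assembled by hand. A possible alternative, shown here as a Python 3 sketch and not as the method used for this prototype, is to build a dictionary and let json.dumps handle the escaping, with the date written in MongoDB Extended JSON form ({"$date": ...}) in place of ISODate("..."). The function name build_post and its parameters are hypothetical; the field names are the ones from the target format shown at the top of the script.

```python
import json
from datetime import datetime

def build_post(slug, title, content, post_date, state="published", brief_chars=30):
    """Return one line of JSON for mongoimport, with escaping left to json.dumps."""
    # post_date is in the wp:post_date format, e.g. "2012-11-06 20:28:06"
    published = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    doc = {
        "slug": slug,
        "title": title,
        "categories": [],  # still empty, as in the prototype
        "state": state,
        "__v": 0,
        "content": {
            "brief": "<p>{}</p>".format(content[:brief_chars]),
            "extended": "<p>{}</p>".format(content),
        },
        # Extended JSON representation of a date
        "publishedDate": {"$date": published.strftime("%Y-%m-%dT%H:%M:%SZ")},
    }
    return json.dumps(doc)
```

With this approach quotes, backslashes (C:\Program Files\...) and newlines survive intact, because json.dumps escapes them instead of the script replacing them. The newline to <br> substitution can still be applied to content beforehand if the posts need it.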