Keystonejs Migrate from WordPress Prototype
Since I put some effort into looking at possible solutions for blogging using Node/MongoDB as the underlying platform, I am documenting a test I did with Keystonejs. Its website describes it as a Node.js CMS & Web Application Platform.
Keystonejs needs Node.js 0.10+ and MongoDB v2.4+. The high-level steps on Ubuntu 17.04 are as follows:
1. Setup MongoDB from package management.
2. Install a current version of node in /usr/local. Not using the version from the Ubuntu repository.
3. Install keystonejs generator (using npm) and run generator (using yo).
4. Export WordPress xml.
5. Convert WordPress xml to MongoDB import format (json). Used python lxml.
6. Run mongo import of json.
STEP 1: Setup MongoDB from package management.
# apt-get install -y mongodb-org
# systemctl start mongod
STEP 2: Install a current version of node in /usr/local. Not using the version from the Ubuntu repository.
Grabbed from here: https://nodejs.org/dist/v7.9.0/node-v7.9.0-linux-x64.tar.xz
$ sudo mv node-v7.9.0-linux-x64/ /usr/local/
$ export PATH=$PATH:/usr/local/node-v7.9.0-linux-x64/bin
$ npm version
{ npm: '4.2.0',
<snip>
$ node -v
v7.9.0
STEP 3: Install keystonejs generator (using npm) and run generator (using yo).
$ npm install -g generator-keystone
$ npm install -g yo
$ cd test-project/
$ yo keystone
Your KeystoneJS project is ready to go!
<snip>
$ cd my-site/
$ node keystone
Applying update 0.0.1-admins...
Successfully created:
* 1 User
KeystoneJS Started:
My Site is ready on http://0.0.0.0:3000
------------------------------------------------
STEP 4: Export WordPress xml.
STEP 5: Convert WordPress xml to MongoDB import format (json). Used python lxml with the input file riaan039ssysadminblog.wordpress.2017-04-25.xml.
$ python xmlconvert6.py > 6.json
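As an alternative to stripping namespaces before reading the export, the fields can be looked up with a namespace map. This is only a minimal sketch, assuming Python 3; the inline SAMPLE, the NS dictionary and the read_items helper are hypothetical names for illustration, and the namespace URIs should be checked against the xmlns attributes in your own export file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed from a WordPress 1.2 export; verify against
# the xmlns attributes at the top of your own export file.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}

# Tiny inline sample standing in for the real export file.
SAMPLE = """<rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Post 2</title>
      <content:encoded><![CDATA[<p>Blah 2</p>]]></content:encoded>
      <wp:post_date><![CDATA[2012-11-06 20:28:06]]></wp:post_date>
      <wp:status><![CDATA[publish]]></wp:status>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return a list of dicts with the fields the converter needs."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.find("channel").findall("item"):
        items.append({
            # An empty or missing title falls back to a placeholder.
            "title": item.findtext("title") or "no title",
            "content": item.findtext("content:encoded", default="", namespaces=NS),
            "post_date": item.findtext("wp:post_date", default="", namespaces=NS),
            "status": item.findtext("wp:status", default="", namespaces=NS),
        })
    return items


items = read_items(SAMPLE)
print(items[0]["title"], items[0]["post_date"])
```

Keeping the namespaces means a tag such as wp:status cannot be confused with an unrelated status element from another namespace.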
STEP 6: Run mongo import of json
$ mongoimport --db my-site --collection posts --drop --file 6.json
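One way to avoid hand-escaping quotes and backslashes before the import is to let Python's json module serialize each post. A minimal sketch, assuming Python 3; to_mongo_doc and its parameters are hypothetical names, and the {"$date": ...} wrapper is MongoDB's Extended JSON form for dates, which I believe mongoimport accepts in place of ISODate("...") but which is worth verifying against the installed version.

```python
import json
from datetime import datetime


def to_mongo_doc(slug, title, content, post_date, status):
    """Build one post document shaped like the converter's output."""
    published = datetime.strptime(post_date, "%Y-%m-%d %H:%M:%S")
    return {
        "slug": slug,
        "title": title,
        "categories": [],
        "state": "published" if status == "publish" else status,
        "__v": 0,
        "content": {
            "brief": "<p>{}</p>".format(content[:30]),
            "extended": "<p>{}</p>".format(content),
        },
        # Extended JSON date wrapper, used here instead of ISODate("...").
        "publishedDate": {"$date": published.strftime("%Y-%m-%dT%H:%M:%SZ")},
    }


doc = to_mongo_doc(
    "slug1",
    'A "quoted" title',
    "C:\\Program Files\\x\nline 2",
    "2012-11-06 20:28:06",
    "publish",
)
# json.dumps escapes quotes, backslashes and newlines, so each document
# stays on a single line of the output file.
line = json.dumps(doc)
print(line)
```

With this approach the title and content keep their original quotes and backslashes, since the escaping happens at serialization time rather than by replacing characters.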
As mentioned, this is a prototype and not really an option for me at this point. The xmlconvert.py script will need a lot of work to make the posts convert more cleanly and keep their formatting. Also, this is not a full-featured blog application like WordPress but rather just a framework to build on. So for me to replace my blogs, I am missing things like code highlighting, permalinks, etc.
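On the permalink point, the converter only emits numbered slugs (slug1, slug2, ...). A readable slug could be derived from the post title instead. This is a rough sketch, assuming Python 3 and ASCII-only slugs; slugify and unique_slug are hypothetical helpers, not part of Keystonejs or WordPress, and they make no attempt to reproduce the original WordPress permalinks.

```python
import re


def slugify(title, fallback="post"):
    """Lowercase, replace runs of non-alphanumerics with '-', trim dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or fallback


def unique_slug(title, seen):
    """Append -2, -3, ... when a slug was already used; `seen` is a set."""
    base = slugify(title)
    slug, n = base, 1
    while slug in seen:
        n += 1
        slug = "{}-{}".format(base, n)
    seen.add(slug)
    return slug


seen = set()
first = unique_slug("KVM and LVM: Snapshots!", seen)
second = unique_slug("KVM and LVM snapshots", seen)
print(first)
print(second)
```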
Example code for WordPress xml to MongoDB import json format.
# Written for Python 2 (print statement, byte strings).
import xml.etree.ElementTree as ET
from datetime import datetime
#https://docs.python.org/2/library/xml.etree.elementtree.html

#### Get to this kind of format for mongo import
#### { "slug" : "post-2", "title" : "Post 2", "categories" : [ ], "state" : "published", "__v" : 0, "content" : { "brief" : "<p>Blah 2</p>", "extended" : "" }, "publishedDate" : ISODate("2017-04-25T05:00:00Z") }

it = ET.iterparse('riaan039ssysadminblog.wordpress.2017-04-25.xml')
for _, el in it:
    #if "{http://wordpress.org/export/1.2/}" in el.tag:
    if "{" in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
tree = it.root
channel = tree.find("channel")

i=0
for item in channel.findall('item'):
    i = i + 1
    slug = 'slug' + str(i)

    title = item.find('title').text
    if not title:
        title = 'no title'
    title = title.replace('"','')

    #<category domain="category" nicename="kvm"><![CDATA[KVM]]></category>
    #<category domain="category" nicename="lvm"><![CDATA[LVM]]></category>
    ## TBD: Convert categories just leaving empty for now
    categories=[]

    state=item.find('status').text
    if state == "publish":
        state = "published"
    v=0

    ## "content" : { "brief" : "<p>Test1 Brief</p>", "extended" : "<p>Test1 Extended</p>" }
    content=item.find('encoded').text
    if not content:
        content=" "
    content=content.encode('utf-8')
    content=content.replace('\n', '<br>')
    content=content.replace('\t', ' ')
    #content=content.replace('<', ' ')
    #content=content.replace('>', ' ')
    #content=content.replace('</', '<//')
    content=content.replace('"', ' ')
    content=content.replace("\\", "/")  # This is a junk cheat for now. Need a better way for C:\Program Files\...
    content_brief=content[:30]
    content_extended=content

    ## Convert Tue, 06 Nov 2012 08:49:21 +0000 to look like this: ISODate("2017-04-25T05:00:00Z")
    d2 = item.find('pubDate').text.split(' +0000')
    d1 = d2[0].strip()
    #######pubDate = item.get('pubDate')
    #######publishedDate="2017-04-25T05:00:00Z"
    #d = datetime.strptime(d1, '%a, %d %b %Y %H:%M:%S')
    #publishedDate = d.strftime('%Y-%m-%dT%H:%M:%SZ')

    # pubDate from WordPress xml had some odd dates that could not be used with strptime so using post_date
    #<wp:post_date><![CDATA[2012-11-06 20:28:06]]></wp:post_date>
    d1 = item.find('post_date').text
    d = datetime.strptime(d1, '%Y-%m-%d %H:%M:%S')
    publishedDate = d.strftime('%Y-%m-%dT%H:%M:%SZ')

    print ' {{ "slug": "{}", "title": "{}", "categories": {}, "state": "{}", "__v": {}, "content": {{ "brief":"<p>{}</p>", "extended":"<p>{}</p>" }}, "publishedDate": ISODate("{}") }}'.format(slug, title, categories, state, v, content_brief, content_extended, publishedDate)
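The category conversion marked TBD in the script could start by collecting the category elements of each item. A minimal sketch, assuming Python 3; extract_categories and the inline ITEM sample are hypothetical, and the post_tag line is an assumed example of a non-category entry. It only pulls out the nicename and label pairs. How these map onto the categories field that Keystonejs expects is not covered here.

```python
import xml.etree.ElementTree as ET

# Inline sample item; the two category lines mirror the comments in the
# script, the post_tag line is an assumed example of a tag entry.
ITEM = """<item>
  <title>Post 2</title>
  <category domain="category" nicename="kvm"><![CDATA[KVM]]></category>
  <category domain="category" nicename="lvm"><![CDATA[LVM]]></category>
  <category domain="post_tag" nicename="backup"><![CDATA[backup]]></category>
</item>"""


def extract_categories(item):
    """Return (nicename, label) pairs for domain="category" elements only."""
    return [
        (cat.get("nicename"), cat.text)
        for cat in item.findall("category")
        if cat.get("domain") == "category"
    ]


categories = extract_categories(ET.fromstring(ITEM))
print(categories)
```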