Automating the Migration of Your Ghost Publication to Medium with Node.js and Python

Are you looking to move your blog from Ghost to Medium? If so, you probably want to bring all of your existing articles with you. Doing this manually is tedious, especially if you have a large number of posts.

In this tutorial, we present a step-by-step guide to automating this process with Node.js and Python. A Node.js script drives a browser session that imports posts into Medium, and a Python script parses the Ghost publication's sitemap and categorizes the posts. The setup also requires running Chrome with remote debugging enabled, so that Puppeteer can attach to it over a WebSocket connection.

Prerequisites

Before you begin, make sure you have the following:

  • Node.js installed.
  • Python 3 installed.
  • Puppeteer (a Node.js library to control Chrome).
  • Beautiful Soup and requests (Python libraries for HTML parsing and HTTP requests).

To run Chrome with remote debugging enabled (referred to in this tutorial as WebSocket mode), execute the following command:

google-chrome-stable --remote-debugging-port=9222 --headless

This starts a headless instance of Google Chrome that listens for debugging clients on port 9222. Puppeteer attaches to this instance over a WebSocket connection.
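Before running the scripts, it can help to confirm that the debugging endpoint is reachable. Chrome serves a small JSON document at /json/version on the debugging port, and that document includes the webSocketDebuggerUrl that clients attach to. The following is a minimal sketch using only the Python standard library. The function names are illustrative and are not part of the repository.

```python
import json
from urllib.request import urlopen

BROWSER_URL = "http://localhost:9222"  # same port as in the command above


def websocket_url_from_version(payload):
    """Extract the WebSocket endpoint from a /json/version response body."""
    info = json.loads(payload)
    return info["webSocketDebuggerUrl"]


def fetch_websocket_url(browser_url=BROWSER_URL, timeout=5.0):
    """Ask a running Chrome instance for its WebSocket debugger URL."""
    with urlopen(f"{browser_url}/json/version", timeout=timeout) as response:
        return websocket_url_from_version(response.read().decode("utf-8"))
```

Calling fetch_websocket_url() while Chrome is running should return a ws:// URL. A connection error usually means Chrome was not started with the --remote-debugging-port flag or is listening on a different port.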

The Node.js Script: index.cjs

const { spawn } = require("child_process");
const puppeteer = require('puppeteer');

const PYTHON_SCRIPT_PATH = "./sitemap_reader.py";
const BROWSER_URL = 'http://localhost:9222';
const MEDIUM_IMPORT_URL = 'https://medium.com/p/import';

The script begins by importing its dependencies and defining some constants. The child_process module is used to spawn a Python process, and the puppeteer library is used to control a Chrome browser. PYTHON_SCRIPT_PATH is the path to our Python script, BROWSER_URL is the address of the Chrome instance started with remote debugging enabled, and MEDIUM_IMPORT_URL is the Medium page for importing posts.

The script also contains helper functions convertToCamelCase and sanitizeString, which are used to format the categories obtained from the Python script appropriately.

The processPythonScript function spawns a Python process that runs the sitemap_reader.py script. It sends a list of sitemap URLs to the Python script and then waits for it to return the list of blog post URLs and their categories.
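To make the exchange between the two processes concrete, here is a rough sketch of what the Python end could look like. It assumes the sitemap URLs arrive as a JSON array on standard input and the results leave as a JSON array on standard output. The tutorial does not state the exact transport, so treat that as an assumption. handle_request and placeholder_categorize are hypothetical names, not functions from the repository.

```python
import json
import sys


def placeholder_categorize(sitemap_url):
    """Stand-in for the real sitemap processing (hypothetical helper)."""
    return [{"url": sitemap_url, "category": "uncategorized"}]


def handle_request(stdin=sys.stdin, stdout=sys.stdout, categorize=placeholder_categorize):
    """Read a JSON array of sitemap URLs and write a JSON array of posts."""
    sitemap_urls = json.loads(stdin.read())
    posts = []
    for sitemap_url in sitemap_urls:
        posts.extend(categorize(sitemap_url))
    # A single JSON document on stdout is easy for the parent process to parse.
    json.dump(posts, stdout)
    stdout.flush()
    return posts
```

Writing one JSON document, and nothing else, to standard output keeps parsing on the Node.js side simple. Any diagnostic messages are better sent to standard error so they do not corrupt the result.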

The importToMedium function uses Puppeteer to automate the import process on the Medium website. It opens the import URL, enters a post URL, waits for Medium to finish importing the post, sets the category (tags), and finally publishes the post.

Finally, the main function brings everything together. It processes the Python script and imports each post returned by the script to Medium.

The Python Script: sitemap_reader.py

import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup
import requests
import sys
import json

The Python script starts by importing the libraries it needs: xml.etree.ElementTree for XML parsing, Beautiful Soup for web scraping, requests for sending HTTP requests, and sys and json for handling input and output.

The script defines three functions: extract_urls_from_sitemap_url, get_post_category_by_visiting_url, and categorize_urls_based_on_text_in_it.

  • extract_urls_from_sitemap_url takes a sitemap URL as input and returns a list of all post URLs in the sitemap.
  • get_post_category_by_visiting_url takes a post URL as input, visits the URL, and returns the post's category.
  • categorize_urls_based_on_text_in_it uses the above two functions to generate a list of posts (in the form of dictionaries with 'url' and 'category' keys) from a given sitemap URL. This list is then printed out.
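The core of this pipeline can be sketched as follows. This is not the repository's code: it parses sitemap XML that has already been downloaded (the real script fetches it with requests), and it assigns categories with a simple keyword match on the URL instead of visiting each post. The function names and the keyword rule are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls_from_sitemap_xml(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    locations = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locations if loc.text]


def categorize_url_by_text(url, keywords):
    """Return the category of the first keyword found in the URL (hypothetical rule)."""
    lowered = url.lower()
    for keyword, category in keywords.items():
        if keyword in lowered:
            return category
    return "uncategorized"


def categorize_sitemap(xml_text, keywords):
    """Build the list of {'url', 'category'} dictionaries for one sitemap."""
    return [
        {"url": url, "category": categorize_url_by_text(url, keywords)}
        for url in extract_urls_from_sitemap_xml(xml_text)
    ]
```

For example, categorize_sitemap(xml_text, {"python": "Programming"}) tags every URL containing "python" as Programming and leaves the rest as "uncategorized". Keeping the parsing separate from the download makes the logic easy to check against a saved copy of the sitemap.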

Using The Scripts

To use these scripts, you'll need to clone the repository at the following URL: https://github.com/AskSnehasish/Ghost-To-Medium-Import.git

Once cloned, navigate into the directory containing the scripts. Start Chrome with remote debugging enabled, using the command from the prerequisites, and then execute the following command to run the Node.js script:

node index.cjs

This starts the process. The Node.js script spawns the Python script, which processes the sitemap and returns the categorized URLs. Node.js then uses this information to import each article into Medium.

Please note that both scripts rely on specific elements and classes being present in the Ghost sitemap and on the Medium import page. If the structure of these pages changes in the future, the scripts may need to be updated accordingly.

With these scripts, moving your Ghost blog posts to Medium becomes much easier. Make sure to check that each post was imported completely and is formatted correctly. Happy blogging!

Important Note: Always respect Medium's rules when using automation tools. Overuse or misuse of these tools could violate Medium's terms of service and may lead to the suspension of your account. Use automation responsibly.