Using Telegra.ph as an external editor for articles
In this article I will show an interesting use case: using Telegra.ph as an external WYSIWYG editor. The same approach also works as a grabber for any other resource, or for parsing HTML and converting it to a different structure.
So, let's begin. We will use Python 3 and only the standard library, without any third-party modules or extensions. The goal is to grab the desired HTML structure of the article and to save its images so that we can serve them from our own server.
First we need to download the page, and urllib takes care of that:
from urllib import request
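The download step might look like the sketch below. The function name fetch_html, the timeout, and the User-Agent header are assumptions made for illustration; they are not taken from the original code.

```python
from urllib import request


def fetch_html(url, timeout=10):
    """Download a page and return its body decoded to text."""
    # Assumption: some servers may reject urllib's default User-Agent,
    # so a generic browser-like value is sent instead.
    req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with request.urlopen(req, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with the address of a Telegra.ph article should return the page markup as a string, ready to be fed to the parser.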
After that we need to omit tags that will not be used, update their attributes, or replace them with other tags. The tool for parsing the structure is html.parser.
There are three main methods, and each one acts as a callback: one fires when an opening tag is encountered, one when a closing tag is encountered, and one for the data inside the tag that is currently being processed. We will use a stack to restore the correct structure of the document and transform it on the fly as needed. The appending flag shows whether we add data to the resulting document or skip it.
When we meet an image tag, we first need to download the image from its remote URL, and a small helper function takes care of that.
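To see the order in which the three callbacks fire, here is a minimal tracing sketch. TraceParser and trace are hypothetical names used only for this illustration; the class records events and nesting depth and does not transform anything yet.

```python
from html.parser import HTMLParser


class TraceParser(HTMLParser):
    """Records every parser callback and tracks open tags on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.events = []  # (callback, payload, nesting depth)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.events.append(("start", tag, len(self.stack)))

    def handle_endtag(self, tag):
        # Depth is recorded before the tag is popped from the stack.
        self.events.append(("end", tag, len(self.stack)))
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.events.append(("data", data, len(self.stack)))


def trace(html):
    parser = TraceParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

For the input `<p>Hi <b>there</b></p>` the recorded events are start p, data "Hi ", start b, data "there", end b, end p, with the depth growing to 2 inside the nested tag.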
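A possible shape for such an image helper is sketched here. The name download_image, the base_url parameter, and the images target directory are assumptions; the sketch also assumes that image paths may be relative to the page URL.

```python
import os
from urllib import request
from urllib.parse import urljoin, urlparse


def download_image(src, base_url, target_dir="images"):
    """Save a remote image locally and return the local file path."""
    # Assumption: src may be relative (for example /file/abc.jpg),
    # so it is resolved against the address of the page.
    absolute_url = urljoin(base_url, src)
    # Keep only the last path segment so the URL cannot pick the directory.
    filename = os.path.basename(urlparse(absolute_url).path) or "image"
    os.makedirs(target_dir, exist_ok=True)
    local_path = os.path.join(target_dir, filename)
    with request.urlopen(absolute_url, timeout=10) as response, \
            open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

The returned path can then replace the original src attribute so that the image is served from our own server.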
All of this comes together in the start-tag handler:
def handle_starttag(self, tag, attrs):
Another helper method that we use is wrap_in_tag. It ensures that data is properly enclosed within a tag.
The function that handles closing tags should be symmetrical to the one that handles opening tags, like this:
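The original body of wrap_in_tag is not shown, so the following is only a guess at what such a helper could look like. It is written as a standalone function, and the attrs parameter is an assumption.

```python
from html import escape


def wrap_in_tag(tag, data, attrs=None):
    """Return data enclosed in <tag ...>...</tag>."""
    # Attribute values are escaped so quotes and ampersands stay valid.
    attr_text = "".join(
        ' {}="{}"'.format(name, escape(value or "", quote=True))
        for name, value in (attrs or [])
    )
    return "<{0}{1}>{2}</{0}>".format(tag, attr_text, data)
```

For example, wrap_in_tag("p", "Hello") gives `<p>Hello</p>`.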
def handle_endtag(self, tag):
This code also performs a simple validation that tags are balanced and reports errors, if any.
Finally, we handle the enclosed data and append it to an intermediate buffer:
def handle_data(self, data):
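Putting the three handlers together, a compact version of the whole parser could look like the sketch below. The tag sets, the ArticleParser and convert names, and the way appending is derived from a skip counter are all assumptions for illustration; image downloading is left out to keep it short.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"p", "h3", "h4", "a", "img", "blockquote", "strong", "em"}
RENAMED = {"b": "strong", "i": "em"}   # tags replaced with another one
KEPT_ATTRS = {"a": {"href"}, "img": {"src"}}
VOID = {"img", "br", "hr"}             # tags that never get a closing tag
SKIPPED = {"script", "style"}          # tags whose content is dropped


class ArticleParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # open tags, as written in the source
        self.buffer = []     # pieces of the resulting document
        self.errors = []     # balance problems found along the way
        self.skip_depth = 0  # greater than 0 while inside a SKIPPED tag

    @property
    def appending(self):
        # True when data should be added to the resulting document.
        return self.skip_depth == 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        if tag not in VOID:
            self.stack.append(tag)
        out = RENAMED.get(tag, tag)
        if self.appending and out in ALLOWED:
            keep = KEPT_ATTRS.get(out, ())
            attr_text = "".join(
                ' {}="{}"'.format(k, escape(v or "", quote=True))
                for k, v in attrs if k in keep
            )
            self.buffer.append("<{}{}>".format(out, attr_text))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if not self.stack or self.stack[-1] != tag:
            self.errors.append("unexpected </{}>".format(tag))
            return
        self.stack.pop()
        if tag in SKIPPED:
            self.skip_depth -= 1
            return
        out = RENAMED.get(tag, tag)
        if self.appending and out in ALLOWED:
            self.buffer.append("</{}>".format(out))

    def handle_data(self, data):
        if self.appending:
            # The parser unescapes character references, so escape again.
            self.buffer.append(escape(data, quote=False))


def convert(html):
    """Return (cleaned_html, errors) for the given markup."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    parser.errors.extend("unclosed <{}>".format(t) for t in parser.stack)
    return "".join(parser.buffer), parser.errors
```

Tags outside ALLOWED are dropped while their text is kept, which is one possible policy; a stricter parser could skip their content as well.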
The same result can be achieved with the help of regular expressions, but that would be much more complex and error-prone. For example, we can look up the article title using a helper method like this:
def find_tag(tag_name, html_data):
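Only the signature of find_tag survives above, so the body below is an assumed implementation: it returns the inner HTML of the first matching element and does not cope with nested tags of the same name, which is part of why regular expressions are fragile here.

```python
import re


def find_tag(tag_name, html_data):
    """Return the inner HTML of the first <tag_name> element, or None."""
    # \b keeps "h1" from matching inside a longer tag name.
    pattern = r"<{0}\b[^>]*>(.*?)</{0}\s*>".format(re.escape(tag_name))
    match = re.search(pattern, html_data, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

For instance, find_tag("h1", page) would pick up the title when the article heading is an h1 element.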
We have built a grabber and parser for articles that fetches and formats them the way we want, using only tools from the standard library. You might extend this example by adding different providers, turning it into a tool for populating your own blog with articles aggregated from different resources. If you want to rely on more user-friendly libraries, see the links below.