07 April 2015

Combinatorics of a compound keyword system for blog content classification

Knowing how to classify blog content is important for three reasons:
  1. Search Engine Optimization. The keywords that you write in the Labels of your blog post are used by search engines to determine whether your article is relevant to the keywords typed by the searcher.
  2. Related Posts Gadget. The related posts widget in Blogger uses the blog post labels to find the most relevant articles that the reader may want to read next.
  3. Content Focus.  Knowing your blog categories keeps you focused: you can't just write about anything under the sun.
A. Blog Content Category Classification

One way to classify your blog content is to pick keywords or phrases from your article, e.g. Library Science, Blogging, Web Design. The other way is to pick category facets that are mutually orthogonal, as in the Colon system.

The Colon system is a library classification system developed by S. R. Ranganathan. This system has five facets:
  • Personality--the most specific or focal subject. 
  • Matter or property--the substance, properties or materials of the subject. 
  • Energy--the processes, operations and activities. 
  • Space--geographic location of the subject. 
  • Time--the dates or seasons of the subject.
It would take a book to discuss this classification in detail. So in this blog post, we shall try to adapt the system into a form that is hopefully useful to bloggers.

We can restrict our content categories to five labels per blog post, one for each facet. Since there are too many possible keywords, we cannot write them all as blog post labels. So what I propose is a compound labeling system per facet. In this system, if several keywords of a facet apply to a post, say text, photo, and video, then we arrange them in alphabetical order separated by hyphens, e.g. photo-text-video. Fixing the order removes the permutations of each label combination (video-text-photo and photo-text-video become the same label), so that they won't greatly eat up Google's allowed budget of 2,000 labels per blog. The only drawback of this method is that you need a sizable number of blog posts before there is a considerable chance that two posts share a compound keyword.
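The alphabetical rule above can be sketched in a few lines of Python. This is only an illustration: the function name `compound_label` is made up for this post, and lowercasing and dropping duplicate keywords are my own assumptions, not something Blogger does for you.

```python
def compound_label(keywords):
    """Join one facet's keywords into a single compound label.

    Assumptions (not from Blogger): keywords are lowercased,
    duplicates are dropped, and blank entries are ignored.
    """
    cleaned = {k.strip().lower() for k in keywords if k.strip()}
    # Sorting fixes the order, so every permutation gives the same label.
    return "-".join(sorted(cleaned))


print(compound_label(["video", "text", "photo"]))  # photo-text-video
print(compound_label(["Text", "Photo", "Video"]))  # photo-text-video
```

Note that a keyword that itself contains a hyphen or a space would be ambiguous under this scheme, so the sketch assumes single-word keywords.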

In order to determine the number of combinations of compound keywords, we use some theorems in Combinatorics.  If $n$ is the number of keywords for a particular facet and $r$ is the number of keywords in a compound, then the number of keyword combinations that can be formed is \begin{equation} C(n,r) = \frac{n!}{(n-r)!\,r!}. \end{equation} For example, if there are $n=6$ keywords and you take them $r=4$ at a time, then the number of keyword combinations is $C(6,4) = 6!/(4!\,2!)=15$.  Now, if we sum over all compound sizes from $r=0$ to $r=n$, we obtain a special case of the binomial theorem: \begin{equation} \sum_{r=0}^{n} C(n,r) = 2^n. \end{equation} For example, if there are $n=6$ keywords for a particular facet, then the total number of compound keywords that can be made is $2^6=64$; this count includes the empty compound ($r=0$), so there are 63 usable labels. If there are 5 facets with 6 keywords each, then the total number of compound keywords for the whole blog is at most $64\times 5=320$.
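These counts are easy to check numerically. The following is a minimal sketch using Python's standard library (`math.comb` needs Python 3.8 or later); the helper names `count_compounds` and `all_compounds` are hypothetical and exist only for this illustration.

```python
from itertools import combinations
from math import comb


def count_compounds(n):
    """Sum C(n, r) for r = 0..n; this should equal 2**n."""
    return sum(comb(n, r) for r in range(n + 1))


def all_compounds(keywords):
    """List every non-empty compound label for one facet."""
    words = sorted(k.lower() for k in keywords)
    return [
        "-".join(combo)
        for r in range(1, len(words) + 1)
        for combo in combinations(words, r)
    ]


print(comb(6, 4))               # 15
print(count_compounds(6))       # 64
print(count_compounds(6) * 5)   # 320

labels = all_compounds(["text", "photo", "video"])
print(len(labels))              # 7, i.e. 2**3 minus the empty compound
print(labels[-1])               # photo-text-video
```

The enumeration skips $r=0$, since an empty label cannot be attached to a post, which is why three keywords give 7 labels rather than 8.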

B. Compound Keyword Category System

1. Personality
  • Programming, Marketing, Design, Physics, Mathematics, Literature, Writing
For example, if the article is about programming, physics, and literature, we write the label as "literature-physics-programming".

2. Matter or Property
  • Text, Graphics, Photo, Audio, Video, Equation
For example, if the article has text, graphics, and videos, we write the label as "graphics-text-video". 
3. Energy
  • Social, Physical, Emotional, Spiritual, Financial
For example, if the article is about the cost of marketing, we can label it as "financial-social".
4. Space
  • Nation, Sphere, 
For example, if the topic is about the interaction of the lithosphere, ionosphere, and magnetosphere, we can write the label as "ionosphere-lithosphere-magnetosphere".
5. Time
  • History, News, Future, Fiction
For example, if the article is a news item with some thoughts on the future, we can write the label as "future-news".
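Putting the five facets together, the labels for one post could be assembled as in the sketch below. The dictionary layout, the `FACETS` tuple, and the function name `post_labels` are assumptions made for this illustration; the keywords are taken from the examples above.

```python
# Facet order follows the Colon system: Personality, Matter, Energy, Space, Time.
FACETS = ("personality", "matter", "energy", "space", "time")


def post_labels(keywords_by_facet):
    """Return at most one compound label per facet, in facet order."""
    labels = []
    for facet in FACETS:
        raw = keywords_by_facet.get(facet, [])
        words = {w.strip().lower() for w in raw if w.strip()}
        if words:  # skip facets that do not apply to this post
            labels.append("-".join(sorted(words)))
    return labels


post = {
    "personality": ["Programming", "Physics", "Literature"],
    "matter": ["Text", "Graphics", "Video"],
    "energy": ["Social", "Financial"],
    "time": ["News", "Future"],
}
print(post_labels(post))
# ['literature-physics-programming', 'graphics-text-video',
#  'financial-social', 'future-news']
```

A post never gets more than five labels this way, no matter how many keywords apply within each facet.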

