Proper Title Case Creation in Python and C#

Platform: Any/Unity
Language: Python/C#
Requirements: None

Many of you may be familiar with the term title case in the context of text handling. It is typically used for book titles, game titles, movie titles, and so forth.
The common perception appears to be that writing a name or headline in title case simply means capitalizing the first letter of each word, like this:

Snow White And The Seven Dwarfs

This, however, is not how title case really works. Proper title case capitalizes the first letter of each word, except for certain small words, such as articles, conjunctions, and prepositions; the first word is capitalized regardless. Therefore, the correct way to write the example above would be…

Snow White and the Seven Dwarfs
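At its core, the rule is a membership test with one exception for the first word. The following minimal sketch illustrates only that rule and ignores quotation marks and punctuation; `MINOR_WORDS` and `simple_title` are hypothetical names, and the deliberately tiny word set is an assumption for illustration, not a complete list.

```python
# Hypothetical, deliberately tiny word set; a real implementation needs a much longer list
MINOR_WORDS = {"a", "an", "the", "and", "of", "to", "on", "upon"}


def simple_title(text: str) -> str:
    """Lowercase minor words, capitalize all others; the first word is always capitalized."""
    words = text.lower().split()
    result = []
    for index, word in enumerate(words):
        if index > 0 and word in MINOR_WORDS:
            result.append(word)                          # minor word inside the title stays lowercase
        else:
            result.append(word[:1].upper() + word[1:])   # capitalize only the first letter
    return " ".join(result)


print(simple_title("snow white and the seven dwarfs"))  # Snow White and the Seven Dwarfs
print(simple_title("the last of the mohicans"))         # The Last of the Mohicans
```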

Unfortunately, most programming languages and libraries get this wrong: their built-in title case functions, such as Python's str.title(), simply capitalize every word. To make up for that deficiency, I decided to write my own implementation, which I am making available to you here in Python and in C# for use in Unity.
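Python's own helpers illustrate the problem. str.title() capitalizes every word and, because it treats any non-letter as a word boundary, also uppercases the letter after an apostrophe. string.capwords() avoids the apostrophe issue by splitting on whitespace only, but it still capitalizes every single word (.NET's TextInfo.ToTitleCase likewise capitalizes every word):

```python
import string

# str.title(): every word is capitalized, and the apostrophe counts as a word boundary
print("a dog's tale".title())                              # A Dog'S Tale
print("snow white and the seven dwarfs".title())           # Snow White And The Seven Dwarfs

# string.capwords(): splits on whitespace, but still capitalizes every word
print(string.capwords("a dog's tale"))                     # A Dog's Tale
print(string.capwords("snow white and the seven dwarfs"))  # Snow White And The Seven Dwarfs
```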

import re            # Import the regular expression library

Exclusions = {                                                                                  # Words kept lowercase (a set: fast lookups, no duplicates)
  "a", "an", "the",                                                                             # Articles
  "and", "but", "or", "nor", "yet", "so",                                                       # Conjunctions
  "about", "above", "across", "after", "against", "along", "among", "around", "at", "before",   # Prepositions
  "behind", "between", "beyond", "by", "concerning", "despite", "down", "during",
  "except", "following", "for", "from", "in", "including", "into", "like", "near", "of",
  "off", "on", "out", "over", "plus", "since", "through", "throughout", "to", "towards",
  "under", "until", "up", "upon", "with", "within", "without"
 }

def TitleCase( curText: str ) -> str:
  """ Take a string and return it in a fashion that follows proper title case guidelines """

  outString = ""
  fragments = re.split( r'(\".*?\")|(\'.*?\')|(“.*?”)|(‘.*?’)', curText )    # Extract titles in quotation marks from string

  for fragment in fragments:                                  # Treat and re-assemble all fragments
    if fragment:                                              # skip None (unmatched regex groups) and empty strings
      fragString = ""
      tokens = fragment.split()                               # Break string into individual words

      if tokens:
        for word in tokens:                                   # Check each word

          punct = word[-1]                                    # Check for trailing punctuation mark
          if punct.isalpha():
            punct = ""
          else:
            word = word[:-1]

          if word.lower() in Exclusions:                      # if it is excluded,
            fragString += word.lower() + punct + " "          # make it lowercase
          else:                                               # otherwise,
            fragString += word.capitalize() + punct + " "     # capitalize it

        cap = 1
        if not fragString[0].isalpha():
          cap = 2

        outString += ( fragString[:cap].upper() + fragString[cap:]).strip() + " "

  return (outString[:1].upper() + outString[1:]).strip()      # Capitalize first letter and strip trailing space

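The approach is simple. I create a list of articles, conjunctions, and prepositions that identifies the words in a string that should not be capitalized. Titles in quotation marks (straight or curly, single or double) are split off first and processed as separate fragments. Going through each fragment word by word, I detach any trailing punctuation mark, look the word up in the list, and build a new output string in which each word starts with either a capital or a lowercase letter. The first word of every fragment, and therefore of every quoted title, is always capitalized, as is the first letter of the whole string. Couldn’t be simpler, really, and the advantage of the list is that it can easily be adapted to include words for your own special cases.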
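Two details of the Python implementation are easy to miss. First, the pattern passed to re.split() contains capturing groups, so quoted titles are kept in the result as separate fragments, and every group that did not take part in a match shows up as None; that is why the loop skips falsy fragments. Second, a single trailing punctuation mark is detached before the exclusion lookup. The snippet below isolates both steps; split_trailing_punct is a hypothetical helper name used only for illustration, since TitleCase() does this inline.

```python
import re

# Same pattern as in TitleCase(): one capturing group per quotation style
QUOTED = r'(\".*?\")|(\'.*?\')|(“.*?”)|(‘.*?’)'

parts = re.split(QUOTED, "reading 'fight club' through a postmodernist lens")
print(parts)
# ['reading ', None, "'fight club'", None, None, ' through a postmodernist lens']

fragments = [part for part in parts if part]   # drop None entries and empty strings
print(fragments)
# ['reading ', "'fight club'", ' through a postmodernist lens']


def split_trailing_punct(word):
    """Hypothetical helper: detach one trailing non-letter character from a word."""
    if word and not word[-1].isalpha():
        return word[:-1], word[-1]
    return word, ""


print(split_trailing_punct("smart,"))    # ('smart', ',')
print(split_trailing_punct('galaxy"'))   # ('galaxy', '"')
print(split_trailing_punct("tale"))      # ('tale', '')
```

Because the split keeps the quote characters attached to the first and last words of a quoted title, the capitalization step has to look past one leading non-letter character, which is what the cap = 2 branch in TitleCase() takes care of.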

Simple steps to put it to work

Here’s a small test suite that shows the implementation at work…

TestText = [
  "how to interpret \"the hitchhiker's guide to the galaxy\"",
  "snow white\nand the seven dwarfs",
  "newcastle upon tyne",
  "brighton on sea ",
  "A dog's Tale",
  "the last of the mohicans",
  "how to be smart",
  "about a boy",
  "reading \'fight club\' through a postmodernist lens",
  "how to interpret “the hitchhiker's guide to the galaxy”",
  "reading ‘fight club’ through a postmodernist lens"
]

for text in TestText:
  print ( TitleCase( text ))

As you will see, it correctly generates the following output, just the way it should be.


How to Interpret "The Hitchhiker's Guide to the Galaxy"
Snow White and the Seven Dwarfs
Newcastle upon Tyne
Brighton on Sea
A Dog's Tale
The Last of the Mohicans
How to Be Smart
About a Boy
Reading 'Fight Club' Through a Postmodernist Lens
How to Interpret “The Hitchhiker's Guide to the Galaxy”
Reading ‘Fight Club’ Through a Postmodernist Lens

Let’s do title case in Unity

I’ve also implemented the same thing in C#, as an extension method for the string class.

using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Extensions
{
  // Words kept lowercase unless they start the title or a quoted title
  static readonly HashSet<string> Exclusions = new HashSet<string> ()
  {
    "a", "an", "the",                                                                              // Articles
    "and", "but", "or", "nor", "yet", "so",                                                        // Conjunctions
    "about", "above", "across", "after", "against", "along", "among", "around", "at", "before",    // Prepositions
    "behind", "between", "beyond", "by", "concerning", "despite", "down", "during",
    "except", "following", "for", "from", "in", "including", "into", "like", "near", "of",
    "off", "on", "out", "over", "plus", "since", "through", "throughout", "to", "towards",
    "under", "until", "up", "upon", "with", "within", "without"
  };

  public static string ProperTitleCase ( this string curText )                        // Take a string and return it in a
  {                                                                                   // fashion that follows proper title case guidelines
    if ( string.IsNullOrEmpty ( curText ) )                                           // Nothing to do for null or empty input
    {
      return curText;
    }

    string outString = "";
    string[] fragments = Regex.Split ( curText, @"("".*?"")|('.*?')|(“.*?”)|(‘.*?’)" );   // Split off titles in quotation marks

    foreach ( string fragment in fragments )
    {
      if ( !string.IsNullOrEmpty ( fragment ) )
      {
        string fragString = string.Empty;
        string[] tokens = Regex.Split ( fragment, "[ \n\r\t]" );                      // Break string into individual words

        foreach ( string word in tokens )                                             // Check each word
        {
          if ( word != "" )                                                           // Skip empty tokens from repeated whitespace
          {
            string curWord = word;
            string punct = word.Substring ( word.Length - 1, 1 );                      // Check for a trailing punctuation mark
            if ( char.IsLetter ( punct[ 0 ] ) )
            {
              punct = "";
            }
            else
            {
              curWord = word.Substring ( 0, word.Length - 1 );                        // Remove punctuation from word
            }

            string lowerWord = curWord.ToLowerInvariant ();                           // Culture-independent lowercase for the lookup
            if ( Exclusions.Contains ( lowerWord ) )                                  // if word is excluded,
            {
              fragString += lowerWord + punct + " ";                                  // make it lowercase
            }
            else                                                                      // otherwise,
            {
              int fcap = ( curWord.Length > 0 && char.IsLetter ( curWord[ 0 ] ) ) ? 1 : 2;   // Look past one leading quote mark
              if ( curWord.Length >= fcap )                                           // Guard against empty or quote-only words
              {
                curWord = curWord.Substring ( 0, fcap - 1 ) + char.ToUpper ( curWord[ fcap - 1 ] ) + curWord.Substring ( fcap );   // capitalize it
              }
              fragString += curWord + punct + " ";
            }
          }
        }

        if ( fragString.Length == 0 )                                                 // Fragment contained only whitespace
        {
          continue;
        }

        int cap = ( char.IsLetter ( fragString[ 0 ] ) ) ? 1 : 2;                      // Always capitalize the first word of a fragment
        outString += fragString.Substring ( 0, cap - 1 ) +
          ( char.ToUpper ( fragString[ cap - 1 ] ) +
          fragString.Substring ( cap ) ).TrimEnd () + " ";
      }
    }

    if ( outString.Length == 0 )                                                      // Input contained only whitespace
    {
      return string.Empty;
    }

    return ( char.ToUpper ( outString[ 0 ] ) + outString.Substring ( 1 ) ).TrimEnd ();    // Capitalize first letter and strip trailing space
  }
}

Implementing it as an extension method makes the function much easier and more seamless to use, because you can call it directly on any string, for example "about a boy".ProperTitleCase().

Here is a small script that shows how you would use it in Unity.

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class TitleCase : MonoBehaviour
{
  string[] TestText =
  {
    "how to interpret \"the hitchhiker's guide To The galaxy\"",
    "snow white\nand the seven dwarfs",
    "newcastle upon tyne",
    " brighton on sea ",
    "A dog's Tale",
    "the last of the\tmohicans",
    "how to be smart, or not",
    "about a boy",
    "reading \'fight club\' through a postmodernist lens",
    "how to interpret “the hitchhiker's guide to the galaxy”",
    "reading ‘fight club’ through a postmodernist lens"  
  };

  void Start ()
  {
    foreach ( string example in TestText )
    {
      Debug.Log ( example.ProperTitleCase () );
    }
  }
}

There you go! Let’s make half-baked title cases a thing of the past.
