Proper Title Case creation in Python and C#
Platform: Any/Unity
Language: Python/C#
Requirements: None
Many of you may be familiar with the term Title Case in conjunction with text handling. It is typically used for book titles, game titles, movie titles, and so forth.
The common perception appears to be that writing a name or headline in title case means to simply capitalize the first letter of each word, like this.
This, however, is not really how title case works. Correct title case calls for capitalizing the first letter of each word, except for certain small words, such as articles, conjunctions, and short prepositions. Therefore, the correct way to write the example above would be…
Unfortunately, most programming languages and libraries get this completely wrong and, by default, offer title case implementations that are simply not correct. To make up for that deficiency, I decided to write my own implementation that I am making available to you here in Python and C#, for use in Unity.
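To see the problem concretely, here is a quick look at what Python offers out of the box. This is only a small sketch using `str.title()` and `string.capwords()` from the standard library; the two sample strings are borrowed from the tests further down.

```python
import string

samples = ["the last of the mohicans", "a dog's tale"]

for text in samples:
    # str.title() uppercases the first letter of every run of letters,
    # so small words are capitalized and "dog's" turns into "Dog'S"
    print("title():   ", text.title())
    # string.capwords() splits on whitespace only, which spares the
    # apostrophe but still capitalizes articles and prepositions
    print("capwords():", string.capwords(text))
```

Neither helper knows about small words, so both turn the first sample into "The Last Of The Mohicans".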
import re  # Import the regular expression library

Exclusions = [
    "a", "an", "the",                                                   # Articles
    "and", "but", "or", "nor", "yet", "so",                             # Conjunctions
    "about", "above", "across", "after", "against", "along", "among",   # Prepositions
    "around", "at", "before", "behind", "between", "beyond", "by",
    "concerning", "despite", "down", "during", "except", "following",
    "for", "from", "in", "including", "into", "like", "near", "of",
    "off", "on", "out", "over", "plus", "since", "through",
    "throughout", "to", "towards", "under", "until", "up", "upon",
    "with", "within", "without"
]

def TitleCase(curText: str) -> str:
    """Take a string and return it in a fashion that follows proper title case guidelines."""
    outString = ""
    # Extract titles in quotation marks from the string
    fragments = re.split(r'(\".*?\")|(\'.*?\')|(“.*?”)|(‘.*?’)', curText)
    for fragment in fragments:                  # Treat and re-assemble all fragments
        if fragment:                            # Skip empty matches generated by the OR in the regex
            fragString = ""
            tokens = fragment.split()           # Break string into individual words
            if tokens:
                for word in tokens:             # Check each word
                    punct = word[-1]            # Check for a trailing punctuation mark
                    if punct.isalpha():
                        punct = ""
                    else:
                        word = word[:-1]
                    if word.lower() in Exclusions:                      # If it is excluded,
                        fragString += word.lower() + punct + " "        # make it lowercase
                    else:                                               # otherwise,
                        fragString += word.capitalize() + punct + " "   # capitalize it
                # Capitalize the first letter of the fragment, skipping a leading quotation mark
                cap = 1
                if not fragString[0].isalpha():
                    cap = 2
                outString += (fragString[:cap].upper() + fragString[cap:]).strip() + " "
    # Capitalize first letter and strip trailing space
    return (outString[:1].upper() + outString[1:]).strip()
The approach is simple. I create a list containing articles, conjunctions, and prepositions that I use to identify the words in a string that should not be capitalized. Going through the string word by word, I then build a new output string in which each word starts with a capital or a lowercase letter, as appropriate. Couldn’t be simpler, really, and the advantage of the list is that it can easily be adapted to include certain words for your individual special cases.
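As an illustration of that last point, here is a minimal sketch of adapting the list. To keep it self-contained it uses a deliberately simplified, hypothetical stand-in (`simple_title_case`, with a shortened `BASE_EXCLUSIONS` set) instead of the full function above; it ignores quotation marks and punctuation and takes the exclusions as a parameter.

```python
# Hypothetical, shortened exclusion set for illustration only
BASE_EXCLUSIONS = frozenset({"a", "an", "the", "and", "of", "to", "in", "on", "upon"})

def simple_title_case(text, exclusions=BASE_EXCLUSIONS):
    """Simplified stand-in: lowercase excluded words, capitalize the rest."""
    words = [w.lower() if w.lower() in exclusions else w.capitalize()
             for w in text.split()]
    result = " ".join(words)
    return result[:1].upper() + result[1:]   # the first word is always capitalized

# Project-specific tweak: treat "vs" and "via" as small words too
custom = BASE_EXCLUSIONS | {"vs", "via"}

print(simple_title_case("alien vs predator"))           # Alien Vs Predator
print(simple_title_case("alien vs predator", custom))   # Alien vs Predator
```

With the full function from this article, the same effect should come from adding your words to the `Exclusions` list before calling `TitleCase`, since the list is consulted on every call.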
Simple steps to put it to work
Here’s a small test suite to see the implementation at work…
TestText = [
    "how to interpret \"the hitchhiker's guide to the galaxy\"",
    "snow white\nand the seven dwarfs",
    "newcastle upon tyne",
    "brighton on sea ",
    "A dog's Tale",
    "the last of the mohicans",
    "how to be smart",
    "about a boy",
    "reading \'fight club\' through a postmodernist lens",
    "how to interpret “the hitchhiker's guide to the galaxy”",
    "reading ‘fight club’ through a postmodernist lens"
]

for text in TestText:
    print(TitleCase(text))
As you will see, it correctly generates the following output, just the way it should be.
How to Interpret "The Hitchhiker's Guide to the Galaxy"
Snow White and the Seven Dwarfs
Newcastle upon Tyne
Brighton on Sea
A Dog's Tale
The Last of the Mohicans
How to Be Smart
About a Boy
Reading 'Fight Club' Through a Postmodernist Lens
How to Interpret “The Hitchhiker's Guide to the Galaxy”
Reading ‘Fight Club’ Through a Postmodernist Lens
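The quoted titles in this output keep their own leading capital because the function splits them off before it looks at individual words. The following sketch isolates just that step; the helper name `quoted_fragments` is made up for this illustration, while the pattern is the one used in `TitleCase`.

```python
import re

# Same pattern as in TitleCase: one capture group per quote style
QUOTED = r'(\".*?\")|(\'.*?\')|(“.*?”)|(‘.*?’)'

def quoted_fragments(text):
    """Split text into plain and quoted fragments, dropping empty pieces."""
    # re.split() yields None for groups that did not take part in a match
    # and may yield empty strings at the edges, so both are filtered out.
    return [piece for piece in re.split(QUOTED, text) if piece]

print(quoted_fragments("reading 'fight club' through a postmodernist lens"))
# ['reading ', "'fight club'", ' through a postmodernist lens']

# Caveat: any two straight apostrophes are paired up as a quotation
print(quoted_fragments("a dog's tale and a cat's tale"))
# ['a dog', "'s tale and a cat'", 's tale']
```

The second call shows a limitation worth keeping in mind: a string with two possessives is split as if it contained a quoted title, so such input may need a stricter pattern.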
Let’s do title cases in Unity
I’ve also implemented the same thing in C#, as an extension to the string class.
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Extensions
{
    static readonly List<string> Exclusions = new List<string> ()
    {
        "a", "an", "the",                                                   // Articles
        "and", "but", "or", "nor", "yet", "so",                             // Conjunctions
        "about", "above", "across", "after", "against", "along", "among",   // Prepositions
        "around", "at", "before", "behind", "between", "beyond", "by",
        "concerning", "despite", "down", "during", "except", "following",
        "for", "from", "in", "including", "into", "like", "near", "of",
        "off", "on", "out", "over", "plus", "since", "through",
        "throughout", "to", "towards", "under", "until", "up", "upon",
        "with", "within", "without"
    };

    // Take a string and return it in a fashion that follows proper title case guidelines
    public static string ProperTitleCase ( this string curText )
    {
        string outString = "";
        // Extract titles in quotation marks from the string
        string[] fragments = Regex.Split ( curText, @"("".*?"")|('.*?')|(“.*?”)|(‘.*?’)" );
        foreach ( string fragment in fragments )
        {
            if ( string.IsNullOrEmpty ( fragment ) ) continue;

            string fragString = string.Empty;
            string[] tokens = Regex.Split ( fragment, "[ \n\r\t]" );    // Break string into individual words
            foreach ( string word in tokens )                           // Check each word
            {
                if ( word == "" ) continue;                             // Skip gaps left by repeated whitespace

                string curWord = word;
                string punct = word.Substring ( word.Length - 1, 1 );   // Check for a trailing punctuation mark
                if ( char.IsLetter ( punct[ 0 ] ) )
                {
                    punct = "";
                }
                else
                {
                    curWord = word.Substring ( 0, word.Length - 1 );    // Remove punctuation from word
                }

                // Test and emit the stripped word, so punctuation is neither doubled nor hides an excluded word
                if ( Exclusions.Contains ( curWord.ToLower () ) )       // If word is excluded,
                {
                    fragString += curWord.ToLower () + punct + " ";     // make it lowercase
                }
                else                                                    // otherwise,
                {
                    fragString += CapitalizeFirstLetter ( curWord ) + punct + " ";  // capitalize it
                }
            }

            if ( fragString.Length > 0 )                                // Whitespace-only fragments add nothing
            {
                outString += CapitalizeFirstLetter ( fragString ).TrimEnd () + " ";
            }
        }
        return CapitalizeFirstLetter ( outString ).TrimEnd ();          // Capitalize first letter and strip trailing space
    }

    // Uppercase the first character, or the second one if the text starts with a non-letter such as a quotation mark
    static string CapitalizeFirstLetter ( string text )
    {
        if ( text.Length == 0 ) return text;
        int index = char.IsLetter ( text[ 0 ] ) ? 0 : 1;
        if ( index >= text.Length ) return text;
        return text.Substring ( 0, index ) + char.ToUpper ( text[ index ] ) + text.Substring ( index + 1 );
    }
}
Implementing it as an extension of the string class makes the function a lot easier and more seamless to use. It allows you to call it directly on any string via the string.ProperTitleCase() method.
Here is a small code snippet that shows you how you would use this in a script in Unity.
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class TitleCase : MonoBehaviour
{
    string[] TestText =
    {
        "how to interpret \"the hitchhiker's guide To The galaxy\"",
        "snow white\nand the seven dwarfs",
        "newcastle upon tyne",
        " brighton on sea ",
        "A dog's Tale",
        "the last of the\tmohicans",
        "how to be smart, or not",
        "about a boy",
        "reading \'fight club\' through a postmodernist lens",
        "how to interpret “the hitchhiker's guide to the galaxy”",
        "reading ‘fight club’ through a postmodernist lens"
    };

    void Start ()
    {
        foreach ( string example in TestText )
        {
            Debug.Log ( example.ProperTitleCase () );
        }
    }
}
There you go! Let’s make half-baked title cases a thing of the past.