Title-case a String in Python

Trey Hunner
5 min. read Python 3.9—3.13

You'd like to capitalize the first letter of every word in a string. How do you do it?

You might think the string title method is the answer. But that's likely not the solution you're looking for.

The string title method

Python's string (str) type has a title method for title-casing:

>>> name = "Title-case a string in Python"
>>> name.title()
'Title-Case A String In Python'

This method works pretty well for some strings, but it's far from perfect. Did we want to capitalize A and In? Maybe, maybe not. What about the C after the - in title-case? Should it be capitalized?

Most style guides agree that articles (like a) and short prepositions (like in) shouldn't be capitalized in titles. Some style guides say that words just after a - should be capitalized and some say they shouldn't.

But there's an even bigger issue with the string title method.

The title method sees the beginning of a word as any letter that isn't preceded by another letter. So letters after hyphens are capitalized, but so are letters after apostrophes!

>>> name = "I don't know how to title-case"
>>> name.title()
"I Don'T Know How To Title-Case"

We almost certainly don't want to capitalize the T in Don't.
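Apostrophes and hyphens aren't the only characters that trigger this. Here's a small sketch (the example strings are just illustrations) showing that digits and underscores also count as "not a letter", and that title lowercases every letter that does follow another letter:

```python
# str.title() uppercases any letter that is not preceded by another letter
# and lowercases every letter that is.
examples = ["don't", "1st place", "snake_case", "NASA"]
results = [text.title() for text in examples]
print(results)  # ["Don'T", '1St Place', 'Snake_Case', 'Nasa']
```

So acronyms like NASA get flattened too, which is rarely what you want in a title.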

The string capitalize method

What about the string capitalize method?

This one's even further from what we want.

>>> name = "is this title-case?"
>>> name.capitalize()
'Is this title-case?'

The capitalize method only capitalizes the very first letter in the string.
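One more detail worth knowing: capitalize also lowercases everything after the first character. A quick sketch with a mixed-case string (the string is just an illustration):

```python
name = "is this Title-Case in Python?"
capitalized = name.capitalize()
# The first character is uppercased and the rest of the string is lowercased.
print(capitalized)  # Is this title-case in python?
```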

So string methods are out. What next?

The string module's capwords function

Python's standard library also has a string module that might be helpful.

>>> import string

The string module contains a bunch of miscellaneous string-related functions, including one called capwords.

Unlike the title method, the capwords function does not uppercase letters after an apostrophe:

>>> statement = "Python: it's pretty great"
>>> string.capwords(statement)
"Python: It's Pretty Great"

But it also doesn't capitalize a letter after a hyphen:

>>> name = "I don't know how to title-case"
>>> string.capwords(name)
"I Don't Know How To Title-case"

And it misses words that follow an open parenthesis:

>>> phrase = "keyword (named) arguments"
>>> string.capwords(phrase)
'Keyword (named) Arguments'

Both the string title method and the string.capwords function define a "word" in a problematic way. The title method sees anything that isn't a letter as a division between two words (yes, even apostrophes separate words according to title). The string.capwords function splits the string solely on whitespace and then capitalizes the first character of each split section (so (named) is seen as a word and ( is "capitalized").
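To make the capwords behavior concrete, here's a rough equivalent written by hand. The capwords_sketch name is made up for this example, and it only covers the default case where no custom separator is passed:

```python
import string


def capwords_sketch(text):
    """Rough equivalent of string.capwords(text) with the default separator."""
    # split() with no arguments splits on runs of whitespace and drops
    # leading/trailing whitespace, so the original spacing is not preserved.
    return " ".join(word.capitalize() for word in text.split())


phrase = "keyword   (named) arguments"
print(capwords_sketch(phrase))   # Keyword (named) Arguments
print(string.capwords(phrase))   # Keyword (named) Arguments
```

Note that the run of three spaces collapses into one, and that "(named)" stays lowercase because capitalize only tries to uppercase the first character, which is the parenthesis.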

Neither of these approaches is ideal.

Regular expressions

What about a regular expression?

We could use a regular expression like this:

import re


def title(value):
    words_and_nonword_chunks = re.split(r'([\s\-"({\[<]+)', value)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing;
        # slicing with [:1] also tolerates the empty chunks re.split can produce
        chunk[:1].upper() + chunk[1:].lower()
        for chunk in words_and_nonword_chunks
    )

Or we could use a regular expression with verbose mode enabled, to add whitespace and comments that document what we're looking for:

import re


def title(value):
    words_and_nonword_chunks = re.split(r"""
        (                   # capture both words and the chunks between words
            (?:             # split on any amount of consecutive:
                \s |        # - whitespace characters
                -  |        # - dashes
                "  |        # - double quotes
                [\[({<]     # - opening brackets and braces
            )+
        )
    """, value, flags=re.VERBOSE)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing;
        # slicing with [:1] also tolerates the empty chunks re.split can produce
        chunk[:1].upper() + chunk[1:].lower()
        for chunk in words_and_nonword_chunks
    )

In both of these regular expressions we're finding each "word" as well as the symbols and whitespace that join each word and then we're title-casing the words manually and joining them back together.
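It may help to look at what re.split actually returns here. Because the pattern is wrapped in a capturing group, the separators are kept in the resulting list alongside the words. A small sketch with illustrative strings:

```python
import re

SEPARATORS = r'([\s\-"({\[<]+)'

chunks = re.split(SEPARATORS, "keyword (named) title-case")
print(chunks)
# ['keyword', ' (', 'named)', ' ', 'title', '-', 'case']

# A separator at the very start or end of the string produces an empty chunk.
edge = re.split(SEPARATORS, '"quoted"')
print(edge)
# ['', '"', 'quoted', '"', '']
```

Those empty chunks are worth keeping in mind: indexing an empty string with chunk[0] raises an IndexError, while slicing with chunk[:1] simply returns an empty string.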

The Jinja2 templating framework has a title filter that uses a regular expression that's very similar to the above regular expressions. The Django web framework takes a different approach for its title template filter.

Django first uses the string title method and then uses a regular expression to clean up the results, resulting in a title function similar to this:

import re


def title(value):
    titled = value.title()
    titled = re.sub(r"([a-z])'([A-Z])", lowercase_match, titled)  # Fix Don'T
    titled = re.sub(r"\d([A-Z])", lowercase_match, titled)  # Fix 1St and 2Nd
    return titled


def lowercase_match(match):
    """Lowercase the whole regular expression match group."""
    return match.group().lower()

This lowercases letters after apostrophes as well as letters after digits.
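Here's the same cleanup traced one step at a time, as a sketch. The input string and the step1/step2/step3 names are just for illustration:

```python
import re

text = "i don't know the 2nd answer"

# Step 1: str.title() over-capitalizes after the apostrophe and the digit.
step1 = text.title()
print(step1)  # I Don'T Know The 2Nd Answer

# Step 2: lowercase any "letter, apostrophe, capital letter" match.
step2 = re.sub(r"([a-z])'([A-Z])", lambda m: m.group().lower(), step1)
print(step2)  # I Don't Know The 2Nd Answer

# Step 3: lowercase any "digit, capital letter" match.
step3 = re.sub(r"\d([A-Z])", lambda m: m.group().lower(), step2)
print(step3)  # I Don't Know The 2nd Answer
```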

In many cases, these two approaches work nearly identically:

>>> title("Title-case a string in Python")
'Title-Case A String In Python'
>>> title("Python: it's pretty great")
"Python: It's Pretty Great"
>>> title("keyword (named) arguments")
'Keyword (Named) Arguments'
>>> title("Python is the 2nd best language for everything")
'Python Is The 2nd Best Language For Everything'

And both work fairly well... unless we want to lowercase articles and prepositions.

What about articles and short prepositions?

How is it that title-casing functions in popular Python libraries don't follow common title-casing conventions? Well, it's a hard problem.

These libraries don't assume that all text is written in English. But even for English text, there isn't just one title-casing convention.

Here's a phrase for which four popular style guides recommend four different capitalizations:

  • Chicago: Patterns over Anti-patterns
  • MLA: Patterns over Anti-Patterns
  • APA: Patterns Over Anti-patterns
  • AP: Patterns Over Anti-Patterns

How you title-case your strings really depends on the style guide you choose. And no style guide has easy-to-code title-casing conventions because there are always edge cases like pytest, O'Reilly, and iPhone.

You could try to write a fairly complex regular expression to ensure particular words aren't title-cased.

import re


DO_NOT_TITLE = [
    "a", "an", "and", "as", "at", "but", "by", "en", "for", "from", "if",
    "in", "nor", "of", "on", "or", "per", "the", "to", "v", "vs", "via", "with"
]



def title_word(chunk):
    """Title-case a given word (or do nothing to a non-word chunk)."""
    if chunk.lower() in DO_NOT_TITLE:
        return chunk.lower()
    # Slicing with [:1] tolerates the empty chunks re.split can produce.
    return chunk[:1].upper() + chunk[1:].lower()


def title(value):
    words_and_nonword_chunks = re.split(r"""
        (                   # capture both words and the chunks between words
            (?:             # split on consecutive:
                \s |        # - whitespace characters
                -  |        # - dashes
                "  |        # - double quotes
                [\[({<]     # - opening brackets and braces
            )+
        )
    """, value, flags=re.VERBOSE)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing
        title_word(chunk)
        for chunk in words_and_nonword_chunks
    )
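One gap in the function above: it lowercases a small word even when it's the first word, so "a tale of two cities" becomes "a Tale of Two Cities". Style guides capitalize the first word of a title regardless. Here's a minimal sketch of one way to handle that. The SMALL_WORDS set is abbreviated, and the title_first_word_too name is made up for this example:

```python
import re

# Hypothetical, abbreviated word list for this sketch only.
SMALL_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}
SEPARATORS = re.compile(r'([\s\-"({\[<]+)')


def title_first_word_too(value):
    """Title-case value, lowercasing small words except the first word."""
    result = []
    seen_word = False
    for chunk in SEPARATORS.split(value):
        # Empty chunks and separator chunks are not words.
        is_word = bool(chunk) and not SEPARATORS.fullmatch(chunk)
        if is_word and seen_word and chunk.lower() in SMALL_WORDS:
            result.append(chunk.lower())
        else:
            # Slicing keeps this safe for empty chunks.
            result.append(chunk[:1].upper() + chunk[1:].lower())
        if is_word:
            seen_word = True
    return "".join(result)


print(title_first_word_too("a tale of two cities"))  # A Tale of Two Cities
print(title_first_word_too('"the end"'))             # "The End"
```

This still doesn't capitalize the last word or handle words after a colon, which many style guides also require. That's part of why this problem gets complicated quickly.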

Or you could use a library, like titlecase, which follows the New York Times Style Manual's title-casing recommendations.

You could even try to get really fancy and use natural language processing with a library like spaCy to invent your own MLA/APA/Chicago-style title-casing tool, but I don't recommend it. NLP libraries can be accurate, but they tend to be slow and often require a bit of setup work.

Don't use the string title method in Python

Regardless of whether you care which words are capitalized and which aren't, you almost certainly don't want to capitalize every letter that follows an apostrophe. When you need to title-case a string, don't use the string title method.

I recommend one of these instead:

  • The string.capwords function (at least it's better than title in most cases)
  • A regular expression (possibly paired with the title method)
  • A library like titlecase, which handles many complex edge cases as well
  • A semi-automated process where a human can correct mistakes if needed

It's unfortunate, but if you want to title-case correctly you either need to do it manually or use some fairly complex logic.

For more on Python's string methods, see Python string methods to know.
