You'd like to capitalize the first letter of every word in a string. How do you do it?
You might think the string title method is the answer.
But that's likely not the solution you're looking for.
Python's string (str) type has a title method for title-casing:
>>> name = "Title-case a string in Python"
>>> name.title()
'Title-Case A String In Python'
This method works pretty well for some strings, but it's far from perfect.
Did we want to capitalize A and In?
Maybe, maybe not.
What about the C after the - in title-case?
Should it be capitalized?
Most style guides agree that articles (like a) and short prepositions (like in) shouldn't be capitalized in titles.
Some style guides say that words just after a - should be capitalized and some say they shouldn't.
But there's an even bigger issue with the string title method.
The title method sees the beginning of a word as any letter that isn't preceded by another letter.
So letters after hyphens are capitalized, but so are letters after apostrophes!
>>> name = "I don't know how to title-case"
>>> name.title()
"I Don'T Know How To Title-Case"
We almost certainly don't want to capitalize the T in Don'T.
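To make that rule concrete, here's a rough sketch of how the title method appears to decide where words begin. The title_like function is a hypothetical helper written for illustration (it is not how Python implements title internally), and it assumes that a "letter" is any character for which isalpha returns True:

```python
def title_like(text):
    """Roughly mimic str.title: a letter starts a word unless a letter precedes it."""
    result = []
    previous_is_letter = False
    for char in text:
        if char.isalpha():
            # Uppercase the first letter of a run of letters, lowercase the rest
            result.append(char.lower() if previous_is_letter else char.upper())
            previous_is_letter = True
        else:
            # Apostrophes, hyphens, digits, and spaces all end the current "word"
            result.append(char)
            previous_is_letter = False
    return "".join(result)


name = "I don't know how to title-case"
print(title_like(name))  # I Don'T Know How To Title-Case
print(title_like(name) == name.title())  # True
```

Because the apostrophe isn't a letter, the t that follows it is treated as the start of a new word.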
The capitalize method
What about the string capitalize method?
This one's even further from what we want.
>>> name = "is this title-case?"
>>> name.capitalize()
'Is this title-case?'
The capitalize method only capitalizes the very first letter in the string.
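It's also worth knowing that capitalize lowercases everything after that first letter, so it can undo capitalization that was already correct. A small check (the example string here is made up for illustration):

```python
name = "is this Title-Case? YES"
capitalized = name.capitalize()

# Only the first character is uppercased; every other letter is lowercased
print(capitalized)  # Is this title-case? yes
```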
So string methods are out. What next?
The string module's capwords function
Python's standard library also has a string module that might be helpful.
>>> import string
The string module contains a handful of miscellaneous string-related utilities, including the capwords function.
Unlike the title method, the capwords function does not uppercase letters after an apostrophe:
>>> statement = "Python: it's pretty great"
>>> string.capwords(statement)
"Python: It's Pretty Great"
But it also doesn't capitalize a letter after a hyphen:
>>> name = "I don't know how to title-case"
>>> string.capwords(name)
"I Don't Know How To Title-case"
And it misses words that follow an open parenthesis:
>>> phrase = "keyword (named) arguments"
>>> string.capwords(phrase)
'Keyword (named) Arguments'
Both the string title method and the string.capwords function define a "word" in a problematic way.
The title method sees anything that isn't a letter as a division between two words (yes, even apostrophes separate words according to title).
The string.capwords function splits the string solely on whitespace and then capitalizes the first character of each split section (so (named) is seen as a word and ( is "capitalized").
Neither of these approaches is ideal.
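The behavior of string.capwords (when called without a custom separator) can be approximated in one line. This capwords_like function is a hypothetical stand-in meant to show the split/capitalize/join idea, not a copy of the standard library's source:

```python
import string


def capwords_like(text):
    """Split on whitespace, capitalize each piece, and rejoin with single spaces."""
    return " ".join(word.capitalize() for word in text.split())


phrase = "keyword (named) arguments"
print(capwords_like(phrase))  # Keyword (named) Arguments
print(capwords_like(phrase) == string.capwords(phrase))  # True
```

Since "(named)" starts with a parenthesis, calling capitalize on it leaves the n lowercase.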
What about a regular expression?
We could use a regular expression like this:
import re
def title(value):
    words_and_nonword_chunks = re.split(r'([\s\-"({\[<]+)', value)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing
        # (slicing with [:1] avoids an IndexError on empty chunks)
        chunk[:1].upper() + chunk[1:].lower()
        for chunk in words_and_nonword_chunks
    )
Or we could use a regular expression with verbose mode enabled, to add whitespace and comments that document what we're looking for:
import re
def title(value):
    words_and_nonword_chunks = re.split(r"""
        (                # capture both words and the chunks between words
            (?:          # split on any amount of consecutive:
                \s     | # - whitespace characters
                -      | # - dashes
                "      | # - double quotes
                [\[({<]  # - opening brackets and braces
            )+
        )
    """, value, flags=re.VERBOSE)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing
        # (slicing with [:1] avoids an IndexError on empty chunks)
        chunk[:1].upper() + chunk[1:].lower()
        for chunk in words_and_nonword_chunks
    )
With both of these regular expressions, we're finding each "word" as well as the symbols and whitespace between words. Then we're title-casing the words manually and joining everything back together.
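To see what that splitting step produces, it can help to look at the chunks directly. Because the pattern is wrapped in a capturing group, re.split keeps the separators in the resulting list. The sample strings below are just for illustration:

```python
import re

SEPARATORS = r'([\s\-"({\[<]+)'

chunks = re.split(SEPARATORS, "keyword (named) arguments")
print(chunks)  # ['keyword', ' (', 'named)', ' ', 'arguments']

# A string that starts with a separator yields an empty first chunk
quoted_chunks = re.split(SEPARATORS, '"quoted" text')
print(quoted_chunks)  # ['', '"', 'quoted', '" ', 'text']
```

That empty chunk is why indexing with chunk[0] can fail on some inputs, while slicing with chunk[:1] is safe.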
The Jinja2 templating framework has a title filter that uses a regular expression that's very similar to the above regular expressions.
The Django web framework takes a different approach for its title template filter.
Django first uses the string title method and then uses a regular expression to clean up the results, resulting in a title function similar to this:
import re
def title(value):
    titled = value.title()
    titled = re.sub(r"([a-z])'([A-Z])", lowercase_match, titled)  # Fix Don'T
    titled = re.sub(r"\d([A-Z])", lowercase_match, titled)  # Fix 1St and 2Nd
    return titled

def lowercase_match(match):
    """Lowercase the whole regular expression match group."""
    return match.group().lower()
This lowercases letters after apostrophes as well as letters after digits.
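Walking through the clean-up one substitution at a time shows what each regular expression is responsible for. This is a step-by-step sketch of the function above using a made-up input string, not Django's actual source code:

```python
import re


def lowercase_match(match):
    """Lowercase the whole regular expression match."""
    return match.group().lower()


titled = "it's the 2nd".title()
print(titled)  # It'S The 2Nd

# Step 1: lowercase a capital letter that follows letter + apostrophe
after_apostrophes = re.sub(r"([a-z])'([A-Z])", lowercase_match, titled)
print(after_apostrophes)  # It's The 2Nd

# Step 2: lowercase a capital letter that directly follows a digit
after_digits = re.sub(r"\d([A-Z])", lowercase_match, after_apostrophes)
print(after_digits)  # It's The 2nd
```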
In many cases, these two approaches work nearly identically:
>>> title("Title-case a string in Python")
'Title-Case A String In Python'
>>> title("Python: it's pretty great")
"Python: It's Pretty Great"
>>> title("keyword (named) arguments")
'Keyword (Named) Arguments'
>>> title("Python is the 2nd best language for everything")
'Python Is The 2nd Best Language For Everything'
And both work fairly well... unless we want to lowercase articles and prepositions.
How is it that title-casing functions in popular Python libraries don't follow common title-casing conventions? Well, it's a hard problem.
These libraries don't assume that all text is written in English. But even for English text, there isn't just one title-casing convention.
Here's a phrase for which four popular style guides recommend four different capitalizations:
Patterns over Anti-patterns
Patterns over Anti-Patterns
Patterns Over Anti-patterns
Patterns Over Anti-Patterns
How you title-case your strings really depends on the style guide you choose.
And no style guide has easy-to-code title-casing conventions because there are always edge cases like pytest, O'Reilly, and iPhone.
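Running those three names through the built-in tools shows how easily rule-based title-casing breaks on names with their own capitalization. This is only a quick demonstration using the standard library:

```python
import string

names = ["pytest", "O'Reilly", "iPhone"]

# Map each name to its (str.title, string.capwords) results
results = {name: (name.title(), string.capwords(name)) for name in names}

for name, (titled, capworded) in results.items():
    print(f"{name}: title={titled}, capwords={capworded}")
# pytest: title=Pytest, capwords=Pytest
# O'Reilly: title=O'Reilly, capwords=O'reilly
# iPhone: title=Iphone, capwords=Iphone
```

The title method only gets O'Reilly right because it capitalizes every letter after an apostrophe, which is the same behavior that produces Don'T.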
You could try to write a fairly complex regular expression to ensure particular words aren't title-cased.
import re
DO_NOT_TITLE = [
    "a", "an", "and", "as", "at", "but", "by", "en", "for", "from", "if",
    "in", "nor", "of", "on", "or", "per", "the", "to", "v", "vs", "via", "with"
]

def title_word(chunk):
    """Title-case a given word (or do nothing to a non-word chunk)."""
    if chunk.lower() in DO_NOT_TITLE:
        return chunk.lower()
    # Slicing with [:1] avoids an IndexError on empty chunks
    return chunk[:1].upper() + chunk[1:].lower()

def title(value):
    words_and_nonword_chunks = re.split(r"""
        (                # capture both words and the chunks between words
            (?:          # split on consecutive:
                \s     | # - whitespace characters
                -      | # - dashes
                "      | # - double quotes
                [\[({<]  # - opening brackets and braces
            )+
        )
    """, value, flags=re.VERBOSE)
    return "".join(
        # upper/lower-casing symbols and whitespace does nothing
        title_word(chunk)
        for chunk in words_and_nonword_chunks
    )
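One limitation of the function above is that it lowercases a small word even when it's the very first word of the string, while style guides generally capitalize the first word of a title. Here's one possible way to handle that. The names SMALL_WORDS and title_keep_first are hypothetical, and the shortened word list is just an assumption for the sake of the example:

```python
import re

# Hypothetical, shortened list of words to leave lowercase (an assumption)
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}
SEPARATORS = re.compile(r'([\s\-"({\[<]+)')


def title_keep_first(value):
    """Title-case value, lowercasing small words except for the first word."""
    titled = []
    seen_word = False
    for chunk in SEPARATORS.split(value):
        # A chunk is a word if it's non-empty and isn't made up of separators
        is_word = bool(chunk) and SEPARATORS.fullmatch(chunk) is None
        if is_word and seen_word and chunk.lower() in SMALL_WORDS:
            titled.append(chunk.lower())
        else:
            # Slicing leaves empty chunks and separators unchanged
            titled.append(chunk[:1].upper() + chunk[1:].lower())
        seen_word = seen_word or is_word
    return "".join(titled)


print(title_keep_first("the end of an era"))  # The End of an Era
```

This still doesn't handle the last word of a title or words after a colon, which some style guides also treat specially.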
Or you could use a library, like titlecase, which follows the New York Times Style Manual's title-casing recommendations.
You could even try to get really fancy and use natural language processing with a library like spaCy to invent your own MLA/APA/Chicago-style title-casing tool, but I don't recommend it. NLP libraries can be slow and often require a bit of setup work.
The title method in Python
Regardless of whether you care which words are capitalized and which aren't, you almost certainly don't want to capitalize every letter that follows an apostrophe.
When you need to title-case a string, don't use the string title method.
I recommend using either:
- The string.capwords function (at least it's better than title in most cases)
- A regular expression (either on its own or to clean up the output of the title method)
It's unfortunate, but if you want to title-case correctly you either need to do it manually or use some fairly complex logic.
For more on Python's string methods, see Python string methods to know.