Commatizing numbers in Python
Platform: Any
Language: Python
Requirements: None
Earlier today, I stumbled across a small programming challenge on rosettacode.org that required a solution in Python. I was intrigued enough to give it a shot, and below you will find my approach to “commatizing” numbers in a string.
The challenge is simple: take a string, check whether it contains numbers, and format those numbers by clustering the digits and inserting separators such as commas, periods, or blanks to make them more readable.
In essence, the string
pi=3.14159265358979323846264338327950288419716939937510582097494459231
will be turned into something like
pi=3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59231
The approach needs to be flexible so that clusters can have a variable length and the separator can be defined.
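For comparison, Python's own format specification mini-language can already group digits. However, it only uses clusters of three and only accepts a comma or an underscore as the separator. It also works on numeric values, not on numbers embedded in arbitrary text. That is why a custom function is needed here. The following is a quick illustration using the built-in format() and an f-string:

```python
# Built-in digit grouping: fixed clusters of three, ',' or '_' only,
# applied to numeric values rather than to numbers inside a string.
print(format(57268900, ","))   # 57,268,900
print(format(57268900, "_"))   # 57_268_900
print(f"{1411.8:,}")           # 1,411.8
```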
    import re as RegEx

    def Commatize( _string, _startPos=0, _clusterLen=3, _separator="," ):
        outString = ""
        strPos = 0  # index of the next character of _string still to be copied
        # findall() yields one match per run of digits, one empty match per
        # non-digit character and a final empty match at the end of the string.
        # This relies on the empty-match behavior of Python 3.7 and later.
        matches = RegEx.findall( "[0-9]*", _string )
        for match in matches[:-1]:
            if not match:
                # Empty match: copy the non-digit character unchanged
                outString += _string[ strPos ]
                strPos += 1
            else:
                if len( match ) > _clusterLen:
                    # Leave the first _startPos digits alone, then cut the rest
                    # into clusters of _clusterLen digits, counted from the left
                    leadIn = match[ :_startPos ]
                    clusters = [ match[ i:i + _clusterLen ] for i in range( _startPos, len( match ), _clusterLen ) ]
                    outString += leadIn + _separator.join( clusters )
                else:
                    # Runs no longer than one cluster are copied verbatim
                    outString += match
                strPos += len( match )
        return outString

    print( Commatize( "pi=3.14159265358979323846264338327950288419716939937510582097494459231", 0, 5, " " ) )
    print( Commatize( "The author has two Z$100000000000000 Zimbabwe notes (100 trillion).", 0, 3, "." ) )
    print( Commatize( "\"-in Aus$+1411.8millions\"" ) )
    print( Commatize( "===US$0017440 millions=== (in 2000 dollars)" ) )
    print( Commatize( "123.e8000 is pretty big." ) )
    print( Commatize( "The land area of the earth is 57268900(29% of the surface) square miles." ) )
    print( Commatize( "Ain't no numbers in this here words, nohow, no way, Jose." ) )
    print( Commatize( "James was never known as 0000000007" ) )
    print( Commatize( "Arthur Eddington wrote: I believe there are 15747724136275002577605653961181555468044717914527116709366231425076185631031296 protons in the universe." ) )
    print( Commatize( "␢␢␢$-140000±100 millions." ) )
    print( Commatize( "6/9/1946 was a good year for some.", 0, 4 ) )
The code works in a straightforward way. I use a regular expression to locate runs of digits in the string and then iterate through the resulting matches to build a new output string. I skip the last match because it is the empty match that findall() returns for the end of the string.
for match in matches[:-1]:
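To see what that list of matches actually looks like, here is findall() applied to a short test string. This assumes Python 3.7 or later, where findall() reports one empty match for each non-digit character (including one directly after a digit run) plus one for the end of the string.

```python
import re

# Python 3.7+: one empty match per non-digit character,
# one match per digit run, and a final empty match at the end.
matches = re.findall("[0-9]*", "a12b")
print(matches)       # ['', '12', '', '']
print(matches[:-1])  # ['', '12', ''] -> copy 'a', keep '12', copy 'b'
```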
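Whenever the pattern hits a non-digit character, findall() produces an empty match for it. So as I iterate through the matches, I check whether a match is empty and, if it is, simply copy the corresponding character from the original string. This one-empty-match-per-character behavior is how findall() works in Python 3.7 and later. Older versions do not report an empty match directly after a digit run, so the character following each number would be lost.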
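A different technique, not the one used in this article, avoids the position bookkeeping altogether. re.sub() accepts a function as its replacement and calls it once for every match. With the pattern "[0-9]+", only non-empty digit runs match, so the result does not depend on the empty-match rules at all. The sketch below is a hypothetical variant: the function name commatize_sub and its parameter names are illustrative only. Its clustering logic mirrors Commatize(), including the left-to-right cluster counting.

```python
import re

def commatize_sub(text, start_pos=0, cluster_len=3, separator=","):
    """Hypothetical re.sub()-based variant of Commatize() with the same clustering rules."""

    def cluster(match):
        digits = match.group(0)
        if len(digits) <= cluster_len:
            return digits  # short runs are returned unchanged
        lead_in = digits[:start_pos]
        clusters = [digits[i:i + cluster_len]
                    for i in range(start_pos, len(digits), cluster_len)]
        return lead_in + separator.join(clusters)

    # "[0-9]+" never matches the empty string, so re.sub() only calls
    # cluster() for actual digit runs and copies everything else as-is.
    return re.sub("[0-9]+", cluster, text)


print(commatize_sub("pi=3.14159265358979", 0, 5, " "))  # pi=3.14159 26535 8979
print(commatize_sub("6/9/1946 was a good year for some.", 0, 4))  # unchanged
```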
If a digit match is found, I check the length of the run. If it is no longer than the desired cluster length, it is copied verbatim. Longer runs are split into a list of clusters of the desired length, counted from the left, so the last cluster may be shorter (with the default settings, 57268900 becomes 572,689,00). The optional _startPos argument leaves that many leading digits of each long run out of the clustering. The clusters are then joined back together with the separator. I do this with an often-overlooked little Python trick: calling the join() method on the separator string and passing the list of clusters as its argument.
_separator.join( clusters )
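The slicing and joining steps can be tried in isolation. The sketch below uses two hypothetical helper functions. clusters_left() reproduces the left-to-right counting of Commatize(). clusters_right() is an alternative, not part of the original code, that counts from the right, which is how thousands separators are normally placed in the integer part of a number.

```python
def clusters_left(digits, size):
    """Split a digit run into clusters counted from the left, as Commatize() does."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def clusters_right(digits, size):
    """Hypothetical alternative: count clusters from the right instead."""
    head = len(digits) % size or size  # length of the first, possibly shorter, cluster
    return [digits[:head]] + [digits[i:i + size] for i in range(head, len(digits), size)]


# str.join() is called on the separator and receives the list of clusters
print(",".join(clusters_left("57268900", 3)))    # 572,689,00
print(",".join(clusters_right("57268900", 3)))   # 57,268,900
print(" ".join(clusters_left("1415926535", 5)))  # 14159 26535
```

Swapping clusters_right() into Commatize() would turn 1411 into 1,411 rather than 141,1. However, digits after a decimal point, like those of pi in the example earlier, are usually grouped starting from the decimal point. A complete solution would therefore need to tell the two cases apart.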
The print() calls at the end simply illustrate how the function is used and what it produces.
I hope this is something you can use for yourself some time.
Hang loose!