Commatizing numbers in Python
Earlier today, I stumbled across a small programming challenge on rosettacode.org that called for a solution in Python. I was intrigued enough to give it a shot, and below you will find my approach to “commatizing” numbers in a string.
The challenge is simple: take a string, check whether it contains numbers, and then make those numbers more readable by clustering their digits and inserting separators such as commas, periods, or blanks.
In essence, the string
pi=3.14159265358979323846264338327950288419716939937510582097494459231
will be turned into something like
pi=3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59231
The approach needs to be flexible, so that both the cluster length and the separator can be chosen by the caller.
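For comparison, Python's format specification mini-language can already group the digits of a plain numeric value: the "," and "_" options insert a separator every three digits of the integer part. For decimal output, though, it always uses clusters of three, only offers a comma or an underscore, and works on numbers rather than on digits embedded in a larger string, which is why a custom function is useful here. A quick illustration (the variable names are just for this example):

```python
# Built-in digit grouping via format specifications (numeric values only)
n = 57268900

with_commas = format(n, ",")          # '57,268,900'
with_underscores = format(n, "_")     # '57_268_900'
float_commas = format(1411.8, ",")    # '1,411.8'

# Any other separator needs a post-processing step, e.g. periods
with_periods = format(n, ",").replace(",", ".")  # '57.268.900'

print(with_commas, with_underscores, float_commas, with_periods)
```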
import re as RegEx

def Commatize( _string, _startPos=0, _clusterLen=3, _separator="," ):
    outString = ""
    strPos = 0  # index of the next unprocessed character in _string
    # One non-empty match per run of digits, one empty match per non-digit
    # character, plus a final empty match at the end of the string
    matches = RegEx.findall( "[0-9]*", _string )
    for match in matches[:-1]:
        if not match:
            # Empty match: copy the non-digit character verbatim
            outString += _string[ strPos ]
            strPos += 1
        else:
            if len(match) > _clusterLen:
                # Keep the first _startPos digits as they are, then split the rest into clusters
                leadIn = match[:_startPos]
                clusters = [ match[ i:i + _clusterLen ] for i in range( _startPos, len( match ), _clusterLen ) ]
                outString += leadIn + _separator.join( clusters )
            else:
                # Short runs are copied unchanged
                outString += match
            strPos += len( match )
    return outString

print ( Commatize( "pi=3.14159265358979323846264338327950288419716939937510582097494459231", 0, 5, " " ) )
print ( Commatize( "The author has two Z$100000000000000 Zimbabwe notes (100 trillion).", 0, 3, "." ) )
print ( Commatize( "\"-in Aus$+1411.8millions\"" ) )
print ( Commatize( "===US$0017440 millions=== (in 2000 dollars)" ) )
print ( Commatize( "123.e8000 is pretty big." ) )
print ( Commatize( "The land area of the earth is 57268900(29% of the surface) square miles." ) )
print ( Commatize( "Ain't no numbers in this here words, nohow, no way, Jose." ) )
print ( Commatize( "James was never known as 0000000007" ) )
print ( Commatize( "Arthur Eddington wrote: I believe there are 15747724136275002577605653961181555468044717914527116709366231425076185631031296 protons in the universe." ) )
print ( Commatize( "␢␢␢$-140000±100 millions." ) )
print ( Commatize( "6/9/1946 was a good year for some.", 0, 4 ) )
The code works in a very simple way. I use a regular expression to locate runs of digits in the string and then iterate through the resulting matches to build a new output string. I skip the last match because it is the empty match that findall() always reports at the end of the string.
for match in matches[:-1]:
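To see what this loop actually iterates over, here is the raw findall() result for a short string. Each non-digit character yields an empty match, each run of digits yields one non-empty match, and the trailing empty string is the end-of-string match that the slice [:-1] drops.

```python
import re

# Same pattern as in Commatize(): zero or more ASCII digits
matches = re.findall("[0-9]*", "pi=3.14")
print(matches)       # ['', '', '', '3', '', '14', '']
print(matches[:-1])  # ['', '', '', '3', '', '14']
```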
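Whenever RegEx encounters a non-digit character, it produces an empty match. So, as I iterate through the matches, I check whether a match is empty and, if it is, simply copy the corresponding character from the original string. The variable strPos keeps track of my current position in that original string.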
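The position bookkeeping can be checked in isolation. The sketch below uses a hypothetical helper named rebuild() that walks the matches the same way Commatize() does but never clusters anything, so its output should always equal its input.

```python
import re

def rebuild(text):
    """Hypothetical helper: walk the findall() matches like Commatize() does,
    but copy every digit run unchanged, so the result must equal the input."""
    out = ""
    pos = 0  # index of the next unprocessed character in text
    for match in re.findall("[0-9]*", text)[:-1]:
        if not match:
            out += text[pos]  # an empty match stands for one non-digit character
            pos += 1
        else:
            out += match      # a digit run is copied as-is
            pos += len(match)
    return out

print(rebuild("6/9/1946 was a good year"))
```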
If a non-empty digit match is found, I check the length of the run. If it is no longer than the desired cluster length, it is copied verbatim, while longer runs are split into a list of clusters of the desired length. These are then joined back together using the separator. I do this with an often-overlooked little Python trick: calling the join() method on the separator string and passing the list of clusters as its argument.
_separator.join( clusters )
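Pulled out of the loop, the clustering step for a single run of digits looks like this. The function name cluster_run() is hypothetical; its body mirrors the list comprehension and the join() call inside Commatize(), including the way the first _startPos digits are kept as an unseparated lead-in.

```python
def cluster_run(run, start_pos=0, cluster_len=3, separator=","):
    """Hypothetical helper mirroring the clustering step inside Commatize()."""
    if len(run) <= cluster_len:
        return run  # short runs are copied verbatim
    lead_in = run[:start_pos]  # leading digits left untouched
    clusters = [run[i:i + cluster_len] for i in range(start_pos, len(run), cluster_len)]
    # join() is called on the separator, with the list of clusters as its argument
    return lead_in + separator.join(clusters)

print(cluster_run("14159265358979", 0, 5, " "))  # 14159 26535 8979
print(cluster_run("140000"))                      # 140,000
```

Note that with a non-zero start position the lead-in is attached directly to the first cluster, so cluster_run("1234567", 1) returns "1234,567".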
The print statements at the end simply illustrate how the function is used and what it returns.
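Running the examples reveals one behavior worth knowing about: every run is clustered from the left. That is exactly right for the fractional digits of pi, but it turns the integer 57268900 into 572,689,00 and 1411 into 141,1. One possible variant is sketched below. It swaps the manual loop for re.sub() with a replacement function, groups integer digits from the right, and keeps left-to-right grouping only for digits directly following a decimal point. The name commatize_grouped() is hypothetical, the "preceded by a period" test is a simplifying assumption, and the sketch ignores any further rules the challenge may impose (leading zeros, exponents, and so on).

```python
import re

def commatize_grouped(text, cluster_len=3, separator=","):
    """Hypothetical variant: integer digits are grouped from the right,
    digits directly after a '.' (fractional part) from the left."""
    def group(m):
        run = m.group(0)
        if len(run) <= cluster_len:
            return run
        if m.start() > 0 and text[m.start() - 1] == ".":
            # Fractional part: clusters start at the decimal point
            parts = [run[i:i + cluster_len] for i in range(0, len(run), cluster_len)]
        else:
            # Integer part: the first cluster takes the leftover digits
            head = len(run) % cluster_len or cluster_len
            parts = [run[:head]] + [run[i:i + cluster_len]
                                    for i in range(head, len(run), cluster_len)]
        return separator.join(parts)

    # re.sub() calls group() once for every run of one or more digits
    return re.sub("[0-9]+", group, text)

print(commatize_grouped("The land area of the earth is 57268900(29% of the surface) square miles."))
print(commatize_grouped("pi=3.14159265358979", 5, " "))
```

Using re.sub() also removes the need to track strPos by hand, because each match object already knows its position in the original string.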
I hope this is something you can use for yourself some time.
Hang loose!
data:image/s3,"s3://crabby-images/fdf27/fdf27755c87283e04e60044aa9311eb72c95c8e2" alt="Share on Facebook Facebook"
data:image/s3,"s3://crabby-images/3e635/3e635b63bddc3c3a763e596b5938e3b112f91dbd" alt="Share on Twitter twitter"
data:image/s3,"s3://crabby-images/ad59e/ad59ebe6a73492c08719598b3ff7a9db895b7895" alt="Share on Reddit reddit"
data:image/s3,"s3://crabby-images/8631a/8631ac6d95cfb4c0fe7e5d2cf171633452b63200" alt="Pin it with Pinterest pinterest"
data:image/s3,"s3://crabby-images/07a7f/07a7fbbbb9f1a457a057ea30b51f1c7ede622afe" alt="Share on Linkedin linkedin"
data:image/s3,"s3://crabby-images/2e7b4/2e7b41ec01d711f5d5ed6a5a334dbd02278fcd1f" alt="Share by email mail"