Commatizing numbers in Python


Platform: Any
Language: Python
Requirements: None

Earlier today, I stumbled across a small programming challenge on rosettacode.org that required a solution in Python. I was intrigued enough to give it a shot, and below you will find my approach to “commatizing” numbers in a string.

The challenge is simple: take a string, see if it contains numbers, and then format them in a specific way by clustering the digits and inserting separators such as commas, periods, or blanks to make them more readable.

In essence, the string
pi=3.14159265358979323846264338327950288419716939937510582097494459231
will be turned into something like
pi=3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59231

The approach needs to be flexible, so both the cluster length and the separator can be chosen by the caller.

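import re as RegEx

def Commatize( _string, _startPos=0, _clusterLen=3, _separator="," ):
  outString = ""
  strPos = 0
  # "[0-9]*" yields one match per run of digits and an empty match per non-digit
  matches = RegEx.findall( "[0-9]*", _string )

  # The final match is the empty one at the end of the string, so skip it
  for match in matches[:-1]:
    if not match:
      # Empty match: copy the non-digit character unchanged
      outString += _string[ strPos ]
      strPos += 1
    else:
      if len(match) > _clusterLen:
        # The first _startPos digits form an unseparated lead-in; the rest is split into clusters
        leadIn = match[:_startPos]
        clusters = [ match[ i:i + _clusterLen ] for i in range( _startPos, len( match ), _clusterLen ) ]
        outString += leadIn + _separator.join( clusters )
      else:
        # Runs no longer than one cluster are copied unchanged
        outString += match

      strPos += len( match )

  return outString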

print ( Commatize( "pi=3.14159265358979323846264338327950288419716939937510582097494459231", 0, 5, " " ) )
print ( Commatize( "The author has two Z$100000000000000 Zimbabwe notes (100 trillion).", 0, 3, "." ))
print ( Commatize( "\"-in Aus$+1411.8millions\"" ))
print ( Commatize( "===US$0017440 millions=== (in 2000 dollars)" ))
print ( Commatize( "123.e8000 is pretty big." ))
print ( Commatize( "The land area of the earth is 57268900(29% of the surface) square miles." ))
print ( Commatize( "Ain't no numbers in this here words, nohow, no way, Jose." ))
print ( Commatize( "James was never known as 0000000007" ))
print ( Commatize( "Arthur Eddington wrote: I believe there are 15747724136275002577605653961181555468044717914527116709366231425076185631031296 protons in the universe." ))
print ( Commatize( "␢␢␢$-140000±100 millions." ))
print ( Commatize( "6/9/1946 was a good year for some.", 0, 4 ))

The workings of the code are very simple. I use a regular expression to locate runs of digits in the string and then iterate through the resulting matches to build a new output string. I skip the last match because it is the empty match that the pattern produces at the very end of the string.

for match in matches[:-1]:

Because the pattern [0-9]* also matches zero digits, RegEx creates an empty match whenever it encounters a non-digit. So as I iterate through the matches, I check whether a match is empty and, if it is, simply copy the corresponding character from the original string.
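To see those empty matches for yourself, here is a tiny sketch. The sample string is made up purely for illustration and is not part of the challenge.

```python
import re

# "[0-9]*" also matches the empty string, so every non-digit position
# produces an empty match, and one more appears at the end of the string.
sample = "a12b"  # made-up sample string
matches = re.findall("[0-9]*", sample)
print(matches)  # ['', '12', '', '']
```

The trailing empty string is the match that the loop drops with matches[:-1].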

If a valid digit match is found, I check the length of the run. If it is no longer than the desired cluster length, it is copied verbatim, while longer runs are sliced into a list of clusters of the desired length. They are then joined back together, using the separator. I am doing this with an often-overlooked little Python trick: calling the join() method on the separator string and passing the list of clusters as its argument.

_separator.join( clusters )
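Here is the slicing and joining step on its own, as a minimal sketch. The digit run and the variable names are made up for illustration and simply mirror the parameters of the function.

```python
digits = "14159265358979"  # made-up digit run
cluster_len = 5            # stands in for _clusterLen
start_pos = 0              # stands in for _startPos

# Slice the run into chunks of cluster_len, starting at start_pos.
clusters = [digits[i:i + cluster_len] for i in range(start_pos, len(digits), cluster_len)]
print(clusters)  # ['14159', '26535', '8979']

# join() is called on the separator and receives the list of clusters.
print(" ".join(clusters))  # 14159 26535 8979
```

The last cluster is simply whatever is left over, so it can be shorter than the others.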

The print calls at the end simply illustrate how the function is used and what it returns.
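If you prefer not to track the string position by hand, the same idea can be sketched with re.sub and a callback. This is a different technique from the function above, not a replacement for it, and the names commatize_sub and group are hypothetical ones chosen for this example.

```python
import re

def commatize_sub(text, start_pos=0, cluster_len=3, separator=","):
    """Hypothetical variant of Commatize built on re.sub."""
    def group(match):
        digits = match.group(0)
        # Runs no longer than one cluster are returned unchanged.
        if len(digits) <= cluster_len:
            return digits
        # The first start_pos digits stay glued together as a lead-in.
        lead_in = digits[:start_pos]
        clusters = [digits[i:i + cluster_len]
                    for i in range(start_pos, len(digits), cluster_len)]
        return lead_in + separator.join(clusters)

    # "[0-9]+" only matches real digit runs; everything else is left untouched.
    return re.sub("[0-9]+", group, text)

print(commatize_sub("James was never known as 0000000007"))
# James was never known as 000,000,000,7
```

Like the function above, this sketch clusters every digit run from left to right, so 57268900 comes out as 572,689,00.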

I hope this is something you can use for yourself some time.

Hang loose!
