Idle Time

So you need to get a count of all asterisks in a string

2007-04-16 23:46:04 -08:00

You have a string, which I’ll call sample, and you need to count the number of asterisks in it. For comparison purposes:

>>> import re, time
>>> sample = ' *' * 5000000  # Just imagine that you got this from somewhere else

What do you do?

Solution A: filter

(ifilter won’t work here, because it returns an iterator, and len() doesn’t work on an iterator.)

>>> start = time.time(); len(filter(lambda ch: ch == '*', sample)); end = time.time()
5000000 #The correct result
>>> end - start
2.2621231079101562 #seconds
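The session above is Python 2, where filter applied to a string returns a string. In Python 3, filter always returns a lazy iterator, much like Python 2’s itertools.ifilter, so calling len() on it raises TypeError. The following is a minimal Python 3 sketch, assuming you still want the filter approach. It joins the matches back into a string before counting them:

```python
# Python 3 sketch; the post's own session is Python 2.
sample = ' *' * 10  # small stand-in for the post's 5,000,000-pair string

lazy = filter(lambda ch: ch == '*', sample)
try:
    len(lazy)  # filter objects have no len() in Python 3
except TypeError:
    print('len() does not work on a filter object')

# Joining the surviving characters gives a str again, which len() accepts.
stars = ''.join(filter(lambda ch: ch == '*', sample))
print(len(stars))  # 10
```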

Solution B: List comprehension

(A generator expression won’t work here, because len() doesn’t work on a generator.)

>>> start = time.time(); len([ch for ch in sample if ch == '*']); end = time.time()
5000000
>>> end - start
2.005012035369873
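len() can’t count a generator, but sum() can consume one item at a time without building the intermediate list. The following is a small sketch of that alternative, written for Python 3. It makes no claim about speed relative to the list comprehension:

```python
sample = ' *' * 10  # small stand-in for the post's string

# sum() consumes the generator one item at a time; no list of matches is built.
count = sum(1 for ch in sample if ch == '*')
print(count)  # 10

# Same idea: True counts as 1 and False as 0 when summed.
print(sum(ch == '*' for ch in sample))  # 10
```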

OK, so it looks like I’ll be going with list com—WAIT! What’s that!? It’s a regular expression!

Solution C: re.findall

>>> start = time.time(); len(re.findall(r'\*', sample)); end = time.time()
5000000
>>> end - start
0.40664911270141602

…Wow.
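The pattern is r'\*' rather than '*' because * is a regex quantifier, so re.findall('*', sample) raises re.error. To count an arbitrary literal substring this way, re.escape can build the escaped pattern. The following is a small Python 3 sketch. The helper name count_literal_regex is hypothetical:

```python
import re

def count_literal_regex(text, literal):
    """Count non-overlapping occurrences of `literal` in `text` with re.findall."""
    pattern = re.escape(literal)  # e.g. '*' becomes r'\*', so it matches a literal asterisk
    return len(re.findall(pattern, text))

print(count_literal_regex(' *' * 5, '*'))  # 5
print(count_literal_regex('a.b.c', '.'))  # 2
```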

Incidentally, I didn’t find a statistically significant speed-up from running re.compile on the expression first. Apparently, this expression isn’t complex enough for that to help.
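Compiling r'\*' is a one-time cost that is tiny compared with scanning ten million characters. The re module also caches recently compiled patterns, so repeated re.findall calls with the same pattern string don’t recompile it. The following sketch compares the two forms with timeit. It assumes Python 3.6+ and uses a smaller sample so it runs quickly. It prints timings for your own machine rather than reproducing the post’s:

```python
import re
import timeit

sample = ' *' * 500000  # a tenth of the post's size, so the comparison runs quickly
compiled = re.compile(r'\*')

def module_findall():
    # re.findall compiles the pattern (or fetches it from re's internal cache)
    return len(re.findall(r'\*', sample))

def compiled_findall():
    # The pattern object was compiled once, above
    return len(compiled.findall(sample))

assert module_findall() == compiled_findall() == 500000

for fn in (module_findall, compiled_findall):
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f'{fn.__name__}: {best:.4f} s')
```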

[Added 2007-04-19] So I guess that’s it then. re.findall is the winn—

What’s this? New comment from Chuck…

Solution D: str.count

>>> start = time.time(); sample.count('*'); end = time.time()
5000000
>>> end - start
0.038351058959960938

A factor-of-ten improvement! Wow—thanks, Chuck!
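To rerun the whole comparison on a current interpreter, you could use the timeit harness below, which covers all four solutions. It is a hedged sketch with these assumptions:

- It requires Python 3.6+ (f-strings).
- It rewrites Solution A as ''.join(filter(...)) because Python 3’s filter is lazy.
- It uses a smaller string than the post.

Absolute numbers will differ from the 2007 results above.

```python
import re
import timeit

sample = ' *' * 1000000  # a fifth of the post's size
expected = 1000000

solutions = {
    'A: filter': lambda: len(''.join(filter(lambda ch: ch == '*', sample))),
    'B: list comprehension': lambda: len([ch for ch in sample if ch == '*']),
    'C: re.findall': lambda: len(re.findall(r'\*', sample)),
    'D: str.count': lambda: sample.count('*'),
}

for name, fn in solutions.items():
    assert fn() == expected, name  # check correctness before timing
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f'{name:<22} {best:.4f} s')
```

A plausible explanation for str.count’s lead is that it scans the string in C without creating a Python object per match. Solutions A through C all build intermediate strings or lists. The harness measures the difference but doesn’t prove the cause.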

Categories: Programming; Python. | Comments: 4.

4 Responses to “So you need to get a count of all asterisks in a string”

Chris Ryland Says:
April 17th, 2007 at 04:26:54
Peter, I think re.compile is always called before actual execution, so precompiling only helps if you’re re-using the pattern heavily. In your case, you’re only using it once, so it doesn’t matter.
Peter Hosey Says:
April 17th, 2007 at 12:50:06
Good point.
Chuck Says:
April 19th, 2007 at 21:42:28
Faster still is sample.count("*"). By a lot. The regex takes about three seconds on my PowerBook, while count() takes half a second.
Peter Hosey Says:
April 19th, 2007 at 21:53:41
Chuck: Wow! I didn’t know about str.count—it beat re.findall by a factor of ten!

Thanks a million! *goes to edit the post accordingly*