So you need to get a count of all asterisks in a string
You have a string, which I’ll call sample, and you need to count the number of asterisks in it. For comparison purposes:
>>> import re, time #Needed by the timings below
>>> sample = ' *' * 5000000 #Just imagine that you got this from somewhere else
What do you do?
Solution A: filter
(ifilter won’t work here, because it returns an iterator, and an iterator has no len().)
>>> start = time.time(); len(filter(lambda ch: ch == '*', sample)); end = time.time()
5000000 #The correct result
>>> end - start
2.2621231079101562 #seconds
Solution B: List comprehension
(Generator expressions won’t work here, because a generator has no len().)
>>> start = time.time(); len([ch for ch in sample if ch == '*']); end = time.time()
5000000
>>> end - start
2.005012035369873
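For what it's worth, a generator expression can still be counted if you sum over it instead of calling len() on it. The sketch below shows the idea; the function name count_with_generator and the small stand-in string are my own, not part of the timings above, and I haven't measured it against the 5,000,000-asterisk sample.

```python
def count_with_generator(text, target="*"):
    """Count occurrences of one character without building a list."""
    # sum() consumes the generator one item at a time, adding 1 per match.
    return sum(1 for ch in text if ch == target)


# A much smaller stand-in for the sample string (assumption, for a quick run).
small_sample = " *" * 1000
generator_count = count_with_generator(small_sample)
print(generator_count)
```

This avoids holding a list of every matching character in memory, though it still visits each character in Python code, so there's no reason to expect it to beat the list comprehension by much.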
OK, so it looks like I’ll be going with list com—WAIT! What’s that!? It’s a regular expression!
Solution C: re.findall
>>> start = time.time(); len(re.findall(r'\*', sample)); end = time.time()
5000000
>>> end - start
0.40664911270141602
…Wow.
Incidentally, I didn’t find a statistically significant speed-up from running re.compile on the expression first. Apparently, this expression isn’t complex enough for that to help any.
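If you want to check the re.compile question yourself, here is a rough sketch using timeit. The names ASTERISK, count_uncompiled, and count_compiled are made up for illustration, and the string is a smaller stand-in for sample. One likely reason for the lack of difference is that the re module keeps a cache of recently compiled patterns, so the module-level re.findall doesn't recompile on every call.

```python
import re
import timeit

# Smaller stand-in for the sample string (assumption, to keep the run short).
small_sample = " *" * 100000

# Compiled once up front and reused by count_compiled().
ASTERISK = re.compile(r"\*")


def count_uncompiled(text):
    # Module-level function: re looks the pattern up in its internal cache.
    return len(re.findall(r"\*", text))


def count_compiled(text):
    # Method on the precompiled pattern object: no lookup needed.
    return len(ASTERISK.findall(text))


uncompiled_seconds = timeit.timeit(lambda: count_uncompiled(small_sample), number=5)
compiled_seconds = timeit.timeit(lambda: count_compiled(small_sample), number=5)
print("uncompiled: %.4f s, compiled: %.4f s" % (uncompiled_seconds, compiled_seconds))
```

With one long string and only a handful of calls, nearly all of the time goes into the scan itself, so the two numbers should come out close to each other.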
[Added 2007-04-19] So I guess that’s it then. re.findall is the winn—
What’s this? New comment from Chuck…
Solution D: str.count
>>> start = time.time(); sample.count('*'); end = time.time();
5000000
>>> end - start
0.038351058959960938
A factor-of-ten improvement! Wow—thanks, Chuck!
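To rerun the whole comparison in one go, something like the sketch below should do. It is an adaptation, not the code I timed above: it assumes Python 3 (where filter returns an iterator, hence the extra list() call), uses timeit instead of time.time(), and runs on a smaller string. The names approaches and results are just for this example.

```python
import re
import timeit

# Smaller than the post's sample, to keep the run short (assumption).
small_sample = " *" * 200000

# One callable per solution; each takes the string and returns the count.
approaches = {
    "filter": lambda s: len(list(filter(lambda ch: ch == "*", s))),
    "list comprehension": lambda s: len([ch for ch in s if ch == "*"]),
    "re.findall": lambda s: len(re.findall(r"\*", s)),
    "str.count": lambda s: s.count("*"),
}

results = {}
for name, func in approaches.items():
    # Take the best of three runs to reduce the effect of background noise.
    seconds = min(timeit.repeat(lambda: func(small_sample), number=1, repeat=3))
    results[name] = (func(small_sample), seconds)
    print("%-20s %d matches in %.4f s" % (name, results[name][0], seconds))
```

The absolute numbers will differ from the ones in this post, since the machine, the Python version, and the string length are all different; the ordering is the interesting part.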
April 17th, 2007 at 04:26:54
Peter, I think re.compile is always called before actual execution, so precompiling only helps if you’re re-using the pattern heavily. In your case, you’re only using it once, so it doesn’t matter.
April 17th, 2007 at 12:50:06
Good point.
April 19th, 2007 at 21:42:28
Faster still is sample.count("*"). By a lot. The regex takes about three seconds on my PowerBook, while count() takes half a second.
April 19th, 2007 at 21:53:41
Chuck: Wow! I didn’t know about str.count—it beat re.findall by a factor of ten!
Thanks a million! *goes to edit the post accordingly*