Generating all combinations from an arbitrary number of sequences
In Python 2.5:
import itertools def combinations(*seqs): def base_combinations(so_far, seq): for x in seq: yield so_far + [x] def nested_combinations(so_far, seqs): if len(seqs) == 1: return base_combinations(so_far, seqs[0]) else: iterators = (nested_combinations(so_far + [x], seqs[1:]) for x in seqs[0]) return itertools.chain(*iterators) return nested_combinations([], seqs)
Usage:
for x, y, z in combinations(xs, ys, zs): # Perform magic
If you have a better way, I’d like to read it.
(Thanks to Colin for inspiring me to write it, and renaming one of the nested functions.)
October 22nd, 2007 at 23:55:51
To be clear, the original idea was to replace this:
with something like this:
I’ll bet someone proficient in functional python would have no trouble writing something like this :)
October 23rd, 2007 at 00:51:43
or
Looks pretty straighforward to me :-). (I’m sure there’s an even easier one floating around somewhere…)
October 23rd, 2007 at 01:31:26
Matt: Yes, that works as long as you know how many sequences to generate combinations from. The problem is to create a generic solution that works with any number of sequences.
October 23rd, 2007 at 01:42:58
Furthermore, it doesn’t allow me to write a loop body.
The ultimate goal is less indentation for nested for loops.
October 23rd, 2007 at 05:41:11
@Colin: Yes, you can write a loop body:
As for the original problem, a typical functional solution would use recursion:
it assumes seqs of mininum length 1.
A generator version of the same:
October 23rd, 2007 at 06:00:32
Hmm, I was doing it wrong before.
The first syntax is incredibly verbose. x y and z are specified three times each. I like the generator version — something like this would be great in itertools, I think.
October 23rd, 2007 at 06:13:34
please note that Arnar’s recursive combination function is extremely slow if you need to make lots of combinations, and will rapidly blow the stack. Check out probstat if you need more, faster, combinations without much more hassle.
October 23rd, 2007 at 06:14:29
Let me make a little optimization:
I switched the for loops, calling
combinations(*seqs[1:])
is presumably much more expensive than accessing an element inseqs[0]
– so it belongs in the outer loop.October 23rd, 2007 at 06:40:35
True, it’s not exactly blindingly fast. However, the stack only grows with the length of seqs, not the length of each sequence. The number of sequences is likely to be fairly low in most circumstances.
Running
is no trouble. In fact, neither is:
The likely bottleneck is thus the loop body, i.e. what you are actually doing with the combinations.
Oh, and of course there’s a typo in my last post, the inner call should be to icombinations:
October 23rd, 2007 at 06:59:57
And, since I have nothing productive at all to add to this as this time, I’ll note (for Google, if nothing else) that the output of this method is what’s called a “Cartesian Product”.
October 23rd, 2007 at 07:22:11
Arnar,
Another very easy optimisation you can do is to generate tuples instead of lists.
Changing “
yield [i] + rest
” to “yield (i,) + rest
” and “yield [i]
” to “yield (i,)
” has a dramatic effect (halving the time for me), since most time is spent generating these single element leaves.October 23rd, 2007 at 07:22:17
Knuth calls it “generating all n-tuples” in pre-fascicle 2a.
October 23rd, 2007 at 07:45:23
Brian, that’s nice. I would have used tuples if I were to use it in my code – but left it as lists here to match the interface in the OP. Didn’t realize the speed difference though, but that certainly makes sense.
October 23rd, 2007 at 11:12:02
Canonical version?
October 23rd, 2007 at 15:48:37
Actually, the previous version goes through items in the wrong order:
>>> print \'\\n\'.join(repr(i) for i in icombinations(\"12\", \"ab\", \"yz\"))
(\'1\', \'a\', \'y\')
(\'2\', \'a\', \'y\')
(\'1\', \'b\', \'y\')
(\'2\', \'b\', \'y\')
(\'1\', \'a\', \'z\')
(\'2\', \'a\', \'z\')
(\'1\', \'b\', \'z\')
(\'2\', \'b\', \'z\')
What we actually want is this:
def icombinations(*seqs):
if len(seqs) == 1:
for i in seqs[0]: yield (i, )
else:
for rest in icombinations(*seqs[:-1]):
for i in seqs[-1]:
yield rest + (i, )
In order to get this output:
>>> print \'\\n\'.join(repr(i) for i in icombinations(\"12\", \"ab\", \"yz\"))
(\'1\', \'a\', \'y\')
(\'1\', \'a\', \'z\')
(\'1\', \'b\', \'y\')
(\'1\', \'b\', \'z\')
(\'2\', \'a\', \'y\')
(\'2\', \'a\', \'z\')
(\'2\', \'b\', \'y\')
(\'2\', \'b\', \'z\')
October 23rd, 2007 at 15:50:01
(hey, my previous comment got double-escaped since I ignored the CAPTCHA the first time. feel free to delete it, and this parenthetical remark.)
Actually, the previous version goes through items in the wrong order:
>>> print '\n'.join(repr(i) for i in icombinations("12", "ab", "yz"))
('1', 'a', 'y')
('2', 'a', 'y')
('1', 'b', 'y')
('2', 'b', 'y')
('1', 'a', 'z')
('2', 'a', 'z')
('1', 'b', 'z')
('2', 'b', 'z')
What we actually want is this:
def icombinations(*seqs):
if len(seqs) == 1:
for i in seqs[0]: yield (i, )
else:
for rest in icombinations(*seqs[:-1]):
for i in seqs[-1]:
yield rest + (i, )
In order to get this output:
>>> print '\n'.join(repr(i) for i in icombinations("12", "ab", "yz"))
('1', 'a', 'y')
('1', 'a', 'z')
('1', 'b', 'y')
('1', 'b', 'z')
('2', 'a', 'y')
('2', 'a', 'z')
('2', 'b', 'y')
('2', 'b', 'z')
October 23rd, 2007 at 16:54:24
(This is fun to mess around with.) The Lisp way to do it:
October 23rd, 2007 at 18:14:50
I needed something like this for work today. Here’s my version – no recursion. I’ve not profiled it.
October 23rd, 2007 at 20:20:05
Gary: Looks cool, but Colin complains that the order in which combinations are yielded does not match the original nested
for
loops.Any chance we could get a version with clearer variable names? ☺
October 23rd, 2007 at 21:38:34
I suspect I could add a reverse in there to make it match the for loops. To be honest, I didn’t use this for work – it’s far too difficult to read. It just occurred to me when I was thinking about the problem and I thought it was cute.
I finally ran some profiling. It seems to be about 16x slower than the recursive solution. I don’t think it’s worth updating the variable names :-).
Gary Godfrey
Austin, TX, USA
October 24th, 2007 at 00:39:26
Non-scientific tests:
I wouldn’t take these numbers seriously though. In some of my tests, icombinations beat ecombinations.
October 24th, 2007 at 01:45:10
It’s interesting to see some timings. We basically have two dimensions that can vary, the number of sequences and the lengths of the sequences. Don’t have time right now to do it myself, but I propose the following test cases:
etc.
Arnar
October 24th, 2007 at 16:06:03
On a slightly related note, here’s how to generate the powerset (all possible subsets) of some set:
(Problem suggested by Árni Már)
October 25th, 2007 at 09:37:18
from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496807
October 25th, 2007 at 10:24:24
This actually makes it work the way I want it to:
Adding
print r
inside the loop reveals how it works:October 25th, 2007 at 10:27:28
Colin: I wonder whether the list comprehension could be turned into a generator comprehension somehow, possibly relieving the need to call
reversed
…May 23rd, 2008 at 21:01:44
FYI if you’re coming from Google or whatever, this is in Python 2.6 and 3K as itertools.product (implemented in C).
And I tested the various implementations along with a couple others, and ecombinations was (surprisingly) the clear winner. It makes me want to eschew Python for Lisp… :-)