Generating all combinations from an arbitrary number of sequences

2007-10-22 23:50:40 UTC

In Python 2.5:

```import itertools

def combinations(*seqs):
def base_combinations(so_far, seq):
for x in seq:
yield so_far + [x]
def nested_combinations(so_far, seqs):
if len(seqs) == 1:
return base_combinations(so_far, seqs[0])
else:
iterators = (nested_combinations(so_far + [x], seqs[1:]) for x in seqs[0])
return itertools.chain(*iterators)
return nested_combinations([], seqs)```

Usage:

```for x, y, z in combinations(xs, ys, zs):
# Perform magic```

If you have a better way, I’d like to read it.

(Thanks to Colin for inspiring me to write it, and renaming one of the nested functions.)

27 Responses to “Generating all combinations from an arbitrary number of sequences”

1. Colin Barrett Says:

To be clear, the original idea was to replace this:

```for x in xs:
for y in ys:
for z in zs:
# do magic```

with something like this:

```for x, y, z in something(xs, ys, zs):
# do magic```

I’ll bet someone proficient in functional python would have no trouble writing something like this :)

2. Matt Says:
```>>> combs = [(x, y, z) for x in [1,2,3] for y in [4,5,6] for z in [7,8,9]]
>>> len(combs)
27```

or

```### note parens instead of braces -- it's a generator
>>> combs = ((x, y, z) for x in [1,2,3] for y in [4,5,6] for z in [7,8,9])
>>> combs.next()
(1, 4, 7)
>>> combs.next()
(1, 4, 8)
>>> combs.next()
(1, 4, 9)
...```

Looks pretty straighforward to me :-). (I’m sure there’s an even easier one floating around somewhere…)

3. Peter Hosey Says:

Matt: Yes, that works as long as you know how many sequences to generate combinations from. The problem is to create a generic solution that works with any number of sequences.

4. Colin Barrett Says:

Furthermore, it doesn’t allow me to write a loop body.

The ultimate goal is less indentation for nested for loops.

5. Arnar Birgisson Says:

@Colin: Yes, you can write a loop body:

```for (x,y,z) in ((x, y, z) for x in [1,2,3] for y in [4,5,6] for z in [7,8,9]):
do_your_thing_with(x,y,z)```

As for the original problem, a typical functional solution would use recursion:

```def combinations(*seqs):
if len(seqs) > 1:
return [[i] + r for i in seqs[0] for r in combinations(*seqs[1:])]
else:
return [[i] for i in seqs[0]]```

it assumes seqs of mininum length 1.

A generator version of the same:

```def icombinations(*seqs):
if len(seqs) > 1:
for i in seqs[0]:
for rest in combinations(*seqs[1:]):
yield [i] + rest
else:
for i in seqs[0]: yield [i]```
6. Colin Barrett Says:

Hmm, I was doing it wrong before.

The first syntax is incredibly verbose. x y and z are specified three times each. I like the generator version — something like this would be great in itertools, I think.

7. Bill Mill Says:

please note that Arnar’s recursive combination function is extremely slow if you need to make lots of combinations, and will rapidly blow the stack. Check out probstat if you need more, faster, combinations without much more hassle.

8. Arnar Birgisson Says:

Let me make a little optimization:

```def icombinations(*seqs):
if len(seqs) > 1:
for rest in combinations(*seqs[1:]):
for i in seqs[0]:
yield [i] + rest
else:
for i in seqs[0]: yield [i]```

I switched the for loops, calling `combinations(*seqs[1:])` is presumably much more expensive than accessing an element in `seqs[0]` – so it belongs in the outer loop.

9. Arnar Birgisson Says:

True, it’s not exactly blindingly fast. However, the stack only grows with the length of seqs, not the length of each sequence. The number of sequences is likely to be fairly low in most circumstances.

Running

```g = icombinations(range(1000), range(1000), range(1000))
g.next()
g.next()```

is no trouble. In fact, neither is:

```g = icombinations(*[range(1000) for i in range(200)])
g.next()```

The likely bottleneck is thus the loop body, i.e. what you are actually doing with the combinations.

Oh, and of course there’s a typo in my last post, the inner call should be to icombinations:

```def icombinations(*seqs):
if len(seqs) > 1:
for i in seqs[0]:
for rest in icombinations(*seqs[1:]):
for i in seqs[0]:
yield [i] + rest
else:
for i in seqs[0]: yield [i]```
10. Jesper Says:

And, since I have nothing productive at all to add to this as this time, I’ll note (for Google, if nothing else) that the output of this method is what’s called a “Cartesian Product”.

11. Brian Says:

Arnar,
Another very easy optimisation you can do is to generate tuples instead of lists.
Changing “`yield [i] + rest`” to “`yield (i,) + rest`” and “`yield [i]`” to “`yield (i,)`” has a dramatic effect (halving the time for me), since most time is spent generating these single element leaves.

12. Bill Mill Says:

Knuth calls it “generating all n-tuples” in pre-fascicle 2a.

13. Arnar Birgisson Says:

Brian, that’s nice. I would have used tuples if I were to use it in my code – but left it as lists here to match the interface in the OP. Didn’t realize the speed difference though, but that certainly makes sense.

14. Carl Says:

Canonical version?

```def icombinations(*seqs):
if len(seqs) == 1:
for i in seqs[0]: yield (i, )
else:
for rest in icombinations(*seqs[1:]):
for i in seqs[0]:
yield (i, ) + rest```
15. Carl Says:

Actually, the previous version goes through items in the wrong order:

```>>> print \'\\n\'.join(repr(i) for i in icombinations(\"12\", \"ab\", \"yz\")) (\'1\', \'a\', \'y\') (\'2\', \'a\', \'y\') (\'1\', \'b\', \'y\') (\'2\', \'b\', \'y\') (\'1\', \'a\', \'z\') (\'2\', \'a\', \'z\') (\'1\', \'b\', \'z\') (\'2\', \'b\', \'z\')```

What we actually want is this:

```def icombinations(*seqs): if len(seqs) == 1: for i in seqs[0]: yield (i, ) else: for rest in icombinations(*seqs[:-1]): for i in seqs[-1]: yield rest + (i, )```

In order to get this output:

```>>> print \'\\n\'.join(repr(i) for i in icombinations(\"12\", \"ab\", \"yz\")) (\'1\', \'a\', \'y\') (\'1\', \'a\', \'z\') (\'1\', \'b\', \'y\') (\'1\', \'b\', \'z\') (\'2\', \'a\', \'y\') (\'2\', \'a\', \'z\') (\'2\', \'b\', \'y\') (\'2\', \'b\', \'z\')```

16. Carl Says:

(hey, my previous comment got double-escaped since I ignored the CAPTCHA the first time. feel free to delete it, and this parenthetical remark.)

Actually, the previous version goes through items in the wrong order:

```>>> print '\n'.join(repr(i) for i in icombinations("12", "ab", "yz")) ('1', 'a', 'y') ('2', 'a', 'y') ('1', 'b', 'y') ('2', 'b', 'y') ('1', 'a', 'z') ('2', 'a', 'z') ('1', 'b', 'z') ('2', 'b', 'z')```

What we actually want is this:

```def icombinations(*seqs): if len(seqs) == 1: for i in seqs[0]: yield (i, ) else: for rest in icombinations(*seqs[:-1]): for i in seqs[-1]: yield rest + (i, )```

In order to get this output:

```>>> print '\n'.join(repr(i) for i in icombinations("12", "ab", "yz")) ('1', 'a', 'y') ('1', 'a', 'z') ('1', 'b', 'y') ('1', 'b', 'z') ('2', 'a', 'y') ('2', 'a', 'z') ('2', 'b', 'y') ('2', 'b', 'z')```

17. Carl Says:

(This is fun to mess around with.) The Lisp way to do it:

```def ecombinations(*args):
n = len(args)
defun = "def inner(args):"
for_loops = "\n".join("%sfor v%s in args[%s]:" % ("    "*(i+1), i, i)
for i in range(n))
yield_line = "v%s, " * n % tuple(range(n))
yield_line = "%syield (%s)" % ("    "*(n+1), yield_line)
exec '\n'.join([defun, for_loops, yield_line])
return inner(args)```
18. Gary Godfrey Says:

I needed something like this for work today. Here’s my version – no recursion. I’ve not profiled it.

```from itertools import *

c = 1
def cumul(n):
global c
r = c
c *= len(n)
return r

def gcombs(*s):
tary = [ cycle(chain(*[ repeat(nn, cumval) for nn in news])) for (news,cumval) in [ (ns, cumul(ns)) for ns in s]]
tary.append(xrange(0,c))
t = izip( *tary)
for v in t:
yield v[:-1]

print list(gcombs([1,2],[4,5,6],[7,8,9]))```
19. Peter Hosey Says:

Gary: Looks cool, but Colin complains that the order in which combinations are yielded does not match the original nested `for` loops.

Any chance we could get a version with clearer variable names? ☺

20. Gary Godfrey Says:

I suspect I could add a reverse in there to make it match the for loops. To be honest, I didn’t use this for work – it’s far too difficult to read. It just occurred to me when I was thinking about the problem and I thought it was cute.

I finally ran some profiling. It seems to be about 16x slower than the recursive solution. I don’t think it’s worth updating the variable names :-).

Gary Godfrey
Austin, TX, USA

21. Carl Says:

Non-scientific tests:

```>>> c=1
>>> t = time(); list(gcombs(*(range(2) for i in range(18)))) and time() - t
0.87823700904846191
>>> c=1
>>> t = time(); list(gcombs(*(range(2) for i in range(18)))) and time() - t
0.67177414894104004
>>> t = time(); list(icombinations(*(range(2) for i in range(18)))) and time()
- t
0.68789410591125488
>>> t = time(); list(icombinations(*(range(2) for i in range(18)))) and time()
- t
0.72254085540771484
>>> t = time(); list(ecombinations(*(range(2) for i in range(18)))) and time()
- t
0.58095097541809082
>>> t = time(); list(ecombinations(*(range(2) for i in range(18)))) and time()
- t
0.59889411926269531```

I wouldn’t take these numbers seriously though. In some of my tests, icombinations beat ecombinations.

22. Arnar Birgisson Says:

It’s interesting to see some timings. We basically have two dimensions that can vary, the number of sequences and the lengths of the sequences. Don’t have time right now to do it myself, but I propose the following test cases:

```(range(2) for i in range(10))
(range(2) for i in range(100))
(range(2) for i in range(1000))
(range(2) for i in range(10000)) # perhaps to much?
(range(10) for i in range(10))
(range(100) for i in range(10))
(range(10000) for i in range(2))
(range(i) for i in range(10))
(range(i) for i in range(100))```

etc.

Arnar

23. Arnar Birgisson Says:

On a slightly related note, here’s how to generate the powerset (all possible subsets) of some set:

```In [103]:
def powerset(S):
for b in range(1 << len(S)):
yield set(e for (i,e) in enumerate(S) if (1<<i)&b)

In [104]: myset = set(range(1,5))

In [105]: list(powerset(myset))
Out[105]:
[set([]),
set([1]),
set([2]),
set([1, 2]),
set([3]),
set([1, 3]),
set([2, 3]),
set([1, 2, 3]),
set([4]),
set([1, 4]),
set([2, 4]),
set([1, 2, 4]),
set([3, 4]),
set([1, 3, 4]),
set([2, 3, 4]),
set([1, 2, 3, 4])]```

(Problem suggested by Árni Már)

24. Wensheng Wang Says:
```def combinations(*list):
r=[[]]
for x in list:
r = [i+[y] for y in x for i in r]
return r

print combinations([1,2,3],[4,5,6])
print
print combinations([1,2],[3,4],[5,6])
print
print combinations([1,2,3],[4,5],[6],[7,8,9,10])```
25. Colin Barrett Says:

This actually makes it work the way I want it to:

```def combinations(*list):
r=[[]]
for x in reversed(list):
r = [[y]+i for y in x for i in r]
return r```

Adding `print r` inside the loop reveals how it works:

```>>> combinations([1,2],[3,4],[5,6]) and 'Done'
[[5], [6]]
[[3, 5], [3, 6], [4, 5], [4, 6]]
[[1, 3, 5], [1, 3, 6], [1, 4, 5], [1, 4, 6], [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6]]
'Done'```
26. Peter Hosey Says:

Colin: I wonder whether the list comprehension could be turned into a generator comprehension somehow, possibly relieving the need to call `reversed`

27. Brandon Mintern Says:

FYI if you’re coming from Google or whatever, this is in Python 2.6 and 3K as itertools.product (implemented in C).

And I tested the various implementations along with a couple others, and ecombinations was (surprisingly) the clear winner. It makes me want to eschew Python for Lisp… :-)

Do not delete the second sentence.