Tuesday, August 27, 2013

How to select only specific columns from a DataFrame with MultiIndex columns?

How to select only specific columns from a DataFrame with MultiIndex columns?

I have DataFrame with MultiIndex columns that looks like this:
# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

What is the proper, simple way of selecting only specific columns (e.g.
['a', 'c'], not a range) from the second level?
Currently I am doing it like this:
import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new_index = pd.MultiIndex.from_tuples(tuples)
print(new_index)
data.reindex_axis(new_index, axis=1)

It doesn't feel like a good solution, however, because I have to bust out
itertools, build another MultiIndex by hand and then reindex (and my
actual code is even messier, since the column lists aren't so simple to
fetch). I am pretty sure there has to be some ix or xs way of doing this,
but everything I tried resulted in errors.

No comments:

Post a Comment