Pregunta

I have lots of data in PostgreSQL. But I need to do some pivot tables like it does SPSS. For example i have table with cities and states.

 create table cities
(
    city integer,
    state integer
);
insert into cities(city,state) values (1,1);
insert into cities(city,state) values (2,2);
insert into cities(city,state) values (3,1);
insert into cities(city,state) values (4,1);

Actually in this table i have 4 cities and 2 states. I want to do pivot table with percentage like

city\state |state-1| state-2|
city1      |33%    |0%      |
city2      |0%     |100%    |
city3      |33%    |0%      |
city4      |33%    |0%      |
totalCount |3      |1       |

I understant how do to this in this particulary case with sql. But all i want is to cross one variable by another (just count distinct values and devide it by "count(*) where variable_in_column_names=1 and so on) using some stored function. Im looking now at plpython.Some my questions are:

  1. How to output set of records with not having temporary table with shape that fits number and type of output columns.
  2. Maybe there is working solutions?

As i can see, input will be table name, column name of first variable, column name of second variable. Doing lots of queries in function's body (count(*),loop thru every distinct value in variables and count it and so on) and then return a table with percentage.

  1. Actually i dont have a lot of rows in one query(about 10k), and may be the best way to do such things in raw python,not plpython?
¿Fue útil?

Solución

You might want to give pandas a try, which is an excellent python data analysis library.

To query a PostgreSQL database:

import psycopg2
import pandas as pd
from pandas.io.sql import frame_query

conn_string = "host='localhost' dbname='mydb' user='postgres' password='password'"
conn = psycopg2.connect(conn_string)
df = frame_query('select * from cities', con=conn)

Where df is a DataFrame like:

    city    state
0    1   1
1    2   2
2    3   1
3    4   1

You can then create a pivot table using pivot_table and dividing by the total to get percentages:

totals = df.groupby('state').size()
pivot = pd.pivot_table(df, rows='city', cols='state', aggfunc=len, fill_value=0) / totals

Giving you the result:

state   1   2
city        
1    0.333333   0
2    0          1
3    0.333333   0
4    0.333333   0

Finally to get the layout you want, you just need to rename the index and columns, and append the totals:

totals_frame = pd.DataFrame(totals).T
totals_frame.index = ['totalCount']

pivot.index = ['city%i' % item for item in pivot.index]
final_result = pivot.append(totals_frame)
final_result.columns  = ['state-%i' % item for item in final_result.columns]

Giving you:

            state-1     state-2
city1       0.333333    0
city2       0.000000    1
city3       0.333333    0
city4       0.333333    0
totalCount  3.000000    1

Otros consejos

Check out PostgreSQL window functions. Might give you a non-(pl)python solution. http://blog.hashrocket.com/posts/sql-window-functions

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top