Python Pandas: text block to DataFrame with mixed types

https://stackoverflow.com//questions/20028560

21-12-2019

Question

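I am new to Python and Pandas. I have a block of text with the data arranged in columns. The first six columns of data are integers and the rest are floating point. I tried to create two DataFrames that could then be concatenated: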

from pandas import DataFrame

# txt is the list of lines read from the text file
sect1 = DataFrame(dtype=int)
sect2 = DataFrame(dtype=float)
i = 0
# The first 26 lines are header text
for line in txt[26:]:
    colmns = line.split()
    sect1[i] = colmns[:6]  # Columns with integers
    sect2[i] = colmns[6:]  # Columns with floating point
    i += 1

This raises an AssertionError: Length of values does not match length of index.

Here are two lines of the data:

2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05
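The error most likely comes from pandas' rule that a list assigned as a column must have exactly one value per index entry: the empty DataFrame starts with an index of length 0, while each assignment supplies 6 (or 9) values. Depending on the pandas version, the first assignment to an empty frame may be accepted, and recent versions raise ValueError instead of AssertionError. Note also that the loop stores each line as a column, not a row. The sketch below, assuming `lines` as a hypothetical stand-in for `txt[26:]`, shows the length rule and then collects each line as a row before building each DataFrame once.

```python
import pandas as pd

# Hypothetical stand-in for txt[26:]: the two sample lines from the question
lines = [
    "2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05",
    "2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05",
]

# The length rule: this frame's index has 2 entries, the list has 3 values
demo = pd.DataFrame({"a": [1, 2]})
try:
    demo["b"] = [1, 2, 3]
    mismatch_raised = False
except (ValueError, AssertionError):  # newer pandas: ValueError; older: AssertionError
    mismatch_raised = True

# Fix: parse every line into a row first, then build each frame in one call
int_rows = []
float_rows = []
for line in lines:
    fields = line.split()
    int_rows.append([int(v) for v in fields[:6]])      # first 6 columns: integers
    float_rows.append([float(v) for v in fields[6:]])  # remaining columns: floats

sect1 = pd.DataFrame(int_rows)
sect2 = pd.DataFrame(float_rows)

# Concatenate side by side; ignore_index renumbers the columns 0..14
combined = pd.concat([sect1, sect2], axis=1, ignore_index=True)
```

Building the frame once from a list of rows is also much faster than growing a DataFrame one assignment at a time.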

Thanks for your help.

Solution

Use the pandas CSV parser together with StringIO, as in the examples in the pandas documentation.

The example would look like this:

>>> import pandas as pd
>>> from io import StringIO
>>> data = """2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
... 2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05"""

Load the data:

>>> df = pd.read_csv(StringIO(data), sep=r'\s+', header=None)
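Because read_csv infers one dtype per column, the integer/float split the question was after comes out of the parser directly. A minimal sketch using the two sample lines: it checks the inferred types and slices the frame into the two sections with iloc (the names sect1 and sect2 are borrowed from the question).

```python
from io import StringIO

import pandas as pd

data = """2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05"""

# Whitespace-separated, no header row; dtypes are inferred per column
df = pd.read_csv(StringIO(data), sep=r"\s+", header=None)

# Columns 0-5 are parsed as integers ("0005" becomes 5), columns 6-14 as floats
sect1 = df.iloc[:, :6]   # integer section
sect2 = df.iloc[:, 6:]   # floating-point section
```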

Convert the first three columns to a datetime (optional):

>>> df[0] = df.iloc[:,:3].apply(lambda x:'{}.{}.{}'.format(*x), axis=1).apply(pd.to_datetime)
>>> del df[1]
>>> del df[2]
>>> df
                   0   3      4    5     6      7       8       9       10  \
0 2013-11-15 00:00:00   0  56611    0  13.6  0.352  0.0789  0.0433  0.0342
1 2013-11-15 00:00:00   5  56611  300  10.8  0.550  0.2350  0.0427  0.0335

       11     12   13      14
0  0.0176  28900  572 -100000
1  0.0170  30000  550 -100000
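The conversion above uses only the first three columns, so both rows get the time 00:00:00 and column 3 stays a separate integer (0 and 5). Assuming column 3 is a time of day in HHMM form (0000, 0005), a sketch that folds it into the timestamp could use pd.to_datetime on a frame of named year/month/day/hour/minute columns:

```python
from io import StringIO

import pandas as pd

data = """2013 11 15  0000   56611      0   1.36e+01  3.52e-01  7.89e-02  4.33e-02  3.42e-02  1.76e-02  2.89e+04  5.72e+02 -1.00e+05
2013 11 15  0005   56611    300   1.08e+01  5.50e-01  2.35e-01  4.27e-02  3.35e-02  1.70e-02  3.00e+04  5.50e+02 -1.00e+05"""
df = pd.read_csv(StringIO(data), sep=r"\s+", header=None)

# Assumption: column 3 is HHMM, read as an integer (0000 -> 0, 0005 -> 5)
hhmm = df[3]
parts = pd.DataFrame({
    "year": df[0],
    "month": df[1],
    "day": df[2],
    "hour": hhmm // 100,
    "minute": hhmm % 100,
})
# to_datetime assembles one timestamp per row from the named columns
timestamps = pd.to_datetime(parts)

# Replace the four date/time columns with a single timestamp column
result = df.drop(columns=[0, 1, 2, 3])
result.insert(0, "timestamp", timestamps)
```

Working on whole columns this way avoids formatting and re-parsing a string for every row.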

License: CC BY-SA with attribution

Not affiliated with StackOverflow