Dataframe Flashcards

Question 1

Q

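DataFrame data access (10)
(1) Get col1 (bracket notation)
(2) Get col2 (dot notation)
(3) Get the col2 value for row 'a' using . and []
(4) Get the col2 value for row 'a' using . only
(5) loc
(6) loc with a slice
(7) Get a single value with loc by row and column label
(8) Get the second row with iloc
(9) at
(10) iat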

Answer

A


import pandas as pd
df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=['col1', 'col2'], index=['a', 'b', 'c'])
#    col1  col2
# a     1    10
# b     2    20
# c     3    30

(1) Get col1
df['col1'] # 1
# a    1
# b    2
# c    3
# Name: col1, dtype: int64

(2) Get col2
df.col2 # 2
# a    10
# b    20
# c    30
# Name: col2, dtype: int64

(3) Get the col2 value for row 'a' using . and []
df.col2['a'] # 3
# 10

(4) Get the col2 value for row 'a' using . only
df.col2.a # 4
# 10

(5) loc
df.loc['a'] # 5
# col1     1
# col2    10
# Name: a, dtype: int64

(6) loc with a slice
df.loc['b':'c'] # 6 (label slices include the end label)
#    col1  col2
# b     2    20
# c     3    30

(7) Get a single value with loc by row and column label
df.loc['a', 'col1'] # 7
df.loc['a']['col1'] # chained form; returns the same value
# 1

(8) Get the second row with iloc
df.iloc[1] # 8
# col1     2
# col2    20
# Name: b, dtype: int64

(9) at (returns a single value only)
df.at['a', 'col1'] # 9
# 1

(10) iat (returns a single value only)
df.iat[1, 1] # 10
# 20
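
iloc selects by integer position along either axis. A minimal sketch, reusing the three-row frame from this card, that takes the second row and the second column (the variable names are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20], [3, 30]],
                  columns=['col1', 'col2'], index=['a', 'b', 'c'])

second_row = df.iloc[1]      # row at position 1 (label 'b'), returned as a Series
second_col = df.iloc[:, 1]   # column at position 1 ('col2'), returned as a Series

print(second_row.tolist())   # [2, 20]
print(second_col.tolist())   # [10, 20, 30]
```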

Question 2

Q

Filter a DataFrame with a list (1)

Answer

A

Create a DataFrame with 4 rows and 2 columns

df = pd.DataFrame([['A', 10], ['B', 20], ['C', 30], ['D', 40]], columns=['col1', 'col2'])
#   col1  col2
# 0    A    10
# 1    B    20
# 2    C    30
# 3    D    40

l = [True, False, True, False]

df[l] #1
#   col1  col2
# 0    A    10
# 2    C    30
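
The boolean list is usually derived from the data rather than typed by hand. A small sketch, assuming the goal is to keep rows whose col1 appears in a list of values (the list `wanted` is hypothetical):

```python
import pandas as pd

df = pd.DataFrame([['A', 10], ['B', 20], ['C', 30], ['D', 40]],
                  columns=['col1', 'col2'])

wanted = ['A', 'C']               # hypothetical list of values to keep
mask = df['col1'].isin(wanted)    # boolean Series: True where col1 is in the list
filtered = df[mask]               # same result as df[[True, False, True, False]]

print(mask.tolist())              # [True, False, True, False]
print(filtered.index.tolist())    # [0, 2]
```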

Question 3

Q

Filtering with where (4)
(1) Filter with a condition in the subscript (numeric comparison)
(2) Use where
(3) Fill with a value other than NaN
(4) Fill with a different non-NaN value per cell

Answer

A

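df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])

(1) Filter with a condition in the subscript (numeric comparison)
df[df['col1'] > 2] # 1
#    col1  col2
# 2     3    30
# 3     4    40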

(2) Use where
df.where(df['col1'] > 2) #2
#    col1  col2
# 0   NaN   NaN
# 1   NaN   NaN
# 2   3.0  30.0
# 3   4.0  40.0

(3) Fill with a value other than NaN
df.where(df['col1'] > 2, 0) #3
#    col1  col2
# 0     0     0
# 1     0     0
# 2     3    30
# 3     4    40

(4) Fill with a different non-NaN value per cell
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])

pad = pd.DataFrame([[0, '-']] * len(df), columns=df.columns, index=df.index) #4a
#    col1 col2
# 0     0    -
# 1     0    -
# 2     0    -
# 3     0    -

df.where(df['col1'] > 2, pad) #4b
#    col1 col2
# 0     0    -
# 1     0    -
# 2     3   30
# 3     4   40
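
where keeps the cells for which the condition holds and replaces the rest. pandas also provides mask, which does the opposite. A short sketch comparing the two on the same frame (the names `kept` and `hidden` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])
cond = df['col1'] > 2

kept = df.where(cond, 0)    # rows where cond is False are replaced with 0
hidden = df.mask(cond, 0)   # rows where cond is True are replaced with 0

print(kept['col2'].tolist())
print(hidden['col2'].tolist())
```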

Question 4

Q

Sorting a DataFrame (1)

Answer

A

df = pd.DataFrame([[3, 10, 200], [2, 30, 100], [4, 40, 300], [1, 20, 200]], columns=['col1', 'col2', 'col3'])
#    col1  col2  col3
# 0     3    10   200
# 1     2    30   100
# 2     4    40   300
# 3     1    20   200

df.sort_values(['col1', 'col2'], ascending=[True, False]) #1
#    col1  col2  col3
# 3     1    20   200
# 1     2    30   100
# 0     3    10   200
# 2     4    40   300
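
Because col1 has no duplicate values in this frame, the second sort key never takes effect in the example above. A sketch that sorts on col3 instead, where two rows tie at 200, shows the secondary key breaking the tie:

```python
import pandas as pd

df = pd.DataFrame([[3, 10, 200], [2, 30, 100], [4, 40, 300], [1, 20, 200]],
                  columns=['col1', 'col2', 'col3'])

# col3 ascending; the rows tied on col3 == 200 (index 0 and 3)
# are then ordered by col2 descending, so index 3 comes before index 0.
result = df.sort_values(['col3', 'col2'], ascending=[True, False])

print(result.index.tolist())   # [1, 3, 0, 2]
```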

Question 5

Q

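DataFrame update operations (adding, deleting, and updating rows and columns) (8)
(1) Add a new column to a DataFrame
(2) Update an existing column of a DataFrame
(3) Add a row
(4) Delete a single row
(5) Delete multiple rows
(6) Delete a column
(7) Update a single specified value
(8) Update an existing row of a DataFrame [newly added]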

Answer

A

Create a DataFrame and a Series

df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'], index=['a', 'b', 'c', 'd'])
s = pd.Series([1, 1, 1, 1], index=['a', 'b', 'c', 'd'])

(1) Add a new column to a DataFrame
df['new_col'] = s # 1
#    col1  col2  new_col
# a     1    10        1
# b     2    20        1
# c     3    30        1
# d     4    40        1

(2) Update an existing column of a DataFrame
df['col1'] = s # 2
#    col1  col2  new_col
# a     1    10        1
# b     1    20        1
# c     1    30        1
# d     1    40        1

(3) Add a row
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'], index=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame([[9, 99]], columns=['col1', 'col2'], index=['x'])
pd.concat([df, df2]) # 3 (DataFrame.append was removed in pandas 2.0)
#    col1  col2
# a     1    10
# b     2    20
# c     3    30
# d     4    40
# x     9    99

(4) Delete a single row
df.drop('a') # 4
#    col1  col2
# b     2    20
# c     3    30
# d     4    40

(5) Delete multiple rows
df.drop(['a', 'b']) # 5
#    col1  col2
# c     3    30
# d     4    40

(6) Delete a column
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'], index=['a', 'b', 'c', 'd'])
df.drop('col1', axis=1) # 6
#    col2
# a    10
# b    20
# c    30
# d    40

(7) Update a single specified value
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'], index=['a', 'b', 'c', 'd'])
df.at['a', 'col1'] = 999 # 7
#    col1  col2
# a   999    10
# b     2    20
# c     3    30
# d     4    40

(8) Update an existing row of a DataFrame
df.loc['a'] = [1, 1] # 8 (one value per column)
#    col1  col2
# a     1     1
# b     2    20
# c     3    30
# d     4    40
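
Adding and dropping rows return new objects, while assignment through `[]`, at, or loc changes the frame in place. A minimal sketch of the non-destructive operations on a smaller two-row frame (the frame and the names `extra`, `combined`, `trimmed` are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20]], columns=['col1', 'col2'], index=['a', 'b'])
extra = pd.DataFrame([[9, 99]], columns=['col1', 'col2'], index=['x'])

combined = pd.concat([df, extra])   # new frame with rows a, b, x
trimmed = combined.drop('a')        # new frame without row a

print(df.index.tolist())            # ['a', 'b']  (the original is unchanged)
print(combined.index.tolist())      # ['a', 'b', 'x']
print(trimmed.index.tolist())       # ['b', 'x']
```

To keep a result, assign it back to a name, for example `df = df.drop('a')`.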

Question 6

Q

Looping over a DataFrame (1)

Answer

A

import pandas as pd
df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])

for index, row in df.iterrows(): # 1
    print(row['col1'], row['col2'])
# 1 10
# 2 20
# 3 30
# 4 40
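
iterrows builds a Series for every row. As an alternative sketch, itertuples yields lightweight named tuples whose fields are the column names. The list `totals` is only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])

totals = []
for row in df.itertuples(index=False):
    # Each row is a named tuple; columns are available as attributes.
    totals.append(int(row.col1 + row.col2))

print(totals)   # [11, 22, 33, 44]
```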

Question 7

Q

Removing missing values (NaN) (2)

(1) Remove only the missing values from a Series
(2) Remove whole columns that contain missing values from a DataFrame

Answer

A

Remove only the missing values from a Series

import pandas as pd

s = pd.Series([2, 5, 8, None])
# 0 2.0
# 1 5.0
# 2 8.0
# 3 NaN
# dtype: float64

s.dropna() # 1
# 0 2.0
# 1 5.0
# 2 8.0
# dtype: float64

Remove whole columns that contain missing values from a DataFrame

df = pd.DataFrame([[1, 10], [None, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])
df.dropna(axis=1) # 2
#    col2
# 0    10
# 1    20
# 2    30
# 3    40
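
With its default axis, dropna removes rows rather than columns, and fillna replaces missing values instead of discarding data. A short sketch on the same frame (the names `rows_kept` and `filled` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [None, 20], [3, 30], [4, 40]], columns=['col1', 'col2'])

rows_kept = df.dropna()   # default axis=0: drops row 1, which contains NaN
filled = df.fillna(0)     # keeps every row and replaces NaN with 0

print(rows_kept.index.tolist())   # [0, 2, 3]
print(filled['col1'].tolist())    # [1.0, 0.0, 3.0, 4.0]
```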

Question 8

Q

Renaming columns (column names) and the index (row names) (3)
(1) Rename columns
(2) Rename rows
(3) Rename in place (destructively)

Answer

A

Create a DataFrame

import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20]], columns=['col1', 'col2'], index=['a', 'b'])
print(df)
#    col1  col2
# a     1    10
# b     2    20

(1) Rename columns
new_df = df.rename(columns={'col1': 'new1', 'col2': 'new2'}) # 1
print(new_df)
#    new1  new2
# a     1    10
# b     2    20

(2) Rename rows
new_df = df.rename(index={'a': 'new1', 'b': 'new2'}) # 2
print(new_df)
#       col1  col2
# new1     1    10
# new2     2    20

(3) Rename in place (destructively)
df.rename(columns={'col1': 'new1', 'col2': 'new2'}, inplace=True) # 3
print(df)
#    new1  new2
# a     1    10
# b     2    20

Question 9

Q

Aggregating a DataFrame with groupby (3)
(1) Compute the total for each category
(2) Compute the count for each category and tag
(3) Compute the spread (standard deviation) for each tag

Answer

A

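df = pd.DataFrame([['cate1', 'tag1', 150], ['cate1', 'tag2', 210], ['cate2', 'tag2', 80], ['cate2', 'tag1', 310]], columns=['category', 'tag', 'value'])
#   category   tag  value
# 0    cate1  tag1    150
# 1    cate1  tag2    210
# 2    cate2  tag2     80
# 3    cate2  tag1    310

(1) Compute the total for each category
df.groupby('category')[['value']].sum() # 1 (select the numeric column so 'tag' is not summed)
#           value
# category
# cate1       360
# cate2       390

(2) Compute the count for each category and tag
df.groupby(['category', 'tag']).count() # 2
#                value
# category tag
# cate1    tag1      1
#          tag2      1
# cate2    tag1      1
#          tag2      1

(3) Compute the spread (standard deviation) for each tag
df.groupby('tag')[['value']].std() # 3
#            value
# tag
# tag1  113.137085
# tag2   91.923882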
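
Several aggregations can also be computed in one pass with agg. A minimal sketch on the same data, assuming only the numeric value column is of interest (the name `summary` is illustrative):

```python
import pandas as pd

df = pd.DataFrame([['cate1', 'tag1', 150], ['cate1', 'tag2', 210],
                   ['cate2', 'tag2', 80], ['cate2', 'tag1', 310]],
                  columns=['category', 'tag', 'value'])

# One row per category, one column per aggregation.
summary = df.groupby('category')['value'].agg(['sum', 'mean', 'count'])

print(summary.loc['cate1', 'sum'])    # 360
print(summary.loc['cate2', 'mean'])   # 195.0
```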

Question 10

Q

Replacing values in a DataFrame (1)

Answer

A

import pandas as pd
df = pd.DataFrame([['apple', 10], ['oranggg', 20], ['banana', 30]], columns=['name', 'price'])

df.replace('oranggg', 'orange') # 1
#      name  price
# 0   apple     10
# 1  orange     20
# 2  banana     30
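
replace also accepts a nested dictionary that limits the replacement to one column, and it returns a new frame rather than changing the original. A small sketch (the name `fixed` is illustrative):

```python
import pandas as pd

df = pd.DataFrame([['apple', 10], ['oranggg', 20], ['banana', 30]],
                  columns=['name', 'price'])

# {column: {old_value: new_value}} restricts the replacement to the 'name' column.
fixed = df.replace({'name': {'oranggg': 'orange'}})

print(fixed['name'].tolist())   # ['apple', 'orange', 'banana']
print(df['name'].tolist())      # ['apple', 'oranggg', 'banana']
```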

Question 11

Q

Pivot table (category, tag, and value) (1)

Answer

A

Category, tag, and value

import pandas as pd
df = pd.DataFrame([['cate1', 'tag1', 4], ['cate2', 'tag1', 10], ['cate1', 'tag2', 5], ['cate3', 'tag3', 5], ['cate2', 'tag3', 5]], columns=['category', 'tag', 'value'])
#   category   tag  value
# 0    cate1  tag1      4
# 1    cate2  tag1     10
# 2    cate1  tag2      5
# 3    cate3  tag3      5
# 4    cate2  tag3      5

df.pivot_table(index=['category'], columns=['tag'], values='value', fill_value=0, aggfunc='sum') # 1
# tag       tag1  tag2  tag3
# category
# cate1        4     5     0
# cate2       10     0     5
# cate3        0     0     5
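
The same table can be reached by grouping on both keys and moving the tag level into columns. This sketch is one possible equivalent, not necessarily how the deck author would write it:

```python
import pandas as pd

df = pd.DataFrame([['cate1', 'tag1', 4], ['cate2', 'tag1', 10], ['cate1', 'tag2', 5],
                   ['cate3', 'tag3', 5], ['cate2', 'tag3', 5]],
                  columns=['category', 'tag', 'value'])

pivot = df.pivot_table(index='category', columns='tag', values='value',
                       fill_value=0, aggfunc='sum')

# Sum per (category, tag) pair, then spread the tag level across columns;
# missing combinations are filled with 0.
alt = df.groupby(['category', 'tag'])['value'].sum().unstack(fill_value=0)

print(alt)
```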

Question 12

Q

Transposing the rows and columns of a DataFrame (1)

Answer

A

df = pd.DataFrame([[1, 10], [2, 20], [3, 30], [4, 40]], columns=['col1', 'col2'], index=['a', 'b', 'c', 'd'])
df.T # 1
#        a   b   c   d
# col1   1   2   3   4
# col2  10  20  30  40

Question 13

Q

(1) Number of rows (records)
(2) Number of elements (columns x rows)
(3) Mean
(4) Standard deviation
(5) Maximum
(6) Minimum
(7) Variance
(8) Random sampling
(9) Get everything at once

Answer

A

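import pandas as pd
df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=['col1', 'col2'])

(1) Number of rows (records)
len(df)

(2) Number of elements (columns x rows)
df.size

(3) Mean
df.mean()

(4) Standard deviation
df.std()

(5) Maximum
df.max()

(6) Minimum
df.min()

(7) Variance
df.var()

(8) Random sampling
df.sample() # returns one randomly chosen row by default

(9) Get everything at once
df.describe()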
# col1 col2
# count 3.0 3.0
# mean 2.0 20.0
# std 1.0 10.0
# min 1.0 10.0
# 25% 1.5 15.0
# 50% 2.0 20.0
# 75% 2.5 25.0
# max 3.0 30.0
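
sample draws a single row unless told otherwise. A brief sketch, assuming a three-row frame that matches the describe output above, showing the n and random_state parameters along with shape:

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=['col1', 'col2'])

picked = df.sample(n=2, random_state=0)   # two rows; random_state makes the draw repeatable
again = df.sample(n=2, random_state=0)    # same seed, same rows

print(len(picked))            # 2
print(picked.equals(again))   # True
print(df.shape)               # (3, 2)
```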

Question 14

Q

Classify loc, iloc, at, and iat into the following categories

Access by absolute position:
Access by label name:
Access to multiple elements:
Slice notation:

Answer

A

Access by absolute position: iloc, iat
Access by label name: loc, at
Access to multiple elements: loc, iloc
Slice notation: loc, iloc

https://deepage.net/features/pandas-location.html
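
One practical difference between the two slicing accessors is the treatment of the end of the slice. A minimal sketch on an assumed three-row frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 10], [2, 20], [3, 30]],
                  columns=['col1', 'col2'], index=['a', 'b', 'c'])

by_label = df.loc['a':'b']    # label slice: the end label 'b' is included
by_position = df.iloc[0:1]    # position slice: the end position 1 is excluded

print(by_label.index.tolist())      # ['a', 'b']
print(by_position.index.tolist())   # ['a']
```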

Question 15

Q

For the columns of a DataFrame:

1) Get the data types
2) Convert a data type (casting)

Answer

A

1) Get the data types
df.dtypes
# col1     int64
# col2    object
# dtype: object

2) Convert a data type (casting)
df = df.astype({'acc_no': 'str'}) # 'acc_no' is the name of the column to convert
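
astype returns a converted copy, which is why the result is assigned back. A self-contained sketch, assuming a small frame with an integer col1 and a string col2 (the name `converted` is illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 'x'], [2, 'y']], columns=['col1', 'col2'])

# Map each column to convert to its target dtype; other columns are untouched.
converted = df.astype({'col1': 'float64'})

print(converted['col1'].tolist())   # [1.0, 2.0]
print(df['col1'].dtype)             # int64  (the original frame is unchanged)
```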

(15 cards)