Pandas具有功能全面的高性能內(nèi)存中連接操作,與SQL等關(guān)系數(shù)據(jù)庫非常相似。
Pandas提供了一個單獨的merge()函數(shù),作為DataFrame對象之間所有標(biāo)準(zhǔn)數(shù)據(jù)庫連接操作的入口 -
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
left_index=False, right_index=False, sort=True)
在這里,有以下幾個參數(shù)可以使用 -
True,則使用左側(cè)DataFrame中的索引(行標(biāo)簽)作為其連接鍵。 在具有MultiIndex(分層)的DataFrame的情況下,級別的數(shù)量必須與來自右DataFrame的連接鍵的數(shù)量相匹配。True,設(shè)置為False時,在很多情況下大大提高性能。現(xiàn)在創(chuàng)建兩個不同的DataFrame并對其執(zhí)行合并操作。
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print (left)
print("========================================")
print (right)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name id subject_id
0 Alex 1 sub1
1 Amy 2 sub2
2 Allen 3 sub4
3 Alice 4 sub6
4 Ayoung 5 sub5
========================================
Name id subject_id
0 Billy 1 sub2
1 Brian 2 sub4
2 Bran 3 sub3
3 Bryce 4 sub6
4 Betty 5 sub5
在一個鍵上合并兩個數(shù)據(jù)幀
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left,right,on='id')
print(rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id subject_id_x Name_y subject_id_y
0 Alex 1 sub1 Billy sub2
1 Amy 2 sub2 Brian sub4
2 Allen 3 sub4 Bran sub3
3 Alice 4 sub6 Bryce sub6
4 Ayoung 5 sub5 Betty sub5
合并多個鍵上的兩個數(shù)據(jù)框
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left,right,on=['id','subject_id'])
print(rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id subject_id Name_y
0 Alice 4 sub6 Bryce
1 Ayoung 5 sub5 Betty
如何合并參數(shù)指定如何確定哪些鍵將被包含在結(jié)果表中。如果組合鍵沒有出現(xiàn)在左側(cè)或右側(cè)表中,則連接表中的值將為NA。
這里是how選項和SQL等效名稱的總結(jié) -
| 合并方法 | SQL等效 | 描述 |
|---|---|---|
left |
LEFT OUTER JOIN |
使用左側(cè)對象的鍵 |
right |
RIGHT OUTER JOIN |
使用右側(cè)對象的鍵 |
outer |
FULL OUTER JOIN |
使用鍵的聯(lián)合 |
inner |
INNER JOIN |
使用鍵的交集 |
Left Join示例
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left, right, on='subject_id', how='left')
print (rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id_x subject_id Name_y id_y
0 Alex 1 sub1 NaN NaN
1 Amy 2 sub2 Billy 1.0
2 Allen 3 sub4 Brian 2.0
3 Alice 4 sub6 Bryce 4.0
4 Ayoung 5 sub5 Betty 5.0
Right Join示例
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left, right, on='subject_id', how='right')
print (rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id_x subject_id Name_y id_y
0 Amy 2.0 sub2 Billy 1
1 Allen 3.0 sub4 Brian 2
2 Alice 4.0 sub6 Bryce 4
3 Ayoung 5.0 sub5 Betty 5
4 NaN NaN sub3 Bran 3
Outer Join示例
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left, right, how='outer', on='subject_id')
print (rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id_x subject_id Name_y id_y
0 Alex 1.0 sub1 NaN NaN
1 Amy 2.0 sub2 Billy 1.0
2 Allen 3.0 sub4 Brian 2.0
3 Alice 4.0 sub6 Bryce 4.0
4 Ayoung 5.0 sub5 Betty 5.0
5 NaN NaN sub3 Bran 3.0
Inner Join示例
連接將在索引上進行。連接(Join)操作將授予它所調(diào)用的對象。所以,a.join(b)不等于b.join(a)。
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left, right, on='subject_id', how='inner')
print (rs)
執(zhí)行上面示例代碼,得到以下結(jié)果 -
Name_x id_x subject_id Name_y id_y
0 Amy 2 sub2 Billy 1
1 Allen 3 sub4 Brian 2
2 Alice 4 sub6 Bryce 4
3 Ayoung 5 sub5 Betty 5