pandas 速查表

    科技2025-03-02  6

    pandas 速查表

    I’ve worked with SQL for several years before I started using Pandas in Python. I eventually get used to it but I struggled syntax difference between them when I started.

    在开始在Python中使用Pandas之前,我已经使用SQL多年。 我最终习惯了,但是当我开始时,我在它们之间的语法差异上苦苦挣扎。

    I want to gather syntax comparison in this article, also as cheat sheet. I’m going to mention to the basic syntax but if there is another example that doesn’t exist in this article, please let me know!

    我想在本文中(也作为备忘单)收集语法比较。 我将提到基本语法,但是如果本文中不存在另一个示例,请告诉我!

    数据集 (Dataset)

    I’m going to use this dataset.

    我将使用此数据集。

    import pandas as pddf_list = [ ["Lamar Jackson", 23, "Ravens", "QB"] , ["Russell Wilson", 31, "Seahawks", "QB"] , ["Aaron Donald", 29, "Rams", "DT"] , ["Patrick Mahomes", 24, "Chiefs", "QB"] , ["Michael Thomas", 27, "Saints", "WR"] , ["Christian McCaffrey", 24, "Panthers", "RB"] , ["George Kittle", 26, "49ers", "TE"] , ["DeAndre Hopkins", 28, "Cardinals", "WR"] , ["Stephon Gilmore", 29, "Patriots", "CB"] , ["Derrick Henry", 26, "Titans", "RB"]]df_columns = ["Name", "Age", "Team", "Position"]players = pd.DataFrame(data=df_list, columns=df_columns) players 玩家

    选择(SELECT)

    Extract all:

    提取所有:

    SQL: SELECT * FROM playersPandas: players

    Projection of some columns:

    一些列的投影:

    SQL: SELECT Name, Age FROM playersPandas: players[["Name", "Age"]]

    Selection of top some rows:

    选择顶部几行:

    SQL: SELECT TOP 5 * FROM playersPandas: players.head(5)

    DISTINCT:

    不同:

    SQL: SELECT DISTINCT Position FROM playersPandas: players["Position"][~players["Position"].duplicated()]

    哪里 (WHERE)

    Selection of specific rows with condition:

    选择具有条件的特定行:

    SQL: SELECT * FROM players WHERE Position = 'QB'Pandas: players[players.Position == "QB"]also: players[players["Position"] == "QB"]

    Selection of specific rows & projection of some columns

    特定行的选择和某些列的投影

    SQL: SELECT Name, Age FROM players WHERE Position = 'QB'Pandas: players[["Name", "Age"]][players.Position == "QB"]

    AND / OR:

    AND / OR:

    SQL: SELECT Name, Age FROM players WHERE Position = 'QB' AND Age < 30Pandas: players[["Name", "Age"]][(players.Position == "QB") & (players.Age < 30)] SQL: SELECT Name, Age FROM players WHERE Position = 'QB' OR Position = 'RB'Pandas: players[["Name", "Age"]][(players.Position == "QB") |(players.Position == "RB")]

    IN:

    在:

    SQL: SELECT * FROM players WHERE Position IN ('QB', 'RB')Pandas: players[players.Position.isin(["QB", "RB"])]

    LIKE:

    喜欢:

    SQL: SELECT * FROM players WHERE Team LIKE '%ns%'Pandas: players[players.Team.str.contains("ns")]

    订购 (ORDER BY)

    Order by 1 column:

    按1列排序:

    SQL: SELECT * FROM players ORDER BY NamePandas: players.sort_values("Name")

    Order by 2+ columns:

    按2+列排序:

    SQL: SELECT * FROM players ORDER BY Age, NamePandas: players.sort_values(["Age", "Name"])

    DESC:

    DESC:

    SQL: SELECT * FROM players ORDER BY Age DESC, NamePandas: players.sort_values(["Age", "Name"], ascending=[False, True])

    通过...分组 (GROUP BY)

    COUNT:

    计数:

    SQL: SELECT COUNT(*) AS COUNT FROM players GROUP BY PositionPandas: players.groupby("Position").agg({"Name": "count"}).rename(columns={"Name": "COUNT"})

    SUM:

    和:

    SQL: SELECT SUM(Age) AS Age FROM players GROUP BY PositionPandas: players.groupby("Position").agg({"Age": "sum"})

    AVG:

    AVG:

    SQL: SELECT AVG(Age) AS Age FROM players GROUP BY PositionPandas: players.groupby("Position").agg({"Age": "mean"})

    RANK:

    秩:

    SQL: SELECT *, RANK() OVER (ORDER BY Age DESC) FROM playersPandas: players["AgeRank"] = players.Age.rank(ascending=False)players

    RANK + PARTITION BY

    排名+分区依据

    SQL: SELECT *, RANK() OVER (PARTITION BY Position ORDER BY Age DESC) FROM playersPandas: players["AgeRank"] = players.groupby("Position")["Age"].rank(ascending=False)players

    DENSE:

    稠密:

    SQL: SELECT *, DENSE_RANK() OVER (ORDER BY Age DESC) FROM playersPandas: players["AgeRank"] = players.Age.rank(method="dense", ascending=False)players

    更新 (UPDATE)

    SQL: UPDATE players SET Age = 0 WHERE Age = 24Pandas: players.loc[players.Age == 24, "Age"] = 0

    加入 (JOIN)

    Please see my another article!

    请看我的另一篇文章!

    翻译自: https://medium.com/analytics-vidhya/sql-v-pandas-basic-syntax-comparison-cheat-sheet-498289372d45

    pandas 速查表

    Processed: 0.016, SQL: 8