Filling missing values with Lagrange interpolation fails with 'Passing list-likes to .loc or [] with any missing labels is no longer supported'

Tech 2025-09-12  78

While learning Lagrange interpolation, I first followed the article https://blog.csdn.net/playgoon2/article/details/77051285 (reference 1) and wrote the following implementation:

    import numpy as np  # missing from the original listing; needed for np.isnan, np.linspace, np.nan
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy.interpolate import lagrange

    def polyinterp(data, k=5):
        df1 = data.copy()
        print("Original data (with missing values):", '\n', data)
        for j in range(data.shape[1]):
            for i in range(len(df1)):
                if np.isnan(df1.iloc[i, j]):
                    # Candidate positions: k before and k after the missing value
                    list1 = list(range(i - k, i)) + list(range(i + 1, i + 1 + k))
                    # Drop positions beyond the upper end of the data.
                    # Negative positions are kept, and .iloc counts them from the end.
                    list0 = [p for p in list1 if p < max(df1.index)]
                    interdf = df1.iloc[list0, j]           # fetch the values
                    interdf = interdf[interdf.notnull()]   # drop missing values
                    list_x = list(interdf.index)           # corresponding x
                    list_y = list(interdf.values)          # corresponding y
                    f = lagrange(list_x, list_y)           # build the interpolating polynomial
                    df1.iloc[i, j] = f(i)
        print("Copy after interpolation:", '\n', df1)
        return df1

    def chart_view(df01, df1):
        df1.rename(columns={'y': 'New y'}, inplace=True)
        df01['y'].plot(style='k--')
        df1['New y'].plot(alpha=0.5)
        plt.legend(loc='best')
        plt.show()

    if __name__ == '__main__':
        x = np.linspace(0, 10, 11)
        y = x**3 + 10
        data1 = np.vstack((x, y))
        df0 = pd.DataFrame(data1.T, columns=['x', 'y'])
        print(df0)
        df01 = df0.copy()                # make a copy
        df01.loc[2:3, "y"] = np.nan      # create missing values (np.NaN was removed in NumPy 2.0)
        df1 = df01.copy()
        new_data = polyinterp(df1, 5)    # interpolated copy
        chart_view(df01, new_data)       # plot before and after interpolation

Output:

           x       y
    0    0.0    10.0
    1    1.0    11.0
    2    2.0    18.0
    3    3.0    37.0
    4    4.0    74.0
    5    5.0   135.0
    6    6.0   226.0
    7    7.0   353.0
    8    8.0   522.0
    9    9.0   739.0
    10  10.0  1010.0

Original data (with missing values):

           x       y
    0    0.0    10.0
    1    1.0    11.0
    2    2.0     NaN
    3    3.0     NaN
    4    4.0    74.0
    5    5.0   135.0
    6    6.0   226.0
    7    7.0   353.0
    8    8.0   522.0
    9    9.0   739.0
    10  10.0  1010.0

Copy after interpolation:

           x       y
    0    0.0    10.0
    1    1.0    11.0
    2    2.0    18.0
    3    3.0    37.0
    4    4.0    74.0
    5    5.0   135.0
    6    6.0   226.0
    7    7.0   353.0
    8    8.0   522.0
    9    9.0   739.0
    10  10.0  1010.0
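That the filled values match the originals exactly is expected for this test data and says little about the windowing logic: y = x³ + 10 is a cubic, and the polynomial through five or more distinct points of a cubic is that same cubic, whichever neighbours are picked. A minimal sketch of this, where the chosen points and the names `x_known`, `y_known` and `filled` are illustrative assumptions rather than part of the original code:

```python
import numpy as np
from scipy.interpolate import lagrange

# Known points of y = x**3 + 10, with x = 2 and x = 3 left out
# to mimic the two missing rows in the example above.
x_known = np.array([0.0, 1.0, 4.0, 5.0, 6.0])
y_known = x_known**3 + 10

# lagrange() returns a numpy.poly1d that passes through every given point.
f = lagrange(x_known, y_known)

# Evaluate the polynomial at the two missing x positions.
filled = {x: float(f(x)) for x in (2.0, 3.0)}
print(filled)  # values should be very close to 18 and 37
```

Data that is not polynomial, like the second data set later in this post, is a much stricter test. The SciPy documentation also warns that `lagrange` is numerically unstable for larger numbers of points, so small windows are the safer choice.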

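Everything looked fine. But when I came across another article, https://blog.csdn.net/shener_m/article/details/81706358 (reference 2), and tried to validate my code against it, I found that my results differed from that author's, so I reworked the algorithm: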

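    def polyinterp(data, k=5):
        df1 = data.copy()
        print("Original data (with missing values):", '\n', data)
        for i in range(len(df1)):
            if df1['y'].isnull()[i]:
                # Window of labels: k before and k after the missing value.
                # Alternative that keeps only labels present in the index
                # (a Series index has no negative labels):
                # index_ = list(range(i-k, i)) + list(range(i+1, i+1+k))
                # list0 = [j for j in index_ if j in df1['y'].sort_index()]
                # y = df1['y'][list0]
                # Direct label lookup; this is the list-like lookup with
                # missing labels that the error message refers to.
                y = df1['y'][list(range(i-k, i)) + list(range(i+1, i+1+k))]
                y = y[y.notnull()]  # negative labels come back as missing values; drop them
                f = lagrange(y.index, list(y))
                df1.iloc[i, 1] = f(i)
        print("Copy after interpolation:", '\n', df1)
        return df1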

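At first I tested the code in Python IDLE and had no problems. After moving it to Jupyter, however, I got an error I did not want to see, the one quoted in the title, which points to the official documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike. The cause lies in how the pd.Series is indexed: the list passed to [] contains labels that are not in the index, and the pandas in my Jupyter environment no longer supports that old style of lookup.

Along the way I also found a fatal flaw: the index of a pd.Series has no reverse labels (-1, -2, ...), so negative values in the window do not refer to any row. Clearly I still have a lot to learn o(╥﹏╥)o.

Original data (with missing values):

         x      y
    0    0    NaN
    1    1    NaN
    2    2    NaN
    3    3  459.0
    4    4    NaN
    5    5  456.0
    6    6    NaN
    7    7  448.0
    8    8  450.0
    9    9  442.0
    10  10  450.0
    11  11  421.0
    12  12  421.0
    13  13  421.0
    14  14  500.0
    15  15  500.0
    16  16  500.0
    17  17  500.0
    18  18  492.0
    19  19  492.0
    20  20  473.0
    21  21  469.0
    22  22  469.0
    23  23    NaN
    24  24    NaN
    25  25  453.0
    26  26    NaN
    27  27  152.0
    28  28   70.0
    29  29   54.0
    30  30   30.0
    31  31   23.0
    32  32   23.0
    33  33   26.0
    34  34  149.0
    35  35  226.0
    36  36  143.0
    37  37  317.0
    38  38    NaN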
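For the label-lookup error described above, the pandas documentation linked in the error message recommends `.reindex()` when some requested labels may be absent. The sketch below shows that route next to the "filter to existing labels" route. The small Series and the names `wanted`, `present`, `window_a` and `window_b` are illustrative assumptions, not taken from the original code:

```python
import numpy as np
import pandas as pd

# Small stand-in for the 'y' column, with missing values at labels 2 and 3.
s = pd.Series([10.0, 11.0, np.nan, np.nan, 74.0, 135.0], name="y")

i, k = 2, 5  # label of a missing value and window half-width (illustrative)
wanted = list(range(i - k, i)) + list(range(i + 1, i + 1 + k))

# Option 1: reindex() tolerates labels that do not exist and returns NaN
# for them, which mirrors what the old s[wanted] lookup used to do.
window_a = s.reindex(wanted).dropna()

# Option 2: keep only labels that are really in the index, then look them up.
present = [label for label in wanted if label in s.index]
window_b = s.loc[present].dropna()

print(window_a)
print(window_b)
```

Both windows contain the same neighbours (labels 0, 1, 4 and 5 here), so either could replace the direct `df1['y'][...]` lookup before the values are passed to `lagrange`.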

Result of the first algorithm:

Copy after interpolation:

         x            y
    0    0   459.000000
    1    1  1155.943959
    2    2   859.543272
    3    3   459.000000
    4    4   356.883931
    5    5   456.000000
    6    6   484.287458
    7    7   448.000000
    8    8   450.000000
    9    9   442.000000
    10  10   450.000000
    11  11   421.000000
    12  12   421.000000
    13  13   421.000000
    14  14   500.000000
    15  15   500.000000
    16  16   500.000000
    17  17   500.000000
    18  18   492.000000
    19  19   492.000000
    20  20   473.000000
    21  21   469.000000
    22  22   469.000000
    23  23   471.343906
    24  24   480.046886
    25  25   453.000000
    26  26   318.202383
    27  27   152.000000
    28  28    70.000000
    29  29    54.000000
    30  30    30.000000
    31  31    23.000000
    32  32    23.000000
    33  33    26.000000
    34  34   149.000000
    35  35   226.000000
    36  36   143.000000
    37  37   317.000000
    38  38   310.000000

Result of the second algorithm:

Copy after interpolation:

         x            y
    0    0   463.500000
    1    1   462.000000
    2    2   460.410714
    3    3   459.000000
    4    4   458.253401
    5    5   456.000000
    6    6   449.101130
    7    7   448.000000
    8    8   450.000000
    9    9   442.000000
    10  10   450.000000
    11  11   421.000000
    12  12   421.000000
    13  13   421.000000
    14  14   500.000000
    15  15   500.000000
    16  16   500.000000
    17  17   500.000000
    18  18   492.000000
    19  19   492.000000
    20  20   473.000000
    21  21   469.000000
    22  22   469.000000
    23  23   471.343906
    24  24   480.046886
    25  25   453.000000
    26  26   318.202383
    27  27   152.000000
    28  28    70.000000
    29  29    54.000000
    30  30    30.000000
    31  31    23.000000
    32  32    23.000000
    33  33    26.000000
    34  34   149.000000
    35  35   226.000000
    36  36   143.000000
    37  37   317.000000
    38  38  1696.000000
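In the two printouts, the filled values differ at rows 0, 1, 2, 4, 6 and 38, all at or near the ends of the series, while rows 23, 24 and 26 agree. One likely contributor is how each version treats a window that reaches past the data: the first version indexes with `.iloc`, where a negative position counts from the end instead of being rejected, and its filter `p < max(df1.index)` only bounds the upper side (and excludes the last label itself). The sketch below demonstrates the wrap-around and shows one way to clip a window to valid positions. The helper `window_positions` is a hypothetical name introduced here, not part of the original code:

```python
import numpy as np
import pandas as pd

# Small stand-in series; the values are illustrative.
s = pd.Series([np.nan, 459.0, 456.0, 448.0, 450.0, 317.0])

# Positional lookup: -1 means "last row", so this silently picks up
# the row with label 5 instead of raising or returning NaN.
wrapped_labels = s.iloc[[-1, 0]].index.tolist()
print(wrapped_labels)

def window_positions(i, k, n):
    """Positions of up to k neighbours on each side of i, clipped to 0..n-1."""
    before = range(max(i - k, 0), i)
    after = range(i + 1, min(i + 1 + k, n))
    return list(before) + list(after)

# At the left edge only the neighbours to the right remain.
print(window_positions(0, 5, len(s)))
```

With positions clipped like this, `.iloc` and label-based lookup select the same rows whenever the index is the default 0..n-1 range, which makes the two versions easier to compare.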

I share a little of what I learn every day. If you stopped by, don't forget to leave a like; many thanks from me O(∩_∩)O

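References:

1. https://blog.csdn.net/playgoon2/article/details/77051285 Applying Lagrange interpolation in data analysis: Python interpolation with scimpy, lagrange

2. https://blog.csdn.net/shener_m/article/details/81706358 Lagrange interpolation: Python implementation and application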
