Retention Rate Prediction (Adjusted with T Values)

Technology, 2022-08-18

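These are study notes on the book 《R语言游戏数据分析与挖掘》 (R Language Game Data Analysis and Mining), shared for reference only. The retention curve obtained from a power-function fit is overly smooth, whereas real data tend to be jagged. As a result, the power-function predictions show larger errors in the later period, where the predicted values exceed the actual ones, as the figure shows: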

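By analogy with adding weight coefficients in regression analysis, a T value can be introduced to adjust the predictions. The T values, which come from experience, are distributed as follows: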

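We can therefore predict with the method from the previous post (the power-function fit) and then multiply each predicted value by the T value for the same day to obtain the adjusted prediction.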
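The adjustment itself is an element-wise product of two vectors. A minimal sketch, assuming made-up coefficients `a` and `b` and a made-up `t_value` vector (none of these numbers come from the book's data):

```r
# Hypothetical power-function coefficients, for illustration only
a <- 0.4
b <- -0.5
day <- 1:10

# Unadjusted prediction from the power function a * day^b
predicted <- a * day^b

# Hypothetical T values: 1 for the first 7 days, below 1 afterwards
t_value <- c(rep(1, 7), 0.98, 0.96, 0.94)

# Adjusted prediction = prediction * T value, day by day
adjusted <- predicted * t_value

print(data.frame(day,
                 predicted = round(predicted, 3),
                 adjusted  = round(adjusted, 3)))
```

Where the T value is 1 the prediction is left unchanged; where it is below 1 the prediction is pulled down, which is the direction needed when the power function overestimates late retention.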

① First, predict the retention rates of the two game types with the method from the previous post. The code is as follows:

    # Load the actual retention data
    actual <- read.csv("C:/Mathmodel/Ryuyan/GameR/实际留存数据.csv")

    # Type 1: fit a power function to the first 7 days, then predict 365 days
    type1 <- actual[1:7, 2]
    day <- 1:7
    fit1 <- nls(type1 ~ a * day^b, start = list(a = 1, b = 1))
    fit1
    predicted1 <- round(predict(fit1, data.frame(day = 1:365)), 3)

    # Type 2: same procedure
    type2 <- actual[1:7, 3]
    day <- 1:7
    fit2 <- nls(type2 ~ a * day^b, start = list(a = 1, b = 1))
    fit2
    predicted2 <- round(predict(fit2, data.frame(day = 1:365)), 3)

    # Combine the predictions with the actual values
    result <- data.frame(actual, predicted1, predicted2)
    head(result)

The results are as follows:

    # Type 1
    > fit1
    Nonlinear regression model
      model: type1 ~ a * day^b
       data: parent.frame()
          a       b
     0.3819 -0.5085
     residual sum-of-squares: 2.76e-05

    Number of iterations to convergence: 7
    Achieved convergence tolerance: 9.236e-08

    # Type 2
    > fit2
    Nonlinear regression model
      model: type2 ~ a * day^b
       data: parent.frame()
          a       b
     0.3132 -0.6577
     residual sum-of-squares: 5.67e-06

    Number of iterations to convergence: 6
    Achieved convergence tolerance: 8.196e-07

    # Overview of the predictions
    > head(result)
      day Type1 Type2 predicted1 predicted2
    1   1 0.383 0.313      0.382      0.313
    2   2 0.268 0.200      0.268      0.199
    3   3 0.216 0.151      0.218      0.152
    4   4 0.187 0.125      0.189      0.126
    5   5 0.167 0.108      0.168      0.109
    6   6 0.156 0.097      0.154      0.096
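Because `nls` is iterative, its starting values matter. `start = list(a = 1, b = 1)` converged here, but it may not on other data. One common way to obtain starting values is a straight-line fit on the log-log scale, since log(y) = log(a) + b * log(day). The sketch below uses only the six type 1 values printed by `head(result)`; the seventh day used in the original fit is not shown, so the estimates differ slightly from the reported ones. The names `obs`, `loglin`, `a0` and `b0` are illustrative.

```r
# The six type 1 observations shown in the head(result) output
obs <- data.frame(
  day   = 1:6,
  Type1 = c(0.383, 0.268, 0.216, 0.187, 0.167, 0.156)
)

# Linear fit on the log-log scale: log(Type1) = log(a) + b * log(day)
loglin <- lm(log(Type1) ~ log(day), data = obs)
a0 <- exp(coef(loglin)[[1]])  # back-transform the intercept to get a
b0 <- coef(loglin)[[2]]       # the slope is the exponent b

# Use the log-log estimates as starting values for the nonlinear fit
fit <- nls(Type1 ~ a * day^b, data = obs, start = list(a = a0, b = b0))

print(c(a0 = a0, b0 = b0))
print(coef(fit))
```

The log-log estimates already land close to the reported a = 0.3819 and b = -0.5085, so `nls` has little left to do when it starts from them.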

② Next, import the T-value data:

    # Load the T values
    tvalue <- read.csv("C:/Mathmodel/Ryuyan/GameR/data/第8章/T值.csv")

    # Adjusted prediction for type 1 games
    result$adjust.predicted1 <- tvalue$T1 * result$predicted1

    # Adjusted prediction for type 2 games
    result$adjust.predicted2 <- tvalue$T2 * result$predicted2

    # Inspect the first six rows
    head(result)

The result is as follows, which gives the adjusted predictions:

    > head(result)
      day Type1 Type2 predicted1 predicted2 adjust.predicted1 adjust.predicted2
    1   1 0.383 0.313      0.382      0.313             0.382             0.313
    2   2 0.268 0.200      0.268      0.199             0.268             0.199
    3   3 0.216 0.151      0.218      0.152             0.218             0.152
    4   4 0.187 0.125      0.189      0.126             0.189             0.126
    5   5 0.167 0.108      0.168      0.109             0.168             0.109
    6   6 0.156 0.097      0.154      0.096             0.154             0.096
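The multiplication in step ② pairs rows by position, so it relies on `tvalue` and `result` having the same number of rows in the same day order. If the T-value file also carried a `day` column (an assumption; the file layout is not shown here), joining on that column would make the pairing explicit. A sketch with toy data frames, where the `T1` numbers are invented:

```r
# Toy prediction table (first five type 1 predictions from the output above)
result_toy <- data.frame(
  day        = 1:5,
  predicted1 = c(0.382, 0.268, 0.218, 0.189, 0.168)
)

# Toy T-value table with a hypothetical day column, deliberately out of order
tvalue_toy <- data.frame(
  day = 5:1,
  T1  = c(0.95, 0.97, 1, 1, 1)
)

# Join on day so each prediction meets the T value for the same day
merged <- merge(result_toy, tvalue_toy, by = "day")
merged$adjust.predicted1 <- merged$T1 * merged$predicted1

print(merged)
```

With a positional product the shuffled toy table would have scaled day 1 by 0.95; the join scales day 5 instead, as intended.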

③ Then plot a comparison chart for each type to see how the unadjusted and adjusted predictions differ. Type 1:

    # Actual retention curve
    plot(Type1 ~ day, data = result, col = "slateblue2",
         main = "Retention curve for type 1",
         type = "l", xaxt = "n", lty = 1, lwd = 2)
    # Unadjusted prediction curve
    lines(predicted1 ~ day, data = result, col = "violetred3", type = "l", lty = 2, lwd = 2)
    # Adjusted prediction curve
    lines(adjust.predicted1 ~ day, data = result, col = "yellowgreen", type = "l", lty = 3, lwd = 2)
    # Legend
    legend("top", legend = colnames(result)[c(2, 4, 6)], lty = 1:3, ncol = 3,
           col = c("slateblue2", "violetred3", "yellowgreen"), bty = "n")

Type 2:

    # Curves for type 2
    # Actual retention curve
    plot(Type2 ~ day, data = result, col = "slateblue2",
         main = "Retention curve for type 2",
         type = "l", xaxt = "n", lty = 1, lwd = 2)
    # Unadjusted prediction curve
    lines(predicted2 ~ day, data = result, col = "violetred3", type = "l", lty = 2, lwd = 2)
    # Adjusted prediction curve
    lines(adjust.predicted2 ~ day, data = result, col = "yellowgreen", type = "l", lty = 3, lwd = 2)
    # Legend
    legend("top", legend = colnames(result)[c(3, 5, 7)], lty = 1:3, ncol = 3,
           col = c("slateblue2", "violetred3", "yellowgreen"), bty = "n")

The plots show that the adjusted prediction curve lies very close to the actual curve.

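The red curve is the unadjusted prediction, which runs too high.

The blue curve is the actual retention rate. The adjusted prediction (described as gray in the original figure, although the plotting code draws it in yellowgreen) almost coincides with it.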
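The visual comparison can be backed by a number. A sketch using mean absolute percentage error (MAPE), with hypothetical late-period values because the post prints only the first six days; `mape` is a helper defined here, not a base R function:

```r
# Mean absolute percentage error between actual and predicted values
mape <- function(actual, predicted) {
  mean(abs(predicted - actual) / actual)
}

# Hypothetical late-period retention rates, for illustration only
actual    <- c(0.050, 0.045, 0.040)
predicted <- c(0.060, 0.056, 0.052)  # unadjusted, running high
adjusted  <- c(0.051, 0.046, 0.041)  # after multiplying by T values

err_predicted <- mape(actual, predicted)
err_adjusted  <- mape(actual, adjusted)

print(c(unadjusted = err_predicted, adjusted = err_adjusted))
```

On the real data the same helper could be called as `mape(result$Type1, result$predicted1)` and `mape(result$Type1, result$adjust.predicted1)`, restricted to days where actual values are available.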
