Study notes: running the Maxent example and interpreting some of the results

    Technology · 2025-10-03

    File preparation

    Samples file

    A .csv file, usually with three columns: species, longitude, latitude.

    Tips: By default, duplicates (multiple records of the same species in the same grid cell) are removed; this can be turned off in Settings.
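
    To picture what "duplicates in the same grid cell" means, here is a minimal Python sketch. It is not Maxent's own code: the grid origin, the cell size and the sample coordinates are made-up values for illustration, and the helper names are hypothetical.

```python
import csv
import io
import math

# Hypothetical sample data in the three-column layout described above.
SAMPLE_CSV = """species,longitude,latitude
bradypus_variegatus,-65.40,-10.38
bradypus_variegatus,-65.38,-10.36
bradypus_variegatus,-63.67,-17.45
"""

def read_samples(text):
    """Read (species, longitude, latitude) rows, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(sp, float(lon), float(lat)) for sp, lon, lat in reader]

def dedupe_by_cell(records, xll, yll, cellsize):
    """Keep only the first record per (species, grid cell)."""
    seen = set()
    kept = []
    for species, lon, lat in records:
        col = math.floor((lon - xll) / cellsize)
        row = math.floor((lat - yll) / cellsize)
        key = (species, col, row)
        if key not in seen:
            seen.add(key)
            kept.append((species, lon, lat))
    return kept

records = read_samples(SAMPLE_CSV)
# Assumed grid: lower-left corner at (-180, -90), 0.5 degree cells.
unique_records = dedupe_by_cell(records, xll=-180.0, yll=-90.0, cellsize=0.5)
```

    The first two records fall in the same 0.5 degree cell, so only one of them is kept.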

    Environmental layers (raster) files

    .asc files; all layers must have the same geographic bounds and cell size.
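
    A quick way to check this requirement before running Maxent is to compare the headers of the .asc files. The sketch below assumes the usual ESRI ASCII grid header keys (ncols, nrows, xllcorner, yllcorner, cellsize) and uses made-up header text; files that use xllcenter/yllcenter would need extra handling.

```python
def read_asc_header(text):
    """Collect the leading 'key value' header lines of an ASCII grid."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        # Header lines are 'name value'; stop at the first data row.
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
    return header

def same_grid(h1, h2, tol=1e-9):
    """True if two headers describe the same extent and cell size."""
    keys = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")
    return all(abs(h1[k] - h2[k]) <= tol for k in keys)

# Hypothetical layers: A and B match, C has a different cell size.
LAYER_A = "ncols 4\nnrows 2\nxllcorner -180.0\nyllcorner -90.0\ncellsize 0.5\nNODATA_value -9999\n1 2 3 4\n5 6 7 8\n"
LAYER_B = "ncols 4\nnrows 2\nxllcorner -180.0\nyllcorner -90.0\ncellsize 0.5\nNODATA_value -9999\n9 9 9 9\n9 9 9 9\n"
LAYER_C = "ncols 4\nnrows 2\nxllcorner -180.0\nyllcorner -90.0\ncellsize 1.0\nNODATA_value -9999\n1 1 1 1\n1 1 1 1\n"

header_a = read_asc_header(LAYER_A)
header_b = read_asc_header(LAYER_B)
header_c = read_asc_header(LAYER_C)
```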

    Running the example

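    1. Load the Samples file (.csv) and the Environmental layers (.asc; you can also select the folder that contains the .asc files).
    2. Tick the checkboxes (the specific options are not listed here).
    3. Set the path where results are stored: Output directory.
    4. In Settings, set Random test percentage to 25, i.e. randomly set aside 25% of the samples for testing.
    5. Click Run.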

    <randomly set aside 25% of the sample records for testing>
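
    The 25% hold-out can be pictured as a seeded random split. This is only an illustrative sketch, not Maxent's internal procedure; the function name and the seed value are assumptions.

```python
import random

def split_samples(records, test_percentage=25, seed=0):
    """Shuffle with a fixed seed and set aside test_percentage % for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_percentage / 100)
    return shuffled[n_test:], shuffled[:n_test]

# 100 hypothetical record ids; the same seed gives the same partition.
train_set, test_set = split_samples(range(100), test_percentage=25, seed=0)
train_again, test_again = split_samples(range(100), test_percentage=25, seed=0)
```

    With a fixed seed the partition is reproducible; changing the seed gives a different test set.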

    Tips:

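    • Every time Maxent is run on the same data set, the same random sample is used, unless the "Random seed" option is selected in Settings.
    • Test data for one or more species can be provided in a separate file by giving its name under "Test sample file" in Settings.
    • Four output formats are supported: raw, cumulative, logistic and cloglog. The default is cloglog in recent Maxent versions (3.4.0 onward); the tutorial passage quoted below predates that change and names logistic as the default.
    • The logistic output gives an estimate between 0 and 1 of the probability of presence.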

    <The default output is logistic, which is the easiest to conceptualize: it gives an estimate between 0 and 1 of probability of presence.>
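
    How the four formats relate to each other can be sketched as follows. The formulas reflect my understanding of the published descriptions of Maxent's outputs (with the default prevalence setting) and should be treated as assumptions, not as the program's actual code; the raw values are made up.

```python
import math

def entropy(raw):
    """Entropy of the raw distribution (raw values sum to 1)."""
    return -sum(p * math.log(p) for p in raw if p > 0)

def to_cumulative(raw):
    """Percentage of raw probability mass at or below each pixel's value."""
    return [100.0 * sum(q for q in raw if q <= p) for p in raw]

def to_logistic(raw):
    """Assumed form: c*raw / (1 + c*raw), with c = exp(entropy)."""
    c = math.exp(entropy(raw))
    return [c * p / (1.0 + c * p) for p in raw]

def to_cloglog(raw):
    """Assumed form: 1 - exp(-c*raw), with c = exp(entropy)."""
    c = math.exp(entropy(raw))
    return [1.0 - math.exp(-c * p) for p in raw]

# Hypothetical raw output over a five-pixel grid.
raw = [0.05, 0.10, 0.15, 0.30, 0.40]
cumulative = to_cumulative(raw)
logistic = to_logistic(raw)
cloglog = to_cloglog(raw)
```

    All three transformations are monotone in the raw value, so they rank pixels identically and differ only in scale.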

    Interpreting the results

    Model fit metric: gain

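    • Similar to deviance.
    • Gain starts at 0 and increases towards an asymptote during the run.
    • During this process Maxent builds a probability distribution over the pixels of the grid, starting from the uniform distribution and repeatedly improving the fit to the data.
    • Gain is the average log probability of the presence samples, minus a constant chosen so that the uniform distribution has zero gain.
    • At the end of the run, the gain indicates how closely the model is concentrated around the presence samples.
    • If the gain is 2, the average likelihood of the presence samples is exp(2) ≈ 7.4 times higher than that of a random background pixel.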

    < It starts at 0 and increases towards an asymptote during the run. During this process, Maxent is generating a probability distribution over pixels in the grid, starting from the uniform distribution and repeatedly improving the fit to the data. The gain is defined as the average log probability of the presence samples, minus a constant that makes the uniform distribution have zero gain. At the end of the run, the gain indicates how closely the model is concentrated around the presence samples; for example, if the gain is 2, it means that the average likelihood of the presence samples is exp(2) ≈ 7.4 times higher than that of a random background pixel.>
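
    The definition can be checked with a small calculation. This sketch computes the unregularized gain for a made-up distribution over 1000 pixels; the pixel counts and probabilities are assumptions chosen so that the gain comes out as 2.

```python
import math

def gain(raw, presence_idx):
    """Mean log probability of presence pixels minus log(1/N),
    so that the uniform distribution has gain 0."""
    n = len(raw)
    mean_log_p = sum(math.log(raw[i]) for i in presence_idx) / len(presence_idx)
    return mean_log_p - math.log(1.0 / n)

n_pixels = 1000
presence_idx = list(range(10))  # hypothetical presence pixels

# Uniform distribution: every pixel has probability 1/N.
uniform = [1.0 / n_pixels] * n_pixels
gain_uniform = gain(uniform, presence_idx)

# Concentrated distribution: presence pixels are exp(2) times more
# likely than under the uniform distribution; the rest share the remainder.
p_presence = math.exp(2) / n_pixels
p_other = (1.0 - 10 * p_presence) / (n_pixels - 10)
concentrated = [p_presence] * 10 + [p_other] * (n_pixels - 10)
gain_concentrated = gain(concentrated, presence_idx)
ratio = math.exp(gain_concentrated)
```

    Here `ratio` is about 7.4, matching the exp(2) interpretation in the quoted passage.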

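    I can't interpret this figure yet. It seems to describe the error rates of the model on the training and test sets, but I don't understand what the x and y axes mean.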

    Tips:

    When the test data are not independent of the training data, the test omission line can lie well below the predicted omission line.

    <In some situations, the test omission line lies well below the predicted omission line: a common reason is that the test and training data are not independent, for example if they derive from the same spatially autocorrelated presence data.>
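
    As far as I can tell, an omission line plots, for each threshold on the cumulative output, the fraction of presence points whose predicted value falls below that threshold; for the cumulative format the predicted omission at threshold t would simply be t/100. The sketch below works under that reading, with made-up cumulative values at the test points.

```python
def omission_rate(cumulative_values, threshold):
    """Fraction of presence points predicted below the threshold."""
    missed = sum(1 for v in cumulative_values if v < threshold)
    return missed / len(cumulative_values)

def predicted_omission(threshold):
    """Assumed straight line for the cumulative output format."""
    return threshold / 100.0

# Hypothetical cumulative output (0-100) at eight test localities.
test_point_values = [3, 12, 25, 40, 55, 70, 85, 95]
thresholds = [1, 10, 50, 100]
omission_curve = [
    (t, omission_rate(test_point_values, t), predicted_omission(t))
    for t in thresholds
]
```

    A test omission that sits well below `predicted_omission(t)` at most thresholds is the pattern the tip above warns about.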

    ROC curve

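    • The red line shows the fit of the model to the training data; the blue line shows the fit to the test data and is the real test of the model's predictive power.
    • The red line is usually above the blue line.
    • For species with a narrow range relative to the study area, AUC values tend to be higher; this does not necessarily mean that the model is better.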

    <If you use the same data for training and for testing then the red and blue lines will be identical. If you split your data into two partitions, one for training and one for testing it is normal for the red (training) line to show a higher AUC than the blue (testing) line.> <It is important to note that AUC values tend to be higher for species with narrow ranges, relative to the study area described by the environmental data. This does not necessarily mean that the models are better; instead this behavior is an artifact of the AUC statistic.>
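
    AUC can be read as the probability that a randomly chosen presence point scores higher than a randomly chosen background point. A minimal sketch of that pairwise definition, with made-up scores:

```python
def auc(presence_scores, background_scores):
    """Probability that a presence score beats a background score
    (ties count as half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical model outputs at presence and background points.
presence = [0.9, 0.7, 0.4]
background = [0.8, 0.3, 0.2, 0.1]
auc_value = auc(presence, background)  # 10 of 12 pairs are won
```

    Under this definition, a species that occupies only a small part of a large study area has mostly low-scoring background points, which pushes AUC up without the model necessarily being better.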

    Jackknife test

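    • The x axis is the training gain of the fitted model; the y axis lists the environmental variables.
    • "Without variable": the training gain when that variable is excluded.
    • "With only variable": the training gain when only that variable is used.
    • "With all variables": the training gain when all variables are used.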
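
    The three bars per variable come from refitting the model on different variable subsets. The loop can be sketched as below; `fit_gain` stands for any function that fits a model on the given variables and returns its training gain, and `toy_gain` is a made-up stand-in with invented numbers, not real Maxent output.

```python
def jackknife(variables, fit_gain):
    """Return the gain with all variables, plus per-variable
    'without' and 'only' gains."""
    full = fit_gain(tuple(variables))
    table = {}
    for v in variables:
        others = tuple(x for x in variables if x != v)
        table[v] = {"without": fit_gain(others), "only": fit_gain((v,))}
    return full, table

# Made-up usefulness scores, for illustration only.
TOY_SCORES = {"pre6190_ann": 0.3, "pre6190_l10": 0.8, "pre6190_l1": 0.2}

def toy_gain(variables):
    """Toy stand-in for model fitting: gains combine with diminishing returns."""
    remaining = 1.0
    for v in variables:
        remaining *= 1.0 - TOY_SCORES[v]
    return 1.5 * (1.0 - remaining)

full_gain, jackknife_table = jackknife(list(TOY_SCORES), toy_gain)
```

    A variable with a high "only" gain carries useful information by itself; a variable whose removal lowers the gain the most carries information that the others lack.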

    Variables used in the interpretation:

    pre6190_ann: annual precipitation
    pre6190_l10: October precipitation
    pre6190_l1: mean January precipitation

    I need to work through a few more articles before I can fill in the explanation of this figure.

    Variable contributions

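    • At each step the algorithm increases the gain by modifying the coefficient of a single feature; the increase in gain is assigned to the environmental variable(s) that the feature depends on, and the totals are converted to percentages at the end of training.
    • Interpret these percentages with caution when variables are highly correlated. In this example, July and October precipitation are highly correlated with annual precipitation, yet their reported contributions differ sharply; this does not mean that October precipitation is far more important to the species than annual precipitation.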

    <Each step of the Maxent algorithm increases the gain of the model by modifying the coefficient for a single feature; the program assigns the increase in the gain to the environmental variable(s) that the feature depends on. Converting to percentages at the end of the training process.>

    <In our Bradypus example, annual precipitation is highly correlated with October and July precipitation. Although the above table shows that Maxent used the October precipitation variable more than any other, and hardly used annual precipitation at all, this does not necessarily imply that October precipitation is far more important to the species than annual precipitation.>
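
    The bookkeeping described in the first quote can be sketched as a running total per variable. The step log below is invented for illustration, and the sketch ignores features that depend on more than one variable (such as product features).

```python
def percent_contribution(steps):
    """Sum the gain increases credited to each variable and
    convert the totals to percentages."""
    totals = {}
    for variable, gain_increase in steps:
        totals[variable] = totals.get(variable, 0.0) + gain_increase
    total = sum(totals.values())
    return {v: 100.0 * g / total for v, g in totals.items()}

# Hypothetical log of (variable, gain increase) per training step.
steps = [
    ("pre6190_l10", 0.30),
    ("pre6190_l1", 0.10),
    ("pre6190_l10", 0.25),
    ("pre6190_ann", 0.05),
    ("pre6190_l10", 0.05),
    ("pre6190_l1", 0.25),
]
contributions = percent_contribution(steps)
```

    Because each step credits only the variable whose feature happened to be updated, two correlated variables can end up with very different percentages depending on the path the optimizer took.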

    How does the prediction depend on the variables? Response curves

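    • Each of the following curves is generated by varying only the current variable while holding all other variables at their average value over the presence sites.
    • The value on the y axis is the predicted probability of suitable conditions, as given by the logistic output format.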

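    Tips: If the environmental variables are correlated, the marginal response curves can be misleading. For example, if two closely correlated variables have response curves that are near opposites of each other, then for most pixels the combined effect of the two variables may be small.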

    <Note that if the environmental variables are correlated, as they are here, the marginal response curves can be misleading. For example, if two closely correlated variables have response curves that are near opposites of each other, then for most pixels, the combined effect of the two variables may be small.> <we see that predicted suitability is negatively correlated with annual precipitation (pre6190_ann), if all other variables are held fixed. In other words, once the effect of all the other variables has already been accounted for, the marginal effect of increasing annual precipitation is to decrease predicted suitability. However, annual precipitation is highly correlated with the monthly precipitation variables, so in reality we cannot easily hold the monthly values fixed while varying the annual value.>
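
    A marginal response curve of this kind can be sketched as follows: fix every variable at its mean over the presence sites, then sweep one variable across a grid. The model `toy_predict` and its coefficients are invented for illustration, with opposite signs on two correlated precipitation variables to mimic the situation in the quote.

```python
import math

def marginal_response(predict, sample_values, variable, grid):
    """Vary one variable over grid while the others stay at their
    mean value over the presence samples."""
    means = {name: sum(vals) / len(vals) for name, vals in sample_values.items()}
    curve = []
    for x in grid:
        point = dict(means)
        point[variable] = x
        curve.append((x, predict(point)))
    return curve

def toy_predict(env):
    """Made-up logistic model with opposing effects of two variables."""
    z = -0.004 * env["pre6190_ann"] + 0.03 * env["pre6190_l10"]
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical environmental values at three presence sites.
presence_env = {
    "pre6190_ann": [1200.0, 1500.0, 1800.0],
    "pre6190_l10": [150.0, 200.0, 250.0],
}
response_curve = marginal_response(
    toy_predict, presence_env, "pre6190_ann", [500.0, 1500.0, 2500.0]
)
```

    The curve falls steeply with annual precipitation only because October precipitation is pinned at its mean, a combination that rarely occurs when the two variables rise and fall together.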

    Each of the following curves is generated by a model that uses only the corresponding variable and disregards all the others.

    <each curve is made by generating a model using only the corresponding variable, disregarding all other variables>

    Model settings

    Feature types (features)

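    • The model settings offer five feature types; the more feature types are used in fitting, the more parameters the model has.
    • Different feature types give different results. Below are the pre6190_l10 response curves when only "Threshold features" and when only "Hinge features" are used.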
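
    The difference between the two feature types is easiest to see from their shapes: a threshold feature is a step, a hinge feature is flat and then rises linearly. The sketch below uses the commonly described forms of these features; the knot position and precipitation values are made up.

```python
def threshold_feature(x, knot):
    """Step function: 0 up to the knot, 1 above it."""
    return 1.0 if x > knot else 0.0

def hinge_feature(x, knot, x_max):
    """Forward hinge: 0 up to the knot, then linear up to 1 at x_max."""
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)

# Hypothetical October precipitation values and a knot at 100.
values = [0.0, 50.0, 100.0, 150.0, 200.0]
step_shape = [threshold_feature(x, 100.0) for x in values]
hinge_shape = [hinge_feature(x, 100.0, 200.0) for x in values]
```

    A model built only from threshold features therefore gives a staircase-like response curve, while one built only from hinge features gives a piecewise-linear one.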

    Regularization: regularization multiplier

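    • Set under "regularization multiplier" in Settings; the default value is 1.
    • A smaller value fits the presence records more closely and overfits more easily; a larger value gives a more spread-out prediction.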

    <fitting so close to the training data that the model doesn’t generalize well to independent test data>
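
    One way to see why the multiplier has this effect is to write the fitting objective as a penalized log-likelihood. The form below is my understanding of how Maxent's L1-style regularization is usually described, so the notation is an assumption: m is the number of presence samples x_i, q_λ is the fitted distribution, λ_j are the feature coefficients, β_j^(0) is the default penalty for feature j, and r is the regularization multiplier.

```latex
\max_{\lambda}\;\; \frac{1}{m}\sum_{i=1}^{m} \ln q_{\lambda}(x_i)
\;-\; \sum_{j} \beta_j \,\lvert \lambda_j \rvert ,
\qquad \beta_j = r \cdot \beta_j^{(0)}
```

    With a small r the penalty term is weak and the coefficients can grow to fit the presence records tightly; with a large r many coefficients are pushed towards zero and the fitted distribution stays closer to uniform.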
