Custom Coloring Dendrogram Ends in R

Technology, 2022-07-31

As a graduate student studying microbial community data, most of the projects I work on involve some sort of clustering analysis. For one of them, I wanted to color the ends of a dendrogram by a variable from my metadata, to visualize whether that variable followed the clustering as part of another figure. There exist excellent packages in R like ggdendro that allow you to either plot colored bars under dendrograms to represent how groups cluster or color the terminal segments by the cluster itself.

That said, I still haven’t found an easy way to change the color of the terminal ends of the dendrogram itself based on user-defined metadata, which I personally think can be more aesthetically pleasing in some situations. This tutorial describes how I did it and provides reproducible code if you are hoping to do the same thing!

Dendrogram Basics

Before I start, what is a dendrogram, anyway?

A dendrogram is a graphical representation of hierarchical clustering. Clusters can be constructed top-down (divisive) or bottom-up (agglomerative); in R, the most common route is hclust(), which performs bottom-up clustering on a distance matrix. Dendrograms are built by connecting nodes to branches or other nodes, resulting in a tree-like figure that shows how individual observations are related to each other based on multiple variables.
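
As a minimal sketch of that bottom-up process, the toy example below clusters four made-up points with base R. The matrix toy and the choice of k = 2 are illustrative assumptions and not part of the iris workflow.

```r
# Four made-up points: a and b sit close together, as do c and d
toy <- matrix(c(0, 0,
                0, 1,
                5, 5,
                5, 6),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), c("x", "y")))

toy_dist <- dist(toy, method = "euclidean")         # pairwise distances
toy_clust <- hclust(toy_dist, method = "complete")  # agglomerative clustering
toy_groups <- cutree(toy_clust, k = 2)              # cut the tree into two clusters
print(toy_groups)
```

Calling plot(toy_clust) on the result draws the corresponding dendrogram.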

Let’s say we want to compare how individual irises cluster in the well-known iris data set that ships with base R. This dataframe contains four numeric vectors (Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width) as well as one factor (Species). We could easily construct and plot a dendrogram incorporating all these numeric data with base R, but what if we want to color the terminal segments by the species of iris to visualize whether Species follows the clustering determined by hclust()?
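
A quick look at the column types and group sizes can help before going further. This is only a sketch using base R, and the object names iris_classes and species_counts are made up for illustration.

```r
# Column classes of the built-in iris data
iris_classes <- sapply(iris, class)
print(iris_classes)

# Number of observations per species
species_counts <- table(iris$Species)
print(species_counts)
```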

Step 1: Install Packages

For this tutorial, you’ll want to load three R packages: tidyverse for data manipulation and visualization, ggdendro to extract dendrogram segment data into a dataframe, and RColorBrewer to make an automatic custom color palette for your dendrogram ends. If you would like to make your dendrogram interactive, be sure to load plotly as well.

pacman::p_load(tidyverse, ggdendro, RColorBrewer, plotly)
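
The line above assumes pacman is already installed. If it is not, a small base R helper can do roughly the same job. The function load_or_install below is a hypothetical helper written for this sketch and is not part of pacman or any other package.

```r
# Hypothetical helper: install anything that is missing, then attach everything
load_or_install <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!is_installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (commented out so that nothing is installed by accident):
# load_or_install(c("tidyverse", "ggdendro", "RColorBrewer", "plotly"))
```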

Step 2: Load Data

Now we’ll want to load the iris dataframe into our environment. As bioinformaticians, we typically have sample names mapped to each observation, so we will create our own identifier (sample_name) right at the start.

With microbial community data, my workflow essentially involves two objects: a giant matrix of ASV (amplicon sequence variant; a term used to describe taxonomy based on DNA sequence similarity) abundances by sample_name, and metadata associated with each sample. To simulate this, we will separate iris into numeric_data, from which we will calculate distance and construct a dendrogram, and metadata, which for our purposes will simply consist of the species of iris for each sample_name. For this workflow, it is important to have a sample_name identifier for each observation; it will be the basis of merging everything at the end.

# label rows with unique sample_name
dat <- iris %>%
  mutate(sample_name = paste("iris", seq_len(nrow(iris)), sep = "_")) # create unique sample ID
# save non-numeric metadata in separate dataframe
metadata <- dat %>%
  select(sample_name, Species)
# extract numeric vectors for distance matrix
numeric_data <- dat %>%
  select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, sample_name)
# check data
head(numeric_data)

[Image by Author: first rows of numeric_data]
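
Because sample_name is the key for every later join, it may be worth confirming that the identifiers really are unique. The base R sketch below rebuilds the same labels on its own; n_duplicated is a name chosen here for illustration.

```r
# Rebuild the identifiers exactly as above, but with base R only
sample_name <- paste("iris", seq_len(nrow(iris)), sep = "_")

# Count repeated identifiers; anything above zero would break the later joins
n_duplicated <- sum(duplicated(sample_name))
stopifnot(n_duplicated == 0)

head(sample_name, 3)
```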

Step 3: Normalize Data and Create Dendrogram

Before we make the dendrogram, we will calculate a distance matrix based on numeric_data using dist(). It is good practice to normalize your data before doing this calculation; I will therefore normalize all values within a vector on a scale from 0 to 1.

After we do that, we can create a distance matrix (dist_matrix) and generate a dendrogram from our normalized data.

# normalize data to values from 0 to 1
numeric_data_norm <- numeric_data %>%
  select(sample_name, everything()) %>%
  pivot_longer(cols = 2:ncol(.), values_to = "value", names_to = "type") %>%
  group_by(type) %>%
  mutate(value_norm = (value - min(value)) / (max(value) - min(value))) %>% # normalize data to values 0-1
  ungroup() %>%
  select(sample_name, type, value_norm) %>%
  pivot_wider(names_from = "type", values_from = "value_norm") %>%
  column_to_rownames("sample_name")
# create dendrogram from distance matrix of normalized data
dist_matrix <- dist(numeric_data_norm, method = "euclidean")
dendrogram <- as.dendrogram(hclust(dist_matrix, method = "complete"))
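
The rescaling applied to each variable is min-max normalization, (value - min) / (max - min). For readers who prefer to avoid the pivoting, the same calculation can be sketched in base R. The names rescale01, iris_norm, and norm_ranges are assumptions made for this example.

```r
# Min-max rescaling of a single numeric vector to the range 0 to 1
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Apply it column by column to the four measurements
iris_norm <- as.data.frame(lapply(iris[, 1:4], rescale01))
rownames(iris_norm) <- paste("iris", seq_len(nrow(iris)), sep = "_")

# Each column should now span exactly 0 to 1
norm_ranges <- sapply(iris_norm, range)
print(norm_ranges)
```

Passing iris_norm to dist() and hclust() should then mirror the dendrogram step above.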

Step 4: Extract Dendrogram Segment Data Using ggdendro

Now let’s quickly take a look at what our dendrogram looks like using base R:

plot(dendrogram)

[Image by Author: base R plot of the dendrogram]

Okay, it’s not very pretty, but bear with me. This is a useful visual to show how we will extract the coordinate data from the dendrogram object with ggdendro::dendro_data() to make a better figure. Every dendrogram is plotted by adding individual segments between points on an x and y grid.

When we apply dendro_data() and look at the extracted segment data, we see there are four vectors for every dendrogram: x, y, xend, and yend. Every horizontal or vertical line you see in the base R figure is ultimately constructed from one row of the following dataframe:

# extract dendrogram segment data
dendrogram_data <- dendro_data(dendrogram)
dendrogram_segments <- dendrogram_data$segments # contains all dendrogram segment data
head(dendrogram_segments)

[Image by Author: first rows of dendrogram_segments]
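
To get a feel for how many rows to expect, assume the rectangular layout draws each merge as two horizontal and two vertical segments. That is an assumption about the layout and not something the output above states. Under it, the segment count follows from the number of merges, which base R reports directly:

```r
# Cluster the raw measurements (normalization does not change the counts)
hc <- hclust(dist(iris[, 1:4], method = "euclidean"), method = "complete")

n_leaves <- length(hc$order)       # one leaf per observation
n_merges <- nrow(hc$merge)         # a binary tree has n_leaves - 1 merges
expected_segments <- 4 * n_merges  # assumed: 2 horizontal + 2 vertical per merge

print(c(leaves = n_leaves, merges = n_merges, segments = expected_segments))
```

Comparing expected_segments with nrow(dendrogram_segments) is a cheap way to check the assumption on your own data.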

We will split these coordinate data into two dataframes: dendrogram_segments, containing all the segments, and dendrogram_ends, containing only the terminal branches of the figure. As the plot above shows, when we keep only rows whose end point in the y-direction is 0 (i.e., yend == 0), only those single segments at the bottom of the plot are included:

# get terminal dendrogram segments
dendrogram_ends <- dendrogram_segments %>%
  filter(yend == 0) %>% # filter for terminal dendrogram ends
  # .$labels contains the row names from dist_matrix (i.e., sample_name);
  # inner_join (rather than left_join) also drops any yend == 0 segment that does not
  # sit on a leaf position, which can occur when two samples are identical
  inner_join(dendrogram_data$labels, by = "x") %>%
  rename(sample_name = label) %>%
  left_join(metadata, by = "sample_name") # dataframe now contains only terminal dendrogram segments and merged metadata associated with each iris

Looking at dendrogram_ends, we now have a dataframe with vectors containing the dendrogram coordinate data matched to the sample_name and Species vectors. We are now ready to start plotting in ggplot2!

head(dendrogram_ends)

[Image by Author: first rows of dendrogram_ends]
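
The filter-then-join logic may be easier to follow on a tiny hand-made example. The sketch below uses base R merge() instead of dplyr joins, and every toy_ object is invented for illustration: two leaves at x = 1 and x = 2 joined by one node at height 1.

```r
# Two horizontal segments at the node height and two vertical ones down to the leaves
toy_segments <- data.frame(
  x    = c(1.5, 1.5, 1, 2),
  y    = c(1,   1,   1, 1),
  xend = c(1,   2,   1, 2),
  yend = c(1,   1,   0, 0)
)
toy_labels <- data.frame(x = c(1, 2), label = c("iris_1", "iris_2"))
toy_metadata <- data.frame(sample_name = c("iris_1", "iris_2"),
                           Species = c("setosa", "virginica"))

# Keep only segments that end at height 0, then attach labels and metadata
toy_ends <- subset(toy_segments, yend == 0)
toy_ends <- merge(toy_ends, toy_labels, by = "x")
names(toy_ends)[names(toy_ends) == "label"] <- "sample_name"
toy_ends <- merge(toy_ends, toy_metadata, by = "sample_name")
print(toy_ends)
```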

Step 5: Generate a Custom Color Palette for Dendrogram Ends Based on Metadata Variables Using RColorBrewer (Optional)

If you want to dynamically create a list of colors based on how many unique values the metadata vector of interest contains, you can run this code. In this example, our metadata only contains three species of iris, so this could be done manually fairly quickly. However, if the number of unique values in your metadata variable is larger than that, as is common with microbial community data, chances are you might want to automate this process.

# Generate custom color palette for dendrogram ends based on metadata variable
unique_vars <- levels(factor(dendrogram_ends$Species)) %>%
  as.data.frame() %>%
  rownames_to_column("row_id")
# count number of unique variables
color_count <- length(unique(unique_vars$.))
# get RColorBrewer palette
get_palette <- colorRampPalette(brewer.pal(n = 8, name = "Set1"))
# produce RColorBrewer palette based on number of unique variables in metadata:
palette <- get_palette(color_count) %>%
  as.data.frame() %>%
  rename("color" = ".") %>%
  rownames_to_column(var = "row_id")
color_list <- left_join(unique_vars, palette, by = "row_id") %>%
  select(-row_id)
species_color <- as.character(color_list$color)
names(species_color) <- color_list$.
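
The same idea can be written more compactly with setNames(). The sketch below hard-codes the eight Set1 hex codes so that it runs without RColorBrewer; treat that vector as an assumption and prefer brewer.pal() when the package is available. The names set1 and groups are invented for this example.

```r
# The eight colors of the Set1 palette, typed out by hand (assumption: matches brewer.pal(8, "Set1"))
set1 <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3",
          "#FF7F00", "#FFFF33", "#A65628", "#F781BF")

# One color per unique metadata value, named by that value
groups <- levels(factor(iris$Species))
species_color <- setNames(colorRampPalette(set1)(length(groups)), groups)
print(species_color)
```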

If you don’t want to bother with the above code for this tutorial, you could just manually create a named character vector as an alternative:

# Alternatively, create a custom named vector for iris species color:
species_color <- c("setosa" = "#E41A1C", "versicolor" = "#CB6651", "virginica" = "#F781BF")
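
Whichever way species_color is built, scale_color_manual() matches a named vector to the data by name, so a misspelled or missing name leaves that group without its intended color. A small base R check, sketched here with the hypothetical names groups_in_data and missing_colors, can catch that before plotting.

```r
species_color <- c("setosa" = "#E41A1C", "versicolor" = "#CB6651", "virginica" = "#F781BF")

# Every value that appears in the metadata should have a named color
groups_in_data <- unique(as.character(iris$Species))
missing_colors <- setdiff(groups_in_data, names(species_color))

if (length(missing_colors) > 0) {
  warning("No color defined for: ", paste(missing_colors, collapse = ", "))
}
```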

Step 6: Plot your Custom-Colored Dendrogram!

Now it’s time to plot our dendrogram! You will want to define two geom_segment layers: one plotting all the segment data extracted in Step 4, which are uncolored, and one for just the terminal branches of the dendrogram, which is what we will color with species_color from the previous step. If you wrap this plot with plotly (see below), I recommend adding an extra text aesthetic to control which information will display on your output.

p <- ggplot() +
  geom_segment(data = dendrogram_segments,
               aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_segment(data = dendrogram_ends,
               aes(x = x, y = y.x, xend = xend, yend = yend, color = Species,
                   text = paste("sample name: ", sample_name, "<br>", "species: ", Species))) + # text aes is for plotly
  scale_color_manual(values = species_color) +
  scale_y_reverse() +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "none") +
  ylab("Distance") + # flipped x and y coordinates for aesthetic reasons
  ggtitle("Iris dendrogram")
p

[Image by Author: iris dendrogram with terminal branches colored by species]

If you want to get really fancy, you can wrap your ggplot with plotly to make your dendrogram interactive! Be sure to specify tooltip = "text" to control which information is displayed.

ggplotly(p, tooltip = "text")

And there you have it — dendrogram ends dynamically colored by a variable in your metadata! As we can see, the species of iris does seem to follow the hierarchical clustering determined by hclust(), which can inform further tests done in your exploratory analysis pipeline.
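
One way to go beyond the visual impression is to cut the tree into groups and cross-tabulate them against species. The sketch below repeats the normalization and clustering in base R so that it runs on its own. The choice of k = 3 (one cluster per species) and the names rescale01, iris_norm, clusters, and cluster_by_species are assumptions made for this example.

```r
# Min-max normalize, cluster, and cut the tree into three groups
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
iris_norm <- as.data.frame(lapply(iris[, 1:4], rescale01))
hc <- hclust(dist(iris_norm, method = "euclidean"), method = "complete")
clusters <- cutree(hc, k = 3)

# Rows are clusters, columns are species; a clean match would put
# most of each species into a single row
cluster_by_species <- table(cluster = clusters, species = iris$Species)
print(cluster_by_species)
```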

Source: https://towardsdatascience.com/custom-coloring-dendrogram-ends-in-r-f1fa45e5077a
