redis源码分析之七基础的数据结构ziplist

科技2025-11-09 11

一、ziplist压缩列表

压缩列表是HASH和跳表的小数据时的数据结构，这个在前面提到过。压缩列表的定义和使用其实在源码的头部说明中是很清楚的。看一下英文的注释： The ziplist is a specially encoded dually linked list that is designed to be very memory efficient. It stores both strings and integer values, where integers are encoded as actual integers instead of a series of characters. It allows push and pop operations on either side of the list in O(1) time. However, because every operation requires a reallocation of the memory used by the ziplist, the actual complexity is related to the amount of memory used by the ziplist。它的意思是ziplist是一个特殊编码的双向链表，目标是为了提高存储的效率，其可用于存储字符串和整数，其中整数是按照二进制编码的，而不是按照字符串编码，其压入和弹出的的时间复杂度为O(1)、不过由于每个操作都需要重新处理使用的内存，所以其实际复杂度和内存大小有关系。

其空间存放格式： The general layout of the ziplist is as follows:

图片中对占用的空间大小和占位符进行了说明。

/* * ZIPLIST ENTRIES * =============== * * <prevlen> <encoding> * * The length of the previous entry, <prevlen>, is encoded in the following way: * If this length is smaller than 254 bytes, it will only consume a single * byte representing the length as an unsinged 8 bit integer. When the length * is greater than or equal to 254, it will consume 5 bytes. The first byte is * set to 254 (FE) to indicate a larger value is following. The remaining 4 * bytes take the length of the previous entry as value. * * So practically an entry is encoded in the following way: * * <prevlen from 0 to 253> <encoding> <entry> * * Or alternatively if the previous entry length is greater than 253 bytes * the following encoding is used: * * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry> * * The encoding field of the entry depends on the content of the * entry. When the entry is a string, the first 2 bits of the encoding first * byte will hold the type of encoding used to store the length of the string, * followed by the actual length of the string. When the entry is an integer * the first 2 bits are both set to 1. The following 2 bits are used to specify * what kind of integer will be stored after this header. An overview of the * different types and encodings is as follows. The first byte is always enough * to determine the kind of entry. * * |00pppppp| - 1 byte * String value with length less than or equal to 63 bytes (6 bits). * "pppppp" represents the unsigned 6 bit length. * |01pppppp|qqqqqqqq| - 2 bytes * String value with length less than or equal to 16383 bytes (14 bits). * IMPORTANT: The 14 bit number is stored in big endian. * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes * String value with length greater than or equal to 16384 bytes. * Only the 4 bytes following the first byte represents the length * up to 32^2-1. The 6 lower bits of the first byte are not used and * are set to zero. * IMPORTANT: The 32 bit number is stored in big endian. 说明：上面三种不同的编码，最高位分别为00,01,10是字节数组编码，表示节点的数据存储的是字节数组，数组的长度由编码去除最高两位后的其他位记录。 * |11000000| - 3 bytes * Integer encoded as int16_t (2 bytes). * |11010000| - 5 bytes * Integer encoded as int32_t (4 bytes). * |11100000| - 9 bytes * Integer encoded as int64_t (8 bytes). * |11110000| - 4 bytes * Integer encoded as 24 bit signed (3 bytes). * |11111110| - 2 bytes * Integer encoded as 8 bit signed (1 byte). * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer. * Unsigned integer from 0 to 12. The encoded value is actually from * 1 to 13 because 0000 and 1111 can not be used, so 1 should be * subtracted from the encoded 4 bit value to obtain the right value. 说明：下面的例子讲的是这种特殊情况，即len,data全二为一，前四个表示长度，后四个表示数据只能有13个值，由于0000，1110，1111和前面定义冲突，所以只能表示1~13，而真实数据是从0开始，所以要减去1，才是真实的数据。 * |11111111| - End of ziplist special entry. 说明：11开头的上面的编码说明是不同类型的整数。 * * Like for the ziplist header, all the integers are represented in little * endian byte order, even when this code is compiled in big endian systems. * * EXAMPLES OF ACTUAL ZIPLISTS * =========================== * * The following is a ziplist containing the two elements representing * the strings "2" and "5". It is composed of 15 bytes, that we visually * split into sections: * * [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff] * | | | | | | * zlbytes zltail entries "2" "5" end 注意：这个entries,指的其实是zlen,2个元素 * * The first 4 bytes represent the number 15, that is the number of bytes * the whole ziplist is composed of. The second 4 bytes are the offset * at which the last ziplist entry is found, that is 12, in fact the * last entry, that is "5", is at offset 12 inside the ziplist. * The next 16 bit integer represents the number of elements inside the * ziplist, its value is 2 since there are just two elements inside. * Finally "00 f3" is the first entry representing the number 2. It is * composed of the previous entry length, which is zero because this is * our first entry, and the byte F3 which corresponds to the encoding * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F" * higher order bits 1111, and subtract 1 from the "3", so the entry value * is "2". The next entry has a prevlen of 02, since the first entry is * composed of exactly two bytes. The entry itself, F6, is encoded exactly * like the first entry, and 6-1 = 5, so the value of the entry is 5. * Finally the special entry FF signals the end of the ziplist. * * Adding another element to the above string with the value "Hello World" * allows us to show how the ziplist encodes small strings. We'll just show * the hex dump of the entry itself. Imagine the bytes as following the * entry that stores "5" in the ziplist above: * * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64] * * The first byte, 02, is the length of the previous entry. The next * byte represents the encoding in the pattern |00pppppp| that means * that the entry is a string of length <pppppp>, so 0B means that * an 11 bytes string follows. From the third byte (48) to the last (64) * there are just the ASCII characters for "Hello World". * */

上面是对编码方式的定义说明和一个具体的例子进行的描述，很简单。重点说明了是小端模式，小端模式。请看上面注释内的说明。

二、源码分析

看一下源码：

1、定义

typedef struct zlentry { unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/ unsigned int prevrawlen; /* Previous entry len. */ unsigned int lensize; /* Bytes used to encode this entry type/len. For example strings have a 1, 2 or 5 bytes header. Integers always use a single byte.*/ unsigned int len; /* Bytes used to represent the actual entry. For strings this is just the string length while for integers it is 1, 2, 3, 4, 8 or 0 (for 4 bit immediate) depending on the number range. */ unsigned int headersize; /* prevrawlensize + lensize. */ unsigned char encoding; /* Set to ZIP_STR_* or ZIP_INT_* depending on the entry encoding. However for 4 bits immediate integers this can assume a range of values and must be range-checked. */ unsigned char * p; /* Pointer to the very start of the entry, that is, this points to prev-entry-len field. * / } zlentry; unsigned char *ziplistNew(void) { unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE; unsigned char * zl = zmalloc(bytes); ZIPLIST_BYTES(zl) = intrev32ifbe(bytes); ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE); ZIPLIST_LENGTH(zl) = 0; zl[bytes-1] = ZIP_END; return zl; }

创建没有什么好讲的，几个宏做初始化。

2、插入

unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) { //计算当前长度，并定义前一个长度 size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen; unsigned int prevlensize, prevlen = 0; size_t offset; int nextdiff = 0; unsigned char encoding = 0; long long value = 123456789; /* initialized to avoid warning. Using a value that is easy to see if for some reason we use it uninitialized. */ zlentry tail; //计算前一个长度 /* Find out prevlen for the entry that is inserted. */ if (p[0] != ZIP_END) { ZIP_DECODE_PREVLEN(p, prevlensize, prevlen); } else { unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl); if (ptail[0] != ZIP_END) { prevlen = zipRawEntryLength(ptail); } } //开始计算reqlen=prevrawlen+len(encoding)+data /* See if the entry can be encoded */ if (zipTryEncoding(s,slen,&value,&encoding)) { /* 'encoding' is set to the appropriate integer encoding */ reqlen = zipIntSize(encoding); } else { /* 'encoding' is untouched, however zipStoreEntryEncoding will use the * string length to figure out how to encode it. */ reqlen = slen; } /* We need space for both the length of the previous entry and * the length of the payload. */ //增加长度 reqlen += zipStorePrevEntryLength(NULL,prevlen); reqlen += zipStoreEntryEncoding(NULL,encoding,slen); /* When the insert position is not equal to the tail, we need to * make sure that the next entry can hold this entry's length in * its prevlen field. */ //重新定义空间变化并分配 int forcelarge = 0; nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0; if (nextdiff == -4 && reqlen < 4) { nextdiff = 0; forcelarge = 1; } //动态调整内存 /* Store offset because a realloc may change the address of zl. */ offset = p-zl; zl = ziplistResize(zl,curlen+reqlen+nextdiff); p = zl+offset; /* Apply memory move when necessary and update tail offset. */ if (p[0] != ZIP_END) { /* Subtract one because of the ZIP_END bytes */ memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff); /* Encode this entry's raw length in the next entry. */ if (forcelarge) zipStorePrevEntryLengthLarge(p+reqlen,reqlen); else zipStorePrevEntryLength(p+reqlen,reqlen); /* Update offset for tail */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ zipEntry(p+reqlen, &tail); if (p[reqlen+tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } } else { /* This element will be the new tail. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl); } /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist */ if (nextdiff != 0) { offset = p-zl; zl = __ziplistCascadeUpdate(zl,p+reqlen); p = zl+offset; } //写入数据 /* Write the entry * / p += zipStorePrevEntryLength(p,prevlen); p += zipStoreEntryEncoding(p,encoding,slen); if (ZIP_IS_STR(encoding)) { memcpy(p,s,slen); } else { zipSaveInteger(p,value,encoding); } ZIPLIST_INCR_LENGTH(zl,1); return zl; }

3、删除 POP也是调用这个函数：

unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) { unsigned int i, totlen, deleted = 0; size_t offset; int nextdiff = 0; zlentry first, tail; zipEntry(p, &first); for (i = 0; p[0] != ZIP_END && i < num; i++) { p += zipRawEntryLength(p); deleted++; } totlen = p-first.p; /* Bytes taken by the element(s) to delete. */ if (totlen > 0) { if (p[0] != ZIP_END) { /* Storing `prevrawlen` in this entry may increase or decrease the * number of bytes required compare to the current `prevrawlen`. * There always is room to store this, because it was previously * stored by an entry that is now being deleted. */ //计算删除节点p的前一个节点与删除节点的长度差 nextdiff = zipPrevLenByteDiff(p,first.prevrawlen); /* Note that there is always space when p jumps backward: if * the new previous entry is large, one of the deleted elements * had a 5 bytes prevlen header, so there is for sure at least * 5 bytes free and we need just 4. */ p -= nextdiff; zipStorePrevEntryLength(p,first.prevrawlen); /* Update offset for tail */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ //处理长度变化引起的尾节点的位置变化 zipEntry(p, &tail); if (p[tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } /* Move tail to the front of the ziplist */ memmove(first.p,p, intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1); } else { /* The entire tail was deleted. No need to move memory. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe((first.p-zl)-first.prevrawlen); } /* Resize and update length */ offset = first.p-zl; zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff); ZIPLIST_INCR_LENGTH(zl,-deleted); p = zl+offset; /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist * / //nextdiff != 0，说明删除节点的长度变化，需要更新节点p长度 if (nextdiff != 0) zl = __ziplistCascadeUpdate(zl,p); } return zl; }

删除比插入稍微简单一些。

4、查找

unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) { int skipcnt = 0; unsigned char vencoding = 0; long long vll = 0; while (p[0] != ZIP_END) { unsigned int prevlensize, encoding, lensize, len; unsigned char * q; //查找时先定位第一个entry即privious_entry_length字段的位置 ZIP_DECODE_PREVLENSIZE(p, prevlensize); ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len); q = p + prevlensize + lensize; if (skipcnt == 0) { /* Compare current entry with specified entry */ //检查编码类型--字符串 if (ZIP_IS_STR(encoding)) { if (len == vlen && memcmp(q, vstr, vlen) == 0) { return p; } } else { /* Find out if the searched field can be encoded. Note that * we do it only the first time, once done vencoding is set * to non-zero and vll is set to the integer value. */ if (vencoding == 0) { if (!zipTryEncoding(vstr, vlen, &vll, &vencoding)) { /* If the entry can't be encoded we set it to * UCHAR_MAX so that we don't retry again the next * time. */ vencoding = UCHAR_MAX; } /* Must be non-zero by now */ assert(vencoding); } /* Compare current entry with specified entry, do it only * if vencoding != UCHAR_MAX because if there is no encoding * possible for the field it can't be a valid integer. */ if (vencoding != UCHAR_MAX) { long long ll = zipLoadInteger(q, encoding); if (ll == vll) { return p; } } } /* Reset skip count */ skipcnt = skip; } else { /* Skip entry */ skipcnt--; } /* Move to next entry * / p = q + len; } return NULL; }

分为字符串查找和数字查找，这个没啥可说的。

三、总结

压缩列表其实就是在控制小规模数据时（元素小于128，所有成员长度小于64字节），利用类似数组的数组操作思想，虽然时间复杂度有所增加，但是由于规模受到限制，自然复杂度也受到限制。其实这就是解决问题的方法，针对不同的场景使用不同的处理方式，相比于跳表，虽然时间复杂度增加，但空间复杂度下降，这就是平衡，设计架构时，无处不在。

Processed: 0.009, SQL: 8