本人通过查阅一些资料阐述一下对Python编程语言中Hash value,mutability的理解,以及通过code来验证__hash__和__eq__方法的深刻意义,对其它面向对象的语言也是相通的。
下面这段英文内容来自公司培训Nagiza F. Samatova, NC State Univ的PPT
A hash value ● A fixed sized integer that identifies a particular value ● Useful for quick look-up of values in a large collection of values such as sets and dictionaries ● Unlike lists that access and compare every element of the list
Mechanics of Using Hash Values ●Python keeps track of each hash ● if x in values statement will trigger: >>computing the hash-value for x, >>looking it up in an internal structure to identify the number and >>only comparing x with the values that have the same hash as x
Mutability ● Lists may contain non-hashable objects ● Mutable objects are the objects whose value can be changed ● Mutable objects should not be hashed ● Hashable objects can not be changed to ensure persistency through the object’s life-time
哈希的作用? 它是一个将大体量数据转化为很小数据的过程,甚至可以仅仅是一个数字,以便我们可以用在固定的时间复杂度下查询它,所以,哈希对高效的算法和数据结构很重要。
hash(object) hash() 用于获取取一个对象(字符串或者数值等)的哈希值。返回对象的哈希值。对象需要两个方法才能使对象可哈希:hash__和__eq。
什么是可哈希(hashable),不可哈希(unhashable)? 引用来自官网的一段话
An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() method). Hashable objects which compare equal must have the same hash value.
也就是说一个对象可哈希,它的哈希值在整个生命周期是不可变的,而且它可以和其它对象进行比较,并且相等的对象其哈希值也是相同的。
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
可哈希性使得一个个对象以字典的形式(key,value)使用,因为哈希值就可以区分对象。
Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user-defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their id().
大多数Python内置的不可变对象是可哈希的,而可变容器类(例如lists 或dicrionaries,或sets)是不可哈希的,不可变容器类(例如tuples 和frozensets)可哈希当且仅当它包含的全部元素是可哈希的。用户自定义的类对象默认是可哈希的,因为默认继承超类Object中__hash__和__eq__方法,hash方法返回对象内存地址的引用的哈希值,eq方法主要是比较两个对象是否指向同一个对象的引用。
str_object = 'python' tuple_object = ('python') tuple_immutable = (1,2,(3,4)) tuple_mutable = (1,2,[3,4]) list_object = ['python', 'java', 'r'] set_object = {'python'} dict_object = {'language':'python'} hash(str_object) # out: 4840421178446298472 hash(list_object[0]) # out: 4840421178446298472 hash(tuple_object) #out:4840421178446298472 hash(tuple_immutable)# output: 3794340727080330424不可变容器类tuple (1,2,[3,4])中含有可变元素list[3,4],因此也是不可哈希的
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-53-25886bc07981> in <module> ----> 1 hash(tuple_mutable) TypeError: unhashable type: 'list' hash(dict_object) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-14-60aed2bdf4ce> in <module> ----> 1 hash(dict_object) TypeError: unhashable type: 'dict' hash(list_object) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-15-4974dcb26867> in <module> ----> 1 hash(list_object) TypeError: unhashable type: 'list' hash(set_object) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-18-0ca966ccf645> in <module> ----> 1 hash(set_object) TypeError: unhashable type: 'set'怎么判断可变不可变 ? 总结:改个值 看id是不是一样,id一样的为可变,则不可哈希,改了值,id变化,则为不可变,则可哈希
怎样判断为同一对象呢? 当两个实例被视为相等时,且它们必须返回相同的哈希值,认为两个实例是指向同一个引用,视为同一对象。如果集合中都存在哈希值,则认为该实例已存在于集合中,该实例被视为等于集合中具有相同哈希值的其中一个实例,并非代表具有相同哈希值唯一的实例。
有关__eq__和__ hash __方法 将通过code来阐述它们的奥秘。
没有override 超类Object类的__eq__和__ hash __方法,p1,p2,p3,p4是4个不同的实例
class Person: def __init__(self, name, identityId): self.name = name self.identityId = identityId p1 = Person('Formal Name', 123456789) p2 = Person('Formal Name', 123456789) p3 = Person('Nick Name', 123456789) p4 = Person('Nick Name', 123456789) result = set() result.add(p1) result.add(p2) result.add(p3) result.add(p4) print(len(result)) # output:4仅override超类Object类的__eq__ ,但是默认超类Object类的__ hash __方法,也就是说对象的地址引用的hash值。 is 主要是判断 2 个变量是否引用的是同一个对象,如果是的话,则返回 true,否则返回 false。 == 用来判断两个对象的值是否相等(跟 Java 不同,Java 中 == 用来判断是否是同一个对象), 所以p1,p2,p3和p4尽管是equal的,但是hash value是不一样的,所以是不同的实例。而且Person是不可hashable的,以此也验证了上面的那句话,只有__hash__和__eq__一致实现才可以hashable,即两个对象相等,其hash值也必须相等才可hashable
class Person: def __init__(self, name, identityId): self.name = name self.identityId = identityId def __eq__(self, other): return isinstance(other, Person) and self.identityId == other.identityId p1 = Person('Formal Name', 123456789) p2 = Person('Formal Name', 123456789) p3 = Person('Nick Name', 123456789) p4 = Person('Nick Name', 123456789) print(p1==p2) # output: True print(p1==p3) # output: True print(p1 is p1) # output: False print(p1 is p3) # output: False print(id(p1)) # output: 2239125918048 print(id(p2)) # output: 2239125918144 print(id(p3)) # output: 2239125776272 print(id(p4)) # output: 2239125775168 result = set() result.add(p1) result.add(p2) result.add(p3) result.add(p4) print(len(result)) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-43-f7b955e924d1> in <module> 20 print(id(p4)) # output: 2239125775168 21 result = set() ---> 22 result.add(p1) 23 result.add(p2) 24 result.add(p3) TypeError: unhashable type: 'Person'override超类Object的__ hash __方法,因此使用默认的object的__eq__是比较两个实例的地址引用是否相等,它只在obj1 is obj2也是true时返回true。换句话说,只有当两个实例完全相同的实例时,才认为它们相等。仅仅因为它们的散列匹配,并不能确定它们在一个集合中是唯一的。所以p1,p2,p3,p4哈希值是一样的,但是是不同对象。
class Person: def __init__(self, name, identityId): self.name = name self.identityId = identityId def __hash__(self): # use the hashcode of self.identityId since that is used # for equality checks as well return hash(self.identityId) p1 = Person('Formal Name', 123456789) p2 = Person('Formal Name', 123456789) p3 = Person('Nick Name', 123456789) p4 = Person('Nick Name', 123456789) print(p1==p2) # output: False print(p1==p3) # output: False print(p1 is p2) # output: False print(p1 is p3) # output: False print(id(p1)) # output: 2239124972880 print(id(p2)) # output: 2239124973888 print(id(p3)) # output: 2239124973936 print(id(p4)) # output: 2239124973360 print(hash(p1)) #output:123456789 print(hash(p2)) #output:123456789 print(hash(p3)) #output:123456789 print(hash(p4)) #output:123456789 result = set() result.add(p1) result.add(p2) result.add(p3) result.add(p4) print(len(result)) # output:4override超类Ojbect的 __ hash __ and __ eq __ 方法, p1, p2, p3, p4哈希值是一样的,而且是相等的,但是id是不一样的,因此还是指向不同的对象。但是为啥set还是视p1,p2, p3, p4为重复的对象呢?,有必要下一篇文章去研究一下set去重原理,这里先给个结论吧,哈哈!set的去重是通过两个函数__hash__和__eq__结合实现的。当两个变量的哈希值不相同时,就认为这两个变量是不同的;当两个变量哈希值一样时,调用__eq__方法,当返回值为True时认为这两个变量是同一个,应该去除一个。返回FALSE时,不去重。
class Person: def __init__(self, name, identityId): self.name = name self.identityId = identityId def __hash__(self): # use the hashcode of self.identityId since that is used # for equality checks as well return hash(self.identityId) def __eq__(self, other): return isinstance(other, Person) and self.identityId == other.identityId p1 = Person('Formal Name', 123456789) p2 = Person('Formal Name', 123456789) p3 = Person('Nick Name', 123456789) p4 = Person('Nick Name', 123456789) print(p1==p2) # output: True print(p1==p3) # output: True print(p1 is p2) # output: False print(p1 is p3) # output: False print(id(p1)) # output: 2239124972880 print(id(p2)) # output: 2239124973888 print(id(p3)) # output: 2239124973936 print(id(p4)) # output: 2239124973360 print(hash(p1)) #output:123456789 print(hash(p2)) #output:123456789 print(hash(p3)) #output:123456789 print(hash(p4)) #output:123456789 result = set() result.add(p1) result.add(p2) result.add(p3) result.add(p4) print(len(result)) # output:1以上就是本人对hash,mutability以及hash,eq方法的理解,也算是本人第一篇正式出道的博客文章吧,体会了深入研究的乐趣,哈哈,欢迎大家提出建议,一起学习,一起进步!