HBase
Contents:
Introduction to HBase
Features of HBase
HBase storage model
Setting up HBase
How HBase appears in third-party software (data layout in Hadoop, ZooKeeper, the meta table, the web UI, the shell)
HBase commands
Operating HBase with the Java API
How a client interacts with HBase
Region splitting (manual splitting, manually moving a region, pre-splitting)
HBase data versioning
Table creation parameters (TTL, KEEP_DELETED_CELLS)
Filters (taxonomy, filter lists, comparison operators, comparators, usage)
HBase counters (shell commands, Java API)
Row key design
Bloom filters
SQL visualization
Introduction to HBase
HBase is a database built on top of Hadoop: a distributed, scalable (nodes can be added or removed at will) store for very large data sets, with random, real-time reads and writes. It can hold billions of rows and millions of columns, and it is versioned, non-relational, and column-oriented (the set of columns is not fixed). Other non-relational databases include Redis and MongoDB; all of them store data as key-value pairs.
Features of HBase
Linear and modular scalability
Strictly consistent reads and writes
Automatic, configurable table splitting
Automatic failover between region servers
Support for Hadoop MapReduce jobs
An easy-to-use Java API
Block cache and Bloom filters for real-time queries
Query predicate push-down via server-side filters
Serialization options including XML, Protobuf, and binary data
An extensible shell
Web-based monitoring UI
HBase storage model
Storage is column-oriented, while the rows within a table are sorted by row key. Consequently, locating a single piece of data requires its row key, column family, column (qualifier), and version number (timestamp).
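To make these coordinates concrete, here is a minimal sketch with the 2.0 Java client, assuming the table mydb1:t1 and family f1 that are created later in this article; it fetches one cell by row key, column family, and column and prints every stored version:

public static void locateCell() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    Get get = new Get(Bytes.toBytes("row1"));                   // row key
    get.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name")); // column family + column
    get.readAllVersions();                                      // every version, not just the newest
    Result r = table.get(get);
    for (Cell cell : r.getColumnCells(Bytes.toBytes("f1"), Bytes.toBytes("name"))) {
        // the version number is the cell's timestamp
        System.out.println(cell.getTimestamp() + " -> " + Bytes.toString(
                cell.getValueArray(), cell.getValueOffset(), cell.getValueLength()));
    }
    conn.close();
}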
Setting up HBase
HBase has two kinds of nodes: a master (management node) and RegionServers. Here I use Centos100 as the master and Centos101 and Centos102 as region servers (ideally you would pick an odd number of region servers, but my machine only has resources for two).
Install the JDK
Install Hadoop
Install HBase
tar -zvxf hbase-2.0.0-bin.tar.gz -C /soft/
xsync.sh hbase-2.0.0
Append to /etc/profile:
export HBASE_HOME=/soft/hbase-2.0.0
export PATH=$PATH:$HBASE_HOME/bin
Then reload and verify:
source /etc/profile
hbase version
Edit hbase-2.0.0/conf/hbase-env.sh as follows, then distribute it to all nodes:
export JAVA_HOME=/soft/jdk1.8.0_65/
export HBASE_MANAGES_ZK=false
HBase local (standalone) mode: add or modify the following properties in hbase-2.0.0/conf/hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>file:/home/hadoop/HBase/HFiles</value>
</property>
HBase pseudo-distributed mode: add or modify the following properties in hbase-2.0.0/conf/hbase-site.xml
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8030/hbase</value>
</property>
HBase fully distributed mode: add or modify the following properties in hbase-2.0.0/conf/hbase-site.xml, then distribute the file to all nodes
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>Centos100:2181,Centos101:2181,Centos102:2181</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/zk/zookeeper</value>
</property>
Edit hbase-2.0.0/conf/regionservers to list the region server hostnames (distribute after editing):
Centos101
Centos102
Start the cluster (run from the master node; note: in fully distributed mode, start Hadoop and ZooKeeper first):
zkServer.sh start
start-all.sh
start-hbase.sh
Once startup succeeds, the master host runs an HMaster process and each region server host runs an HRegionServer process.
The HBase web UI is served on port 16010.
Inspect ZooKeeper:
zkCli.sh
ls /
ls /hbase/rs
get /hbase/master
Start a standby (backup) master:
hbase-daemon.sh start master
Synchronize clocks: make sure the system time is consistent across all nodes (for example with NTP); region servers refuse to join the cluster when the clock skew exceeds the configured maximum.
How HBase appears in third-party software
Where HBase data lives in Hadoop
Under hbase.rootdir on HDFS, each level of the directory tree corresponds to:
the namespace
the table
the table's regions
the column families
the data files (HFiles) inside each column family
What HBase stores in ZooKeeper
Namespaces, tables, and the server hosting the meta table. A query normally first asks ZooKeeper which server holds the meta table, then reads the meta (region) table on that server, uses the supplied row key to find the server holding the matching region, and finally fetches the data there.
How HBase metadata appears inside HBase itself
The web UI view
The web UI shows the tables and each table's region information.
The shell view
From the shell you can query namespaces, tables, and regions.
HBase commands
hbase-daemon.sh start master
hbase-daemons.sh start regionserver
start-hbase.sh
stop-hbase.sh
hbase shell
help
The shell commands fall into groups:

general:
whoami
version

ddl (data definition language):
create 'mydb1:t1','f1','f2','f3'
describe 'mydb1:t1'
desc 'mydb1:t1'
disable 'mydb1:t1'
drop 'mydb1:t1'
enable 'mydb1:t1'

namespace:
list_namespace
list_namespace_tables 'default'
create_namespace 'mydb1'

dml (data manipulation language):
put 'mydb1:t1','row1','f1:id',100
put 'mydb1:t1','row1','f1:name','tom'
put 'mydb1:t1','row1','f1:age',12
get 'mydb1:t1','row1'
scan 'mydb1:t1'
count 'mydb1:t1'

tools:
flush 'mydb1:t1'
split 'mydb1:t1'
split 'mydb1:t1','row008888'
move 'e15b2d06eb033f19934c26f5af3eab3d','centos101,16020,1602661324341'
merge_region 'e15b2d06eb033f19934c26f5af3eab3d','8bf5e09451b8b5f516dff93f834927dd'
Operating HBase with the Java API
Copy hbase-site.xml from the server into the src root of your project. If the server-side file uses hostnames, do not replace them with IPs on the client side; add the mappings to your local hosts file instead. If the server-side file uses IPs, use those IPs. In short, never edit the configuration file you copied down from the server. Maven setup:
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>2.0.0</version>
</dependency>
Implementation (note: all of the code below targets HBase 2.0.0; other versions of HBase differ):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.apache.hadoop.hbase.util.Bytes;

import java.text.DecimalFormat;
import java.util.*;
public static void put() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    TableName tName = TableName.valueOf("mydb1:t1");
    Table table = conn.getTable(tName);
    byte[] rowId = Bytes.toBytes("row2");
    Put put = new Put(rowId);
    byte[] f1 = Bytes.toBytes("f1");
    byte[] id = Bytes.toBytes("id");
    byte[] valueId = Bytes.toBytes(101);
    put.addColumn(f1, id, valueId);
    byte[] name = Bytes.toBytes("name");
    byte[] valueName = Bytes.toBytes("zhangsan");
    put.addColumn(f1, name, valueName);
    byte[] hobby = Bytes.toBytes("hobby");
    byte[] valueHobby = Bytes.toBytes("打篮球");
    put.addColumn(f1, hobby, valueHobby);
    table.put(put);
    conn.close();
}
public static void get() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    TableName tName = TableName.valueOf("mydb1:t1");
    Table table = conn.getTable(tName);
    byte[] rowId = Bytes.toBytes("row1");
    Get get = new Get(rowId);
    Result r = table.get(get);
    byte[] value = r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    System.out.println(Bytes.toString(value));
    conn.close();
}
public static void batch() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t5"));
    // Row is the common parent of Put, Get and Delete, so one batch can mix all three
    List<Row> rows = new ArrayList<Row>();
    Put put = new Put(Bytes.toBytes("row003"));
    put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
    rows.add(put);
    Get get = new Get(Bytes.toBytes("row001"));
    get.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    rows.add(get);
    Delete delete = new Delete(Bytes.toBytes("row002"));
    delete.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    rows.add(delete);
    Object[] results = new Object[rows.size()];
    table.batch(rows, results);     // results come back in the same order as the ops
    Result r = (Result) results[1]; // index 1 is the Get
    byte[] value = r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    System.out.println(Bytes.toString(value));
    conn.close();
}
public static void putAll() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    TableName tName = TableName.valueOf("mydb1:t1");
    Table table = conn.getTable(tName);
    List<Put> puts = new ArrayList<Put>();
    DecimalFormat df = new DecimalFormat();
    df.applyPattern("0000000"); // zero-pad so the row keys sort numerically
    for (int i = 1; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("row" + df.format(i)));
        put.addColumn(Bytes.toBytes("f2"), Bytes.toBytes("id"), Bytes.toBytes(i));
        put.addColumn(Bytes.toBytes("f2"), Bytes.toBytes("name"), Bytes.toBytes("Tom" + i));
        put.addColumn(Bytes.toBytes("f2"), Bytes.toBytes("age"), Bytes.toBytes(i % 100));
        puts.add(put);
    }
    table.put(puts); // one client-side batched write
    conn.close();
}
public static void createNameSpace() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    NamespaceDescriptor nd = NamespaceDescriptor.create("mydb2").build();
    admin.createNamespace(nd);
    conn.close();
}
public static void listNameSpaces() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    NamespaceDescriptor[] nds = admin.listNamespaceDescriptors();
    for (NamespaceDescriptor nd : nds) {
        System.out.println(nd.getName());
    }
    conn.close();
}
public static void createTable() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    TableName tn = TableName.valueOf("mydb2:t2");
    TableDescriptorBuilder.ModifyableTableDescriptor table =
            new TableDescriptorBuilder.ModifyableTableDescriptor(tn);
    ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor family1 =
            new ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor(Bytes.toBytes("f1"));
    table.setColumnFamily(family1);
    ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor family2 =
            new ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor(Bytes.toBytes("f2"));
    table.setColumnFamily(family2);
    admin.createTable(table);
    conn.close();
}
public static void deleteData() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb2:t2"));
    Delete delete = new Delete(Bytes.toBytes("row000001"));
    delete.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("id"));
    delete.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    table.delete(delete);
    conn.close();
}
public static void deleteTable() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    // a table must be disabled before it can be dropped
    admin.disableTable(TableName.valueOf("mydb2:t2"));
    admin.deleteTable(TableName.valueOf("mydb2:t2"));
    conn.close();
}
public static void scan() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    Scan scan = new Scan();
    scan.setBatch(3);      // max cells returned per Result
    scan.setCaching(1500); // rows fetched per RPC round trip
    scan.withStartRow(Bytes.toBytes("row004748"), true);
    scan.withStopRow(Bytes.toBytes("row004758"), true);
    ResultScanner rs = table.getScanner(scan);
    Iterator<Result> iterator = rs.iterator();
    while (iterator.hasNext()) {
        Result result = iterator.next();
        Map<byte[], byte[]> map = result.getFamilyMap(Bytes.toBytes("f1"));
        for (byte[] key : map.keySet()) {
            String k = Bytes.toString(key);
            String v = Bytes.toString(map.get(key));
            System.out.print(k + "===" + v + "......");
        }
        System.out.println();
    }
    rs.close();
    conn.close();
}
public static void scan2() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    Scan scan = new Scan();
    scan.withStartRow(Bytes.toBytes("row004748"), true);
    scan.withStopRow(Bytes.toBytes("row004758"), true);
    ResultScanner rs = table.getScanner(scan);
    Iterator<Result> iterator = rs.iterator();
    while (iterator.hasNext()) {
        Result result = iterator.next();
        // getMap() nests family -> column -> timestamp -> value
        Map<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> map = result.getMap();
        for (byte[] fBytes : map.keySet()) {
            String f = Bytes.toString(fBytes);
            NavigableMap<byte[], NavigableMap<Long, byte[]>> map2 = map.get(fBytes);
            for (byte[] cBytes : map2.keySet()) {
                String c = Bytes.toString(cBytes);
                NavigableMap<Long, byte[]> map3 = map2.get(cBytes);
                for (Long e : map3.keySet()) {
                    String val = Bytes.toString(map3.get(e));
                    System.out.print(f + ":" + c + ":" + e + ":" + val);
                }
            }
        }
        System.out.println();
    }
    rs.close();
    conn.close();
}
public static void getVersions() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    TableName tName = TableName.valueOf("mydb1:t2");
    Table table = conn.getTable(tName);
    Get get = new Get(Bytes.toBytes("row001"));
    get.readAllVersions(); // fetch every version, not just the newest
    Result r = table.get(get);
    List<Cell> cells = r.getColumnCells(Bytes.toBytes("f1"), Bytes.toBytes("name"));
    for (Cell cell : cells) {
        String f = Bytes.toString(cell.getFamilyArray(), cell.getFamilyOffset(),
                cell.getFamilyLength());
        String q = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(),
                cell.getQualifierLength());
        long timestamp = cell.getTimestamp();
        String val = Bytes.toString(cell.getValueArray(), cell.getValueOffset(),
                cell.getValueLength());
        System.out.println("family:" + f + ";column:" + q
                + ";version:" + timestamp + ";value:" + val);
    }
    conn.close();
}
How a client interacts with HBase
When the cluster starts, the master assigns each region to a region server. A client request then proceeds in three steps:
1. Contact ZooKeeper and find the region server hosting the meta table (the znode /hbase/meta-region-server).
2. Use the row key to locate the region server holding the target region, and cache this information locally.
3. Contact that region server. The HRegionServer opens the HRegion object and creates a Store instance for each column family; each Store contains several StoreFile instances, which are lightweight wrappers around HFiles, plus one MemStore that buffers data in memory.
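All of this lookup and caching happens inside the client library. If you want to see the mapping for yourself, the small sketch below (assuming the 2.0 client and the mydb1:t1 table used throughout this article) asks a RegionLocator which region and server hold a given row key; RegionLocator comes from org.apache.hadoop.hbase.client and HRegionLocation from org.apache.hadoop.hbase:

public static void locateRegion() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    RegionLocator locator = conn.getRegionLocator(TableName.valueOf("mydb1:t1"));
    HRegionLocation loc = locator.getRegionLocation(Bytes.toBytes("row004748"));
    System.out.println("region: " + loc.getRegion().getEncodedName());
    System.out.println("server: " + loc.getServerName()); // hostname,port,startcode
    conn.close();
}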
Region splitting
By default, a region is split once its files grow beyond the hbase.hregion.max.filesize property, at which point a new region is opened for storage. The default, defined in hbase-default.xml, is 10 GB. (A Java sketch for overriding this limit per table follows the snippet below.)
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value>
</property>
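The threshold can also be overridden per table. Here is a sketch using the 2.0 builder API; the table name mydb1:t8 and the 1 GB limit are made up for illustration:

public static void createSmallSplitTable() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    TableDescriptor td = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("mydb1:t8"))   // hypothetical table name
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f1"))
            .setMaxFileSize(1024L * 1024 * 1024)         // per-table hbase.hregion.max.filesize
            .build();
    admin.createTable(td);
    conn.close();
}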
Manual splitting
split 'mydb1:t1'
split 'mydb1:t1,row004927,1602595549576.ea72ecf8920a3525e226c786a3538847.','row008000'
Note: after a split, the metadata sits in a buffer for a while rather than being retained long-term; only once flushed is the post-split region information kept permanently on disk:
flush 'hbase:meta'
手动移动分区
move
'e15b2d06eb033f19934c26f5af3eab3d',
'centos101,16020,1602661324341'
分区id:
需要分配到的服务器的名称,端口,启动码: meta表中显示:
Pre-splitting
Pre-splitting means defining a table's region boundaries at creation time, which saves us from having to split the table over and over later (a Java equivalent follows the shell command):
create 'mydb1:t2','f1',SPLITS=>['row3000','row6000']
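The same pre-splitting can be done from the Java API by passing split keys to createTable. A sketch mirroring the shell command above (the builder API is just one way to construct the descriptor):

public static void createPreSplitTable() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    TableDescriptor td = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("mydb1:t2"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder.of("f1"))
            .build();
    byte[][] splitKeys = {Bytes.toBytes("row3000"), Bytes.toBytes("row6000")};
    // three regions: (start..row3000), (row3000..row6000), (row6000..end)
    admin.createTable(td, splitKeys);
    conn.close();
}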
HBase data versioning
(To set VERSIONS from the Java API, see the builder sketch at the end of the KEEP_DELETED_CELLS section below.)
# keep up to 3 versions per cell in family f1
create 'mydb1:t2',{NAME=>'f1',VERSIONS=>'3'}
# read up to 4 versions of a cell
get 'mydb1:t2','row001',{COLUMN=>'f1',VERSIONS=>4}
# read the version at an exact timestamp
get 'mydb1:t2','row001',{COLUMN=>'f1',TIMESTAMP=>1603076828293}
# read up to 10 versions within a timestamp range
get 'mydb1:t2','row001',{COLUMN=>'f1',TIMERANGE=>[1603075388967,1603076828293],VERSIONS=>10}
# delete the version at a specific timestamp
delete 'mydb1:t2','row001','f1:name',1603076828293
# raw scan: also returns delete markers and not-yet-compacted old versions
scan 'mydb1:t2',{COLUMN=>'f1',RAW=>true,VERSIONS=>10}
Table creation parameters
TTL
TTL is a table-creation parameter that sets how long data lives, in seconds; if unset, data is kept forever. Once the TTL expires, get can no longer find the data, but a raw scan still can (on older HBase versions even a raw scan cannot). A Java sketch for this parameter also follows the KEEP_DELETED_CELLS section.
create 'mydb1:t3',{NAME=>'f1',TTL=>10}
KEEP_DELETED_CELLS
KEEP_DELETED_CELLS is a table-creation parameter that controls whether all versions of deleted cells are really removed. It is a boolean and defaults to false. With it set to true, data that has been deleted can no longer be fetched with get, but a raw scan can still see it:
create 'mydb1:t4',{NAME=>'f1',KEEP_DELETED_CELLS=>true}
delete 'mydb1:t4','row001','f1:name'
flush 'mydb1:t4'
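As promised above, here is one sketch showing how VERSIONS, TTL, and KEEP_DELETED_CELLS can all be set from the Java API using the 2.0 builder classes; the table name mydb1:t9 is made up for illustration, and KeepDeletedCells comes from org.apache.hadoop.hbase:

public static void createTableWithParams() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("f1"))
            .setMaxVersions(3)                          // VERSIONS => 3
            .setTimeToLive(10)                          // TTL => 10 (seconds)
            .setKeepDeletedCells(KeepDeletedCells.TRUE) // KEEP_DELETED_CELLS => true
            .build();
    TableDescriptor td = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("mydb1:t9"))  // hypothetical table name
            .setColumnFamily(family)
            .build();
    admin.createTable(td);
    conn.close();
}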
Filters
Filter taxonomy
Comparison filters
All of HBase's comparison filters extend the abstract class org.apache.hadoop.hbase.filter.CompareFilter. HBase has five of them:
Row filter (org.apache.hadoop.hbase.filter.RowFilter)
Family filter (org.apache.hadoop.hbase.filter.FamilyFilter)
Qualifier filter (org.apache.hadoop.hbase.filter.QualifierFilter)
Value filter (org.apache.hadoop.hbase.filter.ValueFilter)
Dependent column filter (org.apache.hadoop.hbase.filter.DependentColumnFilter)
Dedicated filters
HBase's dedicated filters extend org.apache.hadoop.hbase.filter.FilterBase directly. They include:
Single column value filter (org.apache.hadoop.hbase.filter.SingleColumnValueFilter)
Single column value exclude filter (org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter)
Prefix filter (org.apache.hadoop.hbase.filter.PrefixFilter)
Page filter (org.apache.hadoop.hbase.filter.PageFilter)
Key-only filter (org.apache.hadoop.hbase.filter.KeyOnlyFilter)
First-key-only filter (org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter)
Inclusive stop filter (org.apache.hadoop.hbase.filter.InclusiveStopFilter)
Timestamps filter (org.apache.hadoop.hbase.filter.TimestampsFilter)
Column count get filter (org.apache.hadoop.hbase.filter.ColumnCountGetFilter)
Column pagination filter (org.apache.hadoop.hbase.filter.ColumnPaginationFilter)
Filter lists
org.apache.hadoop.hbase.filter.FilterList combines several filters; its Operator enum has two values (see the filterList() example under Filter usage below):
MUST_PASS_ALL: chains the filters with AND
MUST_PASS_ONE: chains the filters with OR
Comparison operators
org.apache.hadoop.hbase.CompareOperator is the enum of comparison operators used by HBase filters. It has seven constants:
LESS: less than
LESS_OR_EQUAL: less than or equal to
EQUAL: equal to
NOT_EQUAL: not equal to
GREATER_OR_EQUAL: greater than or equal to
GREATER: greater than
NO_OP: matches no value at all
Comparators
HBase's comparators all extend the abstract class org.apache.hadoop.hbase.filter.ByteArrayComparable. The following comparators are available:
org.apache.hadoop.hbase.filter.BinaryComparator (byte-for-byte comparison against a given value)
org.apache.hadoop.hbase.filter.BinaryPrefixComparator (matches a prefix of the value)
org.apache.hadoop.hbase.filter.BitComparator (bitwise comparison of byte arrays; only usable with EQUAL and NOT_EQUAL)
org.apache.hadoop.hbase.filter.NullComparator (checks whether the current value is null)
org.apache.hadoop.hbase.filter.RegexStringComparator (matches a regular expression; only usable with EQUAL and NOT_EQUAL; two commonly used anchors are tom$, ends with tom, and ^tom, starts with tom, much like SQL's LIKE)
org.apache.hadoop.hbase.filter.SubstringComparator (matches a substring; only usable with EQUAL and NOT_EQUAL)
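As a small usage sketch for one of these comparators (the table and the substring are illustrative), SubstringComparator can be paired with a ValueFilter to keep only the cells whose value contains a given string:

public static void substringFilter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    Scan scan = new Scan();
    // substring comparison works only with EQUAL / NOT_EQUAL
    scan.setFilter(new ValueFilter(CompareOperator.EQUAL,
            new SubstringComparator("om1")));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        System.out.println(Bytes.toString(r.getRow()));
    }
    rs.close();
    conn.close();
}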
Filter usage
public static void rowFilter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Scan scan = new Scan();
    RowFilter rf = new RowFilter(CompareOperator.LESS,
            new BinaryComparator(Bytes.toBytes("row000100")));
    scan.setFilter(rf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        String row = Bytes.toString(r.getRow());
        System.out.println(row);
    }
    rs.close();
    conn.close();
}
public static void familyFilter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Scan scan = new Scan();
    FamilyFilter ff = new FamilyFilter(CompareOperator.EQUAL,
            new BinaryComparator(Bytes.toBytes("f1")));
    scan.setFilter(ff);
    Table table = conn.getTable(TableName.valueOf("mydb1:t6"));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        System.out.println(Bytes.toString(r.value()));
    }
    rs.close();
    conn.close();
}
public static void qualifierFilter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Get get = new Get(Bytes.toBytes("row000001"));
    QualifierFilter qf = new QualifierFilter(CompareOperator.EQUAL,
            new BinaryComparator(Bytes.toBytes("name")));
    get.setFilter(qf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    Result r = table.get(get);
    System.out.println(Bytes.toString(r.value()));
    conn.close();
}
public static void dependentFilter(boolean drop, CompareOperator co,
                                   ByteArrayComparable bc) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Scan scan = new Scan();
    // drop decides whether the reference column itself is dropped from the results;
    // co and bc compare against the reference column's value, e.g.
    // CompareOperator.EQUAL with new BinaryPrefixComparator(Bytes.toBytes("Tom1"))
    DependentColumnFilter df = new DependentColumnFilter(Bytes.toBytes("f1"),
            Bytes.toBytes("name"), drop, co, bc);
    scan.setFilter(df);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        int id = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("id")));
        String name = Bytes.toString(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name")));
        int age = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("age")));
        System.out.println("id:" + id + ";name:" + name + ";age:" + age);
    }
    rs.close();
    conn.close();
}
public static void singleValue() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Scan scan = new Scan();
    SingleColumnValueFilter sf = new SingleColumnValueFilter(Bytes.toBytes("f1"),
            Bytes.toBytes("name"), CompareOperator.EQUAL,
            new BinaryComparator(Bytes.toBytes("Tom10")));
    scan.setFilter(sf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        int id = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("id")));
        String name = Bytes.toString(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name")));
        int age = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("age")));
        System.out.println("id:" + id + ";name:" + name + ";age:" + age);
    }
    rs.close();
    conn.close();
}
public static void filterList() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Scan scan = new Scan();
    SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(Bytes.toBytes("f1"),
            Bytes.toBytes("name"), CompareOperator.EQUAL,
            new RegexStringComparator("^Tom1"));
    SingleColumnValueFilter ageFilter = new SingleColumnValueFilter(Bytes.toBytes("f1"),
            Bytes.toBytes("age"), CompareOperator.EQUAL,
            new BinaryComparator(Bytes.toBytes(55)));
    // MUST_PASS_ALL = AND; MUST_PASS_ONE would OR the two filters instead
    FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filterList.addFilter(nameFilter);
    filterList.addFilter(ageFilter);
    scan.setFilter(filterList);
    Table table = conn.getTable(TableName.valueOf("mydb1:t1"));
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        int id = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("id")));
        String name = Bytes.toString(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("name")));
        int age = Bytes.toInt(r.getValue(Bytes.toBytes("f1"), Bytes.toBytes("age")));
        System.out.println("id:" + id + ";name:" + name + ";age:" + age);
    }
    rs.close();
    conn.close();
}
HBase counters
HBase counters are typically used for statistics such as click or hit counts. They are updated in real time and atomically, with no concurrent-update problems, which makes them a better fit than high-latency batch processing.
Counter shell commands
# increment by 1 (the default step)
incr 'mydb1:t7','row001','dayily:hits'
# read the current value
get_counter 'mydb1:t7','row001','dayily:hits'
# increment by 10
incr 'mydb1:t7','row001','dayily:hits',10
Operating counters from the Java API
public static void incr() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Increment increment = new Increment(Bytes.toBytes("row001"));
    increment.addColumn(Bytes.toBytes("dayily"), Bytes.toBytes("hits"), 1);
    increment.addColumn(Bytes.toBytes("monthly"), Bytes.toBytes("hits"), 1);
    increment.addColumn(Bytes.toBytes("weekly"), Bytes.toBytes("hits"), 1);
    Table table = conn.getTable(TableName.valueOf("mydb1:t7"));
    table.increment(increment);
    conn.close();
}
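The Java client has no direct equivalent of get_counter; a common idiom, sketched below, is to increment by 0, which returns the current value without changing it:

public static long readCounter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("mydb1:t7"));
    // incrementing by 0 is a read: the counter keeps its value
    long hits = table.incrementColumnValue(Bytes.toBytes("row001"),
            Bytes.toBytes("dayily"), Bytes.toBytes("hits"), 0L);
    conn.close();
    return hits;
}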
Row key design
Suppose we design row keys for city-wide vehicle-passage records, with 10 servers in total.
Queries on passage records generally involve the passage time, the plate number, and the checkpoint ID, so we build the row key from those three fields. To compute a partition number, concatenate the passage time, plate number, and checkpoint ID, take the hash of that string, and take the hash modulo 10; this yields a bucket of 0-9 that decides which server the record lands on, spreading the data evenly and preventing hot spots. Row key layout: partition number, passage time, plate number, checkpoint ID. With this key we can query by passage time or plate number, but querying a checkpoint's traffic is awkward; for that, build an index table whose row key uses the same composition except that the checkpoint ID comes right after the partition number, and store the main table's row key as the value. A sketch of the bucketing scheme follows.
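A minimal sketch of this bucketing scheme; the field layout and separators are assumptions made for illustration:

// 10 servers => salt prefixes 0-9, derived from the hash of the key body
public static String buildRowKey(String passTime, String plate, String checkpointId) {
    String body = passTime + "_" + plate + "_" + checkpointId;
    int salt = (body.hashCode() & Integer.MAX_VALUE) % 10; // stable 0-9 bucket
    return salt + "_" + body; // e.g. "3_20201013120000_AB1234_GATE01"
}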
Bloom filters
A Bloom filter is a way to optimize reads: it can tell whether a region's files might contain the row key you are querying. If a file does not contain the key, the filter gives a definite no; if it might, the filter gives only a probabilistic yes. It therefore improves get performance and avoids full-table scans. Creating a table with a Bloom filter:
public static void createBloomFilter() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Connection conn = ConnectionFactory.createConnection(conf);
    Admin admin = conn.getAdmin();
    TableName tn = TableName.valueOf("mydb2:t3");
    TableDescriptorBuilder.ModifyableTableDescriptor table =
            new TableDescriptorBuilder.ModifyableTableDescriptor(tn);
    ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor family =
            new ColumnFamilyDescriptorBuilder.ModifyableColumnFamilyDescriptor(Bytes.toBytes("f1"));
    family.setBloomFilterType(BloomType.ROW);
    table.setColumnFamily(family);
    admin.createTable(table);
    conn.close();
}
After creating it, run desc on the table: the description includes BLOOMFILTER => 'ROW'.
SQL visualization
Download apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz, upload it to the server, and unpack it.
Copy phoenix-5.0.0-HBase-2.0-server.jar into HBase's lib directory, distribute it to every node, and restart HBase.
Enter the apache-phoenix-5.0.0-HBase-2.0-bin/bin directory and run ./sqlline.py Centos100 to connect through the ZooKeeper server and open the Phoenix client, then try the following commands:
!tables
!help
!sql create table mydb1.person (id varchar(10) primary key, name varchar(100), age Integer);
!sql drop table mydb1.person;
!describe TEST;
upsert into TEST (id,name) values ('row005','lisi');
delete from TEST where id = 'row005';
!quit
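Phoenix also ships a JDBC driver, so the same tables can be queried from Java. A minimal sketch, assuming phoenix-5.0.0-HBase-2.0-client.jar is on the classpath; note these are java.sql types, not the HBase client, and the URL format is jdbc:phoenix:<zookeeper quorum>:

import java.sql.*;

public class PhoenixDemo {
    public static void main(String[] args) throws Exception {
        // the ZooKeeper quorum host/port match the cluster built earlier in this article
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:Centos100:2181");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select id, name from TEST")) {
            while (rs.next()) {
                System.out.println(rs.getString("id") + " " + rs.getString("name"));
            }
        }
    }
}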
You can then check where the table sits in HBase as well as its description.
SQuirreL setup: download squirrel-sql-4.1.0-standard.jar and double-click it to install SQuirreL (just click through the installer). Copy phoenix-5.0.0-HBase-2.0-client.jar into the SQuirreL installation directory, open SQuirreL, register the Phoenix driver, and then add a database connection.