Hadoop's serialization mechanism

zzy1943

Blog category: HADOOP&MAPREDUCE

Tags: Hadoop, Java, Apache, performance, framework

Hadoop does not use Java's built-in serialization mechanism.
Doug Cutting explained the decision as follows:

Quote:

Why didn’t I use Serialization when we first started Hadoop? Because it looked
big and hairy and I thought we needed something lean and mean, where we had
precise control over exactly how objects are written and read, since that is central
to Hadoop. With Serialization you can get some control, but you have to fight for
it.
The logic for not using RMI was similar. Effective, high-performance inter-process
communications are critical to Hadoop. I felt like we’d need to precisely control
how things like connections, timeouts and buffers are handled, and RMI gives you
little control over those.

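In short: serialization is central to Hadoop, so Hadoop implements its own dedicated serialization mechanism to keep precise control over how objects are written and read. The same reasoning explains why it does not use RMI.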

Using Hadoop's serialization
To make a class serializable within the Hadoop framework, implement the two methods of the Writable interface:

public interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

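This is more work than implementing Java's Serializable, which is only a marker interface. In exchange, a comparison shows that Hadoop's serialization produces far less data than Java serialization does.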
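The size difference is easy to check with the JDK alone. The sketch below is not from the original post: `SizeComparison` and `Attr` are hypothetical names, and `Attr.write` only imitates the shape of `Writable.write` by calling `writeInt` and `writeUTF`, rather than using Hadoop's `IntWritable` and `Text`, which have their own encodings. It writes the same two fields once by hand through a `DataOutput` and once through `ObjectOutputStream`, then reports both byte counts.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; not part of Hadoop or of the original post.
class SizeComparison {

    // Stand-in value class with an int field and a String field.
    static class Attr implements Serializable {
        private static final long serialVersionUID = 1L;
        int type;
        String name;

        Attr(int type, String name) {
            this.type = type;
            this.name = name;
        }

        // Same shape as Writable.write: the class decides exactly which bytes go out.
        void write(DataOutput out) throws IOException {
            out.writeInt(type);
            out.writeUTF(name);
        }
    }

    // Number of bytes produced when the fields are written by hand.
    static int handWrittenSize(Attr attr) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            attr.write(out);
        }
        return bytes.size();
    }

    // Number of bytes produced by standard Java serialization.
    static int javaSerializationSize(Attr attr) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attr);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Attr attr = new Attr(2, "name");
        System.out.println("hand-written: " + handWrittenSize(attr) + " bytes");
        System.out.println("Java serialization: " + javaSerializationSize(attr) + " bytes");
    }
}
```

The hand-written form emits only the field values (4 bytes for the int, then a 2-byte length and 4 bytes of text), whereas Java serialization also writes a stream header and a description of the class.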

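Inside these two methods you control the reading and writing of the fields yourself. If the class holds a reference to another object, that object should implement Writable as well (it does not strictly have to, as long as you handle storing and restoring that object's fields yourself).
Here is a simple example:
Class Attribute: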

package siat.miner.etl.instance;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Attribute implements Writable{

	public static final int ATTRIBUTE_TYPE_STRING = 1;// string type
	public static final int ATTRIBUTE_TYPE_NOMINAL = 2;// nominal type
	public static final int ATTRIBUTE_TYPE_REAL = 3;// real type
	
	private IntWritable type;
	private Text name;
	public IntWritable getType() {
		return type;
	}
	public void setType(int type) {
		this.type = new IntWritable(type);
	}
	public Text getName() {
		return name;
	}
	public void setName(String name) {
		this.name = new Text(name);
	}
	public Attribute() {
		super();
		this.type = new IntWritable(0);
		this.name = new Text("");
	}
	public Attribute(int type, String name) {
		super();
		this.type = new IntWritable(type);
		this.name = new Text(name);
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		// Read the fields back in the same order write() emitted them.
		type.readFields(in);
		name.readFields(in);
	}
	@Override
	public void write(DataOutput out) throws IOException {
		// Delegate to the Writable fields, which know their own format.
		type.write(out);
		name.write(out);
	}
}

Class TestA:

package siat.miner.etl.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import siat.miner.etl.instance.Attribute;

public class TestA implements Writable{

	private Attribute a;
	private IntWritable b;
	/**
	 * Serializes a TestA into a byte array, then reads it back into a new instance.
	 *
	 * @param args unused
	 * @throws IOException if writing or reading fails
	 */
	public static void main(String[] args) throws IOException {
		Attribute a = new Attribute(Attribute.ATTRIBUTE_TYPE_NOMINAL, "name");
		TestA ta = new TestA(a, new IntWritable(1));
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		DataOutputStream dos = new DataOutputStream(bos);
		ta.write(dos);
		dos.flush();

		// Rebuild the object from the serialized bytes.
		TestA tb = new TestA();
		tb.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
	}
	public TestA(Attribute a, IntWritable b) {
		super();
		this.a = a;
		this.b = b;
	}
	public TestA() {
		// No-argument constructor; readFields() populates the fields.
	}
	@Override
	public void readFields(DataInput in) throws IOException {
		// Create fresh instances, then let each field deserialize itself.
		a = new Attribute();
		a.readFields(in);
		b = new IntWritable();
		b.readFields(in);
	}
	@Override
	public void write(DataOutput out) throws IOException {
		// Write the fields in the order readFields() expects.
		a.write(out);
		b.write(out);
	}

}
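TestA delegates to Attribute because Attribute is itself a Writable. The alternative mentioned earlier, where the referenced object does not implement Writable and the enclosing class stores its fields by hand, might look like the sketch below. `Holder` and `PlainAttribute` are hypothetical names that do not appear in the original code, and to keep the example runnable without Hadoop on the classpath `Holder` only mirrors the two Writable method signatures; a real class would also declare `implements Writable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical holder class; a real one would declare "implements Writable".
class Holder {

    // A plain class that knows nothing about serialization (hypothetical).
    static class PlainAttribute {
        int type;
        String name;

        PlainAttribute(int type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    private PlainAttribute attr = new PlainAttribute(0, "");
    private int count;

    Holder() { }

    Holder(PlainAttribute attr, int count) {
        this.attr = attr;
        this.count = count;
    }

    PlainAttribute getAttr() { return attr; }

    int getCount() { return count; }

    // The holder writes the referenced object's fields itself.
    public void write(DataOutput out) throws IOException {
        out.writeInt(attr.type);
        out.writeUTF(attr.name);
        out.writeInt(count);
    }

    // Fields must be read back in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        attr = new PlainAttribute(in.readInt(), in.readUTF());
        count = in.readInt();
    }

    // Serializes to a byte array and reads the bytes into a fresh instance.
    static Holder roundTrip(Holder original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        Holder copy = new Holder();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Holder copy = roundTrip(new Holder(new PlainAttribute(2, "name"), 1));
        // prints "2 name 1"
        System.out.println(copy.getAttr().type + " " + copy.getAttr().name + " " + copy.getCount());
    }
}
```

The trade-off is that the field layout of `PlainAttribute` now lives inside `Holder`, so every class that embeds a `PlainAttribute` has to repeat it; implementing Writable on the referenced class keeps that knowledge in one place.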

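As you can see, Hadoop's serialization mechanism relies on Java's DataInput and DataOutput to serialize primitive types, and leaves it to users to handle the serialization of the classes they write.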
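To make that primitive layer concrete, here is a minimal sketch that uses only `java.io`. The class `PrimitiveRoundTrip` and its `encode` and `decode` helpers are hypothetical names introduced for illustration. It writes an int, a string, and a double through `DataOutputStream`, then reads them back through `DataInputStream` in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class showing the primitives that DataOutput/DataInput provide.
class PrimitiveRoundTrip {

    static byte[] encode(int id, String label, double score) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(id);        // 4 bytes, big-endian
        out.writeUTF(label);     // 2-byte length prefix, then modified UTF-8 bytes
        out.writeDouble(score);  // 8 bytes
        out.flush();
        return bytes.toByteArray();
    }

    static String decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        // No field names or type tags are stored, so the read order must match the write order.
        int id = in.readInt();
        String label = in.readUTF();
        double score = in.readDouble();
        return id + "," + label + "," + score;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(7, "abc", 1.5);
        System.out.println(data.length + " bytes");  // 4 + (2 + 3) + 8 = 17 bytes
        System.out.println(decode(data));            // prints "7,abc,1.5"
    }
}
```

Because the stream carries no metadata, `write` and `readFields` in a Writable class have to agree on field order and types; that agreement is the part left to the user.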

2010-04-06 15:15