Crate hdfs

hdfs-rs is a library for accessing HDFS clusters. At its base, it provides FFI bindings to libhdfs. It also provides more idiomatic, higher-level Rust APIs that hide manual memory management and some of libhdfs's thread-safety problems. The Rust APIs are highly recommended for most users.

Important Note

The original libhdfs implementation allows only one HdfsFs instance per namenode, because libhdfs keeps a single hdfsFs entry for each namenode. As a result, you need to keep a single HdfsFsCache for the entire program and obtain every HdfsFs through it. To do so, you must share that HdfsFsCache instance across all threads in the program. In contrast, an HdfsFs instance itself is thread-safe.
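The sharing pattern described in this note can be sketched with standard-library types only. This is a minimal illustration, not the crate's implementation: FsHandle, FsCache, and handles_from_threads are hypothetical stand-ins for HdfsFs, HdfsFsCache, and application code, and wrapping the cache in Arc<Mutex<_>> is an assumption about how a program might share it across threads.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a filesystem handle (not the crate's HdfsFs).
#[derive(Debug)]
struct FsHandle {
    namenode: String,
}

// Hypothetical stand-in for HdfsFsCache: at most one handle per namenode URI.
#[derive(Default)]
struct FsCache {
    handles: HashMap<String, Arc<FsHandle>>,
}

impl FsCache {
    // Returns the existing handle for `namenode`, creating it on first use.
    fn get(&mut self, namenode: &str) -> Arc<FsHandle> {
        self.handles
            .entry(namenode.to_string())
            .or_insert_with(|| {
                Arc::new(FsHandle {
                    namenode: namenode.to_string(),
                })
            })
            .clone()
    }
}

// Spawns `workers` threads that all obtain their handle from the same cache.
fn handles_from_threads(
    cache: Arc<Mutex<FsCache>>,
    namenode: &str,
    workers: usize,
) -> Vec<Arc<FsHandle>> {
    let joins: Vec<_> = (0..workers)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let namenode = namenode.to_string();
            thread::spawn(move || {
                // Lock only for the lookup; the returned handle is used unlocked.
                let mut guard = cache.lock().unwrap();
                guard.get(&namenode)
            })
        })
        .collect();
    joins.into_iter().map(|j| j.join().unwrap()).collect()
}
```

Because every thread goes through the one shared cache, all of them end up with the same handle for a given namenode, which is the property the note asks for.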

Usage

Add this to your Cargo.toml:

[dependencies]
hdfs = "0.0.4"

or

[dependencies.hdfs]
git = "https://github.com/hyunsik/hdfs-rs.git"

and this to your crate root:

extern crate hdfs;

hdfs-rs uses libhdfs, which is a native library implemented on top of JNI, so it requires a proper CLASSPATH. The exec.sh script in the root of the source tree runs your program with the proper CLASSPATH. exec.sh requires HADOOP_HOME, so first set the HADOOP_HOME shell environment variable as follows:

export HADOOP_HOME=<hadoop install dir>

Then, you can execute your program as follows:

./exec.sh your_program arg1 arg2
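A program can also fail fast when it is started without this environment. The following is a minimal sketch, assuming that exec.sh exports CLASSPATH to the process it launches; missing_vars and check_hadoop_env are hypothetical helpers, not part of hdfs-rs.

```rust
use std::env;

// Returns the names in `required` whose value is unset or blank.
// `lookup` is injected so the check can be exercised without touching
// the real process environment.
fn missing_vars<F>(required: &[&str], lookup: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    required
        .iter()
        .filter(|name| lookup(**name).map_or(true, |v| v.trim().is_empty()))
        .map(|name| name.to_string())
        .collect()
}

// Checks the variables mentioned above against the real environment.
// Assumption: CLASSPATH is visible to the program when run via exec.sh.
fn check_hadoop_env() -> Result<(), String> {
    let missing = missing_vars(&["HADOOP_HOME", "CLASSPATH"], |name| env::var(name).ok());
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "missing environment variables: {}",
            missing.join(", ")
        ))
    }
}
```

Calling check_hadoop_env() at the start of main and exiting on Err gives a readable message in place of a later JNI failure.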

Testing

The tests also require a proper CLASSPATH, so run cargo test through exec.sh:

./exec.sh cargo test

Example

use std::rc::Rc;
use std::cell::RefCell;
use hdfs::{HdfsFs, HdfsFsCache};

// You must obtain HdfsFs instances through HdfsFsCache. The same HdfsFsCache
// must also be shared across all threads in the program in order to avoid
// the thread-safety problem of the original libhdfs.
// Note: Rc<RefCell<_>> shares the cache within a single thread only.
let cache = Rc::new(RefCell::new(HdfsFsCache::new()));
let fs: HdfsFs = cache.borrow_mut().get("hdfs://localhost:8020/").ok().unwrap();
match fs.mkdir("/data") {
  Ok(_) => { println!("/data has been created") },
  Err(_) => { panic!("/data creation has failed") }
};

Modules

minidfs

Mini HDFS cluster for easily building unit tests.

native

libhdfs FFI binding APIs.

Structs

BlockHosts

Includes hostnames where a particular block of a file is stored.

FileStatus

Interface that represents the client side information for a file or directory.

HdfsFile

An open HDFS file.

HdfsFs

HDFS filesystem.

HdfsFsCache

A cache of HdfsFs instances.

HdfsUtil

HDFS utility.

RzBuffer

A buffer returned from zero-copy read. This buffer will be automatically freed when its lifetime is finished.

RzOptions

Options for zero-copy read

Enums

HdfsErr

Errors that can occur while accessing an HDFS cluster.