Hadoop is a framework for distributed storage and processing of large datasets across clusters of computers. It uses HDFS for storage, which splits files into blocks, distributes the blocks across nodes, and replicates them for fault tolerance. HDFS uses a master/slave architecture: a NameNode manages the file system namespace, and DataNodes store the file data as blocks. The Hadoop API provides access to HDFS through classes such as FileSystem and FSDataInputStream, which let applications read, write, and manipulate data in a distributed manner.
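To make the block and replication idea concrete, here is a minimal toy model in plain Java. It is not Hadoop code: the class name, the 128 MB block size, the replication factor of 3, and the round-robin placement are all assumptions made for illustration (real HDFS placement is rack-aware and decided by the NameNode).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of splitting a file into blocks and placing replicas. Not Hadoop code. */
final class BlockLayoutSketch {

    /** Number of blocks needed for a file; the last block may be only partly filled. */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    /**
     * Hypothetical round-robin placement: replica r of block b goes to node
     * (b + r) mod nodeCount, so the replicas of one block land on distinct nodes.
     */
    static List<List<Integer>> placeReplicas(long blocks, int replication, int nodeCount) {
        if (replication < 1 || replication > nodeCount) {
            throw new IllegalArgumentException("need 1 <= replication <= nodeCount");
        }
        List<List<Integer>> placement = new ArrayList<>();
        for (long b = 0; b < blocks; b++) {
            List<Integer> nodes = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                nodes.add((int) ((b + r) % nodeCount));
            }
            placement.add(nodes);
        }
        return placement;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size (128 MB)
        long fileSize = 300L * 1024 * 1024;  // hypothetical 300 MB file
        long blocks = blockCount(fileSize, blockSize);
        System.out.println("blocks = " + blocks);
        System.out.println("placement = " + placeReplicas(blocks, 3, 4));
    }
}
```

With these assumed numbers the file needs three blocks, and losing any single node still leaves two replicas of every block available.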
30. Java
- public FileStatus[] listStatus(Path f) throws IOException;
- public FileStatus[] listStatus(Path f, PathFilter filter) throws IOException;
- public FileStatus[] listStatus(Path[] files) throws IOException;
- public FileStatus[] listStatus(Path[] files, PathFilter filter) throws IOException;
31. Java
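import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Lists the entries under every path given on the command line.
public class ListStatus {
  public static void main(String[] args) throws Exception {
    // The first argument also selects the file system to connect to.
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path[] paths = new Path[args.length];
    for (int i = 0; i < paths.length; i++) {
      paths[i] = new Path(args[i]);
    }
    // listStatus(Path[]) returns one combined listing for all paths.
    FileStatus[] status = fs.listStatus(paths);
    for (FileStatus stat : status) {
      System.out.println(stat.getPath().toUri().getPath());
    }
  }
}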
32. Java
- public FileStatus[] globStatus(Path pathPattern) throws IOException;
- public FileStatus[] globStatus(Path pathPattern, PathFilter filter) throws IOException;
33. Java
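- [ab]: character class; matches a single character in the set {a, b}
- [^ab]: negated character class; matches a single character not in the set {a, b}
- [a-b]: character range; matches a single character in the closed range [a, b], where a is lexicographically less than or equal to b
- [^a-b]: negated character range; matches a single character not in the closed range [a, b], where a is lexicographically less than or equal to b
- {a,b}: alternation; matches either expression a or expression b
- \c: escaped character; matches character c when c is a metacharacter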
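The glob patterns above can be tried out without a cluster. The following is a minimal sketch, in plain Java, that translates such a pattern into a java.util.regex.Pattern and matches it against a single path component. The class GlobSketch and its methods are hypothetical helpers, not part of the Hadoop API, and the translation is an approximation: it also handles the `*` and `?` wildcards that Hadoop globs support, assumes they never match `/`, and copies character classes through unchanged, which only works for simple classes like the ones listed.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: approximates Hadoop-style glob matching for one path component. */
final class GlobSketch {

    /** Translates a glob into an equivalent regular expression. */
    static Pattern toRegex(String glob) {
        StringBuilder re = new StringBuilder();
        int braceDepth = 0;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            switch (c) {
                case '*': re.append("[^/]*"); break;  // zero or more characters
                case '?': re.append("[^/]"); break;   // exactly one character
                case '[': {
                    // [ab], [^ab], [a-b], [^a-b] use the same syntax in Java regex.
                    int end = glob.indexOf(']', i + 1);
                    if (end < 0) throw new IllegalArgumentException("unclosed [ in " + glob);
                    re.append(glob, i, end + 1);
                    i = end;
                    break;
                }
                case '{': re.append("(?:"); braceDepth++; break; // start alternation
                case '}':
                    if (braceDepth == 0) throw new IllegalArgumentException("unmatched } in " + glob);
                    re.append(')');
                    braceDepth--;
                    break;
                case ',': re.append(braceDepth > 0 ? "|" : ","); break;
                case '\\':
                    // \c matches the character c literally.
                    if (++i >= glob.length()) throw new IllegalArgumentException("dangling \\ in " + glob);
                    re.append(Pattern.quote(String.valueOf(glob.charAt(i))));
                    break;
                default:
                    re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        if (braceDepth != 0) throw new IllegalArgumentException("unclosed { in " + glob);
        return Pattern.compile(re.toString());
    }

    /** True if the whole name matches the glob. */
    static boolean matches(String glob, String name) {
        return toRegex(glob).matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("200[7-8]", "2008"));    // character range
        System.out.println(matches("{2007,2008}", "2009")); // alternation
    }
}
```

For real listings, FileSystem.globStatus performs the matching itself; a sketch like this is only useful for checking what a pattern is expected to select.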