A data scientist is someone who knows more statistics than a computer scientist and more computer science than a statistician. – Josh Blumenstock
The file D:\word.txt contains the following data:
0101
00101
00101
111
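import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class TestSparkJava {
    public static void main(String[] args) {
        // Input file; backslashes in a Windows path must be escaped in a Java string
        String logFile = "D:\\word.txt";
        // Run Spark locally with a single worker thread
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Demo");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the file as an RDD of lines; cache() keeps it in memory for the two passes below
        JavaRDD<String> logData = sc.textFile(logFile).cache();
        // Count the lines that contain at least one '0'
        long numAs = logData.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) { return s.contains("0"); }
        }).count();
        // Count the lines that contain at least one '1'
        long numBs = logData.filter(new Function<String, Boolean>() {
            @Override
            public Boolean call(String s) { return s.contains("1"); }
        }).count();
        System.out.println("Lines with 0: " + numAs + ", lines with 1: " + numBs);
        sc.stop();
    }
}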
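To cross-check what the program prints, the sketch below repeats the two filter-and-count steps in plain Java without Spark. It assumes D:\word.txt contains exactly the four lines shown above, hard-coded here instead of read from disk. The class name LineCountCheck and the helper countContaining are hypothetical. Note that filter(...).count() counts matching lines, not occurrences of the character.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Spark), assuming the file holds exactly these four lines.
public class LineCountCheck {

    // Hypothetical helper: counts lines containing the substring,
    // mirroring logData.filter(s -> s.contains(needle)).count()
    static long countContaining(List<String> lines, String needle) {
        return lines.stream().filter(s -> s.contains(needle)).count();
    }

    public static void main(String[] args) {
        // Assumed contents of D:\word.txt, one element per line
        List<String> lines = Arrays.asList("0101", "00101", "00101", "111");
        long numAs = countContaining(lines, "0");
        long numBs = countContaining(lines, "1");
        System.out.println("Lines with 0: " + numAs + ", lines with 1: " + numBs);
    }
}
```

Under these assumptions it prints "Lines with 0: 3, lines with 1: 4": every line except 111 contains a 0, and all four lines contain a 1.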
Answer: A
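Copyright notice:
This is an original article by 智客工坊「楠木大叔」, licensed under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.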
