数据科学家是一个比计算机科学家懂更多统计学,比统计学家懂更多计算机科学的人。 – Josh Blumenstock

D:\word.txt 中有如下数据:

0101
00101
00101
111

import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;

public class TestSparkJava {

    public static void main(String[] args) {
        String logFile = "D:\\word.txt";
        SparkConf conf = new SparkConf().setMaster("local").setAppName("Demo");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("0"); }
        }).count();

        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("1"); }
        }).count();

        System.out.println("Lines with 0: " + numAs + ", lines with 1: " + numBs);

        sc.stop();
    }
}
答案: A
;
版权声明: 本文为智客工坊「楠木大叔」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。

results matching ""

    No results matching ""