汪远航 (Wang Yuanhang)
PDM Development Team
Forget the programming languages you have learned
Forget types
Forget linked lists
Forget for loops
Forget while loops
Forget method calls
[A] is a collection
Scala, the "Scalable Language"
China Mobile

1 • Introduction
2 • Functional Programming (FP)
3 • Object Orientation (OO)
4 • Type System
5 • Monad
Built the current javac
One of the designers of Generic Java
Martin Odersky
--the designer of Scala
Compiles to Java bytecode
Near-seamless interop with Java
Statically typed
A powerful type system
Who?
Erlang
Java
Lisp
All languages descend from Lisp,
and Scala's design philosophy is fairly close to Lisp's
Haskell
What?
JVM
Interoperates with Java
Monad
Parallel computation model
Type system
Scala
LinkedIn
Spark
Coursera
Meetup
Gilt
Foursquare
Who uses Scala?
Scala usage is quite diverse: there are Spark applications, and many websites use Scala for their backends

Spark
Alibaba middleware team
Mogujie
Kanchufang
Qiaobutang
Vipshop
Who uses Scala?
At large companies adoption is mostly driven by Spark; Scala is also widely used for middleware, exposing language-agnostic interfaces
Pros
1. Multi-paradigm blend with strong expressive power
2. Can call Java packages; strong compatibility
3. Static strong typing; compiles directly to bytecode, with speed on par with Java
Cons
1. Functional programming
2. Functional programming
3. The type system is complex, with a steep learning curve
Scala doesn't have as many parentheses as Lisp

countChange :: Int -> Int
countChange amount = let coins = [1, 2, 5, 10, 20, 50, 100, 200]
                     in cc coins amount
  where
    cc :: [Int] -> Int -> Int
    cc [] remain = 0
    cc coins remain
      | remain < 0  = 0
      | remain == 0 = 1
      | otherwise   = cc coins (remain - head coins) +
                      cc (tail coins) remain
Yet it doesn't go so far as to have almost no parentheses at all
type Segment = (List[Int], List[Int], List[Int])
object Split {
  def unapply(xs: List[Int]) = {   // Extractor
    val pivot = xs(xs.size / 2)
    @tailrec
    def partition(s: Segment, ys: List[Int]): Segment = {
      val (left, mid, right) = s
      ys match {
        case Nil => s
        case head :: tail if head < pivot  => partition((head :: left, mid, right), tail)   // Guard
        case head :: tail if head == pivot => partition((left, head :: mid, right), tail)
        case head :: tail if head > pivot  => partition((left, mid, head :: right), tail)
      }
    }
    Some(partition((Nil, Nil, Nil), xs))
  }
}
def qsort(xs: List[Int]): List[Int] = xs match {
  case Nil => xs
  case Split(left, pivot, right) => qsort(left) ::: pivot ::: qsort(right)
}
Quick Sort
Tail recursion
Pattern matching
Scala+RDD

var list = (1 to 100).toArray
for (int i = 0; i < 100; i++) {   // the imperative way (Java-style)
  list[i] += 1
}
list = list.map(1 +)              // the functional way
Why functional programming?
var list = (1 to 100).toArray
for (int i = 0; i < 100; i++) {   // imperative: eager, in place
  list[i] += 1
}
list = list.view.map(1 +)         // functional: a lazy view
Why functional programming?
var list = (1 to 100).toArray
for (int i = 0; i < 100; i++) {   // imperative: inherently sequential
  list[i] += 1
}
list = list.par.map(1 +)          // functional: a parallel collection
Why functional programming?
6 ^ 6
6 * 6 * 6 * 6 * 6 * 6
def ^(x: Int, y: Int): Int = {
  if (y == 0) 1
  else if (y % 2 == 0) ^(x * x, y / 2)
  else x * ^(x, y - 1)
}
Why functional programming?

Quick Sort 
object Split { 
def unapply (xs: List[Int]) = { 
val pivot = xs(xs.size / 2) 
Some(xs.partitionBy(pivot)) 
} 
} 
def qsort(xs: List[Int]): List[Int] = xs match { 
case Nil => xs 
case Split(left, pivot, right) => qsort(left) ::: pivot ::: qsort(right) 
}
Quick Sort 
Implicit conversion
type Segment = (List[Int], List[Int], List[Int]) 
implicit class ListWithPartition(list: List[Int]) { 
def partitionBy(p: Int): Segment = { 
val idenElem = (List[Int](), List[Int](), List[Int]()) 
def partition(result: Segment, x: Int): Segment = { 
val (left, mid, right) = result 
if (x < p) (x :: left, mid, right) 
else if (x == p) (left, x :: mid, right) 
else (left, mid, x :: right) 
} 
list.foldLeft(idenElem)(partition) 
} 
}
Side effects
Value vs. reference
class Pair[A](var x: A, var y: A) { 
def modifyX(x: A) = this.x = x 
def modifyY(y: A) = this.y = y 
} 
var pair = new Pair(1, 2) 
var pair1 = new Pair(pair, pair) 
var pair2 = new Pair(pair, new Pair(1, 2)) 
pair.modifyX(3)
Side effects
Associativity
var variable = 0 
implicit class FooInt(i: Int) { 
def |+|(j: Int) = { 
variable = (i + j) / 2 
i + j + variable 
} 
} 
(1 |+| 2) |+| 3  = 10
1 |+| (2 |+| 3)  = 12

x + y: (Int, Int) => Int
x + _ : Int => Int
Currying
fold(z: Int)(f: (Int, Int) => Int) 
val list = List(1, 2, 3, 4) 
def fold0: ((Int, Int) => Int) => Int = 
list.foldLeft(0) 
def fold1: ((Int, Int) => Int) => Int = 
list.foldLeft(1) 
fold0((x, y) => x + y) 
fold1((x, y) => x * y)
Map (higher-order function): [A] -> (A -> B) -> [B]
Filter (higher-order function): [A] -> (A -> Boolean) -> [A]
Fold (higher-order function): [A] -> B -> (B -> A -> B) -> B
(the B argument is the identity element)

Flatten (higher-order function): [[A]] -> [A]
Par + Map (higher-order function): [A] -> (A -> B) -> [B], evaluated in parallel
Lazy evaluation
val foo = List(1, 2, 3, 4, 5)
val baz = foo.map(5 +).map(3 +).filter(_ > 10).map(4 *)
baz.take(2)
Yet we end up with three intermediate results:
foo.map(5 +)
foo.map(5 +).map(3 +)
foo.map(5 +).map(3 +).filter(_ > 10)
In an imperative language:
for (int i = 0; i < 5; ++i) {
  int x = foo[i] + 5 + 3;
  if (x > 10)
    bar.add(x * 4);
  else
    continue;
}
When we write the declaration,
what we want is a promise (a computation),
not a result
View + Map (higher-order function): [A] -> (A -> B) -> [B], evaluated lazily

Streams and lazy evaluation
val fibs: Stream[Int] = 0 #:: 1 #:: fibs.zip(fibs.tail).map(n => n._1 + n._2)
(a classic lazy-evaluation example, often cited on Quora)
zip: ([A], [B]) => [(A, B)]
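As a usage sketch (the classic Scala 2 Stream API, where #:: builds a lazy cons cell), only as much of the infinite stream is forced as we actually ask for:

fibs.take(10).toList   // List(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)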
Lazy evaluation
lazy val x = 3 + 3
def number = {println("OK"); 3 + 3} 
class LazyValue(expr: => Int) { 
var evaluated: Boolean = false 
var value: Int = -1 
def get: Int = { 
if (!evaluated) { 
value = expr 
evaluated = true 
} 
value 
} 
} 
Call By Name 
val lazyValue = new LazyValue(number) 
println(lazyValue.get) 
println(lazyValue.get) 
Thinking in Java:
map can be implemented with the decorator pattern
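To illustrate that remark, a minimal decorator-style sketch of a lazy map (MappedSeq is our own name, not the library's; real collection views are more elaborate). The wrapper holds the source and the function, and applies the function only when an element is accessed:

class MappedSeq[A, B](source: Seq[A], f: A => B) extends Seq[B] {
  def apply(i: Int): B = f(source(i))                  // transform on access, not on construction
  def length: Int = source.length
  def iterator: Iterator[B] = source.iterator.map(f)   // Iterator.map is itself lazy
}

val lazily = new MappedSeq(Vector(1, 2, 3), (x: Int) => x * 10)
lazily(1)   // 20: only this element's transformation runs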
Higher-Order Functions
Transformations:
map(f: T => U): A[U]
filter(f: T => Boolean): A[T]
flatMap(f: T => A[U]): A[U]
groupBy(f: T => K): A[(K, List[T])]
sortBy(f: T => K): A[T]
Actions (these produce something NEW, not another collection):
count: Int
force: A[T]
reduce(f: (T, T) => T): T
Object orientation
Scala is an object-oriented language, and at least purer in its object orientation than Java.
Everything is an object, including 1, 2, and 1.1.
The 1 + 2 we write is really 1.+(2),
though the compiler substitutes primitive types at compile time.
And the function x: Int => x.toString
is a Function1[Int, String].
So you can write map(5 +)
but not map(+ 5)
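A small demonstration of why: 5 + is the method + of the object 5, which eta-expands to an Int => Int, whereas + 5 is not a method reference at all.

List(1, 2, 3).map(5 + _)   // List(6, 7, 8), explicit placeholder
List(1, 2, 3).map(5 +)     // same result via eta-expansion of 5.+
// List(1, 2, 3).map(+ 5)  // does not compile: "+ 5" names no method
List(1, 2, 3).map(_ + 5)   // to put the element on the left, use a placeholder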

Some syntactic sugar
class Sugar(i: Int) {
  def unary_- = -i                                   // prefix
  def apply(expr: => Unit) = for (j <- 1 to i) expr  // method name omitted
  def +(that: Int) = i + that                        // infix
  def +:(that: Int) = i + that                       // right-associative infix
}
The aim is to support DSLs well
and to stay in the functional-programming idiom
val sugar = new Sugar(2)
Please use this sparingly
-sugar                   // prefix
sugar(println("aha"))    // method name omitted (apply)
sugar + 5                // infix
5 +: sugar               // right-associative: desugars to sugar.+:(5)
Operator precedence, from lowest to highest:
all letters
|
^
&
< >
= !
: (methods ending in ':' are right-associative)
+ -
* / %
all other special characters
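Because methods whose names end in ':' bind to the right, list construction desugars to calls on the tail, for example:

val xs = 1 :: 2 :: Nil   // parsed as Nil.::(2).::(1)
val ys = 0 +: xs         // +: ends in ':', so this is xs.+:(0), giving List(0, 1, 2)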
Scala+RDD
Trait & Mix-in
Mix-in is a form of multiple inheritance. Like interfaces, it tames the complexity
of multiple inheritance by restricting what the second parent may be, but unlike
an interface a trait can carry default implementations.
1. Ordinary inheritance still provides the single parent
2. The second and any further parents must be Traits
3. A trait cannot be instantiated on its own
In Scala, traits can be mixed in at compile time or at instantiation time.
Suppose we want to describe a bird that can sing and run; being a bird, it can of course fly.
abstract class Bird(kind: String) {
  val name: String
  def singMyName = println(s"$name is singing")
  val capability: Int
  def run = println(s"I can run $capability meters!!!")
  def fly = println(s"flying of kind: $kind")
}
But obviously a person can also run and sing... and, on top of that, program.
(Nothing against birds; if you meet one that can program, please let me know.)
Inheritance
trait Runnable { 
val capability: Int 
def run = println(s"I can run $capability meters!!!") 
} 
trait Singer { 
val name: String 
def singMyName = println(s"$name is singing") 
} 
abstract class Bird(kind: String) { 
def fly = println(s"flying of kind: $kind") 
} 
Inheritance

class Nightingale extends Bird("Nightingale") with Singer with Runnable { 
val capability = 20 
val name = "poly" 
} 
val myTinyBird = new Nightingale 
myTinyBird.fly 
myTinyBird.singMyName 
myTinyBird.run 
class Coder(language: String) { 
val capability = 10 
val name = "Handemelindo" 
def code = println(s"coding in $language") 
} 
val me = new Coder("Scala") with Runnable with Singer 
me.code 
me.singMyName 
me.run 
Inheritance
A companion
The companion object
object Sugar {
  def apply(i: Int) = new Sugar(i)
}
The factory pattern can be implemented here
An object is itself a singleton; mind thread-safety!!!
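To make the threading caveat concrete, a small sketch (the names are ours): mutable state in an object is one shared instance for the whole JVM, so unsynchronized updates race.

object Counter {
  private var n = 0                 // shared by every thread
  def next(): Int = { n += 1; n }   // not atomic: racy under concurrency
}

object SafeCounter {
  private val n = new java.util.concurrent.atomic.AtomicInteger(0)
  def next(): Int = n.incrementAndGet()   // one safe alternative
}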
More companions
Case classes and ADTs
abstract class Tree 
case class Leaf(info: String) extends Tree 
case class Node(left: Tree, right: Tree) extends Tree 
def traverse(tree: Tree): Unit = { 
tree match { 
case Leaf(info) => println(info) 
case Node(left, right) => { 
traverse(left) 
traverse(right) 
} 
} 
} 
val tree: Tree = Node(Node(Leaf("1"), Leaf("2")), Leaf("3"))
traverse(tree)
Type systems
If you are a C programmer, the type system is:
a way of telling the computer how many bytes it needs to store these numbers
If you are a Java programmer, the type system is:
a way of describing where instances live,
so the compiler can check that your program is consistent
If you are an R programmer, the type system is:
a tag saying which statistical computations apply to these variables
If you are a Ruby programmer, the type system is:
something you should stay away from
For a Scala programmer, the type system is:
what UML is to Java: a guarantee of correctness, a blueprint of the program
Guess what this is: e.g. [(K1, V1)] -> [(K2, [V2])] -> [(K2, V3)]

Kind, Type, Value
(Diagram: kind => type => value)
Values: 1, (1, 2), [1, 2, 3]
Proper types (kind *): Any, Int, Pair[Int, Int], List[Int]
Type constructors (kind * => *): List, Pair
Terms: kind, proper type, type constructor, subtype
Generics of a Higher Kind - Martin Odersky
type Int :: *
type String :: *
type (Int => String) :: *
type List[Int] :: *
type List :: ?
type Function1 :: ??
Some abstraction exercises
type List :: * => *
type Function1 :: * => * => *    Function1[-T, +R]
def id(x: Int) = x
type Id[A] = A
def id(f: Int => Int, x: Int) = f(x)
type id[A[_], B] = A[B]
Suppose our program should return a result of the shape:
(Set(x,x,x,x,x), List(x,x,x,x,x,x,x,x,x,x))
(* -> *) -> (* -> *) -> *
type Pair[K[_], V[_]] = (K[A], V[A]) forSome { type A }
val pair: Pair[Set, List] = (Set("42"), List(52))   // rejected: A cannot be both String and Int
val pair: Pair[Set, List] = (Set(42), List(52))     // accepted: A = Int
Some abstraction exercises
As another example, suppose we have this function:
def foo[A[_]](bar: A[Int]): A[Int] = bar
We can feed it anything of kind * => *, for example:
val foo1 = foo[List](List(1, 2, 3, 5, 8, 13))
But what if we have:
def baz(x: Int) = println(x)
Recall that type Function1 :: * => * => *
Type Lambda
What to do?
Fix Function1's result type to Unit, turning * => * => * into * => *:
val foo2 = foo[ ({type F[X] = Function1[X, Unit]})#F ](baz)

trait Monoid[A]{ 
val zero: A 
def append(x: A, y: A): A 
} 
object IntNum extends Monoid[Int] { 
val zero = 0 
def append(x: Int, y: Int) = x + y 
} 
object DoubleNum extends Monoid[Double] { 
val zero = 0d 
def append(x: Double, y: Double) = x + y 
} 
def sum[A](nums: List[A])(tc: Monoid[A]) = 
nums.foldLeft(tc.zero)(tc.append) 
sum(List(1, 2, 3, 5, 8, 13))(IntNum) 
sum(List(3.14, 1.68, 2.72))(DoubleNum) 
Abstracting over morphisms
trait Monoid[A] {
  val zero: A
  def append(x: A, y: A): A
}
implicit object IntNum extends Monoid[Int] {
  val zero = 0
  def append(x: Int, y: Int) = x + y
}
implicit object DoubleNum extends Monoid[Double] {
  val zero = 0d
  def append(x: Double, y: Double) = x + y
}
def sum[A](nums: List[A])(implicit tc: Monoid[A]) =
  nums.foldLeft(tc.zero)(tc.append)
sum(List(1, 2, 3, 5, 8, 13))
sum(List(3.14, 1.68, 2.72))
Type Class
1. Separates abstraction from implementation
2. Composable
3. Overridable
4. Type-safe
val list = List(1, 3, 234, 56, 5346, 34)
list.sorted   // sorted[B >: A](implicit ord: math.Ordering[B])
Contravariance and covariance
List[+T] 
class Person(name: String) { 
def shut = println(s"I am $name") 
} 
class Coder(language: String, name: String) extends Person(name) { 
def code = println(s"Coding in $language") 
} 
val persons: List[Coder] = List(new Coder("Java", "Jeff"), 
new Coder("Haskell", "Harry")) 
def traverse(persons: List[Person]) = persons.foreach(_.shut) 
traverse(persons)
Contravariance and covariance
Function1[-T, +R]
(Diagram: domain and range. A function f1: X2 => Y1 can stand in for an f: X1 => Y2
whenever X1 <: X2 and Y1 <: Y2: it accepts at least all of X1 and returns only Y2s.)
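Reusing the Person and Coder classes above (the function values are illustrative), a sketch of what the variance annotations buy us: a Person => Coder accepts every Coder (contravariance in T) and returns only Persons (covariance in R), so it can stand in for a Coder => Person.

val describe: Person => Coder = _ => new Coder("Scala", "anon")
val use: Coder => Person = describe    // compiles thanks to Function1[-T, +R]
use(new Coder("Java", "Jeff")).shut    // prints: I am anon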

Monad
A monoid in the category of endofunctors
Hermann Weyl, "The Mathematical Way of Thinking":
Now comes the decisive step of mathematical abstraction: we forget what the symbols
stand for. We should not stop there; there are many operations we can apply to these
symbols without ever considering what they actually represent.
Philip Wadler
Group
What is a group (Group)?
(1) Closure: for any a, b ∈ G, a*b ∈ G
(2) Associativity: for any a, b, c ∈ G, (a*b)*c = a*(b*c)
(3) Identity: there is an identity e such that for any a ∈ G, e*a = a*e = a
(4) Inverse: for any a ∈ G there is an inverse a^-1 such that a^-1*a = a*a^-1 = e
What is a semigroup (SemiGroup)? A structure satisfying only (1) and (2).
What is a monoid (Monoid)? A structure satisfying (1), (2) and (3).
Monoid
Enough talk, show me the code
trait SemiGroup[T] {
  def append(a: T, b: T): T
}
trait Monoid[T] extends SemiGroup[T] {
  def zero: T
}
class ListMonoid[T] extends Monoid[List[T]] {
  def zero = Nil
  def append(a: List[T], b: List[T]) = a ++ b
}
Functor
What is a functor (Functor)?
(Diagram: a functor lifts Int to List[Int], and with it, functions on Int to functions on List[Int].)

Functor 
What is a functor (Functor)?
trait Functor[F[_]] { 
def map[A, B](fa: F[A], f: (A) => B): F[B] 
} 
map[B](f: (A) => B): List[B]   // compare: the map defined on List[A]
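A sketch instance of the trait above for List (our own recursive definition, purely for illustration):

object ListFunctor extends Functor[List] {
  def map[A, B](fa: List[A], f: A => B): List[B] = fa match {
    case Nil          => Nil
    case head :: tail => f(head) :: map(tail, f)
  }
}
ListFunctor.map(List(1, 2, 3), (x: Int) => x.toString)   // List("1", "2", "3")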
Scala+RDD
Monad
A monoid on endofunctors
Recall the monoid's identity element
Recall the fold function
What is the identity on endofunctors?
What is the associative operation on endofunctors?
unit x >>= f ≡ f x
m >>= unit ≡ m
(m >>= f) >>= g ≡ m >>= (λx. f x >>= g)
Identity: lifts a value into the computational context
Associativity: composes simple computations into complex ones
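Putting the two halves together, one conventional encoding (a sketch, not the only formulation) extends the Functor above with unit and flatMap, the >>= of the laws:

trait Monad[M[_]] extends Functor[M] {
  def unit[A](a: A): M[A]                           // identity: lift into the context
  def flatMap[A, B](ma: M[A], f: A => M[B]): M[B]   // >>= : chain computations
  def map[A, B](ma: M[A], f: A => B): M[B] =        // map falls out for free
    flatMap(ma, (a: A) => unit(f(a)))
}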
Some common monads
Option
Option (a.k.a. Maybe) represents a computation that may fail,
written as Some(value) or None
Some(x) flatMap (f: A => Option[B]) = f(x)
None flatMap (f: A => Option[B]) = None
unit = Some
val maybe: Option[Int] = Some(4) 
val none: Option[Int] = None 
def calculate(maybe: Option[Int]): 
Option[Int] = for { 
value <- maybe 
} yield value + 5 
calculate(maybe) 
calculate(none)

Some common monads
List
A collection is itself a proper type; as a monad it models nondeterminism
unit = List
val list1 = List(2, 4, 6, 8) 
val list2 = List(1, 3, 5, 7) 
for { 
value1 <- list1 
value2 <- list2 
} yield value1 + value2
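The for-comprehension above is only sugar; it desugars directly to flatMap and map:

list1.flatMap(value1 => list2.map(value2 => value1 + value2))
// List(3, 5, 7, 9, 5, 7, 9, 11, 7, 9, 11, 13, 9, 11, 13, 15)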
Scala+RDD
1 • Introduction
2 • MapReduce from an FP perspective
3 • RDDs from an FP perspective
4 • RDD
5 • MLlib
Spark
A general-purpose parallel computing framework
Spark vs. Map Reduce:
• Ecosystem: the Spark core platform is largely mature, but MLlib, Spark SQL, etc. are still evolving / MapReduce is very mature, with many applications
• Computation model: monadic-style, Functor / Map + Reduce
• Storage: mainly memory / mainly disk
• Programming style: collection-oriented / interface-oriented

Spark
(Stack diagram)
Libraries: Spark SQL, MLlib, GraphX, Spark Streaming
Core: Spark, a monadic Map/Reduce model
Cluster managers: local mode, standalone mode, YARN, Mesos
Storage: HDFS, Amazon S3, Hypertable, HBase, etc.
Pros
1. Collection-oriented, convenient to develop with
2. Supports more computation models than MapReduce
3. In-memory computation is faster, and persistence helps iteration;
when the data isn't all that "big", you can have "fast" as well
Cons
1. Memory is consumed quickly; consider a serialization library such as Kryo
2. Under lazy evaluation, computation time is hard to estimate,
which makes optimization harder
Word Count
Map Reduce
[(K1, V1)] -> [(K2, [V2])] -> [(K2, V3)]
[Line]
flatMap(_.split("\\s+")).map((_, 1))
[(Word, 1)] -> [(Word, [1])] -> [(Word, n)]
groupBy(_._1)
->
reduceBy(_._1)(_._2 + _._2)

RDD
Transformations:
map(f: T => U)
filter(f: T => Boolean)
flatMap(f: T => Seq[U])
sample(fraction: Float)
groupByKey()
reduceByKey(f: (V, V) => V)
mapValues(f: V => W)
union()
join()
cogroup()
crossProduct()
sort(c: Comparator[K])
partitionBy(p: Partitioner[K])
Actions (these produce something NEW, not another RDD):
count()
collect()
reduce(f: (T, T) => T)
lookup(k: K)
save(path: String)
take(n: Int)
Word Count
[(K1, V1)] -> [(K2, [V2])] -> [(K2, V3)]
lines = spark.textFile("hdfs://...")
words = lines.flatMap(_.split("\\s+"))
wordCounts = words.map((_, 1))
result = wordCounts.reduceByKey(_ + _)
result.save("hdfs://…")
RDD
What is an RDD?
Its characteristics:
• An immutable, partitioned collection
• Can only be created by reading a file or by a Transformation
• Fault-tolerant: lost partitions are recomputed from the lineage
• Controllable storage level:
new StorageLevel(useDisk, useMemory, deserialized, replication)
• Cacheable, via the cache() method
• A coarse-grained model
• Statically typed
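A brief usage sketch against the classic Spark API (sc is an existing SparkContext; the path is a placeholder), as mentioned in the list above:

import org.apache.spark.storage.StorageLevel

val lines   = sc.textFile("hdfs://...")         // created from external storage
val lengths = lines.map(_.length)               // a Transformation: nothing runs yet
lengths.persist(StorageLevel.MEMORY_AND_DISK)   // pick a storage level (cache() = MEMORY_ONLY)
lengths.count()                                 // the first Action computes and caches
lengths.collect()                               // later Actions reuse the cached partitions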
What is an RDD?
A lazy, parallel collection
• Lazy: each Transformation merely records which operation will be applied to the data
• The payoff of laziness: a single pass, with full information, enabling automatic batching
• Parallel: we place the data in a computational context,
and the context parallelizes the computation automatically
RDDs are collection-oriented

The RDD implementation
A five-tuple:
• Partitions: atomic pieces of data, e.g. HDFS blocks; they represent the data
• Preferred locations: where each partition can be accessed fastest
• Dependencies: links to the parent RDDs; a child is computed from its parents
• Computation: the function that, applied to a parent's data, yields this node's data
• Metadata: e.g. the node's address and its partitioning scheme
The RDD implementation
How is lazy computation represented?
The lazy computations we have seen so far are all linear, and can be written as
Map(+5) → Map(*7) → Filter(_ % 2 == 0) → Collect
But what about other computations?
The RDD implementation
How is lazy computation represented?
As a DAG, handled by topological sorting:
1. Trace back to the sources and start computing from there
2. Group operations that need no data shuffling into the same stage
The RDD implementation
Lineage
Lineage records the relationships between computations:
• Narrow dependencies: cheap, e.g. Map, Union.
One or more parent partitions feed exactly one child partition;
can be computed locally.
• Wide dependencies: expensive, e.g. GroupBy.
One parent partition feeds several child partitions;
requires shuffling.
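A sketch of the two dependency classes in code, assuming pairs: RDD[(String, Int)] (a hypothetical pair RDD):

val incremented = pairs.mapValues(_ + 1)    // narrow: no shuffle, computed locally
val grouped     = pairs.groupByKey()        // wide: triggers a shuffle
val counts      = pairs.reduceByKey(_ + _)  // also wide, but combines map-side first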

RDD execution
(Architecture diagram: the driver's SparkContext talks to a Cluster Manager,
which schedules Tasks onto Executors, each holding a local Cache.)
RDD execution
1. An RDD is created directly from an external data source (HDFS, local files, etc.)
2. The RDD goes through a series of TRANSFORMATIONs
3. An ACTION converts the final RDD and writes the output to an external data source
Along the way, partition tuning, closure shipping, data shuffling
and load balancing all happen automatically
MLlib
Classification
SVM with SGD, Naive Bayes, assorted decision trees
LabeledPoint(Double, Vector)
val data = sc.textFile("…")
val parsedData = data.map { line => 
val parts = line.split(' ') 
LabeledPoint(parts(0).toDouble, parts.tail.map(x => x.toDouble).toArray) 
} 
val numIterations = 20 
val model = SVMWithSGD.train(parsedData, numIterations) 
val labelAndPreds = parsedData.map { point => 
val prediction = model.predict(point.features) 
(point.label, prediction) 
} 
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / parsedData.count
MLlib
Regression: logistic regression, ridge regression and lasso regression
LabeledPoint(Double, Vector)
val data = sc.textFile("…")
val parsedData = data.map { line => 
val parts = line.split(',') 
LabeledPoint(parts(0).toDouble, parts(1).split(' ').map(x => x.toDouble).toArray) 
} 
val numIterations = 20 
val model = LinearRegressionWithSGD.train(parsedData, numIterations) 
val valuesAndPreds = parsedData.map { point => 
val prediction = model.predict(point.features) 
(point.label, prediction) 
} 
val MSE = valuesAndPreds.map{ case(v, p) => 
math.pow((v - p), 2)}.reduce(_ + _) / valuesAndPreds.count

MLlib
Clustering:
k-means and its variant k-means++
Vector
val data = sc.textFile("…")
val parsedData = data.map( _.split(' ').map(_.toDouble)) 
val numIterations = 20 
val numClusters = 2 
val clusters = KMeans.train(parsedData, numClusters, numIterations) 
val WSSSE = clusters.computeCost(parsedData)
MLlib
Collaborative Filtering
ALS, supporting both explicit and implicit feedback
Rating(Int, Int, Double)
val data = sc.textFile("…")
val ratings = data.map(_.split(',') match { 
case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble) 
}) 
val numIterations = 20 
val model = ALS.train(ratings, 1, numIterations, 0.01)
val usersProducts = ratings.map{ case Rating(user, product, rate) => (user, product)} 
val predictions = model.predict(usersProducts).map{ 
case Rating(user, product, rate) => ((user, product), rate) 
} 
val ratesAndPreds = ratings.map{ 
case Rating(user, product, rate) => ((user, product), rate) 
}.join(predictions) 
val MSE = ratesAndPreds.map{ 
case ((user, product), (r1, r2)) => math.pow((r1 - r2), 2) 
}.reduce(_ + _) / ratesAndPreds.count
THANKS
for your attention!
Happy
Hacking!
China Mobile
