APartSML(3)——复合类型

Posted Nov 24, 2024

By ZoeyWang

23 min read

APartSML(3)——复合类型

1.Conceptual Ways to Build New Types

前面介绍的各种基本类型，比如int，bool，char等，以及一些复合类型：tuple，lists, options等，接下来介绍更多创建复合类型的方式。

对于任何语言，有3种最重要的类型构建块：

Each of：一个t值包含t1,t2…tn的每个值
One of：一个t值包含t1,t2…tn的一个值
Self reference：一个t值指向其它的t值

许多数据类型都可以用这三种构建块描述，可能在不同语言中命名不同

构建ML的each of类型：records
构建ML的one of类型：patter-matching

2.Records

records是Each of类型，每个组成部分都是一个命名的字段

syntax：{f1 = v1, …, fn vn}
typing-check：{f1 : t1,…, fn:tn}

获取record中的某个值：

对于record e，#fieldname e获取e中fieldname字段值

  
- val x = {bar=(1+2, true andalso true), foo=3+4, baz=(false, 9)};
val x = {bar=(3,true),baz=(false,9),foo=7}
  : {bar:int * bool, baz:bool * int, foo:int}
- #bar x;
val it = (3,true) : int * bool

record和tuple不同的是，两种获取元素值，record通过name，tuple通过位置，这也是构建某些语法时大部分情况下都要决定的事，是通过位置(by position)还是通过名称(by name)来获取某些内容。

3.tuples as syntactic sugar

tuple实际上是record的一种特例，只是字段名称是1，2….

syntax：(e1, …, en)另一种完整写法是{1=e1, …, n=en}
typing-check：t1 * … * tn 另一种完整写法是{1:t1, … , n:tn}

但是{1=4, 2=7, 3=9}这种写法比较糟糕，因此提供了tuple这种类型，tuple是一种语法糖(syntactic sugar)，完全能实现record的功能，但是使语言看起来更简单，更容易实现

  
- val a = (3+1, 4+2);
val a = (4,6) : int * int
- val b = {2=5, 1=6};
val b = (6,5) : int * int
- #1 a + #2 b;
val it = 9 : int
- val y = {3="hi", 1=true, 2=3+2};
val y = (true,5,"hi") : bool * int * string

4.Datatype Bindings

bingds有三种：variable bindings，function bindings和datatype bindings。前面已经介绍过variable bindings，function bindings，接下来介绍datatype bindings。

假设想要定义一个新类型，可能是int * int，可能是string，也可能nothing，通过如下代码实现：

在环境中增加了一个新类型mytype
在环境中增加了3个构造器constructors：TwoInts，Str和Pizza

构造器有两种情况：

一个函数，用来创建新类型的值
就是代表新类型的值

TwoInts和Str都是属于第一种构造器，Pizza属于第二种构造器，调用这些构造器就可以创建属于mytype类型，包括三种可能类型

  
datatype mytype = TwoInts of int * int
                | Str of string
                | Pizza

val a = Str "hi";
val b = Str;
val c = Pizza;
val d = TwoInts(1+2, 3+4);
val e = a;

- use "datatype_bindings.sml";
[opening datatype_bindings.sml]
datatype mytype = Pizza | Str of string | TwoInts of int * int
val a = Str "hi" : mytype
val b = fn : string -> mytype
val c = Pizza : mytype
val d = TwoInts (3,7) : mytype
val e = Str "hi" : mytype
val it = () : unit

5.case expression

如果给定了一个类型为mytype的变量，怎么获取存储在里面的数据呢？在前面学习list，tuple时，这些类型都属于One of类型，提供了一些函数比如null，isSome来判断是什么类型的变量，hd,tl和valOf来获取其中的数据，对于自定义的复合类型，ML提供了更好的方法：case expression和pattern-matching

  
datatype mytype = TwoInts of int * int
                | Str of string
                | Pizza

fun f x =
    case x of
        Pizza => 3
      | Str s => 8
      | TwoInts(i1, i2) => i1 + i2;

- use "case.sml";
[opening case.sml]
datatype mytype = Pizza | Str of string | TwoInts of int * int
val f = fn : mytype -> int
val it = () : unit
- f (Str "hi");
val it = 8 : int
- f (TwoInts(7, 9));
val it = 16 : int

case expression比较像if else then表达式，每个分支branch都有一个形式：p=>e，其中p是一种模式(pattern)，e是一种表达式，p是用来和x进行匹配的(match)，因此这种case-expression被称为pattern-matching。每个模式都使用不同的构造器，模式匹配会选择正确的分支，评估e的值并返回。模式匹配的时候不仅仅会关注类型，有时也会用到携带的数据，比如TwoInts携带了两个值，评估e的时候取的这两个值的和。

6.useful datatypes

目前为止，主要讲了两个内容：each-of 类型的record和one-of类型的datatype，那么one-of类型有什么用处

（1）方便枚举一些固定的可选集：许多语言都支持枚举类型enumeration，用each-of类型可以把两者结合：suit * rank

  
datatype rank = Jack | Queen | King | Ace | Num of int;
datatype suit = Club | Diamond | Heart | Spade;

（2）用于不同情况有不同的数据：比如想要用学生id来唯一识别学生，但是有些情况下学习还没有id，那可能想用姓名识别

  
datatype id = StudentNum of int
            | Name of string * (string option) * string;

如果用each-of类型来表示，那么需要record来记录所有数据，但是实际上不是这些字短都需要，会造成空间的浪费，因此使用one-of类型更好

  
{ student_num : int option,
  first : string,
  middle : string option,
  last : string }

（3）基于self-reference的算术表达式：在定义datatype时可以进行自我引用，实现递归

  
datatype exp = Constant of int
             | Negate of exp
             | Add of exp * exp
             | Multiply of exp * exp;

fun eval e =
    case e of
        Constant i => i
      | Negate e2 => ~ (eval e2)
      | Add(e1, e2) => (eval e1) + (eval e2)
      | Multiply(e1, e2) => (eval e1) * (eval e2);

fun number_of_adds e =
    case e of
        Constant i => 0
      | Negate e2 => number_of_adds e2
      | Add(e1, e2) => 1 + number_of_adds e1 + number_of_adds e2
      | Multiply(e1, e2) => number_of_adds e1 + number_of_adds e2;

val example_exp : exp = Add(Constant 19, Negate (Constant 4));
val example_ans : int = eval example_exp;
val example_addcount = number_of_adds(Multiply(example_exp, example_exp));

-  use "useful.sml";
[opening useful.sml]
val eval = fn : exp -> int
val number_of_adds = fn : exp -> int
val example_exp = Add (Constant 19,Negate (Constant 4)) : exp
val example_ans = 15 : int
val example_addcount = 2 : int
val it = () : unitz

总结学习到的datatype和pattern-matching，datatype bindings如下：定义新类型t，每个构造器Ci时一个类型为ti->t的函数，如果没有ti，表示不携带内容

datatype t = C1 of t1 | C2 of t2 | ... | Cn of tn

使用case expression获取t的内容，pi是模式，找到与e匹配的pi，然后评估ei的值，就是整个case expression的结果

case e of p1 => e1 | p2 => e2 | ... | pn => en

7.another expression example

获取表达式中最大的Constant，好的代码：

使用case expression
使用局部帮助函数
避免重复的递归
使用库函数来获取更简单的解

  
datatype exp = Constant of int
             | Negate of exp
             | Add of exp * exp
             | Multiply of exp * exp;

fun max_constant e =
    let
    fun max_of_two(e1, e2) =
        Int.max(max_constant e1, max_constant e2)
    in
    case e of
    Constant i => i
      | Negate e2 => max_constant e2
      | Add(e1, e2) => max_of_two(e1, e2)
      | Multiply(e1, e2) => max_of_two(e1, e2)
    end;

val test_exp = Add(Constant 19, Negate(Constant 4));
val ans = max_constant test_exp;

8.type synonyms

类型同义词(type synonyms)是给类型取了另一个名字：type aname = t，任何情形下这两种类型都是可互换的，card是suit * rank的另一种命名，因此card -> int等同于suit * rank ->int

  
type card = suit * rank

9.lists and options are Datatypes

前面学习到list和option，以及一些方法null，hd，tl，SOME，NONE，实际上list和option也是数据类型，可以通过自己定义这个数据类型来实现

定义my_int_list数据类型：构造器包括空列表Empty和非空Cons列表

  
datatype my_int_list = Empty
                     |Cons of int * my_int_list;

定义一个变量，类型为my_int_list，用到Empty和Cons构造器

  
val one_two_three = Cons(1, Cons(2, Cons(3, Empty)));

把两个列表连接起来

  
fun append_mylist (xs, ys) =
    case xs of
        Empty => ys
      | Cons(x, xs') => Cons(x, append_mylist(xs', ys));

option类型

  
fun inc_or_zero intoption =
    case intoption of
        NONE => 0
      | SOME i => i+1;

10.the true of val-bindings

Every function in ML takes exactly one argument!

在ML里，函数的参数实际上只有一个，以元组形式存在，然后通过样式匹配来提取每个部分。在下面代码中，如果函数的参数传入的是3个值，返回的是一个元组，那么rotate_left(rotate_left triple)是有问题的，因为rotate_left triple返回元组，与参数要求不一样，但是由于在ML里函数的参数实际上只有一个，因此这样执行是可以的。

  
fun rotate_left (x, y, z) = (y, z, x);
fun rotate_right triple = rotate_left(rotate_left triple);

11.type inference

在前面的代码中，函数的参数要定义类型，这是为了方便进行type-check，但是如果我们使用模式匹配来获取tuple和record的值，而不是#，那么就不需要定义类型。ML进行type-check时可以进行类型推断(type inference)。编译器根据函数的结构和操作自动确定变量和表达式的类型，而不需要显式地声明类型

triple模式匹配成(x, y, z)，表明是一个三元组，而z + y + x表明是三个整数，因此类型推断为int * int * int -> int
也可以直接在参数传入时声明类型为int * int * int

  
fun sum_triple1 triple =
    case triple of
    (x, y, z) => z + y + x;

fun sum_triple2 triple =
    let val (x,y,z) = triple
    in
    x + y + z
    end;

fun sum_triple3 (x, y, z) =
    x + y + z;

fun sum_triple4 (triple: int * int * int) =
    #1 triple + #2 triple + #3 triple;

- use "type_inference.sml";
[opening type_inference.sml]
val sum_triple1 = fn : int * int * int -> int
val sum_triple2 = fn : int * int * int -> int
val sum_triple3 = fn : int * int * int -> int
val sum_triple4 = fn : int * int * int -> int

但是如果不声明类型，并且使用#就会出错，下面代码中，ML只能推断出triple前三个是int * int * int，但是完整的类型可能是int * int * int * string或者int * int * int * string * bool，甚至无限长度

  
fun sum_triple5 triple =
     #1 triple + #2 triple + #3 triple;

type_inference.sml:18.1-19.39 Error: unresolved flex record (need to know the names of ALL the fields
 in this context)
  type: {1:'Y[OL(+,+)], 2:'Y[OL(+,+)], 3:'Y[OL(+,+)]; 'Z}

在类型推断时，有时你可能会定义一个函数，按照你的理解，它只适用于某些特定类型的输入。然而，编译器在分析函数的行为时，可能会发现这个函数实际上适用于更广泛的类型。

  
fun partial_sum (x, y, z) = x + z;
val partial_sum = fn : int * 'a * int -> int

12.nested pattern

the semantics of pattern-matching is that the value being matched must have the same shape as the pattern and variables are bound to the right pieces.

pattern-matching is about taking a value and a pattern and (1) deciding if the pattern matches the value and (2) if so, binding variables to the right parts of the value.

如果一个构造器有多个参数，不需要写成C(x1, … , xn)，可以写成C x，实际上所有的构造器都只有0或者1个参数，C(x1, … , xn)是一种嵌套模式，(x1, … , xn)也是一种模式匹配所有带n个部分的元组。模式可以嵌套使用

假设xs匹配空列表[]，那么返回0，假设xs匹配非空列表，其中x::xs’表示一个非空列表，第一个元素是x，后面是xs’列表，可能为空，那么返回1+xs’的长度

  
fun len xs =
    case xs of
        [] => 0
      | x::xs' => 1 + len xs';

在上面代码中，x实际上没有被使用，因此可以用_表示，下划线匹配任何值v，并且没有绑定

  
fun len xs =
    case xs of
        [] => 0
      | _::xs' => 1 + len xs';

假设要写两个函数，zip和unzip，基于之前学的列表内容，可以使用null，hl和tl来实现

  
exception ListLengthMismatch;

fun old_zip3 (l1, l2, l3) =
    if null l1 andalso null l2 andalso null l3
    then []
    else if null l1 orelse null l2 orelse null l3
    then raise ListLengthMismatch
    else (hd l1, hd l2, hd l3)::old_zip3(tl l1, tl l2, tl l3);

但是如果不使用这些功能函数，使用case expression，需要列举所有的可能性

  
fun shallow_zip3 (l1, l2, l3) =
    case l1 of
    [] => (case l2 of
           [] => (case l3 of
                  [] => []
                | _ => raise ListLengthMismatch)
         | _ => raise ListLengthMismatch)
      | hd1::tl1 => (case l2 of
            [] => raise ListLengthMismatch
              | hd2::tl2 => (case l3 of
                     [] => raise ListLengthMismatch
                       | hd3::tl3 => (hd1,hd2,hd3)::shallow_zip3(tl1, tl2, tl3)));

一种更简洁的方式，在case expression里，用下划线代替所有其它的可能

  
fun zip3 list_triple =
    case list_triple of
    ([], [], []) => []
      | (hd1::tl1, hd2::tl2, hd3::tl3) => (hd1, hd2, hd3)::zip3(tl1, tl2, tl3)
      | _ => raise ListLengthMismatch;


fun unzip3 lst =
    case lst of
    [] => ([], [], [])
      | (a, b, c)::tl => let val (l1, l2, l3) = unzip3 tl
             in
                 (a::l1, b::l2, c::l3)
             end;


- zip3 ([1,2,3], [4,5,6], [7,8,9]);
val it = [(1,4,7),(2,5,8),(3,6,9)] : (int * int * int) list
- unzip3 [(1,4,7),(2,5,8),(3,6,9)];
val it = ([1,2,3],[4,5,6],[7,8,9]) : int list * int list * int list

还有其它更多例子，判断列表非递减，判断两个数的乘积符号等

  
fun nondecreasing intlist =
    case intlist of
    [] => true
      | _::[] => true
      | head::(neck::rest) => (head <= neck) andalso nondecreasing(neck::rest);

datatype sgn = P|N|Z;

fun multsign (x1, x2) =
    let fun sign x = if x = 0 then Z else if x>0 then P else N
    in
    case (sign x1, sign x2) of
        (Z, _) => Z
      | (_, Z) => Z
      | (P, P) => P
      | (N, N) => P
      | _      => N
    end;

13.Tail Recursion and Accumulators

尾部递归tail recursion,：用于在函数式编程语言中写出高效的递归函数

累计器accumulators：实现尾部递归的工具

举例：对一个列表求和，有两种写法

  
fun sum1 xs =
    case xs of
            [] => 0
      | i::xs' => i + sum1 xs';


fun sum2 xs =
    let fun f (xs, acc) =
        case xs of
                [] => acc
          | i::xs' => f(xs', acc+i)
    in
    f(xs, 0)
    end;

两种写法的功能是一样的

  
- use "tail.sml";
[opening tail.sml]
val sum1 = fn : int list -> int
val sum2 = fn : int list -> int
val it = () : unit
- sum1 [1, 2, 3];
val it = 6 : int
- sum2 [1, 2, 3];
val it = 6 : int

第二种写法似乎更复杂，但是实际上比第一种要更好。

函数的调用通过栈实现的(call stack)，里面存放了每个已经开始但是没有完成的函数调用。

对于sum1，调用栈里存放里i + sum1 xs’，只用sum1 xs’获取到下一个调用的返回结果，并且加上i，当前的调用才算完成，并且推出，因此栈的深度和列表长度一样。
对于sum2，调用栈存放f(xs’, acc+i)，当下一个调用结果返回后，当前调用的结果也会返回，也就死说，对于caller，只需要等callee的结果即可，此外没有任何其它操作。在这种情况下，可以进行优化，不需要存入所有的调用进入栈中，可以用callee的调用替代caller，因为我们想要的caller结果就是callee结果，栈的深度只有一个。累计器可以记录截止目前的callee的值

更多尾部递归的例子，斐波列契函数，经典的递归例子，两种写法的计算过程不一样

  
fun fact1 n =
    if n=0
    then 1
    else n * fact1(n-1);

fun fact2 n =
    let fun aux(n, acc) =
        case n of
        0 => acc
          | _ => aux(n-1, n*acc)
    in
    aux(n, 1)
    end;

- fact1 4;
val it = 24 : int  # 4*(3*(2*(1*1)))
- fact2 4;
val it = 24 : int  # (((1*4)*3)*2)*1

倒序列表，第一种写法复杂度为$n^2$，栈的深度为n，连接两个列表的复杂度为n，第二种写法采用了尾部递归和$::$

  
fun rev1 lst =
    case lst of
    [] => []
      | x::xs' => rev1(xs') @ [x];

fun rev2 lst =
    let fun aux(lst, acc) =
        case lst of
        [] => acc
          | x::xs => aux(xs, x::acc)
    in
    aux(lst, [])
    end;

尾部调用的定义：尾部位置的调用，即调用后没有其它操作

Coursera, ProgrammingLanguage

programming

This post is licensed under CC BY 4.0 by the author.