
 2022-08-12 17:00:46

Chapter 2

Linear Algebra

Linear algebra is a branch of mathematics that is widely used throughout science and engineering. Yet because linear algebra is a form of continuous rather than discrete mathematics, many computer scientists have little experience with it. A good understanding of linear algebra is essential for understanding and working with many machine learning algorithms, especially deep learning algorithms. We therefore precede our introduction to deep learning with a focused presentation of the key linear algebra prerequisites.

If you are already familiar with linear algebra, feel free to skip this chapter. If you have previous experience with these concepts but need a detailed reference sheet to review key formulas, we recommend The Matrix Cookbook (Petersen and Pedersen, 2006). If you have had no exposure at all to linear algebra, this chapter will teach you enough to read this book, but we highly recommend that you also consult another resource focused exclusively on teaching linear algebra, such as Shilov (1977). This chapter completely omits many important linear algebra topics that are not essential for understanding deep learning.

2.1 Scalars, Vectors, Matrices and Tensors

The study of linear algebra involves several types of mathematical objects:

• Scalars: A scalar is just a single number, in contrast to most of the other objects studied in linear algebra, which are usually arrays of multiple numbers. We write scalars in italics. We usually give scalars lowercase variable names. When we introduce them, we specify what kind of number they are. For example, we might say “Let $s \in \mathbb{R}$ be the slope of the line,” while defining a real-valued scalar, or “Let $n \in \mathbb{N}$ be the number of units,” while defining a natural number scalar.

• Vectors: A vector is an array of numbers. The numbers are arranged in order. We can identify each individual number by its index in that ordering. Typically we give vectors lowercase names in bold typeface, such as $\boldsymbol{x}$. The elements of the vector are identified by writing its name in italic typeface, with a subscript. The first element of $\boldsymbol{x}$ is $x_1$, the second element is $x_2$, and so on. We also need to say what kind of numbers are stored in the vector. If each element is in $\mathbb{R}$, and the vector has $n$ elements, then the vector lies in the set formed by taking the Cartesian product of $\mathbb{R}$ $n$ times, denoted as $\mathbb{R}^n$. When we need to explicitly identify the elements of a vector, we write them as a column enclosed in square brackets:

$$\boldsymbol{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. \tag{2.1}$$

We can think of vectors as identifying points in space, with each element giving the coordinate along a different axis.

Sometimes we need to index a set of elements of a vector. In this case, we define a set containing the indices and write the set as a subscript. For example, to access $x_1$, $x_3$ and $x_6$, we define the set $S = \{1, 3, 6\}$ and write $\boldsymbol{x}_S$. We use the − sign to index the complement of a set. For example, $\boldsymbol{x}_{-1}$ is the vector containing all elements of $\boldsymbol{x}$ except for $x_1$, and $\boldsymbol{x}_{-S}$ is the vector containing all elements of $\boldsymbol{x}$ except for $x_1$, $x_3$ and $x_6$.

• Matrices: A matrix is a 2-D array of numbers, so each element is identified by two indices instead of just one. We usually give matrices uppercase variable names with bold typeface, such as $\boldsymbol{A}$. If a real-valued matrix $\boldsymbol{A}$ has a height of $m$ and a width of $n$, then we say that $\boldsymbol{A} \in \mathbb{R}^{m \times n}$. We usually identify the elements of a matrix using its name in italic but not bold font, and the indices are listed with separating commas. For example, $A_{1,1}$ is the upper left entry of $\boldsymbol{A}$ and $A_{m,n}$ is the bottom right entry. We can identify all the numbers with vertical coordinate $i$ by writing a “:” for the horizontal coordinate. For example, $\boldsymbol{A}_{i,:}$ denotes the horizontal cross section of $\boldsymbol{A}$ with vertical coordinate $i$. This is known as the $i$-th row of $\boldsymbol{A}$. Likewise, $\boldsymbol{A}_{:,i}$ is the $i$-th column of $\boldsymbol{A}$. When we need to explicitly identify the elements of a matrix, we write them as an array enclosed in square brackets:

$$\begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}. \tag{2.2}$$

Sometimes we may need to index matrix-valued expressions that are not just a single letter. In this case, we use subscripts after the expression but do not convert anything to lowercase. For example, $f(\boldsymbol{A})_{i,j}$ gives element $(i, j)$ of the matrix computed by applying the function $f$ to $\boldsymbol{A}$.

• Tensors: In some cases we will need an array with more than two axes. In the general case, an array of numbers arranged on a regular grid with a variable number of axes is known as a tensor. We denote a tensor named “A” with this typeface: $\mathsf{A}$. We identify the element of $\mathsf{A}$ at coordinates $(i, j, k)$ by writing $\mathsf{A}_{i,j,k}$.
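The four kinds of objects above, and the indexing notation for them, can be sketched in NumPy. This is only an illustration under our own assumptions: the variable names and values are made up, and NumPy counts indices from 0, so the text's $x_1$ corresponds to `x[0]`.

```python
import numpy as np

s = 2.5                                        # scalar
x = np.array([10., 20., 30., 40., 50., 60.])   # vector in R^6
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])                   # matrix in R^{2x3}
T = np.zeros((2, 3, 4))                        # tensor with three axes

# The text's S = {1, 3, 6} becomes [0, 2, 5] with 0-based indexing.
S = [0, 2, 5]
x_S = x[S]                      # elements x_1, x_3, x_6
x_minus_1 = np.delete(x, 0)     # every element except x_1
x_minus_S = np.delete(x, S)     # every element except those indexed by S

row_1 = A[0, :]                 # A_{1,:}, the first row
col_3 = A[:, 2]                 # A_{:,3}, the third column
element = T[1, 2, 3]            # one tensor element, addressed by three indices
```

NumPy has no direct syntax for the complement notation $\boldsymbol{x}_{-S}$ (a negative index there means counting from the end), which is why the sketch uses `np.delete`.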

One important operation on matrices is the transpose. The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal, running down and to the right, starting from its upper left corner. See figure 2.1 for a graphical depiction of this operation. We denote the transpose of a matrix $\boldsymbol{A}$ as $\boldsymbol{A}^\top$, and it is defined such that

$$(\boldsymbol{A}^\top)_{i,j} = A_{j,i}.$$


Vectors can be thought of as matrices that contain only one column. The transpose of a vector is therefore a matrix with only one row. Sometimes we define a vector by writing out its elements in the text inline as a row matrix, then using the transpose operator to turn it into a standard column vector, for example $\boldsymbol{x} = [x_1, x_2, x_3]^\top$.

Figure 2.1: The transpose of the matrix can be thought of as a mirror image across the main diagonal.

A scalar can be thought of as a matrix with only a single entry. From this, we can see that a scalar is its own transpose: $a = a^\top$.
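As a small illustration of the transpose, here is a NumPy sketch with made-up values. It checks the defining property entry by entry and builds a column vector from an inline row matrix.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # shape (3, 2)
A_T = A.T                       # shape (2, 3)

# Defining property of the transpose: (A^T)_{i,j} = A_{j,i}.
matches_definition = all(
    A_T[i, j] == A[j, i]
    for i in range(A_T.shape[0])
    for j in range(A_T.shape[1])
)

# A vector written inline as a row matrix, then transposed into a column.
x = np.array([[1, 2, 3]]).T     # shape (3, 1)
```

Note that a 1-D NumPy array has no row or column orientation, so transposing it has no effect; the sketch uses a 2-D array with one row to mirror the text.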

We can add matrices to each other, as long as they have the same shape, just by adding their corresponding elements: $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{B}$ where $C_{i,j} = A_{i,j} + B_{i,j}$.

We can also add a scalar to a matrix or multiply a matrix by a scalar, just by performing that operation on each element of the matrix: $\boldsymbol{D} = a \cdot \boldsymbol{B} + c$ where $D_{i,j} = a \cdot B_{i,j} + c$.

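In the context of deep learning, we also use some less conventional notation. We allow the addition of a matrix and a vector, yielding another matrix: $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{b}$, where $C_{i,j} = A_{i,j} + b_j$. In other words, the vector $\boldsymbol{b}$ is added to each row of the matrix. This shorthand eliminates the need to define a matrix with $\boldsymbol{b}$ copied into each row before doing the addition. This implicit copying of $\boldsymbol{b}$ to many locations is called broadcasting.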
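NumPy follows the same convention, so the rule $C_{i,j} = A_{i,j} + b_j$ can be checked directly. The sketch below uses made-up values and compares the broadcast sum with the explicit version, in which the vector is first copied into every row.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
b = np.array([10., 20., 30.])

# Broadcasting: b is implicitly added to each row of A.
C = A + b

# The same result with the copy made explicit.
b_copied = np.tile(b, (A.shape[0], 1))   # shape (2, 3), each row equals b
C_explicit = A + b_copied

# Scalar operations are also applied to every element.
D = 2.0 * A + 1.0
```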


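2.2 Multiplying Matrices and Vectors

One of the most important operations involving matrices is the multiplication of two matrices. The matrix product of matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ is a third matrix $\boldsymbol{C}$. In order for this product to be defined, $\boldsymbol{A}$ must have the same number of columns as $\boldsymbol{B}$ has rows. If $\boldsymbol{A}$ is of shape $m \times n$ and $\boldsymbol{B}$ is of shape $n \times p$, then $\boldsymbol{C}$ is of shape $m \times p$. We can write the matrix product just by placing two or more matrices together, for example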

$$\boldsymbol{C} = \boldsymbol{A}\boldsymbol{B}. \tag{2.4}$$



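Note that the standard product of two matrices is not just a matrix containing the product of the individual elements. Such an operation exists and is called the element-wise product, or Hadamard product, and is denoted as $\boldsymbol{A} \odot \boldsymbol{B}$.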
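The difference between the two products is easy to see numerically. The sketch below assumes the standard definition of the matrix product, $C_{i,j} = \sum_k A_{i,k} B_{k,j}$, and spells it out in a helper we call `matmul_by_definition` (a hypothetical name, written only for illustration) next to NumPy's built-in operators.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Matrix product computed entry by entry: C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n_rows_B, p = B.shape
    if n != n_rows_B:
        raise ValueError("A must have as many columns as B has rows")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

product = A @ B        # standard matrix product
hadamard = A * B       # element-wise (Hadamard) product
by_hand = matmul_by_definition(A, B)
```

In practice one would use `@` rather than explicit loops; the loop version is there only to make the definition visible.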






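Unlike scalar multiplication, matrix multiplication is not commutative (the condition $\boldsymbol{A}\boldsymbol{B} = \boldsymbol{B}\boldsymbol{A}$ does not always hold). However, the dot product between two vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ of the same dimensionality, written $\boldsymbol{x}^\top \boldsymbol{y}$, is commutative:

$$\boldsymbol{x}^\top \boldsymbol{y} = \boldsymbol{y}^\top \boldsymbol{x}. \tag{2.8}$$

The transpose of a matrix product has a simple form: $(\boldsymbol{A}\boldsymbol{B})^\top = \boldsymbol{B}^\top \boldsymbol{A}^\top$. This allows us to demonstrate equation 2.8 by exploiting the fact that the value of such a product is a scalar and therefore equal to its own transpose:

$$\boldsymbol{x}^\top \boldsymbol{y} = (\boldsymbol{x}^\top \boldsymbol{y})^\top = \boldsymbol{y}^\top \boldsymbol{x}.$$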
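These properties can be checked on a small example. The matrices and vectors below are made-up values chosen so that the two orderings of the matrix product visibly differ.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 0.]])

AB = A @ B      # swaps the columns of A
BA = B @ A      # swaps the rows of A
matrices_commute = np.array_equal(AB, BA)

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])
xy = x @ y      # dot product x^T y
yx = y @ x      # dot product y^T x

# Transpose of a product reverses the order of the factors.
transpose_rule_holds = np.array_equal((A @ B).T, B.T @ A.T)
```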





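A system of linear equations can be written as

$$\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}, \tag{2.11}$$

where $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ is a known matrix, $\boldsymbol{b} \in \mathbb{R}^m$ is a known vector, and $\boldsymbol{x} \in \mathbb{R}^n$ is a vector of unknown variables we would like to solve for. Each element $x_i$ of $\boldsymbol{x}$ is one of these unknown variables. Each row of $\boldsymbol{A}$ and each element of $\boldsymbol{b}$ provide another constraint. We can rewrite equation 2.11 as:

$$\begin{aligned} \boldsymbol{A}_{1,:}\boldsymbol{x} &= b_1 \\ \boldsymbol{A}_{2,:}\boldsymbol{x} &= b_2 \\ &\;\;\vdots \\ \boldsymbol{A}_{m,:}\boldsymbol{x} &= b_m \end{aligned}$$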











2.3 Identity and Inverse Matrices


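To describe matrix inversion, we first need to define the concept of an identity matrix. An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix. We denote the identity matrix that preserves $n$-dimensional vectors as $\boldsymbol{I}_n$. Formally, $\boldsymbol{I}_n \in \mathbb{R}^{n \times n}$, and $\forall \boldsymbol{x} \in \mathbb{R}^n, \boldsymbol{I}_n \boldsymbol{x} = \boldsymbol{x}$.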


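The structure of the identity matrix is simple: all the entries along the main diagonal are 1, while all the other entries are zero. See figure 2.2 for an example.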
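A minimal NumPy sketch of the identity matrix, using an arbitrary example vector, shows both its structure and the property that defines it.

```python
import numpy as np

I3 = np.eye(3)                  # ones on the main diagonal, zeros elsewhere
x = np.array([4., -1., 7.])     # arbitrary example vector

unchanged = I3 @ x              # multiplying by the identity leaves x as it was
```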

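The matrix inverse of $\boldsymbol{A}$ is denoted as $\boldsymbol{A}^{-1}$, and it is defined as the matrix such that

$$\boldsymbol{A}^{-1}\boldsymbol{A} = \boldsymbol{I}_n.$$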


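We can now solve equation 2.11 using the following steps:

$$\begin{aligned} \boldsymbol{A}\boldsymbol{x} &= \boldsymbol{b} \\ \boldsymbol{A}^{-1}\boldsymbol{A}\boldsymbol{x} &= \boldsymbol{A}^{-1}\boldsymbol{b} \\ \boldsymbol{I}_n\boldsymbol{x} &= \boldsymbol{A}^{-1}\boldsymbol{b} \\ \boldsymbol{x} &= \boldsymbol{A}^{-1}\boldsymbol{b}. \end{aligned}$$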




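Figure 2.2: Example identity matrix: this is $\boldsymbol{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.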


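Of course, this process depends on it being possible to find $\boldsymbol{A}^{-1}$. We discuss the conditions for the existence of $\boldsymbol{A}^{-1}$ in the following section.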

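When $\boldsymbol{A}^{-1}$ exists, several different algorithms can find it in closed form. In theory, the same inverse matrix can then be used to solve the equation many times for different values of $\boldsymbol{b}$. However, $\boldsymbol{A}^{-1}$ is primarily useful as a theoretical tool and should not actually be used in practice for most software applications. Because $\boldsymbol{A}^{-1}$ can be represented with only limited precision on a digital computer, algorithms that make use of the value of $\boldsymbol{b}$ can usually obtain more accurate estimates of $\boldsymbol{x}$.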
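The contrast between the two approaches can be sketched in NumPy. The system below is a made-up 2×2 example; `np.linalg.inv` forms the inverse explicitly, while `np.linalg.solve` works from $\boldsymbol{A}$ and $\boldsymbol{b}$ directly without ever forming $\boldsymbol{A}^{-1}$. On a small, well-conditioned system like this one the two agree, so the sketch shows the mechanics rather than the accuracy difference.

```python
import numpy as np

# Example system: 2*x1 + x2 = 3 and x1 + 3*x2 = 5.
A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

# Textbook route: x = A^{-1} b.
A_inv = np.linalg.inv(A)
x_via_inverse = A_inv @ b

# Route usually preferred in software: solve using A and b directly.
x_via_solve = np.linalg.solve(A, b)

# Check that the inverse satisfies its definition and that x solves the system.
inverse_is_valid = np.allclose(A_inv @ A, np.eye(2))
residual = A @ x_via_solve - b
```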

2.4 Linear Dependence and Span

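For $\boldsymbol{A}^{-1}$ to exist, equation 2.11 must have exactly one solution for every value of $\boldsymbol{b}$. It is also possible for the system of equations to have no solutions or infinitely many solutions for some values of $\boldsymbol{b}$. It is not possible, however, to have more than one but fewer than infinitely many solutions for a particular $\boldsymbol{b}$; if both $\boldsymbol{x}$ and $\boldsymbol{y}$ are solutions, then

$$\boldsymbol{z} = \alpha \boldsymbol{x} + (1 - \alpha) \boldsymbol{y}$$

is also a solution for any real $\alpha$.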



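To analyze how many solutions the equation has, think of the columns of $\boldsymbol{A}$ as specifying different directions we can travel in from the origin (the point specified by the vector of all zeros), then determine how many ways there are of reaching $\boldsymbol{b}$. In this view, each element of $\boldsymbol{x}$ specifies how far we should travel in each of these directions, with $x_i$ specifying how far to move in the direction of column $i$:

$$\boldsymbol{A}\boldsymbol{x} = \sum_i x_i \boldsymbol{A}_{:,i}.$$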


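In general, this kind of operation is called a linear combination. Formally, a linear combination of some set of vectors $\{\boldsymbol{v}^{(1)}, \dots, \boldsymbol{v}^{(n)}\}$ is given by multiplying each vector $\boldsymbol{v}^{(i)}$ by a corresponding scalar coefficient and adding the results:

$$\sum_i c_i \boldsymbol{v}^{(i)}.$$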
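The column view of the matrix-vector product can be verified numerically. In this sketch, with made-up values, the product is computed once with the built-in operator and once as a linear combination of the columns of the matrix, with the elements of the vector as coefficients.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([2., -1.])

# Built-in matrix-vector product.
Ax = A @ x

# The same vector as a linear combination of the columns of A:
# x[0] times the first column plus x[1] times the second column.
combination = sum(x[i] * A[:, i] for i in range(A.shape[1]))
```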



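The span of a set of vectors is the set of all points obtainable by linear combination of the original vectors. Determining whether $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ has a solution thus amounts to testing whether $\boldsymbol{b}$ is in the span of the columns of $\boldsymbol{A}$. This particular span is known as the column space, or the range, of $\boldsymbol{A}$.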

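In order for the system $\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ to have a solution for all values of $\boldsymbol{b} \in \mathbb{R}^m$, we therefore require that the column space of $\boldsymbol{A}$ be all of $\mathbb{R}^m$. If any point in $\mathbb{R}^m$ is excluded from the column space, that point is a potential value of $\boldsymbol{b}$ that has no solution. The requirement that the column space of $\boldsymbol{A}$ be all of $\mathbb{R}^m$ implies that $\boldsymbol{A}$ must have at least $m$ columns, that is, $n \geq m$. For example, consider a 3 × 2 matrix. The target $\boldsymbol{b}$ is 3-D, but $\boldsymbol{x}$ is only 2-D, so modifying the value of $\boldsymbol{x}$ at best lets us trace out a 2-D plane within $\mathbb{R}^3$. The equation has a solution if and only if $\boldsymbol{b}$ lies on that plane.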

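Having $n \geq m$ is only a necessary condition for every point to have a solution. It is not a sufficient condition, because some of the columns may be redundant. Consider a 2 × 2 matrix where both of the columns are identical. This has the same column space as a 2 × 1 matrix containing only one copy of the replicated column. In other words, the column space is still just a line and fails to encompass all of $\mathbb{R}^2$, even though there are two columns.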
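The 2 × 2 example with identical columns can be explored numerically. The sketch below uses made-up values; `in_column_space` is a hypothetical helper of our own that tests membership by computing a least-squares solution and checking whether it reproduces the target exactly (up to a tolerance).

```python
import numpy as np

def in_column_space(A, b, tol=1e-10):
    """Return True if some x satisfies A x = b (up to tolerance)."""
    # lstsq finds the x minimizing ||A x - b||; b lies in the column
    # space exactly when that minimum residual is zero.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return bool(np.allclose(A @ x, b, atol=tol))

# Two identical columns: the column space is only a line in R^2.
A = np.array([[1., 1.],
              [2., 2.]])
rank = np.linalg.matrix_rank(A)     # number of linearly independent columns

b_on_line = np.array([3., 6.])      # a multiple of the repeated column
b_off_line = np.array([1., 0.])     # not a multiple of the repeated column

reachable = in_column_space(A, b_on_line)
unreachable = in_column_space(A, b_off_line)
```

Here `rank` comes out smaller than the number of columns, which is the numerical signature of the redundancy described above.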

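Formally, this kind of redundancy is known as linear dependence. A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors. If we add a vector to a set that is a linear combination of the other vectors in the set, the new vector does not add any points to the set's span. This means that for the column space of the matrix to encompass all of $\mathbb{R}^m$, the matrix must contain at least one set of $m$ linearly independent columns. This condition is both necessary and sufficient for equation 2.11 to have a solution for every value of $\boldsymbol{b}$. Note that the requirement is for a set to have exactly $m$ linearly independent columns, not at least $m$. No set of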



