We will make use of the covariance and variance matrices, which we define in terms of the expected value: Cov(x) = E[(x − E[x])(x − E[x])^T], with the variances on the diagonal. We also make use of so-called GIID matrices. These are matrices with independent and identically distributed draws from the Gaussian distribution. We assume real matrices, although most of the techniques we describe extend to the complex case.
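As an illustration, the following sketch (assuming NumPy; the variable names are ours) draws a GIID matrix and checks that the sample covariance of its rows is close to the identity, as expected for i.i.d. standard Gaussian entries:

```python
import numpy as np

rng = np.random.default_rng(0)

# GIID matrix: independent, identically distributed N(0, 1) entries.
m, n = 2000, 3
G = rng.standard_normal((m, n))

# Treating each row as one draw of an n-dimensional random vector, the
# sample covariance should be close to the identity matrix.
C = np.cov(G, rowvar=False)
print(np.round(C, 2))
```

The deviation of the off-diagonal entries from zero shrinks like 1/sqrt(m) as the number of samples grows.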
This is known as the economic form of the SVD [1]. In this case, the numerical rank of A may be smaller than the actual rank due to the use of finite precision arithmetic. The factorization (1) can then be written in permuted form. The permutation matrix P does not need to be explicitly formed; instead, the index vector I carries the permutation information. Many matrices arising in applications are sparse. The simplest sparse format is the so-called coordinate (COO) sparse format, common to many software packages. In this format, we store a row index, a column index, and a floating point value for each nonzero of the sparse matrix A: a set of triplets of the form (i, j, v).
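A minimal sketch of the triplet storage, assuming SciPy's `coo_matrix` and a small hypothetical 3 x 3 matrix:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Triplets (i, j, v): one per nonzero of the 3 x 3 example matrix
#     [[4, 0, 9],
#      [0, 7, 0],
#      [0, 0, 5]]
i = np.array([0, 0, 1, 2])
j = np.array([0, 2, 1, 2])
v = np.array([4.0, 9.0, 7.0, 5.0])

A = coo_matrix((v, (i, j)), shape=(3, 3))
print(A.toarray())
```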
However, we do not need to store all the row and column indices of the nonzero elements. Below, we summarize the two commonly used sparse formats for an example matrix.
In the compressed column format, the dc array stores the nonzero elements, scanned column by column. The array ic stores the row index of the corresponding data element, and the array pc stores the index of the start of each column in the data array dc. Similarly, for the compressed row format, the nonzeros are stored row by row in dr, all the column indices of the nonzeros are given, and the row information is compressed by giving, in pr, the index of the start of each row in dr.
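For concreteness, here is how the compressed arrays look in SciPy for the same small example matrix; SciPy's names `data`, `indices`, and `indptr` play the roles of the dc/dr, ic, and pc/pr arrays described above:

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

A = np.array([[4.0, 0.0, 9.0],
              [0.0, 7.0, 0.0],
              [0.0, 0.0, 5.0]])

# Compressed row format (CSR).
R = csr_matrix(A)
print(R.data)     # nonzeros scanned row by row:      [4. 9. 7. 5.]
print(R.indices)  # column index of each nonzero:     [0 2 1 2]
print(R.indptr)   # start of each row in data:        [0 2 3 4]

# Compressed column format (CSC).
C = csc_matrix(A)
print(C.data)     # nonzeros scanned column by column: [4. 7. 9. 5.]
print(C.indices)  # row index of each nonzero:         [0 1 0 2]
print(C.indptr)   # start of each column in data:      [0 1 2 4]
```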
Moreover, if needed, the three vectors of the sparse representation above can be compressed further. BLAS operations on sparse matrices can be performed directly in these storage formats. For relatively small systems, the construction of such solutions is often preferred over least squares formulations when a solution is known to exist. Factorizations such as QR or LU with column pivoting can be applied to system solves involving rank-deficient matrices. Notice that multiplying by P can be done efficiently, simply by re-arranging the elements of y.
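A tiny sketch of this last point (the permutation and vector here are hypothetical): applying P reduces to indexing with the permutation vector, with no matrix product needed.

```python
import numpy as np

# If perm encodes the permutation, P @ y is just y[perm].
perm = np.array([2, 0, 1])          # hypothetical permutation vector
P = np.eye(3)[perm]                 # explicit P, for comparison only
y = np.array([10.0, 20.0, 30.0])

print(P @ y)        # [30. 10. 20.]
print(y[perm])      # [30. 10. 20.] -- same result, no matrix product
```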
The implementations of the back substitution and forward substitution algorithms are given below. Prior to discussing two-norm regularization, more commonly known as Tikhonov regularization, we recall the least squares problem. Let us now look at the solution of (4) in more detail. As the minimization problem is convex and the functional quadratic, we obtain the minimum by setting the gradient of the functional to zero. When A is rank deficient, a common choice among the solutions of the resulting quadratic equation is the minimum-norm one. Typically, the least squares problem is solved by an iterative method such as conjugate gradients (CG) or related techniques such as LSQR.
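A minimal NumPy sketch of the two triangular solves (our own unoptimized version, assuming nonzero diagonal entries):

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L, one row at a time from the top."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def back_substitution(U, b):
    """Solve U x = b for upper-triangular U, one row at a time from the bottom."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```

Both loops cost O(n^2) operations, since each row solve touches only the already-computed entries of x.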
Suppose the noise vector e behaves like white noise. For these reasons, adding additional (regularization) terms to the optimization problem is often necessary.
Having discussed the least squares approach, we turn our attention to the simplest form of Tikhonov regularization: min_x ||Ax − b||^2 + lambda^2 ||x||^2. Since the functional is convex, we can get the solution by again setting the gradient to zero: (A^T A + lambda^2 I) x = A^T b. If we now compute the norm of the solution variance, we find that each factor 1/sigma_i appearing in the unregularized solution is replaced by sigma_i / (sigma_i^2 + lambda^2), which remains bounded as sigma_i tends to zero.
Thus, the solution of the system in (8), even if obtained from a CG-type scheme after a finite number of iterations, relieves the problems due to small singular values and noise which affect the solution from (5). In practice, a slight generalization of (7) is used by introducing a regularization matrix L: min_x ||Ax − b||^2 + lambda^2 ||Lx||^2. Generally, L is some kind of sharpening (difference) operator.
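A small sketch of the Tikhonov solve (assuming NumPy; the matrix and parameter value are arbitrary), showing that the normal equations and the equivalent augmented least squares formulation agree:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
lam = 0.5  # regularization parameter, assumed given

# Normal equations form: (A^T A + lam^2 I) x = A^T b.
x_normal = np.linalg.solve(A.T @ A + lam**2 * np.eye(10), A.T @ b)

# Equivalent augmented least squares form: min || [A; lam*I] x - [b; 0] ||.
A_aug = np.vstack([A, lam * np.eye(10)])
b_aug = np.concatenate([b, np.zeros(10)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(np.allclose(x_normal, x_aug))
```

The augmented form avoids forming A^T A explicitly, which is preferable when A is ill-conditioned.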
The penalization of the term Lx thus results in smoothing. If the coordinate system of x is one-dimensional, L could take the form of a first-difference operator with rows of the form (−1, 1). When the coordinate system corresponding to the model vector x is higher dimensional, L will be correspondingly more complicated and will generally have block structure. Note that the solution to (9) can be obtained through the linear equations (A^T A + lambda^2 L^T L) x = A^T b. This is an important point with regard to implementation, as it means that codes which solve the normal equations for standard least squares can readily be modified to solve (9).
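A sketch of this modification (assuming NumPy; sizes and data are arbitrary), using the one-dimensional first-difference operator as L:

```python
import numpy as np

n = 8
# One-dimensional first-difference operator with rows (-1, 1), shape (n-1, n).
L = np.diff(np.eye(n), axis=0)

rng = np.random.default_rng(2)
A = rng.standard_normal((12, n))
b = rng.standard_normal(12)
lam = 1.0

# Solution of min ||Ax - b||^2 + lam^2 ||Lx||^2 via its normal equations.
x = np.linalg.solve(A.T @ A + lam**2 * (L.T @ L), A.T @ b)
```

Only the system matrix changes relative to the standard Tikhonov solve; the right-hand side A^T b is identical.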
In many applications, a sparse solution is sought, where a significant percentage of the coefficients of x are zero. This sparsity measure, the count of nonzero entries, is not a norm, as it does not satisfy, e.g., absolute homogeneity. In practice, a nonrandom A from a physical application would typically not satisfy the RIP conditions.
To account for the possibility of employing a sparsity-promoting penalty, and also for a more general treatment of the residual term, which we discuss further below, we consider the two-parameter functional of [4]. The soft thresholding function S_lambda(x) = sign(x) max(|x| − lambda, 0) satisfies a useful identity: it is the proximal operator of the scaled absolute value, i.e., S_lambda(x) minimizes (1/2)(y − x)^2 + lambda |y| over y. The spectral norm of a matrix A can easily be estimated using so-called power iteration [1]. Let us assume that A is square. A simple computation yields the dominant eigenvalue.
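A minimal sketch of spectral norm estimation by power iteration (our own version; it iterates with A^T A, so it also covers rectangular A):

```python
import numpy as np

def spectral_norm_estimate(A, iters=100, seed=0):
    """Estimate ||A||_2 by power iteration on A^T A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    for _ in range(iters):
        y = A.T @ (A @ x)          # one application of A^T A
        x = y / np.linalg.norm(y)  # renormalize to avoid overflow
    # Rayleigh quotient: dominant eigenvalue of A^T A is sigma_max^2.
    return np.sqrt(x @ (A.T @ (A @ x)))

A = np.diag([3.0, 1.0, 0.5])
print(spectral_norm_estimate(A))  # close to 3.0
```

Convergence is geometric with rate given by the ratio of the two largest singular values, so a modest number of matrix-vector products usually suffices.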
This algorithm is simple to implement and readily adapts to parallel architectures. Two possible approaches are either to vary the thresholding, starting from soft thresholding and slowly approaching the discontinuous hard thresholding function, or to use a function which better mimics hard thresholding away from zero. Note that using a different thresholding function alters the optimization problem being solved. Thresholding-based techniques are simple to implement but are not effective in all situations, particularly when only a few iterations are feasible (for example, when A is large).
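As one concrete instance of such a thresholding-based scheme, here is a sketch of the classical iterative soft thresholding iteration (ISTA) for the l1-penalized least squares problem; the scaling of the objective and the step size rule are our assumptions:

```python
import numpy as np

def soft_threshold(x, t):
    """S_t(x) = sign(x) * max(|x| - t, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, iters=500):
    """Iterative soft thresholding for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size <= 1 / ||A||_2^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        # Gradient step on the smooth part, then the soft thresholding prox.
        x = soft_threshold(x - step * (A.T @ (A @ x - b)), step * lam)
    return x
```

For A = I the iteration reduces to a single soft thresholding of b, which makes the shrinkage effect of the penalty easy to see.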
In this case, two interesting approaches are iteratively re-weighted least squares (IRLS) and convolution smoothing [4]. Both techniques replace the nonsmooth part of the functional (namely, the absolute value |x|) by a smooth approximation. Moreover, both techniques have the particular advantage of being able to employ gradient-based methods such as CG at each iteration, considerably improving the per-iteration performance. The IRLS approach is based on the approximation |x| ≈ x^2 / max(|x̃|, ε), where x̃ is the previous iterate and ε a small safeguard. The resulting algorithm [4] for the minimization of (11) can be written as a sequence of weighted least squares solves.
The diagonal matrices (or simply the vectors holding the diagonal elements) are updated at each iteration, and the system in (6) is solved anew. Another advantage of the IRLS approach is that the powers p can be made component dependent. This then allows for better inversion of partially sparse signals if, of course, the location of the sparse part can be estimated with some accuracy.
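A compact sketch of the IRLS iteration for the l1-penalized problem (our own simplified variant; the weight update and safeguard ε are the assumptions stated above):

```python
import numpy as np

def irls_l1(A, b, lam, eps=1e-6, iters=50):
    """IRLS sketch for min 0.5*||Ax - b||^2 + lam*||x||_1.

    Uses |x_i| ~ x_i^2 / max(|x_i|, eps) with weights refreshed from the
    previous iterate, so each step is a weighted least squares solve
    (amenable to CG for large systems).
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # unregularized start
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(x), eps)   # diagonal weight vector
        x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ b)
    return x
```

Components driven toward zero receive ever larger weights, which is what produces the sparsifying effect.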
An example is illustrated in Figure 4 and further discussed in [8]. We mention here the classical image deconvolution problem. For such situations, a TV (total variation) norm penalty is frequently used for purposes of noise removal [9], and various iterative schemes have been developed for such penalty functions [9]. In the case of Tikhonov regularization with smoothing, as in (9), more than one parameter is present. The parameter N can vary by application but is typically in the range [5, 10].
Two typical strategies for parameter selection are employed. The first is based on a target residual value, typically determined by an estimate of the noise norm. If, however, the target residual norm is not available, other techniques must be used. In practice, neither the residual norm nor the solution norm should dominate the other. Hence, an established strategy is to look for the point of maximum curvature along the L-curve [11]. If we set rho(lambda) = log ||A x_lambda − b|| and eta(lambda) = log ||x_lambda||, we can then compute the curvature by the formula kappa = (rho' eta'' − rho'' eta') / ((rho')^2 + (eta')^2)^(3/2). We illustrate various plots for a synthetic example in Figure 5.
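In practice the curvature must be evaluated from sampled values of the two norms; a sketch using finite differences (assuming NumPy and a uniform parameter grid), validated on a quarter circle whose curvature is identically 1:

```python
import numpy as np

def lcurve_curvature(rho, eta):
    """Signed curvature of the sampled curve (rho, eta) on a uniform
    parameter grid, via finite differences (np.gradient)."""
    drho, deta = np.gradient(rho), np.gradient(eta)
    d2rho, d2eta = np.gradient(drho), np.gradient(deta)
    return (drho * d2eta - d2rho * deta) / (drho**2 + deta**2) ** 1.5

# Sanity check: a quarter circle has constant curvature 1.
theta = np.linspace(0.0, np.pi / 2, 101)
kappa = lcurve_curvature(np.cos(theta), np.sin(theta))
```

The curvature formula is invariant to a uniform rescaling of the parameter, so the grid spacing need not be supplied; only the endpoints suffer from the one-sided differences.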
In the residual plot, the target residual is taken to be the magnitude of the noise vector norm. (Figure 5: regularization parameter picking; residuals and percent errors vs. the regularization parameter.) In many cases, the inverse problem may be posed in terms of a nonlinear function F(x, t), with x a vector of variables, which may be time dependent with parameter t.
Unfortunately, this method is not stable and will typically not converge if initialized far away from a minimum [1]. Improvements include the introduction of a step size parameter, i.e., an update of the form x_{k+1} = x_k − alpha_k p_k for a search direction p_k. In many applications one encounters large matrices with rapidly decaying singular values.
In such cases, low-rank matrix approximations such as the low-rank SVD are useful for compression, speed gains, and data analysis. By the Eckart-Young theorem [12], the truncated SVD A_k is the best rank-k approximation of A, with ||A − A_k||_2 = sigma_{k+1}. While the construction of A_k is expensive, requiring in most cases the SVD of A, it can be approximated very accurately via randomized algorithms, which require only the SVD of a smaller matrix. Various randomized algorithms for constructing the low-rank SVD and related factorizations are described in [13].
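A sketch of the basic randomized low-rank SVD (range finder plus small SVD, in the spirit of the algorithms in [13]; the oversampling amount and test case are our assumptions):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Sketch of the randomized low-rank SVD."""
    rng = np.random.default_rng(seed)
    # 1. Sample the range of A with a GIID test matrix.
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)     # orthonormal basis for the sampled range
    # 2. Project onto the small matrix B = Q^T A and take its (cheap) SVD.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Exactly rank-3 test matrix: the rank-3 approximation should be exact.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
U, s, Vt = randomized_svd(M, k=3)
print(np.allclose((U * s) @ Vt, M))
```

Only the SVD of the small (k + oversample) x n matrix B is needed, which is the source of the speed gain when k is much smaller than min(m, n).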
Techniques for computing the low-rank SVD of a matrix rely on a simple principle. The Dolfin framework (Farrell et al.) is one such finite element toolkit. For the geophysical examples, the finite element method does not lend itself as easily to applying a perfectly matched layer to the problem as finite differences do, although some progress has been made on this front.
In general, finite difference methods are significantly easier to implement, especially in a matrix-free manner, than finite element methods, although the latter have a much stronger convergence theory. Moreover, for systems with oscillatory solutions such as the Helmholtz equation, applying the standard 7-point stencil is inadvisable due to the large amount of numerical dispersion introduced, which must be countered with a finer mesh, resulting in a system matrix with a large number of unknowns.
This fact, along with the indefiniteness of the underlying matrix, makes the system very challenging to solve with standard Krylov methods. More involved approaches are needed to adequately discretize such equations.
The jInv package (Ruthotto et al.) follows a similar philosophy. Rather, one should aim to use the strengths of a high-level language to express mathematical ideas cleanly in code, and exploit the efficiency of a low-level language, where appropriate, to speed up primitive operations such as a multi-threaded matrix-vector product. These considerations are also necessary to manage the complexity of the code and ensure that it functions properly. Researcher time, like computational time, is valuable, and we should design such systems with these goals in mind.