Fundamental Algorithms for Bioinformatics: Gap Minimization

Slides from University about Fundamental Algorithms for Bioinformatics. The Pdf explores physical mapping by hybridization and the impact of errors, defining the gap minimization problem. The Presentation, suitable for University Computer Science students, includes visual examples and tables to clarify concepts related to algorithm design and gap minimization.

12 Pages

Fundamental Algorithms

for Bioinformatics

Gap Minimization

Algorithm Design

Fund-Algo

Introduction

Physical Map by Hybridization

What happens if there are errors?

Preview

Introduction to Fundamental Algorithms for Bioinformatics

Physical Map by Hybridization and Errors

Fund-Algo 2 Physical Map by Hybridization What happens if there are errors?Introduction Physical Map - the effect of errors x w probes ... s t y u A D B E V F C G t s u v w x y z A 0 0 0 1 0 00 0 B 0 0 0 1 0 0 0 1 C 0 0 0 0 0 1 0 1 D 00 1 0 1 0 1 0 0 0 F 1 1 1 0 1 0 1 0 G 1 0 1 0 1 0 1 0 experiments erroneously report the 1's are missing. This is a gap in the consecutive 1s of the permuted matrix v z x w u y t s A 1 000000 0 B 1 1 0 000 0 0 C 0 1 1 0 000 0 D 0 0 1 0 1 0 0 0 E 0 0 0 1 1 0 1 0 F 00 0 1 1 1 1 1 G 0 0 0 1 1 1 1 1 Fund-Algo 3 DNA z v clones .. 1 0 0 1 0 0 E If these two hybridizationFund-Algo

Physical Map vs. CP1 Models

Introduction Physical Map vs. CP1 - different models 4 " If experiments (hybridizations) are fully reliable (correct) . The permutation corresponding to the correct physical map is among the solutions of the CP1 problem " If experiments might contain errors " False positives are spurious 1's in the matrix · False negatives are missing 1's in the matrix C1P is not the right property to look for . It might not hold anymore . We may instead look for the permutation which has the minimum number of gaps . We assume a sort of parsimony principle here too . We hope that the minimum number of changes to get from the erroneous matrix to one with the C1P property is given by the set of errors that actually happenedIntroduction

Physical Map Error Effects and Gap Minimization

Physical Map - the effect of errors Fund-Algo t s u v w x y z A 0 0 0 1 0 00 0 B 000 1 0 00 1 C 0 0 0 0 0 1 0 1 0 E 1 0 1 0 1 0 0 0 F 1 1 1 0 1 0 1 0 G 1 0 1 0 1 0 1 0 If these two hybridization experiments A 1 0 0 0 000 0 erroneously report no complementarity Some 1's are missing This is a gap in the consecutive 1s of rows C 0 1 1 0 000 0 D 0 0 1 0 1 0 0 0 E 0 0 0 1 1 0 1 0 F 00 0 1 1 1 1 1 G 000 1 1 1 1 1 BLOCK/GAP MINIMIZATION PROBLEM (GAP-MIN) Input: A Boolean matrix A Output: A permutation of the columns of A that minimizes the number of blocks of consecutive 1's in the rows of the resulting matrix Definition. Given a Boolean matrix, where each row has at least one 1. A block in a row is a sequence of 1's surrounded by 0's v z x w u y t s B 1 1 0 0000 0 D 00 1 0 0 1 0Introduction

Permuting Rows to Reduce Gaps

Physical Map - the effect of errors Fund-Algo Let's try to understand what it means to permute a single row in order to reduce the number of gaps in the row obtained after the permutation. A permutation can be interpreted as a walk jumping from one column to another and touching all the columns, i.e., the picture shows the permutation 2, 4, 6, 3, 1, 5 A block is identified by the presence of a 0-> 1 transition followed at some point by a 1-> 0 transition In order not to have to deal differently with transition at the borders and transitions in the middle, let us add a column 0 with a 0 entry and consider instead of a path a cycle, like in the new picture: 0, 2, 4, 6, 3, 1, 5, 0 Then a permutation putting on each row the 1's in consecutive positions will have exactly one 0->1 and one of type 1->0 per row (i.e., one block per row) Each more block will add an additional pair of transitions. Then for a permutation T1, T2, ... , Im (of the m columns): 2(#blocks) = d(πο, πι) + d (πι, π2) + ... + d(In-1, Πη) + d (Πρ. Πρ) 1 2 3 4 5 6 r 0 1 0 1 0 0 0 1 2 3 4 5 6 r 0 0 1 0 1 0 0 The d(.,.) is defined as follows in order to counts transitions d(x,y) = 0 if x = y d(x,y) = 1 ifx#yIntroduction

Physical Map as Tours on Columns

Physical Map as Tours on the set of columns Fund-Algo We interpret a permutation as a cycle on the set of extended columns we add a column made of only O's Each block will add one additional pair of transitions. Then for a permutation T1, T2, ... , Im (for a single row r) 2(#blocks) = d (πο, πι) + d (πι, π2) + ... + d(T2-1, ΠΠ) + d(Im, πο) In the picture we have a permutation which puts the 1's consecutively and the sum on the right hand side is 2 V 0 1 2 3 4 5 6 r 0 0 1 0 1 0 0 The d(.,.) is defined as follows in order to counts transitions d(x,y) = 0 if x = y d(x,y) = 1 if x # y Let A be an n x m matrix. Extend A by a column of 0's, i.e., A[i,0] = 0 for i=1, ... ,n Let It = T1 ... Im be a permutation of the indices of columns. Extend it with Tto = 0 (the column 0 remains where it is) Let #blocks(i) denote the number of blocks in the permuted version of row r; according to T. We have 2(#blocks(i)) = Σj-0 ... ,m d(A[i, n;], A[i, π;+1]) (we assume columns in a circle, so Im+1 = "o ) Summing over all rows, we define S(A, π) = Σi=0 ... ,n Σj=0 ... ,m d (A[i, 7;], A[i, π+1]) Then S(A, TT) = Ei=o ... ,n 2(#blocks(i)) = 2(#blocks in the permuted A)Introduction

Minimizing S(A, π) for Block Reduction

Physical Map as Tours on the set of columns Fund-Algo Let A be an n x m matrix. Extend A by a column of 0's, i.e., A[i,0] = 0 for i=1, ... ,n Let It = T1 ... Im be a permutation of the indices of columns. Extend it with To = 0 (the column 0 remains where it is) We have defined the quantity S(A, π) = Σi=0 ... ,n j=0 ... ,m d (A[i, n;], A[i, j+1]) and we have S(A, TT) = 2(total #blocks in the permuted A) Therefore we have the following A permutation a that minimizes S(A, TT) also minimizes the #blocks in the resulting matrixIntroduction

Hamming Distance and S(A, π)

Physical Map as Tours on the set of columns Fund-Algo Our problem has become: · Given an n x m Boolean matrix A · Find a permutation #that minimizes S(A, π) = Σi=0 ... ,η Σj=0 ... ,m d (A[i, ;], A[i, n;+1]) Definition [Hamming distance] Given two n-bit vectors x =x1 X2 ... Xn , y = y1 y2 ... Yn, the Hamming distance of x and y, denoted by dH(x, y), is the number of positions in which x and y differ. In terms of the d() defined before we have: d.(x, y) = Zi=o ... ,n d(xi, yi) Example: x = 0 0 1010 and y = 1 0 1 0 1 1 | Then d (x, y) = 2 Let's re-write the quantity S(A, 7) so that we can look at it from a different perspective. For j=1, ... , m, let c(j) denote the j-th column of the matrix A. ( c(j) is a vector of n bits) S(A, π) = Σi=0 .... η Σj=0 ... ,m d (A[i, m], Α[i,11+1]) = [j-0 ... ,m' Ei=0 ... ,n d(A[i, n;], A[i, πιο1]) = {j=o., dH( c(Tj), c(Tj+1) ) This is the Hamming distance between columns it; and Tj+1Introduction

Physical Map as TSP-Tours

Physical Map as Tours on the set of columns Fund-Algo Our problem has become: . Given an n x m Boolean matrix A, where c; denotes the j-th column of A . Find a permutation a that minimizes S(A, TT) = Ej=o ... ,m dH( c(Tj), c(Tj+1) ) Definition [TSP-graph of A] Let A be an n x m Boolean matrix. For j=1, ... ,m let c; denotes the j-th column of A. Let co be the n-bit vector of all zeroes. The TSP-graph of A, denoted by KA is a weighted complete undirected graph with: · vertex set V = {co, C1, ... , Cm} • edge weights given by the Hamming distance of the vertices: . for each i, j, the edge e;j =(ci, C;) has weight w(ej) = dH (Ci, C;) Therefore, the above problem is equivalent to: . Given an n x m Boolean matrix A, where c; denotes the j-th column of A . Find a permutation a that minimizes the TSP tour in the TSP-graph of A (as defined above)Introduction

TSP-Tours Example and Solution

Physical Map as TSP-Tours Fund-Algo The Problem of finding a permutation that minimizes the blockss is equivalent to: . Given an n x m Boolean matrix A, where c; denotes the j-th column of A · Find a permutation a that minimizes the TSP tour in the TSP-graph of A (as defined above) Example: Consider the matrix A A C1 C2 C3 C4 r1 0 1 0 1 r2 1 1 0 0 r3 1 0 1 1 C0 2 2 C1 CA 2 2 2 2 1 2 1 C, 3 C3 the TSP-graph of A C0 C1 C4 . 2 . . . 2 1 . 2 1 . C2 C3 A minimum length TSP tour is given by the solid lines. AT C2 C4 C1 C3 r1 1 1 0 0 r2 1 0 1 0 r3 0 1 1 1 The corresponding permutation is C2, C4, C1 , C3 that gives the matrix A" on the left With only one gap. In fact, Cost(tour) = 8 = 2(#blocks)Introduction

TSP-Tours and GAP-MIN Problem

Physical Map as TSP-Tours Fund-Algo The Problem of finding a permutation that minimizes the blocks is equivalent to: . Given an n x m Boolean matrix A, where c; denotes the j-th column of A · Find a permutation a that minimizes the TSP tour in the TSP-graph of A (as defined above) Observations 1. In general, finding a minimum length TSP tour is an NP-hard problem 2. It is NP-hard also when the vertices are binary vectors and the distances are the Hamming distances among these vectors 3. However, the Hamming distance is a metric. Therefore, the TSP problem we have to solve is an instance of the METRIC-TSP 4. METRIC-TSP is Np-hard but also 2-approximable (as shown in the previous lecture) 5. Therefore we have the following : Theorem. We can find a 2-approximation solution to GAP-MIN in polynomial time --- via the 2- approximation algorithm for METRIC-TSP BLOCK/GAP MINIMIZATION PROBLEM (GAP-MIN) Input: A Boolean matrix A Output: A permutation of the columns of A that minimizes the number of blocks in the rows of the resulting matrix

Can’t find what you’re looking for?

Explore more topics in the Algor library or create your own materials with AI.