Object Tracking and Data Analysis with Time-Lapse Video of in Vitro MTLn3 Cell Lines

Master Thesis Report
Kuan Yan
Section Imaging & Bioinformatics
Leiden Institute of Advanced Computer Science (LIACS), Leiden University

Research Supervisors:
Dr. Ir. Fons Verbeek, Imagery & Media, Section Imaging & Bioinformatics, Leiden Institute of Advanced Computer Science (LIACS), Leiden University
Prof. Dr. Bob van der Water, Section Toxicology, Leiden/Amsterdam Center for Drug Research (LACDR), Leiden University

Abstract

This paper considers the improvement and application of the KDE Mean Shift tracking algorithm and the analysis of the resulting data in a migration study of MTLn3 cells. The aim is to convert cell migration videos into numeric descriptions of changes in cell behavior. The choice of KDE Mean Shift tracking is based on its robust and prominent performance in time-lapse studies compared to other algorithms. Morphology and motility measurements are defined by considering both their statistical properties and their biological interpretation. The detection of significant behavior changes relies on both feature selection and hypothesis testing in the feature space. Feedback from the wet-lab on our tracking analysis and results indicates that the accuracy of cell migration analysis increases significantly and that labor time is reduced enormously (by more than a factor of three). In addition, we believe that the current feature measurements reveal several important aspects of cell behavior that cannot be determined by manual observation. Supported by the results of the cellular analysis, we intend to extend our experiments to molecular analysis, such as focal adhesions, in the near future. This project is conducted jointly by the Leiden Institute of Advanced Computer Science (LIACS) and the Leiden/Amsterdam Center for Drug Research (LACDR), Leiden University.

Table of Contents

Abstract
Table of Contents
1 Introduction
2 Video Data Analysis
3 Image Segmentation and Cell Labeling
  3.1 Region Based Segmentation
    3.1.1 Isodata Segmentation
    3.1.2 Local Adaptive Segmentation
    3.1.3 Region Based Segmentation in Cell Images
  3.2 Edge Based Segmentation
    3.2.1 Derivative of Gaussian Edge Detector
    3.2.2 Laplacian of Gaussian Edge Detector
    3.2.3 Canny Edge Detector
    3.2.4 Edge Based Segmentation in Cell Images
  3.3 Region Growth Segmentation
    3.3.1 Region Growth Mechanism
    3.3.2 Different Similarity Criteria
    3.3.3 Region Growth Segmentation in Cell Images
  3.4 Object Labeling and Filtering
    3.4.1 Region Based Labeling
    3.4.2 Object Filtering
4 Object Tracking Algorithms
  4.1 KDE Mean-Shift Tracking Algorithm
    4.1.1 KDE Mean-Shift using Steepest Ascent Optimization
    4.1.2 KDE Mean-Shift Tracking in Cell Imaging
  4.2 KDE Pairwise Alignment Tracking
    4.2.1 Pairwise Alignment Algorithm
    4.2.2 KDE Pairwise Alignment Tracking
    4.2.3 KDE Pairwise Alignment Tracking in Cell Images
    4.2.4 Future Development
  4.3 Overlap Ratio Based Tracking
    4.3.1 Overlap Ratio Based Tracking
    4.3.2 Overlap Ratio Based Tracking in Cell Images
  4.4 Parametric Motion Model Tracking
  4.5 Artificial Object Test
    4.5.1 Virtual Cell Factory
    4.5.2 Error Estimation in Artificial Object Test
    4.5.3 Perturbation Study in Artificial Object Test
  4.6 Measurements in Cell Migration Video
    4.6.1 Static Features
    4.6.2 Continuous Features
    4.6.3 Summarization Approaches
    4.6.4 Noise Suppression in Measurement
  4.7 Measurement Bias in Time Resolution
    4.7.1 Time Resolution Experiment
    4.7.2 Weighted Correlation
    4.7.3 Sampling Rate
5 MTLn3 Cell Study
  5.1 High-Throughput Data Analysis
    5.1.1 Feature Selection
    5.1.2 Statistical Hypothesis Test
    5.1.3 High-Throughput Data Analysis in Cell Migration Study
  5.2 MTLn3 Cell Controlled Experiment
    5.2.1 Experiment Protocol
    5.2.2 Experiment Conclusion
  5.3 Large Volume siRNA Screening using MTLn3 plGFP
6 Conclusion
7 Discussion
Appendix I: Workflow of Video Data Analysis
Appendix II: Index List of Feature Measurements
Appendix III: Artificial Object Test Data Sheet
  III-1 Artificial Object Test and Perturbation Experiment
Appendix IV: MTLn3 Cell Migration Data Sheet
Appendix V: Large Volume siRNA Screening Results
Appendix VI: Experiment Protocol of siRNA Screening
Reference

1 Introduction

This report covers the first stage of a cancer research project.
The project is a collaboration between the Section Imaging & Bioinformatics (LIACS) and the Section Toxicology (LACDR), supervised by Dr. Ir. Fons Verbeek and Prof. Dr. Bob van der Water. The main goal of the project is to establish the correlation between phenotypic behavior and focal adhesion behavior. Phenotypic research, contrary to genotypic research, studies how transfection affects cytomorphology and cell migration. Compared with genotypic research, phenotypic research has two major differences. First, genotypic behavior must be combined with specific environmental conditions in order to result in a certain phenotypic behavior. Second, a certain phenotypic behavior may be the result of multiple transfections. However, studying the phenotype alone may not reveal the true mechanism of RNA translation; it must therefore be combined with the study of integrin proteins. The focal adhesion is believed to be an important site of integrin signaling kinases, found at the cell membrane where the cytoskeleton interacts with proteins of the extracellular matrix [4]. It is also believed that a correlation exists between phenotypic behavior and focal adhesion behavior. If such a correlation can be established, it may reveal the molecular mechanism of cellular communication. Current studies of both phenotype and focal adhesion rely mostly on manual observation and judgment. Therefore, object tracking is employed to provide objective measurements and the possibility of high-throughput analysis.

Object tracking is a thriving research field in computer vision. Its exact origin is ambiguous; the earliest articles date back to 1992. Object tracking has been successfully applied in video surveillance [8], traffic control [9], military intelligence, and related fields. Applications of object tracking in the biological and medical domain appeared in 1999 [10]. A recent study of blood cells [12] reports positive results with object tracking.

In this report, we demonstrate the performance estimation of existing tracking algorithms on complex time-lapse videos of living cells. The performance of each tracking algorithm is tested with an artificial object test, whose major function is to create objects that simulate cell migration. The tracking algorithms are applied to videos created from these artificial objects, and the tracking bias of each algorithm is studied under different parameter settings. The tracking bias on recorded time-lapse movies is estimated by specialist visual confirmation. Kernel Density Estimation (KDE) Mean Shift tracking shows nearly ideal performance in both the artificial tests and the cell movies. In our artificial test, it has an accuracy of 96% with optimal parameter settings, where cells are uniformly distributed, and 85% with random parameter settings, where cells are randomly distributed. The tracking results on cell movies show that KDE Mean Shift provides an accuracy of 75%~85% according to specialist visual confirmation. Measurements derived from the tracking algorithm preserve the global trends of cell migration well. Yet the fundamental principle of current object tracking algorithms assumes a significant discrimination between aligned and non-aligned objects, a property that is not guaranteed in biological imaging. Therefore, our study also identifies the factors that may affect tracking accuracy in the biological domain, with tracking bias (performance estimation) as the main criterion.
With cell trajectories available, feature measurements of cell migration can be extracted from the raw video data. These feature measurements prove valuable in the MTLn3 cell study: several predefined experiments, such as EGFR upregulation and S178A downregulation, are confirmed by the current measurements. By combining feature selection and statistical testing with the current feature measurements, fully automatic and high-throughput data analysis of cell video shows great potential in cell migration studies.

The major sections of this report are organized as follows:

Chapter 2, Video Data Analysis, provides background information on time-lapse video analysis. It describes the major image analysis principles that are involved in later sections.

Chapter 3, Image Segmentation and Cell Labeling, introduces popular segmentation algorithms. The labeling procedure is defined as the step that labels all objects in an image based on the segmentation result. Region based segmentation, edge based segmentation, and region growth segmentation are introduced in this chapter.

Chapter 4, Object Tracking Algorithms, presents our study of several existing tracking algorithms and the error estimation based on artificial object tests. The tracking analysis of real video (MTLn3 cells) is also given in this chapter. Sections 4.1~4.4 focus on four tracking algorithms based on different principles: KDE mean shift tracking, KDE pairwise tracking, overlap tracking, and affine transformation tracking (also known as rigid object tracking). Section 4.5, Artificial Object Test, describes the principle of the artificial object test. Three tracking algorithms are compared under different parameter settings; affine transformation tracking is excluded because of its limitations by definition, which are discussed in Chapter 4. A perturbation test on the parameter settings demonstrates which factors may cause tracking bias. Section 4.6, Measurements in Cell Migration Video, introduces the motility and morphology measurements employed in our tracking analysis. Section 4.7, Measurement Bias in Time Resolution, is an important study of bias due to the sampling rate, a critical factor in time-lapse video analysis; it demonstrates how the sampling rate affects the measurements.

Chapter 5, Section 5.1, High-Throughput Data Analysis, is a study of fully automatic cell video analysis. With the help of tracking analysis and feature measurement, we are able to collect numeric estimates of cell migration. Combined with statistical tools such as feature selection and statistical testing, significant changes in cell behavior can be determined automatically. Section 5.2, MTLn3 Cell Controlled Experiment, shows several examples of applying tracking analysis to the MTLn3 cell migration study. Well-known experiments such as EGF stimulation, EGFR upregulation, and S178A downregulation are introduced, and the tracking analysis results are studied and compared; the measurement results show nearly identical traits to those described in the existing literature on these experiments. Section 5.3, Large Volume siRNA Screening using MTLn3 plGFP, is a study of the applicability of tracking analysis; a demonstration of high-throughput data analysis on an siRNA screening experiment is given in this section.

2 Video Data Analysis

Object or video tracking has valuable applications in video surveillance, traffic control, military intelligence, and related areas.
Combined with pattern recognition and other statistical approaches, important information can be extracted from video data. We now apply video data analysis to the cell migration study. In order to standardize the procedure (see Figure 1), we define the following steps as the guideline for our research:

• Image Enhancing
• Image Segmenting
• Object Labeling
• Object Tracking
• Feature Measuring
• Data Analysis

Figure 1: workflow of the video data analysis.

Image Enhancing prepares the raw image for further processing. Live cell images contain a certain level of noise that cannot easily be reduced by adjusting the microscope settings. Inconsistent staining is a common source of noise in fluorescence imaging. Figures 2 and 3 show two cancer cells cropped from the same image. A threshold segmentation method may extract the foreground of Figure 2 but not of Figure 3, because only part of the cell in Figure 3 has a significant intensity value. Digital processing must be applied for further improvement.

Figure 2 and Figure 3: two cancer cells cropped from the same image.

Segmentation is an important step in image analysis. In order to label objects in an image, the pixels belonging to the foreground must first be separated from the other pixels. Unlike real-world images, microscopy images show a relatively clear intensity discrimination between foreground and background pixels. Our study shows that a sensitive segmentation method is the key to obtaining maximum tracking efficiency.

The major purpose of Object Labeling is to label the isolated foreground regions defined by segmentation. Each object is given a unique label. The labeling method must be consistent for each frame; the order of labeling may be critical for tracking algorithms such as KDE pairwise tracking.

Object Tracking is the major aspect of this research. The basic idea of a tracking algorithm is to build alignments between objects in consecutive frames. Most tracking algorithms rely on similarity measurements: if a strong connection can be established between two objects in consecutive frames, an alignment is made. However, high statistical similarity does not necessarily carry biological significance. Because of the high similarity between objects, the chance of finding false candidates in cell video is considerably higher than in other tracking problems.

Feature Measuring is the purpose of tracking. If sensitive numeric measurements can be extracted from the cell tracking result, more objective conclusions can be drawn. Compared with specialist observation, which is relatively subjective and open to doubt, feature measurement based on video tracking is a major improvement in cellular and molecular video analysis.

Data Analysis is the step of finding implicit structure in the feature measurements. Pattern recognition and other statistical analyses are employed as the major tools. We expect that these feature measurements may eventually establish the correlation between phenotypic behavior and focal adhesion behavior.

3 Image Segmentation and Cell Labeling

Image segmentation is the operation that defines the foreground of a raw image. Identifying the foreground is non-trivial: there is no universal solution, and a given segmentation algorithm may only be suitable for particular problems. Once the foreground is defined, a unique label is given to each isolated foreground region; in this way, potential cell regions are derived from the raw image. We briefly introduce three types of image segmentation algorithms.
• Region Based Segmentation
• Edge Based Segmentation
• Region Growth Segmentation

Region based segmentation is regularly used in microscopy image analysis [16]. It assumes that there is a significant intensity boundary between foreground and background. With a global or local threshold value, the foreground can be defined. Mechanisms such as the iterative mean calculation of the isodata threshold can find the intensity threshold automatically.

Edge based segmentation is a segmentation method based on edge detection [16]. It determines significant gradient changes in the intensity values. The Laplacian of Gaussian edge detector and the Canny edge detector are commonly used. Given a closed contour path, an object label can be assigned to each isolated contour. However, the convolution may result in discontinuous edges, so a path finding method is always needed for edge based segmentation.

Another approach based on pixel similarity is region growth segmentation [16]. A region growth algorithm finds similar pixels around seeds based on predefined similarity criteria. The procedure grows from one pixel to its neighboring pixels until the criteria are no longer fulfilled. Intensity gradients and intensity differences are frequently used as similarity criteria.

When the image is segmented into foreground and background, each isolated region is given a unique label. This labeling procedure is an important step before tracking.

3.1 Region Based Segmentation

Region based segmentation [16] is a thresholding technique with a predefined threshold value. A pixel with an intensity higher than the threshold is defined as foreground, otherwise as background. The resulting binary mask is used for object labeling (see Figure 4).

Pseudo Code (assuming the foreground has a higher intensity than the background):
    For each pixel in the image
        If the pixel intensity is larger than the threshold value
            Label the pixel as foreground
        Else
            Label the pixel as background

Figure 4: flowchart of the threshold segmentation procedure.

3.1.1 Isodata Segmentation

Isodata segmentation [5][16] is an automatic threshold finding procedure (see Figure 5). The basic principle is identical to region based segmentation, but the threshold is calculated from the iteratively segmented image: the new threshold is defined as the mean of the average foreground intensity and the average background intensity. The iteration stops when the threshold converges.
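As an illustration of this iteration, a minimal NumPy sketch is given below; the function name and the convergence tolerance are our own, illustrative choices, and the step-by-step pseudocode of the procedure follows after it.

Example (Python):
    import numpy as np

    def isodata_threshold(image, init=None, tol=0.5):
        """Iterative (isodata) threshold estimation for a grayscale image."""
        t = float(image.mean()) if init is None else float(init)
        while True:
            fg = image[image > t]            # tentative foreground pixels
            bg = image[image <= t]           # tentative background pixels
            if fg.size == 0 or bg.size == 0:
                return t                     # degenerate split, stop
            t_new = (fg.mean() + bg.mean()) / 2.0
            if abs(t_new - t) < tol:         # threshold has converged
                return t_new
            t = t_new

    # usage: mask = image > isodata_threshold(image)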
Pseudo Code:
    Given an initial threshold (the previous threshold)
    Do {
        For each pixel in the image
            If the pixel intensity is larger than the threshold value
                Label the pixel as foreground
            Else
                Label the pixel as background
        Calculate the average intensity of the foreground based on the binary mask
        Calculate the average intensity of the background based on the binary mask
        New threshold = (foreground average intensity + background average intensity) / 2
    } While (new threshold is not equal to previous threshold)

Figure 5: flowchart of the isodata segmentation procedure.

3.1.2 Local Adaptive Segmentation

Instead of looking for a globally significant threshold [16], local adaptive segmentation uses the neighboring pixels around the targeted pixel to find a locally significant one (see Figure 6). The segmentation decision is made only for the center pixel.

Pseudo Code:
    For each pixel in the image
        Define a neighborhood region by 8-connectivity around the current pixel
        Perform segmentation on the neighborhood region
        Label only the current pixel according to the segmentation result

Figure 6: flowchart of local adaptive segmentation.

3.1.3 Region Based Segmentation in Cell Images

Table 1 shows an example of converting a raw cell image into a binary mask using region based segmentation and its variants. Although region based segmentation with a manually chosen threshold produces a nearly perfect binary mask, it requires operator guidance for each image. Manually choosing a threshold is not a high-throughput method; an automatic approach such as isodata segmentation is therefore preferred in cell image analysis.

Table 1: a raw cell image segmented by region based segmentation, isodata segmentation, and local adaptive segmentation with isodata.

3.2 Edge Based Segmentation

Edge based segmentation [16] follows the principle of finding isolated contour paths (closed edges). A binary object mask can be defined by assigning all pixels within its edges. The edge of an object can be determined by an edge detection method. An edge is a set of connected pixels that lie on the boundary between two regions. Edge detection assumes that significant gradient changes occur at edges (usually in intensity space). However, edge detection is susceptible to noise, so noise suppression is required before it can be performed; Gaussian blurring is a frequently used noise suppression step in edge detection. We briefly introduce the following edge detection methods:

• Derivative of Gaussian (DoG) Edge Detector
• Laplacian of Gaussian (LoG) Edge Detector
• Canny Edge Detector

Still, edge detection procedures do not guarantee a continuous contour around an object. An edge linking operation (path finding) is needed once edge detection has been performed. Different edge linking schemes can be used; we use the Boie-Cox algorithm [19] and the hard c-means algorithm [20].

3.2.1 Derivative of Gaussian Edge Detector

The DoG edge detector [12][16] is a gradient-based edge detection procedure. Through a convolution operation, DoG emphasizes pixels with a significant gradient change. The convolution operator is the spatial approximation of the multiplication operator in the frequency domain.
The masking frequency must be transformed into a spatial kernel based on either square or circular distance. A convolution operator [16] is required to compute the filter response.

Pseudo Code:
    Perform the convolution operation for each pixel; each significant DoG response is a potential edge
    Calculate the magnitude in the x-direction and the y-direction
    For each pixel magnitude
        If the magnitude is higher than the threshold
            The pixel is an edge
        Else
            The pixel is not an edge
    Connect each discontinuous contour with an edge linking scheme

The filter can be described as follows:

G(r) = \frac{1}{2\pi\sigma} e^{-\frac{r^2}{2\sigma^2}}, \quad r = \sqrt{x^2 + y^2}

\text{DoG:}\quad G'(r) = \frac{-r}{2\pi\sigma^3} e^{-\frac{r^2}{2\sigma^2}}

Figure 7: response of the DoG filter.

Edge pixels can be determined by threshold segmentation of the filtered image. A common approach is to label a pixel as edge when its edge strength is above a threshold (see Figure 7).

3.2.2 Laplacian of Gaussian Edge Detector

The LoG edge detector [3][12][16] performs edge detection based on the LoG response (see Figure 8). The simplest edge detection calculation is to threshold the LoG output at its zero crossings (see Figure 9). These boundaries can then be detected and marked in a single pass.

Pseudo Code:
    Perform the convolution operation for each pixel; each significant LoG response is a potential edge
    For each pixel response
        If the response is a zero-crossing
            The pixel is an edge
        Else
            The pixel is not an edge
    Connect each discontinuous contour with an edge linking scheme

The Laplacian of Gaussian kernel is defined as:

G(r) = \frac{1}{2\pi\sigma} e^{-\frac{r^2}{2\sigma^2}}, \quad r = \sqrt{x^2 + y^2}

L(x, y) = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}, \qquad \text{LoG:}\quad G''(r) = \frac{-1}{2\pi\sigma^3} e^{-\frac{r^2}{2\sigma^2}} \left[ 1 - \frac{r^2}{2\sigma^2} \right]

Figure 8: the LoG kernel. Figure 9: response from the LoG filter and its zero crossings.

3.2.3 Canny Edge Detector

The Canny edge detector [3][12][16][18] is a robust method that is widely used in autonomous systems. Its aim was to discover the optimal edge detection algorithm. In this context, an "optimal" edge detector means:

• Good detection - the algorithm should mark as many real edges in the image as possible.
• Good localization - edges marked should be as close as possible to the edges in the real image.
• Minimal response - a given edge in the image should only be marked once, and where possible, image noise should not create false edges.

To satisfy these requirements, Canny used the calculus of variations, a technique which finds the function that optimizes a given functional. The optimal function in the Canny edge detector is described by the sum of four exponential terms, but it can be approximated by the first derivative of a Gaussian.

Step 1: Noise suppression. Because the Canny edge detector uses a filter based on the first derivative of a Gaussian, it is susceptible to noise present in raw unprocessed image data, so to begin with the raw image is convolved with a Gaussian filter. The result is a slightly blurred version of the original, which is not affected by a single noisy pixel to any significant degree.

Step 2: Calculating the intensity gradient of the image. An edge in an image may point in a variety of directions, so the Canny algorithm uses four filters to detect horizontal, vertical, and diagonal edges in the blurred image. The edge detection operator (Roberts, Prewitt, or Sobel, for example) returns a value for the first derivative in the horizontal direction (Gy) and the vertical direction (Gx).
From this the edge gradient and direction can be determined:

G = \sqrt{G_x^2 + G_y^2}, \quad G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} * A, \quad G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * A

\theta = \arctan\!\left( \frac{G_x}{G_y} \right)

The edge direction angle is rounded to one of four angles representing vertical, horizontal, and the two diagonals (0, 45, 90 and 135 degrees, for example). An edge point is defined as a point whose strength is a local maximum in the direction of the gradient.

Step 3: Non-maximum suppression. The edge points determined in Step 2 give rise to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and sets to zero all pixels that are not actually on the ridge top, so as to give a thin line in the output; this process is known as non-maximal suppression. The ridge pixels are then thresholded using two thresholds, T1 and T2, with T1 < T2. Ridge pixels with values greater than T2 are said to be "strong" edge pixels; ridge pixels with values between T1 and T2 are said to be "weak" edge pixels.

Step 4: Enclosing contour. Finally, the algorithm performs edge linking by incorporating the weak pixels that are 8-connected to the strong pixels.

3.2.4 Edge Based Segmentation in Cell Images

The segmentation results of the three methods are shown in Table 2. DoG and LoG show less desirable results; these edge detection methods are too sensitive to noise. Small structures are a common type of noise that requires the image to be blurred before edge detection. In addition, the internal structure of the cell causes the detection of extra edges, which are in fact not wanted. The Canny edge detector is more resistant to noise thanks to its non-maximum suppression: less significant gradient changes are removed. We believe that the Canny filter may be a robust approach to the segmentation problem. Still, different edge linking schemes may result in different contours or even introduce further errors, which is beyond the scope of this report.

Table 2: edges found on the original image by the DoG, LoG, and Canny filters.

3.3 Region Growth Segmentation

3.3.1 Region Growth Mechanism

Region growth segmentation assumes a certain level of similarity among the pixels belonging to the same object. The algorithm starts from one prior pixel and grows until a predefined similarity criterion is no longer fulfilled [12][16]. The similarity criterion (also known as the homogeneity criterion) is usually defined on pixel values or on the anticipated size or shape of the object [12]. Figure 10 shows a successful segmentation of the right kidney in a CT scan based on intensity similarity.

Figure 10: region growth segmentation of the right kidney in a CT scan based on intensity similarity.

Pseudo Code:
    Interface Homogeneity_Detector
        If the condition is fulfilled
            Return True
        Else
            Return False

    Function Region_Growth (given seed position)
        For each pixel in the neighborhood
            If Homogeneity_Detector (current pixel) is fulfilled
                Give the pixel the same label as the seed
                Perform Region_Growth (current pixel)
            Else
                Stop and move to the next pixel
        Stop growing if no pixel fulfills the similarity criterion
        Move to a new isolated seed

Locating the start position (seed) automatically is non-trivial in a cancer study. Moreover, starting from a single pixel may not provide enough information in a cell image. In our application, region growth segmentation uses a suboptimal threshold segmentation result as the initial region and grows around this known region.
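The following is a minimal Python sketch of this growing mechanism, using an intensity-range homogeneity criterion (one of the criteria discussed in the next subsection). The function name, the use of 4-connectivity, and the parameters are our own illustrative choices; our actual implementation starts from a thresholded initial region as described above.

Example (Python):
    import numpy as np
    from collections import deque

    def grow_region(image, seed_mask, low, high):
        """Grow a labeled region outward from an initial binary mask.

        A neighboring pixel is added as long as its intensity lies in the
        [low, high] range (the homogeneity criterion)."""
        grown = seed_mask.astype(bool).copy()
        frontier = deque(zip(*np.nonzero(grown)))              # start from all seed pixels
        h, w = image.shape
        while frontier:
            y, x = frontier.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-connected neighbors
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not grown[ny, nx]:
                    if low <= image[ny, nx] <= high:            # criterion fulfilled
                        grown[ny, nx] = True
                        frontier.append((ny, nx))
        return grown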
3.3.2 Different Similarity Criteria

Several techniques are derived from the basic intensity similarity criterion, including:

• Intensity difference
• Intensity gradient
• Probability threshold

The intensity difference criterion assumes that all pixels in the region of interest share a range of intensity values. Any pixel connected to the initial region is given the same label if its intensity lies in the predefined range (see Figure 6) [12].

Pseudo Code:
    Function Intensity_Difference (current pixel) implements Homogeneity_Detector
        If the intensity value of the current pixel is in the given range
            Return true
        Else
            Return false

The intensity gradient criterion is a hybrid technique that relies on edge detection and intensity information. A significant gradient change suggests a natural boundary of the region of interest [14].

Pseudo Code:
    Function Intensity_Gradient (current pixel) implements Homogeneity_Detector
        If the intensity gradient of the current pixel is in the given range
            Return true
        Else
            Return false

The probability threshold criterion cuts off pixels whose probability estimate is lower than a threshold, given a density estimate in feature space [15]. A predefined initial region is required, and a density kernel must be defined.

Pseudo Code:
    Function Probability_Threshold (current pixel) implements Homogeneity_Detector
        If the probability value of the current pixel is larger than the given threshold
            Return true
        Else
            Return false

3.3.3 Region Growth Segmentation in Cell Images

Region growth segmentation requires that pixels belonging to the same object share a strong feature, such as a very similar intensity value. However, cell images contain many sources of noise, which makes it difficult to find such a strong feature. Our current implementation of region growth segmentation using gradient changes shows no significant improvement of the segmentation result (see Table 3).

Table 3: original image, local adaptive segmentation with isodata, and region growth segmentation.

3.4 Object Labeling and Filtering

3.4.1 Region Based Labeling

The segmentation procedure results in a binary image representing foreground and background, but this cannot be used for tracking directly: each object in the foreground must be identified. A common approach is to give a unique label to each separated foreground region (region of interest). Our version of object labeling is given below (see Figure 11):

Pseudo Code:
    Region_Based_Growing_Labeling (a pixel, a linked list)
        If the current pixel is foreground
            Register the current pixel in the current linked list
            Set the current pixel to background
        For each pixel around the current pixel
            Set the current pixel to the neighbor pixel and perform the labeling recursively
        If no neighbor pixel is foreground
            Terminate the current linked list; the next foreground pixel starts a new object

    Image_Labeling (a binary image)
        For each pixel in the segmented image
            If the current pixel is foreground
                Region_Based_Growing_Labeling (current pixel, a linked list for the object)

Figure 11: a sample of the labeling procedure growing from one pixel to the rest of the object.

However, due to intensity variations and many other factors, segmentation regularly yields a less than ideal foreground. Therefore, the objects extracted from the segmented image are filtered and all abnormal objects are removed before tracking starts.
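A compact way to realize this labeling step, together with the size-based filtering described in the next subsection, is offered by scipy.ndimage. The sketch below is illustrative only: the function name and the area thresholds are hypothetical, and our own implementation follows the recursive pseudocode above rather than this library call.

Example (Python):
    import numpy as np
    from scipy import ndimage

    def label_and_filter(mask, min_area=50, max_area=5000):
        """Give each connected foreground region a unique label and
        discard regions that are abnormally small or large."""
        labels, n = ndimage.label(mask)                          # unique label per region
        areas = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
        for i, area in enumerate(areas, start=1):
            if area < min_area or area > max_area:               # likely dirt or clustered cells
                labels[labels == i] = 0                          # drop from the labeling
        return labels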
3.4.2 Object Filtering

Based on a summary of the practical errors in segmentation results, we define the following object filtering rules. Some filtering rules are applied during tracking, while others are applied right after segmentation (see Figure 12).

Figure 12: flowchart of the filtering procedure.

Filtering after segmentation. Small objects are often dirt or dead cells; they have no research value and cause bias in the feature measurements. We have encountered both immobile and fast-moving noise objects in our experiments (Section 5.2); both lead to measurements that are inconsistent with the initial experiment design, and must therefore be filtered out. Extremely large objects are usually clustered cells caused by a suboptimal segmentation result. Using a separation method such as the watershed transform, extremely large objects can be avoided; however, the separation result may be biased depending on the distance transform used. Cells touching the edge of the image are deformed and, in addition, are moving either out of or into the field of view. This creates a dilemma for the tracking algorithm because of both the deformation and the unpredictable direction of motion; to avoid tracking bias, cells on the edge are usually removed.

Filtering during tracking. Because of the high similarity between objects in microscopy images, it is essential to minimize the number of possible candidates. This is accomplished by candidate filtering. Based on our research on artificial objects (Section 4.5) and MTLn3 cells (Section 5.2), the most efficient way to limit the candidates is to set a distance radius around the predecessor cell: a predefined distance is given for each alignment, and all cells outside this radius are discarded as candidates.

The filtering mechanism clearly improves tracking accuracy, yet it may also cause problems such as early termination of trajectories or measurement bias (see Chapter 4). Since tracking bias and measurement bias are dynamically linked, the joint optimization of tracking and measurement would be an interesting topic of research. In the next chapter, we introduce the tracking algorithms.

4 Object Tracking Algorithms

In order to perform phenotypic analysis on time-lapse video (see the introduction), object tracking is employed. The fundamental principle of object tracking is to build alignments between objects in consecutive frames; most tracking algorithms therefore rely on similarity measurements. This chapter introduces four tracking algorithms based on different principles of similarity measurement:

• KDE Mean Shift Tracking Algorithm [1]
• KDE Pairwise Alignment Algorithm
• Overlap Ratio Tracking Algorithm
• Affine Transformation Based Algorithm [2]

However, live cell video does not share several common characteristics of real-world video. The performance of each tracking algorithm on time-lapse video is therefore compared in order to find a robust and sensitive tracking algorithm for live cell video. In addition, results of the chosen tracking algorithm are demonstrated in this chapter.

4.1 KDE Mean-Shift Tracking Algorithm

Kernel Density Estimation (KDE) Mean Shift tracking [1] is a popular kernel based tracking algorithm in computer vision and can be considered a real-world application of the localization approach. It assumes the support of modules that provide (see also Figure 13): (a) detection and localization of the objects to track (targets) in the initial frame, and (b) periodic analysis of each object to account for possible updates of the target models.
KDE Mean Shift tracking consists of two steps:

1. The gradient of the nonparametric density function is estimated by a predefined kernel density estimator.
2. Starting from each sample point, the mean shift procedure iteratively finds a path along the gradient direction, away from the valleys and towards the nearest peak.

Figure 13: workflow diagram of the KDE Mean Shift tracking algorithm.

The nearest peak, i.e. the maximum of the probability estimate given by the nonparametric density estimator (model), is the theoretical center of the truly aligned object. The object whose center of mass is closest to this peak is taken as the true alignment. The method is introduced below.

4.1.1 KDE Mean-Shift using Steepest Ascent Optimization

Given n data points x_i in the d-dimensional space R^d, the kernel density estimator with kernel function K(x) and window bandwidth h is given by

\hat{f}_n(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right)

where the d-variate kernel K(x) is nonnegative and integrates to one. A widely used class of kernels is the radially symmetric kernels

K(x) = c_{k,d}\, k\!\left( \|x\|^2 \right)

where the function k(x) is called the profile of the kernel and the normalization constant c_{k,d} makes K(x) integrate to one. The density estimator can then be rewritten as

\hat{f}_{h,k}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)

Commonly used kernels are the Epanechnikov kernel

K_E(x) = \begin{cases} \frac{1}{2} c_d^{-1} (d+2) \left( 1 - \|x\|^2 \right) & 0 \le \|x\| \le 1 \\ 0 & \|x\| > 1 \end{cases}

and the multivariate Gaussian kernel

K_N(x) = (2\pi)^{-\frac{d}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2} x^{T} \Sigma^{-1} x}

where \Sigma is the covariance matrix of the model.

The standard mean shift algorithm is a steepest ascent procedure, which requires an estimate of the density gradient:

\nabla \hat{f}_{h,k}(x) = \frac{2 c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x)\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right) = c_{k,d}\, \hat{f}_{h,G}(x) \left[ \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x \right]

where g(x) = -k'(x) can in turn be used as a profile to define a kernel G(x); the kernel K(x) is called the shadow of G(x). \hat{f}_{h,G}(x) is the density estimate with kernel G and c_{k,d} is the normalization coefficient. The last term is the mean shift

m(x) = \frac{\sum_{i=1}^{n} x_i\, g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n} g\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)} - x

which is proportional to the normalized density gradient and always points toward the steepest ascent direction of the density function. The standard mean shift algorithm iteratively performs:

• computation of the mean shift vector m(x^k);
• updating of the current position, x^{k+1} = x^k + m(x^k);

until a stationary point is reached, which is the center of the candidate cluster (see Figure 14).

Figure 14: mean shift iterations converging to the stationary point.
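The sketch below illustrates this procedure on a 2-D point cloud (for example, the foreground pixel coordinates of the next frame) with a Gaussian profile, and then assigns the candidate whose centroid is closest to the converged position. It is a simplified, assumed implementation: the function names, the 2-D model (the tracking described in the next subsection uses a three-dimensional x/y/intensity model), and the bandwidth value are illustrative only.

Example (Python):
    import numpy as np

    def mean_shift(start, points, bandwidth, n_iter=50, tol=1e-3):
        """Shift 'start' toward the nearest density peak of 'points'
        using a Gaussian kernel profile."""
        x = np.asarray(start, dtype=float)
        for _ in range(n_iter):
            d2 = np.sum((points - x) ** 2, axis=1)
            w = np.exp(-d2 / (2.0 * bandwidth ** 2))           # kernel weights g(.)
            x_new = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(x_new - x) < tol:                # stationary point reached
                break
            x = x_new
        return x

    def align(prev_centroid, candidate_centroids, frame_points, bandwidth=15.0):
        """Align a cell to the candidate whose centroid is closest to the
        density peak found by the mean shift procedure."""
        peak = mean_shift(prev_centroid, frame_points, bandwidth)
        dists = np.linalg.norm(np.asarray(candidate_centroids) - peak, axis=1)
        return int(np.argmin(dists))                           # index of the aligned candidate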
4.1.2 KDE Mean-Shift Tracking in Cell Imaging

After object annotation, the potential cell regions are segmented into a binary mask and labeled with unique numbers. Each cell is converted into a nonparametric kernel estimator (model), and steepest ascent is performed on the gradient of the density estimate. The tracking of one cell in two consecutive images is illustrated in Figures 15 and 16. The initial cell is segmented from the raw image and converted into a density estimator. Each extracted cell is converted into a three-dimensional model whose dimensions are (1) the x-coordinates of the binary mask, (2) the y-coordinates of the binary mask, and (3) the intensity values of the binary mask. The probability estimate is calculated with the following equation (for details, refer to Section 4.1.1):

\hat{f}_{h,k}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right)

Figure 15: the object to be tracked. Figure 16: the next frame (raw image).

Figure 17 is a surface plot of the probability estimate for a normalized model converted from one labeled cell. Each peak represents the density estimate of a candidate given the model; the closest and highest peak is the true alignment. However, normalized models are not used in our application, because normalization of the initial models introduces remote candidates. The idea of mean shift is to measure the closest shift of the distribution, so local optimization is preferred.

Figure 17: surface plot of the density estimate; each peak corresponds to a candidate.

Table 4 shows two samples of cell tracking: (a) a video with low cell density and a high magnification level, and (b) a video with high cell density and a low magnification level. The trajectories of both samples approach 90% tracking accuracy according to specialist visual observation.

Table 4: first frame, last frame, and trajectories of videos (a) and (b).

4.2 KDE Pairwise Alignment Tracking

The KDE Mean Shift tracking algorithm may merge two cell trajectories into one; such errors occur when two cells are nearly identical. In addition, cell activity such as cell division separates one object into two true alignments, both of which must be tracked. KDE Mean Shift tracking can only have one true candidate by definition and thus cannot solve this problem. In order to address both of these potential problems, we propose an experimental tracking algorithm based on the pairwise alignment principle. The algorithm can be summarized as follows.

Pseudo Code:
    Label all objects in each frame
    For each frame m
        Calculate the pairwise (cumulative density) distance table between frames m and m+1
        Perform backtracking to find the path through the pairwise distance matrix with the largest total score
        Add the new path to the trajectories

The current KDE pairwise alignment relies on cumulative density estimation as its distance table. However, the chance of two nearly identical objects is much higher in cell tracking, so statistical significance may not represent biological meaning. In KDE pairwise alignment tracking, this drawback becomes a decisive issue.

4.2.1 Pairwise Alignment Algorithm

The pairwise alignment algorithm [5] (see Figure 19) uses a distance matrix between two sets of objects. The distance does not necessarily have to be the Euclidean distance; it can also be a KDE (kernel density estimation) value, an MSE (mean square error) estimate, or another distance function. The pairwise algorithm aims to find the alignment path with the highest score (perfect matching). We briefly explain the pairwise alignment algorithm in this section.

For gene sequence matching, a global pairwise alignment [5] can be calculated (see also Figure 18):

Figure 18: example of a global pairwise alignment.

INPUT: two sequences S = s1...sn and T = t1...tm (n and m are approximately the same).
QUESTION: find an optimal alignment.
Notation: let σ(a, b) be the score (weight) of the alignment of character a with character b (including spaces).
Lemma: let V(i, j) be the optimal alignment score of S1...i and T1...j (0 ≤ i ≤ n, 0 ≤ j ≤ m).
V(i, j) satisfies the following recurrence for 1 ≤ i ≤ n, 1 ≤ j ≤ m:

V(i, j) = \max \begin{cases} V(i-1, j-1) + \sigma(S_i, T_j) \\ V(i-1, j) + \sigma(S_i, -) \\ V(i, j-1) + \sigma(-, T_j) \end{cases}

Here σ(S_i, T_j) is a mismatch penalty, which is zero for a true match; σ(S_i, -) is a deletion penalty, standing for the deletion of a nucleotide; and σ(-, T_j) is an insertion penalty, standing for the insertion of a nucleotide. In normal applications, a long run of deletions or insertions is given a sequential gap penalty with an incremental calculation:

penalty = open + ext \cdot n

where open is the gap opening penalty, ext is the gap extension penalty, and n is the number of gap extensions. If no incremental gap penalty is used, the pairwise alignment yields a global alignment, otherwise a local alignment [5]. Local alignment is a special type of pairwise alignment in which long runs of gaps are discouraged. A global alignment always returns a sequence with the same length as the target sequence; a local alignment returns the sequence with the highest matching score and a minimum number of gaps.

Figure 19: pairwise alignment path through the distance table.

In order to find a perfect match, a backtracking algorithm is used to find the path with the maximum match score in the distance table. Backtracking [13] can be described as follows (see Figure 20):

Figure 20: example search tree for backtracking.

1. Starting at Root, your options are A and B. You choose A.
2. At A, your options are C and D. You choose C.
3. C is bad. Go back to A.
4. At A, you have already tried C, and it failed. Try D.
5. D is bad. Go back to A.
6. At A, you have no options left to try. Go back to Root.
7. At Root, you have already tried A. Try B.
8. At B, your options are E and F. Try E.
9. E is good.

4.2.2 KDE Pairwise Alignment Tracking

As in the pairwise algorithm used for gene matching, a distance table and a gap penalty are introduced for KDE pairwise tracking. However, local alignment is preferred in this situation, because global alignment introduces many high-scoring matches with no biological meaning: in a set of highly similar objects such as cells, the chance of finding two nearly identical objects is much higher than in other tracking scenarios. Table 5 shows a sample of cumulative density estimates. The density estimate is calculated with the kernel density estimator

\hat{f}_{h,k}(x) = \frac{c_{k,d}}{n h^d} \sum_{i=1}^{n} k\!\left( \left\| \frac{x - x_i}{h} \right\|^2 \right), \quad \text{where } K_N(x) = (2\pi)^{-\frac{d}{2}} |\Sigma|^{-\frac{1}{2}} e^{-\frac{1}{2} x^{T} \Sigma^{-1} x}

Table 5: cumulative density estimates between two sets of objects in consecutive frames (rows: obj1–obj5; columns: obj1–obj8).

        obj1  obj2  obj3  obj4  obj5  obj6  obj7  obj8
  obj1  7.8   7.5   6.5   5.8   5.4   3.2   3.2   3.2
  obj2  5.5   6.5   6.8   6.5   5.3   3.5   3.3   3.3
  obj3  4.3   4.4   4.6   4.4   4.3   4.3   4.3   4.3
  obj4  5.0   5.1   5.3   5.3   5.1   5.1   5.1   5.1
  obj5  3.1   3.1   3.1   3.1   5.1   3.1   3.1   3.1

For pairwise alignment tracking, the gap penalty is calculated as follows. Except in cases of very high diversity, there should be very little double matching in the final sequence; therefore, we discourage gaps by adding an incremental penalty for each gap extension:

penalty = ext^{\,n}

Table 6: trajectories found by KDE pairwise alignment tracking (all trajectories, and the individual trajectories of objects 3, 4, and 5).

The experiment (Table 6) suggests that pairwise alignment may still align remotely located but nearly identical objects.
Therefore, a strong distance penalty is introduced for remote objects. Since the cumulative density estimate always ranges between 0 and 1, the distance penalty sets the score to -1 for all objects that are too far from the current object:

\omega_i(x) = \begin{cases} kde(x, x_i), & \text{where } x_i \text{ is the model for object } i \\ -1, & \text{if the distance to object } i \text{ is higher than the threshold} \end{cases}

A sample of the distance-limited table is shown in Table 7.

Table 7: distance-limited cumulative density estimates (rows: obj1–obj5; columns: obj1–obj8).

        obj1  obj2  obj3  obj4  obj5  obj6  obj7  obj8
  obj1  7.8   7.5   6.5   5.8   -1    -1    -1    -1
  obj2  5.5   6.5   6.8   6.5   5.3   -1    -1    -1
  obj3  -1    4.4   4.6   4.4   4.3   -1    -1    -1
  obj4  -1    5.1   5.3   5.3   5.1   5.1   -1    -1
  obj5  -1    -1    -1    -1    5.1   -1    -1    -1

4.2.3 KDE Pairwise Alignment Tracking in Cell Images

The KDE pairwise alignment algorithm shows potential in preserving cell migration information (see Table 8). Compared with the KDE Mean Shift algorithm, it shows a slight improvement when cells are dividing or leaving the field of view. However, our implementation suggests that the runtime of the KDE pairwise alignment algorithm is significantly higher than that of KDE Mean Shift tracking, on average nearly 40 times; this may be a major drawback of the approach.

Table 8: first frames of videos (a) and (b), with the trajectories found by KDE Mean Shift and by KDE pairwise alignment.

4.2.4 Future Development

The pairwise alignment algorithm relies on a distance matrix (mutation table), and finding such a scoring system is non-trivial for video data. Popular mutation tables for gene matching, such as BLOSUM62, are calculated from the substitution probabilities observed in manually selected sequence blocks clustered at 62% similarity. Mutation in video, however, is highly dynamic: such a statistical score would require an iterative recalculation for each alignment, and recursive calculation of the mutation table increases the computational cost. Our study suggests that, even without considering multiple alignment, the average tracking time of pairwise alignment is nearly 40 times that of KDE mean shift; this may be the major disadvantage of pairwise alignment. Further study of the scoring system will be the main direction for pairwise alignment tracking.

Regardless of its immaturity, pairwise alignment offers some advantages over other tracking algorithms. Pairwise alignment tracking makes it possible for one object to have multiple alignments, which offers the potential to capture important cell activities such as cell division. Tracking multiple predecessors is very important in cell migration studies, and current tracking algorithms, which rely on statistical similarity between aligned objects, are not robust to this problem. An algorithm such as KDE mean shift has no intrinsic possibility of aligning multiple predecessors; pairwise alignment tracking may be a breakthrough in this respect.
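To make the alignment recurrence of Section 4.2.1 concrete, the sketch below computes a global alignment over a similarity table (for example, the distance-limited KDE table of Table 7) with a constant gap penalty; the incremental gap penalty and the local alignment variant described above are omitted for brevity, and the function and parameter names are our own.

Example (Python):
    import numpy as np

    def pairwise_align(score, gap=-1.0):
        """Global pairwise alignment over a similarity matrix 'score'
        (rows: objects in frame m; columns: objects in frame m+1).
        Returns the aligned (row, column) index pairs."""
        n, m = score.shape
        V = np.zeros((n + 1, m + 1))
        V[1:, 0] = gap * np.arange(1, n + 1)                   # leading deletions
        V[0, 1:] = gap * np.arange(1, m + 1)                   # leading insertions
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                V[i, j] = max(V[i-1, j-1] + score[i-1, j-1],   # align object i with object j
                              V[i-1, j] + gap,                 # object i left unmatched
                              V[i, j-1] + gap)                 # object j left unmatched
        pairs, i, j = [], n, m
        while i > 0 and j > 0:                                 # backtrack the optimal path
            if V[i, j] == V[i-1, j-1] + score[i-1, j-1]:
                pairs.append((i-1, j-1)); i -= 1; j -= 1
            elif V[i, j] == V[i-1, j] + gap:
                i -= 1
            else:
                j -= 1
        return pairs[::-1]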
4.3 Overlap Ratio Based Tracking

A cell accomplishes its motion by elongating and shrinking its lamellipodium and cytoskeleton, which is known as the amoeboid motion model. Such elongation guarantees that part of the cell body always stays fixed, meaning that an overlap between objects in consecutive images can always be found. The overlap ratio tracking algorithm calculates the overlap ratio between two cells in consecutive frames, and alignments are made based on the highest overlap ratio. The time resolution is a major pitfall of overlap-based tracking: if the time resolution is too low, missing trajectories or early termination will occur because no overlap can be determined. Therefore, a minimum time resolution must be defined and the video must at least provide a logical connection, meaning that the time interval must be small enough to produce a correct overlap. In order to provide error tolerance in less sensitive situations, a color-histogram based similarity measurement is introduced.

4.3.1 Overlap Ratio Based Tracking

The overlap ratio of two objects obj1 and obj2 in adjacent frames can be measured by the following formula:

OverlapRatio(obj_1, obj_2) = \frac{count\left( pixel \in (mask_{obj1} \cap mask_{obj2}) \right)}{count\left( mask_{obj1} \right)}

However, if a small object is a subset of a large object in the next frame, wrong trajectories are frequently created between the small and the large object. For objects of different sizes, the uncorrected overlap ratio has the following bias:

  Obj1    Obj2    Overlap
  Large   Small   Low
  Small   Large   High

Therefore, we introduce a reverse correction of the overlap measurement. The idea of the reverse correction is to avoid alignments between large and small objects by requiring overlap information from both objects; the corrected overlap ratio is defined as the average of the overlap ratio and the reversed overlap ratio:

ReverseOverlapRatio(obj_2, obj_1) = \frac{count\left( pixel \in (mask_{obj1} \cap mask_{obj2}) \right)}{count\left( mask_{obj2} \right)}

OverlapRatioCorrected = \frac{OverlapRatio(obj_1, obj_2) + ReverseOverlapRatio(obj_2, obj_1)}{2}

A 100% overlap ratio may occur when the mask of obj1 is a subset of obj2, even though there is no true connection in the video; the reverse overlap ratio may then be only 10%, giving a corrected overlap ratio of 55%. A true alignment occurs only when both overlap ratios are high: if the overlap ratio of obj1 and obj2 is 80% and the reverse overlap ratio is 70%, the corrected overlap ratio is 75%. The corrected overlap ratio significantly reduces the chance of subset alignments.

4.3.2 Overlap Ratio Based Tracking in Cell Images

Table 9 shows two samples of cell tracking: (a) a video with low cell density and a high magnification level, and (b) a video with high cell density and a low magnification level. The trajectories of both samples approach 85% tracking accuracy according to specialist visual confirmation.

Table 9: first frame, last frame, and trajectories of videos (a) and (b).
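The corrected overlap ratio defined above translates directly into a few lines of NumPy; the function name below is our own, and the masks are assumed to be boolean arrays of equal shape.

Example (Python):
    import numpy as np

    def corrected_overlap(mask1, mask2):
        """Corrected overlap ratio between two binary masks from adjacent frames.

        Averaging the ratio and its reverse prevents a small object that is a
        subset of a large one from producing a spuriously perfect score."""
        inter = np.logical_and(mask1, mask2).sum()
        ratio = inter / mask1.sum()              # overlap relative to obj1
        reverse = inter / mask2.sum()            # overlap relative to obj2
        return (ratio + reverse) / 2.0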
4.4 Parametric Motion Model Tracking

Parametric motion model tracking assumes that the motion of an object results in a rigid transform, which can be described by an affine transformation. This is nearly true for all rigid objects without detailed motion behavior or detailed structural change [2]. The motion model in this tracking algorithm does not describe the trajectory but the transformation of the object in the 2D projection of the 3D world. The approach requires high-quality images with clear object boundaries. Normally, the tracking region is defined by a rectangle or a circle; irregularly shaped regions of interest are rare since they cause computational difficulties in the affine transformation. The major use of this approach is to track solid objects without detailed structural changes, such as a car, a football, a missile, or an airplane, preferably following an implicitly linear motion model.

If the image changes are approximately affine, the correspondence needed for alignment can be achieved by treating an image as a function of an affine transformation given by the parameters a = (a0, a1, a2, a3, a4, a5)^T, where

\[
f(x, a) = \begin{bmatrix} a_0 \\ a_3 \end{bmatrix} + \begin{bmatrix} a_1 & a_2 \\ a_4 & a_5 \end{bmatrix} \begin{bmatrix} x - x_c \\ y - y_c \end{bmatrix}
\]

Parametric motion model tracking, however, proves ill-suited to the shape changes in cell images. Given the amoeboid motion, cell motion cannot be described by a rigid motion model: the structural change of a cell is far more pronounced than that of a rigid object and cannot be captured by a rigid transform. Moreover, most parametric motion models presume either a linear or a quadratic motion model, which is also not guaranteed in cell images. The approach remains useful for cells with less pronounced shape changes and no self-controlled motility behavior; the blood cell is a typical cell type that can be tracked with a parametric motion model.

Now that all four tracking principles have been introduced, we will compare the performance, i.e. the tracking bias, of the algorithms. Only a detailed comparison of KDE Mean Shift tracking and Overlap Ratio tracking will be given; parametric motion model tracking and KDE Pairwise Alignment tracking are ruled out because of their principal limitations and immaturity. Estimating tracking bias on real video always contains a certain level of bias itself, mainly due to random cell migration and errors in manual observation. In addition, real cells are difficult to control and do not always react to treatments as expected. In order to estimate tracking bias objectively, an artificial object test is employed in our research; it is introduced in detail in the next section.

4.5 Artificial Object Test

Error estimation for a tracking algorithm is non-trivial and is mostly done by specialist manual observation, which only provides an abstract, individual impression of the tracking performance; no statistical estimation of the error can be obtained. In addition, cell experiments may introduce a certain level of random bias. In order to estimate the performance of the tracking algorithms, a controlled test is designed.

The artificial object test creates videos of objects that simulate basic cell migration, given user-defined parameter settings. These settings allow the user to create videos of artificial cells with nearly optimal or controllable behavior. In addition, automatic error estimation is possible since the true trajectories of the artificial cells are known in advance. The error estimation from the artificial object test is the decisive factor in our later selection of a tracking algorithm.

The artificial object test consists of two parts: (1) artificial cells are generated by the core program, named the virtual cell factory, and an artificial video is generated from these self-managed artificial cells; (2) the trajectories found by the tracking algorithms are compared with the true trajectories. In this chapter we give a detailed explanation of the artificial object test.

4.5.1 Virtual Cell Factory

Our version of the artificial object test, named the virtual cell factory, is designed to simulate two major behaviors of cell migration: position shifts and shape changes. Parameter settings are defined to create a user-controlled experimental condition. Virtual cells are created from the initial parameter settings; after being created, the virtual cells manage themselves based on their internal behavior simulators, so that a movie can be made from these artificial cells.
The following parameter settings are defined in our implementation (see Figure 21); a small configuration sketch of these settings is given below, after Figures 22 and 23:

1. Travel distance (maximum and minimum)
2. X-axis extension (maximum and minimum)
3. Y-axis extension (maximum and minimum)
4. Initial size (x-axis length and y-axis length)
5. Number of cells
6. Uniformly distributed cells
7. Perfect connectivity

Figure 21

Travel Distance
The travel distance controls how quickly an artificial cell shifts position and is used to simulate the motility of a cell. Since the relation between lamellipodium extension and cell velocity is unknown, the velocity and shape settings are specified separately.

X-axis Extension and Y-axis Extension
The x-axis and y-axis extensions control how an artificial cell extends in both directions. They simulate the lamellipodium extension of cells.

Initial Size
The initial size of an artificial cell is used to simulate the microscopy magnification level; a high magnification level typically results in a larger cell size.

Number of Cells
A video with only a few cells has no statistical significance; therefore, the artificial cell test must be able to create videos with a high cell density. The user may create at most 200 cells, but the initial size must then be chosen properly. An extremely high cell density will disable the motility behavior due to programming limitations.

Uniformly Distributed Cells
This setting allows the user to prevent initial masking errors. All cells are evenly located in the image with maximal distance and nearly zero overlap; connected cells then become nearly impossible, except at extremely high cell densities. If this option is not set, cells are placed at random locations. In either the even or the random distribution, cells do not touch the border of the screen.

Perfect Connectivity
Uniformly distributing the cells does not prevent them from overlapping later on: after long-term movement, it is highly likely that cell trajectories will cross. With perfect connectivity enabled, cells remain completely isolated from one another during the entire video. As with the uniform distribution option, this option cannot be combined with extremely high cell densities, which would prohibit the motility behavior.

Due to programming complexity, rotation and extension at an arbitrary angle are not allowed in the current settings. The amoeboid motion model does suggest that cell motion is accomplished by extending the lamellipodium, but it does not explicitly state that the motion direction lies on the same axis as the lamellipodium extension. Since the artificial cell test is designed to test the tracking algorithms in an optimal situation, introducing further complexity is not necessary.

The artificial cells created by the virtual cell factory have the following traits:
1. Each artificial cell is given a numeric range for its elongation behavior
2. Elongation of an artificial cell occurs in one direction only
3. The velocity of an artificial cell follows a predefined linear model
4. No other cell activity, such as division, will appear
5. Artificial cells do not suddenly appear or disappear; they remain on the screen
6. Cells do not crash into each other and keep their distance from each other
7. Cells can be evenly distributed in the image

The artificial cell test allows us to create controlled cell behaviors in order to determine which factors result in tracking bias. The images show two visualized trajectories obtained by running the tracking algorithm on artificial videos (see Figure 22 and Figure 23).

Figure 22 and Figure 23: first frames and trajectories of artificial videos with 49 and with 9 artificial cells.
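As an illustration of how such parameter settings might drive a per-frame simulation, the sketch below implements a stripped-down virtual cell with a bounded travel distance and bounded x/y elongation. It is a minimal reconstruction under assumed names (FactorySettings, VirtualCell), not the thesis implementation; uniform placement, perfect connectivity, and mask rendering are omitted.

import math
import random
from dataclasses import dataclass

@dataclass
class FactorySettings:
    travel_min: float = 5.0      # minimum travel distance per frame (pixels)
    travel_max: float = 10.0     # maximum travel distance per frame (pixels)
    ext_min: float = -3.0        # minimum x/y extension per frame (pixels)
    ext_max: float = 3.0         # maximum x/y extension per frame (pixels)
    init_width: float = 45.0     # initial x-axis length
    init_height: float = 30.0    # initial y-axis length
    n_cells: int = 20
    n_frames: int = 30

class VirtualCell:
    def __init__(self, x, y, s: FactorySettings):
        self.x, self.y = x, y
        self.w, self.h = s.init_width, s.init_height
        self.s = s

    def step(self):
        # position shift: random direction, travel distance within the given range
        d = random.uniform(self.s.travel_min, self.s.travel_max)
        a = random.uniform(0.0, 2.0 * math.pi)
        self.x += d * math.cos(a)
        self.y += d * math.sin(a)
        # lamellipodium-like elongation: independent x- and y-axis extension
        self.w = max(1.0, self.w + random.uniform(self.s.ext_min, self.s.ext_max))
        self.h = max(1.0, self.h + random.uniform(self.s.ext_min, self.s.ext_max))

# create cells at random positions and advance them by one frame
settings = FactorySettings()
cells = [VirtualCell(random.uniform(100, 400), random.uniform(100, 400), settings)
         for _ in range(settings.n_cells)]
for cell in cells:
    cell.step()

Each frame of the artificial video would then be rendered from the cells' positions and sizes, while the positions themselves are recorded as the ground-truth trajectories.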
4.5.2 Error Estimation in Artificial Object Test

To compare the performance of the tracking algorithms, we must first define an error estimation for tracking bias. A mean-error based score system is designed in our experiments (a short computational sketch is given at the end of this section). The score is calculated as the average percentage of correctly tracked cells in each path, multiplied by the ratio between the number of found paths S_f = count(found) and the number of true paths S_t = count(true). The path ratio is given by:

\[ PathRatio = \frac{S_f}{S_t} \]

The tracked percentage of each tracked path is given by:

\[ Tracked\% = \frac{N(\text{tracked cells in true path})}{N(\text{cells in true path})} \]

The final score of the tracking performance is given by:

\[ score = PathRatio \cdot Tracked\% \]

To obtain a high score, the tracking algorithm must return trajectories that fulfill the following conditions:
• finding all possible trajectories (paths) of objects
• finding all correct alignments within each path

Calculation example: a video has 20 true paths and a length of 30 images. The tracking algorithm is run on this video; for each tracked path the average error rate is 15%, so the correctly tracked percentage is 85%. Only 18 of the true paths are found by the tracking algorithm. The final score is therefore:

\[ score = 85\% \cdot \frac{18}{20} = 76.5\% \]

The following table is a sample of the score for each tracking algorithm. In section 4.5.3, a series of iterative perturbation experiments will be conducted; we expect these experiments to provide not only a robustness estimate for each tracking algorithm but also the decisive factors for successful tracking.

Tracking Algorithm         Mean score over 160 tests
KDE Mean Shift Tracking    85%
Overlap Tracking           76%
Pairwise Tracking          52%

When the artificial object videos are created, tracking is performed independently on them, and the trajectories returned by the tracking algorithms are compared with the true trajectories recorded in advance. With the help of the artificial object test we can not only compare the tracking performance of each algorithm, but also study in depth the factors that may cause tracking bias. In the next section, we demonstrate a perturbation study of the tracking algorithms using the artificial object test.
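The score can be computed directly from the ground-truth and tracked trajectories. The sketch below follows the worked example above; the function name and the convention of averaging over the found paths are our own reading of that example, not the thesis code.

def tracking_score(true_paths, found_paths, tracked_fraction_per_path):
    """
    true_paths:  number of ground-truth trajectories in the artificial video
    found_paths: number of trajectories returned by the tracker
    tracked_fraction_per_path: for each found path, the fraction of frames of its
        ground-truth path that were tracked correctly
    """
    path_ratio = found_paths / true_paths
    tracked_pct = sum(tracked_fraction_per_path) / len(tracked_fraction_per_path)
    return path_ratio * tracked_pct

# worked example from the text: 18 of 20 true paths found, 85% correct on average
print(tracking_score(20, 18, [0.85] * 18))   # 0.765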
4.5.3 Perturbation Study in Artificial Object Test

Sixteen groups with 10 iterative tests each are used to estimate the performance of the three algorithms; the purpose of the repeated tests is to provide an objective error estimate. The group settings are chosen manually and paired in order to perform the later perturbation tests, so that 20 tests are assigned to each perturbation setting (see Appendix III-1). The mean score over all 160 tests is given below; for the detailed data, please refer to the datasheet.

Tracking Algorithm         Mean score over 160 tests
KDE Mean Shift Tracking    85%
Overlap Tracking           76%
Pairwise Tracking          52%

The overall performance suggests that KDE mean shift tracking has the highest score while pairwise tracking has the lowest. KDE mean shift tracking and overlap tracking have similar performance, but the detailed tests suggest that overlap tracking relies on an optimal situation while KDE mean shift tracking does not. Moreover, due to the immaturity of the pairwise tracking algorithm, it is not included in the perturbation analysis.

As discussed at the beginning of this report, several factors can cause a tracking algorithm to fail; these factors are examined in the following perturbation analysis. The factors believed to cause the most tracking error are:

1. motion speed
2. cell density
3. cell remote
4. cell separation (2)

The experiment protocols are as follows:
• 20 tests are divided equally into two groups: one group is given a cell speed of 5 to 10 pixels, the other a cell speed of 10 to 20 pixels.
• 20 tests are divided equally into two groups: one group is given 5 cells in total, the other 20 cells in total.
• 20 tests are divided equally into two groups: cells in one group are located randomly, cells in the other group are located remotely from each other.
• 20 tests are divided equally into two groups: cells in one group are separated randomly, cells in the other group are kept separated throughout.

Each artificial cell is given the following common settings:
• all cells have the same initial size (width 45 and height 30)
• the total length of each video is identical (30 frames)

(2) Cell separation here is not cell differentiation in the biological sense. We define cell separation as the masking condition of cell connectivity; it refers to the overlap between two non-aligned objects.

Figure 24: decision procedure of the perturbation test. For each target factor, two parameter settings are defined; 10 iterative tracking runs are performed per setting and their errors are estimated against the true trajectories. If the tracking error increases significantly between the two settings, the factor is decisive; otherwise it is not. (A short sketch of this decision rule follows at the end of this section.)

For the detailed tracking errors, please refer to Appendix III-1: Artificial Object Test and Perturbation Test. Based on the tracking results, we determine the following decisive factors affecting tracking accuracy.

Motion Speed
A high motion speed causes incorrect tracking; the differentiation ratio for both tracking algorithms is around 6% (see Table 10). Overlap tracking is less affected by motion speed, yet its overall performance remains lower than that of KDE mean shift. Time resolution is a manifestation of the motion speed setting: an extremely high motion speed may also be the result of a low time resolution. The time resolution problem is discussed in section 4.7.

Table 10
          t(5/10)   t(10/20)   Diff %
KDE       88.45     82.685     6.5178
Overlap   78.232    73.784     5.6857

Cell Density
Cell density has a consistent effect on the tracking algorithms: for both algorithms the score drops by about 13% (see Table 11). More cells mean a higher chance of identity confusion and a worse connectivity condition, which causes both the density-based and the non-density-based algorithm to fail. The differentiation ratio suggests, however, that overlap tracking is slightly less affected by cell density. Compared with motion speed, cell density has a larger effect on tracking.

Table 11
          c(5)     c(20)     Diff %
KDE       91.06    78.632    13.6481
Overlap   81.394   70.465    13.4273

Cell Remote
The initial separation of the cells plays a critical role in the KDE mean shift algorithm.
The accuracy of KDE mean shift drops significantly when the initial cell locations are poor (see Table 12). Overlap tracking is less affected by the cell remote condition. A correct initial mask is therefore critical for successful tracking.

Table 12
          r(true)   r(false)   Diff %
KDE       91.936    76.718     16.55282
Overlap   80.556    71.461     11.29028

Cell Separation
The later separation of the cells during the entire motion is a critical factor for overlap ratio tracking. Compared with the initial separation (cell remote), both tracking algorithms are less affected by the later separation (see Table 13). Our further study of the combination of cell remote and cell separation reveals, however, that the combined effect of both conditions is a decisive factor (see Table 14).

Table 13
          s(true)   s(false)   Diff %
KDE       87.874    82.031     6.649293
Overlap   79.831    72.186     9.57648

Combination of Cell Remote and Cell Separation
The combined study of cell remote and cell separation suggests that, in the optimal situation where cells are initially separated and never become connected later on, KDE mean shift tracking reaches an accuracy of around 96% and overlap ratio tracking an accuracy of around 83% (see Table 14).

Table 14
          r(t)&s(t) (3)(4)   r(t)&s(f)   r(f)&s(f)   r(f)&s(t)
KDE       96.449             89.082      74.217      79.637
Overlap   83.598             77.515      66.856      76.066

(3) Abbreviation: cell remote is set to true.
(4) Abbreviation: cell separation is set to true.

Based on the general performance and the perturbation tests, we conclude that KDE Mean Shift tracking is indeed a robust algorithm for cell tracking. Our next step is to define several morphology and motility measurements for the cell migration study; the definition of these feature measurements is discussed in the next section.
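The decision rule of Figure 24 amounts to a two-sample comparison of the scores obtained under the two parameter settings. The sketch below is one way to operationalize "does the error increase significantly"; the thesis does not specify the exact test, so the use of a t-test here, the function name, and the example scores are our own illustration.

from statistics import mean
from scipy import stats

def is_decisive(scores_setting_1, scores_setting_2, alpha=0.05):
    """Figure 24 decision rule: the factor is decisive if the tracking score drops
    significantly when the factor is switched from setting 1 to setting 2."""
    _, p = stats.ttest_ind(scores_setting_1, scores_setting_2)
    return p < alpha and mean(scores_setting_2) < mean(scores_setting_1)

# made-up scores from 10 iterative tests per setting, as fractions
slow = [0.89, 0.91, 0.87, 0.90, 0.88, 0.92, 0.86, 0.90, 0.89, 0.88]
fast = [0.83, 0.81, 0.84, 0.80, 0.85, 0.82, 0.79, 0.84, 0.83, 0.82]
print(is_decisive(slow, fast))   # True: motion speed would be a decisive factor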
4.6 Measurements in Cell Migration Video

With the help of the tracking algorithm, we can now perform statistical measurements on cell migration. This section introduces the definitions of several statistical measurements (features) that are employed to capture the motility and morphology changes of cell migration. The features used in our research fall into two categories:

1. static features
2. continuous features

Static features are the statistical measurements extracted from each cell in each individual frame, such as area, perimeter, and the coordinates of the center of mass. They are named "static" because of their limitation in representing correlations between subsequent frames: information related to motility or continuous morphology change cannot be observed directly from static features. For that purpose the continuous features are introduced. Continuous features are the statistical changes derived from the static features, such as velocity and motion linearity; they are specifically designed to capture motility and morphology changes during cell migration. Since both static and continuous features provide valuable information on cell migration, the data analysis is performed on both raw and summarized features.

4.6.1 Static Features

The static features are measurements extracted directly from static frames; the measurements are independent and no correlation between subsequent frames is imposed. The static features defined include:

Raw Feature          Description
Area                 The size of the cell
Perimeter            The perimeter of the cell
Center of Mass       The mass center of the binary mask of the cell
Extension            Measures how much the shape differs from a circle
Dispersion           The minimum extension that can be attained by uniform compression of the shape
Elongation           Measures how much the shape must be compressed along its long axis in order to minimize the extension
Circle Compactness   The ratio of the area of the shape to the area of a circle (the most compact shape) having the same perimeter

The mean of the static features preserves global information such as the average size or the average elongation of the cells; however, static features do not preserve information about the diversity of cell migration.

Area
The area of a cell is measured as the total number of pixels in its binary mask and is identical to the size of the cell. The area is frequently referred to as the zero-order moment, given by

\[ \mu_{pq} = \sum_{y=0}^{s_y-1} \sum_{x=0}^{s_x-1} (x - \bar{x})^p (y - \bar{y})^q F(x, y), \quad \text{with } p = 0 \text{ and } q = 0 \]

which for a binary mask simplifies to

\[ \mu_{00} = \sum_{y=0}^{s_y-1} \sum_{x=0}^{s_x-1} F(x, y) \]

Here F(x, y) is a binary function: F(x, y) = 1 if the pixel at (x, y) belongs to the object and F(x, y) = 0 otherwise; x̄ and ȳ are the coordinates of the mass center. F(x, y) can also be taken as the intensity function, which results in an intensity-weighted calculation; this may be useful when the intensity of the cell carries a biological interpretation.

Perimeter
The perimeter of a cell is calculated from the length of the edge of its binary mask. The result depends on the choice of connectivity: a perimeter obtained with 4-connectivity is normally longer than one obtained with 8-connectivity. In our implementation we use 8-connectivity.

Center of Mass
In computer vision terms, the center of mass is computed from the raw (non-central) moments

\[ m_{pq} = \sum_{y=0}^{s_y-1} \sum_{x=0}^{s_x-1} x^p y^q F(x, y), \qquad F(x, y) = b(x, y) \text{ (binary mask)} \]

The x-coordinate of the mass center is MC_X = m10 / m00 and the y-coordinate is MC_Y = m01 / m00. We assume here that the mass center of the binary mask coincides with the physical mass center of the cell; the center of mass is therefore used as the reference point for distance and motion measurements.

Extension
Extension is a moment-invariant shape feature; it measures how much the shape differs from a circle. Its calculation is based on the normalized central moments of the mask:

\[ \eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{r}}, \qquad r = \frac{p+q}{2} + 1 \]
\[ \phi_1 = \eta_{02} + \eta_{20}, \qquad \phi_2 = \sqrt{(\eta_{02} - \eta_{20})^2 + 4\eta_{11}^2} \]
\[ \lambda_1 = 2\pi(\phi_1 + \phi_2), \qquad \lambda_2 = 2\pi(\phi_1 - \phi_2) \]

The extension is then defined by

\[ E = \ln \lambda_1 \]

The major advantage of moment invariants is their inherent scale-free behavior: regardless of how the microscopy magnification changes, the moment invariant remains consistent unless it is affected by mask details revealed at a higher resolution.

Dispersion
Dispersion is a moment-invariant shape feature; it is the minimum extension that can be attained by uniform compression of the shape and is invariant to stretching, compressing, or shearing the shape in any direction. The calculation of dispersion is based on the same normalized moments of the mask.
Using the quantities λ1 and λ2 defined above, the dispersion is given by

\[ D = \frac{1}{2} \ln(\lambda_1 \lambda_2) \]

As for the extension, the moment invariant is scale-free: it remains consistent regardless of the microscopy magnification, unless affected by mask details revealed at a higher resolution.

Elongation
Elongation is a moment-invariant shape feature; it measures how much the shape must be compressed along its long axis in order to minimize the extension. It is based on the same normalized moments and is defined by

\[ L = \frac{1}{2} \ln \frac{\lambda_1}{\lambda_2} \]

Again, the moment invariant remains consistent regardless of the microscopy magnification. Figure 25 shows samples of the extension, dispersion, and elongation measurements for several typical cell shapes.

Figure 25: extension, dispersion, and elongation values for typical cell shapes.

Circle Compactness
The circle compactness, or shape factor [16], is a common shape measure. It is the ratio of the area of the shape to the area of a circle (the most compact shape) having the same perimeter, calculated as

\[ compactness = \frac{4\pi \cdot Area}{Perimeter^2} \]

For a circle the ratio is one; for a square it is π/4; for an infinitely long and narrow shape it is zero (see Figure 26). A small computational sketch of the moment-based shape features follows below.

Figure 26: circle compactness of typical shapes.
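The moment-based features above can be computed directly from a binary mask. The sketch below follows the definitions in this section; note that the square root in φ2 is our reconstruction of the garbled formula, the function name is our own, and degenerate (near-empty or single-pixel) masks are not handled.

import numpy as np

def shape_features(mask):
    """Area, center of mass, extension, dispersion, and elongation of a binary mask,
    following the definitions of section 4.6.1."""
    ys, xs = np.nonzero(mask)
    area = xs.size                            # zero-order moment mu_00
    xc, yc = xs.mean(), ys.mean()             # center of mass

    def mu(p, q):                             # central moment mu_pq
        return np.sum((xs - xc) ** p * (ys - yc) ** q)

    def eta(p, q):                            # normalized moment eta_pq
        return mu(p, q) / area ** ((p + q) / 2 + 1)

    phi1 = eta(0, 2) + eta(2, 0)
    phi2 = np.sqrt((eta(0, 2) - eta(2, 0)) ** 2 + 4 * eta(1, 1) ** 2)
    lam1 = 2 * np.pi * (phi1 + phi2)
    lam2 = 2 * np.pi * (phi1 - phi2)

    extension = np.log(lam1)
    dispersion = 0.5 * np.log(lam1 * lam2)
    elongation = 0.5 * np.log(lam1 / lam2)
    return area, (xc, yc), extension, dispersion, elongation

For a perfect disk, λ1 = λ2 = 1, so the extension and elongation are both zero, which matches the intended behavior of these measures.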
4.6.2 Continuous Features

The continuous features are features that preserve trends of change or correlation over time. The current continuous features include several important change rates and the motion speed (see Table 15).

Table 15
Raw Feature               Description
Velocity                  The motion speed of the cell
Absolute Position Shift   The position shift between the first frame and the last frame
Trajectory Length         The total travel distance based on the mass center of the cell over all frames
Motion Linearity          Measures the linearity of the motion trajectory

Velocity
By physical definition, velocity is the rate of change of position, given by v = ΔS / Δt. The position shift ΔS is normally calculated as the Euclidean distance between the two mass centers (reference points) of the same object in subsequent frames. The velocity of the cell in Figure 28 is calculated as v_AB = 2131.32 / 30 and v_BC = 2026.89 / 30.

Absolute Position Shift
The absolute position shift of a cell describes the position difference between the first and the last frame, calculated from the mass center of the cell in those two frames. Figure 27 shows a sample U-shaped motion trajectory of one cell; its absolute position shift is 1751.55.

Trajectory Length
The trajectory length is defined as the total travel distance of the cell, calculated from the mass centers of the cell in subsequent frames. The trajectory length of the cell in Figure 28 is S = 2131.32 + 2026.89.

Motion Linearity
Motion linearity is widely used but not formally defined. Our definition of motion linearity is the ratio between the absolute position shift and the trajectory length; for the motion trajectory of Figure 28:

\[ Linearity = \frac{S_{AC}}{S_{AB} + S_{BC}} \]

For long-term recordings, a global linearity may lose important details or periodic trends: special cases such as a U-type or O-type motion result in a low linearity even though the motion is, in fact, locally linear. For long-term measurements, a window-based calculation of the motion linearity is therefore used in our implementation (a short sketch follows after the figures). We use the U-type motion above to explain how the window-based calculation is performed (see Figure 27 and Table 16).

Table 16: artificial distance measurement
SEQ#   Distance
1      0
2      1233.74
3      1010.58
4      1048.50
5      1032.42

Window-based calculation of the global motion linearity with a window size of three:

\[ Linearity_{1-3} = \frac{2113.70}{1233.74 + 1010.58} = 0.9418 \]
\[ Linearity_{3-5} = \frac{1964.83}{1048.50 + 1032.42} = 0.9442 \]
\[ Linearity_{global} = \frac{Linearity_{1-3} + Linearity_{3-5}}{2} = 0.943 \]

Standard calculation of the global motion linearity:

\[ Linearity_{global} = \frac{1751.55}{1233.74 + 1010.58 + 1048.50 + 1032.42} = 0.4049 \]

The standard calculation of the global motion linearity is thus significantly smaller than the window-based calculation. Based on discussions with biological specialists, the window-based calculation is preferred; it is believed to give a more meaningful measurement than the non-window-based calculation.

Figure 27: U-shaped motion trajectory of one cell through five positions, with segment distances 1233.74, 1010.58, 1048.50, and 1032.42, window displacements 2113.70 and 1964.83, and an absolute position shift of 1751.55.
Figure 28: motion trajectory through points A, B, and C with segment distances 2131.3 (A-B) and 2026.9 (B-C).
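The windowed linearity can be sketched as below; the function names are our own, and trailing points that do not fill a complete window are ignored for simplicity.

import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def linearity(points):
    """Net displacement divided by total path length for a list of (x, y) mass centers."""
    path = sum(dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    return dist(points[0], points[-1]) / path if path > 0 else 1.0

def windowed_linearity(points, window=3):
    """Window-based global linearity: average the linearity of consecutive windows
    that share their end points (step = window - 1)."""
    step = window - 1
    values = [linearity(points[i:i + window])
              for i in range(0, len(points) - step, step)]
    return sum(values) / len(values)

Applied to the five positions of the U-shaped example, a window of three reproduces the two window linearities and their average of about 0.94.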
These features have been frequently employed in both computer vision and microscopy techniques [22][23]. Yet the study of these features in a time-lapse, continuous fashion remains largely incomplete: feature measurements in a time-lapse video result in a time-based signal (see Figure 29), while summarization approaches such as the mean and standard deviation are commonly used without considering what these statistics represent in the frequency domain. In the next section, we give the biological translation of the summarization approaches, including mean, standard deviation, and frequency, using tracking analysis results from MTLn3 cells.

Figure 29: the elongation measurement of one cell in a 120-frame video.

4.6.3 Summarization Approaches

The mean and standard deviation are popular summarization approaches in statistics; however, each of them preserves only part of the trends in a dataset. To compensate for their shortcomings in time-based problems, the frequency is employed as well. We first introduce the difference between these three approaches.

By mathematical definition:
• the mean describes the central location of the data
• the standard deviation describes the spread of the data
• the frequency measures the number of occurrences of a repeating event per unit time

By biological translation:
• the mean (average) area of one cell in a time-lapse video gives the true size of this cell
• the standard deviation of the area of one cell in a time-lapse video gives how significantly it changes its size
• the frequency of the area of one cell in a time-lapse video gives how quickly it changes its size (5)

(5) By our definition, a continuous or gradual shape change, such as the shrinking or extending of the amoeboid motion, is not considered a significant shape change; a complete shrink-extend motion cycle, however, is.

To explain the difference between mean, standard deviation, and frequency, we introduce a family of sine functions as an idealized result of tracking analysis. Figure 30 and Table 17 give an impression of how the three approaches respond to this family of sine functions; the experiment shows to which factor each of these approaches responds.

Figure 30: the signals y = sin(2πt), y = 2·sin(2πt), y = sin(4πt), and y = sin(2πt) + 4 plotted over t = 1 to 5.

Table 17
Function           Description                                                                mean   std     frequency
y = sin(2πt)       Measurement result of an optimal control group                             0      0.707   2
y = 2·sin(2πt)     Treated group with larger lamellipodium extension changes                  0      1.414   2
y = sin(4πt)       Treated group with a high frequency of lamellipodium extension changes     0      0.707   4
y = sin(2πt) + 4   Treated group with a longer lamellipodium in general                       4      0.707   2

The function y = sin(2πt) is the control signal, while y = 2·sin(2πt), y = sin(4πt), and y = sin(2πt) + 4 are the treated signals.
• The mean responds only to the value shift (see y = sin(2πt) + 4).
• The standard deviation responds only to the amplitude (see y = 2·sin(2πt)).
• The frequency responds only to the frequency change (see y = sin(4πt)).

It is notable that all three summarization approaches preserve important but limited information on cell migration; therefore all three are consulted in our research. Figure 31 shows the procedure for converting the raw feature measurements into a dataset that can be processed by the data analysis.

Figure 31: the per-frame feature measurements of each cell (cell 1 to cell n, frame 1 to frame n) are summarized per cell by mean, standard deviation, and frequency, and combined into a summarized dataset for large-volume data analysis.

With the combination of the tracking algorithm and the feature measurements, we can now convert raw video into an objective and meaningful numeric estimation of cell migration. In large-volume data analysis, the measurement of a single object in a single frame is not considered on its own; instead, the features are summarized per object by mean, standard deviation, and frequency. Due to masking bias and the time resolution problem, however, the measurements always contain a certain level of bias: even at an extremely high time resolution, small oscillations in the numeric estimation are difficult to avoid, and masking errors result in outliers in the feature measurements. These measurement errors especially affect the estimation of the frequency; they can, however, be reduced by noise suppression using moving-average filtering.
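The per-cell summarization of Figure 31 can be sketched as below. The thesis does not fully specify its frequency estimator; using the dominant frequency of the magnitude spectrum, as done here, is one reasonable choice, and the function name and frame_interval parameter are our own.

import numpy as np

def summarize(signal, frame_interval=1.0):
    """Mean, standard deviation, and a simple frequency estimate for one feature
    signal (one value per frame)."""
    signal = np.asarray(signal, dtype=float)
    mean = signal.mean()
    std = signal.std()
    # dominant frequency from the magnitude spectrum, DC component excluded
    spectrum = np.abs(np.fft.rfft(signal - mean))
    freqs = np.fft.rfftfreq(signal.size, d=frame_interval)
    freq = freqs[1:][np.argmax(spectrum[1:])] if signal.size >= 2 else 0.0
    return mean, std, freq

Running this on the per-frame area or elongation of one tracked cell yields the three summary values that feed the large-volume data analysis.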
4.6.4 Noise Suppression in Measurement

Bias or noise in the feature measurements is difficult to avoid but can be corrected. By considering a time-lapse measurement as a signal, these oscillations (noise) can be suppressed by mean filtering; the idea of mean filtering is to correct the current value based on its neighbors (a short sketch is given at the end of this section). The mean filter in the discrete domain is defined as [16]:

\[ \bar{x}_i = \frac{1}{2k+1} \sum_{j=-k}^{k} x_{i+j} \]

where x_i is the measurement at time point i and 2k+1 is the kernel size.

Figure 32: noise suppression using a moving-average filter; the clean signal y1 = sin(2πt) and the noisy signal y2 = sin(2πt) + 0.5·rand - 0.25.
Figure 33: the clean signal y1 and the filtered signal y3 = conv2(y2, meanfilter).

Figure 32 shows a sample of noise suppression using a moving average. An artificial noisy sine signal y2 (blue line in Figure 32) is created by adding uniformly distributed noise ranging from -0.25 to 0.25 to a clean sine signal y1 (red line in Figure 32). The signal y3 (blue line in Figure 33) is the result of suppressing the noise in y2 with a moving average. Compared with the noisy signal y2, the suppressed signal y3 preserves the original trend of y1 much better. The experiment was repeated 1,000 times and the average summarization estimates of the three signals are given in Table 18.

Table 18 (sample rate 1000 Hz)
signal   mean    std     freq
Y1       0.000   0.707   0.020
Y2       0.000   0.722   0.639
Y3       0.001   0.695   0.114

Table 18 shows that this zero-mean noise has no significant effect on the mean and standard deviation estimates, but it significantly increases the frequency estimate; that error can be reduced substantially by mean filtering. We also notice, however, that the noise suppression is strongly affected by the sample rate of the signal: it only works properly when the signal is oversampled. An undersampled signal causes a notable error in the mean and standard deviation estimates after noise suppression (see Table 19 and Table 20).

Table 19 (sample rate 200 Hz)
signal   mean     std     freq
Y1       0.000    0.707   0.098
Y2       -0.004   0.712   0.341
Y3       -0.005   0.680   0.098

Table 20 (sample rate 20 Hz)
signal   mean     std     freq
Y1       0.000    0.000   0.600
Y2       -0.116   0.144   0.200
Y3       -0.082   0.036   0.400

It is apparent that as the sample rate decreases (Table 18 at 1000 Hz, Table 19 at 200 Hz, and Table 20 at 20 Hz), the three estimates of the original signal y1 acquire more and more bias. The time resolution (sample rate) problem is discussed further in section 4.7.
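A minimal moving-average (mean) filter matching the formula above can be written as follows; padding the borders by repeating the edge values is our own choice, since the thesis does not state how the borders are handled.

import numpy as np

def moving_average(signal, k=2):
    """Mean filter with kernel size 2k+1; the signal borders are padded by
    repeating the edge values so the output has the same length."""
    kernel = np.ones(2 * k + 1) / (2 * k + 1)
    padded = np.pad(np.asarray(signal, dtype=float), k, mode="edge")
    return np.convolve(padded, kernel, mode="valid")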
4.7 Measurement Bias in Time Resolution

The definition of time resolution may vary between contexts; we define it as the number of frames taken per unit of time for each well in the plate. In this section we focus on the measurement bias caused by the time resolution.

Figure 34: time resolution as a sampling problem, analogous to sound sampling.

Time resolution is a critical imaging condition for the time-lapse tracking problem; it is a special type of sampling problem with traits similar to sound sampling (see Figure 34). In order to study the effect of the sampling rate, a time-lapse video of MTLn3 cells with an extremely high time resolution was created for subsampling. Below we introduce an experiment that estimates the measurement bias using the same video at different time resolutions.

4.7.1 Time Resolution Experiment

To study the bias of the feature measurements at different time resolutions, a video with an extremely high time resolution is created. This reference video (mtln3-gfp-paxillin+hgf+egf_30) is a sequence of 120 images with approximately 15 cells, taken at 30 seconds per frame per well. Subsampling is employed to create data with an artificially reduced sample rate; the idea of subsampling is to extract a subset of frames from the original sequence. Our rule for extracting the subset is described in Figure 35.

Figure 35: subsampling scheme. The original frames are divided into subgroups of m frames; one frame is extracted from every subgroup at position j and tracking is performed on the subsampled video; the bias is estimated against the full sequence. This is repeated for every position j in the subgroup, after which the mean bias estimate is calculated.

Pseudo code:
  for a time resolution h times lower than the original:
      define a window (subgroup) size h over all m frames
      for each position j within a subgroup:
          for each subgroup i defined by window size h:
              extract the frame at position j
          a subset of the original frames is created
          the measurement bias between the full sequence and the subset is estimated
      the average bias over all positions j is calculated

In our implementation the weighted correlation, a score combining the correlation coefficient and the mean square error, is used as the bias estimate.

4.7.2 Weighted Correlation

In our implementation the weighted correlation is defined as the ratio between the correlation coefficient and the mean square error (a short sketch follows Table 21):

\[ wc = \frac{Correlation}{MSE} \]

The best subsampled dataset should have a high correlation and a low mean square error with respect to the original data, i.e., it should preserve the global trend as much as possible with a minimal numeric difference. Since the correlation coefficient and the MSE have different numeric scales, both are normalized to the range from zero to one:

\[ Value_N = \frac{Value - Min}{Max - Min} \]

One exception in the weighted correlation (WC) calculation is that when the MSE equals zero, the WC score is set to one, since a zero MSE means the measurements from the subsampled data show no numeric difference from the original data. The advantage of the weighted correlation is that it avoids the limitations of either the correlation coefficient or the mean square error on its own. Table 21 explains why both criteria are combined (see also Figures 36-39).

Table 21: summary of the correlation and MSE combination
Correlation   MSE    Description
High          High   The subsampled tracking result may follow the same trend, but its values differ strongly from the non-subsampled data. Considering only the correlation could be misleading.
High          Low    The subsampled tracking result is nearly identical to the non-subsampled one. This is the best situation, since the subsampled data contains little bias; it is the main reason both correlation and MSE are employed.
Low           High   The worst case: the subsampled data neither follows the same trend nor has values close to the original. It occurs when the subsampled data has completely lost its original features, and should be avoided.
Low           Low    A special case: the subsampled result does not follow the same trend, but its values are close to the non-subsampled data. This happens when the non-subsampled result is close to the global mean of the subsampled data; it occurs rarely, since a low MSE is unlikely to coincide with a low correlation.
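The weighted correlation can be sketched as below for a set of subsampled candidate series measured against the non-subsampled reference (all series must have the same length after alignment). The normalization here is taken across the candidate set, and the small epsilon that keeps the division finite when the normalized MSE is zero is our own assumption; the thesis does not describe how that case is handled.

import numpy as np

def weighted_correlation(reference, candidates):
    """Weighted correlation score of each subsampled candidate series against the
    reference series, as defined in section 4.7.2."""
    reference = np.asarray(reference, dtype=float)
    corr = np.array([np.corrcoef(reference, np.asarray(c, dtype=float))[0, 1]
                     for c in candidates])
    mse = np.array([np.mean((reference - np.asarray(c, dtype=float)) ** 2)
                    for c in candidates])

    def rescale(v):                         # normalize to [0, 1] across the candidates
        span = np.ptp(v)
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span

    corr_n, mse_n = rescale(corr), rescale(mse)
    wc = corr_n / (mse_n + 1e-12)           # epsilon keeps the division finite
    wc[mse == 0] = 1.0                      # exception from the text: zero MSE scores 1
    return wc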
Figure 36: test signals (Non-Sub and Sub1-Sub7) used to illustrate the correlation/MSE criteria.
Figure 37: normalized weighted correlation of Sub1-Sub7 against the non-subsampled signal; the weighted correlation follows the global data trends.
Figure 38: shifted test signals (Non-Sub and Sub1-Sub8) used to illustrate the MSE criterion.
Figure 39: normalized weighted correlation of the shifted test; the two peaks (Sub1 and Sub5) correspond to the signals with the shortest shift distance from the non-subsampled data.

4.7.3 Sampling Rate

The weighted correlation for the dataset (mtln3-gfp-paxillin+hgf+egf_30) is given in Figure 40.

Figure 40: weighted correlation (correlation/MSE estimation, frames 0-120) of four test videos at different subsample rates:
         rate8         rate10        rate12        rate14        rate16
test1    1             0.86842984    0.743869814   0.003690154   0
test2    1             0.680856654   0.577925956   0.063385182   0
test3    0.939061381   1             0.478411354   0.508095131   0
test4    0.96650698    1             0.508286344   0.076730249   0

Figure 41 shows the average weighted correlation over the four videos. The rate number denotes the subsample rate: a rate8 curve is the weighted correlation measured at a subsample rate of 1-to-8, meaning one frame is extracted from every eight frames. As the curve shows, the weighted correlation decreases as the subsample rate decreases, meaning the bias increases as the time resolution is reduced; it also implicitly suggests a linear relation between the subsample rate and the weighted correlation. The optimal subsample rate is therefore always the highest one; the current data suggest that 1/8 to 1/12 subsampling is an optimal range.

Figure 41 also contains a 0-59 curve: the bias estimate for a video with half the length of the original (60 frames in total), on which the same subsampling experiment was performed. As can be seen, for the half-length video the bias increases significantly at high subsample rates, while the increase at low subsample rates is insignificant. Our conclusion is that when the time resolution drops below a threshold, the video measurements become significantly undersampled; such undersampling causes a large error in the reconstruction of the original data and therefore a large bias.

Figure 41: mean weighted correlation per subsample rate for the full-length (0-120) and half-length (0-59) videos:
         rate8         rate10        rate12        rate14        rate16
0-120    0.97639209    0.887321624   0.577123367   0.162975179   0
0-59     0.739105235   0.708234578   0.597895185   0.144162856   0.023665797

In this section we described our experiment for estimating the measurement bias caused by the time resolution, and we demonstrated why 6 minutes per frame is chosen as the maximum frame interval (i.e., the lowest acceptable time resolution) for MTLn3 cells. Although 6 minutes appears to be a clear threshold for unbiased measurement, this threshold is cell-line specific: if such a threshold is needed for another cell line, the same experiment must be performed again.

In chapter 4 we introduced several tracking algorithms and our implementation of the feature measurements, and we demonstrated how the time resolution affects the feature measurements (bias estimation). In the next chapter, we demonstrate a real application of our tracking analysis and high-throughput data analysis in an siRNA screening of MTLn3.

5 MTLn3 Cell Study

In this chapter, we first introduce and test the credibility of our high-throughput data analysis on real MTLn3 videos with three well-understood experiments.
Then, the same analysis is performed on video data from an experimental siRNA screening without prior knowledge of the corresponding reactions, i.e., as a double-blind test.

5.1 High-Throughput Data Analysis

The idea of high-throughput data analysis is to employ computational technologies to accelerate or fully automate the processing, quantification, and analysis of large amounts of high-information-content biomedical imagery. Modern image analysis systems augment an observer's ability to make measurements from a large or complex set of images by improving accuracy, objectivity, or speed; a well-developed analysis system may completely replace specialist observation. The study of cell migration with a tracking algorithm is classified as phenotypic research.

A straightforward method to study behavior change is the differential ratio, calculated as the relative increase or decrease of each feature between the control samples and the treated samples. The formula can be generalized as

\[ diff = \frac{f(treat) - f(ctrl)}{f(ctrl)} \]

where f(treat) is a feature measurement of the treated group and f(ctrl) the same feature measurement of the control group. However, the differential ratio relies on the global average of each feature and only shows the numeric increase or decrease between control and treated samples. Since the global average may not preserve the true trend, two additional computational techniques are employed in our research: feature selection and statistical testing. After a brief explanation, we demonstrate the implementation of these two techniques and the reason why both must be consulted.

5.1.1 Feature Selection

Feature selection is the technique of selecting a subset of relevant features for building robust learning models. Applied in the biological domain, the technique is also called discriminative gene selection, which detects influential genes based on DNA microarray experiments. We apply the same principle to cell behavior discrimination: we consider the behavior measurement that provides the largest variety between groups to be the significantly changed behavior (see Figure 42).

Figure 42: plGFP cells with (EGF+) and without (EGF-) stimulation plotted in the feature space spanned by mean velocity, motion linearity, and frequency of extension.

The most popular form of feature selection is stepwise regression, a greedy algorithm that adds the best feature (or deletes the worst feature) in each round. The main control issue is deciding when to stop the algorithm; in machine learning this is typically done by cross-validation, while in statistics some criterion is optimized, which leads to the inherent problem of nesting. More robust methods have been explored, such as branch and bound and piecewise linear networks.

Forward Selection
Feature selection is repeated in order to find the stable features. Forward searching with cross-validation may introduce random bias, since the training set is picked randomly by the cross-validation procedure; features may swap positions in the ranking even when they are truly influential. The order of the features found by feature selection carries no important information in our study; we therefore look for the features that appear most frequently in the feature selection results (a small sketch of this frequency counting is given below). For details of the forward selection algorithm, please refer to [19].
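The frequency-counting idea can be sketched as follows. The per-feature criterion used here is a simple univariate group-separation statistic, chosen for brevity; the thesis follows the cross-validated forward selection of [19], which scores candidate feature subsets jointly, so the code below only illustrates the repeated selection and counting, not the actual selection criterion.

import random
from collections import Counter

import numpy as np

def forward_select(X, y, n_features):
    """Greedy selection: repeatedly add the single feature that best separates the
    two groups (y in {0, 1}) according to a t-like separability statistic."""
    remaining = list(range(X.shape[1]))
    chosen = []
    for _ in range(n_features):
        def separability(j):
            a, b = X[y == 0, j], X[y == 1, j]
            pooled = np.sqrt(a.var() / a.size + b.var() / b.size) + 1e-12
            return abs(a.mean() - b.mean()) / pooled
        best = max(remaining, key=separability)
        chosen.append(best)
        remaining.remove(best)
    return chosen

def stable_features(X, y, n_runs=50, n_features=3, subsample=0.8):
    """Count how often each feature is selected over repeated runs on random
    subsamples; frequently selected features are taken as the stable ones."""
    counts = Counter()
    n = X.shape[0]
    for _ in range(n_runs):
        idx = np.array(random.sample(range(n), int(subsample * n)))
        counts.update(forward_select(X[idx], y[idx], n_features))
    return counts.most_common()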
Backward Selection
Backward selection is more stable than forward selection. The idea of backward selection is to remove the feature that has the least impact on the criterion, meaning, in our case, the feature that provides no significant contribution to the variety. Unlike forward selection, which intends to find the combination of the most influential features, backward selection intends to remove redundant features; this is also why it is much more stable. For details of the backward selection algorithm, please refer to [19].

5.1.2 Statistical Hypothesis Test

Statistical hypothesis testing, or significance testing, is a method of making statistical decisions from and about experimental data. Null-hypothesis testing answers the question of "how well the findings fit the possibility that chance factors alone might be responsible" [21]. This is done by asking and answering a hypothetical question. One use is deciding whether experimental results contain enough information to cast doubt on conventional wisdom. Two popular approaches are the Z-test and the T-test.

Z-Test
The Z-test is a statistical inference test that determines whether the difference between a sample mean and the population mean is large enough to be statistically significant, that is, unlikely to have occurred by chance. The Z-test is used primarily with standardized testing, to determine whether the scores of a particular sample of test takers fall within or outside the standard performance. The Z-test requires the following to be known:
• σ (the standard deviation of the population)
• μ (the mean of the population)
• x̄ (the mean of the sample)
• n (the size of the sample)
In practice, knowing the true σ of a population is unrealistic except in cases such as standardized testing, where the entire population is measured. When it is impossible to measure every member of a population, it is more realistic to use a t-test, which uses the standard error obtained from the sample together with the t-distribution.

T-Test
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is true. It is applied when the sample sizes are small enough that the normality assumption underlying the z-test would lead to incorrect inference. Among the most frequently used t-tests are:
• A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation, and number of data points, a t-test can determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. There are different versions of the t-test depending on whether the two samples are
  o independent of each other (e.g., individuals randomly assigned into two groups), or
  o paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention, or IQ test scores of a husband and wife).
  o If the calculated p-value is below the threshold chosen for statistical significance (usually the 0.05 level, or 0.005 for a stricter criterion), then the null hypothesis, which usually states that the two groups do not differ, is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.
• A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.
• A test of whether the slope of a regression line differs significantly from zero.

Once a t-value is determined, a p-value can be found using a table of values from Student's t-distribution; we choose a p-value threshold of 0.05 because of the existing measurement bias. The t-value is calculated as

\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]

where x̄ is the sample mean, μ is the population mean, s is the sample standard deviation, and n is the sample size.

In our experiments we choose the T-test instead of the Z-test, since we cannot normalize the data and we have no knowledge of the true standard deviation of the population, both of which are critical for the Z-test; this leaves the T-test as the reasonable alternative for this problem. By studying both the right-tailed and the left-tailed T-test, we are able to classify each feature as significantly increasing or significantly decreasing. Appendix V (Table 26) gives a sample of the visualized T-test results: a red cell indicates a significantly increasing feature and a yellow cell a significantly decreasing feature. We notice that, instead of only showing significant change (two-tailed T-test), combining the right-tailed and left-tailed results gives a more meaningful translation.

5.1.3 High-Throughput Data Analysis in Cell Migration Study

Both feature selection and the statistical T-test are consulted, since we look for both high variety and a significant difference between two datasets. To use both techniques as criteria, we introduce a weighted average. For a duplicated experiment with control datasets c of size m and treated datasets t of size n, the decisive score of feature x is

\[ score(x) = \left( \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} ttest(c_i(x), t_j(x))}{m \cdot n} \right) \left( \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} fsel(c_i(x), t_j(x))}{m \cdot n} \right) \]

Here 'ttest' and 'fsel' correspond to the T-test and the feature selection, each returning either one or zero; the score system is thus converted into a weighted percentage of the occurrence of the feature:

• T-Test: one means the null hypothesis is rejected and there is a significant difference between the two datasets; zero means the null hypothesis is not rejected and there is no significant difference.
• Feature Selection: one means the feature is informative and provides high variety between the two datasets; zero means the feature is redundant and provides insufficient variety.
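The decisive score of a single feature can be computed as below. This is a sketch under our own data layout (dictionaries of summarized per-cell values per replicate, and precomputed feature-selection results), not the thesis implementation.

from scipy import stats

def decisive_score(control_runs, treated_runs, feature, selected_sets, alpha=0.05):
    """Weighted occurrence score for one feature, combining the T-test and
    feature-selection indicators over all control/treated dataset pairs.

    control_runs, treated_runs: lists of dicts mapping feature name -> list of
        per-cell summarized values for one replicate dataset
    selected_sets: selected_sets[i][j] is the feature subset returned by feature
        selection for the pair (control i, treated j)
    """
    ttest_hits, fsel_hits, pairs = 0, 0, 0
    for i, ctrl in enumerate(control_runs):
        for j, treat in enumerate(treated_runs):
            pairs += 1
            _, p = stats.ttest_ind(ctrl[feature], treat[feature])
            ttest_hits += int(p < alpha)                       # 'ttest' indicator
            fsel_hits += int(feature in selected_sets[i][j])   # 'fsel' indicator
    return (ttest_hits / pairs) * (fsel_hits / pairs)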
The consideration of both criteria may appear redundant, since it is commonly assumed that high variety is identical to a significant difference. This is true in experiments such as gene profiling; however, the biological translation of these two criteria in a cell migration study differs both practically and theoretically from gene profiling, and employing both criteria provides an advantage in our study. The main reason is that they capture two types of behavior change: a type-one change indicates a significant value shift (see Figure 43) and a type-two change indicates a significant change in distribution (see Figure 44).

Figure 43: a type-one change, y1 = rand(200,1) versus y2 = rand(200,1) + 0.5 (value shift).
Figure 44: a type-two change, y1 = rand(200,1) versus y2 = 0.5·rand(200,1) + 0.25 (distribution change).

Figure 45 shows a real type-one change: the elongation of the cells is significantly reduced by the siRNA4 downregulation (see the experiment in section 5.3). Figure 46 shows a real type-two change: the size of the cells is spread much more widely in the treated group, meaning the variation in size is much larger there.

Figure 45: mean elongation of control versus siRNA4-treated cells (type-one change).
Figure 46: mean area of control versus siRNA4-treated cells (type-two change).

The combination of feature selection and hypothesis testing provides an accurate and reasonable biological translation of the behavior measurements. In the next two sections, 5.2 and 5.3, two quantitative cell migration experiments using our tracking analysis are presented.

5.2 MTLn3 Cell Controlled Experiment

The KDE tracking algorithm has been validated by the artificial object test; in this section its tracking ability is tested in a real experiment with well-documented cell reactions. Using a well-designed MTLn3 siRNA screening, we intend to confirm the repeatability and reliability of phenotypic research based on tracking analysis.

The MTLn3 cells, a mammary adenocarcinoma primary tumor cell line, are used as the study subject of the siRNA screening by the Toxicology Section of the LACDR. Two cell lines derived from the original MTLn3 cells are used in our tracking analysis. The first, known as the plGFP cell line (see Figure 47), is a cell line in which both the cytoplasm and the nucleus respond to GFP. The second, known as the GFP-Paxillin cell line (see Figure 48), is a cell line in which only the cytoplasm responds to GFP.

Figure 47: plGFP cell line. Figure 48: GFP-Paxillin cell line.

The tracking analysis is applied to video data collected from both cell lines with several well-studied mutations. The tracking study of the MTLn3 cells is expected to confirm the following issues:
1. EGF (6) stimulation
2. S178A downregulation
3. EGFR (7) or ErbB1 upregulation

(6) Epidermal Growth Factor (EGF) is a growth factor that plays an important role in the regulation of cell growth, proliferation, and differentiation.
(7) The epidermal growth factor receptor (EGFR) is the cell-surface receptor for members of the epidermal growth factor family (EGF family) of extracellular protein ligands.

Five cell groups are defined:
• plGFP: no transfection; also used as the control group for GFP-EGFR.
• GFP-EGFR: also known as GFP-ErbB1, a siRNA transfection upregulating ErbB1.
• GFP-Paxillin: no mutation; also used as the control group for GFP-Paxillin S178A and GFP-Paxillin S178A EGFR.
• GFP-Paxillin S178A: the sequence S178A is downregulated.
• GFP-Paxillin S178A EGFR: the sequence S178A is downregulated and EGFR is additionally upregulated.

Experiment-1: EGF stimulation
In order to confirm the effect of EGF stimulation, all five groups of cells are treated with EGF stimulation. Higher cell motility and a more rapid response are expected under EGF stimulation.

Experiment-2: S178A downregulation
To confirm the correlation between EGFR and S178A, the GFP-Paxillin S178A and GFP-Paxillin S178A EGFR groups have S178A downregulated; the GFP-Paxillin S178A EGFR group has both the S178A downregulation and an EGFR upregulation.
Experiment-3: EGFR upregulation
To confirm the effect of EGFR upregulation, the GFP-EGFR group and the GFP-Paxillin S178A EGFR group are given an EGFR upregulation. It is believed that EGFR upregulation may result in uncontrollable cell division, which is a precursor of cancer.

5.2.1 Experiment Protocol

To reduce cost, the three experiments are combined into one plate for scanning; the well contents and locations are given in Figure 49 and Table 22. After the videos of the control condition have been taken, EGF stimulation (1.2 µl per 2 ml) is added to the same plate and videos are taken again. Two positions are imaged in each well, resulting in movies of 3 minutes per frame with a total length of 90 minutes for the control condition and 120 minutes for the EGF treatment. Since the stimulation effect of EGF stabilizes after 60-120 minutes, there is no significant benefit in recording longer than 120 minutes for the EGF stimulation.

Figure 49: plate layout. Table 22: well contents.

The videos are created at a size of 512×512 pixels and the tracking analysis is performed on all videos.

5.2.2 Experiment Conclusion

Based on the tracking analysis of the video data (see Table 23), we draw the following conclusions:
• Compared with the control samples, the EGF-treated samples show a strong increase in motility and in lamellipodium extension. The stimulation effect is weak for GFP EGFR and GP S178A. The significant decrease in velocity for GP S178A EGFR is caused by several fast-moving noise objects, so this measurement cannot be trusted (see Table 23).
• Comparing GFP-Paxillin WT and GFP-Paxillin S178A, the S178A downregulation (GFP-Paxillin S178A) strongly suppresses motility. The EGFR upregulation (GFP-Paxillin S178A EGFR) has a weak compensating effect on the S178A downregulation, which is evidence of an existing correlation between S178A and EGFR (see Table 24).
• Comparing plGFP and GFP-EGFR, the EGFR upregulation (GFP-EGFR) shows a strong increase in both motility and lamellipodium extension frequency; the effect on the lamellipodium extension itself, however, is insignificant (see Table 25).

Table 23: differential ratios, control vs. EGF-stimulated samples
Group           'area'    std extension   mean velocity   motionLinearity   freq area   freq extension
plGFP           -0.177    -0.128          0.459           0.046             0.040       0.632
GFP EGFR        -0.095    -0.064          0.107           0.012             0.108       0.337
GP WT           -0.031    -0.094          0.201           0.011             0.026       0.121
GP S178A EGFR   0.235     0.283           -0.106          -0.023            0.268       0.010
GP S178A        -0.003    0.086           0.188           0.045             0.100       -0.034

Table 24: differential ratio and T-test, GFP-Paxillin WT (control) vs. S178A-EGFR and S178A (partial)
Diff Ratio                 'area'   std extension   mean velocity   freq area   freq extension   mean extension
GFP Paxillin S178A EGFR    -0.420   -0.079          0.020           0.012       -0.324           -0.354
GFP Paxillin S178A         -0.295   -0.215          -0.296          0.295       0.135            -0.091
T-Test                     'area'   std extension   mean velocity   freq area   freq extension   mean extension
GFP Paxillin S178A EGFR    -1       0               0               0           -1               -1
GFP Paxillin S178A         -1       -1              -1              1           1                0

Table 25: differential ratio and T-test (at 0.05), plGFP (control) vs. GFP EGFR
              'area'   std extension   std elongation   mean velocity
Diff Ratio    -0.169   0.041           0.061            0.841
T-Test        -1       0               0                1

5.3 Large Volume siRNA Screening using MTLn3 plGFP

A large-volume siRNA screening, or siRNA transfection experiment, is an experiment designed to reveal the function of the targeted siRNA. Genetic microarrays and cell microarrays are the two common approaches used in high-throughput siRNA transfection experiments.
5.3 Large Volume siRNA Screening using MTLn3 plGFP

A large volume siRNA screening, or siRNA transfection experiment, is an experiment designed to reveal the function of a targeted siRNA. Genetic microarrays and cell microarrays are the two common approaches used in high-throughput siRNA transfection experiments.

The major difference between the genetic microarray and the cell microarray is that the genetic microarray aims to find genotypic correlations, while the cell microarray aims to find phenotypic differences. In current biological experiments it is difficult to conduct large volume phenotypic analysis because of the limitations of the analysis software. Phenotypic analysis is performed either by manual observation or by quantitative image analysis [24]. Although it is considered an automatic approach, quantitative image analysis is limited in motility estimation and morphology differentiation. In fact, quantitative image analysis is in principle similar to the genetic microarray, in that it only provides a relatively abstract idea of the measurements. In general, quantitative image analysis uses measurements averaged over all cells in each frame. We can avoid such an overall estimation by targeting each cell and looking directly at its motility and morphology over the entire video. This can now be done with our tracking analysis and large volume data analysis.

Although both the artificial object test and the real experiment give positive feedback on the KDE based tracking analysis, measurement bias is nearly unavoidable in cross-sample experiments. One major reason is the rarity of exact repetition in biological experiments: a slight change in the experimental protocol may result in different phenotypic cell behavior. To ensure the usability of the tracking analysis in a later siRNA screening, we designed a small-scale siRNA screening experiment that demonstrates the applicability of our tracking analysis to large volumes of video data. Table 26 lists the siRNA transfections (downregulations) chosen for this experiment. The choice is based on the current literature; most of the chosen siRNAs have a known influence on cell proliferation, differentiation and transformation. The experiment protocol is given in Appendix VI: Experiment Protocol of siRNA Screening.

Table 26
label         target           notes
GFPsiRNA      GFP
ctrlsiRNA#2   siControl #2
siRNA1        EGFR
siRNA2        LOC364403        SRC
siRNA3        PTK2             Protein tyrosine kinase 2, FAK
siRNA4        ILK
siRNA5        FOSL1
siRNA6        VIL2
siRNA7        RDX
siRNA8        LOC305679        VCL
paxsiRNA      PXN-PREDICTED    Paxillin

Based on Appendix V, we can directly observe the following changes:
• Downregulation of GFPsiRNA (GFP related) suppresses the speed at which the cell changes its shape. Lamellipodium extension is increased and the cells are larger than in the control group. Motion linearity is higher than in the control group.
• Downregulation of siRNA1 (EGFR related) suppresses the extension of the lamellipodium but increases the size of the cells compared with the control groups.
• Downregulation of siRNA2 (SRC related) suppresses the extension of the lamellipodium and decreases the speed at which the cell changes its shape. Cell motion is linear, but the motion velocity is lower than in the control group. The cells are larger than in the control group.
• Downregulation of siRNA3 (PTK2 related) suppresses the motion velocity and the motion is non-linear. The cell shapes are less smooth and consist of small lamellipodium extensions. The cells are larger than in the control group.
• Downregulation of siRNA4 (ILK related) suppresses the speed at which the cell changes its shape.
• Downregulation of siRNA5 (FOSL1 related) increases the extension of the lamellipodium, and a higher speed of shape change can be observed. The cell shapes are more irregular and larger than in the control group.
• Downregulation of siRNA6 (VIL2 related) suppresses the extension of the lamellipodium; there is a significant decrease in lamellipodium activity. The motion linearity of the cells is higher than in the control group.
• Downregulation of siRNA7 (Radixin related) shows no significant effect on the phenotypic behavior of the cells.
• Downregulation of siRNA8 (Vinculin related) suppresses the extension of the lamellipodium. The cell motion velocity is lower than in the control group.
• Downregulation of paxsiRNA (Paxillin related) suppresses the extension of the lamellipodium; there is a significant decrease in lamellipodium activity. The motion linearity is increased and the cells are larger than in the control group.

Based on the known literature, we can confirm most of our observations from the tracking analysis:
• Mutations involving EGFR (siRNA1) can lead to its constant activation, which can result in uncontrolled cell division. Consequently, mutations of EGFR have been identified in several types of cancer, and it is the target of an expanding class of anticancer therapies. [27]
• Mutations in the SRC protein (siRNA2) can be involved in the malignant progression of colon cancer. [28]
• Activation of PTK2 (siRNA3) may be an important early step in cell growth and in the intracellular signal transduction pathways triggered in response to certain neural peptides or to cell interactions with the extracellular matrix. [26]
• ILK (siRNA4) has been associated with multiple cellular functions including cell migration, cell proliferation, cell adhesion and signal transduction. [29]
• FOSL1 (siRNA5) has been implicated as a regulator of cell proliferation, differentiation and transformation. [30]
• VIL2 (siRNA6), as a member of the ERM protein family, serves as an intermediate between the plasma membrane and the actin cytoskeleton. It plays a key role in cell surface structure, adhesion, migration and organization. [33]
• Radixin (siRNA7) is a cytoskeletal protein that may be important in linking actin to the plasma membrane. It is highly similar in sequence to both ezrin and moesin. [32]
• Vinculin (siRNA8) is a membrane-cytoskeletal protein in focal adhesion plaques that is involved in the linkage of integrin adhesion molecules to the actin cytoskeleton. [31]

With both the visual and the literature confirmation from our small-scale siRNA screening, the tracking analysis reveals and identifies several important hypotheses. We also notice that, compared with plGFP, the ErbB1 cell line is consistently over-expressing and is therefore not a good study subject for further screening. The 6-minutes rule is indeed cell line dependent, and it is satisfied by the current siRNA screening. The tracking error is less than 20% on average. The total processing time is 12 hours for 48 movies with a length of 120 frames and 100 cells per movie on average. For the 520 movies of the final screening (140 selected siRNAs) this takes less than 6 days, whereas it would normally take at least one month to investigate the movies manually. The most important improvement is that we now obtain a numeric estimation of cellular phenotypic behavior.
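The throughput claim above can be checked with a quick back-of-the-envelope computation. The sketch below simply extrapolates the measured 12 hours for 48 movies to the 520 movies of the final screening, assuming the per-movie processing time scales linearly.

```python
# Back-of-the-envelope check of the screening throughput quoted above.
measured_movies, measured_hours = 48, 12.0
per_movie_hours = measured_hours / measured_movies        # 0.25 h = 15 min per movie

final_movies = 520
total_hours = final_movies * per_movie_hours              # 130 h
print(f"{per_movie_hours * 60:.0f} min/movie, {total_hours:.0f} h "
      f"= {total_hours / 24:.1f} days for {final_movies} movies")
# -> 15 min/movie, 130 h = 5.4 days, consistent with "less than 6 days".
```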
6 Conclusion

In this report four different tracking algorithms were considered and their tracking performance was compared. KDE tracking proved to be the least sensitive to shape changes, time resolution and connectivity errors. The error estimation of the tracking suggests that KDE tracking is indeed a robust and sophisticated algorithm. The feedback from the Toxicology Section, LACDR, shows that the duration of the post-experiment analysis is reduced from one month to one week. The accuracy of the data analysis is increased from 30% to 80%. Several important measurements that cannot be observed manually are captured by the tracking analysis.

During the study of the tracking algorithms we noticed that the time resolution plays a decisive role in tracking success and measurement bias. However, no universal rule has been found to determine the proper time resolution. The controlled experiment in section 4.7 shows that the time resolution is highly experiment dependent; different cell lines may require a re-estimation of the proper time resolution. In addition, the time resolution of an experiment involving stimulation must be estimated for the most excited culture instead of the control culture. Our current rule of thumb for MTLn3 cell lines is to use movies with a time interval of less than 6 minutes per frame.

Using the trajectories obtained from KDE Mean Shift tracking, morphology and motility measurements can be made. The current measurements have proven accurate in estimating cell migration. A small controlled experiment such as the one in section 5.2 shows that the current measurements provide an accurate estimation of treatments such as EGF stimulation or S178A downregulation. In addition, cell behavior can be translated directly from the numeric estimation, and significant behavior changes can easily be determined from the feature measurements.

Large volume data analysis has proven useful for siRNA screening. Feature selection and hypothesis testing are potential solutions for a fully automatic cell migration study on large volumes of video data. Section 5.3 shows that the combination of feature selection and hypothesis testing provides a rapid estimation of significant behavior changes.

The current version of our automatic cell migration analysis, which is under evaluation by the toxicology section of the LACDR, has shown great potential for double blind large volume siRNA screening. According to the feedback from the toxicology section, our automatic approach has greatly reduced the labor hours and the number of wrong conclusions. The processing time of the cell migration study in the siRNA screening is reduced from one month to one week, the accuracy of the conclusions is increased from 30% to 80%, and most procedures are performed automatically without human interference.

The tracking analysis of the MTLn3 cells gives us sufficient confidence to continue further research on molecular level tracking. The correlation between integrin-linked kinase and cell migration will be our study subject in the future.

7 Discussion

KDE Mean Shift tracking shows a reduced performance in the later focal adhesion study. We believe that KDE Mean Shift is limited to cellular level tracking; for molecular level tracking a new algorithm must be implemented. A study of the failure samples of each tracking algorithm shows that the following factors are the potential causes of most tracking and measurement errors.

High Similarity of Cells
Most tracking algorithms rely on a certain level of dissimilarity between non-corresponding objects and similarity between corresponding objects. However, this principle is not guaranteed in biological imaging, especially in microscopy imaging. The size, shape and intensity of cells are not unique, and the chance of encountering two nearly identical cells becomes very high when noise suppression techniques are used. Why KDE mean shift tracking survives this is not entirely clear, but it does provide a certain level of tracking accuracy compared with the other existing algorithms.
Bias due to Time Resolution
Sampling problems are common in time-based data, especially in time-lapse data. Since undersampling is practically unavoidable because of the microscopy mechanism, we can only reduce the bias caused by an improper time resolution. Our research shows that the choice of time resolution is highly cell and experiment dependent. There is no universal, automatic solution for choosing the time resolution; specialist intervention is required to find a proper time resolution.

Stochastic Structure Change
The stochastic structure change problem is a combination of the time resolution and the cell activity. Cell migration results in strong deformation of the shape and of the intensity distribution; together with a low time resolution, this significantly increases the dissimilarity between corresponding cells and can force the tracking algorithm to settle for a less desirable alignment. A high time resolution is the proper solution to this problem.

Segmentation Limitation
Illumination is the major problem in microscopy imaging. Uneven illumination and inconsistent object intensity are the two issues that have to be corrected before tracking. Uneven illumination can be corrected by background subtraction, but the inconsistent intensity of objects is not easy to correct. Local adaptive thresholding gives a significant improvement for this problem; still, a certain level of operator guidance is required, such as choosing the neighborhood size. A fully automatic approach can only be applied to videos that satisfy the following conditions:
1. flat field illumination of the microscope;
2. the cells have a consistent intensity distribution;
3. the cells are maximally separated from each other.

Limitation of the Sampling Percentage in Each Well
A typical well contains over 50,000 cells, while only 100–150 cells can be observed by the microscope, so the sampling percentage per well is relatively low. In an siRNA transfection experiment only 60–70% of the cells are transfected, and of those only 60–70% are transfected correctly. This means that out of 50,000 cells at least 15,000 cells are not transfected at all, about 24,500 cells are correctly transfected, and about 10,500 cells are incorrectly transfected. Based on these numbers, the chance that the observed cells (150 cells) fall into any one of these groups is still considerable. A simple remedy is to increase the number of observed cells by reducing the magnification. However, the magnification is limited by the basic requirement of phenotypic research that the cell structure must still be observable, which means a minimum magnification of around 20×. This limitation can be a potential problem for the post-experiment analysis.
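To make the sampling argument concrete, the sketch below computes the expected composition of a field of 150 observed cells under the transfection rates quoted above, together with the binomial spread around it. It is a minimal sketch of the arithmetic only; the 70% figures are taken from the upper end of the quoted ranges.

```python
# Expected composition of a 150-cell field of view under the quoted
# transfection rates (upper end of the 60-70% ranges).
from scipy import stats

p_transfected = 0.70      # fraction of cells that take up the siRNA
p_correct = 0.70          # fraction of transfected cells that are transfected correctly

p_groups = {
    "not transfected":         1 - p_transfected,                  # 0.30
    "correctly transfected":   p_transfected * p_correct,          # 0.49
    "incorrectly transfected": p_transfected * (1 - p_correct),    # 0.21
}

observed = 150
for name, p in p_groups.items():
    expected = observed * p
    sd = stats.binom.std(observed, p)   # spread of the count in one field of view
    print(f"{name:>24}: {expected:5.1f} +/- {sd:.1f} of {observed} observed cells")
```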
Considering all the existing problems, the tracking analysis does provide important morphology and motility behavior measurements. The current implementation of KDE mean shift tracking provides valuable information on the cell migration of MTLn3.

Several problems are only partially addressed in our research. These problems will be our further research directions.

Tracking Analysis for Focal Adhesion Kinase
Despite its outstanding performance in cell tracking, KDE Mean Shift initially fails in focal adhesion kinase tracking. The major reason is the relatively small size of the adhesions and their continual appearing and disappearing. However, tracking focal adhesion kinase has several advantages that cells do not offer: most focal adhesion motion is nearly linear, in contrast to the random motion in cell migration, and although an adhesion changes its shape, its motion tends to stay in the same direction. Based on these two principles, we will implement particle filter tracking as a replacement for KDE Mean Shift in the study of focal adhesion kinase.

Correlation between Phenotype and Focal Adhesion
Cytomics research will be the future of cell biology. Different from genomics and proteomics, cytomics is the study of cell systems at the single cell level. It combines all available bioinformatic knowledge in an attempt to understand the molecular architecture and functionality of the cell system. Much of this is achieved with molecular and microscopic techniques that allow the various components of a cell to be visualized as they interact in vivo. By combining the current phenotype results with our future results on focal adhesion kinase, we will be able to find a systematic way to reveal the foundation of cytomics.

Multi-Descendant Tracking
One important cell activity is cell differentiation. It strongly violates the basic principle of nearly all tracking algorithms, since most tracking algorithms assume that there is only one true alignment. A multi-descendant tracking algorithm would provide important information on the cell differentiation rate.

Cell Activity Modeling
Although it is widely accepted, the amoeboid motion model provides no correlation between cytomorphology and cellular motility. Yet it does suggest an implicit coupling between lamellipodium extension and cellular motility. We believe that such a correlation does exist but is more complicated than the amoeboid motion model, and that it may be mathematically modeled by a periodic function.

Texture Based Cell Segmentation
Texture based segmentation is widely used in computer vision and has several advantages in determining homogeneous regions. Compared with region based, edge based and region growth segmentation, texture based segmentation shows potential for the cell segmentation problem. However, how to apply texture based segmentation to the cell segmentation problem remains unclear; obtaining the initial model is a decisive factor in texture based cell segmentation.

Appendix I: Workflow of Video Data Analysis

• Input: biological experiments and time-lapse cell migration video
• Pre-processing: noise suppression (mean filter, Gaussian filter), background subtraction and flat field correction
• Image segmentation: region based segmentation, edge based segmentation or region growth segmentation
• Mathematical binary morphology: erosion, dilation
• Object labeling
• Cell tracking: KDE Mean Shift tracking or overlap ratio tracking, validated by the artificial object test and the quantitative cell video test
• Feature measurement: raw feature measurements, generalized feature measurements
• Pattern recognition and statistics: feature selection, clustering, statistical tests (T-test), discriminant and significance analysis
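The pre-processing part of the workflow in Appendix I (noise suppression, background subtraction, segmentation, binary morphology and object labeling) can be sketched in a few lines of Python with scipy.ndimage. This is only an illustrative stand-in: the thesis implementation uses the isodata and local adaptive segmentation of chapter 3, whereas the percentile threshold below is a hypothetical placeholder.

```python
# A minimal, self-contained sketch of the pre-processing stages of the
# Appendix I workflow, using only scipy.ndimage.
import numpy as np
from scipy import ndimage

def label_cells(frame: np.ndarray) -> np.ndarray:
    smoothed = ndimage.gaussian_filter(frame, sigma=2)        # noise suppression
    background = ndimage.gaussian_filter(frame, sigma=50)     # coarse illumination estimate
    corrected = smoothed - background                         # background subtraction
    mask = corrected > np.percentile(corrected, 90)           # stand-in global threshold
    mask = ndimage.binary_opening(mask, iterations=2)         # erosion followed by dilation
    labels, _ = ndimage.label(mask)                           # region based labeling
    return labels

# Hypothetical usage on one synthetic 512x512 frame.
frame = np.random.rand(512, 512)
labels = label_cells(frame)
print(labels.max(), "objects found")
```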
Appendix II: Index List of Feature Measurements

Index  Feature Name             Feature Description
1      'area'                   The average size of the cell in pixels
2      'perimeter'              The average perimeter of the cell in pixels
3      'std extension'          How significantly the cell changes the extension of its lamellipodium
4      'std dispersion'         How significantly the cell changes its small structures (cytoskeleton or lamellipodium)
5      'std elongation'         How significantly the cell changes the elongation of its lamellipodium
6      'mean velocity'          How fast the cell moves, in pixels
7      'mean motionLinearity'   How consistent the motion is; a straightforward, U-turn or circular motion is also considered linear
8      'std compact factor'     How significantly the cell changes its shape compared with a circle
9      'freq area'              How fast the cell changes its size
10     'freq perimeter'         How fast the cell changes its perimeter
11     'freq extension'         How fast the cell changes the extension of its lamellipodium
12     'freq dispersion'        How fast the cell changes its small structures
13     'freq elongation'        How fast the cell changes the elongation of its lamellipodium
14     'mean extension'         How extended the cell is on average
15     'mean dispersion'        How smooth the cell is on average
16     'mean elongation'        How elongated the cell is on average

For example, 'std elongation' is the standard deviation of the elongation over the frames of one cell. For a whole experiment, the mean of 'std elongation' over all cells is a representative estimate for the entire culture.

Features 1 and 2 describe the average size of the cells; upregulation of a target such as EGFR may result in an increase of these two features compared with the control groups. Features 3, 4 and 5 describe how significantly the cell changes its shape; the S178A downregulation is an example of a significant decrease in shape change compared with the control group. Features 9, 10, 11, 12 and 13 describe the activity of the cytomorphology changes; after S178A downregulation the cells have a less extended but quickly changing lamellipodium, which results in low values for features 3, 4 and 5 but high values for features 9, 10, 11, 12 and 13.
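Several of the measurements above are simple statistics over one cell's trajectory. The sketch below computes a few of them (average area, std elongation, mean velocity and a 'freq'-style rate of change for the area) from hypothetical per-frame values; the exact definitions used in the thesis, for example how the 'freq' features normalize the frame-to-frame change, may differ.

```python
# Minimal per-cell feature sketch: one trajectory = per-frame centroids,
# areas and elongations of a single tracked cell. The formulas are
# illustrative assumptions, not the thesis' exact definitions.
import numpy as np

def cell_features(centroids, areas, elongations):
    centroids = np.asarray(centroids, float)
    areas = np.asarray(areas, float)
    steps = np.linalg.norm(np.diff(centroids, axis=0), axis=1)   # per-frame displacement
    return {
        "area": float(np.mean(areas)),                                    # feature 1
        "std elongation": float(np.std(elongations)),                     # feature 5
        "mean velocity": float(np.mean(steps)),                           # feature 6, pixels/frame
        "freq area": float(np.mean(np.abs(np.diff(areas)) / areas[:-1])), # feature 9 (assumed form)
    }

# Hypothetical 5-frame trajectory of one cell.
print(cell_features(
    centroids=[(10, 10), (12, 11), (15, 13), (18, 14), (22, 15)],
    areas=[600, 590, 610, 605, 595],
    elongations=[0.40, 0.42, 0.38, 0.45, 0.41],
))
```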
Appendix III: Artificial Object Test Data Sheet

III-1 Artificial Object Test and Perturbation Experiment

KDE Mean Shift Tracking (minH:20 maxH:40 minW:30 maxW:60 H:30 W:45 Seq:30)
minT  maxT  cells  remote  separated  Correct%
5     10    5      FALSE   FALSE      78.601
5     10    5      TRUE    FALSE      99.407
10    20    5      FALSE   FALSE      79.5527
10    20    5      TRUE    FALSE      96.001
5     10    20     FALSE   FALSE      73.78
5     10    20     TRUE    FALSE      86.518
10    20    20     FALSE   FALSE      64.933
10    20    20     TRUE    FALSE      74.403
5     10    5      FALSE   TRUE       89.4
5     10    5      TRUE    TRUE       100
10    20    5      FALSE   TRUE       86.255
10    20    5      TRUE    TRUE       99.266
5     10    20     FALSE   TRUE       74.084
5     10    20     TRUE    TRUE       97.057
10    20    20     FALSE   TRUE       68.81
10    20    20     TRUE    TRUE       89.474

Overlap Ratio Tracking (minH:20 maxH:40 minW:30 maxW:60 H:30 W:45 Seq:30)
minT  maxT  cells  remote  separated  Correct%
5     10    5      FALSE   FALSE      76.687
5     10    5      TRUE    FALSE      87.353
10    20    5      FALSE   FALSE      77.892
10    20    5      TRUE    FALSE      83.642
5     10    20     FALSE   FALSE      61.08
5     10    20     TRUE    FALSE      81.069
10    20    20     FALSE   FALSE      51.766
10    20    20     TRUE    FALSE      57.995
5     10    5      FALSE   TRUE       79.097
5     10    5      TRUE    TRUE       77.914
10    20    5      FALSE   TRUE       89.179
10    20    5      TRUE    TRUE       80.653
5     10    20     FALSE   TRUE       72.226
5     10    20     TRUE    TRUE       90.433
10    20    20     FALSE   TRUE       63.76
10    20    20     TRUE    TRUE       85.39

Appendix IV: MTLn3 Cell Migration Data Sheet

Table 27 Full Data Sheet (Video Data 27-Feb-2008)
Columns: 'area', 'perimeter', 'extension', 'dispersion', 'elongation', 'velocity', 'motionLinearity', 'compact factor', freq area, freq perimeter, freq extension, freq dispersion, freq elongation

plGFP ctrl           589.17  112.54  0.14  0.01  0.14  0.01  0.89  0.14  0.21  0.21  0.16  0.27  0.16
plGFP egf            484.68   98.76  0.13  0.01  0.12  0.01  0.93  0.10  0.22  0.26  0.26  0.32  0.26
GFP EGFR ctrl        494.01  104.41  0.17  0.01  0.17  0.02  0.91  0.18  0.21  0.24  0.22  0.27  0.22
GFP EGFR egf         447.32   96.19  0.16  0.01  0.15  0.02  0.92  0.16  0.23  0.25  0.29  0.29  0.30
GP WT ctrl           812.92  132.30  0.13  0.03  0.12  0.01  0.85  0.17  0.23  0.25  0.25  0.26  0.23
GP WT egf            787.59  128.86  0.12  0.02  0.10  0.02  0.86  0.14  0.22  0.26  0.29  0.31  0.29
GP S178A EGFR ctrl   547.51  101.64  0.08  0.01  0.08  0.01  0.79  0.10  0.29  0.30  0.28  0.31  0.29
GP S178A EGFR egf    676.02  119.72  0.11  0.02  0.10  0.01  0.77  0.14  0.21  0.22  0.28  0.26  0.27
GP S178A ctrl        573.91  107.98  0.10  0.01  0.10  0.01  0.75  0.09  0.27  0.29  0.28  0.31  0.26
GP S178A egf         572.38  109.38  0.11  0.02  0.10  0.01  0.78  0.14  0.24  0.26  0.27  0.31  0.30

Appendix V: Large Volume siRNA Screening Results

Table 28 plGFP siRNA Screening
Columns: 'area', 'perimeter', std extension, std dispersion, std elongation, mean velocity, mean motionLinearity, std compact factor, freq area, freq perimeter, freq extension, freq dispersion, freq elongation, mean extension, mean dispersion, mean elongation

plGFP CTRL    407.92  86.43  0.231  0.025  0.221  0.036  0.830  0.217  0.463  0.473  0.497  0.497  0.497  0.383  0.018  0.365
GFPsiRNA      463.00  93.42  0.232  0.025  0.223  0.038  0.857  0.227  0.423  0.429  0.457  0.464  0.458  0.422  0.020  0.402
ctrlsiRNA#2   428.24  88.77  0.211  0.019  0.204  0.033  0.840  0.194  0.421  0.435  0.456  0.470  0.456  0.377  0.016  0.361
siRNA1        471.25  92.98  0.206  0.022  0.197  0.027  0.822  0.191  0.482  0.485  0.485  0.499  0.486  0.365  0.017  0.348
siRNA2        450.38  91.01  0.204  0.019  0.196  0.030  0.849  0.185  0.434  0.453  0.451  0.484  0.453  0.375  0.015  0.360
siRNA3        449.49  92.11  0.222  0.026  0.213  0.026  0.816  0.223  0.465  0.476  0.495  0.494  0.498  0.390  0.020  0.370
paxsiRNA      460.34  92.08  0.206  0.019  0.198  0.030  0.847  0.185  0.422  0.432  0.460  0.477  0.465  0.394  0.015  0.379
siRNA8        425.69  88.58  0.218  0.022  0.208  0.031  0.814  0.195  0.478  0.485  0.501  0.515  0.502  0.373  0.017  0.357
siRNA7        426.63  88.96  0.219  0.026  0.210  0.031  0.805  0.219  0.473  0.489  0.503  0.519  0.508  0.395  0.019  0.376
siRNA6        446.33  91.17  0.206  0.019  0.198  0.034  0.849  0.193  0.409  0.408  0.417  0.425  0.420  0.385  0.016  0.369
siRNA5        489.85  96.53  0.228  0.027  0.218  0.032  0.814  0.230  0.458  0.457  0.473  0.485  0.477  0.414  0.023  0.391
siRNA4        425.56  90.24  0.243  0.029  0.233  0.036  0.833  0.250  0.421  0.426  0.439  0.465  0.439  0.431  0.022  0.409

Area and perimeter are measured in pixels instead of nm.

Table 29 plGFP Differential Ratio
Columns: same as table 28

GFPsiRNA      0.135  0.081   0.005   0.000   0.008   0.055   0.033   0.045  -0.086  -0.093  -0.080  -0.066  -0.080   0.103   0.116   0.103
ctrlsiRNA#2   0.050  0.027  -0.084  -0.231  -0.075  -0.078   0.012  -0.107  -0.090  -0.080  -0.083  -0.053  -0.083  -0.014  -0.115  -0.009
siRNA1        0.155  0.076  -0.107  -0.122  -0.107  -0.260  -0.009  -0.122   0.042   0.026  -0.024   0.004  -0.022  -0.046  -0.060  -0.046
siRNA2        0.104  0.053  -0.117  -0.226  -0.114  -0.161   0.023  -0.151  -0.063  -0.041  -0.092  -0.025  -0.090  -0.021  -0.162  -0.013
siRNA3        0.102  0.066  -0.036   0.029  -0.034  -0.280  -0.016   0.026   0.004   0.006  -0.004  -0.006   0.002   0.020   0.122   0.015
paxsiRNA      0.129  0.065  -0.107  -0.234  -0.101  -0.158   0.021  -0.150  -0.088  -0.085  -0.074  -0.041  -0.065   0.029  -0.151   0.038
siRNA8        0.044  0.025  -0.057  -0.105  -0.058  -0.137  -0.019  -0.101   0.033   0.026   0.008   0.036   0.008  -0.024  -0.083  -0.021
siRNA7        0.046  0.029  -0.050   0.050  -0.049  -0.149  -0.030   0.010   0.021   0.036   0.013   0.044   0.022   0.032   0.044   0.032
siRNA6        0.094  0.055  -0.105  -0.234  -0.103  -0.043   0.023  -0.113  -0.116  -0.137  -0.160  -0.145  -0.156   0.005  -0.135   0.012
siRNA5        0.201  0.117  -0.013   0.100  -0.013  -0.100  -0.018   0.058  -0.011  -0.033  -0.049  -0.025  -0.041   0.082   0.296   0.071
siRNA4        0.043  0.044   0.052   0.150   0.053  -0.010   0.004   0.150  -0.090  -0.099  -0.116  -0.064  -0.118   0.127   0.213   0.123

In the original (colored) sheet, red labeling indicates an increasing ratio and yellow labeling indicates a decreasing ratio.
Table 30 plGFP T-Test at 0.005
Columns: same as table 28; +1 denotes a right-tail rejection, -1 a left-tail rejection, 0 that the null hypothesis cannot be rejected.

GFPsiRNA       1   1   0   0   0   0   1   0  -1  -1  -1  -1  -1   1   0   1
ctrlsiRNA#2    0   0  -1  -1  -1   0   0  -1  -1  -1  -1   0  -1   0   0   0
siRNA1         1   1  -1   0  -1  -1   0   0   0   0   0   0   0   0   0   0
siRNA2         1   1  -1  -1  -1  -1   1  -1  -1   0  -1   0  -1   0  -1   0
siRNA3         1   1   0   0   0  -1  -1   0   0   0   0   0   0   0   1   0
paxsiRNA       1   1  -1  -1  -1   0   1  -1  -1  -1  -1  -1  -1   0   0   0
siRNA8         0   0  -1  -1  -1  -1  -1  -1   0   0   0   0   0   0   0   0
siRNA7         0   0   0   0   0   0   0   0  -1   0   0   0   0   0   0   0
siRNA6         0   1  -1  -1  -1   0   1  -1  -1  -1  -1  -1  -1   0   0   0
siRNA5         1   1   0   0   0   0   0   0   0   0   0  -1   0   0   1   1
siRNA4         1   0   0   0   0   0   0   0  -1  -1  -1   0  -1   0   0   0

Null hypothesis: there is no significant change in the feature measurement compared with the control group. In the original (colored) sheet, red labeling indicates a right-tail rejection, yellow labeling indicates a left-tail rejection and green labeling indicates that the null hypothesis cannot be rejected.

Appendix VI: Experiment Protocol of siRNA Screening

30-Mar-2008 — plate cells in a 48-well plate
  plGFP: 12 wells at 10,000 cells per well
  plGFP: 12 wells at 20,000 cells per well
  Erb1: 12 wells at 10,000 cells per well
  Erb1: 12 wells at 20,000 cells per well

31-Mar-2008 — transfection in the 48-well plate, 100 nM / Dharmafect 2
  plGFP-20000 and Erb1-20000, each with: ctrl, GFPsiRNA, ctrlsiRNA2, siRNA1, siRNA2, siRNA3, siRNA4, siRNA5, siRNA6, siRNA7, siRNA8, paxsiRNA

1-Apr-2008 — cell passage into a collagen-coated 24-well glass bottom plate
  Coat the glass bottom plate with 30 ug/ml collagen.
  Trypsinize the cells with 50 ul trypsin-EDTA.
  Add to the glass bottom plate, with 500 ul complete MEMα medium in each well:
  ctrl, GFPsiRNA, ctrlsiRNA2, siRNA1, siRNA2, siRNA3, siRNA4, siRNA5, siRNA6, siRNA7, siRNA8, paxsiRNA (once for each of the two cell lines)

4-Apr-2008 — live cell imaging
  8h  starvation
  10h I (-EGF), starvation II
  11h I (+EGF)
  12h II (-EGF), starvation III
  13h II (+EGF)
  14h III (-EGF), starvation IV
  15h III (+EGF)
  16h IV (-EGF)
  17h IV (+EGF)
  18h ready
  Two positions are observed for each well.
  EGF stimulation: 1.2 ul per 2 ml; EGF stock solution 10 ul in 1.250 ml plain ("kaal") medium.

Reference
[1] Changjiang Yang, Ramani Duraiswami, Daniel DeMenthon, Larry S. Davis: Mean-shift analysis using quasi-Newton methods, Perceptual Interfaces & Reality Laboratory, University of Maryland, College Park, MD 20742, United States of America.
[2] Fernando de la Torre, Shaogang Gong, Stephen McKenna: View-Based Adaptive Affine Tracking, Department of Signal Theory, Universitat Ramon Llull, Spain; Department of Computer Science, Queen Mary and Westfield College, England; Department of Applied Computing, University of Dundee, Scotland.
[3] Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins: Digital Image Processing using MATLAB, Pearson Prentice Hall, ISBN 0-13-008519-7.
[4] Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter: Molecular Biology of The Cell, Garland Science Publishing, New York, ISBN 0-8153-4072-9, United States of America.
[5] Ron Shamir: Algorithms in Molecular Biology, Pairwise alignment, lecture notes, Tel Aviv University School of Computer Science, Fall 2001-2002, Israel.
[6] Kai She, George Bebis, Haisong Gu, Ronald Miller: Vehicle Tracking Using On-Line Fusion of Color and Shape Features, Computer Vision Laboratory, University of Nevada, Reno, NV; Vehicle Design R & A Department, Ford Motor Company, Dearborn, MI, United States of America.
[7] Gunnar Farnebäck: Fast and Accurate Motion Estimation using Orientation Tensors and Parametric Motion Models, Computer Vision Laboratory, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden.
[8] Qi Zang, Reinhard Klette: Object Classification and Tracking in Video Surveillance, Lecture Notes in Computer Science, CITR, Computer Science Department, The University of Auckland, Tamaki Campus, Auckland, New Zealand, Springer Berlin / Heidelberg, ISBN 978-3-540-40730-0.
[9] Mei Han, Wei Xu, Hai Tao, Yihong Gong: An Algorithm for Multiple Object Trajectory Tracking, NEC Laboratories America, Cupertino, CA, USA; University of California at Santa Cruz, Santa Cruz, CA, USA.
[10] Kota Miura: Cell Biology and Biophysics Programme, EMBL, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Advances in Biochemical Engineering/Biotechnology, Springer Berlin / Heidelberg, ISBN 978-3-540-23698-6.
[11] Reyes-Aldasoro CC, Akerman S, Tozer GM: Measuring the velocity of fluorescently labeled red blood cells with a keyhole tracking algorithm, Cancer Research UK Tumour Microcirculation Group, Academic Unit of Surgical Oncology, The University of Sheffield, School of Medicine & Biomedical Sciences, Beech Hill Road, Sheffield S10 2RX, UK, February 2008.
[12] http://en.wikipedia.org/wiki/Main_Page — the free encyclopedia.
[13] http://www.datastructures.info/the-backtracking-algorithmmethod/ — a sample of backtracking.
[14] J.M. Fuertes Garcia, M. Lucena Lopez, N. Perez de la Blanca, J. Fdez-Valdivia: Region Growth Based on Color Gradients, Dept. de Informatica, Escuela Politecnica Superior, Universidad de Jaen, Spain; Dept. de Ciencias de la Computacion e I.A., ETSI Informatica, Universidad de Granada, Spain.
[15] Raziel Álvarez, Erik Millán, Ricardo Swain-Oropeza: Multilevel Seed Region Growth Segmentation, Mechatronics Research Center (CIMe), Tecnológico de Monterrey, Campus Estado de México, Carretera al Lago de Guadalupe Km 3.5, Atizapán de Zaragoza, Estado de México, 52926, Mexico, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, 2005, ISBN 978-3-540-29896-0.
[16] Dr. Ir. Fons Verbeek, Dr. Nies Huijsmans: Lecture Notes for Image Analysis in Microscopy, Leiden University, LIACS, The Netherlands.
[17] Virginia Fonte, Brian Hiester, John Yerg, Jmil Ferguson, Susan Csontos, Michael A. Silverman, Gretchen H. Stein, Christopher D. Link: Conversion of GFP into a toxic, aggregation-prone protein by C-terminal addition of a short peptide, IBG, University of Colorado, Boulder, CO 80309, USA.
[18] Drs. William E. Green: Canny Edge Detection Tutorial, Mechanical Engineering and Mechanics, MEM Department, Drexel University, 3141 Chestnut Street, Room 2-115, Philadelphia, PA 19104, 2002.
[19] Cox, I.J.; Boie, R.A.; Wallach, D.A.: "Line recognition", Pattern Recognition Proceedings, 10th International Conference, Volume i, pages 639-645 vol. 1, June 1990.
[20] Delport, V.; Liesch, D.: Fuzzy-c-mean algorithm for codebook design in vector quantization, Electronics Letters, Volume 30, Issue 13, 23 June 1994, pages 1025-1026.
[21] Duncan Cramer, Dennis Howitt: The Sage Dictionary of Statistics, 2004, ISBN 076194138X.
[22] Yael Paran, Micha Ilan, Yoel Kashman, Sofee Goldstein, Yuvalal Liron, Benjamin Geiger, Zvi Kam: High-throughput screening of cellular features using high-resolution light-microscopy; Application for profiling drug effects on cell adhesion, Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel; HTS Center, Department of Zoology, Tel Aviv University, Tel Aviv 69978, Israel; School of Chemistry, Tel Aviv University, Tel Aviv 69978, Israel, Elsevier Inc, 2007, doi:10.1016/j.jsb.2006.12.013.
[23] Suha Naffar-Abu-Amara, Tal Shay, Meirav Galun, Naomi Cohen, Steven J. Isakoff, Zvi Kam, Benjamin Geiger: Identification of Novel Pro-Migratory, Cancer-Associated Genes Using Quantitative, Microscopy-Based Screening, Department of Molecular Cell Biology, Department of Physics of Complex Systems, and Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel; Massachusetts General Hospital Cancer Center, Boston, Massachusetts, United States of America; Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, United States of America.
[24] Neumann B, Held M, Liebel U, Erfle H, Rogers P, Pepperkok R, Ellenberg J, MitoCheck Project Group: High-throughput RNAi screening by time-lapse imaging of live human cells, European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany.
[25] Hinton CV, Avraham S, Avraham HK: Contributions of Integrin-Linked Kinase to Breast Cancer Metastasis and Tumorigenesis, Division of Experimental Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA 02115, USA.
[26] Schaller MD (2001). "Biochemical signals and biological responses elicited by the focal adhesion kinase." Biochim. Biophys. Acta 1540 (1): 1-21. PMID 11476890.
[27] A comprehensive pathway map of epidermal growth factor receptor signaling. Molecular Systems Biology, doi:10.1038/msb4100014, May 2005.
[28] Nobel Prize in Physiology or Medicine for 1989, awarded jointly to J. Michael Bishop and Harold E. Varmus for their discovery of "the cellular origin of retroviral oncogenes".
[29] Persad S, Dedhar S (2004). "The role of integrin-linked kinase (ILK) in cancer progression." Cancer Metastasis Rev. 22 (4): 375-84. PMID 12884912.
[30] Herdegen T, Leah JD (1999). "Inducible and constitutive transcription factors in the mammalian nervous system: control of gene expression by Jun, Fos and Krox, and CREB/ATF proteins." Brain Res. Brain Res. Rev. 28 (3): 370-490. PMID 9858769.
[31] Martin TA, Harrison G, Mansel RE, Jiang WG (2004). "The role of the CD44/ezrin complex in cancer metastasis." Crit. Rev. Oncol. Hematol. 46 (2): 165-86. PMID 12711360.
[32] Hoeflich KP, Ikura M (2005). "Radixin: cytoskeletal adopter and signaling protein." Int. J. Biochem. Cell Biol. 36 (11): 2131-6,
doi:10.1016/j.biocel.2003.11.018. PMID 15313460.
[33] Goldmann, W.H., Ingber, D.E. (2001). Intact vinculin protein is required for control of cell shape, cell mechanics, and rac-dependent lamellipodia formation. Biochemical and Biophysical Research Communications 290: 749-755.