Performance analysis of a parallel mode superposition algorithm for nonlinear structural dynamics

Date of Award




Degree Name

Doctor of Philosophy (Ph.D.)


Civil and Architectural Engineering

First Committee Member

Ahmad Namini, Committee Chair


Parallel processing is a promising technology for solving more complex problems in less wall-clock time, and several computer architectures have been developed in recent years to meet that challenge. However, to fully exploit parallel processing, the user must be aware of the specific details of the hardware architecture. Consequently, considerable effort is being devoted to improving parallel performance by developing new algorithms and modifying existing ones. In structural mechanics, many algorithms have been developed to solve a wide variety of structural problems on different types of architecture.

In this study, an algorithm for solving nonlinear structural dynamics problems by the mode superposition method is developed for the CRAY C90-2. The algorithm is designed specifically to exploit two parallel features of the CRAY architecture: (1) vectorization; and (2) multitasking. It is divided into eight modules, which form the basis of the performance analysis. The algorithm is tested on a nonlinear structure, namely an earthquake analysis of the Luling Cable-Stayed Bridge, under three analysis assumptions: (1) linear; (2) nonlinear; and (3) nonlinear with updating of the eigenmodes.

Extensive comparisons are made among the three analysis types; of the percentage of CPU time spent in each module in each analysis, as a function of the processing mode and the number of processors; and, finally, among the modules themselves. The performance analysis relies on three parameters commonly used in parallel computing: (1) the number of floating-point operations per second (megaflops); (2) the speedup; and (3) the CPU time.

Results indicate that each module behaves differently in a vector environment than in a multitasking environment.
With vectorization and multitasking on 16 processors, the maximum overall speedups are 76.9, 16.84, and 18.86 for the nonlinear analysis with updating, the nonlinear analysis, and the linear analysis, respectively. The percentage of the total CPU time spent in each module depends on the analysis type, the processing mode, and the number of processors. For every module except assembly, the speedup attained from vectorization is higher than that obtained from multitasking on 16 processors. The assembly module is the exception because its calculations involve short vectors. Results also indicate that the QR method, which is currently used to solve small eigenproblems on scalar computers, is not suitable for parallel computation because it performs poorly in both vector and multitasking environments.
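The three performance parameters above can be illustrated with a short sketch. It defines speedup as reference time divided by improved time, the megaflop rate as operation count divided by (seconds × 10^6), and each module's share as its percentage of the total time. It then shows how a module's share can change with the processing mode. The module names, operation counts, and timings below are hypothetical illustrative values. They are not the dissertation's eight modules or its measured data. Treating every time as a single comparable per-module time is also an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModuleTiming:
    """Timings for one module of a hypothetical solver (illustrative values only)."""
    name: str
    flops: float       # floating-point operations performed by the module
    t_scalar: float    # seconds in scalar mode on one processor (reference run)
    t_parallel: float  # seconds with vectorization + multitasking


def speedup(t_ref: float, t_new: float) -> float:
    """Speedup = reference time / improved time."""
    return t_ref / t_new


def mflops(flops: float, seconds: float) -> float:
    """Rate in millions of floating-point operations per second."""
    return flops / (seconds * 1.0e6)


def report(modules: list[ModuleTiming]) -> dict[str, dict[str, float]]:
    """Per-module and overall speedup, Mflops, and percentage of total time."""
    total_scalar = sum(m.t_scalar for m in modules)
    total_parallel = sum(m.t_parallel for m in modules)
    rows: dict[str, dict[str, float]] = {}
    for m in modules:
        rows[m.name] = {
            "speedup": speedup(m.t_scalar, m.t_parallel),
            "mflops_parallel": mflops(m.flops, m.t_parallel),
            "pct_scalar": 100.0 * m.t_scalar / total_scalar,
            "pct_parallel": 100.0 * m.t_parallel / total_parallel,
        }
    # Overall figures use the summed times and summed operation counts.
    rows["overall"] = {
        "speedup": speedup(total_scalar, total_parallel),
        "mflops_parallel": mflops(sum(m.flops for m in modules), total_parallel),
        "pct_scalar": 100.0,
        "pct_parallel": 100.0,
    }
    return rows


# Hypothetical modules and numbers, chosen only to illustrate the metrics.
EXAMPLE_MODULES = [
    ModuleTiming("assembly", 2.0e8, 4.0, 2.0),
    ModuleTiming("eigensolver", 5.0e8, 10.0, 5.0),
    ModuleTiming("time_integration", 3.0e9, 30.0, 1.0),
]

if __name__ == "__main__":
    for name, r in report(EXAMPLE_MODULES).items():
        print(f"{name:>16}: speedup {r['speedup']:6.2f}, "
              f"{r['mflops_parallel']:8.1f} Mflops, "
              f"{r['pct_scalar']:5.1f}% -> {r['pct_parallel']:5.1f}% of total time")
```

In this made-up example, the hypothetical eigensolver speeds up only by a factor of 2. Its share of the total time therefore rises from about 22.7% to 62.5%, and the overall speedup (5.5) stays far below that of the best-performing module (30). This is one way a poorly parallelizing step, such as a small eigenproblem, can dominate a parallel run.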


Applied Mechanics; Engineering, Civil; Computer Science
