Center for Observations and Prediction at Scripps (COMPAS)

Facility

Introduction

The original COMPAS cluster (bought using a 2001 NSF MRI grant)

The original COMPAS cluster (funded by the NSF MRI grant) has been leveraged to build a larger computing facility. Two other large clusters have since been purchased with ONR and NOAA funding, and all three are housed and managed together at SIO. The resources of the COMPAS clusters are shared among the PIs and students in the COMPAS project. Use of these resources is facilitated by the COMPAS Director (originally Detlef Stammer and later Bruce Cornuelle) and the COMPAS System Manager (Caroline Papadopolous).


Hardware

The COMPAS compute facility has expanded and currently houses over 450 compute nodes and associated storage servers:

310 CPUs: 1.0 GHz Pentium III, 512 MB/CPU (310 GFlops peak), 100 Mbit Ethernet + Myrinet 2000
202 CPUs: 3.06 GHz Xeon, 1 GB/CPU (1236 GFlops peak), 1 Gbit Ethernet + Myrinet 2000
256 CPUs: 2.8 GHz Xeon, 1 GB/CPU (1433 GFlops peak), 1 Gbit Ethernet + Myrinet 2000
128 CPUs: 1.0 GHz Pentium III, 512 MB/CPU, 100 Mbit + 1 Gbit Ethernet, connected to the OptIPuter 10 Gbit campus network
16 CPUs: 733 MHz Pentium III, 512 MB/CPU, 100 Mbit Ethernet + Myrinet (test cluster)
14 storage servers: 1 Gbit Ethernet, 12 TB total storage, hardware RAID-5
4 storage servers: 1 Gbit Ethernet, 2.9 TB total storage, software RAID-1
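
The peak figures in the list above appear consistent with a simple estimate: CPU count times clock rate times floating-point operations per cycle. The sketch below checks that reading. The flops-per-cycle values (1 for Pentium III, 2 for Xeon) are an assumption inferred from the listed numbers, not something this page states, and the dictionary and function names are illustrative only.

```python
# Each entry: (cpu_count, clock_ghz, assumed_flops_per_cycle, listed_peak_gflops).
# The flops-per-cycle column is an assumption inferred from the listed peaks.
CLUSTERS = {
    "Pentium III 1.0 GHz": (310, 1.0, 1, 310),
    "Xeon 3.06 GHz": (202, 3.06, 2, 1236),
    "Xeon 2.8 GHz": (256, 2.8, 2, 1433),
}


def estimated_peak_gflops(cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFlops: CPUs x GHz x flops per cycle."""
    return cpus * clock_ghz * flops_per_cycle


for name, (cpus, ghz, fpc, listed) in CLUSTERS.items():
    est = estimated_peak_gflops(cpus, ghz, fpc)
    print(f"{name}: estimated {est:.1f} GFlops, listed {listed} GFlops")

# Sum of the peaks that the list states explicitly.
listed_total = sum(entry[3] for entry in CLUSTERS.values())
print(f"Sum of listed peaks: {listed_total} GFlops")
```

The three listed peaks sum to 2979 GFlops; the two clusters without a listed peak would add a little more, which is in line with a facility total of roughly 3 TeraFlops.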


General Description

The largest cluster has 128 nodes (256 CPUs); its size is limited by the existing 128-port Myrinet switches. Three independent switch fabrics define the three main clusters. Because of the large performance difference between Pentium III and Xeon processors, applications target either PIII or Xeon configurations (no mixing), even though these processors coexist in the same Ethernet and Myrinet fabrics. This heterogeneous collection represents several major acquisitions over the last 5.5 years. The total facility (453 Xeon + 459 PIII) has a theoretical peak speed of 3 TeraFlops (TF). The configuration of the machines is defined by our targeted workload, which lets us make memory vs. network vs. compute power trade-offs more easily than general-purpose installations can. Our 18 storage servers all use hardware RAID-5 or software RAID-1, hold 0.75 TB to 1.4 TB each (depending on configuration), and run the standard NFS (Network File System) protocol, which gives adequate performance. The models used by COMPAS researchers have been coded to take advantage of node-local disks, which dramatically improves performance. Storage performance is an acknowledged weakness of clusters, but by load balancing the NFS servers we are able to work around this weak link.

General Network

The COMPAS facility has a single 1 Gbit/s network connection to the campus backbone. In addition, a single 10 Gbit/s connection is available to the OptIPuter network (a campus- and national-scale research network funded by NSF).

Job Characteristics

Computing jobs that run on the COMPAS facility are generally long-running, mid-sized parallel applications with processor counts of 32 to 128 CPUs. Jobs using 64 and 76 CPUs are common, as these mark where parallel efficiency begins to drop off in this configuration. Runs are often long-lived, typically several days (3-5+). The COMPAS computing facility assigns these long-lived runs to dedicated processors, so jobs run with little intervention for weeks at a time without queue waits. Jobs are distributed with one process per processor, and each process must fit into main memory to attain acceptable performance. We have found it quite cost-effective to distribute individual user accounts across medium-sized, gigabit-connected I/O servers. Multiple I/O "pipes" mitigate interference between users running different codes. We use hardware RAID-5 with hot-swap spares or software RAID-1 on these servers to help minimize data loss. Smaller jobs are usually assigned to the older test clusters, a development environment that sometimes uses queues.
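
The requirement that each process fit into main memory can be illustrated with a rough check. This is a minimal sketch, not the facility's actual scheduling logic: it assumes the job's memory footprint divides evenly across processes and ignores operating-system overhead, and the function name and the 48 GB example job are hypothetical. The per-CPU memory sizes (512 MB for Pentium III, 1 GB for Xeon) come from the hardware list.

```python
PIII_MB_PER_CPU = 512    # from the hardware list
XEON_MB_PER_CPU = 1024   # 1 GB/CPU, from the hardware list


def fits_in_memory(total_footprint_mb, n_processes, mem_per_cpu_mb):
    """Return True if an evenly divided job fits in per-CPU memory.

    Assumes one process per CPU and an even split of the footprint;
    operating-system and buffer overhead are ignored in this sketch.
    """
    per_process_mb = total_footprint_mb / n_processes
    return per_process_mb <= mem_per_cpu_mb


# Hypothetical job: 48 GB total footprint on 64 CPUs (768 MB per process).
job_mb = 48 * 1024
print(fits_in_memory(job_mb, 64, PIII_MB_PER_CPU))
print(fits_in_memory(job_mb, 64, XEON_MB_PER_CPU))
```

Under these assumptions the hypothetical job needs 768 MB per process, so it would exceed memory on the Pentium III nodes but fit on the Xeon nodes.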

Mid-range facility

COMPAS fits within the hierarchy of computer centers as a mid-range facility. Larger supercomputing centers are important resources, but our usage patterns often conflict with their stated mission. For example, SDSC's web site states: "SDSC's machines are a national resource, allocations are assigned on the basis of scientific merit and on the inability of other, less-capable computing sites to perform the work." In this context, the COMPAS compute facility is a "less-capable computing site". Yet, in aggregate, the current facility can deliver nearly 8 million CPU-hours/year. If COMPAS computational science were shifted to a national resource such as SDSC, it would detract from larger jobs that require the massive resources available at national centers. The codes that run on the COMPAS compute facility are all somewhat similar in their use of computing resources. This has allowed us to choose machine configurations (a specific balance of Flops/disk/memory) that achieve high performance. It has taken considerable human investment to port, tune, and develop the codes to run efficiently on the COMPAS clusters.
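
The figure of nearly 8 million CPU-hours per year follows from the CPU counts given in the General Description, assuming every CPU is available around the clock for a 365-day year. The sketch below shows that arithmetic; the variable names are illustrative.

```python
XEON_CPUS = 453            # from the General Description
PIII_CPUS = 459            # from the General Description
HOURS_PER_YEAR = 24 * 365  # assumes full availability all year

total_cpus = XEON_CPUS + PIII_CPUS
max_cpu_hours_per_year = total_cpus * HOURS_PER_YEAR

print(total_cpus)              # 912
print(max_cpu_hours_per_year)  # 7989120
```

This is an upper bound: downtime and idle processors would reduce the delivered total.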



These pages are maintained by the webmaster. Last updated April 4, 2007.


Scripps Institution of Oceanography
University of California, San Diego