% fortune -ae paul murphy

Uses for supercomputers

Everybody knows what supercomputers are useful for, right? - things like weapons simulations, weather forecasting, submarine tracking, pattern matching (in the biosciences), fluid dynamics and materials simulations, graph analyses, and cryptology.

I think there's another one - one that's already important and likely to become more so over time, and that's data reduction.

For example, the single biggest cost driver in remote sensing and earth sciences isn't the cost of getting the satellite up there and working; it's the cost of receiving and processing the data the thing produces.

A typical modern remote sensing instrument can produce around 300TB of data a day, but on-board computing limits compression and transmission to perhaps ten percent of that - and, of course, even that 30TB has to be stored and processed again at the receiving station before being passed on to users.
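To put that in perspective, here's a back-of-the-envelope sketch - plain C, nothing drawn from any real ground-station software, with the 300TB and ten percent figures taken straight from the paragraph above - of what moving even the compressed 30TB a day implies in sustained downlink bandwidth:

    #include <stdio.h>

    int main(void)
    {
        /* Back-of-the-envelope only: the 300TB/day and ten percent figures
           come from the text above; the rest is straight arithmetic. */
        double raw_tb_per_day  = 300.0;                 /* instrument output          */
        double downlinked_tb   = raw_tb_per_day * 0.10; /* after on-board compression */
        double bytes           = downlinked_tb * 1e12;
        double seconds_per_day = 86400.0;

        printf("downlink volume: %.0f TB/day\n", downlinked_tb);
        printf("sustained rate : %.0f MB/s (about %.1f Gbit/s), around the clock\n",
               bytes / seconds_per_day / 1e6,
               bytes * 8.0 / seconds_per_day / 1e9);
        return 0;
    }

That works out to roughly 350MB per second - about 2.8 gigabits per second, sustained around the clock - before any re-processing at the receiving station.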

Thus the time killers, and the cost drivers, for the end result the user cares about are in processing and transmitting the data, not in collecting it. Hang a multi-teraflop processor on the satellite, however, and it becomes feasible to equip larger customers with quite small (0.66 meter) receiving stations, rent them transponder time, and make far more effective use of the satellite while bypassing most of the cost, and most of the delay, in getting them what they want, when they want it. In fact, with three satellites it would be possible to guarantee customers that transmission of data for any area of interest would start within an hour of request receipt.

The same idea applies in many areas of earth-bound sensing. In seismology, for example, you collect and store terabytes of data per minute, haul it all to your company's supercomputing grid, and then throw most of it away after processing. Hang a multi-teraflop processor right at the end of the seismic string or trawl, however, and you could avoid most of the storage problem, skip right to reservoir modelling and information recovery, and refocus or reshoot anything that looks sufficiently interesting before you move the equipment very far.
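The column doesn't say what processing would actually run at the string, so treat this as a toy illustration only: even something as crude as stacking repeated shots and decimating in time at the sensor - the sample counts below are made up - shows how much volume can disappear before anything is stored or transmitted:

    #include <stdio.h>

    #define SAMPLES  4000   /* samples per raw trace (hypothetical)  */
    #define SHOTS    16     /* repeated shots stacked into one trace */
    #define DECIMATE 4      /* keep every 4th sample after stacking  */

    /* Stack SHOTS raw traces and decimate: SHOTS*SAMPLES samples in,
       SAMPLES/DECIMATE samples out - a 64x reduction in this toy case.
       (A real system would filter before decimating, among much else.) */
    static void reduce(const float raw[SHOTS][SAMPLES],
                       float out[SAMPLES / DECIMATE])
    {
        for (int i = 0; i < SAMPLES / DECIMATE; i++) {
            float sum = 0.0f;
            for (int s = 0; s < SHOTS; s++)
                sum += raw[s][i * DECIMATE];
            out[i] = sum / SHOTS;       /* stacked, decimated sample */
        }
    }

    int main(void)
    {
        static float raw[SHOTS][SAMPLES];       /* would come from the geophones */
        static float out[SAMPLES / DECIMATE];

        reduce(raw, out);
        printf("raw samples: %d, kept samples: %d (%.0fx reduction)\n",
               SHOTS * SAMPLES, SAMPLES / DECIMATE,
               (double)(SHOTS * SAMPLES) / (SAMPLES / DECIMATE));
        return 0;
    }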

Back in 1997 a geophysicist named Shawn Larsen, working at the Livermore labs, gave an interview to Alan Beck at HPCwire about the importance, structure, and value of a piece of ground motion modelling code called E3D that he largely wrote himself. Take a look at his description of it:

HPCwire: Please describe the applications utilised in this work, e.g. the language(s) used, size, and programming challenges encountered. What kind of database is utilised, and how is it managed? How long did the calculations take?

LARSEN: "We use E3D, a code developed at LLNL. It is an 3D elastic finite-difference seismic wave propagation code that is 4th order accurate in the spatial domain and 2nd order accurate in time. It is explicit. The code is written in C (good old fashioned K&R C), although the lowest level numerical calls (which handle 98 percent of the computations) are written in FORTRAN. This increases performance by about 40 percent on workstations or HPC machines using RISC technology.

"The code itself is about 15,000 lines long and is fully self-contained. It has a message passing interface (using mpi, mpsc, or pvm depending on the platform). The same code runs on workstations and parallel machines (there are references to different libraries in the Makefile).

"E3D includes a number of features (physics and computational) including free surface topography, attenuation, hybridization (can connect to other codes), propagating grids, and a variable density grid. We incorporate paraxial extrapolators as absorbing boundary conditions along the grid edges, which eliminates almost all of the artificial reflections from the grid boundaries.

"The number of grid points (e.g., nodes or zones depending on terminology) for our simulations was about 50 million (a 175x100 x 40 km deep 3D volume with a resolution of 0.25 km). The run-times were about 10 hours on a 40-node Meiko CS-2. Each node has 2 SuperSparc processors and 128 MBytes of distributed internal memory. We decided on this machine because we were under a conference deadline and it had an abundant availability of free computer time. As an aside, two years ago we observed a speed of 28 GFlops on a 256-node Meiko CS-2 with vector processors.

"The grid is simi-structured. That is, it is composed of blocks of regularly spaced grid points.

"The only item remotely resembling a database is the geologic model used in the calculation. The compressional and shear seismic velocities and the medium density is required at each node. We could have utilized a file (or distributed files in parallel) where the velocity and density at each grid point is explicitly defined. Instead, we elected to parameterize the model as units or blocks of geologic structure. This has the advantage of minimizing the I/O into the model.

"Quite honestly, the greatest computational challenge is probably the I/O associated with the simulated data of interest. Since our grid is simi-structured, there are various tradeoffs between simplicity and efficiency when dealing with the output. In additional, significant time was devoted to incorporating certain physics into the finite-difference code."

Today that code runs on 100 teraflop grid-style supercomputers like ASCI (Accelerated Strategic Computing Initiative) Purple and is used to do things far beyond earthquake effects modelling - things like helping model ground water flows in the Los Angeles basin and helping determine the precise point at which North Korea's nuclear test fizzled.

The code works, and so do ASCI Purple and BlueGene/L, but the process is now even more data-starved than it was in the days of the 1994 SuperSPARCs he refers to, because processing capabilities have advanced at a much faster rate than data collection, storage, and management capabilities.

One solution is to focus funding on the collection and storage problem - more sensors, more tape libraries, bigger disk packs, switching from Unitree to ZFS, and so on. That's expensive, but it has the benefit of using known technologies in proven ways.

A more innovative solution, and one I expect we will see a lot of soon, is to construct a grid from multi-teraflop CPUs located right on, or adjacent to, the sensors gathering the data. That would enable things like real-time reporting and self-correcting models, while reducing system-wide storage and related costs (measured in dollars per year) by perhaps two orders of magnitude.

So what's the bottom line on making this kind of thing work? The need is there, the software exists - stuff like E3D would run more or less unchanged - and IBM's Cell means that the cheap hardware needed is on the way.


Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.