Saturday, August 2, 2008

Multithreading comes undone


EDA vendors have scrambled to meet the challenge of multicore IC design by rolling out multithreading capabilities for their tools. Nonetheless, the question cannot be ignored: Is multithreading really the best way to exploit multicore systems?

"Threads are dead," asserted Gary Smith, founder and chief analyst for Gary Smith EDA. "It is a short-term solution to a long-term problem."

At the 45nm node, more and more designs reach and exceed the 100 million-gate mark. These designs break current IC CAD tools, forcing EDA vendors to develop products capable of parallel processing.

Until now, parallel processing has relied on threading. Threading, however, tends to show its limits at four processors, and EDA vendors may have to come up with new ways of attacking the problem.

"Threads will only give you two or three years," Smith said. "Library- or model-based concurrency is the best midterm approach."

Looking into the future
EDA vendors interviewed at the 2008 Design Automation Conference (DAC) painted a more nuanced picture of the future of multithreading.

"We have not seen the limits to multithreading in the timing-analysis area," said Graham Bell, marketing counsel for EDA startup Extreme DA Corp. "We see good scaling for three or four process threads. We get to see difficulties beyond that, but they are not dramatic."

In GoldTime, its multithreaded static and statistical timing analyzer, Extreme DA has applied a fine-grained multithreading technique based on ThreadWave, a netlist-partitioning algorithm. "Because of our unique architecture, we have a small memory footprint," Bell said. "We have not seen the end of taking advantage of multithreading."

For applications with fine-grained parallelism, multithreading is one of the most generic ways to exploit multicores, said Luc Burgun, CEO of Emulation and Verification Engineering SA. "On the other hand, multithread-based programs can also be quite difficult to debug." That's because they "break the sequential nature of the software execution, and you may easily end up having nondeterministic behavior and a lot of headaches."
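
The nondeterminism Burgun warns about is easy to reproduce. In this minimal C++ sketch (a generic illustration, not EVE code), two threads increment a shared counter without synchronization; the printed total changes from run to run because the increments race.

```cpp
// Two threads increment a shared counter with no synchronization,
// a classic data race: the result differs between runs.
#include <iostream>
#include <thread>

int counter = 0;  // shared, unprotected state

void bump() {
    for (int i = 0; i < 1'000'000; ++i)
        ++counter;  // read-modify-write races with the other thread
}

int main() {
    std::thread t1(bump), t2(bump);
    t1.join();
    t2.join();
    // Sequential reasoning predicts 2'000'000; the race makes the
    // actual value vary from run to run.
    std::cout << counter << '\n';
}
```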

According to Burgun, multiprocess remains the "easiest and safest way to exploit multicore." He said he expects some interesting initiatives to arise from parallel-computing experts to facilitate multicore programming. "From that standpoint, CUDA [the Nvidia-developed Compute Unified Device Architecture] looks very promising," Burgun said.
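
The multiprocess style Burgun calls safest can be sketched as follows, using plain POSIX fork() rather than any vendor API; the worker count and per-worker "slice" are illustrative assumptions. Each worker owns its own address space, so there is no shared mutable state to race on, and results travel through explicit channels such as pipes or files.

```cpp
// Hedged sketch of process-level parallelism: independent workers,
// no shared memory, parent waits for all children. POSIX-only.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const int kWorkers = 4;
    for (int w = 0; w < kWorkers; ++w) {
        pid_t pid = fork();
        if (pid == 0) {                // child: work on its own slice
            std::printf("worker %d handling slice %d\n", w, w);
            _exit(0);                  // results would go via pipes/files
        }
    }
    for (int w = 0; w < kWorkers; ++w)
        wait(nullptr);                 // parent collects the children
}
```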

Simon Davidmann, president and CEO of Imperas Ltd., delivered a similar message. "Multithreading is not the best way to exploit multicore resources," he said. "For some areas, it might be OK, but in terms of simulation, it is not."

Multithreading is not the only trick up Synopsys Inc.'s sleeve, said Steve Smith, senior director of product platform marketing. "Within each tool, there are different algorithms. When looking at each tool, we profile the product to see the largest benefits to multithreading," he said. "Multithreading is not always applicable. If not, we do partitioning."

As chipmakers move to eight and 16 cores, a hybrid approach will be needed, asserted Smith, suggesting a combination of multithreading and partitioning.
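
Smith did not spell out the scheme, but a hybrid of the two techniques might look like the sketch below: the design is first cut into independent partitions, and a thread then works within each one. The Cell structure and optimize() routine are placeholders, not Synopsys internals.

```cpp
// Hedged sketch of partitioning plus multithreading: coarse cuts
// first, then one thread per partition.
#include <functional>
#include <thread>
#include <vector>

struct Cell { double delay = 1.0; };

// Placeholder for per-partition work (timing, optimization, etc.).
void optimize(std::vector<Cell>& part) {
    for (Cell& c : part) c.delay *= 0.99;
}

int main() {
    // Eight partitions of 1,000 cells each, standing in for a design
    // already cut along weakly coupled boundaries.
    std::vector<std::vector<Cell>> partitions(8, std::vector<Cell>(1000));
    std::vector<std::thread> pool;
    for (auto& part : partitions)
        pool.emplace_back(optimize, std::ref(part));  // thread per partition
    for (auto& t : pool) t.join();  // barrier before stitching results
}
```

In this style, scaling toward eight and 16 cores is a matter of cutting finer partitions rather than threading one monolithic algorithm harder.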

To illustrate the point, Smith cited a host of Synopsys multicore solutions in the area of multithreading, starting with HSpice. "HSpice has been broadly used by our customers. This is typically the tool you do not want to start from scratch," he said.

HSpice multithreading has come in stages, noted Smith. "Last year, we multithreaded the model-evaluation piece, and it gave a good speedup. Then, in March, we introduced the HSpice multithreaded matrix solver. We want to make sure our customers are not impacted, and we do it [multithreading] piece by piece," he said.
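
Model evaluation is a natural first target: each device's currents depend only on its own terminal voltages, so the evaluation loop is embarrassingly parallel. The sketch below assumes nothing about HSpice internals; Device, evaluate_range() and the linear "model" are stand-ins.

```cpp
// Hedged sketch of threading a circuit simulator's model-evaluation
// loop: devices are split into ranges, one thread per range.
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Device { double v = 1.0, i = 0.0; };

void evaluate_range(std::vector<Device>& devs, size_t lo, size_t hi) {
    for (size_t k = lo; k < hi; ++k)
        devs[k].i = devs[k].v * 1e-3;  // stand-in for a real device model
}

void evaluate_all(std::vector<Device>& devs, unsigned nthreads) {
    std::vector<std::thread> pool;
    const size_t chunk = devs.size() / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        size_t lo = t * chunk;
        size_t hi = (t + 1 == nthreads) ? devs.size() : lo + chunk;
        pool.emplace_back(evaluate_range, std::ref(devs), lo, hi);
    }
    for (auto& th : pool) th.join();
}

int main() {
    std::vector<Device> devices(100000);
    evaluate_all(devices, 4);
}
```

The matrix solve that follows couples the whole circuit through shared data, which helps explain why it was threaded separately, and later.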

Another trend Synopsys is investigating, Smith continued, is pipelining. This technique—an enterprise-level activity, since it demands the involvement of IT—collapses multiple tasks, such as optical proximity correction and mask-data preparation, into a single pipeline.
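
In outline, such a pipeline overlaps stages instead of running them back to back: one thread streams finished work items to the next stage through a queue. The sketch below is a generic two-stage pipeline with placeholder OPC and mask-data-preparation stages, not Synopsys' flow.

```cpp
// Hedged sketch of a two-stage pipeline: stage 1 produces "tiles",
// stage 2 consumes them as they arrive instead of waiting for all.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> tiles;         // work items flowing between stages
std::mutex m;
std::condition_variable cv;
bool done = false;

void opc_stage() {             // stage 1: emit corrected tiles
    for (int t = 0; t < 100; ++t) {
        std::lock_guard<std::mutex> lk(m);
        tiles.push(t);
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_one();
}

void mdp_stage() {             // stage 2: consume tiles as they arrive
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !tiles.empty() || done; });
        if (tiles.empty()) return;   // producer finished, queue drained
        tiles.pop();                 // mask-data prep on the tile goes here
    }
}

int main() {
    std::thread producer(opc_stage), consumer(mdp_stage);
    producer.join();
    consumer.join();
}
```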



Last year, Magma Design Automation Inc. unveiled an alternative to multithreading: a streaming, data-flow-based architecture for its Quartz-DRC design rule checker. Multithreading provides coarser-grained parallelism than Magma's data-flow architecture, said Thomas Kutzschebauch, senior director of product engineering at Magma.
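
The contrast can be sketched generically (this is an illustration of the streaming style, not Magma's implementation): in a data-flow architecture the unit of parallelism is a single shape flowing through a chain of rule operators, so the work can be cut far more finely than one thread per monolithic task.

```cpp
// Hedged sketch of streaming, per-shape parallelism for rule checking:
// every shape passes through a chain of small rule operators, and the
// stream can be split at any granularity across cores.
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Shape { double width = 0.12, space = 0.20; bool ok = true; };

using Rule = std::function<void(Shape&)>;  // one small DRC operator

int main() {
    std::vector<Rule> chain = {
        [](Shape& s) { s.ok &= s.width >= 0.05; },  // min-width check
        [](Shape& s) { s.ok &= s.space >= 0.10; },  // min-spacing check
    };
    std::vector<Shape> layout(100000);

    // Shapes are independent, so the stream can be cut anywhere;
    // here, crudely, one thread per half. Finer cuts scale further.
    auto run = [&](size_t lo, size_t hi) {
        for (size_t i = lo; i < hi; ++i)
            for (const Rule& r : chain) r(layout[i]);
    };
    std::thread a(run, size_t{0}, layout.size() / 2);
    std::thread b(run, layout.size() / 2, layout.size());
    a.join();
    b.join();
}
```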

Magma's multicore strategy is focused on massive parallelism, Anirudh Devgan, VP and general manager of the custom design business unit, said at a DAC panel session on reinventing EDA with "manycore" processors.

"Four CPU boxes are just the beginning of a trend, and EDA software has to work on large CPUs with more than 32 cores," he said. "Parallelism offers an opportunity to redefine EDA productivity and value. But just parallelism is not enough, since parallelizing an inefficient algorithm is a waste of hardware."

Devgan's conclusion was that tools have to be productive, integrated and massively parallel.

Seeing beyond C
As he unveiled "Trends and What's Hot at DAC," Gary Smith expressed doubts about C as the ultimate language for multicore programming. He cited the identification of a new embedded-software language as one of the top 10 issues facing the industry this year and asserted that "a concurrent language will have to be in place by 2015."

EDA executives did not dispute the point. "We will change language over time," stated Joachim Kunkel, VP and general manager of the solutions group at Synopsys. "We are likely to see a new language appear, but it takes time. It is more an educational thing."

On the software side, meanwhile, reworking legacy code is a big issue, and writing new code for multicore platforms is just as difficult. Nonetheless, Davidmann held that "the biggest challenge is not writing, reworking or porting code, but verifying that the code works correctly, and when it doesn't, figuring out how to fix it. Parallel processing exponentially increases the opportunities for failure."

Traditionally, Davidmann said, software developers think sequentially. Now, that has to change. Chip design teams have been writing parallel HDL for 20 years, so it's doable, though it will take much effort and new generations of tools to assist software teams in this task.

With single-processor platforms and serial code, functional verification meant running real data and tests directed to specific pieces of functionality, Davidmann said. "Debug worked as a single window within a GNU project debugger."

But with parallel processing, "running data and directed tests to reduce bugs does not provide sufficient coverage of the code," he said. "New tools for debug, verification and analysis are needed to enable effective production of software code."

Davidmann said Imperas is announcing products for verification, debug and analysis of embedded software for heterogeneous multicore platforms. "These tools have been designed to help software development teams deliver better-quality code in a shorter period of time," he said.

To simplify the software development process and help with legacy code, Burgun said customers can validate their software running on the RTL design emulated in EVE's ZeBu, which behaves as a fast, cycle-accurate model of the hardware design.

For instance, he continued, some EVE customers can run their firmware and software six months prior to tapeout. They can check the porting of the legacy code on the new hardware very early and trace integration bugs all the way to the source, whether in software or in hardware. When the engineering samples come back from the fab, 99 percent of the software is already validated and up and running.

Thus, "ZeBu minimizes the number of respins for the chip and drastically reduces the bring-up time for the software," Burgun said.

- Anne-Francoise Pele
EE Times


