Get Ready! Go Parallel!

Once upon a time I used to build VC++ 5.0 applications on a computer that had a single processor, 128 MB of RAM and 1 GB of hard disk space. Those days are gone. Today desktop computers and laptops already ship with multiple processors and multiple cores. OK, this is a change in hardware. Does it matter to software? Does it affect the way I develop software? Do I need to change the way I will write software in the future compared to today?

No matter how much processors increase their output and speed, software consistently finds new ways to eat up the extra speed. As software developers, we keep riding the free performance gains that come from faster processors and hardware. Most of us are of the opinion that “tomorrow’s processors will provide better performance anyway, and today’s applications are constrained not by CPU throughput and memory speed but by I/O-bound, network-bound and database-bound operations”.

Today, as software developers, we enjoy the free lunch of ever-faster desktop computers. Will this continue in the future? Processor designers have delivered better performance by raising clock speeds, optimizing instruction execution and adding more on-chip cache. These optimizations have led to speedups in sequential (non-parallel, single-threaded, single-process) applications, as well as in applications that do make use of concurrency. Running chips at ever higher speeds, however, requires impractical quantities of cooling equipment, with an impact on the environment and a rising cost of power consumption. As these optimizations might not continue for long without increasing power consumption, hardware vendors are taking a new approach to performance: multiple cores and multiple processors in desktop computers.

To take advantage of parallelism, applications face the following challenges and need to be ready for change:

  1. Applications need to be written to run concurrently if they are to fully exploit the CPU throughput gains offered by the multi-core machines available today. Let us be prepared for a time, a couple of years from now, when there may be no single-core machines at all.
  2. Application tasks will increasingly become CPU-bound in addition to being network-bound and data-bound. This does not mean that operations that are not CPU-bound go away.
  3. Efficiency and performance optimization will become more important, and with them the demand for performance-oriented practices and solutions.
  4. Programming languages and systems will increasingly be forced to deal well with concurrency.

Let us assume that businesses and individuals realize the potential of a multi-processor environment and want to take advantage of this opportunity. The first questions are: Where do I start? How do I start? How do I get my developers ready to write parallel code? What out-of-the-box tools are available in the market to help? What software development approach should be followed for writing parallel and concurrent code?

I think the first step towards writing parallel code is to migrate existing applications and products so that they take advantage of going parallel. For an existing application, the approach to making it parallel is as follows:

  • Have a strategy for finding where parallelism can improve performance and where to start. This includes running a range of benchmark tests to measure current performance, which helps you identify the serial steps in the system and the steps that could be made parallel.
  • Design and implement the functionality that can feasibly be expressed as parallel steps. Analyze whether the memory handling is prepared for parallelism and plan its migration.
  • Debug the new parallel code and look for race conditions and deadlocks. Remember that code not written with parallelism in mind might reduce the existing performance of the system. Identify a suitable debugging environment that will help your developers find errors in the implementation.
  • Measure the performance of the new system against the old benchmarks, analyze the difference, and validate whether the difference is actually driven by parallelism. A major leap or a major drop in performance should make the architect go back and re-evaluate all of the above steps.
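The measuring step can be sketched in a few lines. This is a minimal Python sketch, not a prescribed tool: `work_item`, `serial_version` and `threaded_version` are hypothetical stand-ins for an existing serial step and its new parallel counterpart. The sketch checks that the new version produces the same results before it compares timings. On a standard (GIL-enabled) CPython build, threads running pure-Python CPU-bound code do not execute bytecode simultaneously, so this particular "parallel" version will typically show little or no speedup. That is exactly the kind of result the measurement should catch before anyone assumes a difference is driven by parallelism; for CPU-bound work in Python, `concurrent.futures.ProcessPoolExecutor` is the usual alternative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work_item(n):
    """Hypothetical CPU-bound step: pure-Python arithmetic, no I/O."""
    return sum(i * i for i in range(n))

def serial_version(items):
    # The "old" code path used as the baseline.
    return [work_item(n) for n in items]

def threaded_version(items, workers=4):
    # The "new" code path; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work_item, items))

def benchmark(fn, items, repeats=3):
    """Best-of-N wall-clock time plus the result, so output can be checked."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(items)
        best = min(best, time.perf_counter() - start)
    return best, result

items = [200_000] * 8
baseline_time, baseline = benchmark(serial_version, items)
new_time, new = benchmark(threaded_version, items)

# Validate first: a faster but different answer is not an improvement.
assert new == baseline, "parallel version changed the results"
# Only then compare speed against the old benchmark.
speedup = baseline_time / new_time
print(f"baseline {baseline_time:.3f}s  new {new_time:.3f}s  speedup {speedup:.2f}x")
```

Taking the best of several runs reduces noise from other processes on the machine; for real benchmarks it also helps to record the hardware and core count next to the numbers.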

Getting the developers ready to implement the development approach?

The vast majority of programmers today do not think in terms of concurrency. Hence developers and architects need to be prepared, and time should be planned for the investment in training. We need to learn concurrency: What’s a race? What’s a deadlock? How can it come up, and how do I avoid it? What constructs actually serialize the program that I thought was parallel? How is a message queue used to write parallel code? Beyond the “whats” and “hows”, what are the correct design practices, and why are they actually correct?
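To make the first two questions concrete, here is a minimal Python sketch of a race condition and of one deadlock-avoidance rule; the `Counter` and `Account` classes and the helper functions are hypothetical. The unsafe increment is a read-modify-write that two threads can interleave, losing updates. Whether a given run actually comes up short depends on scheduling, which is exactly why races are hard to find by testing. The `transfer` function avoids the classic deadlock, where one thread holds lock A while waiting for lock B and another holds B while waiting for A, by always acquiring locks in a single global order; ordering by `id()` is just one convenient, assumed choice of order.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Race: another thread can run between the read and the write,
        # so both threads store the same new value and one update is lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        # The lock turns the read-modify-write into one critical section.
        with self._lock:
            self.value += 1

def hammer(action, times=10_000):
    # Returns a thread body that calls `action` repeatedly.
    def body():
        for _ in range(times):
            action()
    return body

def run_all(targets):
    # Start one thread per target, then wait for all of them.
    threads = [threading.Thread(target=t) for t in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

unsafe, safe = Counter(), Counter()
run_all([hammer(unsafe.unsafe_increment)] * 4)
run_all([hammer(safe.safe_increment)] * 4)
print("unsafe:", unsafe.value, "safe:", safe.value)  # safe is always 40000

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    if src is dst:
        return  # threading.Lock is not re-entrant; never take it twice
    # Acquire both locks in one global order so no cycle of waits can form.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

a, b = Account(1000), Account(1000)
# Two threads transferring in opposite directions at the same time.
run_all([hammer(lambda: transfer(a, b, 1)), hammer(lambda: transfer(b, a, 1))])
print(a.balance, b.balance)  # 1000 1000
```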

Developers who do lock-free programming, or who rely on a framework to take charge of concurrency, also need to be aware of multi-core. Let us accept that concurrent lock-free programming is even harder to understand and reason about than concurrent lock-based programming. For a programmer used to writing programs with sequential control flow, writing programs with parallel flow is trickier and much harder. Developers also need to be aware of a common pitfall: concurrent code that is completely safe might perform the same or worse on a multi-core machine than on a single-core machine, typically because the threads aren’t independent enough and share a dependency on a single resource, which re-serializes the program’s execution.
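The re-serialization pitfall can be illustrated without timing anything. In this minimal Python sketch (the `SharedTotal` class and both summing functions are hypothetical), both versions are completely safe and give the same answer. The first funnels every item through one shared lock, so the threads spend their time taking turns on it; the second lets each worker accumulate privately and touches the shared resource once per worker. Counting lock acquisitions makes the difference visible; with threads in C++, Java or C#, contention on such a lock typically shows up in the timings as well.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedTotal:
    """Hypothetical shared resource: one running total guarded by one lock."""
    def __init__(self):
        self.total = 0
        self.acquisitions = 0  # how often a thread had to take the lock
        self._lock = threading.Lock()

    def add(self, amount):
        with self._lock:
            self.total += amount
            self.acquisitions += 1

def split(data, parts):
    # Contiguous chunks; the last one may be shorter.
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def contended_sum(data, workers=4):
    # Safe, but every item passes through the same lock, so the threads
    # take turns: the "parallel" loop is re-serialized on one resource.
    shared = SharedTotal()
    def work(chunk):
        for x in chunk:
            shared.add(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, split(data, workers)))  # list() surfaces errors
    return shared

def independent_sum(data, workers=4):
    # Also safe: each worker sums its own chunk privately and touches the
    # shared total only once, so the threads barely depend on each other.
    shared = SharedTotal()
    def work(chunk):
        shared.add(sum(chunk))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, split(data, workers)))
    return shared

data = list(range(100_000))
contended = contended_sum(data)
independent = independent_sum(data)
print(contended.total, contended.acquisitions)      # 4999950000 100000
print(independent.total, independent.acquisitions)  # 4999950000 4
```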

The minority of developers who write multithreaded applications today also need to understand what adding multiple processors to the deployment environment means. They write multithreaded programs today for one of three reasons: to logically separate naturally independent control flows, to scale by taking advantage of multiple physical CPUs, or to hide latency in other parts of the application (doing useful work while waiting on I/O, for example). Though multithreaded developers have an advantage in the multi-core era, they still need to invest time to understand the difference between the threaded model and the parallel model. Testing multithreaded applications on multi-core hardware will uncover new concurrency bugs, some of which surface only on true multiprocessor systems: the threads are no longer just being switched around on a single processor but really execute simultaneously, and that exposes new classes of errors.
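Hiding latency is the one of those three reasons that pays off even on a single core, because a thread that is waiting does not need the CPU. In this minimal Python sketch, `fetch` is a hypothetical stand-in for a network or database call and `time.sleep` simulates its latency: the sequential loop pays for every wait in turn, while a thread pool overlaps the waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(resource_id):
    """Hypothetical I/O call; the sleep stands in for network latency."""
    time.sleep(0.2)
    return f"data-{resource_id}"

ids = [1, 2, 3, 4]

# Sequential: each call waits for the previous one (about 4 x 0.2 s).
start = time.perf_counter()
sequential = [fetch(i) for i in ids]
sequential_time = time.perf_counter() - start

# Threaded: the four waits overlap (about 0.2 s in total).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(ids)) as pool:
    overlapped = list(pool.map(fetch, ids))
overlapped_time = time.perf_counter() - start

print(f"sequential {sequential_time:.2f}s, overlapped {overlapped_time:.2f}s")
```

This overlap helps with waiting, not with computation; CPU-bound work needs threads that really run simultaneously on separate cores.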

Getting the business aware and ready to go for parallel?

  • If your application is threaded and running, and you have a short time frame to release the current version, it may be cost-effective to simply ignore the additional cores and processors. Plan to focus on parallelism for the next release of the application.
  • Estimates of the time needed to write parallel code can differ hugely from the actual time taken, which can increase costs and delay shipment. Hence plan for risks and contingencies.
  • Multithreaded programs can be very difficult to debug. If your schedule is short and you’ve already got a single-threaded code base up and running, then it may well be cost-effective to simply ignore the extra CPUs, at least for this version of the application. The decision to stay single-threaded can also greatly simplify the debugging process.
  • An application has both code that must run serially and code that can run in parallel. It is quite possible that a large portion of the application is unavoidably serial. The time that serial portion takes does not shrink, no matter over how many processors you spread the execution of the parallelized code.
  • Be aware of the impact of Amdahl’s Law: the speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program. To understand this better, consider a task that has two parts, A and B, where B takes roughly 25% of the time of the whole computation. You might not gain much by making B parallel: even if B took no time at all, the whole task would run at most 1/0.75 ≈ 1.33 times faster. You need to select carefully the area that is made parallel.
  • Have the budget and an ROI case ready for the investment in the infrastructure and environment needed to go parallel.
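Amdahl’s Law is usually written as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the run time that can be parallelized and N is the number of processors. The short Python sketch below (the helper name `amdahl_speedup` is hypothetical) applies it to the A/B example, so the choice of which part to parallelize can be compared in numbers. It assumes the parallel part scales perfectly with no overhead, so real results will be lower.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on overall speedup: S(N) = 1 / ((1 - p) + p / N).

    Assumes the parallel part scales perfectly and ignores any overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Part B is 25% of the run time: parallelizing it helps very little.
for n in (2, 4, 8, 16):
    print(f"B on {n:2d} processors: {amdahl_speedup(0.25, n):.3f}x")
# prints 1.143x, 1.231x, 1.280x, 1.306x

# Part A is the other 75%: a much better candidate.
print(f"A on  8 processors: {amdahl_speedup(0.75, 8):.3f}x")  # 2.909x
```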

Getting the environment ready to go for parallel?

The development environment needs tools to write code and to perform testing. Many existing languages, such as C#, Java and C++, are adding parallel capabilities to help developers. Testing on dual-core machines is possible, but should I test only on dual-core? The business would like the same application to be tested on 4-core, 8-core and 16-core machines. What would it cost to buy those machines? Will the product company or the deploying customer bear the cost? I have collected some links to do more research and to train the folks at the place I work, and I am sharing them here.