Monday, 31 May 2010

Yellow Light of Death and reflow

If you are one of the unlucky sods with a PS3 suffering the infamous YLOD syndrome, you might have heard of the Gilksy reflow fix. You might even have tried it, and actually brought your PS3 back from the dead, but chances are it's living on borrowed time. Read on to learn why.

In an earlier post, I described the infamous YLOD syndrome of the PS3. To recapitulate: Sony uses lead-free solder balls as per the RoHS directive. Lead-free solder has a higher melting point and is more sensitive to thermal stress. The RSX and the CBE chips generate a lot of heat, especially in their 90nm incarnations. The PS3 cooling solution leaves something to be desired. The PCB warps due to the heat, which can cause the solder balls of the BGAs to lose connection. This manifests as the Yellow Light of Death.

Now, the Gilksy guide advises using a heat gun to "reflow" the BGAs. This might sound good in theory and it might even work in practice, at least for a while. But you might be aware that this kind of fix is known to be temporary, and it's quite likely that the YLOD will make a reappearance before long. How come?

The sad answer is that the Gilksy guide most likely never really achieves a proper reflow. What's even sadder is that with each "reflow", the problem will become even more severe. Why?

It's because heating the solder means heating the BGA as well. In fact, the heat has to travel through the chip in order to reach the solder. On top of that, in these kinds of guides the internal heat spreader of the chip is usually still mounted, which puts even more thermal resistance in the way. And since lead-free solder has a higher melting point, all that might really happen during this kind of "reflow" is that the PCB warps and connections that were broken come together slightly. That is enough to make contact, but the contact is very fragile. When the console warms up again and the PCB starts to warp, the connections are prone to break once more.
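To make the thermal resistance argument a bit more concrete, here is a minimal sketch that treats the stack between the heat gun and the solder as thermal resistances in series, under a simple steady-state assumption. Every number in it (the heat flow, the per-layer resistances, the target temperature) is made up for illustration and not measured on a PS3; only the shape of the calculation matters.

```python
def surface_temp_needed(solder_target_c, heat_flow_w, layer_resistances):
    """Top-surface temperature needed for the solder to reach solder_target_c.

    Steady-state, one-dimensional model: the temperature drop across the
    stack is heat flow (W) times the sum of the layer resistances (K/W).
    """
    return solder_target_c + heat_flow_w * sum(layer_resistances.values())


# Hypothetical values, for illustration only.
SOLDER_TARGET_C = 220.0   # assumed temperature the solder has to reach
HEAT_FLOW_W = 20.0        # assumed heat flowing down through the package

bare_package = {"die_and_substrate": 0.5}
with_ihs = {"ihs": 0.2, "thermal_paste": 0.3, "die_and_substrate": 0.5}

print(surface_temp_needed(SOLDER_TARGET_C, HEAT_FLOW_W, bare_package))
print(surface_temp_needed(SOLDER_TARGET_C, HEAT_FLOW_W, with_ihs))
```

Whatever the real values are, each extra layer adds its own temperature drop, so the top of the package has to get hotter for the solder underneath to reach the same temperature.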

Here is the really bad part though: solder that goes through a heat cycle will actually end up with an even higher melting point! This means that if the first fix didn't properly achieve a reflow, the second attempt will be even less likely to work! And the third less likely still. And so on. In fact, solder will deteriorate through each cycle until it needs to be replaced. Without flux this is even more severe!

What's really bad is that the higher the melting point of the solder, the hotter the BGA chip itself has to get in order for the heat to reach the solder. These chips will be damaged if the temperature becomes too high! In fact, manufacturers usually specify a heat profile that reflow ovens can be programmed to follow. There is a reason for this profile...
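To give an idea of what such a profile looks like, here is a minimal sketch that stores one as a list of (time, temperature) setpoints and interpolates between them. The setpoints are illustrative values I made up, not taken from any datasheet, and the 217 °C threshold is an assumed liquidus in the range of common tin-silver-copper lead-free alloys. A real profile has to come from the chip and solder manufacturers.

```python
# Illustrative setpoints only: (seconds, degrees C). NOT from a datasheet.
PROFILE = [
    (0, 25),     # start at room temperature
    (90, 150),   # preheat ramp
    (180, 180),  # soak, lets the board and chip even out
    (240, 245),  # ramp to peak
    (260, 245),  # short hold at peak
    (330, 100),  # controlled cooling
]


def target_temp(t, profile=PROFILE):
    """Linearly interpolate the target temperature at time t (seconds)."""
    if t <= profile[0][0]:
        return profile[0][1]
    for (t0, c0), (t1, c1) in zip(profile, profile[1:]):
        if t0 <= t <= t1:
            return c0 + (c1 - c0) * (t - t0) / (t1 - t0)
    return profile[-1][1]  # past the end: stay at the last setpoint


def max_ramp_rate(profile=PROFILE):
    """Steepest heating or cooling rate in the profile, in degrees C per second."""
    return max(abs(c1 - c0) / (t1 - t0)
               for (t0, c0), (t1, c1) in zip(profile, profile[1:]))


def time_above(threshold_c, profile=PROFILE, step=1):
    """Approximate seconds spent at or above threshold_c, sampled every `step` s."""
    end = profile[-1][0]
    return sum(step for t in range(0, end + 1, step)
               if target_temp(t, profile) >= threshold_c)


print(max_ramp_rate())   # steepest segment, degrees C per second
print(time_above(217))   # seconds at or above the assumed liquidus
```

The point is that a profile constrains how fast the temperature changes and how long the part sits above the melting point, not just the peak temperature. Waving a heat gun over the chip controls none of these.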

What can be done about this sad state of affairs?

Well, the internal heatspreader (IHS) on the RSX is very easy to remove. I think it's quite important to say that any "reflow" should NOT be attempted with the IHS still attached to the RSX!

There is actually a second reason for this, in addition to the thermal resistance: the RAM chips underneath the IHS are epoxied to it, but the RSX die only has thermal paste. Applying heat to the IHS, in order to heat the BGA, in order to heat the solder, will dry out the thermal paste between the RSX die and the IHS!

The proper way to fix the YLOD is to actually remove the BGA, reball it with fresh solder (preferably leaded) and then reflow it (preferably according to the proper temperature curve). This is far from trivial.

If the solder has been damaged by repeated high-temperature cycles, excessive temperatures might be required to liquefy it so that the chip can be removed. There is also a risk of pulling pads off the PCB when trying to remove the BGA. With the proper tools, reballing the chip is not the really difficult part, but putting it back on the board properly is no picnic! There are so many solder balls that have to be precisely aligned. There is a difference between a small, low ball count BGA (<100 solder balls) and these fat chips with 1320 balls!

I will post more information about this later. I'm still experimenting.
