WIP - GPU Repair Notes ~ 20220324

Post is modified on an ongoing basis, last updated 202508, images TBC

March 2020. The onset of the coronavirus caused a sudden surge in demand for computer graphics cards as people were confined to their homes and turned to their computers for entertainment. While most were locked out of the new GPU market, with prices rising more than 200%, a few in the repair community saw a chance to create some value out of junk on the used market. From here began my long journey down the electronics repair rabbit hole.

PCIe Knowledge Check

Standard PCIe Supply Voltages

On most PCIe cards, regardless of their peripheral type, a series of pins on the PCIe gold finger supply two voltages, 12V and 3.3V, directly from the PC (ATX or otherwise) power supply.

High power peripherals generally receive the majority of their 12V power via the PCIe 6-Pin or 8-Pin connector, but there are other defined standards for additional power:

PCIe Lanes

PCIe uses 1 to 16 lanes, initialized from the leftmost side of the slot to the rightmost. Bifurcation and lane reversal are another rabbit hole which will need a whole new article to cover.

GPU Power Up Basics

On board voltage rails

On standard PCI Express based graphics cards, the modern GPU die is rather complex and needs multiple voltage rails. The on board rails power up sequentially: each one is enabled only after it receives a signal from the rails before it.

We established above that there are only 2 voltages coming into the GPU: 12V and 3.3V. From these, auxiliary voltage rails are generally generated as below

For critical GPU voltage rails we have:

For the purposes of repair, the control circuit and MOSFETs for a power phase will have:

Obviously for each type of chip this is not always the case and there may be more steps to the power up process.

On Maxwell GPUs, for example, the PGOOD signal from the memory rail feeds the enable pin of the NVVDD PWM controller.
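To make the sequencing idea concrete, here is a minimal sketch of a power-up chain in Python. The rail name FBVDD for the memory rail and the exact dependencies are assumptions for illustration only; real boards differ, and only the memory PGOOD to NVVDD enable link comes from the Maxwell example above.

```python
# Hypothetical dependency table: each rail lists the rails whose PGOOD
# signal gates its enable pin. Names and wiring are illustrative only.
ENABLE_DEPENDS_ON = {
    "12V": [],
    "3.3V": [],
    "FBVDD": ["12V", "3.3V"],  # memory rail (assumed name and inputs)
    "NVVDD": ["FBVDD"],        # core rail, enabled by memory PGOOD
}


def power_up(failed_rails=()):
    """Return the rails that come up, in order, when some rails are dead."""
    up = []
    progress = True
    while progress:
        progress = False
        for rail, deps in ENABLE_DEPENDS_ON.items():
            if rail in up or rail in failed_rails:
                continue
            # A rail only starts once every rail it depends on is up.
            if all(dep in up for dep in deps):
                up.append(rail)
                progress = True
    return up


def first_missing(failed_rails=()):
    """Name the earliest rail in the table that never came up, or None."""
    up = power_up(failed_rails)
    for rail in ENABLE_DEPENDS_ON:
        if rail not in up:
            return rail
    return None


print(power_up())                    # healthy board
print(first_missing({"FBVDD"}))      # dead memory rail also keeps NVVDD down
```

The point for repair is that a missing core voltage does not always mean a core VRM fault: the first rail missing in the chain is the one to chase.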

Resistance measurements

All voltage rails mentioned above should generally have a resistance above 50 ohms EXCEPT for NVVDD, whose resistance is so low that a multimeter cannot give a useful reading.

Notes from typical resistance measurements I’ve seen:

Since core resistance is tiny, it’s much more useful to check for a short between 12V and the NVVDD VRM, which we will cover below.
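As a rough sketch, the 50 ohm rule of thumb can be written as a small check. The function name and the result labels are made up for illustration; the only inputs taken from the text are the 50 ohm threshold and the NVVDD exception.

```python
MIN_HEALTHY_OHMS = 50.0  # rule of thumb from the notes above, not a spec value


def classify_rail(name, ohms_to_ground):
    """Classify a resistance-to-ground reading taken with the card unpowered."""
    if name == "NVVDD":
        # Core resistance is normally tiny, so a low reading proves nothing.
        return "inconclusive"
    if ohms_to_ground > MIN_HEALTHY_OHMS:
        return "ok"
    return "suspect short"


# Example readings (hypothetical values)
for rail, ohms in [("12V", 0.2), ("3.3V", 300.0), ("NVVDD", 0.4)]:
    print(rail, ohms, classify_rail(rail, ohms))
```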

Core and Frame Buffer Voltage Generation

GPU cores run at a low voltage, so reaching the power targets of high intensity workloads like gaming takes a large amount of current: anywhere from 50-100A at around 1V. By Ohm’s law, that is also why core resistance is so tiny. To supply such current we need optimized VRMs (voltage regulator modules), usually of 1 or more phases.
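The arithmetic behind those numbers is short enough to sketch. The 50W and 100W power figures are simply what 50A and 100A at 1V work out to, and the "effective resistance" is a rough equivalent for a running core, not what a multimeter reads on an unpowered board.

```python
def core_current_amps(power_watts, core_volts):
    """Current drawn by the core: I = P / V."""
    return power_watts / core_volts


def effective_resistance_ohms(core_volts, current_amps):
    """Equivalent resistance of the running core by Ohm's law: R = V / I."""
    return core_volts / current_amps


for watts in (50, 100):
    amps = core_current_amps(watts, core_volts=1.0)
    milliohms = effective_resistance_ohms(1.0, amps) * 1000
    print(f"{watts}W at 1.0V -> {amps:.0f}A, about {milliohms:.0f} milliohms")
```

An equivalent resistance in the tens of milliohms is in the same range as probe and lead resistance, which is why a plain multimeter is not much help on the core rail.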

Per Phase Design

A single phase in a VRM is basically a circuit which takes in 12V and switches it on and off rapidly, creating pulses of 12V. With the switch on for roughly one tenth of each cycle (a duty cycle of about 10%), an inductor smooths the pulses out, essentially averaging the signal into a 1.2V supply. Of course, a single phase is not going to give a voltage stable enough for complex integrated circuits, so more phases bring stability and higher current capability.

The power transistors that switch the input voltage on and off rapidly are MOSFETs in most VRMs.
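The averaging described above follows the ideal buck converter relation, output voltage = duty cycle x input voltage. Here is a minimal sketch assuming ideal, lossless components; real phases run a slightly higher duty cycle to cover losses, and the function names are made up for illustration.

```python
def buck_output_volts(input_volts, duty_cycle):
    """Ideal buck converter: the output is the input scaled by the on-time fraction."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1")
    return input_volts * duty_cycle


def duty_cycle_for(input_volts, output_volts):
    """On-time fraction needed for a target output voltage (ideal case)."""
    return output_volts / input_volts


print(f"{buck_output_volts(12.0, 0.10):.2f} V out at a 10% duty cycle")
print(f"{duty_cycle_for(12.0, 1.0):.1%} duty cycle for a 1.0 V core")
```

So a 12V input switched on for about a tenth of the time gives roughly 1.2V, and the PWM controller adjusts the core voltage by nudging that fraction up or down.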

For the switching segment of a VRM phase, the essential elements are:

In modern GPUs with more than 10 phases, you will often see all 3 components (the driver plus the high and low side MOSFETs) combined into a DrMOS, or driver + MOSFET combo. It is basically a power IC that does the job of everything above while being more efficient and taking up less space.

A PWM controller is used to control the output voltage to the GPU.

Let’s use a common example of a core VRM with 12V input.

  1. The high side ON, low side OFF. The current flows from 12V -> High Side MOSFET -> Inductor -> Capacitor (often) -> CORE. The inductor resists sudden changes in current, so the output rises slowly.
  2. The high side OFF, low side ON. The phase node is pulled to GND through the low side MOSFET, and the output falls off slowly as the inductor keeps current flowing into the core.

The PWM controller controls the high and low side operation by sending EN signals to each MOSFET driver, which switches its MOSFETs on or off.
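The two switching steps can be sketched as a tiny state model. This assumes ideal MOSFETs and ignores dead time and the inductor itself; `PhaseState`, `phase_state` and `one_period` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseState:
    high_side_on: bool
    low_side_on: bool

    def switch_node_volts(self, input_volts=12.0):
        """Voltage at the node between the two MOSFETs (ideal switches)."""
        if self.high_side_on and self.low_side_on:
            # Both on at once shorts the input straight to ground.
            raise ValueError("shoot-through: input shorted to GND")
        if self.high_side_on:
            return input_volts   # step 1: node connected to 12V
        if self.low_side_on:
            return 0.0           # step 2: node pulled to GND
        return None              # both off: node is floating


def phase_state(pwm_high):
    """Driver behaviour: the low side is always the inverse of the high side."""
    return PhaseState(high_side_on=pwm_high, low_side_on=not pwm_high)


def one_period(duty_cycle, steps=10):
    """Switch-node voltage at each time step of one switching period."""
    on_steps = round(duty_cycle * steps)
    return [phase_state(i < on_steps).switch_node_volts() for i in range(steps)]


print(one_period(0.1))
```

The shoot-through case is the one that matters for repair: a MOSFET that fails shorted behaves as if it were permanently on, so the next time its partner switches on there is a direct path from 12V to ground.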

Troubleshooting

With the above information we can now outline the basic steps to troubleshoot a broken GPU.

The Tools

PWR - PC stays off or explodes (unlucky) when GPU installed

This means there is a short on the primary voltage rails (12V or 3.3V). Modern power supplies should refuse to start thanks to over current protection, but if that fails you can blow fuses, so always measure resistances first.

A GPU short can be confirmed with a simple multimeter. There are a few possible outcomes:

  1. Short on the PCIe 6/8 Pin Rail - Almost always a VRM phase on memory or core. Check out this video for an example of how to fix this. Else follow 12V troubleshooting below.
  2. Short on the 12V PCIe finger - On budget GPUs, some of the phases such as memory or core may also come from the 12V finger so follow the steps above as a first try. Else follow 12V troubleshooting below.
  3. Short on the 3.3V Rail - Usually some minor logic IC has some problems but you may be unlucky.

Troubleshooting a short can be easy or hard depending on the tools you have. On a budget, you can wet the board with isopropyl alcohol and inject voltage into the 12V rail at low current from a bench power supply, then watch for the component that heats up and evaporates the liquid. If you are rich, you can use an IR gun and check for hotspots that way. Once the problematic chip is confirmed, attempt a replacement and check that the short is gone.

Notes for a VRM Short

If you have found that the issue is a VRM short, I would highly advise replacing the MOSFET driver and both the high and low side MOSFETs, regardless of which one is acting up. Generally the driver will blow up along with the high side. If you have a board with DrMOS, replace the entire IC.

PWR - No display but the computer has not exploded.

BEFORE ATTEMPTING TO POWER UP ALWAYS MAKE SURE THERE ARE NO SHORT CIRCUITS!!!

At this stage it’s safe to start troubleshooting what exactly is not powering up. With a multimeter in voltage mode, I would recommend probing the inductors of each power phase. The more experience you have, the easier it is to locate the power rails without a schematic.

«To be continued»
