Post is modified on an ongoing basis, last updated 202508, images TBC
March 2020. The onset of the coronavirus pandemic resulted in a sudden demand for computer graphics cards as people were confined to their homes and sought entertainment from their computers. While most were locked out of the new GPU market, with prices rising more than 200%, a few in the repair community saw a chance to create some value out of junk on the used market. From here began my long journey into the electronics repair rabbit hole.
PCIe Knowledge Check
Standard PCIe Supply Voltages
On most PCIe cards, regardless of their peripheral type, a series of pins on the PCIe gold finger supply voltages directly from the PC power supply (ATX or otherwise):
- 12V (PCIe finger)
- 3.3V
High power peripherals generally receive the majority of their 12V power via PCIe 6-Pin or 8-Pin connectors, but there are other defined standards for additional power:
- PCIe 6-Pin: ~75W (as the middle pin is not guaranteed to provide power)
- PCIe 8-Pin: ~160-252W (the PCIe specification itself rates this connector at 150W). Note that the last 2 pins for the extension are sense pins to indicate that power can actually be drawn from the middle 12V wire
- EPS 12V: ~336W, often found on datacenter GPUs (DO NOT CONFUSE THIS WITH the PCIe 8-PIN!!)
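The upper figures above are consistent with simple per-pin arithmetic. Here is a minimal sketch, assuming a 7A rating per 12V pin. That rating is my assumption, picked because it reproduces the 252W and 336W figures; real terminals and wire gauges vary, and the function and constant names are made up for illustration.

```python
# Rough connector capacity from the number of 12V pins.
# AMPS_PER_PIN is an assumed terminal rating, not a value from any specification.
RAIL_VOLTS = 12.0
AMPS_PER_PIN = 7.0

def connector_capacity_w(pins_12v, amps_per_pin=AMPS_PER_PIN, volts=RAIL_VOLTS):
    """Physical capacity estimate: every 12V pin carries amps_per_pin."""
    return pins_12v * amps_per_pin * volts

# Number of 12V pins on each connector type.
CONNECTORS = {
    "PCIe 8-Pin": 3,
    "EPS 12V": 4,
}

for name, pins in CONNECTORS.items():
    print(f"{name}: ~{connector_capacity_w(pins):.0f}W")
```

This is only what the pins could carry under that assumption, which is why it lands well above the official ratings.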
PCIe Lanes
PCIe uses 1 to 16 lanes, initialized from the leftmost side of the slot to the rightmost. Bifurcation and lane reversal are another rabbit hole that will need a whole new article to cover.
GPU Power Up Basics
On board voltage rails
On standard PCI Express based graphics cards, the modern GPU die is rather complex with multiple voltage rails. On board rails power up sequentially and only after signals from the other voltage rails are received.
We established above that there are only 2 voltages coming into the GPU: 12V and 3.3V. From these, the auxiliary voltage rails are generally generated as below:
- 1.8V (through 3.3V rail)
- 5V (through the 12V rail)
For critical GPU voltage rails we have:
- PCIe Rail (PEXVDD): 1.2V
- Memory Voltage (Frame Buffer FBVDD): ~1.5V
- Core Voltage (NVVDD): ~1V
For the purposes of repair, the control circuit and MOSFETs for a power phase will have:
- ENABLE (EN)
- PGOOD (Power Good)
- Input Voltage
- Output Voltage
Obviously for each type of chip this is not always the case and there may be more steps to the power up process.
On Maxwell GPUs for example, the PGOOD signal from the memory rail would be supplied into the enable for the NVVDD PWM Controller.
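Because each rail's PGOOD feeds the next rail's enable, the first rail that is missing usually points at the fault, since nothing after it ever gets switched on. Below is a minimal sketch of that idea. The rail order, the nominal voltages, the 20% tolerance and the function name are all illustrative assumptions, not a sequence taken from any particular card.

```python
# Hypothetical power-up order: each rail's PGOOD enables the next rail.
# Real cards differ, so adjust the order and nominal voltages per board.
SEQUENCE = [
    ("3.3V", 3.3),
    ("5V", 5.0),
    ("1.8V", 1.8),
    ("PEXVDD", 1.2),
    ("FBVDD", 1.5),
    ("NVVDD", 1.0),
]

def first_missing_rail(measured, sequence=SEQUENCE, tolerance=0.2):
    """Return the first rail whose measured voltage is off by more than
    `tolerance` (a fraction of nominal), or None if every rail is up."""
    for name, nominal in sequence:
        volts = measured.get(name, 0.0)  # an unmeasured rail counts as 0V
        if abs(volts - nominal) > tolerance * nominal:
            return name
    return None

# Example: the memory rail is dead, so the core rail never gets its enable.
readings = {"3.3V": 3.31, "5V": 5.02, "1.8V": 1.79,
            "PEXVDD": 1.19, "FBVDD": 0.0, "NVVDD": 0.0}
print(first_missing_rail(readings))
```

In this example both FBVDD and NVVDD read 0V, but only FBVDD is reported, which is the rail worth probing first.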
Resistance measurements
All voltage rails mentioned above should generally read above 50 ohms to ground (FBVDD can sit a little lower on some cards), EXCEPT for NVVDD, whose resistance is so low that a multimeter does not give a useful reading.
Notes from typical resistance measurements I’ve seen:
- 12V: 1-3 kohms
- 3.3V: 1 kohm
- 5V: >1 kohm
- PEXVDD: 100 ohms
- FBVDD: 30-100 ohms
- NVVDD: 0-1 ohms
Since core resistance is tiny, it’s much more useful to check for a short between the 12V input and the NVVDD VRM, which we will cover below.
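The typical readings above can be turned into a quick sanity check. This is a rough sketch: using half of the lowest typical value as the "suspect a short" line is my own simplification, the names are hypothetical, and healthy values vary from card to card.

```python
# Lowest typical resistance-to-ground per rail from the notes above, in ohms.
TYPICAL_MIN_OHMS = {
    "12V": 1000.0,
    "3.3V": 1000.0,
    "5V": 1000.0,
    "PEXVDD": 100.0,
    "FBVDD": 30.0,
}

def suspect_rails(readings_ohms, margin=0.5):
    """Return the rails reading below `margin` x their typical minimum.
    NVVDD is not in the table because a healthy core already reads ~0 ohms."""
    flagged = []
    for rail, ohms in readings_ohms.items():
        minimum = TYPICAL_MIN_OHMS.get(rail)
        if minimum is not None and ohms < margin * minimum:
            flagged.append(rail)
    return flagged

readings = {"12V": 0.4, "3.3V": 980.0, "PEXVDD": 105.0,
            "FBVDD": 45.0, "NVVDD": 0.6}
print(suspect_rails(readings))
```

Here only the 12V rail is flagged; the 0.6 ohm NVVDD reading is ignored on purpose.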
Core and Frame Buffer Voltage Generation
GPU cores run at around 1V and present a very low resistance, so by Ohm’s law a large current flows through them, and that current is what delivers the power (P = V x I) needed for high intensity workloads like gaming. This means currents could be anywhere from 50-100A at 1V through the core. To supply such current, we need optimized VRMs (voltage regulator modules), usually of 1 or more phases.
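The 50-100A range follows from dividing power by voltage: at 1V, every watt of core power costs one amp. A small sketch of the arithmetic; the 4 phase split is an arbitrary number for illustration and assumes the phases share current evenly.

```python
def core_current_a(power_w, vcore=1.0):
    """Current the VRM must deliver for a given core power: I = P / V."""
    return power_w / vcore

def current_per_phase_a(power_w, phases, vcore=1.0):
    """Ideal even split of the core current across the VRM phases."""
    return core_current_a(power_w, vcore) / phases

for watts in (50, 100):
    total = core_current_a(watts)
    per_phase = current_per_phase_a(watts, phases=4)
    print(f"{watts}W at 1.0V: {total:.0f}A total, {per_phase:.1f}A per phase")
```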
Per Phase Design
A single phase in a VRM is basically a circuit which takes in 12V and switches it on and off rapidly (usually at around a 10% duty cycle, i.e. on for one tenth of each period) to create pulses of 12V. These are then smoothed out through an inductor, which essentially averages the pulses into a 1.2V supply. Of course a single phase is not going to give a voltage stable enough for complex integrated circuits, so more phases bring stability and higher current capability.
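The averaging can be written as V_out = D x V_in, where D is the fraction of each period the 12V pulse is on. A minimal sketch of that ideal relation, ignoring all losses; the function names are mine.

```python
def ideal_duty_cycle(v_out, v_in=12.0):
    """Ideal buck converter relation: V_out = D * V_in, so D = V_out / V_in."""
    return v_out / v_in

def average_of_pwm(v_in, duty, samples=1000):
    """Average one period of a square wave that sits at v_in for `duty`
    of the period and at 0V for the rest."""
    high = round(samples * duty)
    waveform = [v_in] * high + [0.0] * (samples - high)
    return sum(waveform) / samples

duty = ideal_duty_cycle(1.2)
print(f"duty cycle for 1.2V from 12V: {duty:.1%}")
print(f"average of the 12V pulses: {average_of_pwm(12.0, duty):.2f}V")
```

Raising or lowering the duty cycle is how the output voltage is adjusted without changing the 12V input.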
Dedicated power transistors are used to switch the input voltage on or off rapidly and are known as MOSFETs in most VRMs.
For the switching segment of a VRM phase, the essential elements are:
- 1 high side MOSFET for 12V to output
- 1 (or more) low side MOSFET for output to ground
- 1 MOSFET driver which can connect and switch both the high and low side MOSFETs with a PWM signal input
In modern GPUs with more than 10 phases, you will often see all 3 components combined into a DrMOS, or driver + MOSFET combo. It is basically a power IC that does the job of everything above, is more efficient and uses less space.
A PWM controller is used to control the output voltage to the GPU.
Let’s use a common example of a core VRM with 12V input.
- The high side ON, low side OFF. The current flows from 12V -> High Side MOSFET -> Inductor -> Capacitor (often) -> CORE. The output rises gradually because the inductor limits how fast the current can ramp up.
- The high side OFF, low side ON. The phase is pulled to GND through the low side MOSFET. The inductor keeps current flowing into the core while that current, and with it the voltage on the phase, ramps back down.
The PWM controller controls the high and low side operation by sending a PWM signal to each MOSFET driver, which switches the MOSFETs on or off.
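The two switching states can be seen in a small time-step simulation of one ideal phase. This is a sketch under assumptions: the component values (1uH inductor, 1000uF of output capacitance, a 0.1 ohm load), the 500kHz switching frequency and all names are illustrative picks rather than values from a real card, and the MOSFETs are treated as perfect switches.

```python
def simulate_phase(v_in=12.0, duty=0.1, f_sw=500e3, inductance=1e-6,
                   capacitance=1000e-6, load_ohms=0.1, run_time=3e-3,
                   steps_per_period=100):
    """Simulate one ideal synchronous buck phase and return the average
    output voltage over the final third of the run."""
    dt = 1.0 / (f_sw * steps_per_period)
    total_steps = round(run_time * f_sw) * steps_per_period
    on_steps = round(duty * steps_per_period)
    tail_start = total_steps * 2 // 3  # skip the start-up transient

    i_l = 0.0    # inductor current in amps
    v_out = 0.0  # output capacitor voltage in volts
    tail_sum = 0.0
    for step in range(total_steps):
        high_side_on = (step % steps_per_period) < on_steps
        # High side ON: the switch node sits at 12V.
        # Low side ON: the switch node is pulled to GND.
        v_switch = v_in if high_side_on else 0.0
        # Inductor current ramps up or down with the voltage across it.
        i_l += dt * (v_switch - v_out) / inductance
        # The capacitor absorbs whatever current the load does not take.
        v_out += dt * (i_l - v_out / load_ohms) / capacitance
        if step >= tail_start:
            tail_sum += v_out
    return tail_sum / (total_steps - tail_start)

print(f"average output: {simulate_phase():.3f}V")
```

With a 10% duty cycle the output settles near one tenth of the input, and changing `duty` moves the output in proportion, which is how the controller regulates the rail.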
Troubleshooting
With the above information we can now outline the basic steps to troubleshoot a broken GPU.
The Tools
- Multimeter (preferably two). One that has a good continuity mode, and can measure resistance and voltage fast.
- Soldering iron with flat tip and point tip.
- Preheater or proper BGA rework station.
- Hot air gun.
- DC Bench power supply that can adjust voltage down below 1V
PWR - PC stays off or explodes (unlucky) when GPU installed
This means there is a short on one of the primary voltage rails (12V or 3.3V). Modern power supplies should refuse to start thanks to over current protection, but if the protection does not trip you can blow fuses, so always measure resistances first.
A GPU short can be confirmed with a simple multimeter. There are a few outcomes:
- Short on the PCIe 6/8 Pin Rail - Almost always a VRM phase on memory or core. Check out this video for an example of how to fix this. Else follow 12V troubleshooting below.
- Short on the 12V PCIe finger - On budget GPUs, some of the phases such as memory or core may also come from the 12V finger so follow the steps above as a first try. Else follow 12V troubleshooting below.
- Short on the 3.3V Rail - Usually some minor logic IC has some problems but you may be unlucky.
Troubleshooting a short can be easy or hard depending on the tools you have. On a budget, you can wet the board with isopropyl alcohol and inject voltage into the 12V rail from a bench power supply set to a low current limit, then watch where the liquid evaporates to see what is heating up. If you are rich, you can use an IR gun and check for hotspots that way. Once the problematic chip is confirmed, attempt a replacement and check that the short is gone.
Notes for a VRM Short
If you have found that the issue is a VRM short, then I would highly advise replacing both the MOSFET driver and the high + low side MOSFETs, regardless of which one is acting up. Generally the driver will blow up along with the high side. If you have a board with DrMOS then replace the entire IC.
PWR - No display but the computer has not exploded.
BEFORE ATTEMPTING TO POWER UP ALWAYS MAKE SURE THERE ARE NO SHORT CIRCUITS!!!
At this stage it’s safe to start troubleshooting what exactly is not powering up. With the multimeter in voltage mode, I would recommend probing the inductors of each power phase. The more experience you have, the easier it is to locate the power rails without a schematic.
«To be continued»