Epyc Server Build
2023-07-19
It started with a message at work. We have a #homelab channel, and I should know better as it seems like every discussion there results in me getting something.
Well, now I have a new processor. I had not even finished drinking my morning coffee that day, which is probably why I found myself in this position.
The Epyc line of processors is insane for the amount of I/O you get. WikiChip has the details, but having 128 PCIe Gen3 lanes is very cool for a storage-based system: I can have multiple HBAs and a 10Gb NIC if I wanted to.
The only way this was going to be feasible was if I could score some RAM. A friend of mine, whom I worked with at New Relic, was in the data center where New Relic has a sizable amount of hardware. When they decommission hardware, sometimes folks that work there get dibs.
With that care package on the way, I had to find a motherboard. A full-sized ATX board fits in my current server’s case, but it still had to be a standard non-server board.
That was my second eBay purchase, and it wasn’t cheap: a Supermicro workstation board.
Oh, and a CPU cooler. It can’t be just any cheap one either. I love Noctua fans, and they are quiet. So I picked up one of the few EPYC 7000 rated coolers, the NH-U12S TR4-SP3.
By the way, Epyc processors need to be tightened to a very specific 14 inch-pounds (in-lb) with a T20 bit.
Gotta get one of those now too…
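For anyone else hunting for the right tool, the unit conversion is simple arithmetic. Here is a small sketch (the helper names are made up for illustration, not taken from any library) that turns the 14 in-lb figure into foot-pounds and newton-metres, since many drivers are labeled in one of those instead:

```python
# Convert the socket torque figure between common units.
# Helper names are illustrative only.

NM_PER_IN_LBF = 0.1129848  # newton-metres in one inch-pound (force)
IN_PER_FT = 12


def in_lbf_to_nm(in_lbf: float) -> float:
    """Inch-pounds to newton-metres."""
    return in_lbf * NM_PER_IN_LBF


def in_lbf_to_ft_lbf(in_lbf: float) -> float:
    """Inch-pounds to foot-pounds."""
    return in_lbf / IN_PER_FT


spec = 14  # in-lbf, the figure quoted above
print(f"{spec} in-lbf = {in_lbf_to_ft_lbf(spec):.2f} ft-lbf = {in_lbf_to_nm(spec):.2f} Nm")
# prints: 14 in-lbf = 1.17 ft-lbf = 1.58 Nm
```

That is barely more than one foot-pound, which is why a big automotive torque wrench is no help here.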
You know, this sweet “deal” I got has become a costly project. My home server, which is a really old Core 2 Duo dual-core processor with 8GB of DDR2 RAM, was doing alright.
But screw that! I’ve got 128GB now and 32 cores (64 threads with SMT enabled)! The Minecraft server is gonna be amazing!
Truth is, I’ve always wanted to play around with FreeBSD’s hypervisor, bhyve. I’ve stuck with using iocage and jails because I lacked resources, but now I can live a little.
The Build
This was a dicey situation. I have parts that are used. I have no spare components for testing. As it turned out, I also had a bad power supply :/
Around Friday, all of my components had arrived. The perfect weekend project.
And day one was a few failures, one after another.
I went to two different hardware stores to try and find a torque driver. I didn’t want to buy on Amazon; I wanted to buy it from the local hardware store that has old men ambling around dispensing their wisdom about home projects.
While there, I couldn’t find what I was looking for. Just the large torque wrench for working on vehicles.
So I asked an old man, ambling around. The gist of the conversation went like this:
"Hey, do you carry any torque screwdrivers? I'm looking for inches per pound"
"Well we have these here Torx drivers"
"No, torque, you know like a car engine"
"We have this wrench here..."
"No, that's foot pounds, I need something more refined, inch pounds"
"Well I've never heard of that. What's it for?"
"A computer part"
"Well now maybe you should try Best Buy. Or I bet Amazon"
"Yeah... I really wanted to shop local..."
So I went to Home Depot, found one fairly quickly, and added up the total cost of this project.
This Husky torque driver did NOT come with instructions, nor could I find a YouTube clip on how to use it. I guess when you buy stuff like this, the assumption is you know what you are doing, or someone in your trade knows and will maybe show you how to operate it.
I DIDN’T KNOW OKAY! It has buttons with unclear labels! I thought I was setting it to 14 in-lbs, that’s what the display told me it was at! BUT IT NEVER BEEPED OR JIGGLED!
Long story short, an Epyc processor will not POST if you over-tighten the screws. It also will not POST if you under-tighten the screws. What I eventually did was undo it all, take the torque driver out to the garage, put a square bit in, and clamp that in my vise. I was going to figure out how to make it beep, by golly!
When I figured it out, I went inside and PROPERLY applied the correct inch-pounds and ta-da! The power LED light on the board lit up!
Except it still wouldn’t boot. The IPMI interface also was not asking for a DHCP lease.
I gave up for the evening, as I was starting to suspect it was my spare power supply. The system would turn on for a few seconds, and then power off. Kaput.
Look, I’ll be honest… I was kind of shitting bricks here. I had a whole lot of nothing to show for a few hundred dollars in eBay stuff. I wasn’t feeling super great. My plan the next day was to fully take down my server and use the power supply in it, as it’s new and fairly nice. I thankfully never sold a cheap monitor one of the kids was using, and it has 1 VGA port and 1 HDMI port. This was very useful.
I was very excited that the system POST-ed with the better PSU.
From a previous project, when I redid all of my storage, I already had FreeBSD installed on an SSD. This made testing easier.
I ran top while also running make -j 64 buildworld. I no longer upgrade with this method, but it’s a really good way to test your system.
Putting it all back
I was able to get this in the chassis (except the lid… my heatsink is too tall), and plug in the IPMI interface and a 1GbE interface.
The reason the OOB interface never got a DHCP lease was that this system had a static IP assigned. Whoever owned this board last did not clear anything out. Oh, the default password was there, but the IPMI IP address was set, and the event log was not cleared. Oh, the secrets you can tell…
At least I know how long it was in service.
System came up just fine, and it looks like it’s all there.
FreeBSD 13.2-RELEASE-p1 GENERIC amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(vga): resolution 640x480
CPU: AMD EPYC 7551 32-Core Processor (2000.21-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f12 Family=0x17 Model=0x1 Stepping=2
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
AMD Extended Feature Extensions ID EBX=0x1007<CLZERO,IRPerf,XSaveErPtr,IBPB>
SVM: NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
TSC: P-state invariant, performance statistics
real memory = 137438953472 (131072 MB)
avail memory = 133682778112 (127489 MB)
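The two memory lines are byte counts with a megabyte figure in parentheses. A quick sanity check of that arithmetic, as a sketch with made-up variable names, suggests the full 128 GiB is being seen:

```python
# Sanity-check the dmesg memory figures (values copied from the boot log above).
MIB = 1024 ** 2
GIB = 1024 ** 3

real_bytes = 137438953472
avail_bytes = 133682778112

real_mib = real_bytes // MIB    # matches the 131072 MB dmesg reports
avail_mib = avail_bytes // MIB  # matches the 127489 MB dmesg reports
real_gib = real_bytes / GIB

# Gap between what is installed and what the OS reports as available at boot.
gap_mib = real_mib - avail_mib

print(real_mib, avail_mib, real_gib, gap_mib)
# prints: 131072 127489 128.0 3583
```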
Oh my gosh, but wait: it turns out this Supermicro board has Redfish!
Redfish API
I wrote a good amount of Ruby code at New Relic to control Dell servers remotely over the Redfish API, so in a future post maybe I’ll do another Golang article.
Redfish is a RESTful API to control and automate a physical system (no OS) using the out-of-band management interface. Dell has iDRAC, HP has iLO and Supermicro has … I don’t know, they don’t seem to have a name besides OOB or IPMI.
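In practice that means plain HTTPS GETs with basic auth against paths under /redfish/v1. Here is a minimal Python sketch using only the standard library. The function names are hypothetical, the host and credentials in the comment are placeholders, and the `insecure` flag mirrors `curl -k` for a BMC that still has a self-signed certificate:

```python
import base64
import json
import ssl
import urllib.request


def build_request(host: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for a Redfish resource using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"https://{host}{path}")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Accept", "application/json")
    return req


def redfish_get(host: str, path: str, user: str, password: str, insecure: bool = False) -> dict:
    """Fetch one Redfish resource and return the parsed JSON body."""
    ctx = ssl.create_default_context()
    if insecure:
        # Same effect as `curl -k`: skip certificate verification.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = build_request(host, path, user, password)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)


# Example call (not run here; host and credentials are placeholders):
# system = redfish_get("bmc.example.net", "/redfish/v1/Systems/1", "user", "pass", insecure=True)
# print(system["BiosVersion"])
```

Keeping the request builder separate from the network call makes it easy to check the URL and headers without touching a real BMC.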
Until I can write up something more concrete, this is what it looks like:
CPU:
// curl -k -u user:pass https://server-ipmi.home.michaelc.dev/redfish/v1/Systems/1/Processors/1
{
"@odata.context": "/redfish/v1/$metadata#Processor.Processor",
"@odata.type": "#Processor.v1_0_0.Processor",
"@odata.id": "/redfish/v1/Systems/1/Processors/1",
"Id": "1",
"Name": "Processor1",
"Description": "Processor",
"Socket": "CPU",
"Manufacturer": "Advanced Micro Devices, Inc.",
"Model": "AMD EPYC 7551 32-Core Processor ",
"MaxSpeedMHz": 3000,
"TotalCores": 32,
"TotalThreads": 64,
"ProcessorType": "CPU",
"ProcessorArchitecture": "x86",
"InstructionSet": "x86-64",
"ProcessorId": {
"VendorId": "GenuineIntel",
"IdentificationRegisters": "0x178BFBFF00800F12",
"EffectiveFamily": "0x17",
"EffectiveModel": "0x1",
"Step": "0x2"
},
"Status": {
"State": "Enabled",
"Health": "OK"
}
}
Memory:
// curl -k -u user:pass https://server-ipmi.home.michaelc.dev/redfish/v1/Systems/1/Memory
{
"@odata.context": "/redfish/v1/$metadata#MemoryCollection.MemoryCollection",
"@odata.type": "#MemoryCollection.MemoryCollection",
"@odata.id": "/redfish/v1/Systems/1/Memory",
"Name": "Memory Collection",
"Description": "Memory Collection",
"Members@odata.count": 8,
"Members": [
{
"@odata.id": "/redfish/v1/Systems/1/Memory/1"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/2"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/3"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/4"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/5"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/6"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/7"
},
{
"@odata.id": "/redfish/v1/Systems/1/Memory/8"
}
]
}
// curl -k -u user:pass https://server-ipmi.home.michaelc.dev/redfish/v1/Systems/1/Memory/1
{
"@odata.context": "/redfish/v1/$metadata#Memory.Memory",
"@odata.type": "#Memory.v1_1_0.Memory",
"@odata.id": "/redfish/v1/Systems/1/Memory/1",
"Id": "1",
"Name": "DIMMA1",
"RankCount": 1,
"Description": "Memory",
"CapacityMiB": 16384,
"DataWidthBits": 64,
"BusWidthBits": 72,
"MemoryMedia": [
"DRAM"
],
"MemoryType": "DRAM",
"MemoryDeviceType": "DDR4",
"OperatingSpeedMhz": 2133,
"AllowedSpeedsMHz": [
2133
],
"DeviceLocator": "DIMMA1",
"MemoryLocation": {
"Socket": 0,
"MemoryController": 0,
"Channel": 0,
"Slot": 0
},
"Manufacturer": "SK Hynix",
"SerialNumber": "108F6333",
"PartNumber": "HMA42GR7MFR4N-TF",
"Status": {
"State": "Enabled",
"Health": "OK"
}
}
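The collection only lists links, so totalling installed memory means following each member's @odata.id and summing CapacityMiB. Here is a sketch of that walk. `fetch` is a hypothetical callable that takes a Redfish path and returns the parsed JSON, and the sample data is a stand-in that assumes all eight DIMMs match DIMMA1:

```python
from typing import Callable, Dict


def total_memory_mib(fetch: Callable[[str], Dict], collection_path: str) -> int:
    """Follow every member link in a Redfish memory collection and sum CapacityMiB."""
    collection = fetch(collection_path)
    total = 0
    for member in collection.get("Members", []):
        dimm = fetch(member["@odata.id"])
        # A slot may report no capacity; treat that as zero.
        total += dimm.get("CapacityMiB") or 0
    return total


# Stand-in for real HTTP calls: eight DIMMs of 16384 MiB each
# (an assumption; only DIMMA1 is shown above).
base = "/redfish/v1/Systems/1/Memory"
fake = {base: {"Members": [{"@odata.id": f"{base}/{i}"} for i in range(1, 9)]}}
for i in range(1, 9):
    fake[f"{base}/{i}"] = {"CapacityMiB": 16384}

print(total_memory_mib(fake.__getitem__, base))  # prints: 131072
```

Passing the fetcher in as a parameter keeps the walk independent of how the HTTP request is made.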
Also, system information is available. Like the BIOS Version:
curl -k -u user:pass https://server-ipmi.home.michaelc.dev/redfish/v1/Systems/1 | jq .BiosVersion
"2.1"
There seems to be a 2.4 update available; I should do that before I poke around any further.
Oh man, but you know what? I hate using curl -k, so let’s get rid of that silly self-signed expired cert. Cue hacking montage music:
> sudo -i
$ pkg install -y cfssl
$ zfs create data/ca.home.michaelc.dev
$ cd /data/ca.home.michaelc.dev
$ cat << EOF > ca-csr.json
{
"CN": "Home Root Certificate Authority",
"key": {
"algo": "ecdsa",
"size": 384
},
"names": [
{
"C": "US",
"L": "WA",
"O": "Home MichaelC Dev Internal",
"ST": "Camas",
"OU": "Infra"
}
],
"ca": {
"expiry": "87600h"
}
}
EOF
$ mkdir ca
$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca/ca
$ mkdir intermediate
$ cat << EOF > intermediate-ca-csr.json
{
"CN": "Home Intermediate CA",
"key": {
"algo": "ecdsa",
"size": 384
},
"names": [
{
"C": "US",
"L": "WA",
"O": "Home MichaelC Dev Internal",
"ST": "Camas",
"OU": "Infra"
}
]
}
EOF
$ cfssl genkey intermediate-ca-csr.json | cfssljson -bare intermediate/intermediate-ca
$ cat << EOF > config.json
{
"signing": {
"default": {
"expiry": "43800h"
},
"profiles": {
"intermediate": {
"usages": [
"cert sign",
"crl sign"
],
"expiry": "70080h",
"ca_constraint": {
"is_ca": true,
"max_path_len": 1
}
},
"server": {
"expiry": "43800h",
"usages": [
"signing",
"digital signature",
"key encipherment",
"server auth"
]
},
"client": {
"expiry": "43800h",
"usages": [
"signing",
"key encipherment",
"client auth"
]
},
"peer": {
"expiry": "43800h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
EOF
$ cfssl sign -ca ca/ca.pem -ca-key ca/ca-key.pem -config config.json -profile intermediate intermediate/intermediate-ca.csr | cfssljson -bare intermediate/intermediate-ca
$ mkdir certificates
$ cat << EOF > certificates/server-ipmi-csr.json
{
"CN": "server-ipmi.home.michaelc.dev",
"key": {
"algo": "rsa",
"size": 2048
},
"hosts": ["server-ipmi.home.michaelc.dev", "192.168.1.86"],
"names": [
{
"C": "US",
"L": "WA",
"O": "Home MichaelC Dev Internal",
"ST": "Camas",
"OU": "Hosts"
}
]
}
EOF
$ cfssl gencert -ca intermediate/intermediate-ca.pem -ca-key intermediate/intermediate-ca-key.pem -config config.json -profile server certificates/server-ipmi-csr.json | cfssljson -bare certificates/server-ipmi
$ mkdir -p /usr/local/etc/ssl/certs/ ; cp ca/ca.pem /usr/local/etc/ssl/certs/
$ cp intermediate/intermediate-ca.pem /usr/local/etc/ssl/certs/
$ certctl rehash
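The expiry values in these configs are hour counts, which are hard to eyeball. Here is a small sketch (the helper name is made up, and it assumes 365-day years) that translates them:

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def expiry_years(expiry: str) -> float:
    """Convert an hour-based expiry string such as "43800h" to years."""
    if not expiry.endswith("h"):
        raise ValueError(f"expected an hour count like '43800h', got {expiry!r}")
    return int(expiry[:-1]) / HOURS_PER_YEAR


for name, expiry in [("root CA", "87600h"), ("intermediate", "70080h"), ("server", "43800h")]:
    print(f"{name}: {expiry} = {expiry_years(expiry):g} years")
# prints:
# root CA: 87600h = 10 years
# intermediate: 70080h = 8 years
# server: 43800h = 5 years
```

So the root outlives the intermediate, and the intermediate outlives the host certificates it signs.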
After installing the cert and key, we can curl without error. I can also use my handy tls-info Go tool that I discussed in my last post:
// echo server-ipmi.home.michaelc.dev | ./tlsExpiry | jq
[
{
"fqdn": "server-ipmi.home.michaelc.dev",
"port": 443,
"dns_names": [
"server-ipmi.home.michaelc.dev"
],
"ip_addrs": [
"192.168.1.86"
],
"sni": "server-ipmi.home.michaelc.dev",
"version": "TLS 1.2",
"issuer": {
"Country": [
"US"
],
"Organization": [
"Home MichaelC Dev Internal"
],
"OrganizationalUnit": [
"Infra"
],
"Locality": [
"WA"
],
"Province": [
"Camas"
],
"StreetAddress": null,
"PostalCode": null,
"SerialNumber": "",
"CommonName": "Home Intermediate CA",
"Names": [
{
"Type": [
2,
5,
4,
6
],
"Value": "US"
},
{
"Type": [
2,
5,
4,
8
],
"Value": "Camas"
},
{
"Type": [
2,
5,
4,
7
],
"Value": "WA"
},
{
"Type": [
2,
5,
4,
10
],
"Value": "Home MichaelC Dev Internal"
},
{
"Type": [
2,
5,
4,
11
],
"Value": "Infra"
},
{
"Type": [
2,
5,
4,
3
],
"Value": "Home Intermediate CA"
}
],
"ExtraNames": null
},
"serial": "7709E19DB4EC6A45505547BA81D4109FD9D1E2A1",
"expiration": "2028-07-17T22:14:00Z",
"err": false,
"message": "Okay"
}
]
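That expiration timestamp is easy to act on in a script. Here is a sketch, with hypothetical helper names, that parses it, works out the days remaining from a given date, and back-computes a rough issue date on the assumption that the 43800h server profile expiry was applied:

```python
from datetime import datetime, timedelta, timezone


def parse_expiration(value: str) -> datetime:
    """Parse a UTC timestamp like "2028-07-17T22:14:00Z"."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def days_remaining(expiration: datetime, now: datetime) -> int:
    """Whole days left before the certificate expires (negative once expired)."""
    return (expiration - now).days


expires = parse_expiration("2028-07-17T22:14:00Z")

# Rough issue time, assuming the 43800h expiry from the server profile.
issued_estimate = expires - timedelta(hours=43800)
print(issued_estimate.date())  # prints: 2023-07-19

# Fixed reference date so the example is reproducible.
print(days_remaining(expires, datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

In a real check you would pass `datetime.now(timezone.utc)` as `now` and alert when the result drops below some threshold.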
Special note: It appears that you cannot use an ECDSA cert with this particular IPMI interface. I had to re-create it as an RSA-2048 request. Total bummer.
Wrap-up
I had a lot of ideas for this post, and then I realized, the build was a good chunk of my time and I didn’t get to scratch the surface with bhyve yet. So, that will come soon.