I wanted to make this a post yelling at a manufacturer, but - while they definitely are on the blacklist for this one - it's also a nice documentation of where i fucked up.
So I'm writing this in the hopes that someone doesn't make the same mistakes i did.
A few folks at $place asked if someone could look over the software of $project. I am using $-notation here cause i do not want to make things harder for them right now.
I jumped on the chance - why not.
And then someone, probably me, fucked up. Things were likely bad already (a mounted Raspberry Pi Zero would not boot reliably, and something was off on the I2C bus that was supposed to talk to an IR camera). Somewhere between us debugging why the Raspberry Pi didn't boot and why the I2C IR cam behaved a bit weird, someone - and let's be honest, it was probably me: I'm a clumsy idiot - semi-fried an STM32. If it actually was me, it was likely while probing and shorting 3.3V to 12V. If it was someone else? Who knows what happened, there was a lot of heavy testing.
So, lesson one: Don't stick your probes into places unprotected. And have people double check your probing before you turn stuff on.
Ok, board is dead, order a new one, right? Well. This board was what we call "backup flight hardware". It's not the board that should go into the thing that flies - probably. But it is the one where you validate all the code of the thing that flies - less of a problem if you fry stuff (huzzah! right?). And it's also the board you replace the actual flight hardware with if anything goes wrong (oh no). And things go wrong all the time, so you 100% need this backup to be there.
This incident occurred with ~1 week of time left in $project before $flight_campaign. So there needs to be a replacement board yesterday.
So, we thought, 1 workday+shipping PCB services exist and our usual board house - Multi-CB - has one! So money was thrown around, board was in the mail on time, and all is well, right?
Well, except that $project is a student project, and it's the middle of the exam phase, and everyone also has a job - you get the idea. The board was ordered fast cause the work happens in the few free hours that are available every evening, by volunteers. They did not, and i don't blame them, see the issues right away.
Lesson two: If you fuck up, make time to help with the clean up.
I should have reviewed the boards immediately when they arrived. The issues described below would have shown up a lot earlier then.
I'm not perfect by any means (see above!), but i do have enough experience that I'd probably have caught on that something was off. Or not. I might have fried it again. Oh well.
Anyway, let's fast forward to yesterday (Thursday) where i read in our chat that (paraphrased): "One board seems to work, but the second has a short somewhere".
All is good, right? One board is still good, right? This is the part where i start complaining about Multi-CB - a board house that is supposed to be quite good. They even pride themselves on a free e-check on everything >= 2 layers!
This photo i took yesterday evening when i realized something was wrong. I didn't have time to look at it further right then and there. This is bad:
This is the one that "works". It probably does. I do not feel good looking at this. The places around the vias that look copper colored? That is just copper. There is none, or barely any, solder mask there. And the mask is very thin in general. I was told that soldering this board (first in the reflow oven, then connectors by hand) was hell. I can see that - one of the jobs of the solder mask is to prevent bridging, and there is barely any in so many places!
A small aside/speculation here: The connection between the layers via vias is probably done by electroplating them. This might result in the copper near the vias being just a bit thicker than everywhere else - and thus the vias not getting any of the already way-too-thin solder mask. Notice also how at the edges of some copper surfaces the mask seems to thin out. This also happens around pads - which is why soldering becomes hell. Anyway, moving on.
So today (Friday) i looked at the boards again, and holy hell. The "it works" board continues to work and let's pray that it continues to do so.
This is what i noticed after cleaning the "it's shorted" board a bit.
There is a closeup of the connector there as well:
That looks odd, right? Let's scratch this away under the microscope!
The big blob? That's copper that is not meant to be there :/ Cutting it like you see here removed one of the shorts - not the big power to ground one, but at least one.
The small one is also copper, and it shorts some signal net to ground. I stopped cutting away there, this board is not something i wanna use.
Free e-check my ass! This board should not have passed visual inspection!
Lesson three is: Don't order from Multi-CB. This is not the first fuck up that happened to us, and not the first one that you can read about either.
We could ask for a refund/rerun with another 1 work day production time. But there is no time left. There now is a board that probably works, just looks a bit shite and needs double and triple checking for any missing connections and shorts.
The reason i went in to probe shit in the first place is because i noticed a pressure sensor didn't reply, and checking its supply showed something like 2.7V. So everything was already fried by this point, right? RIGHT? Probably Copium on my end, who am i kidding.
$People in $project asked for help, i fixed/explained some software stuff successfully (at least that), and then made everything else much much worse for them anyway.
I'm sorry. I was supposed to be the experienced one helping out and likely i fucked up. I'll do better next time.