Giner | donlonisland

Project Engineer - Program Manager

We had the unique privilege of being a subcontractor for a subcontractor for NASA. Despite being that deep in the contracting chain, NASA was often directly interested in our work due to the importance of the devices we were making. I prepared the presentations for, and was the primary presenter at, our OGA Technical Process Review, OGA Critical Design Review II, and AOGA Critical Design Review, which were attended by almost 100 technical experts and even executive-level folks from our customer, their customer, and NASA. These were some of the toughest rooms I have been in to date, and even as one of the youngest people in the room, I managed to command respect through my technical knowledge, well-reasoned arguments, partnership with my teammates, and, most importantly, ability to graciously accept when we did not know something.

During my time at Giner, I served as program manager for the Oxygen Generating Assembly (OGA) and the Advanced Oxygen Generating Assembly (AOGA) programs. The OGA was largely a build-to-spec program while the AOGA was a development process that involved qualifying a new type of device. This was a technically challenging feat for the team we had, and we really had to build the metaphorical fighter jet while flying it (see the above about the QMS). Working on a system like this for service on the bleeding edge of where mankind can go and also having the confidence that we did a good job for the astronauts up there makes me proud. Below, you'll find a few examples of interesting problems I helped solve while on the team.

The OGA Cyclic Operation Test

At Giner, we were producing the Electrolysis Cell Stack, the heart of the Oxygen Generating Assembly (OGA), which creates breathable oxygen for the metabolic loads (astronauts) on the International Space Station (ISS). During a pre-run for our acceptance testing (the final testing before the cell stack is shipped to the customer), we uncovered an insidious incompatibility between our test apparatus and the cell stack when running in the mode required by the test. To fully understand the issue and how we solved it, you first have to understand what the stack does at a high level and how the test stand interfaces with it.

In electrolyzers, water is presented to electrodes and is split into Hydrogen and Oxygen. Water typically flows through the electrolyzer to keep the cells flooded and to assist with the transport of one of the gas streams. In our case, we wanted gaseous oxygen, so the water flowed through the cathode side and carried Hydrogen gas away with it, as seen in the figure to the right. Our test stand was designed to power the electrolyzer, manage the water and gases entering and exiting the stack, and measure pertinent state variables of the stack. These include, but are not limited to:

Cell voltages
True current supplied to the stack
Water flow rate through the stack
Back-pressure on the stack itself

A simplified, redacted diagram showing the relevant test stand components is shown below.

A simplified view of the water feed system in the test stand running the cell stack.

In Liquid Cathode Feed Proton Exchange Membrane (PEM) electrolysis, water enters the cell on the cathode side, travels across the membrane to the anode, and is split there into Oxygen gas, protons (H+), and electrons. The H+ travels back across the membrane to the cathode, where it forms Hydrogen gas. This results in a gaseous Oxygen stream and gaseous Hydrogen in the exit water stream. This image is from Wikipedia, modified to illustrate liquid cathode feed rather than anode feed.
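As a rough illustration of how stack current relates to gas output, here is a minimal sketch based on Faraday's law of electrolysis. It assumes ideal (100%) Faradaic efficiency and a stack of identical cells wired in series; the current and cell count in the usage lines are hypothetical placeholders, not values from the actual OGA stack.

```python
FARADAY = 96485.332  # C/mol, Faraday constant

def gas_rates_mol_per_s(current_a, n_cells):
    """Ideal H2 and O2 molar production rates for a series-connected stack.

    Assumes every electron passed drives water splitting (no losses).
    """
    # In a series stack the same current passes through every cell.
    electrons_mol_per_s = current_a * n_cells / FARADAY
    h2 = electrons_mol_per_s / 2.0  # 2 electrons per H2 molecule
    o2 = electrons_mol_per_s / 4.0  # 4 electrons per O2 molecule
    return h2, o2

# Hypothetical operating point, for illustration only.
h2_rate, o2_rate = gas_rates_mol_per_s(current_a=50.0, n_cells=20)
print(f"H2: {h2_rate:.6f} mol/s, O2: {o2_rate:.6f} mol/s")
```

Under these assumptions the stack always makes twice as many moles of Hydrogen as Oxygen, and both scale linearly with current, which is why a low-current “sleep” mode produces much less gas.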

During the test, the cell stack was supplied with water from the reservoir (the Hydrogen-water separator), gas was produced, and the Hydrogen-saturated water returned to the reservoir along with the gaseous Hydrogen to be separated. This gas built pressure in that part of the system up to the nominal operating pressure of the stack. Once this pressure was attained, the pressure regulator vented excess gas out of the system. In this test, the stack was run at full capacity for a period of time, then put into a “sleep” mode where it produced much less gas. Furthermore, the test stand was equipped with a failsafe that turned the stack off if any of a number of error conditions was detected. In our case, the test stand would often shut itself off after it had been running these cycles for a while. This was a problem because a shutdown would prevent us from passing the test, and if that happened in front of our government witnesses, the program would be delayed and a paperwork nightmare would ensue.

Our experienced cell stack engineers in the lab jumped to conclusions about what they thought was wrong, and rightfully so: they had a solid understanding of how the system worked. I called the room to order, grabbed an empty cardboard box out of the trash (for some reason, this is a theme with me), and started with a single question to the group: what are we actually observing?

We were seeing the test stand shut down. Ok, why was the test stand shutting down? It said it was due to low water flow, and we verified in the data that the sensor did indeed report low water flow. I called attention to the distinction between the sensor reading low water flow, and there actually being low water flow. We pushed through the rest of the exercise in a similar manner (what I call “low-gear problem-solving”) while gathering opinions from the engineers who were far more experienced with these systems than I. Soon we had a sprawling fault-tree on the large, flattened cardboard box. We circled what we agreed were the most probable branches and brainstormed how we could “lay a trap” for the problem. The trap we devised consisted of many things, but I will highlight ruling out the following:

Cavitation on the pump suction side
The pump not receiving the correct speed command
The sensor malfunctioning
Possible gas bubbles forming in parts of the system

After splitting the experiments between people, we got to work validating. After a day or two of experiments, we found that when the stack decreased its electrolysis during the “sleep” part of the cycle, the back pressure in the water separator dropped enough that hydrogen bubbled out of solution in the water line where the flow sensor was. We weren’t actually getting a lower flow than expected; the sensor was reading the wrong flow because it was ultrasonic and not designed to handle two-phase flow.
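The mechanism can be sketched with Henry's law, which says the equilibrium amount of dissolved gas is proportional to the gas pressure above the liquid. This is only a back-of-the-envelope illustration: it assumes the water is fully saturated with pure Hydrogen, room temperature, and equilibrium behavior, and the two pressures in the usage lines are hypothetical, not the real operating pressures. The Henry's law constant is an approximate literature value for Hydrogen in water near 25 °C.

```python
KH_H2 = 7.8e-4       # mol/(L*bar), approximate H2 solubility in water near 25 C
R_L_BAR = 0.083145   # L*bar/(mol*K), ideal gas constant

def dissolved_h2(p_bar):
    """Equilibrium dissolved H2 concentration (mol/L) at a given H2 pressure."""
    return KH_H2 * p_bar

def excess_h2(p_high_bar, p_low_bar):
    """H2 (mol per L of water) that must leave solution when water
    saturated at p_high_bar is exposed to the lower pressure p_low_bar."""
    return max(0.0, dissolved_h2(p_high_bar) - dissolved_h2(p_low_bar))

def bubble_volume_per_litre(p_high_bar, p_low_bar, temp_k=298.15):
    """Ideal-gas volume (L) of the released H2, evaluated at the lower pressure."""
    return excess_h2(p_high_bar, p_low_bar) * R_L_BAR * temp_k / p_low_bar

# Hypothetical pressures: full-power operation vs. "sleep" mode.
released = excess_h2(3.0, 2.0)
volume = bubble_volume_per_litre(3.0, 2.0)
print(f"{released:.2e} mol H2 per L of water, about {volume * 1000:.1f} mL of gas")
```

The point of the sketch is qualitative: any drop in back pressure leaves saturated water holding more Hydrogen than it can keep dissolved, and the difference shows up as bubbles. Holding the pressure constant removes the driving force for bubble formation.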

There were lots of possible solutions, but the one we implemented was supplementing the stack’s gas production with compressed Nitrogen. This kept the stack pressure constant during the full cycle instead of dropping during the “sleep” segment. With the Nitrogen back pressure assist, we were able to complete the cycles required by the test and passed with the government witnesses watching.

Prematurely jumping to solutions or even conclusions about what is happening can be a good shortcut, but when the problems are more insidious, it can be detrimental to problem-diagnosis and problem-solving. Sometimes taking a step back and doing it in “low-gear” while listening to the experts is the fastest way to do it.

The Mystery of the Non-conforming Holes

Throughout the production of the cell stack, we manufactured some non-conforming parts. This is to be expected, but manufacturing yield is an important aspect of any product. This anecdote is about an interesting manufacturing problem that I solved with some fun detective work that would not have been possible without empathy for the people at the machine shop we were working with. On the OGA cell stack (previously described in the analysis section above), we had a set of parts with a very long lead time (about 20 months all-in) that had holes drilled in them. The holes had an extreme aspect ratio of about 40:1: 50 thou (0.050”) in diameter and about 2” deep. In addition to the aspect ratio, we had a tight True Position tolerance on the holes, applied to each hole individually. For those unfamiliar with True Position callouts, here is a quick refresher as it relates to holes:

A true position tolerance on a hole defines a cylindrical zone, centered on the nominal axis of the hole, whose diameter is given in the callout. The measured axis of the hole must lie entirely within this positional tolerance zone.
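To put numbers on that definition, here is a minimal sketch of the two quantities involved: the diametral position deviation computed from a measured offset, and the largest tilt a perfectly centered axis can have while staying inside the zone over the full hole depth. The 2” depth comes from the description above; the 0.002” tolerance diameter and the measured offsets are hypothetical values chosen for illustration, not the actual callout.

```python
import math

def true_position(dx, dy):
    """Diametral position deviation for an axis offset (dx, dy) from nominal."""
    return 2.0 * math.hypot(dx, dy)

def in_tolerance(dx, dy, tol_dia):
    """True if the offset point lies inside a zone of diameter tol_dia."""
    return true_position(dx, dy) <= tol_dia

def max_tilt_deg(tol_dia, depth):
    """Largest tilt (degrees) of an axis centered at mid-depth that still
    fits inside a cylindrical zone of diameter tol_dia over the given depth."""
    # Each end moves (depth / 2) * tan(theta) and may use at most tol_dia / 2.
    return math.degrees(math.atan2(tol_dia, depth))

TOL_DIA = 0.002  # inches, hypothetical tolerance zone diameter
DEPTH = 2.0      # inches, hole depth from the text

print(in_tolerance(0.0005, 0.0005, TOL_DIA))           # small offset
print(in_tolerance(0.0010, 0.0010, TOL_DIA))           # larger offset
print(f"{max_tilt_deg(TOL_DIA, DEPTH):.3f} degrees")   # about 0.057 degrees
```

With these assumed numbers, the whole angular budget is a few hundredths of a degree, and that is before any of the zone is consumed by a lateral offset at the hole entrance.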

Note that because of the nature of the tolerance zone, as the hole gets longer axially, the allowable angular error decreases. For a hole of our aspect ratio, the angular deviation allowed was imperceptibly small. These holes were machined by the machine shop, then sent directly to measurement on a Coordinate Measuring Machine (CMM), where True Position was measured. The parts coming back from verification had something like a 10% in-tolerance rate after 3 batches of machined parts. This was clearly not acceptable.

At this point, it would have been easy to turn around, blame the machinists, and tell them to do better because you’re paying them. However, that’s not my style, nor is it conducive to getting a real resolution, especially since this was the only Defense-contract-approved machine shop that was willing to do this job. Having empathy for the vendors we work with, treating them like an extension of our own team, is not just necessary to get results, it’s the right thing to do.

In this case, I first wanted to understand: were the parts machined wrong, or measured wrong? I called up the metrologists and presented it as: “We’re seeing this weird issue and I’m really just trying to solve this puzzle. You guys know more about metrology than I do - can you help me investigate?” They were willing to help us out and did a blind re-measurement of a few parts after discussing how they were fixturing the parts, verifying which datums they were using, etc. The re-measurements came out very close to the original CMM reports, which ruled out a measurement issue.

In parallel, I did a similar thing with the machine shop. After explaining our issue and asking for his experienced eye, I found that the head machinist was especially receptive to geeking out over nitty-gritty machining details. So on a visit to their shop, I flipped over the stack of CMM summaries I had and started drawing all sorts of different ways to machine the parts. We discussed the pros and cons of each method of fixturing, drilling sequence, programming, etc. to arrive at a few “oh, I bet that could be a source of error here” moments.

Ultimately we decided to try the following as well as several other small process modifications:

Center drilling, then drilling, one hole at a time
Programming the CNC to overshoot the return pass to remove any backlash on the first hole
“Pecking” in smaller depth increments to reduce the load on the bit
Treating the drill bits as relatively consumable (to guarantee you’re using a sharp bit)

After these modifications, most of the remaining batches of parts were >80% in-tolerance, with some batches being nearly 100% in-tolerance. Making this whole weeks-long process less about “you did this wrong” and more about “let’s think about this interesting problem together” turned what could have been a door slammed in my face into mutual respect between the head machinist and me.