See also: asking a friend for help can save oodles of time and effort.
Around Thanksgiving 2022, a friend of mine asked to talk through a problem he thought might be solvable with PowerShell, but he'd been stuck on the design. Naturally, I agreed to help out if/where I could, and it provided an opportunity to chat via Zoom, which I am not one to turn down.
In this case, there's a relatively free-form bunch of text received from a person/process whose behavior is unlikely to change. A few things are consistent in this data:
- Each line is always a separate record;
- Each field in a line is separated by an arbitrary number of spaces, and a line often includes additional fields not necessary for downstream usage; and
- The field in question (for email address) is always the last field in a record/row.
Since this downstream process doesn't need/care about anything but the email address, the question was posed:
How can I use PowerShell to extract just the email address of every row/record/line?
Blinded by Complexity
My friend had thought about ways to deal with this but had quickly encountered what I often do when tackling such a problem: over-complicating the process in question. I mentioned a few ways I'd been manipulating string data with PowerShell, most of which weren't a 1:1 fit for this case but would get someone in the neighborhood of a viable solution, albeit probably a too-complex one.
One such solution was to bypass the input being received from the upstream party and pull the data directly. Effective, but as I recall the process is run on such an ad-hoc basis that circumventing the situation seemed like solving the wrong problem, and possibly taking on more technical debt/responsibility. The real problem in this case was to eliminate manual post-processing for the downstream process.
To that end, there was some discussion about using proper cmdlets in PowerShell to do some serious string manipulation. While those absolutely have their place, for something consistent and simple like this, using those cmdlets and piping output between them would have made the solution more difficult to understand...which leads to...
PowerShell is no different from other languages in that it has strong, baked-in text manipulation...but it isn't (to me) as intuitive as some other languages due to the function/module/method/cmdlet naming (verb-noun) convention. However, there are "shortcuts" available in the form of built-in operators, which is exactly how we can simplify a problem like the one at hand.
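For a sense of the contrast, here is a rough sketch of what a cmdlet-and-pipeline approach could look like. It assumes Select-String as the cmdlet of choice and uses a hypothetical $record variable; it is only an illustration of the style, not a recommendation.

```powershell
# Hypothetical record used only for this illustration.
$record = "Person Last First email@example.com"

# Select-String finds the last run of non-whitespace characters in the line,
# then ForEach-Object pulls the matched text out of the resulting MatchInfo object.
$viaCmdlets = $record |
    Select-String -Pattern '\S+$' |
    ForEach-Object { $_.Matches[0].Value }

$viaCmdlets
```

It works, but the reader has to know what kind of object Select-String emits before the last step makes sense, which is the kind of overhead the built-in operators avoid.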
For this example, each row would be assigned to the $string variable via a loop and has a format similar to this:
$string = "Person Last First email@example.com"
In the end, this simple regular expression one-liner did exactly the trick:
$($string -replace '\s+', ' ').split()[-1]
Using the one-liner quickly returns just the email portion of the record:
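As a quick check, the sample record can be run through the one-liner in a console. The uneven spacing in the sample below is an assumption, added to stand in for the "arbitrary number of spaces" described earlier.

```powershell
# Sample record from above; the extra spaces between fields are an assumption
# added to mimic irregular spacing in the real data.
$string = "Person   Last  First      email@example.com"

# The one-liner: collapse whitespace runs, split into fields, take the last one.
$email = $($string -replace '\s+', ' ').split()[-1]

$email   # email@example.com
```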
Breaking It Down
To explain the one-liner, we start by using PowerShell's ability to do an in-line -replace to account for more than one space between fields, replacing each run of whitespace (the \s+ pattern matches spaces, tabs, and the like) with a single space:
$string -replace '\s+', ' '
We then use the .split() method to split the string into its distinct data points. Called with no arguments, it splits on whitespace:
$($string -replace '\s+', ' ').split()
Since the field/data in question is always last in a record, we use [-1] to grab the last element of the array returned by the previous operation:
$($string -replace '\s+', ' ').split()[-1]
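Putting it in a loop, a minimal sketch might look like the following. The sample lines are made up for illustration, and this variant swaps in the -split operator plus Trim() rather than the exact one-liner above: if a record ends in trailing whitespace, the original one-liner would return an empty string as its last element, so trimming first is a cheap safeguard.

```powershell
# Hypothetical sample input; the names and addresses are invented for illustration.
$lines = @(
    "Person Last First email@example.com",
    "Other   Person  Extra Field   other@example.com",
    "Third Person    third@example.com   "
)

$emails = foreach ($string in $lines) {
    # Trim() removes leading/trailing whitespace (the third sample line has
    # trailing spaces), then -split breaks the record on runs of whitespace
    # and [-1] takes the last field.
    ($string.Trim() -split '\s+')[-1]
}

$emails
```

In real use, $lines would come from wherever the upstream text lands (for example, a file read with Get-Content); the in-memory array here just keeps the sketch self-contained.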
There's a Place for One-Liners!
In the tech community there's a love-hate relationship with one-liners. Some folks love them, and others think they only exist to encourage unnecessarily "creative" (read: complex) solutions that are difficult to read and maintain. I think both of those things can be true, but, like the ternary operator, when used judiciously and for the right purpose (and not just to be "cute" or "smart"), one-liners can be both easily readable and highly effective.
And I got to help a friend solve a weird situation in a language he's not super familiar with.
Unrelated: I Took a Break
This is my first post in quite some time (since late October). Some of the stuff I'd queued up to write about just hadn't fully marinated/baked, and I was running out of time/energy to tinker with the last 20%. So that stuff is still in the machine and will be published as I finish it. I've been spending a bunch of time working in Home Assistant and doing some fun 3D printing, though, and on the encouragement of some folks I intend to post about some of those adventures over the next couple of months!