Writing about code

In the long run, one of the things I will likely write most about is scripting and coding with PowerShell and Python. Alongside this effort, I am in the process of updating most of my scripts and adding them to public repositories to share. I learned a long time ago that the best way to know something is to:

  • Learn it
  • Do it
  • Teach (or share) it

Sharing things I have learned has probably taught me as much about them as discovering and using them.

Today’s topic: HashSets

I have often used hashtables to filter out duplicate values. For example, if we start with the following strings:

$thingsTheySaid = @(
    "His first name is Johnny."
    "Johnny's last name is Doe."
    "Johnny is not an idiot."
    "Tess' favorite color is purple, but does'nt like green!"
    "'Tess' loves Johnny's cooking!"
)

Create some regex to pull out just the words:

$wordRgx = @"
^\'(?<aWord>.+)\'$|^\"(?<aWord>.+)\"$|^(?<aWord>[A-Za-z]+(\'[A-Za-z]*)?).*$
"@

Then, by doing some splitting and matching before dropping the words into a hashtable, we can get a list of every unique word along with how often each one appears, with something like:

$uniqueWordHashtable = @{}

foreach ($thing in ($thingsTheySaid -split (" "))) {
    if ($thing -match $wordRgx) {
        if ($uniqueWordHashtable["$($Matches["aWord"])"]) {
            $uniqueWordHashtable["$($Matches["aWord"])"]++
        } else {
            $uniqueWordHashtable["$($Matches["aWord"])"] = 1
        }
    }
}

$uniqueWordHashtable
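
As a hedged aside, the if/else in the loop above can be collapsed, because incrementing a key that does not exist yet starts from $null, which PowerShell treats as 0. The sketch below is one way to do that, not the only one. It assumes a shortened copy of the sample strings and a single-quoted here-string for the pattern; $sampleThings and $wordCounts are names made up for this illustration.

```powershell
# Shortened copy of the sample strings (assumption: three lines are enough to show the idea).
$sampleThings = @(
    "His first name is Johnny."
    "Johnny's last name is Doe."
    "Johnny is not an idiot."
)

# Same pattern as above, in a single-quoted here-string so nothing gets expanded.
$wordRgx = @'
^\'(?<aWord>.+)\'$|^\"(?<aWord>.+)\"$|^(?<aWord>[A-Za-z]+(\'[A-Za-z]*)?).*$
'@

$wordCounts = @{}

foreach ($thing in ($sampleThings -split ' ')) {
    if ($thing -match $wordRgx) {
        # A missing key reads as $null, and $null++ becomes 1.
        $wordCounts[$Matches['aWord']]++
    }
}

# List the words with the most frequent ones first.
$wordCounts.GetEnumerator() | Sort-Object -Property Value -Descending
```

With this shortened input, "is" comes out on top with a count of 3.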

That’s all well and good when additional information (like how often the words appear) is useful, but sometimes I only need to know whether a value appears at all and do not need anything more. I used to do something like:

$uniqueWordHashtable["$($Matches["aWord"])"] = [string]::Empty

Or sometimes I used $true, $false, or 0 just to have some value to pair with that key. That is not awful when I’m first getting the data, but it is extra work if I already have the data, such as if I’d done:

$allWords = foreach ($thing in ($thingsTheySaid -split (" "))) {
    if ($thing -match $wordRgx) {
        "$($Matches["aWord"])"
    }
}

That gives me data including duplicates. To get unique values, I either have to rewrite my code to use a hashtable or use an alternate solution, like a .NET HashSet. Going the hashtable route, I would have to loop through the $allWords string array and add key/value pairs. I could do something similar with a HashSet if I really wanted to, or I could just cast the array as a string HashSet:

[System.Collections.Generic.HashSet[string]]$allWords

or, if I wanted to play it safe (-as returns $null instead of throwing an error when the conversion fails):

$allWords -as [System.Collections.Generic.HashSet[string]]

I’m partial to the casting route because you never know when all the words, including duplicates, may prove useful. No code is ever truly wasted.
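
For comparison, here is a minimal sketch of the looping route mentioned earlier. It assumes PowerShell 5.0 or later (for ::new()) and uses a short made-up list, $sampleWords, in place of the real $allWords. The HashSet Add() method returns $true when the value is new and $false when it is already present, so the same loop can collect the repeats too.

```powershell
# Hypothetical short list standing in for $allWords.
$sampleWords = @('Johnny', 'is', 'Johnny', 'not', 'is')

$uniqueWords = [System.Collections.Generic.HashSet[string]]::new()

# Add() returns $false for a value the set already holds,
# so anything emitted here is a repeat.
$repeats = foreach ($word in $sampleWords) {
    if (-not $uniqueWords.Add($word)) {
        $word
    }
}

$uniqueWords.Count               # 3
$repeats                         # Johnny, is
$uniqueWords.Contains('not')     # True
```

One thing to watch for: calling Add() on its own line writes its $true/$false result to the output stream, so outside of an if it usually needs [void] or an assignment to keep the output clean.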

All put together, it would look something like:

$wordRgx = @"
^\'(?<thisWord>.+)\'$|^\"(?<thisWord>.+)\"$|^(?<thisWord>[A-Za-z]+(\'[A-Za-z]*)?).*$
"@

$thingsTheySaid = @(
    "His first name is Johnny."
    "Johnny's last name is Doe."
    "Johnny is not an idiot."
    "Tess' favorite color is purple, but does'nt like green!"
    "'Tess' loves Johnny's cooking!"
)

$allWords = foreach ($thing in ($thingsTheySaid -split (" "))) {
    if ($thing -match $wordRgx) {
        "$($Matches["thisWord"])"
    }
}

$allWords -as [System.Collections.Generic.HashSet[string]]
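
One difference worth keeping in mind when swapping a hashtable for a HashSet: a PowerShell @{} hashtable matches string keys without regard to case, while HashSet[string] uses a case-sensitive comparer by default. The sample strings here happen not to contain words that differ only by case, so the sketch below uses a made-up list, $sampleWords, and assumes PowerShell 5.0 or later.

```powershell
# Hypothetical list with words that differ only by case.
$sampleWords = @('Johnny', 'johnny', 'JOHNNY', 'Tess')

# Default comparer: case-sensitive, so each spelling is kept.
$caseSensitive = [System.Collections.Generic.HashSet[string]]::new([string[]]$sampleWords)

# Passing a StringComparer makes the set ignore case, much like a hashtable does.
$caseInsensitive = [System.Collections.Generic.HashSet[string]]::new([string[]]$sampleWords, [System.StringComparer]::OrdinalIgnoreCase)

$caseSensitive.Count      # 4
$caseInsensitive.Count    # 2
```

Which behavior is right depends on the data; for word lists like the one above, ignoring case is probably closer to what the hashtable version was doing.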

In closing

Writing exercise over. Much like everything else I’ve been writing, it took much longer than I would have liked, but it didn’t turn out too bad. It should get easier with practice.

As always, keep playing, exploring, and learning…and of course, don’t forget to enjoy the day.
