The Problem

Recently I needed to count how many particular HL7 segments were in a given column. Rather, I wanted to find those with 2+. I needed a solution to find how many occurrences of a substring were in a target column to be searched.

I was about to post a question on StackOverflow on this, but found the answer myself bit a bit of help from a few SQL related questions that I found while searching (How to count instances of a character in a SQL column). Funny how SO’s ‘search’ functionality doesn’t show the questions related, but when you're composing the question’s title, it does a MUCH better job.

Here’s the question I would have posted:

         Consider a varchar column or varchar variable holding a value like this:

              ABC123|foo|bar|ABC123|987|ABC123|123|DEF|456|ABC|

          The goal is to count the number of times ABC123| appears in that given string/column.

          How would you write the algorithm in T-SQL for SQL 2005 and/or 2008?

I would suggest that mostly you’d want to take this sort of task to a higher level language, rather than do it at the database level. Sometimes you have no choice, and a customer is asking for this kind of information in a short timeframe.

The Solution

The solution here is fairly easy.

  • define the substring you want to find.
  • find the length of the original string
  • REPLACE that string with a blank
  • subtract the length of the replaced string from the length of the original
  • the remainder is the number of characters that were removed
  • the number of substrings found is found by dividing by the length of the substring
DECLARE  @Subj         VARCHAR(6000),
         @Find         VARCHAR(10),
         @NumInstances INT

SELECT @Find = 'OBX|' --this is the string we want to count the occurrences of         
       ,@Subj = 'MSH|^~\&|LAB|TEST|TEST||20100302132036||ORU^R01|987654321|T|2.3 PID|1||Z000000167||TESTLAST^TESTFIRST^TESTMIDDLE||19100101|M|||^^^^|||||||ABC123 PID|1||0000000135|9028520613|KGH^ECPAT^^^^||19561203|F|^^^^^||99 TESTING STREET^^TEST^CA^90210||555-333-4444|||M|NV|ABC123|ZYX987| PV1|1|P|EM^^^.|||||||||||||||ER||HP|||||||||||||||||||.||PRE| ORC|SC|0203:U00006R300.0000|||CM||||201003021319|TEST^TESTER^DR^HELLO^^^||TESTING^TEST^DR^WORLD^^^| OBR|1|0203:U00006R300.0000|0203:U00006R|Some Test^L300.0000^00024940^24356-8^Some Test Name^|||201003021319||| ||||201003021319|| TEST^TESTER^DR^HELLO^^^||||||201003021320||LAB|F||| TEST^TESTER^DR^HELLO^^^| OBX|1|ST|L300.0200^Clarity^000z4940^32167-9^Clarity|1|Clear||Clear|N|||F|||201003021320| OBX|2|ST|L300.0700^Colour^000z4940^5778-6^Colour|1|Yellow||Yellow|N|||F|||201003021320| OBX|3|NM|L300.1300^XGravity^000z4940^5811-5^XGravity|1|1.006||1-10|N|||F|||201003021320| OBX|4|NM|L300.1400^pH^000z4940^5803-2^pH|1|5.0||5.0-9.0|N|||F|||201003021320| OBX|5|ST|L300.1500^Protein^000z4940^5804-0^Protein|1|Negative|g/L|N|N|||F|||201003021320| '

SELECT @NumInstances = (LEN(@Subj) - LEN(REPLACE(@Subj,@Find,''))) / LEN(@Find)

SELECT @NumInstances AS NumInstancesFound

 

It’d take a tiny bit of modification to convert the logic to perform the REPLACE/LEN statement on a column, rather than a table.

SELECT  myColumn
FROM   myTable
WHERE  (LEN(myColumn) - LEN(REPLACE(myColumn,@mySubstring,''))) / LEN(@mySubstring) > 1 --where we want to find 2 or more occurrences

Any better ways?

Having solved the problem in a fairly string-manipulative way, I wondered if there were any other ways to achieve this in the database layer. The other possibilities:

  • Write a CLR stored procedure. Here you could use your .NET Framework features to EASILY bust this out with a simple one-liner: String.Split() and Array.Length(). Better? Perhaps for maintainability or the abstraction of the logic. It’s a bit heavy for my tastes.
  • Set-based. Given that’s what SQL is actually for, you could perform some gymnastics on the column/variable, but the amount of code would be much more than the REPLACE/LEN solution above.

Posted in: code blowout , sql , datalayer  Tags:
Actions: E-mail | Permalink | Comments (0) |
Phil posted on June 17, 2009 09:29

A new project has me writing up the same old Country/State/Province reference tables. My feeling is that these static (fairly static) entities should be normalized and referenced by foreign key. I had asked a StackOverflow question on whether other developers had this prebuilt set of country/state/province create scripts in their toolbelt.

This is a code garage sale! Never again worry about creating and loading country and province/state data.

Create The Schema

Create your Country and State tables with this CREATE SQL script. As always, name the State table whatever you like (ProvState, tblState, whathaveyou).  Some folks don't like table names to be the same as reserved keywords.

country_state_erd

Populating

Here is a collection of insert scripts to get the data populated quickly for you.

Have Fun

Some developers go further down the normalization path by creating a City table, but I usually pass on that. As always, that decision is largely dependant on the problem domain or task at hand.

Yippee for me – that’ll be fun trying to reconcile data when someone enters their city location incorrectly as “St. Paul” / “Tuscon” / “Pittsburg” instead of Saint Paul / Tucson/ Pittsburgh.

If you would like to contribute an insert script or two for a country that you would like to see here, just contact me with your script!


Posted in: code blowout , sql  Tags:
Actions: E-mail | Permalink | Comments (0) |