The Problem

Recently I needed to count how many particular HL7 segments were in a given column. Rather, I wanted to find those with 2+. I needed a solution to find how many occurrences of a substring were in a target column to be searched.

I was about to post a question on StackOverflow on this, but found the answer myself bit a bit of help from a few SQL related questions that I found while searching (How to count instances of a character in a SQL column). Funny how SO’s ‘search’ functionality doesn’t show the questions related, but when you're composing the question’s title, it does a MUCH better job.

Here’s the question I would have posted:

         Consider a varchar column or varchar variable holding a value like this:

              ABC123|foo|bar|ABC123|987|ABC123|123|DEF|456|ABC|

          The goal is to count the number of times ABC123| appears in that given string/column.

          How would you write the algorithm in T-SQL for SQL 2005 and/or 2008?

I would suggest that mostly you’d want to take this sort of task to a higher level language, rather than do it at the database level. Sometimes you have no choice, and a customer is asking for this kind of information in a short timeframe.

The Solution

The solution here is fairly easy.

  • define the substring you want to find.
  • find the length of the original string
  • REPLACE that string with a blank
  • subtract the length of the replaced string from the length of the original
  • the remainder is the number of characters that were removed
  • the number of substrings found is found by dividing by the length of the substring
DECLARE  @Subj         VARCHAR(6000),
         @Find         VARCHAR(10),
         @NumInstances INT

SELECT @Find = 'OBX|' --this is the string we want to count the occurrences of         
       ,@Subj = 'MSH|^~\&|LAB|TEST|TEST||20100302132036||ORU^R01|987654321|T|2.3 PID|1||Z000000167||TESTLAST^TESTFIRST^TESTMIDDLE||19100101|M|||^^^^|||||||ABC123 PID|1||0000000135|9028520613|KGH^ECPAT^^^^||19561203|F|^^^^^||99 TESTING STREET^^TEST^CA^90210||555-333-4444|||M|NV|ABC123|ZYX987| PV1|1|P|EM^^^.|||||||||||||||ER||HP|||||||||||||||||||.||PRE| ORC|SC|0203:U00006R300.0000|||CM||||201003021319|TEST^TESTER^DR^HELLO^^^||TESTING^TEST^DR^WORLD^^^| OBR|1|0203:U00006R300.0000|0203:U00006R|Some Test^L300.0000^00024940^24356-8^Some Test Name^|||201003021319||| ||||201003021319|| TEST^TESTER^DR^HELLO^^^||||||201003021320||LAB|F||| TEST^TESTER^DR^HELLO^^^| OBX|1|ST|L300.0200^Clarity^000z4940^32167-9^Clarity|1|Clear||Clear|N|||F|||201003021320| OBX|2|ST|L300.0700^Colour^000z4940^5778-6^Colour|1|Yellow||Yellow|N|||F|||201003021320| OBX|3|NM|L300.1300^XGravity^000z4940^5811-5^XGravity|1|1.006||1-10|N|||F|||201003021320| OBX|4|NM|L300.1400^pH^000z4940^5803-2^pH|1|5.0||5.0-9.0|N|||F|||201003021320| OBX|5|ST|L300.1500^Protein^000z4940^5804-0^Protein|1|Negative|g/L|N|N|||F|||201003021320| '

SELECT @NumInstances = (LEN(@Subj) - LEN(REPLACE(@Subj,@Find,''))) / LEN(@Find)

SELECT @NumInstances AS NumInstancesFound

 

It’d take a tiny bit of modification to convert the logic to perform the REPLACE/LEN statement on a column, rather than a table.

SELECT  myColumn
FROM   myTable
WHERE  (LEN(myColumn) - LEN(REPLACE(myColumn,@mySubstring,''))) / LEN(@mySubstring) > 1 --where we want to find 2 or more occurrences

Any better ways?

Having solved the problem in a fairly string-manipulative way, I wondered if there were any other ways to achieve this in the database layer. The other possibilities:

  • Write a CLR stored procedure. Here you could use your .NET Framework features to EASILY bust this out with a simple one-liner: String.Split() and Array.Length(). Better? Perhaps for maintainability or the abstraction of the logic. It’s a bit heavy for my tastes.
  • Set-based. Given that’s what SQL is actually for, you could perform some gymnastics on the column/variable, but the amount of code would be much more than the REPLACE/LEN solution above.

Posted in: code blowout , sql , datalayer  Tags:
Actions: E-mail | Permalink | Comments (0) |

Comments

Comments are closed