• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to identify decimal numbers in a free-text cell with a formula?

Rediska

Member
Hello again, help wanted!

Back to my validation sheet - How to identify decimal numbers in a free-text cell with a formula?
In a single cell series on numbers could be entered - "flip flap flop 2345, flip flap 2,9494" or "blim blum 1.234 fluff fluff".
I need to detect if any entered number are in the decimal format - comma or period - and give an error message.

Sajan, I have you formula for detecting a number, but I can't "convert it" to a number type.

I have a macro written and it works, but I need formula - with xlsx and xlsm formats will be a huge mess with different users.

Thanks!
Regards,
Rediska
 
Hello Rediska,
So, if you find a standalone number (example: 235), that would not be an error?
It would be an error only if it has a comma or decimal point (before, in the middle, or after)?
Please confirm.
 
Hi Sajan,
here it goes:

not error, to be ignored:
-if number is a part of an abbreviation (CY123NR) or number is with a letter prefix (V123);
-if comma or decimal point is after or before the whole rounded number (part of the phrase)
-semicol. to be ignored as well

errors:
- if comma or decimal point is within the number, including zero: 1.234 ; 12,345 ; 12345.9 ; 0.123 ; 123.0
- due to typo, it is possible that unit of measure is placed right after the number: 1,234NM or 1.349kT

And, of course - the explanation of the solution is very much appreciated.
Thanks! as usual.
 
Hi Rediska,
Your second set of samples and rules seem to contradict your first set of samples. I suggest you upload a file with representative data, along with expected outcomes for each row.

I interpreted your requirements as follows:
Anything of the format XX999XX is "good"
Anything of the format XX999,99XX is "good"
Anything of the format 9999 is "good"
Anything of the format 99.99 is "bad" (including 99.0)
Anything of the format ,99 or 99, or ,99, or .99 or 99. or .99. are "bad"

If your requirements match my interpretation, then you can try the following array formula for a string in cell A2:
=IF(LOOKUP(1,MMULT(IFERROR(N(ROUND(MOD(SUBSTITUTE(SUBSTITUTE(TRIM(MID(SUBSTITUTE(TRIM(A2)," ",REPT(" ",100)), (ROW(OFFSET(A$1,,,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1))-1)*100+1, 100))&".1",{".",""},""),{",",""},""),1),1)={0,0.1}),0),{1;-1})),"Bad","Good")

enter with Ctrl + Shift + Enter

It should handle everything except when a unit of measure is placed after an invalid number.
Are there some letters that are acceptable (i.e. to be ignored) and some others that are not?

-Sajan.
 
Explanation for the above formula:
=IF(LOOKUP(1,MMULT(IFERROR(N(ROUND(MOD(SUBSTITUTE(SUBSTITUTE(TRIM(MID(SUBSTITUTE(TRIM(A2)," ",REPT(" ",100)), (ROW(OFFSET(A$1,,,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1))-1)*100+1, 100))&".1",{".",""},""),{",",""},""),1),1)={0,0.1}),0),{1;-1})),"Bad","Good")

The segment in green was described in a previous post. (It creates an array of substrings.)

The formula adds ".1" to the array of strings so that if the substring is actually an integer number, we would be able to detect it later on, while being able to flag 123.0 as "bad"

Then the formula performs a series of substitutions to get rid of the commas (",") and periods ("."). The formula uses double-substitutions, with the second value an empty string. This is to preserve the original value as is. For example, 123 would become {"1231","123.1"}

The formula next uses MOD to determine the decimal portion. In the above example of 123, it would return {0,0.1}

Formula next compares the above result to {0,0.1}. Only an integer number should return {TRUE,TRUE}.
For example, if the substring was 123.1, it would have become {"12311","123.1.1"}. MOD would then have returned {0,#VALUE}, which becomes {TRUE,FALSE}

The formula then uses MMULT to flatten the array, with {0,0} and {1,1} both turned into 0s, while {1,0} is turned into 1.
The resulting array should have 0s for substrings that should be ignored, and 1s for substrings that need to be flagged.

Finally, LOOKUP is a quick way to locate a 1, which is used as part of an IF function to return "Bad" or "Good".


Cheers,
Sajan.
 
Hi Sajan,

Thanks!
Tested your formula, and here it is:

1 bla bla 100 Good
2 bla 3,00 and 3000 Bad
3 0 Good
4 above fg Good
5 sdfsd 400.99 Bad
6 and 5.999 dsfg Bad
7 1,000 kT jet Good
8 qwew 2.000 Bad
9 below 3,456 Good


It recognises the decimal point no problem, however for comma is different – for case #2 it recognises, but for #7 and #9 it doesn’t for some reason.

(and BTW, regarding contradicting rules - it goes by default for a female :eek:)
 
Hi Rediska,
Good thing you are testing this since you know what your data looks like! :)

Both 1,000 and 3,456 are valid Excel numbers. In the formula I add 0.1 at the end of numbers, but 1,000.1 is still a valid number! On the other hand 3,00 is not a valid number. that is the reason for the results you are observing.

As such, if this is a likely situation in your data, the formula needs to be tweaked.

Can you test the formula against other conditions and report back any conditions that are not working? We can then look at addressing them all together.

-Sajan.
 
Maybe there is a problem because of the use of the characters 32 and 160 in the text-string
 
I am assuming that the delimiter is a space (char #32). If that is not correct, replace space in the formula with the proper delimiter.
 
The following are the test strings I used... Incidentally, a number of the format #,### was not one of them! Goes to show the importance of good test cases!

Test String Result
One Two ,4567 3.45 Three 5678, 67,78 ,98,78, .3 .4.5 .6. .2,3.4 789 767868,1,1,1 34,345,4.4545,45.23 Bad
Xyz Two ,8567, .45 Three 0.0,0,1.0 789 Bad
One Two ,85,67, .79. Three Bad
One Two ,85,67, .79.8 Three Bad
One Two 85,67, 79.8 Three Bad
,123, 4.5 Bad
One Two Three Four Good
Two 234 More 123.0 Bad
One Two 2342.1 Bad
XX99XX Good
 
Sajan, more results:


bla bla 5,000 Good
6,000 kT Good


These are actual numeric formats occasionally entered (such as altitude in thousands of feet, or speed data), but commas still not detectable for some reason.
 
Hi Rediska,
Those numbers are following the same pattern as before. i.e. the formula is not recognizing a number where a comma is in a position recognized by Excel. As such, the following would not work with the current formula:
1,000
1,000,000
9,999
9,999,999
etc.


The solution is the same for all cases above. However, I will wait for your further test results to determine what else may need to be addressed.

-Sajan.
 
Hi Sajan,

well, then it's a bit of an issue for commas in our case.
Is there any way around it? For example, replace commas in numbers with whatever-other symbol (number sign?), and detect if this symbol is present?
It doesn't have to be in one shot, decision can takes up another column, no problem.
 
Hi Rediska,
Try the following formula:
=IF(LOOKUP(1,MMULT(N(ISNUMBER(0+SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A2),",","@"),".","@")," ",REPT(" ",100)),(ROW(OFFSET(A$1,,,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1))-1)*100+1,100)),{"@",""},""))),{1;-1})),"Bad","Good")

Please note that when testing, if you have a cell with 1,000 (or something in that pattern), you will need to enter the number as text. (Otherwise, Excel stores the actual number, and uses the commas for display only.)

Cheers,
Sajan.

P.S. Not sure why the forum software colored a portion of the formula!
 
Good to hear! Thanks for the feedback. Glad to help!

By the way, what kind of data are you working with? I normally suggest to folks that if they can control the problems at the source, that would be best.

Cheers,
Sajan.
 
Well,

I am creating an aeronautical data input template for multiple users with build-in validation functions "on the fly", so basically it becomes a source data for the next step users.
There is no other way but initial manual entry.
And thanks to you, those last three headachy problems are solved.

Actually I have another question: formulas vs. VB & macros - what is "heavier" in terms of file size?

Thanks!
 
And in case anyone is interested in an explanation of the most recent formula.... here it is.
=IF(LOOKUP(1,MMULT(N(ISNUMBER(0+SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(TRIM(A2),",","@"),".","@")," ",REPT(" ",100)),(ROW(OFFSET(A$1,,,LEN(TRIM(A2))-LEN(SUBSTITUTE(A2," ",""))+1))-1)*100+1,100)),{"@",""},""))),{1;-1})),"Bad","Good")

It does look like "SUBSTITUTEs Gone Wild"!

Formula starts by creating an array of substrings.
Then it substitutes all instances of commas and decimal points with @.
Then it substitutes all instances of @ with "". However, in order to tell whether the original substring was an integer number, the substitution is made for {"@", ""}
So, if "1000" was the original substring, the result would be {1000,1000}
If the original substring was "1,000", the result would be {1000,1@000}

Formula then checks if the resulting array values are numbers. Using N(ISNUMBER()) to convert the TRUE/FALSE to 1/0, we get a 2-dimensional array of 1s and 0s. The pattern we need to flag is {1,0}. (i.e. patterns like {0,0} and {1,1} are to be ignored.) To get that, we use MMULT to convert the pairs to be ignored to 0s, and the pairs to keep to 1s.

Finally, the IF(LOOKUP(1,...)..) combination is used to display "Bad" if a value of 1 is found, and "Good" otherwise.

Cheers,
Sajan.
 
Regarding your question about file size due to formulas vs macros... I am not sure. I am sure it is possible to create a huge file with just macros or just formulas. I am sure one of the other forum members will chime in with some insights.
 
Good day Rediska

The forum has some very strong VBA wizards and if you post your question (number 19 ) a new (as many who have viewed this threat may not re-view) in the VBA and Macro forum I am sure you will get many interesting reply's.

By the way is
"flip flap flop 2345, flip flap 2,9494" or "blim blum 1.234 fluff fluff"
some sort of engineering speak in the aeronautical world......:p




.
 
Hi Rediska ,

Given the nature of Sajan's formulae , do you still think VBA is heavier ?!

Any decent ( you would not need a Yoda ! ) programmer could come up with data validation code which would not markedly differ from what you have described in plain English !

The number of lines in the code would certainly be more than in Sajan's formula , but is that a consideration ?

I shudder to think what would happen if six months from now , you need to modify the above formula to take care of additional constraints ; hopefully you can enter into a life-time maintenance contract with Sajan !

Narayan
 
Hi, Rediska!
When I (or anybody else) join Sajan, b(ut)ob(ut)hc and NARAYANK991 viewpoint, if I were you I wouldn't doubt at all and take the given advice.
In this case I wouldn't care at all about the heaviness of lightness of the file but of the recalculation and processing speed. It's absolutely preferable a few seconds after pressing a button than a lot of seconds or a few minutes waiting for recalculation.
Gigabytes are cheaper than Gigahertz :)
Regards!
PS: Yoda-Saj surely could build the monster-formula only monstrous workbook, and few -if any- better than him, so why not trying his approach for a test file with the equivalent data volume as the actual one and then decide which path to walkthrough?
 
Back
Top