• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Automatically Detecting Common number errors between two lists

Gordon

New Member
One idea I have had is to have a way of automatically detecting transposed digits in bank statement reference numbers, or numbers wrongly entered by those with poor eyesight.


Most errors where statement reference numbers do not match with account numbers are likely to be caused by two numbers transposed in the reference number, or astigmatism caused wrong number entry (i.e. a 6 or 0 for an 8, a 1 or 9 for a 7 - most other numbers are so unique in form they are hard to misread with poor eyesight)


To detect transposed digits between to numbers I understand you can check if pairs in the number are transposed when compared to a reference number as the difference between them will be able to be exactly divided by 9 (http://www.reallifeaccounting.com/pubs/Article_Theme_Detecting_Accounting_Errors.pdf)


So I am looking to be able to compare two arrays of numbers and split it into four lists

1) Where a number in one array exactly matches a number in the other array

2) Where a number in one array does not exactly match any number in the other array but has a "poor eyesight error" (a 6 instead of an 8 etc)with a number in the other array.

3) Where a number in one array does not exactly match any number in the other array but has a transposition in one pair of digits between the two numbers.

4) Where a number on one array does not match in situations 1,2 & 3 and the resolution can not be assisted by automation.


Might be a useful exercise for the experts on the site to say how to best do this.

As I work though this will let you know, would be very useful for all accountants and banking administrator types out there. But I am a real beginner compared to most people here.


Thanks and Regards
 
Hi Gordon ,


Given the number of possibilities , I don't think this error check can be implemented using only formulae. Are you comfortable with VBA , if that is used to get a solution ?


You need to specify the maximum number of digits that will be met in practice.


Also , it is not quite so practical as it seems ; for instance if one of the numbers in one list is 7654 , while the other is 76545 , are these two to be flagged ?


Assume that one list of numbers goes from cell A5 through A500 ; the other goes from B7 through B502 ; are we supposed to use a positional basis to check the numbers ? Is the number in cell A5 to be checked only with its corresponding entry , which is in cell B7 , or is it to be checked with all the numbers B7 through B502 ? Is it not possible that you can have perfectly valid numbers which are transpositions or matches or ... ? Are all these to be flagged ? You may end up with longer lists of possible errors , which may be more time consuming to cross-check !


Can you upload a sample worksheet to give an idea of the volume of data ? With all the checks having to be run for every item of data , the run time of such a procedure may be unacceptably long.


Narayan
 
Thanks for the reply Narayan.


No real list yet, more exploring what could be done using VBA. I suppose that a first scan would easily eliminate all matches, then a more intensive analysis could be done on what does not match. But you are right, if one list contained tens of thousands of numbers the working could take a long time. But still would be faster than looking up items manually.


I would assume that all the numbers would have the same count of digits (i.e. Account numbers all 7 digit) so any that are more or less automatically to list 4.


Really at this stage just an intellectual exercise, I have no practical data to work with.


Marvelous help - thank you
 
Hi Gordon ,


Thanks for the clarification.


Are you interested in a solution with just the 4 rules that you have mentioned ?


Narayan
 
Hi Gordon ,


With reference to your rules , rule 1 and rule 4 are no problem ! Rule 1 is where there is an exact match between two numbers in the two lists ; but you need to clarify my earlier doubt viz. when checking for a match between the two lists , do we go by positional matching , or do we have to check each number from the first list with all the numbers in the second list ? The latter can be very time consuming since if there 1000 numbers in each list , we will have nearly a million checks to be done.


Rule 4 is that if a number from the first list fails to match any of the Rules 1 , 2 or 3 , then it comes under Rule 4.


Rule 2 states that where a number from list 1 does not come under Rule 1 , can it be made to match a number from list 2 merely by changing one digit ? This will call for AI ! The "eyesight error" that you have mentioned needs to be precisely defined ; is this done in the link you have given ? Which digits can be substituted for which other digits ? For instance , a number like 1234567 can be matched with 1234597 or 1234587 , substituting 6 for 9 or 8 ; but it can also be matched with 1234557 substituting 6 for the second 5. What are the exact substitution rules that will be gone through in checking for numbers under Rule 2 ?


Rule 3 is even more complex ! 1234567 can be matched with either 1243567 or 1234657 or any other such combination ; can these also be exactly defined ?


If all the possibilities are enumerated , then a programming solution can certainly be worked out.


Narayan
 
Greetings Narayan


My thoughts are that the process would only work with a check for each number from the first list with all the numbers in the second list - slow but faster than a manual lookup process.


In practice for most accounting work the matching would not be too onerous, as the source accounts list would be filtered to only have active (i.e money owing or owed)items at the time the scan is done.


Rule 2 could find multiple "possibles" as you state, in practice there should not be too many matches as only active accounts are matched, and when the solution has done it's work a second check would be done on money owing or owed to focus into likely account. At that point "real rather than artifical intelligence" would need to be applied.


For Rule 3 the same considerations apply - the matching is against a subset that only contains active accounts so duplicate matches should be few.


In practice it would be very useful to have account numbers that are unique and must adhere to a check algorithm (i.e. the account number is nine digits long, each pair of the first eight numbers must add up to 9 and the final digit is a checksum) - this would make such work very easy as it would focus in to exactly where the error was located in a non-matching number


Thanks and Regards
 
Hi Gordon ,


The following link gives a VBA procedure for calculating a checksum ; do you have any use for it ?


http://www.excelforum.com/excel-programming/482390-excel-vba-calculating-checksums.html


Narayan
 
Hi Gordon ,


The following link gives a VBA procedure for calculating checksums ; you may find it useful.


http://www.excelforum.com/excel-programming/482390-excel-vba-calculating-checksums.html


Narayan
 
Back
Top