• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Data Duplicate

inbp

Member
Hi Friends!


I have a file with 1 coulumn and 400,000 rows.


Its bar code file. Our data base of bar codes is regularly updating. today is 400,000, data after 2 days it will be 400,200. My issues is when i use duplicate formula it works, but when i sort the data by color, it hang my system, as file have data in 800,200 rows.


My purpose is to know only new 200 bar codes. but it is very tough for me to get.


Regards,

Muhammad Shakeel
 
Hi Shakeel,


I do not understand the term "bar codes".


However, what are doing with these new "200 bar codes" that are added each day?


Can you upload a sample file here and explain your problem in a bit more detailed manner?


http://chandoo.org/forums/topic/posting-a-sample-workbook


Kaushik
 
These are updated on daily basis....


You can say i have a list of 400,000 Unique Names, New unique Names added on daily basis...i only want to know about new added names by using duplicate formula or anything else


Barcode

L123451988

L134568970
 
Hi Shakeel,


Can you please try the below for me..

[pre]
Code:
Sub RemoveDuplicateFewLast()
rdfrom = InputBox("alredy Duplicate Removed, upto row # " & _
Chr(10) & "Before this row # All are fine" & _
Chr(10) & "I swear.. there was no any duplicate befor this point & I have only One Column ", "My Gurantee")
LR = Range("A" & Rows.Count).End(xlUp).Row
For I = 1 To LR - rdfrom
If InStr(Deb & ",", "," & Cells(rdfrom + I, 1) & ",") = 0 Then Deb = Deb & "," & Cells(rdfrom + I, 1)
Next I
RAJ = Split(Mid(Deb, 2), ",")
Range("A" & rdfrom & ":A" & LR).Clear
Cells(rdfrom, 1).Resize(UBound(raj), 1) = Application.Transpose(raj)
MsgBox "All Duplicate has been removed after Row # " & rdfrom
End Sub
[/pre]
Khuda Hafiiz,

Deb
 
DEB Bhai May You Live Long....


I have use your code but it is showing this


"I swear.. there was no any duplicate befor this point & I have only One Column "


Than show debug error


Please see the file below, i paste the data duplicate and unique


https://www.dropbox.com/s/cq8w7cc132or3py/Barcodes%20Data.xlsx


Regards,

Muhammad Shakeel
 
Hi Shakeel,


* Two thing... regarding the above code..


When you paste the code in your VBA.. Please remove "Option Explicit" from the top of the page.


and when it ask you for the row Number from where you want tp remove the duplicate.. For ex.. Yesterday you have removed upto 400,000 and toway you want to checke only last extra added rows.. then give answer as 400,000.. it will only check after 400,000/. and it should be less than total number of line in the file..


Regards,

Deb
 
Deb bhai i think i am not clarify my requirement


I have prepare a utlity in which i had to Update only new barcodes generated by our sourcing department.


I can get the total list of barcodes from system.


while in utility i only want to update new ones.


My aim is to know about new barcodes


Let suppose today is following barcodes in my list


L1234567890

L2345678010

C1245367900


after 3 days these become


L1234567890

L2345678010

C4538291910

C1245367900

L3561829298


you can see two new barcodes added. I want to know only the two new added.


C4538291910

L3561829298


if you have any sort of method you can guide and help me brother


Thanks & Regards

Muhammad Shakeel
 
Hi Shakeel,


So you problem is somehow like this..

http://chandoo.org/forums/topic/to-display-only-unique-values-from-two-columns


Still .. For exp.. today you have 5000 record in a sheet.. and next day you again recieve 5050 record.. so you need only 50 records.. but where is the record for 5000 Barcode, so that I can check this 50 are New..


Regards,

Deb
 
Hi Deb!


Brother as my data is of about to 400,000 so these formulas reference to your link hang my system.


Please see my this file


https://www.dropbox.com/s/cq8w7cc132or3py/Barcodes%20Data.xlsx


i hope now you understand


Thanks
 
Hi Shakeel,


OOPS.. Sorry yaar.. Actually its weekend.. so lil bit busy.. with Family.. :)


OK.. lets come to the point.. as you already wait a long.


Suppose :

Sheet1 > A1:A1300 contain OLD LIST

Sheet2 > A1:A1500 contain NEW LIST.


In Sheet 2 > In C1 write "Criteria"

In Sheet 2 > In C2 Write =NOT(ISNUMBER(MATCH(A2,Sheet1!$A$2:$A$1300,0))) It will display FALSE


Now Select Sheet 2 > A:A .. Apply.. Data > Advance Filter.

* Filter the list in PANE

* List Range "$A$1:$A$1500"

* Criteria Range "$C$1:$C$2"

* OK.


In Sheet2 > A:A it will filter all the data that are not in Sheet1 > A:A .. In your case Only 200..

Select the Cells and Copy (for more safety you can use Alt + E + G + S + Y
to select visible cells only and then COPY )


I know you are searching for some VBA .. and then for loop.. of something like that.. but believe me.. It will work more faster than any VBA or Formula based approach at-least in case of 400000+ data..


Please let us know if you still need any VBA or have any confusion..


Sabba Khair..

Deb
 
Hi Deb Bhai!


i have only need to do the task as fast as can...you can suggest any method brother.


The above method takes almost 1 hour.


Is there any method to work in minutes.??
 
Hello Shakeel,


Can't we do simple vlookup to match the old and new list in order to find out the new entries added on a particular day.


The vlookup would return #N/A for the entries where it would not find any match in the old list. Then you could filter the the list with #N/A to get the new entries.


In case this approach is fine, we can even automate the process(in a proper / customized way) so that you could avoid running the formula again and again.


However, my thought is here in this file:

http://speedy.sh/TUp5S/Barcodes-Data.xlsx


Regards,

Kaushik
 
Hi Kaushik


Thank you for your effort.


Brother when we use v.lookup on 400,000 cells, it will hang the excel file. also at the time of sorting or filter, same problem occurred.
 
Hi Shakeel


Considering the huge number of cells you are dealing with and the kind of requirement you have, I believe Microsoft Access / SQL Server would be the best option to get it done using the SELECT QUERY statement.


It would be a good idea to switch to a real database and if needed come back to Excel to perform other analysis or operations over the unique data.


Kaushik
 
If you have Excel 2007 or later version then try the following.

--> Sort Data

--> Data | Data Tools | Remove Duplicates.


If you find that the results are faster than any of the above methods then turn on the macro recorder and record the macro.
 
Hi, inbp!

Would you please tell me how, I mean where, in which position, are the new barcodes added?

At the end, sorted in certain order, ...

Regards!
 
Hi, inbp!


Give a look at this file:

https://dl.dropbox.com/u/60558749/Data%20Duplicate%20-%20Barcodes%20Data%20%28for%20inbp%20at%20chandoo.org%29.xlsm


For 400K entries it takes less than 1' (47" in my PC). It can be even improved if you remove the calls to the "X" procedure that display step and time that was just for testing purposes, and eliminate conditional formatting for columns A & C (800K conditions) and count cells on row 1.


More improvement may be obtained if both old and new list contained unique entries. If so, check the code and swap comments between statement adding 1 and Do...Loop.


Column A: old list

Column C: new list

Column E: add list

Cell F1: no. of additions

Column G: del list

Cell H1: no. of deletions


It also uses 4 named ranges for conditional formatting and counting, but you can get rid of them too if elapsed time isn't suitable for you.


Just advise if any issue.


Regards!

PS: I forgot pasting the code for those with CASFFML syndrome (Luke M, Faseeh, ... & co.)

-----

Option Explicit

Sub DifferenceList()
X "start"
' constants
Const ksRanges = "0:0 A:A C:C E:E G:G"
' declarations
Dim rng(4) As Range, srng() As String, rngX As Range
Dim lIndex(4) As Long, iActive(2) As Integer, bChange(2) As Boolean
Dim I As Long, J As Long

' start
' application
With Application
.EnableEvents = False
.Calculation = xlManual
.ScreenUpdating = False
End With
' ranges
srng() = Split(ksRanges)
For I = 1 To 4
Set rng(I) = Range(srng(I))
Next I

' process
' sort old & new
For I = 1 To 2
X "sort" & I
With ActiveWorkbook.Worksheets(ActiveSheet.Name).Sort
With .SortFields
.Clear
.Add Key:=rng(I), _
SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
End With
.SetRange rng(I)
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
Next I
' clear add
X "clear" & I
For I = 3 To 4
With rng(I)
If .Rows.Count > 1 Then
Range(.Rows(2), .Rows(.Rows.Count)).ClearContents
End If
End With
Next I
' main
X "main"
' start pointers
lIndex(1) = 2
lIndex(2) = 2
lIndex(3) = 1
lIndex(4) = 1
' cycle
Do
' check condition
Select Case rng(2).Cells(lIndex(2), 1).Value
Case Is = rng(1).Cells(lIndex(1), 1).Value
iActive(1) = 1
iActive(2) = 1
bChange(1) = False
bChange(2) = False
Case Is < rng(1).Cells(lIndex(1), 1).Value
iActive(1) = 0
iActive(2) = 1
bChange(1) = False
bChange(2) = True
Case Is > rng(1).Cells(lIndex(1), 1).Value
iActive(1) = 1
iActive(2) = 0
bChange(1) = True
bChange(2) = False
End Select
' add to 3 if new at 2
If bChange(2) Then
lIndex(3) = lIndex(3) + 1
rng(3).Cells(lIndex(3), 1).Value = rng(2).Cells(lIndex(2), 1).Value
End If
' add to 4 if del from 1
If bChange(1) Then
lIndex(4) = lIndex(4) + 1
rng(4).Cells(lIndex(4), 1).Value = rng(1).Cells(lIndex(1), 1).Value
End If
' move pointers
For I = 1 To 2
If iActive(I) = 1 Then
' next value, it might has been the following statement:
'lIndex(iActive(I)) = lIndex(iActive(I)) + 1
' but are duplicates so Do...Loop
Do
lIndex(I) = lIndex(I) + 1
J = lIndex(I)
Loop Until rng(I).Cells(J, 1).Value = "" Or _
rng(I).Cells(J, 1).Value <> _
rng(I).Cells(J - 1, 1).Value
End If
Next I
' just in case
DoEvents
Loop Until rng(1).Cells(lIndex(1), 1).Value = "" And _
rng(2).Cells(lIndex(2), 1).Value = ""
' end
X "main end"
' positioning
rng(3).Cells(1, 1).Select
' ranges
For I = 1 To 4
Set rng(I) = Nothing
Next I
' application
With Application
.EnableEvents = True
.Calculation = xlAutomatic
.ScreenUpdating = True
End With
' beep
Beep
X "end"

End Sub

Private Function X(S As String)
Debug.Print Now(), S
End Function

-----
 
Hi Sir,


its working awesome in 2 minutes it done the job.


i have a file in which about 400,000 entries with series.


Barcodes are added on daily basis. I need to find only new barcodes from new file. and in new file all items are mixed.


My old list have all duplicates in new, but new have added some unique barcodes.


The file you provided is showing following result


Added 243980

Deleted 225693


But i think only 5,000 to 10,000 maximum addition.


When i go through the file in added list duplicate barcodes shown.


one barcodes is shown in old list, new list, added list and deleted list.


Regards,

Muhammad Shakeel
 
Hi, inbp!


2 minutes, well, that seems to be a good time regarding your previous ones, but you should change or upgrade your XT :)


The file I provided is the same file as you uploaded, completing the codes up to 400K with "Z & row()", that's to say, from Z1301 to Z400001. In my file there were 2 deletions of codes in old list that I created adding an X to the end of cells A399991 and A399995 (missing in uploaded version, please update those cells or download the file again from same previous link).


Despite of this error, if you get 243980 and 225693 and you think they'r incorrect, please upload the new file with your codes so as to check what figures I'd get when running the macro.


I don't understand that regarding 5K or 10K max addition. If anything works different as stated, please indicate correct value at a helper column of the file.


Regards!
 
@Debraj Roy

Hi!

Actually I've done something alike before and for 750K records it took 1'40", so the process at ianb's machine shouldn't last 40', maybe 1, 2, or 5' in the worst case but not 40' anyhow.

Clues for speed?

Data ordered, no selects neither at sorts nor at assigns, and not much more. Glad you liked it.

Regards!


EDIT:

I forgot: no events triggered, manual calculation and no screen updating.


EDIT:

I forgot again, should have to go for a couple of Carlsberg: most important? no formulas, VBA code... for that number or records, forget searches, filters, ...
 
Hi Sir,


Thanks a lot, its working now 100%, and it took less than one minute. Great thanks sir.


My whole day working is now on minutes.


Regards


Muhammad Shakeel
 
Hi, inbp!

Glad you solved it. Thanks for your feedback and for your kind words too. And welcome back whenever needed or wanted.

Regards!

PS: Try to seem to be doing anything else during your daily work... otherwise the author wouldn't be responsible if someone were fired :)
 
Back
Top