• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Compare Cells From One Column To Another + Delete

excelnub

New Member
I don't have a better way to describe that, so hopefully I can explain it better here.

So I have a list of 1,000+ URLs and another list of 300+ root domains. The 300+ root domains have already been dealt with and I don't need them in the 1,000+ URL list anymore.

I have a list of all the root domains from the 1000+ list in one column. I have a list of the other 70+ domains in another column. I'd like to remove all instances of the domain in column 2 from column 1.

It's tough to explain, but here's a screen shot example:

cells.jpg


So since poker.net is in the URL 2 column, it'd remove cells A2-4 (or replace their url with a string that makes it easy for me to Search > Select All and then manually delete or whatever), and it'd remove A5 as well since wheep.net is listed. The rest would remain untouched. Is this doable?
 
Hi Excelnub

The picture did not upload but if I understand you correctly you want to delete all instances of the second List (List B) from the first List (List A). Question, will there be instance in List B that are not in List A?

It does not matter really. What you should do is use a vlookup. In Col C

=Vlookup(List2 cell, List A Range, 1,0)

a workins solution

=vlookup(B2,A2:A100,1,0)


Use that Filter method we spoke of in the last post to isolate all items which do not equal NA# and now delete the lines. Get used to using Filters as they make Excel a breeze!!!

Take care

Smallman
 
That's weird, the image shows for me. The raw URL is here: http://s21.postimg.org/7mldo7vs7/cells.jpg

Yes, I want to delete all instances of List B from List A, but we're using root domains in List B and full URLs in List A, so it's tricky. So basically if List B has www.chandoo.org in it and List A has forum.chandoo.org, I want that deleted. So any cell which has any instance of 'chandoo.org' in it should either get deleted or replaced in some way to make it obvious.

How do I use vlookup?
 
Hi

How do I use vlookup?

Copy the example I gave you above into Row C2. Now put the same data in A2 and B2. Now create a small list and lookup a value in a small list.

Will the term you are looking up always be between the dots. www.chandoo.org

Where Chandoo is between the two dots?

Smallman
 
I'm sorry, but I don't understand what you're asking me to do.

"Copy the example I gave you above into Row C2."

Okay, I put =Vlookup(List2 cell, List A Range, 1,0) into C2. It came back with an error: #NAME?

"Now put the same data in A2 and B2. Now create a small list and lookup a value in a small list."

I don't know what you're asking me to do. I'm pasting that code into A2 and B2 and overwriting the domain values?

The term will be the first dot, the domain (chandoo) and the domain extension. So I'd look up chandoo.org (this way it removes subdomains like forum.chandoo.org and not just www.chandoo.org/whatever).
 
Hey Smallman,

I looked at your sheet you provided. So it looks like your C code strips out the text in between . and . and compares that into the A column? Seems like a solid fix, although we'd need to include the domain extention and not just the domain name in between the 2 .'s. For instance, we'd need it to find all instances of carsales.com and not just carsales, because a URL might be www.greatcarsales.com and we don't want that to be deleted.
 
Can you include some examples in a file and upload it. Show examples of what should be deleted and what should not so we can see the patterns and suggest a solution.

Take care

Smallman
 
Sure. Although I think your example worked how I'd want it to, only it didn't include the domain extension (.com, .org, etc.) only the URL name between the . and .'s.

I've uploaded the spreadsheet I used earlier.

Column A: My main index of all URLs.
Column B: The list of root domains that should be deleted from Column A.
Column C: The updated list (Column A minus root domains from Column B).

I just need to wind up with a copy of Column A without the root domains in Column B. So we'd wind up with Column C somewhere.
 

Attachments

Hey Smallman,

Thanks a LOT. It seems like this is definitely what I need. There's one issue I noticed if you don't mind me mentioning it!

It seems like it only returns a response based on how many of the root domains are in the root list. For example, there's 3 poker.net URLs in Column A, but it only returns the first one and only one.

The URL list I have with 1000+ URLs has multiple URLs from the same root name. Is there a way to fix this?

Let me know if this can be fixed.

Thanks!
 
Sorry I thought they were unique. My mistake. I will have a look at the file and correct this assumption so it works on all.

Take care

Smallman
 
Hi, excelnub!

Give a look at these files, 2003 xls version and 2007+ xlsx version:
https://dl.dropboxusercontent.com/u/60558749/Compare Cells From One Column To Another + Delete - example_removal (for excelnub at chandoo.org).xls
https://dl.dropboxusercontent.com/u/60558749/Compare Cells From One Column To Another + Delete - example_removal (for excelnub at chandoo.org).xlsx

They use a helper column D and 3 dynamic named ranges for columns A, B & C as follows:
URL1List: =DESREF(Sheet1!$A$2;;;CONTARA(Sheet1!$A:$A)-1;1) -----> in english: =OFFSET(Sheet1!$A$2,,,COUNTA(Sheet1!$A:$A)-1,1)
RootList: Same with column B references
HelpList: Same with column C references

Column C formulas:

2003: =SI(ESERROR(BUSCARV(IZQUIERDA(A2;HALLAR("/";A2&"/";8)-1);"http://www."&RootList;1;FALSO));MAX(D$1:D1)+1;BUSCARV(IZQUIERDA(A2;HALLAR("/";A2&"/";8)-1);"http://www."&RootList;1;FALSO)) -----> in english: =IF(ISERROR(VLOOKUP(LEFT(A2,SEARCH("/",A2&"/",8)-1),"http://www."&RootList,1,FALSE)),MAX(D$1:D1)+1,VLOOKUP(LEFT(A2,SEARCH("/",A2&"/",8)-1),"http://www."&RootList,1,FALSE))

2007: =SI.ERROR(BUSCARV(IZQUIERDA(A2;HALLAR("/";A2&"/";8)-1);"http://www."&RootList;1;FALSO);MAX(D$1:D1)+1) -----> in english: =IFERROR(VLOOKUP(LEFT(A2,SEARCH("/",A2&"/",8)-1),"http://www."&RootList,1,FALSE),MAX(D$1:D1)+1)

Column D formulas:

2003: =SI(ESERROR(INDICE(URL1List;COINCIDIR(FILA()-1;HelpList;0)));"";INDICE(URL1List;COINCIDIR(FILA()-1;HelpList;0))) -----> in english: =IF(ISERROR(INDEX(URL1List,MATCH(ROW()-1,HelpList,0))),"",INDEX(URL1List,MATCH(ROW()-1,HelpList,0)))

2007:=SI.ERROR(INDICE(URL1List;COINCIDIR(FILA()-1;HelpList;0));"") -----> in english: =IFERROR(INDEX(URL1List,MATCH(ROW()-1,HelpList,0)),"")

Just advise if any issue.

Regards!

PS: Please take :D as : D (without space) due to icon features.
 
I did attach the file, it said 100% complete and posted to the thread. I then pullud up stumps for the night. It is odd that the text got uploaded and not the file. Now the file is at home and I am at work, typical. I will upload the file this evening if I don't get a chance to recreate it today.

Take care

Smallman

EDIT

Here is the file as promised. Hopefully it attaches first time.
 

Attachments

Hey Guys,

Thanks for all the help! A user on another forum sent me code which does exactly what we need. If you're curious, his code checks the list of roots to the list of URLs and returns a number next to the URL if it's there and N/A if not. That way I can just sort that column by number and delete all of the duplicates.

I've attached the spreadsheet for you guys to look at if you want, although the ROOT column probably won't work for you because it utilizes an add-in I have. You can manually type them in though!

Smallman, thanks a lot for your help!

Cheers!
 

Attachments

Hi, excelnub!
Have you downloaded and tested my uploaded file (.xls and .xlsx versions)? :( It just does the same, a bit smarter as it avoids the error values. :)
Regards!
 
No worries sorry about not uploading a file. I didn't realise. My procedure uses vba to trap unique instances then just zapps all those unique instances using a filter. I would do this manually if it was me but it is nice to have the option.

Take care

Smallman
 
@SirJB

That is subjective. The error Values are what you want cause they provide a grouping of sorts. You just filter by the non error values which is everything and delete those rows in one swoop with the filter. Can't get easier than that.

Smallman
 
@Smallman
Hi!
Neither need nor want to argue, surely it's like you say. And surely too your missing file would have done it much better indeed... but it wasn't uploaded. Sorry.
Regards!
PS: Why should I be remembering b(ut)ob(ut)hc's signature?
 
Back
Top