• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Formula to remove HTML entries in the Workbook

I have a workbook with many tabs. The information in here has been dumped from some web application (my guess). There are thousands of entries. Most of the rows have some html codes in it too e.g. <p><span style="color: black; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; font-size: 16pt; etc. The html text varies and not same for all rows. The only thing I have noticed is most of the rows start with the '<'. I need to know how to clean this up using a formula. Going through this manually is going to be painful and long. :( Please help. Thank you in advance
 
You can try find and replace with the following specification

Open up Find/Replace dialog box by pressing ctrl+h short cut

In the find/ replace dialog box perform the following activities,
Find => <*> ( This expression matches all html elements because they follow this convention of < and > symbols)

Leave blank in Replace textbox and click on Replace all..

This would replace all the html tags and leave the content between those html tags
 
There is a problem with the find and replace method. For e.g. in one cell I have
<p>Under general direction, oversees the administrative functions of the Chairman's Office....</p> Here the replace deletes the whole text, when I only want the <p> at the beginning and end to be removed. In some cells there are only HTML text, so deleting all the text in that row is okay.

Sample attached

Please advise.
 

Attachments

  • Sample.xlsx
    38.2 KB · Views: 4
The other work around which we can give a try here is..

copy all these HTML tags into notepad and save that file as .htm

After creating the html file, open the html file using Excel.

This should nullify all your html tags and you can copy the data as plain text into another excel file
 
This is a huge file with thousands of entries. I tried copying a few rows to textedit, saved as .htm but it just opened with all the tags in it. I use a Mac device.
 
Yipeeee! Thank you SirJB7! You saved the day! It worked. It took out majority of the HTML tags and what is left is something I can live with and clean out.

Appreciate it the help. Thank you
 
Hi, Hazra Hadee!
Yes, a bit of tags are left but nothing that could not be removed manually. Maybe there are a other online converters that work better, I didn't dig deeply, just tested the topmost.
Glad you solved it. Thanks for your feedback and welcome back whenever needed or wanted.
Regar
 
Back
Top