• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Data Extraction in Excel

hgopalan

New Member
I have a complex set of data which looks like xml.

I need to extract the data in the following format


<Metamorph dxp1:category="ABC" xmlns:dxp1="http://alibaba.com/ab/2012/01/" xmlns="http://apple.com/ab/2012/01/search_request"><d2p1:Or><d2p1:Value>BL:31</d2p1:Value><d2p1:Value>M:B</d2p1:Value></d2p1:Or></MetaDataConstraint>


from the above text: I have to extract the item which occurs after category = "ABC". It comes with quotes.


2) I need to extract the codes between Values: BL:31, M:B and put them in separate columns of the same row.


Is it possible, if yes please guide me.


I have 1000's of similar lines to go over with.
 
Cannot try right now, but how about import as text or text to column, and you define these two (BL:31 and M:B) as separators? First one of the two (BL:31) and you get two columns. then you do the same with the second column, this time using M:B as separator. Result should be in the middle column after.
 
Hgopalan


Firstly, Welcome to the Chandoo.org forums.


Q1: =RIGHT(A1,LEN(A1)-FIND("ABC",A1)-4)


Q2: =MID(A1,FIND("BL:31",A1)+5,FIND("M:B",A1)-FIND("BL:31",A1)-5)
 
Andreas: Thank you for your assistance

*************************************************************************************


Hui: The first one almost worked but I am getting the residue information too after "ABC"

like: "ABC" xmlns:dxp1="http://alibaba.com/ab/2012/01/" xmlns="http://apple.com/ab/2012/01/search_request"><d2p1:Or><d2p1:Value>BL:31</d2p1:Value><d2p1:Value>M:B</d2p1:Value></d2p1:Or></MetaDataConstraint>


I want only the item between the quotes "ABC".


I can probably run a text to column for it.


But if I have multiple categories in a given row how to tackle it?

=======================================================================================

Q2: the values: BL:31 are not consistent, it can vary and have 100's of different options. I need a similar logic to just extract BL:31, M:B. There will be multiple values in a given row.
 
Hgopalan

I misread your requirements


For

ABC try:
Code:
=MID($A1,FIND("category=",$A1)+10,FIND("xmlns:",$A1)-FIND("category=",$A1)-12)

BL:31 try: =MID($A1,FIND(":Value>",$A1)+7,FIND(":Value><",$A1)-FIND(":Value>",$A1)-13)

M:B try: =MID($A1,FIND(":Value>",$A1,FIND(":Value>",$A1,FIND(":Value>",$A1)+1)+1)+7,FIND("</",$A1,FIND(":Value>",$A1,FIND(":Value>",$A1,FIND(":Value>",$A1)+1)+1)+7+1)-FIND(":Value>",$A1,FIND(":Value>",$A1,FIND(":Value>",$A1)+1)+1)-7)


If there are more than 2 sets of <d2p1:Value>XXXX</d2p1:Value> pairs

I would suggest using Text to Columns and using "d2p1:Value>" as a column delimiter


Is there any chance the source can be reconfigured to give you a better output?
 
Thank you Hui, The first one works perfect.

Need to figure out how to get the multiple values.

Data formatting is not an option, I wish it were.
 
Hello Gopal,


Assuming data starts from A2 to down.


To get the values between 'quotes' try this in B2 & copy across & down.


=TRIM(MID(SUBSTITUTE($A2,CHAR(34),REPT(" ",LEN($A2))),LEN($A2)*(COLUMNS($B2:B2)*2-1),LEN($A2)))


Regards,

Haseeb
 
Back
Top