• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Sorting a list of comma-separated lists

Peter Bartholomew

Well-Known Member
What ideas do you have on reducing the content of a set of comma-separated lists to a single column range (sorted and without duplicates if you choose).
I am excluding VBA, PQ and manual operations such as 'Text to Columns'.

I have included a solution to explain the situation. My solution is lightly protected so it should not be immediately obvious by inspection. I am interested to see what solutions are out there.

62046
 

Attachments

vletm

Excel Ninja
Peter Bartholomew
The first idea was
... that those sets has done,
because there are same first name students.
That means, there are no duplicate students.
I would use VBA to split those rows and after that sorting it.
I would always use also last names with first names with my list to avoid mistake to loose any student from the final list.
 

AliGW

Active Member
I am excluding VBA, PQ and manual operations such as 'Text to Columns'.
Why? :confused:

Is this a query or just a challenge?

PS Your student list has leading spaces - you might wish to get rid of those. ;)
 
Last edited:

GraH - Guido

Well-Known Member
I don't have all functions on board, and since I'm lazy would always use PQ for this, @Peter Bartholomew.
perhaps this could work?
=index(FILTERXML("<a><b>"&SUBSTITUTE(textjoin(",";true;Students);",";"</b><b>")&"</b></a>";"//b");row(a1)
 

Peter Bartholomew

Well-Known Member
Why? :confused:

Is this a query or just a challenge?

PS Your student list has leading spaces - you might wish to get rid of those. ;)
It was part of a query I answered on Microsoft TechCommunity. The question was how to build a list of dates (row numbers) associated with each name. It turned out that getting the unique list of names in the first place was the hardest part. I came up with a solution but it involved some lateral thinking. Before embarking on a session of egotistical self-congratulation, I thought I would post it here as a challenge in case someone else comes up with a simpler approach. You already get kudos for being more observant than me, I returned duplicate names before realising that the data needed cleaning; an embedded TRIM sorted the problem.

As an aside, how are you getting on with dynamic arrays? I have been using them continuously (spreading even more confusion than is my norm).
I find that they support a very different thought process in which the objects used to construct solutions are entire arrays, occasionally broken down into a set of rows or columns, but never the individual cell. "... and copy down" has virtually disappeared from my vocabulary.
 

GraH - Guido

Well-Known Member
Went along with the challenge:
  1. since I don't have textjoin in my armory, I concatenated the values in the total table row
    • -> name manager: vAllStudents
  2. this variable was altered in a list of names with the formula:
    • =FILTERXML("<a><b>"&SUBSTITUTE(vAllStudents;",";"</b><b>")&"</b></a>";"//b")
    • -> name manager: ListStudents
  3. Then made the list using (D6 down):
    • =IF(COUNTA(ListStudents)<ROW(D1);"";INDEX(ListStudents;ROW(D1)))
  4. Made a second list called "ListExtracted" using:
    • =OFFSET(Sheet1!$D$6;;;COUNTA(ListStudents))
  5. Finally in F filtered/sorted the list with this array formula:
    • =IFERROR(INDEX(ListExtracted;MATCH(0;COUNTIF(ListExtracted;"<"&ListExtracted)-SUM(COUNTIF(ListExtracted;D$5:D5));0));"")
Would have liked to get there with only using "ListStudents", but that got me nowhere.

EDIT: just saw, I'm missing a bunch of names.... :oops:
 

Attachments

Peter Bartholomew

Well-Known Member
@GraH - Guido
That is quite an achievement! Especially handicapped by a dated toolset (no TEXTJOIN!).
I did think of FILTERXML but I have never used it and the help page was about as much use as a chocolate teapot. I understand the xml that you have built well enough but what is the "//b" that appears as the xpath parameter?

I think that once you had got to the point of a list, duplicates and all, I would consider the problem solved. The rest, writing over-sized ranges of formulas and counting occurrences are just sordid workarounds compensating for the lack of a FILTER array function, just annoying. There is not even any point in upgrading to Office 2019; maybe revert to Google Sheets until dynamic arrays appear in a static Office version!
 

AliGW

Active Member
As an aside, how are you getting on with dynamic arrays?
I'm not yet, but once the number of people using Office 365 reaches tipping point (I've seen a marked increase over the past month or two), then I'll start playing.

However, I am getting on really well with PQ, and that's what I'd have used for this. It is my go-to tool of choice at the moment.
 

GraH - Guido

Well-Known Member
I did think of FILTERXML but I have never used it and the help page was about as much use as a chocolate teapot. I understand the xml that you have built well enough but what is the "//b" that appears as the xpath parameter?
I don't have a clue! I simply remember an earlier challenge where it was used and thought it could be helpful in solving this one. Thus replicated without giving it a second thought.
The second part (sorting/filtering) came from Mike Girvin, I also remembered there was video on it. I have made a mistake somewhere in my file, because in his sample file with your list it is simply working. (Have to imagine some swearwords to understand my tone...)
So less impressive, hey, now, as I'm just stealing away solutions everywhere. Told ya, I'm the lazy kind :p

EDIT: found the mistake. I was counting from the wrong list. Proofs your point of cell referencing, right?

So the correct formula is:
=IFERROR(INDEX(ListExtracted;MATCH(0;COUNTIF(ListExtracted;"<"&ListExtracted)-SUM(COUNTIF(ListExtracted;F$1:F1));0));"")
 

Attachments

Last edited:

Peter Bartholomew

Well-Known Member
Hi Guido
Especially chosen for you!
All you could want to know about Belgian waffles and more.

The attached reads some xml from the w3schools.com website using WEBSERVICE() and then extracts data using the xPath parameter in FILTERXML().
I used CSE arrays but the behaviour is so much nicer using dynamic arrays. Apparently the double-slash as in "//food/price" means any instance of the node irrespective of where it sits in the tree.

Knowing your liking for PQ, I also extracted the information from the string to a table using parse xml.
This may be pretty routine and even boring for others but it was all new to be.
 

Attachments

Last edited:

Peter Bartholomew

Well-Known Member
Since I see no queue waiting to offer fresh insight on the problem, I will outline the solution I came up with.

I started by trying to separate the comma separated text from each cell but then changed my mind and used TEXTJOIN to form a single string before processing it. At this stage, I had formed the text string
Sam, Tim, Max, Ronny, Rhea, Sam, Tim, Joan, Peter, Eric, Ronny, Sam, Julie, Max
It seems that this was also used by others, Guido included.

The next idea was to convert this list to the format of an array constant. I set the calculation up as a named formula 'string' that refers to
= "{""" & SUBSTITUTE( TEXTJOIN( ", ", 1, Students ), ",", """;""" ) & """}"
The multiple double quotes are something of a nightmare but that's the way it is. If one debugs the expression by using the formula
= string
the result is
{"Sam";" Tim";" Max";" Ronny";" Rhea";" Sam";" Tim";" Joan";" Peter";" Eric";" Ronny";" Sam";" Julie";" Max"}

I then defined a further named formula 'split' that referred to the Macro2 function EVALUATE
= EVALUATE(string)


The final steps of trimming spaces, extracting a list of distinct values, and sorting were achieved with the named formula 'distinct'
= SORT( UNIQUE( TRIM(split) ) )

The workbook shows the output for
= split
and
= distinct

I have used a CSE array for 'split' rather than a dynamic array to reduce Excel version dependence, though not sufficiently for Guido as he doesn't have TEXTJOIN either :confused:.

I seems to work well enough but it is far from a light-weight solution and I wonder where I have missed a simpler idea somewhere along the line.

_____________________________________________
p.s. The workbook is as it was before with the exception that I have added some annotation and run a macro to make the defined names visible
Code:
Sub Unhide()
Dim n As Name
For Each n In ActiveWorkbook.Names
    If Left(n.Name, 1) = "_" Then
        n.Visible = False
    Else
        n.Visible = True
    End If
Next
End Sub
 

Attachments

Peter Bartholomew

Well-Known Member
@XOR LX
Wow, that was quite a tour de force! I suspect that the existence of modern functions such as TEXTJOIN and now FILTER and SEQENCE would simplify the implementation significantly. Of course, backward compatibility would then be a major problem. My formula
= EVALUATE(string)
has a measure of elegance but, even so, I was not very happy with using an undocumented feature left over from a past era. I had forgotten your idea of padding with spaces and then chopping the resulting string at regular intervals.

An idea I had toyed with was that of using SUBSTITUTE to replace each occurrence of a space using, an array for the instance number, and then using SEARCH to return the corresponding array of locations of the space character.
 

p45cal

Well-Known Member
I don't have FILTER, UNIQUE, SORT, SEQUENCE, but I do have TEXTJOIN.
In any cell in row 6:
Code:
=MID(SUBSTITUTE(SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ",""),",","¬",ROW()-5),FIND("¬",SUBSTITUTE(SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ",""),",","¬",ROW()-5))+1,FIND(",",SUBSTITUTE(SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ",""),",","¬",ROW()-5),FIND("¬",SUBSTITUTE(SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ",""),",","¬",ROW()-5)))-FIND("¬",SUBSTITUTE(SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ",""),",","¬",ROW()-5))-1)
copied down as far as needed will convert them into a non unique/distinct list. I realise I could have used -ROW($A$5) instead of -5 throughout to make it easy to put on any row.
 

Peter Bartholomew

Well-Known Member
@p45cal
Thank you for your solution. I can confirm that it works!
It must have required patience and fortitude (as well as a few applications of cut and paste?) to put the solution together.

In order to analyse your formula, I applied names to elements of the formula in order to reduce it to a sequence of steps I could follow. I sincerely hope you find the process of interest rather than simply objecting to it as interference!

The first repeating element that was visible in your formula was the concatenated list of students, 'concatenation', that refers to
= SUBSTITUTE("," & TEXTJOIN(",",TRUE,Students) & ","," ","")

The next was a row index that I called 'k'
= ROW() - ROW(header)

where ROW(header) = 5

Next I created the character pointer 'p' which gives the first letter of each student name
= 1 + FIND("¬", SUBSTITUTE(concatenation, ",", "¬", k ) )

I tried two formulas for the pointer 'q' that identifies the trailing comma following each name. Your version was
= FIND( ",", SUBSTITUTE(concatenation, ",", "¬", k ), p )
which involved nesting the formula 'p' to provide a starting point for the search. The other approach I tried was to duplicate the formula for 'p' but seek the 'k+1'st occurrence of ',' using
= FIND("¬", SUBSTITUTE(concatenation,",","¬", k+1 ) )
Both worked.

The final worksheet formula was
= MID( concatenation, p, q-p )
I dropped the replacement of a ',' by '¬' because it makes no difference,

Altogether, a good formula that meets the spec without requiring EVALUATE. Also, your original did not require as many words to explain it :)

Just to 'gild the lily' I introduced a variant that started with a name k′ (k-prime) that refers to
= SEQUENCE(14)
which leads to the spilt array solution
= UNIQUE( MID( concatenation, p′, q′-p′ ) )
 

Attachments

Top