• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Help to create a formula for extracting surname along with the suffix or double surname.

Kelcher

New Member
Good afternoon. I would appreciate your assistance with creating an Excel formula.


I want to divide a string, which is separated by spaces, in cells of column A into separate columns B, C, and D.

I also want to extract the surname from the name and middle name (if applicable) in column C.
How can I modify the IF formula in column D to achieve this?

Some strings with full names do not have a middle name, so the surname must always appear in column D.
In this case, cells in column C should remain empty.

I also need to ensure that the formula in column D can extract the surname together with any suffixes,
not limited to 8 surname characters.

How can I create a function that automatically caters to different surname strings? D column?

I have more precise information in the attached Excel file.

Thanks in advance for your valuable help.
 

Attachments

  • IF formula - Surname, Suffix,.xlsx
    14 KB · Views: 8
This might work.

In B16:

=TEXTBEFORE(A16," ")

In C16:

=IF(AND(LEN(A16)-LEN(SUBSTITUTE(A16," ","")<=2),RIGHT(A16,1)="."),"",IFNA(TEXTBEFORE(TRIM(SUBSTITUTE(A16,B16,""))," "),""))

In D16:

=TRIM(SUBSTITUTE(A16,B16&" "&C16,""))

Copy down.
 

Attachments

  • IF formula - Surname, Suffix, AliGW.xlsx
    13 KB · Views: 2

Kelcher

Rules are for everyone.
  • Cross-Posting. Generally, it is considered poor practice to cross post. That is to post the same question on several forums in the hope of getting a response quicker.
  • If you do cross-post, please put that in your post.
  • Also if you have cross-posted and get an Solution elsewhere, have the courtesy of posting the Solution here so other readers can learn from the answer also, as well as stopping people wasting their time on your answered question.
As You've read from Forum Rules:
Other sites has basically same rules.
 

Kelcher

Rules are for everyone.
  • Cross-Posting. Generally, it is considered poor practice to cross post. That is to post the same question on several forums in the hope of getting a response quicker.
  • If you do cross-post, please put that in your post.
  • Also if you have cross-posted and get an Solution elsewhere, have the courtesy of posting the Solution here so other readers can learn from the answer also, as well as stopping people wasting their time on your answered question.
As You've read from Forum Rules:
Other sites has basically same rules.
Good morning Vietm.
I'm so sorry for my wrong behavior, broken the rules. I apologize and won't do it again
I want to explain that I was looking for help with artificial intelligence from Copilot, but I couldn't get them to understand my request and I never had a solution.
So I thought it would be better with Excel Specialists and suggested by AI; first I posted my request in Excel Help on Saturday, but I checked it several times and got no response. I thought maybe I had posted it in the wrong place because I just finished introducing myself and after I posted my request, I wrote asking for guidance on the steps to post a question. But they never got back to me.
That's when I remembered that more than 5 years ago I had asked for help in Chandoo, I found the data in an Excel sheet with the answer and data.
Then I went through the process to reset the password and access it again. First I found quickly and cleared the steps to post, and fortunately in less than 12 hours, now in the morning when I turned on my computer, it had been 5 hours since I got the kind response from Aligw and now I only need a small solution to achieve the complete solution.
I will now communicate with Aligw by thanking her help and asking for the last help on how to solve the little issue to complete my task.
I'm sorry again, but it doesn't was my intention to create this trouble by breaking the rules.
Now I know that the correct way should have been to include the attached post I had made in Excel Help and explain that I had not received a response from them.
 
This might work.

In B16:

=TEXTBEFORE(A16," ")

In C16:

=IF(AND(LEN(A16)-LEN(SUBSTITUTE(A16," ","")<=2),RIGHT(A16,1)="."),"",IFNA(TEXTBEFORE(TRIM(SUBSTITUTE(A16,B16,""))," "),""))

In D16:

=TRIM(SUBSTITUTE(A16,B16&" "&C16,""))

Copy down.
Good morning Aligw, thank you so much for your help, these three formulas work great.

I just want to ask you if you can help me in the case where it has only first name and last name and then the formula extracts them together in the last name column. (as in the first data extracted in the table, where it only has a first name and a surname. *Sarah Josep*)
How to apply a conditional, so that in this case if there are only two words, only the second one is extracted, that is, the surname.

I'm very grateful for your time, the help, and the great knowledge you are sharing with me.

You can't imagine how long I've been looking for the solution and with this obstacle to be able to finish the typing process without having to do double work typing the same information on both parts.
 
Aligw: I apologize to you for having made it cross-posted.

But after having sought help from friends and then turned to AI that couldn't help me either.

So I turned to Excel Specialists and at the suggestion of the AI they gave me the link to Excel Help. But even though I asked for help on Saturday, they didn't answer me and then I remembered that I had requested help in 2019 and I found a reference in my Excel sheets with the Chandoo data.
However, I made the mistake of not reporting that I was posted before to Excel Help.
I want to tell you that I appreciate a lot your kind help and time in looking for the solution to my request.
 
I have the tools now but, sadly, not the knowledge!
Regular expressions have turned up in my copy of Excel (insider beta channel) but using them is another matter. I have got as far as returning the first name and the last name with specified postnomials as a dynamic array.
1716931097805.png
Code:
= LET(
    firstName, MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "^\w+", 0))),
    lastName,  MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "(\w+\s\wr\.)|(\s\w+$)"))),
    HSTACK(firstName, lastName)
  )
The first name is just that, some letters anchored to the start of the string.
Similarly, the last name may be letters anchored to the end of the string OR letters followed by a space, a letter, an explicit 'r' and a full stop / period.
At the moment I am not sure how to ensure that the second name is a middle name and not the surname.
 
I have the tools now but, sadly, not the knowledge!
Regular expressions have turned up in my copy of Excel (insider beta channel) but using them is another matter. I have got as far as returning the first name and the last name with specified postnomials as a dynamic array.
View attachment 87350
Code:
= LET(
    firstName, MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "^\w+", 0))),
    lastName,  MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "(\w+\s\wr\.)|(\s\w+$)"))),
    HSTACK(firstName, lastName)
  )
The first name is just that, some letters anchored to the start of the string.
Similarly, the last name may be letters anchored to the end of the string OR letters followed by a space, a letter, an explicit 'r' and a full stop / period.
At the moment I am not sure how to ensure that the second name is a middle name and not the surname.
Thank you Peter for your help and interest to solving my problem.

I think the way to make sure that the second word in the cell is not the middle name but the last name is when they only have two words. Because otherwise, they all have three words: first name, middle name, and surname, and some even have suffixes. So I think that by limiting it to two words you can give a particular characteristic to be in cell B16 and D16.
 
I have the tools now but, sadly, not the knowledge!
Regular expressions have turned up in my copy of Excel (insider beta channel) but using them is another matter. I have got as far as returning the first name and the last name with specified postnomials as a dynamic array.
View attachment 87350
Code:
= LET(
    firstName, MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "^\w+", 0))),
    lastName,  MAP(FullName, LAMBDA(f, REGEXEXTRACT(f, "(\w+\s\wr\.)|(\s\w+$)"))),
    HSTACK(firstName, lastName)
  )
The first name is just that, some letters anchored to the start of the string.
Similarly, the last name may be letters anchored to the end of the string OR letters followed by a space, a letter, an explicit 'r' and a full stop / period.
At the moment I am not sure how to ensure that the second name is a middle name and not the surname.
Thank you, Peter, I can see that there is a lot of work that you are doing to help me, turning up the regular expressions and using the (Insider beta channel) with all these codes and very, very complicated formulas.

I already tried these Regular Expression formulas, but are giving me an error; #NAME?

Also, I don't know the meaning of the letter f, which is after the LAMBDA formula?

Also, I want to tell you that now I have Excel 365. I must update the information about the Excel version.
 
Last edited:
I just want to ask you if you can help me in the case where it has only first name and last name and then the formula extracts them together in the last name column. (as in the first data extracted in the table, where it only has a first name and a surname. *Sarah Josep*)

Did you not look at the workbook? This is what my formula does. In the attached I have removed everything except my solution - tell me where it fails. The results are what you said you wanted.

AliGW on MS365 Beta Channel (Windows 11) 64 bit


A
B
C
D
15
FULL NAMEFIRST NAMEMIDDLE NAMELAST NAMES
16
Sarah JosephSarahJoseph
17
Derval E. PrinceDervalE.Prince
18
John Nathaniel KingJohnNathanielKing
19
Yeisa Amanda Anderson ThinneryYeisaAmandaAnderson Thinnery
20
Scarville Hamilton Sr.ScarvilleHamilton Sr.
21
Govan James Jr.GovanJames Jr.

Sheet: Sheet4
 

Attachments

  • IF formula - Surname^J Suffix^J AliGW.xlsx
    10.9 KB · Views: 2
@AliGW
If your solution is not there, it is very close. The only query I would have is, in the case of
"Yeisa Amanda Anderson Thinnery",
does one assume "Anderson Thinnery" to be a last name or should the "Anderson" be dropped?

Out of interest, I tried refactoring your solution as a single array formula,
Code:
=LET(
    firstName,  TEXTBEFORE(fullName, " "),
    remaining,  TRIM(SUBSTITUTE(fullName, firstName, "")),
    condition,  (LEN(fullName) - LEN(SUBSTITUTE(fullName, " ", "") <= 2)) * (RIGHT(fullName, 1) = "."),
    middleName, IF(condition, "", IFNA(TEXTBEFORE(remaining, " "), "")),
    lastName,   TRIM(SUBSTITUTE(fullName, firstName & " " & middleName, "")),
    result,     HSTACK(firstName, middleName, lastName),
    result
)

The significance of the 'condition' is not too clear to me but it seems to do what you require.
Now to return to REGEXEXTRACT with a few more ideas!
 
does one assume "Anderson Thinnery" to be a last name or should the "Anderson" be dropped?

One does not assume anything - one consults the workbook in the OP's opening post (see below).

I have produced a formula that does EXACTLY what he showed in that workbook as his requirements. It has been dismissed without saying why it won't do - very frustrating when you've taken time to help. So I'm now out.

AliGW on MS365 Beta Channel (Windows 11) 64 bit


A
B
C
D
E
5
Yeisa Amanda Anderson ThinneryYeisaAmandaAnderson Thinnery<< This cell is OK because som lantin names has two Surnames

Sheet: Sheet4
 
@Peter Bartholomew

The dynamic formula can be made much shorter than your version:

Code:
=LET(
r,A16:A21,
fn,TEXTBEFORE(r," "),
mn,MAP(r,fn,LAMBDA(x,y,IF(AND(LEN(x)-LEN(SUBSTITUTE(x," ","")<=2),RIGHT(x,1)="."),"",IFNA(TEXTBEFORE(TRIM(SUBSTITUTE(x,y,""))," "),"")))),
ln,TRIM(SUBSTITUTE(r,fn&" "&mn,"")),
HSTACK(fn,mn,ln)
)
 

Attachments

  • IF formula - Surname^J Suffix^J AliGW.xlsx
    11 KB · Views: 1
I have produced a formula that does EXACTLY what he showed in that workbook as his requirements.
You are correct. I misread the original table as showing what the OP requires rather than highlighting an error. Please accept my apologies. I am not convinced that there is a generally applicable logic that discriminates between multiple given names and double barrelled surname unless the latter is hyphenated but that is a problem for the OP.

I am not too troubled about brevity in terms of character count, but accept that the logic should be direct as well as being readable. There have even been occasions in which I have doubled or trebled the character count in order to introduce more descriptive names or text for comments! The one term in your formula that I failed to understand is
Code:
LEN(x)-LEN(SUBSTITUTE(x," ","")<=2)
It appears to do what you intend though!

I picked up this problem as an opportunity to try out regular expressions. Getting the balance right between hugely complicated regular expressions with multiple capture groups and using simpler RegEx but supported by Excel formulae appears to be a challenge! I finished up with a worksheet formula
Code:
=LET(
    firstName, MAP(fullName, ExtractNameλ(1)),
    middleName, MAP(fullName, ExtractNameλ(2)),
    lastName, MAP(fullName, ExtractNameλ(-1)),
    HSTACK(firstName, middleName, lastName)
)
which is deceptively simple, but supported by a Lambda function that hides the complexity
Code:
ExtractNameλ
=LAMBDA(k,
    LAMBDA(f,
        LET(
            regex, "((\w+|\w\.)\s(?!\wr\.$))|(\w+\s\wr\.$|\w+$)",
            names, REGEXEXTRACT(f, regex, 1),
            nameCount, COUNTA(names),
            IF(nameCount > k, CHOOSECOLS(TOROW(names), k), "")
        )
    )
)
As always, the catch is that the solution is a nested array, which is not supported at present!
 

Attachments

  • IF formula - Surname^J Suffix^J AliGW.xlsx
    14.7 KB · Views: 0
LEN(x)-LEN(SUBSTITUTE(x," ","")<=2)

This was to ensure that two names were split preoperly as first name and surname. I am sure you know how to manipulate the formula to see exactly what that bit is doing.

I am not convinced that there is a generally applicable logic ...

Nor am I, but you can only build solutions based on the sample data and expected outcomes provided. It will fall over with a bigger dataset, I am sure, but as you say, that's the OP's problem.

I am not too troubled about brevity in terms of character count,

It's not a competition, so neither am I, but there is efficiency to consider, too.
 
FYI

The OP has now accepted my solution as working for their purposes (over at ExcelForum), and has marked the thread there as solved.
 
Back
Top