Big data preparation

Bmo89

New Member
Hi there,
For my thesis I collected data on football players. The data is in Excel files, but it is unstructured, and given the nature of the dataset I don't want to sort it out by hand. The goal is a dataset that is ready for statistical analysis. There are three categories: Player_Performance, Player_Profile and Transfer_History. I describe each of them below.

Player Performance
This Excel file contains the players' performance data. As you can see, the player ID and the player name sit on a row above that player's performance data for the different competitions. All of this should be on the same row: player ID, name, and then the performance data. How can I achieve this?
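To make the target layout concrete, here is a rough Python sketch of the transformation I have in mind. It is only an illustration under assumptions: I assume a "header" row has a purely numeric player ID in its first cell and the name in its second, and that the performance rows beneath it have an empty first cell. The function name `flatten_player_blocks` and the sample values are made up, and the real files may be laid out differently.

```python
def _text(cell):
    """Return a cell as stripped text; empty cells (None) become ''."""
    return "" if cell is None else str(cell).strip()


def flatten_player_blocks(rows):
    """Prefix every detail row with the player ID and name from the header row above it.

    Assumption (hypothetical layout): a header row starts with a numeric
    player ID followed by the name; detail rows have an empty first cell.
    """
    flat = []
    current_id = None
    current_name = None
    for row in rows:
        cells = [_text(c) for c in row]
        if not any(cells):
            continue  # skip completely blank rows
        if cells[0].isdigit():
            # Header row: remember the player for the detail rows that follow.
            current_id = cells[0]
            current_name = cells[1] if len(cells) > 1 else ""
            continue
        if current_id is None:
            continue  # detail row before any header row: nothing to attach it to
        flat.append([current_id, current_name] + cells[1:])
    return flat


# Made-up sample data, only to show the shape of the input.
sample = [
    ["101", "Player A", ""],
    ["", "League X", "30"],
    ["", "Cup Y", "4"],
    ["202", "Player B", ""],
    ["", "League X", "12"],
]
for line in flatten_player_blocks(sample):
    print(line)
```

Because a player with several detail rows ends up with several output rows, the same idea should also cover a file where each player has a list of entries under one header row.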
Player Profile
The player profile file is the easiest. I want to remove the duplicate information, since some players appear twice in the file. I want to delete every row that doesn't start with a player ID, and I also want to remove player IDs that aren't followed by player information.
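The three cleanup rules above could look roughly like this in Python. Again, this is a sketch under assumptions: I assume a player ID is a purely numeric first cell, and I read "not followed by player information" as "every other cell on that row is empty". The function name `clean_profiles` and the sample rows are hypothetical.

```python
def _text(cell):
    """Return a cell as stripped text; empty cells (None) become ''."""
    return "" if cell is None else str(cell).strip()


def clean_profiles(rows):
    """Keep one row per player ID, dropping non-ID rows and empty profiles."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        cells = [_text(c) for c in row]
        if not cells or not cells[0].isdigit():
            continue  # rule 1: row does not start with a player ID
        if not any(cells[1:]):
            continue  # rule 2: ID with no player information (assumed: same row)
        if cells[0] in seen_ids:
            continue  # rule 3: player already kept, drop the duplicate
        seen_ids.add(cells[0])
        cleaned.append(cells)
    return cleaned


# Made-up sample data, only to show the shape of the input.
sample = [
    ["101", "Player A", "1990"],
    ["Name", "Born", ""],
    ["202", "", ""],
    ["101", "Player A", "1990"],
    ["303", "Player C", "1995"],
]
for line in clean_profiles(sample):
    print(line)
```

This keeps the first occurrence of each player and drops later ones; keeping the last or the most complete row instead would need a different rule.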
Transfer History
The transfer history is structured like the player performance file. The first row contains the player ID and name, and the different transfers are listed beneath it. I want every transfer on a row of its own, starting with the player ID, then the name, and then the transfer. So one player ID may appear on several rows.

I have attached the files as they are, plus three files showing how the result should look.
 
