Hi there,
For my thesis I collected data on football players. I have the data in Excel, but it is unstructured. Given the nature of the dataset, I don't want to sort it out by hand. The end result should be a dataset that is ready for statistical analysis. I have three categories: Player_Performance, Player_Profile and Transfer_History. I describe each of them below.
Player Performance
This Excel file contains the players' performance data. As you can see, the player ID and the player name sit on a row above the performance data for the different competitions. All of this should be on the same row: player ID, name, and then the performance data. How can I achieve this?
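A minimal sketch of one way to do this in Python, assuming the sheet has already been read into a list of rows (for example from a CSV export). The layout is an assumption, since the real files are only in the attachment: a "header" row is taken to be any row whose first cell is a purely numeric player ID with the name in the second cell, and every other non-blank row is treated as a performance row belonging to the most recent header. The function name `flatten_player_blocks` and the sample values are hypothetical.

```python
def _text(cell):
    """Return a cell as stripped text; empty cells (None) become ''."""
    return "" if cell is None else str(cell).strip()


def flatten_player_blocks(rows):
    """Prefix each detail row with the player ID and name from the row above.

    Assumption: a header row starts with a numeric player ID, followed by
    the player name. Detail rows must NOT start with a purely numeric cell.
    """
    flat = []
    current = None  # (player_id, name) of the block we are inside
    for row in rows:
        cells = [_text(c) for c in row]
        if not any(cells):
            continue  # skip completely blank rows
        if cells[0].isdigit():
            name = cells[1] if len(cells) > 1 else ""
            current = (cells[0], name)
        elif current is not None:
            flat.append([current[0], current[1], *row])
    return flat


# Hypothetical sample data, not taken from the real files.
rows = [
    ["101", "Player A"],
    ["League X", "2015", 30, 5],
    ["Cup Y", "2015", 4, 1],
    [None, None],
    ["102", "Player B"],
    ["League X", "2015", 12, 0],
]
flat = flatten_player_blocks(rows)
```

If a performance row can itself begin with a number (a season year in the first column, for instance), the `isdigit()` check would misread it as a new player, so the header test would need to be adapted to the actual files.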
Player Profile
The player profile file is the easiest. I want to delete the duplicate information: some players appear twice in the file. I want to delete every row that doesn't start with a player ID. I also want to remove player IDs that aren't followed by any player information.
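The three rules could be sketched roughly like this in Python, again assuming the sheet has been read into a list of rows and that a player ID is a purely numeric first cell (both are assumptions about the files). The function name `clean_profiles` and the sample rows are hypothetical. When a player appears twice, this sketch keeps the first row that actually carries information.

```python
def clean_profiles(rows):
    """Keep one informative row per player ID.

    Drops rows that do not start with a numeric player ID, rows that
    contain an ID and nothing else, and repeated IDs (first one wins).
    """
    seen = set()
    cleaned = []
    for row in rows:
        cells = ["" if c is None else str(c).strip() for c in row]
        if not cells or not cells[0].isdigit():
            continue  # row does not start with a player ID
        if not any(cells[1:]):
            continue  # ID without any player information
        if cells[0] in seen:
            continue  # duplicate player
        seen.add(cells[0])
        cleaned.append(list(row))
    return cleaned


# Hypothetical sample data, not taken from the real file.
profiles = [
    ["101", "Player A", "1990-01-01"],
    ["", "stray note"],
    ["102"],
    ["101", "Player A", "1990-01-01"],
    ["103", "Player C", None],
]
cleaned = clean_profiles(profiles)
```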
Transfer History
The transfer history file has the same layout as the player performance file. The first row of each block contains the player ID and name, and the rows beneath it list the individual transfers. I want every transfer on a row of its own. Each row should start with the player ID, then the name, and then the transfer. This means one player ID can appear on several rows.
In the attachment I have included the files as they are now, plus three files showing what the result should look like.