I'm doing some data analysis and extraction on a large number of CSV files (read: several hundred). Because I don't have the time or knowledge to build an access database, here is what I've done so far:
1. I built a template spreadsheet, with all the formulas I need to use pre-populated, and this way I can open the each CSV file, copy the data, and paste it in. This data runs out to column G, and an initial calculation is performed, with the results going into column H.
2. This is where I run into trouble. I need to extract the top and bottom ten values from column H, put them in a column (in order), and also add the data from columns A and F in adjacent columns. I am using the LARGE and SMALL functions to grab the min/max values from column H, and INDEX/MATCH to grab the same-row values from columns A and F:
=LARGE/SMALL(H2:H3500,1....10)
=INDEX($A$1:$A$3500,MATCH(M3,$H$1:$H$3500,0))
=INDEX($F$1:$F$3500,MATCH(M3,$H$1:$H$3500,0))
My problem is that the CSV files I'm pulling this data from vary widely in the number of rows, anywhere from 300 to more than 2000. Because I don't know exactly how many rows I'm dealing with, I ran the column H calculations all the way down to row 3500, because I do know for a fact that these data will not exceed that many rows. However, there will always be cells in column H that have no basis for calculation, resulting in a #DIV/0 error, which carries over to the two LARGE/SMALL tables. Help?
(Bonus points if you can show me how to filter out all values less than a threshold number)
1. I built a template spreadsheet, with all the formulas I need to use pre-populated, and this way I can open the each CSV file, copy the data, and paste it in. This data runs out to column G, and an initial calculation is performed, with the results going into column H.
2. This is where I run into trouble. I need to extract the top and bottom ten values from column H, put them in a column (in order), and also add the data from columns A and F in adjacent columns. I am using the LARGE and SMALL functions to grab the min/max values from column H, and INDEX/MATCH to grab the same-row values from columns A and F:
=LARGE/SMALL(H2:H3500,1....10)
=INDEX($A$1:$A$3500,MATCH(M3,$H$1:$H$3500,0))
=INDEX($F$1:$F$3500,MATCH(M3,$H$1:$H$3500,0))
My problem is that the CSV files I'm pulling this data from vary widely in the number of rows, anywhere from 300 to more than 2000. Because I don't know exactly how many rows I'm dealing with, I ran the column H calculations all the way down to row 3500, because I do know for a fact that these data will not exceed that many rows. However, there will always be cells in column H that have no basis for calculation, resulting in a #DIV/0 error, which carries over to the two LARGE/SMALL tables. Help?
(Bonus points if you can show me how to filter out all values less than a threshold number)