hc-3 - clu files - inconsistency in cluster counts (dataset ec016.233)
Posted by Ruxandra Cojocaru at November 26. 2019
Hello.
From the documentation, I understand that the first line of a clu file is the number of unique clusters found in that file. I tried the UNIX commands given in the crcns-hc3-data-description.pdf page 5 and I could reproduce the behaviour for the files ec012ec.187.clu.1 and ec012ec.189.clu.1 used as an example.
However, for the dataset ec016.233, I found some inconsistencies.
For example:
ec016.233$ head -n 1 *".clu.1"; tail -n +2 *".clu.1" | sort -n | uniq -c
4
606 0
231543 1
267 3
So there should be 4 clusters according to the first line, but only 3 unique cluster IDs (0, 1 and 3) are present.
This also happens in clu.2, clu.3, clu.4, clu.9 and clu.10 of the same dataset.
Is there any reason/explanation for this?
Thank you,
Ruxandra
Re: hc-3 - clu files - inconsistency in cluster counts (dataset ec016.233)
Posted by Ze Henrique Targino at December 16. 2019
Re: hc-3 - clu files - inconsistency in cluster counts (dataset ec016.233)
Posted by Ruxandra Cojocaru at December 19. 2019
I understand, and I believe you are right. I was just a bit confused, because the documentation says the first line is the number of clusters in that session specifically. In any case, I will count them myself, since in this dataset the cluster count in the header of the clu files does not match the number of unique cluster IDs actually present.
Thank you for your reply!
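For anyone who wants to do the same count outside the shell, here is a minimal Python sketch of the check described above. It assumes a clu file is plain text with one integer per line, the first line being the header count and every following line a cluster ID, which is what the head/tail pipeline in the first post relies on. The function names read_clu and summarize_clu are made up for this example and are not part of any hc-3 tooling.

```python
from collections import Counter


def read_clu(path):
    """Read one .clu file: return (header_count, Counter of cluster IDs).

    Assumes one integer per line; the first line is the header count
    and each remaining non-blank line is the cluster ID of one spike.
    """
    with open(path) as f:
        header = int(f.readline().strip())
        counts = Counter(int(line) for line in f if line.strip())
    return header, counts


def summarize_clu(path):
    """Compare the header count with the IDs actually present in the file."""
    header, counts = read_clu(path)
    return {
        "header": header,
        "unique_ids": len(counts),
        "max_id": max(counts, default=None),
        "matches_header": header == len(counts),
        "spikes_per_id": dict(sorted(counts.items())),
    }
```

Calling summarize_clu("ec016.233.clu.1") on the file discussed in this thread should reproduce the mismatch (header 4, three unique IDs), provided the file follows the layout assumed above. Note that in the output posted above the header equals the largest ID plus one (IDs 0, 1, 3 and header 4); whether that holds for the other files is something the max_id field lets you check, not something this thread confirms.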