Researchers at the Southern Astronomical Observing Base of Yunnan Observatories, Chinese Academy of Sciences, including Master’s student REN Siying and Senior Engineer WANG Chuanjun, together with collaborators, have proposed a hybrid indexing method for efficient retrieval of large-scale astronomical star catalog data. The method combines HEALPix sky partitioning with R-Tree indexing, and the team has also applied it to catalog cross-matching, catalog fusion, and asteroid searches. The study shows that the approach reduces the volume of data accessed during retrieval and improves query efficiency, while its file-based implementation makes the system simpler to deploy and maintain. The results were published in the international journal Astronomy and Computing.
As wide-field surveys advance, the volume of astronomical data has grown rapidly, with star catalogs expanding from gigabytes to terabytes and even petabytes. Achieving efficient and scalable data retrieval without relying on complex database systems has therefore become a key challenge in astronomical data management. Traditional methods that rely on a single spatial partitioning scheme or a single tree-structured index often struggle to balance retrieval efficiency against system complexity when handling data at this scale.
To address this, the team developed a file-level hybrid indexing framework. HEALPix is first used to partition the celestial sphere, splitting the original star catalog into multiple sub-files so that a query only needs to open the sub-files covering its region of sky. An R-Tree spatial index is then built inside each sub-file to enable efficient and precise queries on the local data. Together, spatial partitioning and local indexing form a two-level retrieval mechanism: coarse filtering followed by precise retrieval.
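The two-level idea can be illustrated with a minimal Python sketch. This is not the team's implementation: to stay dependency-free, a simple equal-angle RA/Dec grid stands in for HEALPix, an in-memory list per cell stands in for each sub-file, and a plain scan of that list stands in for the per-file R-Tree. The names `CELL_DEG`, `cell_of`, `build_index`, `candidate_cells`, and `cone_search` are hypothetical and chosen only for this example.

```python
import math
from collections import defaultdict

CELL_DEG = 10.0  # assumed coarse cell size; stand-in for a HEALPix resolution


def cell_of(ra, dec):
    """Coarse cell id for a position in degrees (equal-angle grid, not HEALPix)."""
    i = int((ra % 360.0) // CELL_DEG)
    j = min(int((dec + 90.0) // CELL_DEG), int(180.0 / CELL_DEG) - 1)
    return i, j


def angular_sep(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    a1, d1, a2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((a2 - a1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def build_index(rows):
    """Group (ra, dec, name) rows by coarse cell; each group plays the role of a sub-file."""
    cells = defaultdict(list)
    for ra, dec, name in rows:
        cells[cell_of(ra, dec)].append((ra, dec, name))
    return cells


def candidate_cells(ra0, dec0, radius):
    """Coarse filter: every grid cell that could overlap the search cone."""
    n_ra = int(360.0 / CELL_DEG)
    j_lo = cell_of(0.0, max(dec0 - radius, -90.0))[1]
    j_hi = cell_of(0.0, min(dec0 + radius, 90.0))[1]
    if abs(dec0) + radius >= 90.0:
        i_values = range(n_ra)  # cone touches a pole: all RA cells qualify
    else:
        # Half-width in RA of a cone of this radius centred at dec0.
        half = math.degrees(math.asin(
            math.sin(math.radians(radius)) / math.cos(math.radians(dec0))))
        i_lo = math.floor((ra0 - half) / CELL_DEG)
        i_hi = math.floor((ra0 + half) / CELL_DEG)
        i_values = [i % n_ra for i in range(i_lo, i_hi + 1)]  # wrap at RA = 0/360
    return {(i, j) for i in i_values for j in range(j_lo, j_hi + 1)}


def cone_search(cells, ra0, dec0, radius):
    """Precise step: test only the rows stored in the candidate cells."""
    hits = []
    for key in candidate_cells(ra0, dec0, radius):
        for ra, dec, name in cells.get(key, ()):  # an R-Tree lookup would go here
            if angular_sep(ra0, dec0, ra, dec) <= radius:
                hits.append(name)
    return sorted(hits)


# Made-up positions, for illustration only.
stars = [(10.0, 20.0, "A"), (10.2, 20.1, "B"), (200.0, -45.0, "C"),
         (359.9, 0.0, "D"), (0.1, 0.0, "E")]
index = build_index(stars)
print(cone_search(index, 10.0, 20.0, 0.5))
print(cone_search(index, 0.0, 0.0, 0.5))
```

In the sketch, a query opens only the cells returned by `candidate_cells`, which is the role the HEALPix sub-files play in the framework described above. The second query straddles RA = 0°, showing why the coarse filter has to handle wrap-around.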
Experimental results show that, compared with traditional single-level indexing methods, the HEALPix–R-Tree hybrid index offers higher query efficiency and better scalability in typical spatial query tasks such as cone searches. On large datasets, the method reduces both the volume of data accessed and the query time, while avoiding the deployment and maintenance complexity associated with conventional database systems.
The researchers further applied the method to catalog cross-matching and fusion, as well as to efficient searches for moving celestial bodies such as asteroids, illustrating its versatility and application potential. The corresponding retrieval service has been deployed on the data server of the Lijiang Branch of the National Astronomical Data Center, where it provides technical support for online analysis and rapid processing of large-scale data.
The study indicates that a hierarchical indexing strategy based on file management can reduce system implementation costs while maintaining retrieval efficiency, offering an efficient and flexible technical path for small and medium-sized research teams working with massive survey data. The method can also be extended to other large-scale spatial data processing scenarios and serves as a useful reference for astronomical big data management and analysis.
This work was supported by the National Key R&D Program, the National Natural Science Foundation of China, the Yunnan “Xingdian Talent Support Program,” and the Yunnan Basic Research Program.

Figure 1: Comparison of computational cost (left) and query time (right) between the hybrid indexing method and traditional single-level indexing methods. Image by REN.
Contact:
REN Siying
Yunnan Observatories, CAS
E-mail: rensiying@ynao.ac.cn