Recently, deep reinforcement learning has gained attention for solving routing problems due to its ability to learn complex patterns and optimize sequential decision-making. Despite their success in various routing problems, the applicability of learning-based methods on pickup and delivery problems (PDPs) is limited and only addresses one-to-one PDPs and their variants. However, the many-to-many PDP addressed in this paper is a practical variant of the routing problems and finds several applications in logistics and supply chains. We propose a novel reinforcement learning framework empowered by a multi-head heterogeneous attention mechanism (MHHA) and a decoder capable of exploring a diverse set of solutions, namely HAP, to generate efficient solutions for many-to-many PDP. The proposed framework incorporates an encoder-decoder structure designed explicitly for many-to-many PDP. In particular, the encoder consists of the MHHA mechanism to capture the many-to-many relationships and effectively model the flow constraints. Moreover, it is integrated with the multi-solution generator, polynet and masking scheme to generate high-quality diverse solutions. Additionally, we improve the solution quality of HAP using a warm starting variable neighbourhood search. As extensive experimental results demonstrate, the proposed method outperforms state-of-the-art metaheuristic and learning-based approaches. Additionally, in bi-objective (time and gap) comparison, the HAP lies on the Pareto efficient frontier, proving its effectiveness. Moreover, the HAP effectively generalizes to diverse problem sizes, unseen data distributions, benchmark datasets, and also solves one-to-one PDP more effectively than baseline methods, showing its adaptability and robustness. Finally, we conduct ablation studies to justify the proposed design