Josue Feliu Perez: Publications (31)


  1. SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors

    Proceedings - 2024 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024


  1. CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

  2. Cloud White: Detecting and Estimating QoS Degradation of Latency-Critical Workloads in the Public Cloud

    Future Generation Computer Systems, Vol. 138, pp. 13-25

  3. Rebasing Microarchitectural Research with Industry Traces

    Proceedings - 2023 IEEE International Symposium on Workload Characterization, IISWC 2023

  4. Speculative inter-thread store-to-load forwarding in SMT architectures

    Journal of Parallel and Distributed Computing, Vol. 173, pp. 94-106

  5. Thread-to-Core Allocation in ARM Processors Building Synergistic Pairs

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT


  1. A Neural Network to Estimate Isolated Performance from Multi-Program Execution

    Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022

  2. DeepP: Deep Learning Multi-Program Prefetch Configuration for the IBM POWER 8

    IEEE Transactions on Computers, Vol. 71, No. 10, pp. 2646-2658

  3. Effect of Hyper-Threading in Latency-Critical Multithreaded Cloud Applications and Utilization Analysis of the Major System Resources

    Future Generation Computer Systems, Vol. 131, pp. 194-208

  4. The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

    ACM Transactions on Architecture and Code Optimization, Vol. 19, No. 2

  5. VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors

    IEEE Transactions on Computers, Vol. 71, No. 6, pp. 1386-1398


  1. ITSLF: Inter-thread store-to-load forwarding in simultaneous multithreading

    Proceedings of the Annual International Symposium on Microarchitecture, MICRO


  1. Bandwidth-aware dynamic prefetch configuration for IBM POWER8

    IEEE Transactions on Parallel and Distributed Systems, Vol. 31, No. 8, pp. 1970-1982

  2. Precise runahead execution

    Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020

  3. The forward slice core microarchitecture

    Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

  4. Thread Isolation to Improve Symbiotic Scheduling on SMT Multicore Processors

    IEEE Transactions on Parallel and Distributed Systems, Vol. 31, No. 2, pp. 359-373


  1. Precise runahead execution

    IEEE Computer Architecture Letters, Vol. 18, No. 1, pp. 71-74


  1. A workload generator for evaluating SMT real-time systems

    Proceedings - 2018 International Conference on High Performance Computing and Simulation, HPCS 2018

  2. Designing lab sessions focusing on real processors for computer architecture courses: A practical perspective

    Journal of Parallel and Distributed Computing, Vol. 118, pp. 128-139


  1. Improving IBM POWER8 Performance Through Symbiotic Job Scheduling

    IEEE Transactions on Parallel and Distributed Systems, Vol. 28, No. 10, pp. 2838-2851