Modern thread-level speculation (TLS) libraries allow arbitrary data structures to be handled speculatively. This desirable feature comes at a high cost in local store and/or remote recovery times: the easier the local store, the harder the remote recovery. Unfortunately, both times are on the critical path of any TLS system. We have proposed a solution that performs local stores in constant time, while recovering values in a time on the order of T, where T is the number of threads. This solution, together with some additional improvements, makes the difference between slowdowns and noticeable speedups in the speculative parallelization of non-synthetic, pointer-based applications on a real system.
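The trade-off between local store and remote recovery can be illustrated with a toy model. The sketch below is not the data structure from the publication; it is a minimal illustration that assumes one private version table per thread, with thread indices following speculative order (thread 0 being the least speculative). The class name `SpeculativeStore` and its methods are hypothetical, and dependence-violation detection and commit are left out entirely.

```python
class SpeculativeStore:
    """Toy model (hypothetical): one private version table per thread."""

    def __init__(self, num_threads, shared):
        # Reference (non-speculative) copy of the data.
        self.shared = shared
        # One hash table per thread, indexed by speculative order.
        self.versions = [dict() for _ in range(num_threads)]

    def store(self, tid, addr, value):
        # Local store: a single hash-table write into the thread's own
        # table, so its expected cost does not depend on T.
        self.versions[tid][addr] = value

    def load(self, tid, addr):
        # Remote recovery: probe the thread's own table, then each
        # predecessor from most to least speculative. At most tid + 1
        # tables are probed, so the cost is on the order of T.
        for t in range(tid, -1, -1):
            if addr in self.versions[t]:
                return self.versions[t][addr]
        # No speculative version exists: fall back to the reference copy.
        return self.shared[addr]


store = SpeculativeStore(num_threads=4, shared={"x": 0})
store.store(1, "x", 10)      # thread 1 writes a speculative version
print(store.load(3, "x"))    # thread 3 recovers thread 1's version
print(store.load(0, "x"))    # thread 0 has no predecessor version
```

In this model each store touches only the writer's own table, while each load may visit one table per predecessor thread, which is the kind of asymmetry described above.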
A comprehensive description and performance results can be found in the following publication:
New Data Structures to Handle Speculative Parallelization at Runtime. Alvaro Estebanez, Diego R. Llanos, Arturo Gonzalez-Escribano. International Journal of Parallel Programming, ISSN 0885-7458, Springer.