Performance depends heavily on hardware and load characteristics. A larger chunk size does seem to improve performance for sequential reads, but I cannot promise it will in your circumstances. If you are going to add a device anyway, changing the chunk size at the same time costs little extra, so you may as well.
You don't need to install a new mdadm to make use of it. Just compile it somewhere and run it from there as ./mdadm --args. You can leave the installed one unchanged. If the machine is shut down or crashes while the reshape is happening, the installed mdadm won't be able to restart the array, so if the array holds '/' you will hit problems. But if the array is separate, and the mdadm you compiled isn't on the array, then it is easy to restart the array with the new mdadm.
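As a rough sketch of what that looks like in practice, something like the following could be run from the directory where the new mdadm was compiled. Every device name, the device count, the chunk size and the backup-file path are placeholders I made up for illustration, so substitute your own. The script only prints the commands unless RUN=1 is set, so you can check them before anything touches the array.

```shell
#!/bin/sh
# Dry-run sketch: each command is printed unless RUN=1 is set in the environment.
# All paths, device names and numbers below are placeholders, not from the post.
MDADM=./mdadm                    # freshly compiled binary, run from its build directory
MD=/dev/md0                      # array to reshape (must not hold '/')
NEW=/dev/sde1                    # device being added
MEMBERS="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1"
BACKUP=/root/md0-reshape.bak     # must live outside the array

run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Add the new device, then grow and change the chunk size in one reshape.
start_reshape() {
    run "$MDADM" "$MD" --add "$NEW"
    run "$MDADM" --grow "$MD" --raid-devices=4 --chunk=256 --backup-file="$BACKUP"
}

# After a crash or shutdown mid-reshape, reassemble with the same new binary.
resume_reshape() {
    # $MEMBERS is left unquoted on purpose so it splits into separate arguments.
    run "$MDADM" --assemble "$MD" --backup-file="$BACKUP" $MEMBERS
}
```

Calling start_reshape prints the add and grow commands; resume_reshape prints the assemble command you would use after an interruption. Keeping both the compiled binary and the backup file off the array is what makes the restart possible.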
You do, however, need a new kernel: 2.6.32 at least. If your install has an old mdadm, it is unlikely to have a new kernel, so the need for a new kernel might be enough to justify a live CD approach.
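A quick way to check the running kernel against that 2.6.32 minimum is sketched below. The helper name version_ge is my own invention, and it compares only the first three numeric fields of the version, ignoring any distribution suffix.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is at least B.
# Only the first three numeric fields are compared; a missing field counts as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, /[^0-9]+/); split(b, y, /[^0-9]+/)
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

if version_ge "$(uname -r)" 2.6.32; then
    echo "kernel $(uname -r): meets the 2.6.32 minimum"
else
    echo "kernel $(uname -r): too old, boot a newer kernel or a live CD first"
fi
```

If this reports the kernel as too old, compiling a new mdadm alone will not be enough.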