I agree with Isaiah - your project sounds very interesting. It sounds like you are on the right track on many topics, and I think you understand that there is a lot of work involved in making these things work. There are a lot of tools for you to try out, and this forum will be a good place to post specific questions (or examples of where you have gotten things to work).
A couple of things that would really help you make use of this community would be open data and open development. If you are able, for example, you could post deidentified datasets and start a public GitHub repository with your code, so that people can try things themselves and give you very specific advice and feedback. Of course, you'll need to be careful to get permission from your institution and colleagues for anything you share publicly.
In terms of getting started, you could have a look at the new CaseIterator as one tool that could help you.
Regarding the data formats, if you plan to work with 15,000 studies, you will need to be super careful about organizing things and automate as much as you can.
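As one small illustration of what "automate as much as you can" might look like, here is a minimal Python sketch that builds an inventory of your studies so you can spot empty or incomplete ones early. It assumes a layout with one folder per study under a single root directory; that layout, the function names, and the CSV columns are all made up for the example, so adapt them to however your data is actually stored.

```python
import csv
from pathlib import Path


def build_manifest(root):
    """Return one row per study folder: name, file count, total size in bytes.

    Assumes (hypothetically) that each immediate subfolder of `root` is one study.
    """
    rows = []
    for study_dir in sorted(Path(root).iterdir()):
        if not study_dir.is_dir():
            continue  # skip stray files sitting next to the study folders
        # Count every file below the study folder, including nested series folders
        files = [p for p in study_dir.rglob("*") if p.is_file()]
        rows.append({
            "study": study_dir.name,
            "n_files": len(files),
            "total_bytes": sum(p.stat().st_size for p in files),
        })
    return rows


def write_manifest(rows, csv_path):
    """Write the manifest rows to a CSV file that can be reviewed or versioned."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["study", "n_files", "total_bytes"])
        writer.writeheader()
        writer.writerows(rows)


# Example usage (paths are placeholders):
# rows = build_manifest("/data/studies")
# write_manifest(rows, "manifest.csv")
```

A manifest like this is cheap to regenerate, so you could rerun it after every import step and compare the results to catch studies that went missing or came through empty.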