Site Reliability Engineer
Permanent Position – Toronto
About the Company
Riskfuel is pioneering the use of deep neural networks (DNNs) to accelerate the proprietary financial models used to calculate the values and risk sensitivities of the financial instruments held by capital markets and insurance firms. Given the size of these portfolios and the many different risk sensitivities required, these models are run in large overnight batch processes spread over thousands of servers. Riskfuel dramatically accelerates the process by constructing functionally equivalent models using DNNs. The Riskfuel models are a million times faster, so what once took all night to run can now be completed in seconds. With Riskfuel models, you get real-time valuation and risk management, along with a massive reduction in the compute workload, saving money and reducing the firm’s environmental footprint.
We are doing exciting, cutting-edge research, and our technology is winning major industry awards. The work is varied, interesting and fast-paced, with plenty of opportunity for you to make impactful contributions. We are looking for talented individuals to work and learn alongside colleagues who are leaders in the field.
See more on our website, Riskfuel.com.
About the Position
At Riskfuel, we work with big numbers. We run the client’s slow financial pricing models millions of times to generate the massive datasets needed to train our deep neural networks.
At Riskfuel, you’ll get a chance to build something from scratch. This role is very hands-on, and you will be responsible for mission-critical systems. You’ll be working with a big-hearted and talented team and will be an integral part of Riskfuel’s growing success.
Here are some of the things you’ll be working on:
- Working with and supporting our machine learning engineers
- Designing large-scale, distributed, Kubernetes-based systems
- Managing bare-metal and cloud Kubernetes clusters
- Scaling up Riskfuel’s distributed storage cluster
- Working with state-of-the-art hardware (e.g., NVIDIA A100)
About the Candidate
- Experienced with Unix/Linux operating system internals and with networking
- Experienced working with cloud systems and cloud providers
- Experienced with containers and container orchestration tools (Docker, Kubernetes)
- Experienced with automation tools like Ansible
- Able to install a Kubernetes cluster
These are nice to have but not strictly needed:
- Experienced with Rook and/or Ceph
- Experienced designing and deploying CI/CD pipelines
- Interested in web development and/or machine learning
- Experienced with router hardware and software
How to Apply
To apply for this position, please provide a resume and any additional information that you feel demonstrates your experience. We look forward to meeting you!