Description
Salary Range:
Salary commensurate with experience and qualifications
About SMU
SMU's more than 12,000 diverse, high-achieving students come from all 50 states and over 80 countries to take advantage of the University's small classes, meaningful research opportunities, leadership development, community service, international study and innovative programs.
SMU serves approximately 7,000 undergraduates and 5,000 graduate students through eight degree-granting schools: Dedman College of Humanities and Sciences, Cox School of Business, Lyle School of Engineering, Meadows School of the Arts, Simmons School of Education and Human Development, Dedman School of Law, Perkins School of Theology and Moody School of Graduate and Advanced Studies.
SMU is data driven, and its powerful supercomputing ecosystem - paired with entrepreneurial drive - creates an unrivaled environment for the University to deliver research excellence.
Now in its second century of achievement, SMU is recognized for the ways it supports students, faculty and alumni as they become ethical, enterprising leaders in their professions and communities. SMU's relationship with Dallas - the dynamic center of one of the nation's fastest-growing regions - offers unique learning, research, social and career opportunities that provide a launch pad for global impact.
SMU is nonsectarian in its teaching and committed to academic freedom and open inquiry.
About the Department:
SMU supports some of the state's leading high-performance computing (HPC) clusters. The M3 cluster boasts 1,077 TFLOPS, 181 nodes, 22,892 CPU cores, 122,880 accelerator cores, and 200Gb/s bandwidth. Meanwhile, the NVIDIA DGX SuperPOD offers 1,644 TFLOPS, 20 nodes, 2,560 CPU cores, 1,392,640 accelerator cores, and 200Gb/s bandwidth. Both clusters feature cutting-edge CPUs, accelerators, and networking technologies, high memory capacity per node, and provide advanced interactive experiences through the Open OnDemand Portal.
About the Position:
This role is an on-campus, in-person position.
Dedicated to supporting SMU's research community, the Senior System Administrator for High Performance Computing (HPC) works exclusively to design, build, maintain, operate and manage HPC systems at SMU. This position shares responsibility for university HPC technical support as a member of a two-person HPC systems infrastructure team. This position also assists with Enterprise Linux support.
This position provides hardware, software and end-user support for SMU's growing number of research faculty and center compute resources dedicated to the advancement of SMU research activities.
Demonstrates advanced knowledge of all the technical tools required to perform the job. Subject matter expert in primary areas of support. Able to solve complex problems crossing multiple research disciplines with little or no escalation support. Effective technical resource to others in resolving problems and implementing projects.
Essential Functions:
- Design, plan, deploy, administer services & troubleshoot issues related to HPC services for research at SMU.
- Install and maintain cluster environments and provision systems using automated installation methods. Manage and maintain the Lustre parallel file system and NFS storage. Manage and maintain the InfiniBand high-performance interconnect fabric. Configure, manage and monitor the SLURM scheduling and queuing system.
- Develop and maintain programs and scripts that aid in the operation and automation of administrative tasks, using the shell and scripting languages (bash, Perl, Python) required by systems dedicated to research. Compile, install, and port software needed by SMU researchers. Build and deploy open source and vendor/commercial software required by researchers.
- Plan projects, communicate with end users and management, provide updates and manage expectations.
- Document all configurations, procedures, and changes. Document system administration procedures for routine and complex tasks.
- Diagnose and resolve system and operational problems with research systems. Work with researchers and constituents to diagnose and optimize workloads. Participate in on-call support of research infrastructure.
- Coordinate with vendors to resolve hardware and software problems. Ensure firmware and software revisions are kept at the appropriate level on HPC research systems.
- Keep current with research computing, HPC technology trends and best practices.
Qualifications
Education and Experience: Bachelor's degree is required.
A minimum of six years of full-time Linux system administration experience in a large computing environment is required.
Knowledge, Skills and Abilities: Candidate must demonstrate clear, professional communication in order to work with team members and customers of diverse technical abilities. Experience with NVIDIA DGX, containers and Kubernetes is desired. Knowledge of reporting tools, including XDMoD, is preferred. Work experience installing and maintaining clustered environments and provisioning systems using automated installation methods is also preferred. Candidate must demonstrate direct experience working with InfiniBand and knowledge of configuration and management of SLURM or other scheduling and queuing systems.
Candidate must also demonstrate strong written communication skills. Candidate must possess strong problem-solving skills with the ability to identify and analyze problems, as well as devise solutions. Must also have strong organizational, planning and time management skills.
This position participates in a 24-hour, 7-day on-call support rotation and in off-hours maintenance windows.
Preferred Skills:
- Familiarity with DDN hardware and the Lustre file system
- Proficiency in supporting NVIDIA/Mellanox InfiniBand networks
- Competence with Bright Cluster Manager
- Knowledge of NVIDIA DGX systems
- Experience with Kubernetes
Physical and Environmental Demands:
- Sit for long periods of time
Deadline to Apply: Open until filled.
Priority consideration may be given to submissions received by December 4, 2024.
EEO Statement: SMU will not discriminate in any program or activity on the basis of race, color, religion, national origin, sex, age, disability, genetic information, veteran status, sexual orientation, or gender identity and expression. The Executive Director for Access and Equity/Title IX Coordinator is designated to handle inquiries regarding nondiscrimination policies and may be reached at the Perkins Administration Building, Room 204, 6425 Boaz Lane, Dallas, TX 75205, 214-768-3601, accessequity@smu.edu.
Benefits: SMU offers staff a broad, competitive array of health and related benefits. In addition to traditional benefits such as health, dental, and vision plans, SMU offers a wide range of wellness programs to help attract, support, and retain our employees whose work continues to make SMU an outstanding education and research institution.
SMU is committed to providing an array of retirement programs that benefit and protect you and your family throughout your working years at SMU and, if you meet SMU's retirement eligibility criteria, during your retirement years after you leave SMU.
The value of learning at SMU isn't just about preparing our students for the future. Employees have access to a wide variety of professional and personal development opportunities, including tuition benefits.