Joined Rakuten Group as a new graduate in April 2021
Department…Travel Development Department (TDD)
Origin…Aichi Prefecture, Japan
Hobbies…OSS activities, watching anime
Hi, this is the R-Hack editorial office. Yama, who studied machine learning and molecular dynamics, worked as an intern and part-timer while in school before joining Rakuten Group as a new graduate in April 2021. In his first year as a DevOps engineer, he served as a project leader, and he has since taken on technically challenging tasks for Rakuten Travel, such as late-night maintenance.
We asked Yama about his work, what he finds rewarding as a DevOps engineer, and the company environment.
ーーCan you start by introducing yourself?
Hi, everyone. I’m Yama, an engineer in the Travel Development Department (TDD). After completing my master’s degree in mechanical engineering, I joined Rakuten Group as a new graduate in April 2021. I like to work on OSS and watch anime. On my days off, I often spend time at home.
ーーYama, you studied machine learning and molecular dynamics in graduate school. What made you interested in the IT industry?
I had the opportunity to use Python and C in my research at graduate school, and that got me interested in the IT industry. Participating in Rakuten Group’s internship was what led me to join the company. I also worked part-time here for a long time while in school.
ーーThere are many types of engineering jobs, but why did you choose to become a DevOps engineer?
I chose to become a DevOps engineer because I wanted to be strong not only in infrastructure but also in applications. The people I worked with when I worked part-time had thorough knowledge of both applications and infrastructure. They became my role models.
Currently, I’m involved in both applications and infrastructure, such as developing in-house tools for infrastructure operation automation and building infrastructure.
ーーWhat do you find most rewarding as a DevOps engineer?
I think our strength, and what makes the work rewarding, lies in our ability to solve problems specific to large-scale services by leveraging our experience across a wide range of layers and technology stacks.
All the services our department is responsible for are very large, and traffic during Rakuten SUPER SALE is several times higher than normal. Handling this traffic while maintaining system stability is technically very challenging and requires extensive knowledge, from hardware such as load balancers and switches to middleware such as web servers and databases. And because the service is so large, the number of instances we manage is in the thousands. To operate these with a limited number of people, we need to use programming languages and infrastructure-as-code (IaC) tools effectively.
ーーYou worked here part-time when you were a student. What kind of work did you do during your first year after joining the company?
I developed in-house tools for the automation of infrastructure operations. Our department is responsible for the infrastructure of many large services, so we need to handle many requests to build servers and install middleware on a daily basis. To handle these requests as automatically as possible, I developed an in-house tool with a GUI: once the necessary information has been entered, the tool runs Terraform and Ansible based on that information and completes the infrastructure operations without manual work.
Through this development, I improved my skills in application development using Python and TypeScript and became familiar with the internal implementation of IaC tools such as Terraform and Ansible. Also, as I was entrusted with a leadership role in this project, I improved my people management skills and my ability to communicate with stakeholders in other departments.
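As an illustration of the flow Yama describes (information entered through a GUI, then Terraform and Ansible run from it), here is a minimal Python sketch. It is not the actual in-house tool. The `ServerRequest` fields, the variable names `hostname`, `instance_count`, and `middleware`, and the file names `inventory.ini` and `site.yml` are all hypothetical. Only the command-line flags (`terraform apply -auto-approve -var`, `ansible-playbook -i ... -e ...`) are standard options of those tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class ServerRequest:
    """Hypothetical shape of the information entered through the GUI."""
    hostname: str
    instance_count: int
    middleware: str
    extra_vars: Dict[str, str] = field(default_factory=dict)


def terraform_command(req: ServerRequest) -> List[str]:
    # -auto-approve skips the interactive confirmation;
    # each -var passes one input variable (names here are assumptions).
    return [
        "terraform", "apply", "-auto-approve",
        "-var", f"hostname={req.hostname}",
        "-var", f"instance_count={req.instance_count}",
    ]


def ansible_command(req: ServerRequest, inventory: str, playbook: str) -> List[str]:
    # -i selects the inventory; each -e passes one extra variable.
    cmd = ["ansible-playbook", "-i", inventory, playbook,
           "-e", f"middleware={req.middleware}"]
    for key, value in sorted(req.extra_vars.items()):
        cmd += ["-e", f"{key}={value}"]
    return cmd


def handle_request(
    req: ServerRequest,
    run: Callable[[Sequence[str]], None],
    inventory: str = "inventory.ini",
    playbook: str = "site.yml",
) -> List[List[str]]:
    """Provision with Terraform first, then configure with Ansible."""
    commands = [terraform_command(req), ansible_command(req, inventory, playbook)]
    for cmd in commands:
        run(cmd)
    return commands


def dry_run(cmd: Sequence[str]) -> None:
    # Prints the command instead of executing it.
    print("would run:", " ".join(cmd))


if __name__ == "__main__":
    handle_request(ServerRequest("web01", 2, "nginx"), dry_run)
```

Passing the runner in as a function keeps the sketch safe to try without either tool installed. To execute for real, one could pass something like `lambda cmd: subprocess.run(cmd, check=True)` instead of `dry_run`.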
ーーYou were already a project leader in your first year. What kind of work were you involved in during your second year?
I built and operated a log management system. The services we handle process many requests, and the volume of logs emitted by web servers and applications is huge, amounting to tens of thousands of logs per second. A log management system collects and manages all of these logs in one place, using software such as Elasticsearch, Kibana, Logstash, Filebeat, and Kafka.
ーーWhen I hear tens of thousands of logs per second, I wonder if the system capacity can keep up. How did you deal with this?
When I took over this project, there was not enough capacity, and a major revamp was needed. So, we decided to rebuild the log management system: we identified bottlenecks in the existing system, reviewed the architecture, estimated the number of servers and the capacity required, and created a cost proposal.
We verified and tuned the behavior in various ways, and when we saw behavior that was not described in the documentation, we sometimes went to GitHub and read the source code itself to confirm it. It was also necessary to understand the details of each piece of software, such as Elasticsearch, to identify where the problem was and to review the technical configuration in use. I also had to consult with my superiors several times in order to get approval from the General Manager for the project. This was also a very valuable experience for me.
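The kind of estimate mentioned above (number of servers and capacity) can be sketched as simple arithmetic. The following is a rough illustration only, not the method or the figures used in the actual project: the average log size, retention period, replica count, and usable disk per node are hypothetical parameters, and the calculation ignores index overhead and compression.

```python
import math


def estimate_data_nodes(
    logs_per_second: int,
    avg_log_bytes: int,
    retention_days: int,
    replicas: int,
    usable_bytes_per_node: int,
) -> int:
    """Return the number of storage nodes needed to hold the retained logs."""
    # Raw volume ingested in one day (86,400 seconds).
    daily_bytes = logs_per_second * avg_log_bytes * 86_400
    # Every replica stores a full extra copy of the primary data.
    total_bytes = daily_bytes * retention_days * (1 + replicas)
    # Round up: a partially filled node is still a whole server.
    return math.ceil(total_bytes / usable_bytes_per_node)


if __name__ == "__main__":
    # All inputs below are made-up example values.
    nodes = estimate_data_nodes(
        logs_per_second=10_000,
        avg_log_bytes=500,
        retention_days=7,
        replicas=1,
        usable_bytes_per_node=10**12,
    )
    print("estimated data nodes:", nodes)
```

With these example inputs the retained data comes to about 6 TB, so the sketch rounds up to 7 nodes of 1 TB usable disk each. A real estimate would also need headroom for traffic peaks such as sale events.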
ーーYou experienced so much in two years. What is the most memorable work you have done so far?
The most memorable was the late-night maintenance on Rakuten Travel in January 2023. The maintenance was for renovating the infrastructure of Rakuten Travel, and I was primarily in charge of infrastructure operations. If the service stopped for even one minute, it could affect many users and lead to economic loss. Therefore, it was critical that the operations were performed accurately and that the service resumed on schedule.
Also, because the operations performed in this maintenance were technically challenging, we collaborated with many engineers from various departments to conduct multiple rehearsals, list possible problems, and draw up improvement plans to shorten the operation time. As a result, we were able to complete all operations on schedule. The technical difficulty and importance of the operation, together with the collaboration with engineers from other departments, made this maintenance deeply memorable.
ーーYama, many of your team members are foreign nationals or have roots in countries outside of Japan. How do you communicate with each other?
We basically communicate in English.
In addition, a regular meeting is held every day for about an hour, so any problems can be discussed and resolved at that time. One-on-one meetings with the manager are also set up regularly, so you are free to ask any questions there, for example about your career.
ーーDid it take time for you to get used to English communication?
Before I was assigned to my current department, I was anxious about communicating in English, but once I was assigned, I gradually got used to it and it became less of a struggle. Because I’m no longer afraid of English, I now actively look for English-language material when searching the web. From a technical point of view, I’m also glad to be accustomed to English, as I’m now able to have discussions in English during OSS activities.
ーーWhat would you like to accomplish at Rakuten Group in the future?
I want to ensure that our service continues to grow steadily. As a large-scale service, the infrastructure is also large and requires many daily operations to keep the service growing. This carries high risk: if a problem were to occur that affected the service, it would cause inconvenience to many users and result in significant economic loss. Therefore, I’d like to reduce operational risks as much as possible and create a technical and organizational structure that will let us quickly detect unexpected events should they occur. To accomplish this, knowledge of both applications and infrastructure is required. I’d like to keep developing these skills.
Today’s interview was with Yama, who is steadily gaining experience and is improving his technical skills. I look forward to his continued success!
Come work with us!
Travel Development Department (TDD) is looking for colleagues to join our team in developing new services, performing daily operations, and making improvements! Recruitment is open for a wide range of positions, including engineers and product managers. We look forward to your application.