In episode 22 of Software Security Gurus, Matias Madou chats to Julie Tsai, Head of Information Security at Roblox. They discuss how to implement the right internal security structure, how to measure its success and quantify security risk, as well as how DevOps has evolved tactically to expand beyond tooling.
Want to nominate a Guru? Get in touch: www.softwaresecuritygurus.com.
Internal security structures and relations, in a growing organization: 00:55
Where to begin with measuring and quantifying security risk: 09:47
How is the DevOps/DevSecOps movement evolving? 16:14
Matias Madou:
Welcome to the Software Security Gurus webcast. My name is Matias Madou, CTO and co-founder of Secure Code Warrior. With me today, I have Julie Tsai. Welcome, Julie.
Julie Tsai:
Thank you, Matias. Thank you for having me here.
Matias Madou:
Fantastic to have you on the show. Julie, do you mind sharing a few words about yourself?
Julie Tsai:
Not at all. So, I am head of security for Roblox. I have been working in [inaudible 00:00:35] years. My core specialties before I went into management were in DevOps and configuration management, which led me naturally to security. I was a systems administrator for 13 years, but I'm now glad to be part of the larger security community.
Matias Madou:
Excellent. Excellent. Great. So for today I have three topics in mind. The first one is I saw you on a Forrester talk, which really sparked my interest because you were talking about structure within an organization and the relations within that structure. You talked about the tech chief, the risk chief, security. Essentially, what you were advocating is that if you have the structure right, and it has to be simple, so many things become easy. Can you shed a little bit of light on how you see structure?
Julie Tsai:
Yeah, absolutely. I see structure somewhat as the mechanics or the machinery behind the people and the culture. It's kind of the skeleton of how we're encouraging people to grow, how to be incentivized, where to expedite alignments and where to find healthy, constructive tension. So, I think it really is a good and important thing to do, whether you're at the beginning stage of starting your startup or you're evolving the company to that next level of the larger stage. So you keep evaluating whether or not the structural scaffolding that you've set up for the company is encouraging the right behaviors and alignments across the board.
Julie Tsai:
Security, I think, is a particularly fascinating and interesting area because I often like to tell people in our teams, it's a full brain, full person exercise. There's the technical aspects of it, which are infinitely deep and very broad, but then there's also the human component, which is at least as important in terms of people's experience, their appetite for risk and fear, and any kinds of cultural experiences they may be bringing in from other companies. So, when you're thinking about how to make things go efficiently, you want things to be lined up in places where certain things will get fast-tracked, certain voices will get prioritized. As the company changes, those things will change as well.
Matias Madou:
Yeah. I think that rapidly changes, right? If you go from a startup to a scale-up to... I'm not sure if you're a scale-up. You're a full-blown company. With the structure that you have right now, were there any recent changes you made where you said, "Well, you know what, that's something that I'm very happy with," and that help, again, drive things forward?
Julie Tsai:
Yeah, absolutely. So I think one of the things that's an opportunity when you're in a... Once the company has decided that, "Okay, security is its own named function, and there's going to be a leader at the top who's representing and absorbing the responsibility for that," it's effectively matrixed, because usually in the earlier stage of a company, you'll have all the individual departments responsible for their own security and reliability and delivery. It's all bundled up in one package. My earliest security roles were like that. Just as a matter of fact of running infrastructure or running technical operations, you are also responsible for the security and compliance that goes with it. That goes all right for a while. But once the company has decided to name it, now what you're actually responsible for is influencing behaviors across departments that you in most cases do not directly control. So, the whole principle behind winning friends and influencing people becomes very, very important, as well as what structural incentives can be set up.
Julie Tsai:
I think once you're in a matrixed role where you do need to influence across the org and get things to point in a certain direction, it's important to line up both the soft and hard lines of partnership. This can take a couple of different forms. In the industry, it's very popular, and not just popular but effective, to have some version of a champions or an ambassadors program across the org. Usually, a lot of people in the company, even if they don't work inside security, will find security interesting or fascinating, or someday may stumble upon a crisis. And so, giving them the tools with which to analyze and handle it is only helpful to them and the company in terms of, "Hey, this is how you think about it. This is how you think about risks. This is where you need to push it all the way to the top or push it broadly across to your major leaders to make a quick decision on. These are areas where maybe sometimes it's less escalated." So, there's different things that you can do.
Julie Tsai:
The other piece that's really, really important is that once a company gets to a certain size, it's impossible for any one department or one executive to know everything on an extremely granular level of what's happening. You need eyes and ears throughout the company and friendly ones. So, they're in a position to understand what's happening in their teams much better than you do. They understand their tech stack and they understand what's possible and what's not possible. So it's essential that you get that partnership across the board in terms of, "Hey, the thing that you're thinking in terms of the ideal may not work, but here's something that will. This is a different way to limit the risk. These are some other options here." And so, this is where, I think, the art comes in of really understanding your company and the culture and the people there, because we all start out with an ideal of what we want it to look like. But then the reality, when you come into different companies and your different strengths and weaknesses, you must tailor it for the situation.
Matias Madou:
That matrix structure that you're talking about, is that something that comes if you're moving from reactive to proactive? Because initially everything is just firefighting because it's not the first thing on your mind when you're a startup or when you're a scale up. But once you get out of that firefighting mode and you go to more of a proactive, does that have anything to do with that or not at all?
Julie Tsai:
I think the shift from reactive to proactive is first and foremost a function of resourcing. I think that when you either historically haven't spent time on the area or just don't have the people, then it will be a much more reactive function, where you're just going from fire to fire. There's a lot of fires, including ones that you may not even know about until it becomes very severe. And then once you get more people, you can start driving the discussions further up the chain. The good positive part about fires is they're very obvious. It's not an abstract question of what's the problem. There's clearly a problem. It's threatening the business. People can see it. It's very tangible. The problem, of course, is the damage is happening and the chance to have averted it is gone.
Julie Tsai:
So, as you have more people to both work the problems and to partner across the org, now you can start driving slightly more proactive discussions. You can start talking about... I think about it almost in a physics way of actual exploitation versus potential exploitation. Now, we can move it to a place like, "This might happen. We do have a vulnerability report. We've done the analysis. We've done the deep dive and these are things that can happen." So you start to have more work around the potentials of what could happen. There's sometimes a lot of robust debate that's healthy there because people sometimes are going to challenge the risks in a healthy way and say, "Hey, I'm not sure if I think it's that much of a problem. We've gone along this long without it being an issue." And that's one of, I think, the biggest evolutions for our company and for humans, this maturing of the capability to see what might happen and what they can proactively do, because there is definitely a tendency for people to normalize risk, right?
Julie Tsai:
"I've been going along for so long that I think it's okay," right? But as you go along, sometimes you don't realize that the terrain has changed. I think COVID is a great example, or geopolitical problems. Some of the things we've seen in terms of major exploit rings and that kind of thing, some of those things can change on the outside. But then at the same time, things are changing on the inside also. Your company is changing. You're getting different people, more people. Maybe you're releasing more products and you have more attack surface. Maybe you're now more public, so you're more of a target. So, I think that security, just like any other endeavor in tech, is fast-changing and evolving, and it's always good to stay on your toes in terms of, if I feel where we are right now and measure where we are right now, both internally and externally, are we different than we were a year ago or a month ago?
Matias Madou:
Perfect segue into my second subject, which is measurement. You already touched on it. So with that, it's even more difficult than I thought it was because now we're talking about an ever-evolving landscape, internally and externally. You have to measure that, so you have to constantly adapt. How do you do that? Where do you start with measurement? How do you quantify security risk ultimately?
Julie Tsai:
Start simple, start with one or two things, and it's a deceptively simple approach, right? Because actually to simplify a complex thing is the hardest thing. But usually what we'll start looking at is, "Okay. What's within our capability to measure today and with the instrumentation we have today?" If I don't have a lot of tooling or automation, I'll start with some manual metrics and say, "Okay. How many investigations are we having to do? How many different incidents are coming up? How many were proactive versus reactive, like incidents versus vulnerabilities? What are the things where we can start pushing that conversation more upstream proactively, so that you're actually getting the problem at the root?" But each of these things starts to become an example that has to become a valuable teaching moment.
Julie Tsai:
There's ways of now analyzing things where you have a comprehensive, exhaustive amount of information. In early stages, you don't have either the luxury or burden of that. In that case, what you have to do is take the telling incident or the telling detail and expand and infer from there in a very disciplined way so that people understand that it's not fear and doubt, but to know that, "Hey, based on..." Probably the best example I can think of in recent times is the positivity rate that we've measured for COVID, right? There's an assumption right now that, "Hey, there's stuff going on. We don't have full visibility. We won't get full visibility, but based on a certain metric that we can get, the sample size that I have, I'm going to infer beyond that to say that I believe that the general population is reflecting what I see in a much smaller sample."
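The inference described here, going from a small sample to a statement about the whole population, can be sketched in a few lines. This is a minimal illustration, not something from the episode: the function name, the numbers, and the choice of a Wilson score interval are all assumptions, and it only holds if the sampled systems were picked roughly at random.

```python
import math


def wilson_interval(hits: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a proportion (z=1.96 is about 95%)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = hits / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical numbers: 6 of 40 manually reviewed hosts showed the issue.
hits, sample_size = 6, 40
low, high = wilson_interval(hits, sample_size)
print(f"observed rate {hits / sample_size:.1%}, plausible range {low:.1%} to {high:.1%}")
```

The width of the range is the useful part for a broader audience: a small sample still supports a disciplined claim, but the claim should be stated as a range rather than a single number.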
Matias Madou:
Okay. So, do you have a dashboard you regularly look at in terms of measurement, or do you rely on these samples that you make bigger or-
Julie Tsai:
Sure. Yeah. With dashboards, the problem is there's so many of them, right? So yeah, once you get past that initial flush of needing to start to measure, if you have a motivated team and good technology, just very rapidly, you will have the opposite problem of too many dashboards to actually use. So I think it's about getting back to those principled questions, those original ideas in the beginning: What were we trying to solve for? What are we trying to measure? If I had to take a one- or two-factor indicator of where we are, what would it be? Now, we can expand on that, especially for teams that are doing the deep dive. But I think, especially in terms of the broader communications to people who are outside of the security department and above and beyond, it's really important to take a couple of simple things and just consistently show a pattern, so that there's time for people to absorb and internalize it.
Matias Madou:
That was exactly what I was going to ask. What do you show, for example, to the board? What do you show upwards? Because there's so many things that are measured these days. What are key metrics that you think they should be aware of?
Julie Tsai:
Sure. I always start number one with incidents in terms of the number of security incidents that we're seeing, the type of security incidents, the genre of stuff that we're seeing and the impact levels, because I like to start with empirical things, but also you are needing to... From an impact level, you're looking at something that actually is or did happen to the company. So, just start there. It speaks to the common understanding around monitoring and around what the... Is it going to take down the business? Is it taking down our trust levels? Are we compromising volumes of data? So start with the thing that is most obvious and most important, the incidents. From there, I think it's equally important to talk about coverage level. So, this is something that's talked about in some metrics methodologies as, I think, specific coverage metric where that visibility, but it is...
Julie Tsai:
Let's suppose I'm seeing some sort of 5% failure rate on an issue, right? Let's suppose I'm seeing successful attacks on a particular system 5% of the time over a month. However, if I'm only covering or have a visibility into 10% of my total area, my unknown unknowns are actually a much greater risk. So, it's important, I think, to focus the lens of the conversation around where are the big risks. I think in a lot of companies, as you're growing and evolving, the unknown unknowns will usually be more than 50%. Now, more mature [inaudible 00:15:07] and more resourced, it should be much lower than that, but they also can have their pockets of lack of visibility usually because of complexity.
Julie Tsai:
So, I think that it's always good to go back to that question of like, "Okay. This is what I'm seeing in terms of issues that are coming up. How much am I actually seeing? How well are we actually seeing it?" Rather than getting too detailed about the smaller metric, make sure that we have a good understanding of our visibility coverage area. And then from there, of course, every board is going to want to understand how we're doing on our vulnerability remediations and patching and the fundamentals. There's a reason that these things keep coming back.
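The arithmetic behind the coverage point can be made concrete with a short sketch. The 5% and 10% figures come from the example above; the asset counts, attempt counts, function name, and the naive scaling step are hypothetical and only there for illustration.

```python
def visibility_summary(
    total_assets: int,
    monitored_assets: int,
    observed_attempts: int,
    observed_successes: int,
) -> dict[str, float]:
    """Put an observed failure rate next to the coverage it was measured on."""
    coverage = monitored_assets / total_assets
    return {
        "observed_failure_rate": observed_successes / observed_attempts,
        "coverage": coverage,
        "blind_spot": 1 - coverage,
        # Naive assumption: the unmonitored estate behaves like the monitored slice.
        # Real blind spots may well be worse, which is the point of the metric.
        "naive_scaled_successes": observed_successes * total_assets / monitored_assets,
    }


# Hypothetical month: 10 successful attacks out of 200 attempts (5%),
# seen on 100 monitored assets out of 1,000 (10% coverage).
summary = visibility_summary(1000, 100, 200, 10)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Reporting the failure rate and the coverage side by side keeps the 5% from being read as a statement about the whole estate when it only describes the monitored tenth of it.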
Matias Madou:
The unknown unknowns are very interesting for me. If we ask organizations some basic stuff, "How many developers do you have?" "Eh." "How many applications do they have?" "Eh."
Julie Tsai:
Exactly.
Matias Madou:
It's very interesting, right? And then the larger the organization, the harder it gets to get all that information together.
Julie Tsai:
It totally does.
Matias Madou:
So, we went from the organizational structure to measuring. Let's go one level deeper. Let's go to DevOps and DevSecOps. Let's use a buzzword. But hey, how is that evolving? How do you see that evolving? What are your thoughts on that?
Julie Tsai:
Yeah, it's interesting. I think of it in terms of... So, I first encountered the term when coming out of the configuration management and self-healing systems school of thought. And so, when it first emerged, I was wondering, how would we still find definition between people's expertise and roles? I have a belief that it takes about 10,000 hours or 10 years to achieve mastery in different areas. I believe this is true for both development, as well as systems, individually. And so, at the time, I wondered, okay, how are we going to maintain that understanding of the specialization? But as time evolved, I saw that, well, at almost every company that I would work at, and I worked at a lot of startups, you would end up with this almost predictable sort of battle, sometimes friendly, sometimes not so friendly, between these camps. It became clear like, "Well, clearly something has to evolve here in terms of the conversation."
Julie Tsai:
It's tiring to have the same conversation every time going into each place: who's going to deploy, who's going to control these changes, and what layer of the stack people own. It happens repeatedly. So, I think that the really important thing it did is it made a really painful part of this problem dynamic visible and understandable, and created friendships and relationships across a wide array of technical specializations. Over time, my understanding of how DevOps is and should work is that you build out your automation from the beginning, as part of the way that you image, the way that you manage configurations, build and deployment, in such an articulate and precise way that you have the accountability that you need, in terms of there always being another pair of eyes or another team that's able to review a change that's going out, that things are happening very quickly because it's predictable and consistent and automated, and that you have very, very good hooks in terms of being able to roll back. This is my ideal perfect picture of how this should flow.
Julie Tsai:
What seemed to evolve over time in the business is these jokes that we would have about DevOps as NoOps, or this is where you give the keys to... You give root to the developers and just call it a day. Well, no, that was a joke. But we've gotten to the stage where I think we ended up having to iterate a little bit about how it manifests in your particular organization. What I definitely see coming into different companies is that there's pockets of automation and pockets where it just either wasn't done or there's other kinds of shortcuts that are being done. And so, I think it's about being able to deepen that understanding of, "Hey, we want to get to automation and empowerment across the entire edge, or we want to get to precision," but there's a couple of work steps that need to happen in order to get to that place. We can't just achieve this Nirvana freedom without having done the work beforehand, and then understand that there's a lot of value to have at the end.
Julie Tsai:
Now, not coincidentally, this ends up being awesome for security, because you have really precise change. You have really precise configuration management. Things are logged. You know when things have gone in and out and you have total accountability on the users and authorizations, and that principle should flow throughout, even as we're evolving into more widespread microservices platforms and APIs, same principles about having very specific authorization between different kinds of services and calls, very specific ways of when you're going to allow a function or not. So, I think that that's part of how this needs to evolve, but it's... I think part of what we see is that there's a natural evolution of that thinking around these things between lumping together all the functions and then splitting them apart.
Julie Tsai:
And so, I think that we're at a particular phase where there's a lot of things, a lot of tools in the ecosystem have come together to try to abstract and connect all these things, especially in the virtualization, in the container space, and all of that to make it more seamless, and yet we've also built on more complexity. And so, I think that probably what needs to happen next is some way of simplifying how to truly embody those principles of really well-understood change control, really well-understood configuration and the image expression even within this complex tool environment that we now have. I don't have an easy answer for that.
Matias Madou:
So, I actually would like to go a little bit deeper on the security side. What I quite often see, and tell me if that's a similar thing that you see, is that they try to go the DevOps way and they're on their way of doing it and they're progressing, but they quickly throw security into the mix, and then it's all over the place. It doesn't matter if they threw it in too fast.
Julie Tsai:
Yes, that's right. That's right. And so, I think this is where security can help drive DevOps, in my opinion, back to its true meaning, as opposed to this idea of creating a mishmash or sort of a free-for-all. It's like, "Hey, actually we need departments to be able to do good code and change reviews. We need to have some predictability on these things. We need to have a good organization of different repository structures or code structures and that kind of thing." And so, I think that some of the needs that you get on the security and compliance side, where people may have had a temptation to defer deeper architectural work or deeper design work, become a forcing function for better or for worse in terms of like, "Okay. Now, you actually have to prove separation of duties for SOX. Now, you actually have to show change management and predictable vulnerability management for these different compliance needs."
Julie Tsai:
But I don't think that it's at odds. It doesn't need to be at odds with agility. I think it comes back to trying to create your pipelines and workflows in a way that you may have to go a little bit slower to build in the right controls and the right design, but going slower to go faster ultimately. So, my hope is that security and compliance can help drive back to what I think the true original meaning of DevOps is, as opposed to this idea that everyone has access to everything or that everyone can be both the most senior developer and the most senior operator at the same time.
Matias Madou:
Yeah, absolutely. Couldn't agree more. Maybe one final question. You're at Roblox. So back in the day when I was a kid, and I know you're a mom, when I was a kid and somebody's dad was, for example, a firefighter, they were like, "Oh wow. His dad is a firefighter." You're working at Roblox. If your daughter goes to school, are you the popular mom? There's like, "Well, her mom works at Roblox." Is it like that?
Julie Tsai:
Well, you know what, we get a lot of... Now, right now, with shelter in place, we don't get the same gratification of seeing everybody to do that, but I will say that it is a big hit with... I got to sit in on a presentation we did for an elementary school class. It totally felt like being a virtual rap star because the kids are just blowing up the Zoom chat with questions. There were so many. They were so engaged. Or every once in a while, I'll be busy with something and my daughter will want to do something else. I'm like, "No, no, no, I'm working." She's like, "Oh no, you have to protect Roblox. It's for the kids." She's like, "I just love it so much. I want to help do it someday." So I was like, "Okay. This is great."
Matias Madou:
Nice. Does your daughter get more screen time because you work at Roblox?
Julie Tsai:
I think inadvertently she does, because she sees what we're doing all day and then she's doing stuff, but we definitely try to keep it engaging. I just showed her the way that you can use Roblox Studio to build games, and she was just so excited, like, "Oh my gosh, I could build my own game?" Exactly.
Matias Madou:
Nice. Nice. All right. Julie, thank you very, very much for coming on to the webcast. Really appreciated the chat. It was a fantastic chat.
Julie Tsai:
Same here. Thank you, Matias.
Matias Madou:
Thank you.