Privacy, Security, Society – Computer Science for Business Leaders – July 2016


DAVID J. MALAN: All right, welcome back. This is day two of two, Computer
Science for Business Leaders. So a little pop quiz perhaps. What did we do yesterday? So the answer is on the board. The goal is to explain. So what was computational
thinking all about? Whoever makes eye contact first. AUDIENCE: Input, output. DAVID J. MALAN: OK, good. So inputs into algorithms
give us outputs, and it’s a way of framing your
thought processes and problem solving techniques more
methodically and generally bringing to bear to problems ideas
that have been inspired by and embraced by computer science. And we looked at some
mechanical examples like drawing pictures to reinforce
the precision with which instructions are necessary, as well as to actually
solve problems like a phone book more generally. Internet technology. So this was a loaded discussion with
a whole alphabet soup of topics. But any questions or confusions that
remain from those various acronyms and technologies? If no, it sounds like I could ask
a question like, what does DHCP do and everyone should have an answer. AUDIENCE: It configures Macs and PCs. DAVID J. MALAN: It configures
Macs and PCs to do what? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. So to get the DNS server IP
addresses as well as their own IP address, their so-called subnet mask–
which we didn’t talk too much about– as well as their own default router. And then that was intermingled with
any number of other technologies, but there too there was
this theme of layering. So HTTP and HDPS we claimed were
built, like, on top of the Internet. These are applications
or services, so to speak, that we know more casually as the web
or the formerly the world wide web, though no one really says that anymore. And so what do HTTP
and HDPS actually do? You were to define them, what is it? What is HTTP, the protocol? Katy? AUDIENCE: It’s protocol that
knows to go to that IP address. DAVID J. MALAN: OK, the protocol
that knows to go to that IP address. Technically not quite
insofar as HTTP actually doesn’t know anything about
the destination IP address. That’s the lower level. So the operating system and the browser
know, but not necessarily HTTP itself. What was the extent of the message that
we had to send in accordance with HTTP? Yeah? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Exactly. We sent a message, and
just really a one, maybe a two-line message, that just said
GET slash, where slash means give me the default page from some website. And then HTTP slash 1.1, which
is just the version number. And then I also typed in the
host to remind the server what host name we wanted,
not IP per se, and that was enough to get the
server to send me a cat or to send me Google
search results or the like. All right, we then transitioned
to cloud computing. So now you all had probably
heard about cloud computing before yesterday, but
now going back home or to work you can perhaps explain
it a little better to someone. So if someone asks you tomorrow
back at home or work, oh, what is cloud computing by the way, what
is your well-formulated answer to that now? Everyone’s very preoccupied. What is cloud computing? What is the cloud? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. Cons might be complexity,
reliance on third parties to keep your own website
or workflows alive. And you described it as a framework,
and that’s– I think that works. I would be careful not to define it
too precisely as a framework insofar as it’s really just a
collection of technologies, a collection of
Internet-based technologies that people are leveraging
to store their data, to do their computation, and
any number of other processes. And if you had a moment last
night to read the New York Times article, for instance, that was a
wonderful use– years ago now– of how you might spin up, so to
speak, turn on a whole bunch of servers or virtual
machines all at once, do some interesting computation
like generating PDFs, and just turn them off. And so for just a few dollars
or a few hundred dollars later, you’ve converted millions
of articles, in that case, to PDF without having to invest a single
dollar in your own physical hardware, let alone the configuration and
installation of all of that. We talked more generally about some
silly acronyms like IaaS and PaaS, the first of which refers to
infrastructure as a service. Something like Amazon, which we
described as pretty low level. You have to know a little
something about your load balancers and where you’re storing the data
and where your servers are living, and that’s all fine and good
but it’s fairly low level. And we then transitioned
more to platform as a service as a topic, which was
something like Heroku. Amazon has something
called Elastic Beanstalk, Google’s long had Google
App Engine and similar, which are higher-level services built
on top of these lower-level details. And then there are still
things like software as a service, which is an even fancier
term for really just describing a web application that does useful stuff. And you really don’t care about the
language it’s written in and in turn you really don’t care about the
infrastructure it’s running on. So, again, this theme of
layering or abstraction has kind of hit us in
any number of ways. Then lastly we looked
at web development. CSS and HTML, little bit of
time on the mechanics of that. It’s not a programming language. It’s a markup language, and more on
the distinction there later today. And then we looked at Cloud9
as just an example of a software as a service, web-based programming
environment that’s somewhere in the cloud but we
don’t really know or need to care where that is
at the end of the day. And talked ultimately about
what you can actually do and how a web server and
browser inter-communicate. So just so that we cross off everyone’s
to do lists while we’re still here in Cambridge today, is
there anything you’d like to add to today’s list,
which includes the following? So we’ll start today with a
focus on privacy and security motivated by a couple
of specific examples, followed by any number of
directions in which folks might want to go related to those two topics. Two, looking at
programming and some basic constructs in certain
algorithms and data structures. We’ll tease apart what
those are and these are sort of ingredients
that you might bring when designing a piece of software
at a white board or in a computer ultimately. Then technology stacks, which is a
general term describing any number of these kinds of things here. Sort of additional ingredients,
higher-level ingredients like actual tools and
software and technologies that you might use to
solve problems in software. And then finally a look at
web programming in particular and some of the technologies
related there too, some of which we scratched the surface of
yesterday like databases, which we’ll look at in a bit more detail. But is there anything else? Yeah? JP? AUDIENCE: Block chain. DAVID J. MALAN: Block chain. OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. Block chain in the context of Bitcoin
and such or a different context? AUDIENCE: The logic of it. DAVID J. MALAN: The logic of it. OK. All right. AUDIENCE: What we’re trying
to do is to [INAUDIBLE]. DAVID J. MALAN: Yep. OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. Other topics to add to today’s list? Yeah? [INAUDIBLE] AUDIENCE: Microservices. DAVID J. MALAN: What’s that? AUDIENCE: Microservices. DAVID J. MALAN: Microservices. OK. So microservices. AUDIENCE: Just, like, the notion
of splitting up monolithic apps into independent microservices. DAVID J. MALAN: Sure. And this relates nicely to
our brief discussion yesterday of containerization, which is
the sort of infrastructure that tends to support stuff like this. Other topics? Victor? AUDIENCE: You touched upon
content delivery networks but I’d like to know how it works in the
context of the different [INAUDIBLE]. DAVID J. MALAN: OK, sounds good. Content delivery networks or CDNs,
like Akamai was one such example. Sean? AUDIENCE: It was an article
last night too in Hadoop. DAVID J. MALAN: Oh, Hadoop? Oh, yes. The tech– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK, sure. Yes, [INAUDIBLE]? AUDIENCE: Maybe [INAUDIBLE] DAVID J. MALAN: OK. Well, maybe we’ll end
with that cliffhanger. OK. AUDIENCE: Take some inspiration there. DAVID J. MALAN: Stay till the end today
and you’ll find out about the future. Alicia? AUDIENCE: You had talked
about programming– I see these articles about who
we need to program, [INAUDIBLE] to know how to program, but
what’s behind that [INAUDIBLE]? DAVID J. MALAN: Oh, OK. So CS for All, for instance, which
is one of the hashtags going around. OK, so– OK. Other topics? Yeah? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Oh, interesting. I’m sure we’re not going to do
it well is the short answer. But– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Sure. We can weave that into this
morning’s discussion in particular. Other topics? Yeah? Alicia? AUDIENCE: More a business topic but
[INAUDIBLE] Google Docs or Microsoft [INAUDIBLE] talk about
the risk of [INAUDIBLE] DAVID J. MALAN: OK, so– yeah, so– AUDIENCE: [INAUDIBLE] in the cloud tool. DAVID J. MALAN: Cloud tool. Sure, let’s weave that in, perhaps into
the privacy and security discussion as well. Other topics? Anything at all? Yeah? AUDIENCE: [INAUDIBLE] about
machine learning maybe. DAVID J. MALAN: Machine learning. OK. I’ll put an abbreviated ML. Sure. Other topics? Sort of creating a problem for myself
with this very long list today. Let me propose this. I’ll do my best to weave
these into today’s framework, but let me also encourage
you to remind me or come up during any of the breaks or
lunch time or after today too if we don’t quite hit on everything
or at least touch on something of interest. And worst case I can follow up
via email with some references if we don’t get through
everything today. All right, so you might have enjoyed
if you hadn’t in previous months the John Oliver segment
on Last Week Tonight, which is a fun excuse
for homework to watch an 18-minute segment about
encryption, but he actually does a wonderful job in
general at peeling back the layers of some interesting societal
trends, some of them technological. Among them, for instance,
has been encryption. So some months ago the FBI really
wanted to get into someone’s iPhone and Apple said no. They weren’t willing to
help with this request. But does anyone want to explain
a bit more of the context and perhaps some of the technicalities? Like, what was it the FBI
wanted Apple to do for them? Either high-level answer
or low level is fine. Yeah? Griff? AUDIENCE: Wanted Apple
to write a small program to bypass the security
feature of the phone. DAVID J. MALAN: OK, they– AUDIENCE: [INAUDIBLE]
the password reset. DAVID J. MALAN: OK. So they wanted Apple
to write some software to bypass a security mechanism of
the phone related to password reset. And can someone dive in deeper there? What exactly is the security
feature in question that some of us are perhaps using? Sean? AUDIENCE: An iPhone [INAUDIBLE] so
many times, you get one more time or it’ll erase the phone. DAVID J. MALAN: Exactly. Yeah. So this is optional. So by default your phone
should not do this, since I’m sure there are people who will
enter their own password wrong 10 or 11 times maybe late at night sometimes
too or after a night on the town and that’s bad if it just
deletes itself automatically. But this is also a good feature. Why? Why might you want your phone to
sort of figuratively self-destruct? AUDIENCE: If someone’s tampering
with it and it automatically wipes the information [INAUDIBLE]. DAVID J. MALAN: Exactly. If it’s someone malicious or
your phone’s been lost or stolen or it’s a nosy roommate or such you
might want the data to self-destruct, provided it’s hopefully backed
up somewhere either on iTunes or in iCloud, in the cloud where
it’s completely safe, I’m sure. So you at least assume
that it’s backed up or that you don’t care about
the data at that point. And the worry in this particular
case was that the phone in question might have had this feature enabled
because there was apparently some evidence in like the iCloud
server logs that in the past he had enabled this particular feature. But it was unclear and the
FBI, as I understand it, didn’t really want to take the risk of
trying to guess the person’s password and have the phone
self-destruct, so to speak, whereby it deletes all
of the data on the phone. Now let’s pause for just a moment. What would it mean to
delete data on the phone, especially in the light
of yesterday’s chat? Grace? AUDIENCE: It’s just
deleting the directory? [INAUDIBLE] DAVID J. MALAN: It’s a good question. So hopefully Apple’s
not so unsophisticated as to just forget where
the data is and market this as a self-destruct feature,
though it wasn’t all that long ago that even industry
miscommunicated these kinds of things. If I can find a screenshot–
DOS delete all data erased. I’m trying to remember an error mess–
or a warning message from yesteryear. Delete secure erase–
don’t run DOS anymore so I can’t just run the command for you. Let’s see. DOS format command. So DOS, Disk Operating System,
is what we all had before Windows and newer versions of Mac OS came about. And this, for instance, was the
message you would get years ago if you tried to format your hard drive,
where formatting is in most people’s minds equivalent to erasing it. And it surely seems to be the case
that all data on non-removable disk drive C, which is your default
hard drive, will be lost. So that sounds really,
really bad, and it is. You’re certainly creating a problem for
yourself if you regret this decision. But what do they really mean by lost? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. It’s just– and it’s not
that you can’t find it. It’s just harder to find it. Your data gets lost in the
sense that it’s still there. The zeros and ones, the little
magnetic particles back in the day were still oriented north-south,
south-north representing ones and zeros as you had left them, but the directory
entries were simply forgotten. So the data’s still there but the
recollection thereof is no longer. And as an aside if you’ve ever wondered
why in the Windows world or Mac world some people say folders and
directories interchangeably– like, a directory really is just that table. It’s like a directory in a building
looking up someone’s name and office number. That they’re equivalent,
we just represent them iconographically with folder icons. This just forgets where the data is. All right, so hopefully Apple’s not
just doing that and forgetting the data. And they’re not. What are they doing instead? Or what should you do instead if you
really want to delete someone’s data? AUDIENCE: Write it over. DAVID J. MALAN: Write it over? In what sense? AUDIENCE: Gibberish. DAVID J. MALAN: Gibberish. So you only have zeros
and ones at your disposal, so what does gibberish
mean in that context? AUDIENCE: Random. DAVID J. MALAN: So random. So if you have the ability to generate
random numbers, which to some extent computers indeed do, you can just
change all of the zeros and ones to just random zeros and ones so
that it looks like random noise. It is noise, and only with
very low probability can you actually resurrect that data. Now, as a technical
aside, back in the day, especially in the days of magnetic
storage, it’s long been– well, I think it’s been the
case in super old hardware that when you reorient like a
1 to a 0 it doesn’t necessarily go, like, 180 degrees. There might be cases where it goes,
like, 179 degrees, for instance, thereby leaving a little bit of a
hint that that bit wasn’t necessarily what it appears to be right now. But with modern hard drives,
there’s just so much data, everything is so packed, this
really isn’t a concern these days. And so simply randomizing the
data in this way would work well. Overwriting everything with zeros,
which is very common, would work well. Or ones, equivalently, would work well. But that’s not actually what Apple
does because that would actually take a decent number
of seconds or minutes, especially now that you have gigabytes
and gigabytes of space in your phone. They instead do something
that’s actually pretty fast. If you’ve ever securely
erased your iPhone, it’s mostly the boot up process that’s
the slow part not so much the wiping. AUDIENCE: [INAUDIBLE] they just
reinstall the iOS [INAUDIBLE] DAVID J. MALAN: Yeah, OK. So maybe they’re
reinstalling the software. And to be honest, they’re
probably not reinstalling all the apps because
those are probably stored in a different portion of memory. AUDIENCE: It’s the same
when you reset your iPhone. DAVID J. MALAN: OK. AUDIENCE: You can do that
wherever you are [INAUDIBLE] DAVID J. MALAN: Oh. So, yes, it does redownload applications
that you had previously installed. Absolutely. But that’s separate from the data. So what is it actually doing
with respect to user data– your emails, your text messages,
and other things you had on there? When Apple shows that progress bar or
Android equivalent, what’s it doing? It’s not overwriting the zeros and ones. That– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Not even. It actually doesn’t go
and touch all of the bits. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. It’s more sophisticated than that. So we’ll talk about this in a
little more detail in just a bit. But encryption, again, is the
art of scrambling information and most encryption assumes
that you have some secret key. In particular, suppose I were
to write, let’s say, this. That’s an encrypted message
for you all this morning. What does it say? AUDIENCE: Hi? DAVID J. MALAN: Hi. OK, who said hi? OK, why do you say hi? AUDIENCE: It’s the alphabet plus 1. DAVID J. MALAN: It’s
the alphabet plus 1. And fortunately I chose a short
word so we could sort of brute force this by just figuring out what’s a
two-letter word we can think of which is not unreasonable strategy either. This indeed means hi. But what was the cipher, the algorithm
with which this is encrypted? Well, as you glean, Dan, it looks
like I took the original word and just added one to it. So H plus one letter of the
alphabet gives me I and I plus 1 gives me J and thus
is born my ciphertext. So this might be called
plaintext by convention and this would be called ciphertext,
which is the resulting encrypted text. Now, of course, this is not a very
secure cipher because my choice of key was not very good. So what if I choose a different key? How about if I write this
message to you, which is not meant to be an acronym or text speak. What’s this say? AUDIENCE: Same thing. DAVID J. MALAN: Same thing,
but what’s different? AUDIENCE: Algorithm. DAVID J. MALAN: Yeah, well the
algorithm is the same I’d argue. The input is slightly different. There’s two inputs,
indeed, in this case, not only the plaintext
but also the number. So if this previously was
1, now I’ve gone and done hi plus 2, which is what’s
ultimately given me J and K. So how many possible keys are there
for this algorithm, for this cipher? The key, again, is the number I added. What would you argue is the
total number of possible keys? AUDIENCE: Infinite. DAVID J. MALAN: Infinite’s true,
but practically speaking not really because many of them are
equivalent to each other. AUDIENCE: 25, 26. DAVID J. MALAN: Yeah, 25 or 26. Why do you say that, Katy? AUDIENCE: You start adding– one is
the same if I have 26 or 27, right? DAVID J. MALAN: There’s 27. Yep, exactly. Yeah, so if you’re assuming
we’re confining this cipher to only operate on letters,
to only take letters as input and only produce letters
as output, there’s going to be a corner case so to
speak, a sort of extreme scenario that you better anticipate
lest your algorithm be buggy. And the scenario I’m thinking of is
what if the letter is Z, like you’re encrypting the word “zoo” for instance? Well, suppose the key is 1. What the o is easy. It just becomes “PP.” But what’s Z become? It’s like A feels like a nice,
clean solution where you just wrap back around otherwise it’s going
to be like some funky punctuation symbol or something like
that, which is fine, but you have to decide in
advance what you’re going to allow as your inputs and outputs. So in this case, if
we’re just adding numbers and we’re comfortable with the
idea of wrapping around from Z to A and so forth, well then we only
have 25, maybe 26 possible keys. But 26 is kind of silly
because by the time you add 26 letters it just
wraps back around to A. So there’s this old joke on the
Internet which geeky people find funny. So there’s this algorithm
called rot13 which back in the day on message
boards, bulletin board systems, used to be this sort of low impact
way of encrypting information like movie spoilers and the like. So it would look like
nonsense but if you really wanted to see the movie spoiler you
could rotate everything by 13 places. So the joke on the
Internet is, well, you can– if you want your data to be
twice as secure, you should use rot26. OK, one person finds this funny. rot26 of course would be stupid because
you’re just rotating from A to A, B to B, and so forth. And to your point earlier, Avi, about
infinite, true but infinitely many of those numbers are
equivalent to others so it all boils down to
still just 25 useful values. But even then, these values
aren’t all that useful because Dan cracked this cipher
in just a couple of seconds. So what might be a better way
of encrypting information? And, again, the goal here is to
come back to Siobhan’s conjecture that maybe Apple is just forgetting an
encryption key, whatever that means. But we’ll come back to that. What might be more secure than this
rotational cipher otherwise known as a Caesar cipher where you just
add some fixed number to each letter? What else might you do? AUDIENCE: [INAUDIBLE] have each letter
could correspond to a different letter but it’s not the same [INAUDIBLE]. DAVID J. MALAN: Yeah. AUDIENCE: You create a key where each
letter and other letter can randomize [INAUDIBLE]. DAVID J. MALAN: OK, good. Let’s take those both in turn. So instead of using just
one key, let’s say– let me try to simplify the definition
as just we’ll use one key per letter. So for instance let’s just so we have a
longer word, if we had the word “hello” and we were just using a key of 1, this
would become I, this would become F, this would be M, M, and P. But of course this could be very
quickly cracked in a couple of ways. One, you could just kind of noodle
on it for a moment and see, oh, this has just been rotated one place. There’s another piece of information
that’s leaked by nature of this cipher. There’s a hint, a pattern. What kind of pattern, Grace? AUDIENCE: It’s a double letter. DAVID J. MALAN: Yeah. And double letters are kind of
interesting because it’s probably not going to be, like, two Q’s in a row. It’s probably not going to be,
you know, “cc” isn’t that common. “ll.” I feel like I see “ll” a lot. And that kind of intuition
or those kinds of statistics ultimately can help you
crack whatever is going on. So that is a bad property
of this rotational cipher. But what if we refine it somehow
so that instead of adding 1 to each of the letters,
why don’t we choose maybe a number– and this
is overly simplistic, but maybe we should
add a different value to each thereby obfuscating
the fact that there is indeed a repetition of letters. Because if you add 3 to
one L and 4 to the other, it’s going to be two
different letters as a result. And so this is more generally
known as the Vigenère cipher, named for a French gentleman years ago, which was
an improvement upon the Caesar cipher, or the rotational cipher
we just discussed. Instead of using one key,
you use one key per letter, although technically you would
still use a finite number of keys. And if you need to reuse
them– for instance, if the message you’re trying
to encrypt is “hello world,” you would just reuse
those same keys again. So it just helps you remember so you
don’t need a super, super long key. But in reality this is not
what most modern computers do. What most modern computers
use are different algorithms like DES or RSA or AES. These are a number of
the acronyms that you might see if you poke around security
software even on your own computer. But at the end of the day,
all of these algorithms have some notion of secrecy involved. And we’ll tease this apart
in more detail for this one and come back to yesterday’s
topic of browser-based encryption. But for now, all of them have
some secret, like the number 1 or the number 2 or the
number 12345 or such, and the whole security of
your system is predicated on that secret remaining a secret. As soon as one of you
knows that my key is 1 you can now crack any messages I send
throughout the room here, for instance, on a piece of paper. So that would be bad. So Siobhan proposed earlier that
maybe Apple is deleting your data by forgetting an encryption key. And that’s actually true,
but can we infer now what he might mean from
that given this definition? Anyone other than Siobhan? How might forgetting an encryption
key allow you to delete data? How could you leverage that
primitive, so to speak? AUDIENCE: [INAUDIBLE] treat
the ciphertext like plaintext? DAVID J. MALAN: Treat the
ciphertext like plaintext. In what sense? AUDIENCE: Like plaintext
[INAUDIBLE] become encrypted and if you forget it [INAUDIBLE] DAVID J. MALAN: Oh. Or maybe not what you
input but what you intended to be random I think is
where you’re going with this? So, yeah. So, like, if we– let’s
choose a little example. So suppose that I have
encrypted some sentence or word and the result is GZYEAZYFQ– I’m
just making these up in succession. I suppose that this is what some
secret key on my phone has encrypted. So this is what is
currently on my phone. Now, that might be a text
message I sent to someone, but it looks like nonsense if
you’re just looking at the letters or looking at the pattern of the bits
because Apple’s iOS, their software, has used an encryption key to turn it
into something that looks like that. Now, when I, the human, am using my
phone I of course don’t see that. I see the actual English text message
because I have logged into my phone, enabled the encryption key. This is all sort of built
in automatically for us and it is automatically
for me decrypting anything before I see it on the screen. So if it looks like this when it’s
actually stored in my phone’s memory but it can be turned back into plaintext
by using that supposedly secret key and iOS knows how to
do this, it would seem that simply forgetting the
key, whatever it is– it’s probably something fancier than
the number 1 or the number 13. But whatever that key
is, if you forget it, the side effect of forgetting
your key means that what is left in your phone’s memory? Just nonsense like this, which,
yes, represents encrypted data, but the definition of a good cipher
is that the resulting ciphertext appears to be random and is
indiscernible from truly random data. And so, as such, even though, yeah,
all of your data is still on there, it’s encrypted. And because you threw away the
encryption key, neither the FBI nor anyone else is going to be able to
decrypt it, at least not without a huge amount of luck. And so for all intents and
purposes it is in fact deleted. It’s equivalent to Avi’s suggestion
earlier of just overwriting your data with random bits. These are random if you just have no
way of recovering what they once were. Now, what if Apple or
a nosy roommate just tries to guess your password
or, in turn, your key? And to be clear, we humans just have
to remember a four-digit passcode or a six-digit passcode
or a longer pass phrase and that is used to unlock,
essentially, the encryption key, which is actually bigger inside of the phone. So how secure is this whole process? Yeah? AUDIENCE: So does each phone
have its own encrypted key? DAVID J. MALAN: Yes. And every time you reset your
phone it generates a new one. Exactly. So how secure is your own phone? So most of you probably–
well, some of you probably don’t have any
passcodes whatsoever, which means this whole conversation is moot. But if you have a four-digit passcode–
something, something, something, something– how secure is your phone? And on iOS and I think Android by
default those are typically numeric. So if you have a four decimal digit
passcode how secure is your phone? How do you answer a question like that? AUDIENCE: Not. DAVID J. MALAN: Not. OK, let’s probe a little deeper here. How not? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Exactly. And that’s the way– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Perfect. And that’s a good way of
quantifying the security of a system is, well, how many
possible keys are there? Because if it takes you some number of
seconds or milliseconds or nanoseconds or whatever to try inputting
one of those numbers, well then the security of your system
is defined by the total amount of time it would take to input one of those
times the total number of them. And then on average
you’ll get lucky and it’ll be, like, you’ll be halfway through
the list when you guess someone’s code and so it’s going to be roughly
proportional to how many codes there were in the first place. So if 1, 2, 3, 4 is
the length of your code and you have 10 possibilities– 10,
10– because you have 0’s through 9, that of course is going to give
you how many possibilities? 10,000 of them, from 0 on up to 9,999. All right, so not very
secure, but how insecure? Well, let’s just do some quick math. And this would be what a computer
scientist or engineer might just generally call a
back-of-the-envelope calculation. Let’s suppose for simplicity that just
typing in a passcode that’s four digits takes one second. So that means we have 10,000 seconds
if I want to try all possible codes. And there are 60 seconds
in a minute and 60 minutes in an hour, which means in about 2.8 hours
I can figure out your passcode. It’s a little tedious and
it’s a boring three hours, but your phone is not at all secure
unless what feature could we enable, just to be clear, as Sean proposed? David? Oh, I was thinking even simpler. The feature with which we
started today’s chat, the one the FBI was worried was enabled. AUDIENCE: The resets. DAVID J. MALAN: The resets, right. So the self-destruct after 10 attempts. Now, this doesn’t make your
phone fundamentally secure because the FBI or your
nosy roommate could just get lucky and out of the 10,000
possibilities your password was 0, 0, 0, 0. Or if he or she is just
typing in random numbers, they get lucky one out of 10,000
times, or really 10 out of 10,000. So one out of 1,000 chances
will they just guess it right and crack into your phone. So that helps us, but it really
just narrows the scope of the threat by having 10 attempts. All right, so we could
push back on that. Why don’t we give the
user just one attempt? That makes it even less
likely, 10 times less likely, that you’re going to be compromised by
someone trying to guess your password. But– AUDIENCE: Not user friendly. DAVID J. MALAN: What’s that? AUDIENCE: Not user friendly. DAVID J. MALAN: Not user friendly. And, Alicia? AUDIENCE: You could make a mistake. DAVID J. MALAN: If you make the
mistake, which most of us surely do, now you’re screwed
because you just deleted your data with even higher
probability because you occasionally make those mistakes. So, again, this theme of tradeoffs here. All right, now
realistically it’s not going to take a second if you
can automate this process, but about a second feels
right if it’s going to be a physical process like this. But suppose that you have
instead a six-digit passcode. How many possibilities are there then? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. So now you’re up to a million because
you add two more zeros to this. So if we have a million and now
that’s one per second, for instance, and there’s 60 seconds in a
minute, 60 minutes in an hour, and 24 hours in a day,
now it’s going to take you 11.57 days of nonstop inputting
in order to crack this password. So doable still, especially
if you automate it or share the load with a friend, but
also doesn’t feel particularly secure. Feels like within 12 days your
data could be compromised. So what’s better? Yeah, Danny? AUDIENCE: I learned this the
hard way, but with the iPad if you enter it in incorrectly
a certain number of times it makes you wait five minutes. DAVID J. MALAN: Nice. AUDIENCE: So you can reenter it. And then if you screw it up again it’ll
say you have to wait a half an hour or an hour. And it keeps adding how long– DAVID J. MALAN: How many times
did you screw up your password? AUDIENCE: I figured it out. DAVID J. MALAN: OK. So that’s clever and that is true
of a lot of systems, but why? That seems incredibly
annoying of Apple to do that. AUDIENCE: Yeah. DAVID J. MALAN: But why? Why is that actually a
brilliant idea, I would argue? Security– AUDIENCE: It increases the amount
of time it requires [INAUDIBLE] DAVID J. MALAN: Yeah, exactly. If this is like the best system
you have for protecting a device, and a passcode isn’t
fundamentally horrible. It’s just horrible if you
can input them really fast or there’s relatively few of them. So what if you do insert
artificial delays such that the software after each attempt
or each several attempts says, wait a minute. You seem like an adversary, not just
someone who forgot his password. Let me let you keep doing this but only
after a minute break or a five minute break. And so you generally just
increase the cost to an adversary by delaying or slowing down his or
her attempts to crack into your phone. Of course the price you
pay, as you discovered, is that this too is a trade off
in terms of user experience, in terms of the user having
a scenario like you just forgot your passcode presumably. But this is a wonderful defense
mechanism, this back off. What else could we do to increase
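To see why the back-off matters, here is a rough sketch of how waiting periods inflate an attacker's total time. The delay schedule below (a few free attempts, then one minute, five minutes, and an hour) is invented for illustration and is not Apple's actual policy; the function names are likewise hypothetical.

```python
def delay_after(failed_attempts):
    """Hypothetical schedule: seconds to wait after this many failures."""
    if failed_attempts < 5:
        return 0
    if failed_attempts < 7:
        return 60
    if failed_attempts < 9:
        return 5 * 60
    return 60 * 60

def total_seconds(attempts, seconds_per_guess=1):
    """Time spent typing plus time spent waiting across all attempts."""
    # A wait follows each failure, so the last attempt adds no delay.
    waiting = sum(delay_after(n) for n in range(1, attempts))
    return attempts * seconds_per_guess + waiting

no_backoff_hours = 10_000 / 3600
with_backoff_days = total_seconds(10_000) / 86_400

print(f"{no_backoff_hours:.1f} hours without delays")  # 2.8 hours without delays
print(f"{with_backoff_days:.0f} days with delays")     # 416 days with delays
```

Under these assumed delays, the same 10,000 four-digit codes that took under three hours now take more than a year, without changing the passcode itself.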
the security of these phones? Yeah? Dan again. AUDIENCE: So you could allow
both letters and numbers. DAVID J. MALAN: Yeah. So– AUDIENCE: And that increases the
number of possibilities for each. DAVID J. MALAN: All right, to
how many do we have here now? AUDIENCE: Well, at 10 you’d have 37. DAVID J. MALAN: OK, so 30– AUDIENCE: If you use just letters. DAVID J. MALAN: OK. So if we– AUDIENCE: 36? DAVID J. MALAN: Yeah. So if we have letters and numbers,
and we can actually ratchet this up even further– AUDIENCE: Include, like,
exclamation points. DAVID J. MALAN: OK, can definitely
do that, but even simpler. You’re forgetting sort of half– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. If you take into account
case sensitivity now we go from 26 plus 26 plus 10 so
that’s 62 possibilities for each. So that’s more. So 62, 62, 62, 62, 62 possibilities
for each of these characters. So now this is 62 to the sixth
power– 62 times 62, duh duh duh. And that gives us this
many possible codes, which is definitely a bigger number. So let’s try this. So 60 seconds in a minute, 60 minutes
in an hour, 24 hours in a day. And now let’s go further. There are 365 days in a year. So 1,800 years now we’re up to, assuming
it takes just one attempt per second. So now we’re probably
safe because we’re going to be long dead by the time
someone gets into our phone. So but we’ve raised the bar. And what’s nice is that we haven’t
really changed the algorithm per se or the process for
getting into the phone, we’ve just increased
the search space, so to speak, by increasing the total
number of possibilities here. All right, so then let’s come
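The back-of-the-envelope numbers in this discussion can be reproduced with a short sketch. It assumes, as the discussion does, one guess per second and a worst case in which every code must be tried; the function name `worst_case_seconds` is made up for illustration. The 2.7 hours quoted earlier is the same 2.78 figure truncated.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

def worst_case_seconds(alphabet_size, length, guesses_per_second=1):
    """Time needed to try every possible code of the given length."""
    return alphabet_size ** length / guesses_per_second

four_digit = worst_case_seconds(10, 4)   # digits only, 4 characters
six_digit = worst_case_seconds(10, 6)    # digits only, 6 characters
six_mixed = worst_case_seconds(62, 6)    # 26 + 26 letters plus 10 digits

print(f"{four_digit / SECONDS_PER_HOUR:.2f} hours")  # 2.78 hours
print(f"{six_digit / SECONDS_PER_DAY:.2f} days")     # 11.57 days
print(f"{six_mixed / SECONDS_PER_YEAR:.0f} years")   # 1801 years
```

The procedure never changes across the three cases; only the size of the search space does.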
back to what I think Griff proposed. What was it then, to recap,
the FBI wanted Apple to do? AUDIENCE: So the FBI wanted Apple
to write them a software that’s going to bypass [INAUDIBLE] DAVID J. MALAN: OK, good. So they wanted to bypass
that self-destruct feature and they could only do this by changing
the underlying operating system. Unfortunately– OK, so they
were hoping to do that. What else were they hoping for? AUDIENCE: Unlimited assistance. DAVID J. MALAN: What’s that? AUDIENCE: Unlimited assistance. DAVID J. MALAN: Unlimited assistance. So at the end of the day
so they wanted that access. They also wanted the ability to
electronically input the code so that they don’t have
to have an FBI agent, like, manually typing in
all of these possible codes well past the previous
limit potentially of 10. And they also wanted the ability for–
and this is more technical– for Apple to side load the software into
the phone so that it’s only in RAM recall from yesterday, not on
the solid state drive inside the phone. Because they didn’t want
to forensically alter any of the zeros and ones
that were on the phone so that presumably in court
no claim could be made that, well, the FBI injected this software
or this evidence into the phone by putting it into RAM
thereby not touching the persistent data that was apparently
forensically of interest to them. So I forget– let me
see if I have this here. So just to be clear, this is more
tongue in cheek than anything. But this is a little
video on YouTube such that technically the FBI
could also do something like this as opposed to inputting codes
electronically as via a Thunderbolt cable. Whoops. This apparently is a thing. And you can actually see in
the top right-hand corner looks like an Android phone. So that’s how you might
brute force an Android phone if you have at least
a short code and 12 days to spare with a device
like that, but they wanted to avoid that kind
of silliness, for instance. So what did they– OK, so all
of this, at the end of the day Apple ended up not obliging
and supposedly the FBI was able to have someone else,
a security firm supposedly, get into the phone without going
through this whole process. So at the end it was sort
of an anticlimactic ending and it was about to be an incredibly
frightening legal case potentially as to the degree to which companies
could be coerced into actually helping the FBI in cases like this. But this whole fiasco was
also in part the result of a missing feature
in the iPhone, which is that in theory they
could have installed the software in the first place. This was arguably a bit of an Achilles
heel in the design of the iPhone version 9.2 or whatever they
were up to at that point whereby it seems to be
completely beside the point if your adversary– in this
case, the FBI in some sense– is able to just install new software
onto your phone thereby circumventing the very protections
built into the phone. So this was a flaw, arguably,
from a security perspective, and I believe it’s already
been fixed– “fixed”– such that now to upgrade the
software on your phone, on an iPhone, you have to– guess what–
input your passcode now. And that was the catch before. Supposedly the upgrade process for
updating the iOS software on a phone up until recently did not require you
to input your password necessarily to update the software. And I do believe as of the last time
I updated my own phone and I noticed, oh, I don’t remember being
asked for my passcode before, and that seems to be
a side effect of this. So it’s a fascinating sort of cat
and mouse game where in this case, you know, the FBI were
trying to do a good thing, but from a security perspective they
were in this scenario the adversary in some sense insofar as they were
trying to get into a phone and Apple either hadn’t anticipated or hadn’t
worried about or just hadn’t bothered to implement this particular
additional defense mechanism. So questions about the
scenario or what happened or how it ultimately played out? Yeah. AUDIENCE: Was there any way
to see where people typically touch on their phone to see what
their passcode is? [INAUDIBLE] DAVID J. MALAN: Oh, you
know, I’m sure there is. I feel like I’ve seen movies certainly
where this is– you sort of go– AUDIENCE: Yeah. DAVID J. MALAN: And then
you see their passcode. I do think there’s something to that. Certainly on physical devices you see
wear on, like, old school alarm panels where you– AUDIENCE: Even on laptops, like, they’re
touching the same [INAUDIBLE] some wear [INAUDIBLE]. DAVID J. MALAN: Yeah. I’d have to Google around to see
the sort of tales that I’ve read. But I’m sure this is possible. I mean, even you can glean which keys
are used the most, which is information because it narrows your search space. And I do think I’ve even
seen some article where– it was something more clever than
just filming someone’s hands. I’d have to remember. But, I mean, yes I’m
sure this is possible. And even on glass you’re going to
leave some kind of oily residue presumably if you’re touching
the same place repeatedly. So these, too, are fairly sophisticated
attacks that most typical adversaries aren’t going to wage, but absolutely. Such things are,
I would imagine, possible. All right, so let’s transition
for a moment to a slightly deeper dive into encryption itself and
how it might otherwise be implemented. Because this whole Caesar cipher
thing, or the rotational cipher, and this whole idea of a secret
key seems fundamentally flawed. For instance, Felicia, if I wanted to
send you a secret message in this room by passing you a note that,
sort of like grade school style, that only you and I can read and write
and understand, how could we do it? How can I write on a
piece of paper a note and somehow encrypt it and send it
to Felicia so that if any of you intercept it– like I did for
David’s packet yesterday– it’s not at risk for being figured out? Avi. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah. Which is kind of a catch-22, right? Because in this context I could say, you
know, walk over here, password is 13, right? And assuming only she could
hear that, that would work. But this is also kind of stupid
because if I have the ability to secretly communicate
to her the password I might as well just hand her the
note physically at that point. So there’s this chicken
and the egg problem whereby we can’t communicate securely
unless we can first communicate securely in order to exchange that
one piece of secret information, whether it’s the number 1, 13, 12345,
or something even more sophisticated still. So how do we address this? Well, it turns out that this
is an example in the middle here, RSA, of what’s called
a public key cryptosystem. The first two and the Caesar and
Vigenere cipher we were just discussing are generally called
secret key cryptosystems whereby there truly is one and only
one secret and both parties, A and B, have to know it in advance in
order for the whole system to work. But this of course is problematic,
especially in the world of the web because, like, I don’t
know anyone personally at Amazon.com with whom I can exchange
a secret so that I can buy something on their website and check out. I don’t know anyone at
Facebook necessarily that I can establish a secret with
so that when I log into Facebook my password is encrypted between
my laptop and their servers, right? Just the world wouldn’t work
if this were a prerequisite. So it turns out that in public key
cryptosystems you have two keys, not a secret key per se, but a public
and a private key, as it’s called. And they’re sort of
semantically the same idea, but they’re mathematically
a little different. Turns out that with algorithms
like RSA and Diffie-Hellman and bunches of others
you have two numbers that have some kind of mathematical
relationship between them such that you can give out your public key. And Felicia should
give me her public key. I could then use her
public key to encrypt a message using some kind of system
similar in spirit to what we’ve discussed then send her that message. Even if David or someone else
in the room intercepts it, presumably he’s not going
to be able to understand it because what he doesn’t have
is Felicia’s private key, presumably, with which there’s
this reversible relationship to the public key. So only Felicia’s private key can
undo the effects of her public key, and so long as she keeps her private key
private and tells no one, including me, the whole system rather works. And so technically
it’s not a public key, private key she and I would usually use. Those tend to be fairly
computationally expensive relative to the secret key encryption
techniques we discussed earlier. So instead what we would
do is use our public keys to establish a shared secret key. So I would use her public key and
I would encrypt the message 13, for instance, if that’s the
secret key we want to use. She uses her private key
to decrypt that message. She says, oh, David wants to talk
to me using the secret key 13, and then we can use something simpler
like the Caesar cipher or something else but hopefully with a bigger
key or fancier algorithm than that. So this allows us to address
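That two-step exchange can be sketched with textbook RSA and tiny numbers. The primes 61 and 53 and the exponent 17 are classroom values chosen for illustration; real keys are vastly larger and real systems add padding, so this is a toy under stated assumptions, not a secure implementation. The `caesar` helper is likewise only a stand-in for a proper secret-key cipher.

```python
# Felicia's key pair (toy-sized; pow(e, -1, phi) needs Python 3.8+).
p, q = 61, 53
n = p * q                  # 3233, published as part of the public key
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent, known only to Felicia

# David encrypts the shared secret key 13 with Felicia's public key (n, e).
shared_key = 13
ciphertext = pow(shared_key, e, n)

# Only the private exponent d undoes the public-key operation.
recovered_key = pow(ciphertext, d, n)

def caesar(text, key):
    """Rotate letters by key positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# Both sides now switch to the cheaper secret-key cipher.
secret = caesar("Hello, Felicia", recovered_key)
print(secret)                            # Uryyb, Sryvpvn
print(caesar(secret, -recovered_key))    # Hello, Felicia
```

The expensive public-key step runs once, on a single small number; everything after that uses the shared key.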
the chicken and the egg problem but you want to know
whom you’re talking to. So I want to be confident that
I’m talking to, like, Felicia.com, so to speak, and she wants
to be confident she’s talking to David.com or whatnot. And so to do this, this is where those
whole SSL certificates come into play that we discussed yesterday. She can pay someone else, you
know, $50 a year or $300 a year, give that certificate authority,
so to speak– her name and mailing address and all of this. And they will then give her a stamp
of approval, a digital signature, so to speak, on her
public key essentially that allows me to trust that if I
trust this third party called Verisign or Namecheap or GoDaddy or any
number of third party companies, I, by transitivity, should
be able to trust Felicia as well because they vouched for folks. And actually we ran into an interesting
corner case with David’s laptop yesterday where as best
we could tell he wasn’t able to use Cloud 9 which uses HTTPS
and CS50 IDE because presumably this government laptop didn’t
necessarily– as best I can tell– trust the certificate authority who
signed the keys that are being used by Cloud 9 and also by Harvard’s site
I think and probably several others. And that’s probably because the
sysadmins, the system administrators, removed certain keys
to limit access. Because, indeed, there have been
cases out there where there’s sort of fly by night operations that
are verified certificate authorities but they’re really not doing
any due diligence whatsoever and so illicit or, rather,
untrustworthy websites have also been able to
enable this kind of feature. Felicia? AUDIENCE: [INAUDIBLE]
RSA SecurID [INAUDIBLE] DAVID J. MALAN: No,
that’s a little different. So RSA is also the name of the
company and it’s also the initials of the founders of the company. So it has a lot of different
meanings and different contexts. RSA SecurID, which is
increasingly common, is an example of something called
two-factor authentication, which is actually a good transition to
a more general topic of security of your own data, from which we can
now transition into Dropbox as well. So most of us for most services
use one-factor authentication. When you log into a website
with a username and password, the username is uninteresting. It might as well be
public by definition. But the password is meant to be
secret and that is your one factor that only you, hopefully, know. Unfortunately, passwords are not all
that great of a defense mechanism. Why? They’re not all that secure. AUDIENCE: Get stolen? DAVID J. MALAN: Get stolen how? AUDIENCE: Phishing. DAVID J. MALAN: Phishing attacks
like we discussed yesterday. Copying Bank of America’s
site and duping me into typing it into
some bad guy’s website. Sure. How else? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Looking at
someone’s keyboard, I would argue, or whatever sort of sticky
residue they’ve left on their– the oils from their fingers. We could figure out with some
probability what their password is. Not secure for that reason. AUDIENCE: [INAUDIBLE] hard
to remember a long password. DAVID J. MALAN: Yeah. Most of us probably have
pretty bad passwords, right, because they’re hard to remember
so we choose easier passwords. And even if– this is often
kind of a scary revelation. Even if you think you’re being clever
by let’s say you’re– the first “i” word that came to mind was “igloo.” Suppose you’re being clever
by using a 1 for the i and maybe you’re being triply clever
by using zeros for the o’s, well, guess who else knows those heuristics? Like all of the bad guys, right? So even if you’re
being clever like this, you are increasing the cost to that
bad guy to figuring out your password because now instead of checking all the
letters of the alphabet– all 26 or 52 of them, uppercase and lowercase– now
he or she has to try numbers as well, increasing by a factor
of 10 the total number of possibilities here for
each of these characters. But even still, with enough savvy and
computational cycles, computing cycles, he or she could certainly
figure this out too. So not clever. Better would be to use a password
that doesn’t look like this but is gol17$_’a– that’s not an a– gAB. And I’m just truly making
this up randomly somehow, but I will never remember
what this password is. Or if I do and then I go on,
like, holiday for a week or two and come home perhaps as in Danny’s
scenario with the iPad, like, I’m not going to remember
within some amount of time what that very clever password was. So there’s this tension then
between keeping your systems secure but keeping them accessible because
the more secure you make it, the higher the probability
it is that you’re going to lose access to the very data or
services that you’re trying to protect. So in this case, passwords
aren’t all that great and they can also be stolen, which
is sort of like salt in the wound. So this is if you have one factor,
a password that only you know. And, Felicia, can you
then hypothesize what the formal definition of
two-factor authentication must be and why it’s better? AUDIENCE: Does it verify on the
other side who that person is? DAVID J. MALAN: Does it verify on
the other side who that person is? In some sense. AUDIENCE: [INAUDIBLE] that information. DAVID J. MALAN: In some sense. Can you elaborate for those
unfamiliar, what is this RSA SecurID? AUDIENCE: So I log into my [INAUDIBLE] DAVID J. MALAN: Yep. AUDIENCE: [INAUDIBLE] gives me a key. So when I log in it not only verifies
my username, password [INAUDIBLE] but it’s another security. DAVID J. MALAN: Exactly. So you’re asked for both a
password and then some number. And the earlier incarnation
of this used to be an RSA key fob– some people might still
carry these– that looked like this. And it’s a clever idea and
it kind of predates everyone having phones in their pockets. This is a number that changes
every second, every minute or so, and the idea is that Felicia in
addition to typing her username and password also has to type in
whatever the current number is on this key fob that’s on
her key chain in her pocket. So this is a second factor,
but it’s fundamentally different from her
password in what sense? It’s not just the second password,
which would be still one factor. AUDIENCE: It changes. DAVID J. MALAN: It changes,
which is compelling, but that, too, isn’t what fundamentally
distinguishes it, I would argue. AUDIENCE: It’s physical. DAVID J. MALAN: It’s physical. It’s physical. Well, it’s actually
not random otherwise it would be problematic to
have it on her key chain and then somehow
synchronize with the server. So there is a pattern that it follows
in some sense, but it’s physical. So whereas the first factor
is generally something you know and have sort of up here but
that can be extracted or somehow stolen if you type it in or
have it logged somehow, the second factor needs
to be fundamentally different which needs to be, in
this case, something you have. So now given that Felicia is protecting
her account with these two factors, not only– I could probably
pretty easily figure out her password looking over
her shoulder or tricking her into a phishing attack
or the like, but I have to have physical access to her
now to threaten her security further because now I need that key fob. And unless I have
physical access to that, I’m not going to know what that
six-digit code is at any given time. And what happens is when these devices
first shipped from the factory, essentially they’re synchronized
with some piece of software or some kind of initialization
routine on a server so that Felicia’s company knows that
this is 159759 at this moment in time. But then a minute
later both the server’s understanding of that number and
her own key fob device change, and sometimes they can drift
out of sync because clocks will be ever so slightly askew. So every time you log in, which is
probably once a day or once a week or what not, the clocks
will resync as a result. But if you’ve ever carried this
thing and you haven’t used it for weeks on end or maybe even
a year, it might very well not work after some time
because it will drift or it will be forcibly
expired for lack of use. AUDIENCE: [INAUDIBLE] the Internet? DAVID J. MALAN: These are– no. No. These are just– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: The server
syncs with this device. So if it starts to drift
out of date slightly, Felicia’s server knows, oh, wait
a minute, I just saw that number. I was expecting to see
it a few seconds ago. Let me just update my own
clock relative to that device. So nowadays most people
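The idea of a code that changes on a schedule yet stays in step with a server can be sketched with a shared secret and the current time. This follows the spirit of the open time-based one-time password approach rather than RSA's proprietary SecurID algorithm; the secret, the 60-second step, and the one-step drift window are all assumptions made for illustration.

```python
import hashlib
import hmac
import struct

STEP_SECONDS = 60  # assumed: the displayed code changes once a minute

def code_at(secret, unix_time, digits=6):
    """Derive a short numeric code from a shared secret and a time step."""
    counter = int(unix_time) // STEP_SECONDS
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

def server_accepts(secret, submitted, server_time, drift_steps=1):
    """Accept codes from neighboring time steps to tolerate clock drift."""
    for step in range(-drift_steps, drift_steps + 1):
        expected = code_at(secret, server_time + step * STEP_SECONDS)
        if hmac.compare_digest(expected, submitted):
            return True
    return False

secret = b"shared-at-the-factory"   # hypothetical secret known to fob and server
fob_time = 1_468_000_000            # what the fob's clock believes
fob_code = code_at(secret, fob_time)

# The server's clock is 20 seconds ahead; the drift window still matches.
print(server_accepts(secret, fob_code, fob_time + 20))
# An hour later the same code falls outside the window.
print(server_accepts(secret, fob_code, fob_time + 3600))
```

Nothing travels between the fob and the server: both compute the same number independently, which is why the fob needs no Internet connection and why drifting clocks are the main failure mode.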
don’t bother carrying these because we all carry these. And so now you have
software versions of this, and Google offers this,
lots of banks offer this. And even if you don’t
have a special application you can certainly just receive a text
message, which is super compelling. And so, personally
speaking, after this week if you go back home and
take nothing else away security-wise besides choosing
better passwords, for instance, also start to look for
companies, especially banks, that offer two-factor authentication. Most of us probably
don’t care all that much if our Peapod.com account is compromised
or some silly website, for instance, that you occasionally log into, but
bank accounts are more important, maybe your email is more important
or any number of other such accounts. And unfortunately relatively few
vendors offer these services, but Gmail does and certain banks
do now, certain brokerage houses. And so that’s when you want to really
raise the bar to the adversary. Of course, there’s a
price paid, so to speak. I mean, sometimes they might charge
you for this but generally not. What, Felicia, what other
price do you pay, so to speak? AUDIENCE: Time. DAVID J. MALAN: Time, right? You have to, like, fish
around for the damn thing. And there’s corner cases like, you
know, what can go wrong with this? Battery dies or you’re in the basement
somewhere on campus or at home and you just don’t have reception
so you can’t get the text message, so now to log in you
have to walk outside, walk upstairs, get the message,
now go back downstairs, and you only have a few
seconds to actually type it in. So there’s some annoying
corner cases but at least you don’t have to figure out how to buy
or carry around something like this. AUDIENCE: I thought, like, a
few years ago this got hacked and it was a big scandal because
it was supposed to be unhackable. DAVID J. MALAN: Yeah, they took a hit. I forget the particulars of this. But somehow I think the
keys on their server, so to speak, that they were
using to keep things synchronized were compromised, and
via that information you could figure out
what someone’s code was. I’m guessing they had to reissue
or change some aspect of it. But, yeah, that was a scandal
because this was the product. Like, that is not supposed to happen. But you’ve got to trust someone. And so in fact behind
most of these topics today is you have to trust
someone at the end of the day. You just want to ideally
minimize the number of people that you’re actually
having that trust with. All right, so speaking
of trust– oh, yes. Sal? AUDIENCE: What do you think about
cloud-based password managers? DAVID J. MALAN: Ah. Good question. What do I think about
cloud-based password managers? They are probably better than
not using any password manager because if the result is
that, understandably, you have very easy passwords like
12345 or welcome or 1gloo with a 1, then you’re probably gaining on net
from using a cloud-based service. The problem with cloud-based
password managers– these are pieces of software
for which you generally have one hard to guess password. So you have to sort of bite your tongue
and just choose a really hard password for just one of your accounts. And then use that one
super hard password to protect all of your other accounts
by encrypting them essentially. The problem with cloud-based
services is to get those passwords or access to them you’re probably
going to, like, LastPass.com or whatever the service is and you are
typing in your super secret password, hitting Enter, and then
copying and pasting whatever your other passwords are. There are some threats here though. What’s one such threat? AUDIENCE: Somebody could
hack your computer. DAVID J. MALAN: Someone
could hack your computer. Like I personally would never do
that on, like, a friend’s computer or a lab computer or a
computer I don’t physically own and keep in my possession
because it’s just too possible for there to be what’s called a
keystroke logger on the computer where it’s just logging all of the keystrokes. Now, if LastPass or the
equivalent has two-factor I’d be more comfortable
because that way, sure, my friend or some adversary in a
computer lab might steal my password, but he or she also needs
to physically take this from me or some equivalent device. So that would be my
biggest concern there. So I use a password manager,
but it’s not cloud based. It all runs on my phone, runs on my Mac
and my desktop and my laptop computers. With that said, it also supports
Dropbox, which is a beautiful segue, but it encrypts the file
before putting it on Dropbox. AUDIENCE: What was the name? DAVID J. MALAN: 1Password
is an alternative that does the same thing as LastPass. So 1Password and LastPass,
if you’re unfamiliar. So a perfect segue to
services like this. So you don’t have to offer
up which one you use, but how many people in this
room use either Dropbox or OneDrive or Google Drive or Box? Like, pretty much almost everyone
uses one of these things. So let’s consider how
these services might work. And they do have different
security policies if not practices, but let’s see if we can’t
figure out just to what extent we are secure with something
like Dropbox, for instance. So for those unfamiliar, Dropbox
is a file synchronization service whereby on your Mac or your PC or
your phone you install their software and then just automatically
behind the scenes it constantly ensures that a folder called
Dropbox on your computer has the exact same files on this
Mac, on this phone, on this PC, on this desktop– any computer that I
install the software on and log into. And this is beautiful because if
you have a work computer and a home computer, maybe a phone, you can
create the illusion for yourself that all of your files are
constantly accessible to you. Now, in reality it’s doing a
lot of copying and updating the files moving around. It’s not the same file. It’s not just one cloud service. It’s maintaining local copies
on all of your computers. So how does Dropbox achieve this? So if this is the cloud
and this is Dropbox here and this is one of my laptops and here’s
another laptop, maybe work versus home, each of them have a Dropbox folder. Suppose I change a file
on this laptop here. How does it get over here
according to this kind of model? AUDIENCE: Synchronize [INAUDIBLE]. DAVID J. MALAN: Yeah. So the data, if I update it here,
first goes through the cloud and gets stored on Dropbox’s servers. They formerly used to use
Amazon S3 but I believe they use their own infrastructure now. But that’s sort of an uninteresting
implementation detail for now. Then it syncs the file to here. And then if you change it here, it goes
back up here and saves it down here. So where might you be
vulnerable in this scenario in terms of your privacy or security? Sorry? AUDIENCE: In the Dropbox server? DAVID J. MALAN: Yeah, the
Dropbox server, right? At the end of the day,
we have to trust someone. And it would seem that if I’m
uploading, like, my important documents, they’ve got to go to
the cloud, so to speak, which is an actual,
physical place somewhere. So this is Dropbox’s
servers and they’ve got to get synchronized
back down to my laptop. So Dropbox would seem to have access. OK, but wait a minute. If you read the fine
print and if you read the marketing speak on Dropbox’s
website, they encrypt your data. So what might that mean actually,
because you shouldn’t just take these marketing
statements at face value? They encrypt your data, so
what are they doing to it? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Good. So they’re scrambling it to create the
illusion of just random zeros and ones on their servers. That seems like a good thing. They’re using some key
presumably for that. OK, so what key? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK, well good, right? Although that seems to be a thing. So you don’t know, but is it yours? So it turns out it could
be maybe your own password. When you sign up for Dropbox, you
have your own username and password. So maybe they’re using
your password as the secret with which to encrypt the data. But this would kind of break
the file-sharing features of Dropbox and other tools insofar as I
can share my files or folder with Katy as well and with Christina and
with any number of other people and this would be problematic if my
files now are encrypted with my key but Katy’s installed the same thing on
her computer, needs access to my files, but to her they look like
random zeros and ones. So can’t be my key it would
seem logically in this scenario. It can’t be Katy’s key because,
conversely, I couldn’t then see the files. So it must be some central key, some
main key that Dropbox generated years ago, for instance, has
locked away in a vault, and is used to encrypt
all of their data. Yes? Turns out that’s true. So what are the threats? At least, that’s true
the last time I inquired. AUDIENCE: Someone could get that key. DAVID J. MALAN: Someone
could get that key, right? And so really all that’s
protecting you at that point is the physical security
of that key, which is probably on some piece of paper
or a USB stick or something like that hopefully in a vault with
very few people in the company having access to what is effectively
a really big seemingly random looking number that’s
used on their servers. But it could certainly
be compromised somehow. But this is actually a feature insofar
as this is possible for Dropbox. So it turns out Dropbox
supports a technique which is in their own interests, like other
companies, called deduplication. Does anyone know what it means
to deduplicate your data? Or for Dropbox to deduplicate
our collective data? So– AUDIENCE: No. DAVID J. MALAN: No. So deduplicate– suppose that
all of us in this room had, let’s say, all downloaded the same
video file, which tend to be big. So it’s like some movie that we’ve
all rented or bought or whatnot and it’s two gigabytes
or four gigabytes. So it’s a pretty big file. And there’s 20 plus of
us in this room and it would be pretty wasteful for Dropbox
to store two or four gigabytes times 20 people because at the end of the day
how many copies does Dropbox technically need in order to provide us all
with access to that same movie? I mean, in theory one copy. Now, for backup purposes
they should have a few spares anyway on different servers, but
that’s beside the point for now. They certainly don’t need to
keep one copy of every file that we share among us. One, in theory, suffices. So to deduplicate your data
means to do exactly that. Instead of remembering the file
for every user and a copy thereof, what should suffice instead whereby
Dropbox could use 1/20th the space? AUDIENCE: Directory. DAVID J. MALAN: The directory entry. So just remember that I have this
file and Siobhan has this file and Dario has this file and
everyone in the room has this file. And just like yesterday’s
discussion of file system, just remember where that file is
and then any other backups as well. So this too is only possible
for Dropbox if what is the case? If they’re not encrypting
our data individually. Because if all 20 of us
downloaded this movie and dragged it into our
Dropbox folder, suppose that it were encrypted with
our own encryption keys. So far as Dropbox is
concerned, it would look like they have 20
different patterns of bits that are thus not the
same by definition so they can’t throw away 19 of those copies. They have to keep all 20. So in order to save on cost, 1/20th
the cost in our particular sample, like Dropbox logically has
to be using the same key to encrypt all of our data in
order to leverage that feature. So here there’s this trade-off too between
space and security and financial cost and security. If they want that feature, they can’t
go about logically throwing away– they can’t go about, rather, encrypting
all of our data individually. Question? Yeah? Inessa? AUDIENCE: So usually
the data in Dropbox is able– you are able to manipulate
the particular piece of data so long as an individual
user manipulates it and it creates a separate copy that
becomes a different source of code, right, that’s separate from that. So if 20 of us downloaded a movie– DAVID J. MALAN: Uh huh. AUDIENCE: Let’s say if that one source
[INAUDIBLE] I manipulate the movie, say, for the [INAUDIBLE] just mine? DAVID J. MALAN: That is correct. If you make your own
edit of it, absolutely. Then you become decoupled
from everyone else. So now there’s two copies– the
original and the altered version of it. Absolutely. So that is the case, for sure. Yeah? AUDIENCE: Also I noticed if
you delete something on Dropbox it goes to the Trash and then it stays
there for a certain amount of time. DAVID J. MALAN: Yeah. So that’s a feature
from their perspective, right, because– or
feature for us, the end users, because if you
accidentally delete something or if, like, Katy deletes some file
that I did not want Katy to delete, it’s a nice feature that I can
retrieve it from the trash. And it’s probably
ill-defined just because they don’t feel like committing to it. It’s sort of like a nice feature. Nice to have but not
a guaranteed feature. And that’s very similar in spirit
to what we discussed yesterday whereby eventually they’ll
just forget where the file is, but it’s surely somewhere
on their servers because it’s not worth
overwriting necessarily. Yeah, Felicia? AUDIENCE: Dropbox,
Box, iCloud [INAUDIBLE] DAVID J. MALAN: Unclear. I know less about the other services
just because there’s less information kind of floating around. But you can start to infer
these kinds of details. And technically I don’t
know– Dropbox might maybe have different keys for
different subsets of its users. Now they have business features,
for instance, where you can actually put users in an organization, and within
those narrowly-defined organizations they might be using separate keys. So there are ways to re-scope the
threat in question but at the– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: I’m sorry. AUDIENCE: The synchronization piece? DAVID J. MALAN: The synchronization
piece is pretty much equivalent, yeah. They all offer roughly
the same feature set. Exactly. Sean? AUDIENCE: [INAUDIBLE] if you share
your whole folder and someone makes a change to those files in the folder,
you get the same effect [INAUDIBLE]. They change it for you. But if you just give them a link
would [INAUDIBLE] the file themselves, just kind of read and not write? DAVID J. MALAN: That is correct. And they will get a copy
of whatever the file looks like when they initiate
the download process if you haven’t changed it since. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: That’s fine too. And there’s actually another
feature built into the client now where you can make
the folder read-only where I presume if someone does
change it, Dropbox essentially undoes the change by deleting it and
re-downloading the original version or some such trick as well. All right, other questions
as it relates to security? Yeah, David? AUDIENCE: It’s kind of related to
the Dropbox and the synchronization. A lot of the browsers also have
a synchronization mechanism, like Firefox– DAVID J. MALAN: Yep. AUDIENCE: –where you could just change
your browser history and your bookmarks [INAUDIBLE] and does that
store all the cookies or all your certificates as well? Does that get synchronized
and unencrypted? DAVID J. MALAN: That’s a good question. It definitely gets
synchronized to some server. What they do is unclear. Certainly in this case in
the abstract I’m not sure. And frankly almost always
if you go to some website, they will use some fluffy
marketing speak like, encrypted with industry-grade encryption. That could mean so many things. That could just mean
they’re using HTTPS so that when your data goes from
your browser to their servers, yes, it’s encrypted and it’s
pretty safe mathematically, but once it gets to their servers
it’s not very well-defined. In fact, this is one of the hard
things, especially since most consumers aren’t savvy enough yet
in society to demand better clarity around these issues. The companies are under no obligation
otherwise really other than ethically to disclose this information,
and even then they might not want you to know what
they really mean when they say, this is secure. So in cases like that, you could
try reading the fine print, you could try googling
around for a white paper as it would typically be called on the
security of that particular system. And, like, Dropbox has such
documents available, for instance. But even then, the secret
sauce is often not disclosed, either because it’s
particularly secret or they just don’t want you to know exactly
what the attack scenarios are. Not good for business. But even here for the
password manager I mentioned earlier, 1Password, like I would–
so personally using Dropbox, like, I don’t put personal stuff
in it for this reason. I’ll put encrypted stuff in
it where I will encrypt it or I’ll use a tool like 1Password
which will encrypt its data file, put a seemingly random
sequence of zeros and ones in the Dropbox folder
that gets synced so that even if something
does get compromised, all they have is seemingly
random zeros and ones. Unfortunately, the world
is not user friendly enough yet to do all of
this for us automatically. And so this is kind
of– we’re kind of here. And I forget who asked this. Was there something more
precise we can address? But risks of cloud tools, certainly
Dropbox and password management and any other number
of variants all kind of relate to the security
of cloud services. Per our discussion yesterday, I would
say that you have physical threats certainly. Like, Amazon is a physical place. They have data centers in
various parts of the world and so there are certainly
engineers at the end of the day that could get physical access to the
hard drives, get off that data, but that’s hopefully a fairly
narrow threat and most likely some random person working at Amazon
doesn’t care about your specific data. But there’s only so much you
can do at the end of the day if you don’t physically
control the space. In fact, there’s a common
kind of saying where the only secure computer in
a room is in a locked closet with no power and no cords. I mean, that is truly secure. Everything after that starts
to poke holes in the system. AUDIENCE: When I was asking the question
[INAUDIBLE] Google Docs [INAUDIBLE] because of the security [INAUDIBLE]. So I’m just curious how secure was it
because clearly there’s a cost element. You don’t have to pay for
Microsoft Office [INAUDIBLE]. DAVID J. MALAN: Oh,
those kinds of tools too. AUDIENCE: [INAUDIBLE] Google Docs. You could use [INAUDIBLE]
you could use all those tools for free as opposed to
paying for [INAUDIBLE] licenses. DAVID J. MALAN: Yeah. I mean, I would probably– the free
services are typically freemium services whereby there’s paid versions. And so you’re probably getting
essentially the exact same tools and in turn the exact same security
for something like Google Docs as you would if you paid for Google Apps. The difference being for certain
governmental requirements sometimes cloud-based companies will run
separate servers, or servers that adhere to certain security protocols that
are too expensive for consumers to care about or to commit
to, but the US government might require that, hey, if
we’re going to use Amazon, you have to put us in our own locked
concrete room with our own servers on no one else’s servers, no other
country’s servers and so forth. But that would be more a
matter of policy or practice. Box typically was much better at that
in terms of focusing on the enterprise. Dropbox started more
as a consumer company. And there’s different levels
of offerings along those lines. But the short answer is that a cloud-based
service is not secure fundamentally, right? Unless you run it yourself, unless
you’re doing the encryption yourself, it’s not fundamentally secure. And even then the great
tragedy in computing is suppose I do download some
program that encrypts information. PGP– it’s called Pretty Good Privacy. It’s sort of a tongue in cheek kind
of– it’s actually good privacy, but they call it Pretty Good Privacy. But if I use this kind of tool
or any number of these algorithms I enumerated earlier, who’s to
say that the software you’re using to encrypt your data– which
you might not even understand mathematically and so you’re
just trusting that someone else understands this stuff
better than you and thus wrote the software for you– how
do you know that that person didn’t slip in their own backdoor? This is what the
government, for instance, is so clamoring for– give
us backdoors into this. Years ago it was like the V-chip in TVs
and so forth giving them this backdoor. Well, how do you know there aren’t
backdoors in our phones right now? Like, how do I know my phone is
not listening to this entire class and transmitting it somewhere? I’d have to intercept it
with some physical device. But, you know, unless the green
light turns on on my laptop, I’m going to assume the camera’s not on. And yet if you watch like Mr. Robot or
other shows, I mean, this is possible. If you have malware,
malicious software installed on your computer, who’s to say
it didn’t turn on my camera and just didn’t turn on
the green light and is recording everything I do and say? So this is kind of– and you
can go to the lowest level. Like, we will talk a little
bit about this later today, but when you write
software you typically use a program called an
interpreter or a compiler that turns the words that you
write into zeros and ones, essentially, the computer understands. Who’s to say that the compiler
or the interpreter you’re using isn’t adding to your own
software malicious zeros and ones that are injecting backdoors
into every piece of software we write? So unless you wrote the
sort of software that understands the zeros and
ones at the lowest level, like, you can’t trust anything we use. You really are just computing
while crossing your fingers. Felicia? AUDIENCE: I have a question on security. So I have deleted some text
messages I didn’t mean to delete and I wanted to retrieve them. So technically I guess
they were lost somewhere. DAVID J. MALAN: OK. AUDIENCE: So I paid for some
software that could retrieve them. DAVID J. MALAN: OK. AUDIENCE: So this tells me, OK, they’re not
really deleted off my phone, right? DAVID J. MALAN: OK. AUDIENCE: So again that’s
another backdoor [INAUDIBLE]. DAVID J. MALAN: I wouldn’t
call that a backdoor. Backdoor generally means there is
a way of logging into or accessing a system outside of the normal means. That would be just how
deletion is implemented. That’s a bug or a feature. The phone was just forgetting where
the files were a la yesterday’s chat about directories. Yeah. I mean– AUDIENCE: So it’s just forgetting? DAVID J. MALAN: Yeah. And, I mean, these days too we
have so much performance capability in these phones, it’s
lazy and it’s bad design I would argue if phones and computers
these days are not actually deleting your files when you request as much. But we consumers, especially
given messages like the DOS window from earlier, have never been
really educated fundamentally to expect as much and
so people don’t do it. And in fact you can’t– I mean, it’s
Apple’s and Microsoft’s own faults. Like Apple at least to
their credit for some time– it’s no longer in the
most recent version– you used to have a Secure Empty Trash
option that’s since been removed that would actually overwrite
with zeros and ones the data on your hard drive. Nowadays I think this ends up impacting
the drive too much because SSDs don’t last as long as mechanical devices
necessarily in terms of how many times you can write to them. But it’s Microsoft’s
and Windows’ own fault that they’ve not been doing this for us. So hopefully this tendency
will change over time. And in the case of Apple,
if you instead turn on something called FileVault,
which is the equivalent of what iOS automatically does, you can make
sure your hard drive is constantly encrypted so that if it is stolen or
someone physically tries to access it, they only see random zeros and ones
unless they know your password. Of course, if your password is super
easy to guess, it’s still insecure. So bad design I’m guessing. Like, there’s no way Apple would
let you run software like that, so I’m guessing it’s an Android phone? AUDIENCE: [INAUDIBLE] DAVID J. MALAN: They were able to? What version of– AUDIENCE: One or two
versions of [INAUDIBLE]. DAVID J. MALAN: Really? Oh, OK. So it might be something
that’s since been– AUDIENCE: [INAUDIBLE]
text messages purely. It was kind of gobbledygook, but it
did retrieve some of the text messages. DAVID J. MALAN: Really? Interesting. Poorly implemented software
that you were– good for you but bad that you were able
to get them back nonetheless. AUDIENCE: They told me,
yes, that [INAUDIBLE]. DAVID J. MALAN: Gotcha. And did they have to
connect a cable to do this? AUDIENCE: No. You just download the software online
[INAUDIBLE] and link that to an iPhone. DAVID J. MALAN: Oh, really. AUDIENCE: Up to the cloud, I mean. DAVID J. MALAN: Interesting. OK. Can’t– AUDIENCE: [INAUDIBLE] backup
copy of her [INAUDIBLE]. DAVID J. MALAN: Oh, that’s what it is. So it was– AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK, good, good. AUDIENCE: I do it before for my old
business [INAUDIBLE], my old phone– DAVID J. MALAN: OK. AUDIENCE: –or whatever and I
pulled the backup [INAUDIBLE]. DAVID J. MALAN: That makes sense. Yeah, you can’t just toss
words like cloud around now. That has no meaning anymore
after today and yesterday. OK, other questions? All right, well let’s
go ahead and pause here. Let’s take a 15 or so minute break. And when we come back, we will
focus on what it means to program.
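
The deduplication argument from earlier in this session can be made concrete with a small sketch. This is only an illustration under stated assumptions, not how Dropbox actually implements anything: `DedupStore` is a hypothetical in-memory store that keeps one copy of each distinct file, identified by its SHA-256 hash, plus a directory entry per user. `toy_encrypt` is a made-up stand-in for per-user encryption and is not real cryptography. The sketch shows that 20 identical uploads collapse to one stored copy, that the same 20 uploads encrypted under 20 different keys cannot be collapsed, and that a user who edits the file becomes decoupled from everyone else.

```python
import hashlib


class DedupStore:
    """Hypothetical store: one blob per distinct content, many directory entries."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes, stored only once
        self.directory = {}  # (user, filename) -> content hash

    def put(self, user, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)       # keep the bytes only if new
        self.directory[(user, filename)] = digest  # just remember who has what

    def get(self, user, filename):
        return self.blobs[self.directory[(user, filename)]]


def toy_encrypt(key, data):
    """Illustration only, NOT real encryption: XOR with a key-derived stream."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


movie = b"pretend these are the bytes of a movie" * 1000
users = [f"user{i}" for i in range(20)]

# Case 1: the service sees identical bytes from everyone, so one copy suffices.
shared = DedupStore()
for u in users:
    shared.put(u, "movie.mp4", movie)
shared_count = len(shared.blobs)

# Case 2: each user encrypts with their own key before uploading, so the
# service sees 20 different bit patterns and has to keep all of them.
per_user = DedupStore()
for u in users:
    per_user.put(u, "movie.mp4", toy_encrypt(u.encode(), movie))
per_user_count = len(per_user.blobs)

# One user edits their copy: they are decoupled, everyone else still shares.
shared.put("user0", "movie.mp4", movie + b" (my edit)")

print(shared_count, per_user_count, len(shared.blobs))
```

The design choice here is to key stored data by its hash rather than by user, which is what lets the directory entries do the work of "remembering who has the file" while the bytes are kept once.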
