Photonic Accelerators for Machine Intelligence
Hi. My name is Dirk Englund, and I am an associate professor of electrical engineering and computer science at MIT. It's been fantastic to be part of this team that Professor Yamamoto put together for the NTT PHI program, and it's a great pleasure to report our update from the first year. I will talk to you today about our recent work on photonic accelerators for machine intelligence. You can already get a flavor of the kind of work I'll be presenting from the photonic integrated circuit shown here, which serves as a photonic matrix processor that we are developing to try to break some of the bottlenecks we encounter in machine learning inference, in particular in tasks like vision, games, control, or language processing. This work is jointly led with Dr. Ryan Hamerly, a scientist at NTT Research, and he will have a poster at this conference that you should check out. I should also say that there are postdoc positions available; just take a look at the announcements on the Quantum Photonics Laboratory site at MIT.

If you look at these machine learning applications and look under the hood, you see that a common feature is that they use artificial neural networks, or ANNs, where you have an input layer of, let's say, n neurons, or values, connected to a first layer of, let's say, also n neurons. Connecting one layer to the next, if you represent it by a matrix, requires an n-by-n matrix with of order n-squared free parameters.

Now, in traditional machine learning inference, you have to fetch these n-squared values from memory, and every time you do that it costs quite a lot of energy. You can cache, but it is still quite costly in energy. Moreover, each of the input values has to be multiplied by that matrix, and if you multiply an n-by-one vector by an n-by-n matrix, you have to do of order n-squared multiplications. So on a digital computer you have to do of order n-squared operations and memory accesses, which can be quite costly.

The proposition is that on a photonic integrated circuit, perhaps we could do that matrix-vector multiplication directly on the PIC itself, by encoding the inputs as optical fields and sending them through a programmable interferometer mesh, so that the output fields are the product of the matrix with the input vector. That is actually the experiment we did back in 2017, in a collaboration with Professor Marin Soljačić, demonstrating that this is in principle possible.

If we look a little more closely at the device shown here, it consists of a silicon layer that is patterned into waveguides. We do this with a foundry; this chip was fabricated through the OpSIS foundry, and many thanks to our collaborators who helped make that possible. The silicon guides the light, and we couple pairs of these waveguides to make two-by-two transformations, Mach-Zehnder interferometers as they are called: two input waveguides coming in, two output waveguides going out. By setting the two phases here, theta and phi, we can program any arbitrary SU(2) rotation. And if I want N modes coming in and N modes coming out, that can be represented by an SU(N) unitary transformation, which is exactly what this kind of chip lets you realize. That's the key ingredient that really launched us, in my group.
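To make that concrete, here is a minimal numerical sketch, in Python, of how two-by-two Mach-Zehnder transformations compose into a programmable N-mode unitary. Phase conventions differ between papers, so the parameterization in `mzi` below is just one common choice, and the rectangular layering is a Clements-style arrangement picked for illustration; none of this is the exact layout of our chip.

```python
import numpy as np

def mzi(theta, phi):
    # One 2x2 Mach-Zehnder transfer matrix (one common convention):
    # the internal phase theta sets the splitting ratio, the external
    # phase phi sets the relative output phase. Together they reach
    # any SU(2) rotation, up to a global phase.
    s, c = np.sin(theta / 2), np.cos(theta / 2)
    return np.exp(1j * theta / 2) * np.array(
        [[np.exp(1j * phi) * s, c],
         [np.exp(1j * phi) * c, -s]])

def embed(n, m, u2):
    # Act with a 2x2 block on neighboring waveguides m and m+1 of n modes.
    u = np.eye(n, dtype=complex)
    u[m:m + 2, m:m + 2] = u2
    return u

n = 4
rng = np.random.default_rng(0)
U = np.eye(n, dtype=complex)
for layer in range(n):                       # rectangular mesh of MZIs
    for m in range(layer % 2, n - 1, 2):
        theta, phi = rng.uniform(0, 2 * np.pi, size=2)
        U = embed(n, m, mzi(theta, phi)) @ U

x = rng.normal(size=n) + 1j * rng.normal(size=n)  # input optical fields
y = U @ x                     # the matrix-vector product happens in flight
assert np.allclose(U.conj().T @ U, np.eye(n))     # the mesh stays unitary
```

The physical point is that once the phases are programmed, evaluating `U @ x` costs no arithmetic at all: the light propagating through the mesh is the multiplication.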
I should at this point acknowledge the people who have made this possible, and in particular point out Liane Bernstein and Alexander Sludds, as well as Ryan Hamerly once more, along with our other collaborators, including Professor Marin Soljačić, and of course our funding, in particular the NTT Research funding.

So why optics? Optics has failed many times before in building computers, so why is this different? I think the difference is that we are not trying to build an entirely new computer out of optics; we are selective in how we apply optics. We should use optics for what it is good at, and that is probably not so much nonlinearity, and not necessarily memory. Communication and fan-out are great in optics, and, as we just said, linear algebra you can do fantastically in optics. So you should make use of these strengths and combine them judiciously with electronic processing, to see if you can get an advantage for the system as a whole.

Before I move on: based on the 2017 paper, two startups were created, Lightelligence and Lightmatter. Two students from my group, Nick Harris and Darius Bunandar, co-founded Lightmatter, and after only about two years they have been able to create their first device, their first large-scale matrix processor. This device, called Mars, has 64 input modes, 64 output modes, and full programmability under the hood. Because they integrate the waveguides directly with CMOS electronics, they were able to deal with all the wiring complexity, all the feedback, and so forth, and the device can process a 64-by-64 unitary matrix on the fly. Total power consumption is about three watts, and the latency, how long it takes for a matrix to be multiplied by a vector, is less than a nanosecond. Because the device works well over a pretty large bandwidth, about 20 gigahertz, you can put in many channels, each individually at one gigahertz, so you can have tens of these SU(64) rotations running simultaneously. The sort of back-of-the-envelope physics gives you just tens of femtojoules per multiply-accumulate at the moment. That is very, very competitive.

So you can see the plan, and potentially the breakthroughs, that photonics enables here. One very cool thing that made this possible, actually, is that their phase shifters have no hold power: whereas our phase shifters used thermo-optic modulation, these use nanoscale mechanical modulators that consume no power to hold a setting. So once you program a unitary, you can just hold it there, with no added energy consumption over time.

So photonics really is on the rise in computing. But once again, you have to be careful in how you compare against electronics to find where the gain is to be had. What I have talked about so far is weight-stationary photonic processing. Electronics has that as well, but it does not have the benefit of the coherence of the optical fields passing through the matrix, nor the bandwidth. So that is, I think, a really exciting direction, and these companies are off building these chips; we will see over the next couple of months how well this works.
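As a sanity check on those throughput and energy numbers, here is the back-of-the-envelope arithmetic in Python. The channel count of twenty is my illustrative assumption for filling the quoted 20 GHz of bandwidth with 1 GHz channels; it is not a published specification.

```python
n_modes = 64        # a 64x64 unitary matrix per channel
vector_rate = 1e9   # one matrix-vector product per nanosecond per channel
n_channels = 20     # assumed: ~1 GHz channels filling ~20 GHz of bandwidth
power_w = 3.0       # quoted total power consumption in watts

macs_per_vector = n_modes ** 2                      # 4096 MACs per product
macs_per_s = macs_per_vector * vector_rate * n_channels
print(f"{macs_per_s:.1e} MAC/s")                    # ~8.2e13 MAC/s
print(f"{power_w / macs_per_s * 1e15:.0f} fJ/MAC")  # ~37 fJ/MAC: tens of fJ
```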
A different direction is to have an output-stationary matrix-vector multiplication. For this I want to point to the paper we wrote with Ryan Hamerly and the other team members, in which we project the activation functions together with the weight terms onto a detector array, and the multiplication of activation and weight happens by their interference, through homodyne detection. If you think about homodyne detection, it actually produces the multiplication automatically: the interference term between two optical fields gives you the product between them. That is what this scheme makes use of.
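To see why homodyne detection multiplies for free, consider a balanced detector behind a 50:50 beamsplitter: the direct intensity terms cancel in the difference of the two photocurrents, and what survives is exactly the interference term, the product of the two field amplitudes. A small sketch, assuming real-valued amplitude encoding:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)   # activations, encoded as field amplitudes
w = rng.normal(size=1000)   # weights, encoded as field amplitudes

# 50:50 beamsplitter outputs, followed by intensity detection on each port
p_plus = np.abs((x + w) / np.sqrt(2)) ** 2
p_minus = np.abs((x - w) / np.sqrt(2)) ** 2

# Balanced detection: the |x|^2 and |w|^2 terms cancel, the cross term survives
prod = p_plus - p_minus              # equals 2*x*w elementwise (real fields)
assert np.allclose(prod, 2 * x * w)

# Integrating the photocurrent over the pulse train accumulates a dot product
print(prod.sum() / 2, x @ w)         # the two numbers agree
```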
I want to talk a little more about that approach. We did a careful analysis in the PRX paper cited on the last page, and that analysis of the energy consumption shows that this device can, in principle, compute at an energy per multiply-accumulate below what you could theoretically do at room temperature with an irreversible computer, like the digital computers we use in everyday life. I want to illustrate that with this plot: the horizontal axis is the number of neurons per layer, and the vertical axis is the energy per multiply-accumulate in joules. When we make use of the massive fan-out together with this photoelectric multiplication by coherent detection, we estimate that we are on this curve here. Since our energy consumption scales as n, whereas for a digital computer it scales as n-squared, we gain more as we go to larger matrices. For the largest matrices, matrices of scale 1,000 to 5,000, even with present-day technology we estimate that we would hit an energy per multiply-accumulate of about an attojoule. And if we imagine a photonic system built from devices that have each already been demonstrated individually in research papers, but not yet packaged into a large system, we would be on this curve here, where you quickly dip underneath the Landauer limit, the thermodynamic limit for doing the number of bit operations you would need for the same depth of neural network. I should say that all of these numbers were computed for a simulated optical neural network constrained to the equivalent error rate that a fully digital computer would have, so the error is limited by the model itself rather than by the imperfections of the devices. We benchmarked that on the MNIST dataset. So that was a theoretical work that looked at the scaling limits and showed that there is great hope to gain tremendously in the energy per bit, but also in overall latency and throughput.

But you shouldn't celebrate too early. You have to do a careful system-level study comparing the electronic approaches, which oftentimes are analog approaches, against the optical approaches. As a first major step we did that in this digital optical neural network study here, together with Professor Vivienne Sze, an electronics designer who works on CMOS accelerators made specifically for machine learning, and Professor Joel Emer of MIT, who is also a fellow at NVIDIA. What we studied there in particular is: what if we replace only the communication part with optics? We looked at getting the same equivalent error rates that you would have with an electronic computer, and that showed that this approach should have a benefit for large neural networks, because large neural networks require lots of communication and eventually no longer fit on a single electronic chip. At that point you have to go longer distances, and that is where the optical connections start to win out. For details I point you to that system-level study. We are now applying similarly detailed full-system simulations to our other optical neural networks, to see where the real benefits are and where we can exploit them.

Lastly, I want to ask: what if we had nonlinearities that were actually reversible, quantum coherent in fact? We looked at that. Suppose you have the same architectural layout, but rather than saturable absorption, or photodetection followed by an electronic nonlinearity, which is what we have done so far, you have an all-optical nonlinearity, based for example on a Kerr medium. Suppose the Kerr medium is strong enough that the output from one of these transformations can pass through it, pick up an intensity-dependent phase shift, and then pass into the next layer. What we did in this case is say: suppose you have multiple layers of these Mach-Zehnder interferometer meshes, just like the ones we had before, with such a nonlinearity between them (a small numerical sketch of this layered architecture appears at the end of this section), and you want to train the system to do something. Suppose the task is, for example, quantum optical state compression: you have a quantum optical state, and you would like to see how much you can compress it while keeping the same quantum information in it. We trained the network and it discovered an efficient algorithm for that. We also trained it, with reinforcement learning, for black-box quantum simulation and, perhaps particularly interesting, for one-way quantum repeaters. We said: if we have a communication network with these quantum optical neural networks stationed some distance apart, and you come in with an optically encoded pulse that encodes an optical qubit into many individual photons, how do I repair that multi-photon state so as to send the corrected optical state out the other side? This is a one-way error-correcting scheme. We didn't know how to build it, but we posed it as a challenge to the neural network, and in simulation we trained the network how to apply the weights in the matrix transformations to perform that, answering an actual open challenge in the field of optical quantum networks. So that gives us motivation to try to build these kinds of nonlinearities, and we have done a fair amount of work on this; you can see references five through seven here.

I have talked about the programmable photonics already, for the benchmark analysis and some of the other related work; please see Ryan's poster, where, as I mentioned, we have ongoing work in benchmarking optical computing as part of the NTT program with our collaborators.
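As promised above, here is a classical, mean-field caricature of that layered architecture: a unitary mesh, an intensity-dependent (Kerr) phase, then another mesh. This is only a sketch of the structure, not of the quantum simulations we actually trained, and the nonlinear strength `chi` is an arbitrary illustrative parameter.

```python
import numpy as np

def kerr(fields, chi):
    # Intensity-dependent phase shift, the classical signature of a Kerr
    # medium: each mode picks up a phase proportional to its own power.
    return fields * np.exp(1j * chi * np.abs(fields) ** 2)

def random_unitary(n, rng):
    # Stand-in for a programmed interferometer mesh.
    q, r = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(2)
n = 4
U1, U2 = random_unitary(n, rng), random_unitary(n, rng)
x = rng.normal(size=n) + 1j * rng.normal(size=n)

y = U2 @ kerr(U1 @ x, chi=0.5)   # mesh -> Kerr layer -> mesh
# The nonlinearity is lossless, so total field energy is conserved
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))
```

In the actual work, the phases of the meshes are the trainable weights, and training proceeds in simulation against the quantum-optical task.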
And I think the main thing I want to say here at the end is that the exciting thing, really, is that the physics tells us there are many orders of magnitude of efficiency gains to be had, if we can develop the technology to realize them. I was being conservative here with three orders of magnitude; it could be six orders of magnitude for the larger neural networks that we may have to use, and may want to use, in the future. So the physics tells us there is a tremendous gap between where we are and where we could be, and that, I think, makes this tremendously exciting, and makes the NTT PHI program so very timely. So with that, thank you for your attention, and I'll be happy to talk about any of these topics.