Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning
>> Paige: Hello, everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning." I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Software Engineer, George Larionov.
>> George: Hi.
>> Paige: (chuckles) That's George. So, before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click Submit. As soon as a question occurs to you, go ahead and type it in, and there will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to get to during that time. Any questions we don't get to, we'll do our best to answer offline. Alternatively, you can visit the Vertica forum to post your questions there after the session. Our engineering team is planning to join the forum to keep the conversation going, so you can ask an engineer afterwards, just as if it were a regular in-person conference. Also, a reminder: you can maximize your screen by clicking the double-arrow button in the lower-right corner of the slides. And, before you ask, yes, this virtual session is being recorded, and it will be available to view by the end of this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, George.
>> George: Thank you, Paige. So, I've been introduced. I'm a Software Engineer at Vertica, and today I'm going to be talking about a new feature, Vertica's integration with TensorFlow. First, I'm going to go over what TensorFlow is and what neural networks are. Then, I'm going to talk about why integrating with TensorFlow is a useful feature, and, finally, I'm going to talk about the integration itself and give an example.

So, as we get started here, what is TensorFlow? TensorFlow is an open-source machine learning library, developed by Google, and it's actually one of many such libraries. The whole point of libraries like TensorFlow is to simplify the process of working with neural networks, creating, training, and using them, so that it's accessible to everyone, as opposed to just a small subset of researchers.

So, neural networks are computing systems that allow us to solve various tasks. Traditionally, computing algorithms were designed completely from the ground up by engineers like me, and we had to manually sift through the data and decide which parts are important for the task and which are not. Neural networks aim to solve this problem, a little bit, by sifting through the data themselves, automatically finding traits and features which correlate to the right results. So, you can think of it as neural networks learning to solve a specific task by looking through the data, without human beings having to sit and sift through the data themselves.

So, there are a couple of necessary parts to getting a trained neural model, which is the final goal. By the way, a neural model is the same as a neural network; those terms are synonymous. First, you need this light blue circle, an untrained neural model, which is pretty easy to get in TensorFlow, and, in addition to that, you need your training data. This involves both training inputs and training labels, and I'll talk about exactly what those two things are on the next slide.
But, basically, you need to train your model with the training data, and, once it is trained, you can use your trained model to predict on just the purple circle, new inputs, and it will predict the labels for you. You don't have to label the data anymore.

So, training a neural network can be thought of as teaching a person how to do something. For example, if I want to learn to speak a new language, let's say French, I would probably hire some sort of tutor to help me with that task, and I would need a lot of practice constructing and saying sentences in French, and a lot of feedback from my tutor on whether my pronunciation, grammar, et cetera, is correct. That would take me some time, but, finally, hopefully, I would be able to learn the language and speak it without any feedback and get it right. In a very similar manner, a neural network needs to practice on example training data first, and, along with that data, it needs labeled data. In this case, the labeled data is analogous to the tutor: it contains the correct answers, so that the network can learn what those look like. But, ultimately, the goal is to predict on unlabeled data, which is analogous to me knowing how to speak French. So, I've gone over most of the bullets: a neural network needs a lot of practice; to do that, it needs a lot of good labeled data; and, finally, since a neural network needs to iterate over the training data many, many times, it needs a powerful machine that can do that in a reasonable amount of time.

So, here's a quick checklist of what you need if you have a specific task that you want to solve with a neural network. The first thing you need is a powerful machine for training; we discussed why this is important. Then, you need TensorFlow installed on the machine, of course, and you need a dataset and labels for your dataset. Now, this dataset can be hundreds of examples, thousands, sometimes even millions. I won't go into that, because the dataset size really depends on the task at hand, but if you have these four things, you can train a good neural network that will predict whatever result you want it to predict at the end.

So, we've talked about neural networks and TensorFlow, but the question is: if we already have a lot of built-in machine learning algorithms in Vertica, then why do we need to use TensorFlow? To answer that question, let's look at this dataset. This is a pretty simple toy dataset with 20,000 points, but it simulates a more complex dataset with two different classes which are not related in a simple way. The existing machine learning algorithms that Vertica already has mostly fail on this pretty simple dataset. Linear models can't really draw a good line separating the two types of points. Naïve Bayes also performs pretty badly, and even the Random Forest algorithm, which is a pretty powerful algorithm, gets only 80% accuracy with 300 trees. However, a neural network with only two hidden layers gets 99% accuracy in about ten minutes of training. So, I hope that's a pretty compelling reason to use neural networks, at least sometimes.
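As a rough illustration of the two-hidden-layer model George mentions, a minimal sketch in TensorFlow's Keras API might look like the following. The toy data generator, layer widths, and training settings here are assumptions for the sake of the example, not the exact setup behind the slide's numbers.

```python
# A minimal sketch (not the exact model from the slide): a network with two
# hidden layers trained on a toy two-class dataset of 20,000 points.
import numpy as np
import tensorflow as tf

# Hypothetical toy data: two noisy concentric rings, standing in for classes
# that are "not related in a simple way" and defeat a linear separator.
rng = np.random.default_rng(0)
n = 20_000
angles = rng.uniform(0, 2 * np.pi, n)
labels = rng.integers(0, 2, n)
radii = np.where(labels == 0, 1.0, 2.0) + rng.normal(0, 0.35, n)
X = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

# Two hidden layers, as mentioned in the talk; the widths are a guess.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=20, batch_size=64, validation_split=0.1)
```

On data shaped like this, a linear model has no good separating line, while even a small network like this one can fit the curved boundary between the two classes.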
So, as an aside, there are plenty of tasks that do fit the existing machine learning algorithms in Vertica. That's why they're there, and if a task you want to solve fits one of the existing algorithms, then I would recommend using that algorithm rather than TensorFlow, because, while neural networks have their place and are very powerful, it's often easier to use an existing algorithm, if possible.

Okay, so, now that we've talked about why neural networks are needed, let's talk about integrating them with Vertica. Neural networks are best trained using GPUs, Graphics Processing Units, which are, basically, a different kind of processing unit than a CPU. GPUs are good for training neural networks because they excel at doing many, many simple operations at the same time, which is what a neural network needs in order to iterate through the training data many times. However, Vertica runs on CPUs and cannot run on GPUs at all, because that's not how it was designed. So, to train our neural networks, we have to go outside of Vertica, and exporting a small batch of training data is pretty simple, so that's not really a problem. But, given this, why do we even need Vertica? If we train outside, then why not do everything outside of Vertica?

To answer that question, here is a slide that Philips was nice enough to let us use. This is an example of a production system at Philips. It consists of two branches. On the left, we have a branch with historical device log data, which can be thought of as a bunch of training data. All that data goes through some data integration and data analysis. Basically, this is where you train your models, whether or not they are neural networks, but, for the purposes of this talk, this is where you would train your neural network. And, on the right, we have a branch with live device log data coming in from various MRI machines, CAT scan machines, et cetera, and this is a ton of data. These machines are constantly running, they're constantly on, and there are a bunch of them, so data just keeps streaming in. We don't want this data to have to take any unnecessary detours, because that would greatly slow down the whole system. So, the data in the right branch goes through an already trained predictive model, which needs to be pretty fast, and, finally, it allows Philips to do maintenance on these machines before they actually break, which helps Philips, obviously, and definitely the medical industry as well. I hope this slide helped explain the complexity of a live production system and why it might not be reasonable to train your neural networks directly in the system with the live device log data.

So, a quick summary of just the neural networks section. Neural networks are powerful, but they need a lot of processing power to train, which can't really be done well in a production pipeline. However, they are cheap and fast to predict with; prediction with a neural network does not require a GPU anymore. And they can be very useful in production, so we do want them there; we just don't want to train them there. So, the question now is: how do we get neural networks into production? We have, basically, two options. The first option is to take the data and export it to our machine with TensorFlow, our powerful GPU machine. The second is to take our TensorFlow model and put it where the data is; in this case, let's say that that is Vertica. I'm going to go through some pros and cons of these two approaches. The first one is bringing the data to the analytics.
The pros of this approach are that TensorFlow is already installed and running on this GPU machine, and we don't have to move the model at all. The cons, however, are that we have to transfer all the data to this machine, and if that data is big, if it's gigabytes, terabytes, et cetera, then that becomes a huge bottleneck, because you can only transfer it in small quantities; GPU machines tend to not be that big. Furthermore, TensorFlow prediction doesn't actually need a GPU, so you would end up paying for an expensive GPU for no reason. It's not parallelized, because you just have one GPU machine. You can't put your production system on this GPU machine, as we discussed. And, so, you're left with good results, but not fast and not where you need them.

So, now, let's look at the second option: bringing the analytics to the data. The pros of this approach are that we can integrate with our production system. It's low impact, because prediction is not processor intensive. It's cheap, or, at least, pretty much as cheap as your system was before. It's parallelized, because Vertica was always parallelized, which we'll talk about on the next slide. There's no extra data movement. You get the benefit of model management in Vertica, meaning, if you import multiple TensorFlow models, you can keep track of their various attributes, when they were imported, et cetera. And the results are right where you need them, inside your production pipeline. The two cons are that TensorFlow is limited to just prediction inside Vertica, and, if you want to retrain your model, you need to do that outside of Vertica and then re-import it.

So, just as a recap of parallelization: everything in Vertica is parallelized and distributed, and TensorFlow is no exception. When you import your TensorFlow model into your Vertica cluster, it gets copied to all the nodes automatically, and TensorFlow will run in fenced mode, which means that if the TensorFlow process fails for whatever reason, even though it shouldn't, Vertica itself will not crash, which is obviously important. And, finally, prediction happens on each node. There are multiple threads of TensorFlow processes running, each processing different little bits of data, which is much faster than processing the data line by line, because it all happens in a parallelized fashion. And, so, the result is fast prediction.

So, here's an example which I hope is a little closer to what everyone is used to than the usual machine learning TensorFlow example. This is the Boston housing dataset, or, rather, a small subset of it. On the left, we have the input data, to go back to, I think, the first slide, and, on the right, is the training label. The input data consists of one line per plot of land in Boston, along with various attributes, such as the level of crime in that area, how much industry is in that area, whether it's on the Charles River, et cetera, and, on the right, we have, as the label, the median house value for that plot of land. The goal is to put all this data into the neural network and, finally, get a model which can predict on new incoming data and produce a good housing value for that data. Now, I'm going to go through, step by step, how to actually use TensorFlow models in Vertica.
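Before the walkthrough, here is a minimal sketch of the kind of regression model the first step would produce for data shaped like this: a handful of numeric attributes per plot of land as inputs, and the median house value as the label. The placeholder data, column count, and layer sizes are assumptions for illustration, not taken from the session.

```python
# A minimal sketch of step one (training), assuming a table of numeric
# features per plot of land and a median-house-value label.
import numpy as np
import tensorflow as tf

# Hypothetical feature matrix: crime level, industry share, Charles River
# flag, and so on -- 13 numeric columns is the classic Boston housing shape.
X_train = np.random.rand(400, 13).astype("float32")      # placeholder features
y_train = (np.random.rand(400, 1) * 50).astype("float32")  # median value, in $1000s

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(13,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # single regression output: predicted house value
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X_train, y_train, epochs=100, verbose=0)

# The later steps -- saving a frozen graph and writing the JSON descriptor --
# are covered in the walkthrough that follows.
```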
So, the first step I won't go into much detail on, because there are countless tutorials and resources online on how to use TensorFlow to train a neural network; that's the first step. The second step is to save the model in TensorFlow's 'frozen graph' format. Again, this information is available online. The third step is to create a small, simple JSON file describing the inputs and outputs of the model, what data types they are, et cetera. This is needed for Vertica to be able to translate from TensorFlow land into Vertica SQL land, so that it can use a SQL table instead of the input format TensorFlow usually takes. Once you have your model file and your JSON file, you put both of those files in a directory on a node, any node, in the Vertica cluster, and name that directory whatever you want your model to ultimately be called inside of Vertica. Once you've done that, you can go ahead and import that directory into Vertica. This IMPORT_MODELS function already exists in Vertica; all we added was a new category that it can import. So, what you need to do is specify the path to your neural network directory and specify that the category of the model is TensorFlow. Once you've successfully imported it, in order to predict, you run this brand new PREDICT_TENSORFLOW function. In this case, we're predicting on everything from the input table, which is what the star means. The model name is the Boston housing net, which is the name of your directory, and then there's a little bit of boilerplate. The two names, ID and value, after the AS are just the names of the columns of your outputs, and, finally, the Boston housing data table is whatever SQL table you want to predict on that fits the input type of your network. This will output a bunch of predictions; in this case, values of houses that the network thinks are appropriate for all the input data.

So, just a quick summary. We talked about what TensorFlow is and what neural networks are, and then we discussed that TensorFlow works best for training on GPUs, because training needs very specific hardware characteristics, while Vertica is designed to use CPUs; it's really good at storing and accessing a lot of data quickly, but it's not well designed for having neural networks trained inside of it. Then, we talked about how neural models are powerful and we want to use them in our production flow, and, since prediction is fast, we can go ahead and do that; we just don't want to train there. Finally, I presented the Vertica TensorFlow integration, which allows importing a trained TensorFlow neural model into Vertica and predicting on all the data that is inside Vertica with a few simple lines of SQL. So, thank you for listening. I'm going to take some questions now.
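For reference, the import and predict steps George describes might look roughly like the sketch below, driven from Python with the vertica-python client. The connection details, directory path, table name, and the exact OVER clause and parameter placement are assumptions; the precise syntax may differ by Vertica version, so check the documentation for your release.

```python
# A rough sketch of the import-and-predict flow described above, driven from
# Python with the vertica-python client. Paths, names, and exact parameter
# syntax are illustrative assumptions, not copied from the session.
import vertica_python

conn_info = {
    "host": "vertica-node-1",   # hypothetical node
    "port": 5433,
    "user": "dbadmin",
    "password": "...",
    "database": "vmart",
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()

    # Import the directory (frozen graph + JSON descriptor) that was placed
    # on a node, specifying the TensorFlow category, as described in the talk.
    cur.execute("""
        SELECT IMPORT_MODELS('/home/dbadmin/bostonhousingnet'
                             USING PARAMETERS category='TENSORFLOW');
    """)

    # Predict on a SQL table whose columns match the model's declared inputs.
    # The output aliases (id, value) name the model's output columns; the
    # OVER clause is boilerplate whose exact form depends on the version.
    cur.execute("""
        SELECT PREDICT_TENSORFLOW(*
                                  USING PARAMETERS model_name='bostonhousingnet')
               OVER()
               AS (id, value)
        FROM boston_housing_data;
    """)
    for row in cur.fetchall():
        print(row)
```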
Karthik Rau, SignalFX | BigDataSV 2015
>> Jeff: Hi, Jeff Frick here with theCUBE. Welcome. We're excited to get out and talk to startups, people that are founding companies, when they come out of stealth mode. We're in a great position in that we get a chance to talk to them early, and we're really excited to have a CUBE conversation with Karthik Rau, the founder and CEO of SignalFx, just coming out of stealth. Congratulations.
>> Karthik: Thank you, Jeff.
>> Jeff: So, how long have you been working behind the scenes, trying to get this thing going?
>> Karthik: Yeah, we've been at it for two years now. So, two years. My co-founder and I started the company in February of 2013, so we're excited to finally launch and make our product available to the world.
>> Jeff: All right, excellent, congratulations. That's always a great thing. We've launched a few companies on theCUBE, so hopefully this will be another great success. So, talk a little bit about, first off, you and your journey. We have a lot of entrepreneurs that watch the show, and I think it's an interesting topic: how do you get to the place where you actually found and launch a company?
>> Karthik: Yeah, absolutely. I started my career at a cloud company before cloud really existed as a market, a company called Loudcloud.
>> Jeff: Oh yeah, Marc Andreessen, right?
>> Karthik: Marc Andreessen and Ben Horowitz were two of the founders of the company. We were trying to do what the public cloud vendors are doing today, before the market was really all that big and before the technologies really existed to do it well. But that was my first introduction to cloud. I came out of college, and that's where I met my co-founder, Phillip Liu, as well. Phil and I were both working on the monitoring products at Loudcloud. From there I ended up at VMware for a good run of about seven years, where I ran product. I had always wanted to start a company, and a couple of years ago Phil and I thought the timing was right, we had a great idea, and we decided to go build SignalFx together.
>> Jeff: Okay, so what was the genesis of the idea? A lot of times it's a cool technology looking for a problem to solve, and a lot of times it's a problem that you know, where if you only had one of these it would solve your problems. So how did that whole process work?
>> Karthik: Yeah, it was rooted in personal experience. My co-founder Phil was at Facebook for several years and was responsible for building the monitoring systems at Facebook, and through our personal experience and what we'd seen in the marketplace, we had a fundamental belief and a vision that monitoring for modern applications is now an analytics problem. Modern applications are distributed; they're not a single database running on a single system. Even small companies now have hundreds of VMs running on public cloud infrastructure, and the only way to really understand what's happening across all of these distributed applications is to collect the data centrally and use analytics. That was our fundamental insight when we started SignalFx. What we saw in the marketplace was that most of the monitoring technologies haven't really evolved in the past 15 or 20 years, and they're still largely designed for traditional, static enterprise applications, where an alert when an individual node is down or a static threshold has been passed is enough. But that doesn't really work for modern apps, because they're so distributed. If one node out of your twenty nodes is having a problem, it doesn't necessarily mean that your application is having a problem, and so the only way to really draw that insight is to collect the data and do analytics
on it, and that's what SignalFx does.
>> Jeff: Okay, so it's really because of that distributed nature of modern apps and modern architecture.
>> Karthik: Yes. There are three things that are fundamentally different. Number one, modern applications are distributed in nature, so you really have to look at patterns across many systems. Number two, they're changing far more frequently than traditional enterprise apps, because they're hosted, for the most part, web applications, so you can push changes out every day if you want to. And then, third, they're typically operated by product organizations and not IT organizations, so you have developers or DevOps organizations that are actually operating the software. Those three changes are quite substantial and require a new set of products.
>> Jeff: Right, and so the other guys are still kind of in the "fire off the pager alert, something is going down" mode. It's very noisy.
>> Karthik: Yes. You're firing off alerts every time an individual threshold trips, and you've got thousands of VMs. And we all know that the trend these days is towards microservices architectures, small, componentized containers or VMs, so you don't have to have a very sophisticated, large application to have a lot of systems.
>> Jeff: So, do you fit into other existing infrastructure monitoring systems, or infrastructure management systems? I'm sure, you know, it's another tool, and these guys have got to manage a lot of stuff. How does that work?
>> Karthik: Yeah, we are focused on the analytics part of the problem. We collect data from any source. Our customers are typically sending us data, infrastructure data that they're collecting using their own agents; we have agents that we can provide to collect it; and a lot of developers are instrumenting their own metrics that they care about. For example, they might care about latency metrics and knowing latencies by customer, by region. They'll send us all that data, and then we provide a very rich analytics solution and platform for them to monitor all of this and, in real time, detect patterns and anomalies.
>> Jeff: So, you just said you have customers, but you're coming out of stealth, so you have some beta customers already?
>> Karthik: Yes, we have great customers already.
>> Jeff: Not just beta customers, right?
>> Karthik: Great, real customers.
>> Jeff: Awesome. Congratulations.
>> Karthik: Thank you very much. They're very excited about our product, and they range from small startups to fairly large web companies that are sending tens of billions of data points every day into SignalFx.
>> Jeff: Right, right. And again, in the interest of sharing the knowledge with all of our entrepreneurs out there, when did they get involved in the process? How much of the product development and definition did they participate in? You said you've been at it for a couple of years.
>> Karthik: Yeah, we've had a lot of conviction about this space from the very beginning, because our team had solved this problem for themselves in previous experiences, but we did include them. We've been in beta for about six months prior to launch, and over the course of those six months we recalibrated based on feedback we got from customers, but on the whole, the philosophy and the approach that we took was pretty much validated by the early customers that we engaged with.
>> Jeff: Okay, excellent. And so I assume you're venture funded?
>> Karthik: We are.
>> Jeff: Can you talk about who your backers are?
>> Karthik: Yes, we raised twenty-eight and a half million dollars
from Andreessen Horowitz, with Ben Horowitz on our board, and Charles River Ventures, with a partner on our board as well.
>> Jeff: Okay. And how big are you now, in terms of the company?
>> Karthik: Well, we're just getting started. We've got a great group of engineers; the company is still in the few-dozen-people stage at this point. We're planning to invest aggressively in building out our team, both on R&D and on the go-to-market side.
>> Jeff: Excellent. Once you detect patterns and anomalies, what are the action steps? Do you work with other systems to swap stuff out? Because now I hear that in these huge data centers they don't swap out machines, they swap out racks, and soon they'll be swapping out data centers. So what are some of the prescriptive things that people are doing with your product that they couldn't do before?
>> Karthik: Yeah, I'll give you a great example of that. One of our early beta customers does code pushes very aggressively; once a week they'll push out changes into their environment. They had a SignalFx console open, and we're a real-time solution, so every second they're seeing updates of what is happening in their infrastructure. They pushed out their code and immediately detected a memory leak; they saw their memory usage just growing right after the code push, and they were able to roll it back before any of their users noticed any issues. That's an example of how, these days, a lot of the problems introduced into environments are human-driven: it's a code push, it's a new user getting onboarded, or a new customer getting onboarded and all of a sudden there's 10x the load on your systems. So, when you have a product like SignalFx, where you can understand in real time everything that's happening in your environment, you can quickly detect these changes and determine what the appropriate next step is. That next step will depend on your application and who you are and what you're building. Our key philosophy is: we get out of your way, but we give you all of the insights and the tools to figure out what's happening in your environment.
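A minimal sketch of the kind of analysis Karthik describes in the memory-leak story: aggregate a memory metric across many hosts as it streams in, and flag a sustained climb right after a deploy. The data source, window size, and threshold below are all hypothetical, and this is not SignalFx's actual implementation.

```python
# A minimal sketch (hypothetical, not SignalFx's pipeline): watch a
# fleet-wide memory metric in one-second windows and flag a sustained climb
# such as a leak introduced by a code push.
from collections import deque
from statistics import mean

WINDOW = 120        # seconds of history to keep
MIN_SLOPE = 0.5     # MB/sec of sustained growth considered suspicious

history = deque(maxlen=WINDOW)

def ingest(per_host_memory_mb):
    """Called once per second with {host: memory_mb} readings from all hosts."""
    fleet_avg = mean(per_host_memory_mb.values())
    history.append(fleet_avg)
    if len(history) == WINDOW and trend(history) > MIN_SLOPE:
        alert(f"memory climbing ~{trend(history):.2f} MB/s across the fleet")

def trend(series):
    """Crude slope estimate: compare the first and last quarter of the window."""
    vals = list(series)
    q = len(vals) // 4
    return (mean(vals[-q:]) - mean(vals[:q])) / (len(vals) - q)

def alert(message):
    print("ALERT:", message)  # stand-in for paging or rolling back the deploy
```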
>> Jeff: Right. It's interesting that that really comes from your partner's Facebook experience, because they're pushing out new code all the time. Move fast and break things, right?
>> Karthik: Right, exactly.
>> Jeff: And then you were at VMware, so you know the enterprise side. So, could you speak a little bit about this consumerization of IT on the enterprise side? Not so much the look and feel of an application, but really taking best practices from consumer IT companies like Facebook and Amazon that really changed the game. It used to be that the big enterprise software guys had the best apps; now it's really flipped, and people like Google and Netflix have the best apps, and, even more importantly, they drive the expectation of how an application should behave. Every enterprise is finally getting it, but are they really embracing it?
>> Karthik: We're definitely seeing growth in new application development. I spend a lot of time talking to CIOs at enterprises as well, and they all understand that in order to be competitive you have to invest in applications; it's not enough to just view IT as a cost center, and they're all beginning to invest in application development. In some cases these are digital media teams that are separate from traditional IT, and in other places they're more closely tied together, but we absolutely see growth in application development, and many of these teams end up looking a lot like the development teams we see here in the Bay Area, at companies building SaaS and consumer cloud apps.
>> Jeff: Yeah, exciting times. So, you're just coming out of stealth. What's the next milestone you're looking forward to? Do you have some big announcements, or a show where we're going to see you make a big splash?
>> Karthik: Well, for us it's steadily building our business. We're launching now, we've got a lot of great customers already, and we hope to sign on several more and help our customers build great applications. That's our focus.
>> Jeff: Again, congratulations. Two years, that's a big development project. Karthik Rau, the founder and CEO of SignalFx, just launching their company and coming out of stealth. We love to get them on theCUBE and share the knowledge with you, both the people trying to start your own company, to take a little inspiration, and the people that need this kind of service tomorrow, with the cloud and with modern applications. Thanks a lot.
>> Karthik: Thank you, Jeff.
>> Jeff: Thank you. You're watching a CUBE conversation with Jeff Frick. See you next time.