AI Internet celebrity Andrej Karpathy’s latest masterpiece teaches 33 practical skills for training neural networks

The latest masterpiece from AI Internet celebrity Andrej Karpathy teaches you 33 skills for training neural networks. It is a must-read full of practical information.

Andrej Karpathy is one of the experts in the field of computer vision and deep learning. He completed his Ph.D. at the Stanford Artificial Intelligence Laboratory under Professor Fei-Fei Li, interned at Google Brain and DeepMind, worked with Andrew Ng, and worked in several deep learning laboratories in the industry.

In June 2017, Karpathy joined Tesla as head of the AI and Autopilot Vision departments, reporting directly to Elon Musk.

More importantly, Karpathy is willing and good at sharing his own experiences and opinions with friends. He is very active on Twitter and blogs and is known as an AI “Internet celebrity”.

Karpathy’s Twitter profile says “I like to train Deep Neural Nets on large datasets.” Recently, Karpathy shared his experience in training neural networks and received a lot of praise.

The following is Xinzhiyuan’s translation of this article “Neural Network Alchemy”:

Train a neural network in 30 lines of code? Too naive

A few weeks ago, I tweeted a list of the “most common neural network mistakes”, covering some common errors related to training neural networks. This tweet got more engagement than I expected (there was even an online workshop :). Obviously, many people have experienced firsthand the huge gap between “this is how convolutional layers work” and “our convolutional network achieves state-of-the-art results.”

So I thought it would be better to open my almost dusty blog and write a long article on this topic. However, instead of listing more common errors in detail, I want to take a more in-depth look at how you can avoid them entirely (or fix them quickly). The trick to doing this is to follow a specific process, which, as far as I know, is rarely documented.

So, let’s start with two important observations.

1) Neural network training is a “leaky abstraction”

It is allegedly easy to get started with training neural networks. Many libraries and frameworks pride themselves on showing 30-line miracle code snippets that solve your data problems, giving the (false) impression that this stuff is plug-and-play. It is common to see things like:

>>> your_data = # plug your awesome dataset here
>>> model = SuperCrossValidator(SuperDuper.fit, your_data, ResNet50, SGDOptimizer)
# conquer world here

These libraries and examples activate the part of our brain that is familiar with standard software, a place where clean APIs and abstractions are often attainable. The requests library demonstrates this:

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200

This is cool! The developers bravely took on the burden of understanding query strings, URLs, GET/POST requests, HTTP connections, and so on, and largely hid that complexity behind a few lines of code. This is what we are familiar with and expect.

Unfortunately, neural networks don’t work that way. They are not “off-the-shelf” technology the moment you deviate slightly from training an ImageNet classifier.

In my previous blog post introducing backpropagation, I tried to make this point by picking on backpropagation and calling it a “leaky abstraction”, but unfortunately, the situation is much worse. Backprop + SGD doesn’t magically make your network work. Batch norm doesn’t magically make it converge faster. RNNs don’t magically let you “plug in” text. And just because you can formulate your problem as RL doesn’t mean you should. If you insist on using a technique without knowing how it works, you’re likely to fail. Which brings me to…

2) Neural network training often fails silently

When you break or misconfigure code, you will usually get some kind of exception. You plugged in an integer where a string was expected. The function only expected 3 arguments. This import failed. That key does not exist. The number of elements in the two lists isn’t equal. In addition, it is often possible to write unit tests for a specific piece of functionality.

This is just the start when it comes to training neural networks. Everything can be syntactically correct while the whole thing is still not arranged properly, and that is really hard to tell. The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test. For example, perhaps you forgot to flip your labels when you flipped the image left to right during data augmentation. Your network can still work (shockingly) well, because it can internally learn to detect flipped images and then flip its predictions accordingly. Or maybe your autoregressive model accidentally takes the thing it is trying to predict as an input because of an off-by-one bug. Or you tried to clip your gradients but instead clipped the loss, causing outlier examples to be ignored during training. Or you initialized your weights from a pretrained checkpoint but didn’t use the original mean. Or you just messed up the settings for regularization strength, learning rate, decay rate, model size, etc. Therefore, a misconfigured neural network will throw an exception only if you’re lucky; most of the time it will keep training but silently work a bit worse.

As a result, a “fast and furious” approach to training neural networks does not work and only leads to suffering. While suffering is a perfectly natural part of getting a neural network to work well, it can be mitigated by being thorough, defensive, paranoid, and obsessed with visualizing basically every possible thing. In my experience, the qualities that correlate most strongly with success in deep learning are patience and attention to detail.

How to train a neural network

Based on the above two facts, I developed a specific process for myself that I follow whenever I apply a neural network to a new problem. In this article I will try to describe it.

You will see that it takes the two principles above very seriously. In particular, it builds from simple to complex, and at every step we make concrete hypotheses about what will happen, then either validate them with an experiment or investigate until we find the problem. What we are trying to avoid is introducing a lot of “unverified” complexity all at once, which is bound to introduce bugs/misconfigurations that will take forever to find (if ever). If writing neural network code were like training one, you would want to use a very small learning rate, guess, and then evaluate the full test set after every iteration.

1. Combing the data

The first step in training a neural network is to not touch any neural network code at all, and instead begin by thoroughly inspecting the data. This step is critical. I like to spend a lot of time (measured in hours) scanning through thousands of examples, understanding their distribution and looking for patterns. Fortunately, our brains are very good at this. One time I discovered that the data contained duplicate examples. Another time I found corrupted images/labels. I look for imbalances and biases in the data. I also usually pay attention to my own process for classifying the data, which hints at the kinds of architectures we will eventually explore.

For example, are very local features sufficient, or do we need global context? How much variation is there, and what form does it take? What variation is spurious and could be preprocessed out? Does spatial position matter, or do we want to average pool it out? How much does detail matter, and how far could we afford to downsample the images? How noisy are the labels?

Additionally, since the neural network is effectively a compressed/compiled version of your dataset, you can look at your network’s (mis)predictions and understand where they might be coming from. If your network gives you predictions that do not seem consistent with what you have seen in the data, something is off.

Once you get a qualitative feel, it’s also a good idea to write some simple code to search/filter/sort by whatever you can think of (e.g. type of label, size of annotations, number of annotations, etc.) and visualize their distributions and the outliers along any axis. Outliers almost always uncover some bugs in data quality or preprocessing.
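A minimal sketch of that kind of throwaway inspection code follows. It assumes a hypothetical dataset held as a list of dicts with `label` and `num_annotations` fields; those names and the toy records are illustrative only and do not come from the original post.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical records; in practice these would be loaded from your dataset.
records = [
    {"id": 0, "label": "cat", "num_annotations": 2},
    {"id": 1, "label": "dog", "num_annotations": 3},
    {"id": 2, "label": "cat", "num_annotations": 2},
    {"id": 3, "label": "cat", "num_annotations": 41},  # suspiciously large
    {"id": 4, "label": "dog", "num_annotations": 1},
    {"id": 5, "label": "cat", "num_annotations": 3},
]

# Class balance: a quick look at the label distribution.
label_counts = Counter(r["label"] for r in records)

# Pick one axis and flag values far from the mean (here: more than 2 std devs).
values = [r["num_annotations"] for r in records]
mu, sigma = mean(values), pstdev(values)
outliers = [r["id"] for r in records if abs(r["num_annotations"] - mu) > 2 * sigma]

print("label distribution:", label_counts.most_common())
print("outlier ids:", outliers)
```

The flagged ids are the examples worth opening by hand first; the 2-standard-deviation cutoff is an arbitrary starting point, not a recommendation.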

2. Establish an end-to-end training/evaluation framework + get a simple baseline

Now that we understand our data, can we reach for a super fancy multi-scale ASPP FPN ResNet and begin training awesome models? Definitely not. That is the road to suffering. The next step is to set up a complete training + evaluation framework and gain trust in its correctness through a series of experiments. At this stage, it’s best to pick some simple model that you couldn’t possibly have messed up, such as a linear classifier or a very tiny convolutional network. We will want to train it, visualize the losses, any other metrics (such as accuracy), and model predictions, and perform a series of ablation experiments with explicit hypotheses along the way.

Tips and tricks for this stage:

Fix random seed. Always use a fixed random seed to guarantee that when you run the code twice you get the same result. This removes a source of variation and will help keep you sane.
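As a small illustration of the idea, the sketch below seeds a dedicated generator from Python's standard `random` module and checks that two runs agree. The helper `sample_batch` is hypothetical; a real training script would additionally seed whatever framework and data-loading generators it uses.

```python
import random

def sample_batch(seed, n=5):
    """Draw n pseudo-random numbers from a generator seeded explicitly."""
    rng = random.Random(seed)  # dedicated generator, independent of global state
    return [rng.random() for _ in range(n)]

run_a = sample_batch(seed=1337)
run_b = sample_batch(seed=1337)

# Same seed, same sequence: any difference between two runs now comes from
# the code or the data, not from the random number generator.
print(run_a == run_b)
```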

Simplify. Make sure to disable any unnecessary fanciness. For example, definitely turn off all data augmentation at this stage. Data augmentation is a regularization strategy that we may incorporate later, but for now it is just another opportunity to introduce some dumb bug.

Add significant digits to your eval. When plotting the test loss, run the evaluation over the entire (large) test set. Don’t just plot test losses over batches and then rely on smoothing them in Tensorboard. We are in pursuit of correctness and are very willing to give up time for it.

Verify loss at initialization. Verify that your loss starts at the correct value. For example, if you initialize the last layer correctly, you should measure -log(1/n_classes) on a softmax at initialization. The same default values can be derived for L2 regression, Huber loss, etc.
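The softmax case can be checked by hand. The sketch below assumes the final layer outputs all-zero logits at initialization (an illustrative assumption, equivalent to predicting a uniform distribution) and compares the resulting cross-entropy with -log(1/n_classes).

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

n_classes = 10
logits_at_init = [0.0] * n_classes  # assumption: uniform prediction at init

loss = softmax_cross_entropy(logits_at_init, target=3)
expected = -math.log(1.0 / n_classes)
print(loss, expected)
```

If the first logged loss of a real model is far from this value, the final layer initialization (or the loss computation itself) deserves a look before any training is done.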

Initialize well. Initialize the final layer weights correctly. For example, if you are regressing some values that have a mean of 50, initialize the final bias to 50. If you have an imbalanced dataset with a positive/negative ratio of 1:10, set the bias on your logits so that your network predicts a probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves, where in the first few iterations your network is basically just learning the bias.
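Both cases reduce to a line of arithmetic. The sketch below assumes a sigmoid output for the classification case and uses a hypothetical helper `bias_for_prior`; the toy targets are made up for illustration.

```python
import math
from statistics import mean

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bias_for_prior(p):
    """Logit bias b such that sigmoid(b) == p (the inverse of the sigmoid)."""
    return math.log(p / (1.0 - p))

# Classification: make the untrained network predict probability 0.1.
b = bias_for_prior(0.1)
print(b, sigmoid(b))

# Regression: start the final bias at the mean of the targets.
targets = [48.0, 50.0, 52.0]  # toy targets with mean 50
final_bias = mean(targets)
print(final_bias)
```

With the final-layer weights near zero, the initial prediction is then governed by the bias alone, which is the point of the tip.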

Human baseline. Monitor metrics other than loss that are human-interpretable and checkable (e.g. accuracy). Whenever possible, evaluate your own (human) accuracy and compare to it. Alternatively, annotate the test data twice and, for each example, treat one annotation as the prediction and the second as the ground truth.

Input-independent baseline. Train a baseline that is independent of the inputs (the easiest way is to set all inputs to zero). This should perform worse than when you actually plug in your data without zeroing it out. Does it? That is, does your model learn to extract any information from the input at all?
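Here is one way this check might look on a toy problem. The sketch trains the same tiny logistic regression twice, once on the real inputs and once on zeroed inputs; the model, data and hyperparameters are illustrative assumptions chosen to keep the example dependency-free.

```python
import math

def train_logreg(xs, ys, steps=500, lr=0.5):
    """Fit 1-D logistic regression by gradient descent; return the final mean loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x / n
            gb += (p - y) / n
        w -= lr * gw
        b -= lr * gb
    loss = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        loss -= (y * math.log(p) + (1 - y) * math.log(1 - p)) / n
    return loss

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

loss_real = train_logreg(xs, ys)                # model sees the inputs
loss_blind = train_logreg([0.0] * len(xs), ys)  # inputs zeroed out

print(loss_real, loss_blind)
```

The blind model can only learn the label prior, so on this balanced toy set it stays at about log(2). A real pipeline whose loss does not clearly beat its blind counterpart is probably not getting its inputs through to the model.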

Overfit one batch. Overfit a single batch containing only a few examples (for example, as few as two). To do this we increase the capacity of the model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize the labels and the predictions in the same plot and make sure they end up aligning perfectly once the minimum loss is reached. If they do not, there is a bug somewhere and we cannot continue to the next stage.
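A toy version of this sanity check, assuming a two-example batch and a linear model with two weights (so the model has exactly enough capacity to fit the batch), could look like the following. The numbers are made up for illustration.

```python
# Two examples with two features each: a two-weight linear model can fit them exactly.
batch_x = [[1.0, 2.0], [3.0, -1.0]]
batch_y = [5.0, -2.0]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.0, 0.0]
lr = 0.05
for step in range(500):
    errs = [predict(w, x) - y for x, y in zip(batch_x, batch_y)]
    # Gradient of the mean squared error with respect to each weight.
    grad = [
        sum(2 * e * x[j] for e, x in zip(errs, batch_x)) / len(batch_x)
        for j in range(len(w))
    ]
    w = [wi - lr * g for wi, g in zip(w, grad)]

preds = [predict(w, x) for x in batch_x]
final_loss = sum((p - y) ** 2 for p, y in zip(preds, batch_y)) / len(batch_y)

print("predictions:", preds)
print("labels:     ", batch_y)
print("loss:", final_loss)
```

Printing predictions next to labels is the text-mode equivalent of plotting them together: once the loss bottoms out, the two rows should match.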

Verify decreasing training loss. At this stage you will hopefully be underfitting on your dataset, because you are working with a toy model. Try to increase its capacity just a bit and see whether the training loss goes down as it should.

Visualize just before the network. The unambiguously correct place to visualize your data is immediately before y_hat = model(x) (or sess.run in tf). That is, you want to visualize exactly what goes into your network, decoding that raw tensor of data and labels into visualizations. This is the only “source of truth”. This process has saved me a lot of time and revealed problems in data preprocessing and augmentation.

Visualize prediction dynamics. During training, I like to visualize the model predictions on a fixed test batch. The “dynamics” of how these predictions move will give you good intuition for how training is progressing. Many times, it is possible to feel the network “struggle” to fit your data if it wiggles too much in some way, revealing instabilities. Very low or very high learning rates are also easily noticeable in the amount of jitter.

Use backpropagation to chart dependencies. Deep learning code often contains complicated, vectorized, and broadcast operations. A relatively common bug I’ve encountered is that people use view instead of transpose/permute somewhere and inadvertently mix information across the batch dimension. Unfortunately, your network will typically still train okay, because it will learn to ignore data from the other examples. One way to debug this is to set the loss to something trivial that depends only on example i (such as the sum of all outputs of example i), run backpropagation all the way to the input, and make sure you get a non-zero gradient only on the i-th example. In other words, gradients give you information about what depends on what in your network, which is very useful for debugging.
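The sketch below reproduces this check on a toy example in NumPy. Two loud assumptions: finite differences stand in for a framework's backward pass (with autograd you would call backward once and inspect the input gradient), and the time-major input layout, the linear model and the two flatten helpers are hypothetical.

```python
import numpy as np

T, B, D = 3, 4, 2  # time steps, batch size, feature dim (time-major input)
rng = np.random.default_rng(0)
x = rng.normal(size=(T, B, D))
w = rng.normal(size=(T * D,))

def flatten_correct(x_in):
    # (T, B, D) -> (B, T, D) -> (B, T*D): each row holds exactly one example.
    return x_in.transpose(1, 0, 2).reshape(B, T * D)

def flatten_buggy(x_in):
    # reshape alone just reinterprets memory, so rows mix different examples.
    return x_in.reshape(B, T * D)

def examples_feeding_output(flatten, i, eps=1e-4):
    """Batch indices whose inputs have a non-zero gradient for the loss on example i."""
    def loss(x_in):
        return float(flatten(x_in)[i] @ w)  # depends only on row i of the model input

    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        bump = np.zeros_like(x)
        bump[idx] = eps
        grad[idx] = (loss(x + bump) - loss(x - bump)) / (2 * eps)
    per_example = np.abs(grad).sum(axis=(0, 2))  # total gradient mass per batch index
    return [b for b in range(B) if per_example[b] > 1e-8]

print("correct:", examples_feeding_output(flatten_correct, i=1))
print("buggy:  ", examples_feeding_output(flatten_buggy, i=1))
```

With the correct flatten only example 1 shows up; with the buggy one, other batch indices leak in, which is exactly the symptom the gradient check is meant to surface.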

Generalize a special case. This is a more general coding tip, but I often see people create bugs when they bite off more than they can chew and write a fully general function from scratch. I like to write a very specific function for what I’m doing right now, get it to work, and then generalize it, making sure I get the same result. This often applies to vectorized code, where I write the fully loopy version first and only then convert it to vectorized code.
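As an illustration of the loop-first habit, the sketch below computes pairwise squared distances twice, once with explicit loops and once vectorized with NumPy broadcasting, and then compares the two. The function names and the choice of task are assumptions made for the example.

```python
import numpy as np

def pairwise_sq_dists_loop(a, b):
    """Reference version: explicit loops, easy to read and to trust."""
    out = np.zeros((len(a), len(b)))
    for i in range(len(a)):
        for j in range(len(b)):
            diff = a[i] - b[j]
            out[i, j] = diff @ diff
    return out

def pairwise_sq_dists_vec(a, b):
    """Vectorized version: broadcast to shape (N, M, D), then reduce over D."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 3))
b = rng.normal(size=(4, 3))

# The loopy version acts as the ground truth for the vectorized one.
same = np.allclose(pairwise_sq_dists_loop(a, b), pairwise_sq_dists_vec(a, b))
print(same)
```

Keeping the slow version around as a test oracle costs little and catches broadcasting mistakes that would otherwise fail silently.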

3. Overfitting

At this stage, we should have a good understanding of the dataset, and we have a complete training and evaluation pipeline working. For any given model, we can (reproducibly) compute a metric that we trust. We also know the performance of an input-independent baseline (which our models should beat), and we have a rough sense of human performance. The task at this stage is to iterate toward a good model.

The approach I like to take to finding a good model has two stages: first get a model large enough that it can overfit, and then regularize it appropriately (giving up some training loss to improve the validation loss). The reason I like these two stages is that if we cannot reach a low enough error rate with any model at all, that may again point to some issues, bugs, or misconfiguration.

Some tips and techniques for this stage:

Pick the model. To reach a good training loss, you need to choose an appropriate architecture for your data. My advice is: don’t be a hero. I’ve seen a lot of people who are eager to get crazy and creative, stacking up the Lego bricks of the neural network toolbox into various exotic architectures. Resist this temptation strongly in the early stages of your project.

I always advise people to simply find the most related paper and copy its simplest architecture that achieves good performance. For example, if you are classifying images, don’t be a hero: just use ResNet-50 for your first run. You can do something more custom later and beat it.

Adam is safe. In the early stages of setting baselines, I like to use Adam with a learning rate of 3e-4. In my experience, Adam is much more forgiving of hyperparameters, including a bad learning rate. For ConvNets, a well-tuned SGD will almost always slightly outperform Adam, but its optimal learning rate region is much narrower and problem-specific. (If you are using RNNs and related sequence models, it is more common to use Adam in the early stages of the project. Again, don’t be a hero and follow what the most related papers do.)

Add complexity only one piece at a time. If you have multiple signals to plug into your classifier, plug them in one by one, and each time make sure you get the performance boost you expect. Don’t feed all the signals to the model at the start. There are other ways of building up complexity, such as plugging in smaller images first and making them bigger later.

Don’t trust the default learning rate decay. If you are repurposing code from some other domain, always be very careful with learning rate decay. Not only would you want to use different decay schedules for different problems, but, even worse, in a typical implementation the schedule will be based on the current epoch number, which can vary widely depending on the size of your dataset.

For example, ImageNet training would decay the learning rate by a factor of 10 at epoch 30. If you are not training on ImageNet, that is almost certainly not what you want. If you’re not careful, your code could secretly be driving the learning rate to zero too early, preventing the model from converging. In my own work I always disable learning rate decay entirely (I use a constant learning rate) and tune it at the very end.
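To make the dataset-size dependence concrete, the sketch below implements a generic epoch-based step schedule and counts how many gradient updates happen before the first drop. The schedule shape, the batch size of 256 and the dataset sizes (roughly 1.28M images for ImageNet, 10,000 for a small dataset) are illustrative assumptions, not settings taken from any particular codebase.

```python
def step_decay_lr(base_lr, epoch, drop_every=30, factor=0.1):
    """Epoch-based step schedule: multiply the rate by `factor` every `drop_every` epochs."""
    return base_lr * factor ** (epoch // drop_every)

def updates_before_first_drop(num_examples, batch_size, drop_every=30):
    """Number of gradient updates the model gets at the initial learning rate."""
    steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
    return steps_per_epoch * drop_every

imagenet_updates = updates_before_first_drop(1_281_167, batch_size=256)
small_updates = updates_before_first_drop(10_000, batch_size=256)

print(step_decay_lr(0.1, epoch=29), step_decay_lr(0.1, epoch=30))
print(imagenet_updates, small_updates)
```

The same "drop at epoch 30" rule gives the small dataset over a hundred times fewer updates at the initial rate, which is how a copied schedule can end training long before the model has converged.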

4. Regularization

Ideally, we are now at a place where we have a large model that fits at least the training set. Now is the time to regularize it and gain some validation accuracy by giving up some training accuracy. Here are some tips and techniques for this:

Get more data. First, by far the best and preferred way to regularize a model in any practical setting is to add more real training data. Stop spending a lot of time trying to squeeze performance out of a small dataset when you could be collecting more data. As far as I know, adding more data is pretty much the only guaranteed way to improve the performance of a well-configured neural network almost indefinitely. The other would be ensembles (if you can afford them), but that tops out after about 5 models.

Data augmentation. The next best thing to real data is half-fake data: try more aggressive data augmentation.

Creative augmentation. If half-fake data doesn’t do it, fake data may also help. People are finding creative ways of expanding datasets; for example, domain randomization, the use of simulation, clever hybrids such as inserting (potentially simulated) data into scenes, or even GANs.

Pretrain. It rarely ever hurts to use a pretrained network if you can, even if you have enough data.

Stick with supervised learning. Don’t get over-excited about unsupervised pretraining. As far as I know, no version of it has reported strong results in modern computer vision (although NLP seems to be doing well with excellent models such as BERT these days, quite likely owing to the more deliberate nature of text and its higher signal-to-noise ratio).

Smaller input dimensionality. Remove features that may contain spurious signal. If your dataset is small, any added spurious input is just another opportunity to overfit. Similarly, if low-level details don’t matter much, try inputting a smaller image.

Smaller model size. In many cases you can use domain knowledge constraints on the network to decrease its size. For example, it used to be popular to use fully connected layers on top of ImageNet backbones, but these have since been replaced with simple average pooling, eliminating a ton of parameters in the process.

Decrease the batch size. Because of the normalization inside batch norm, smaller batch sizes correspond to somewhat stronger regularization. This is because the batch empirical mean/std are more approximate versions of the full mean/std, so the scale and offset “wiggle” your batch around more.

Dropout. Add dropout. Use dropout2d (spatial dropout) for ConvNets. Use it sparingly and carefully, because dropout does not seem to play nicely with batch normalization.

Weight decay. Increase the weight decay penalty.

Early stopping. Stop training based on your measured validation loss to catch the model just before it starts to overfit.
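One common way to implement this is a patience counter over the validation loss. The sketch below is a minimal version of that idea; the `patience` parameter and the loss values are illustrative assumptions, and a real loop would also save a checkpoint whenever the best loss improves.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping once the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # checkpoint would be saved here
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_loss

# Made-up validation curve: improves, then starts to overfit.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, best_loss = early_stopping_epoch(val_losses, patience=2)
print(best_epoch, best_loss)
```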

Try a larger model. I mention this last, and only after early stopping, but I have found several times in the past that larger models will of course overfit much more eventually, yet their “early stopped” performance is often much better than that of smaller models.

Finally, to gain additional confidence that your network is a reasonable classifier, I like to visualize the network’s first-layer weights and make sure I get meaningful edges. If your first-layer filters look like noise, then something could be off. Similarly, activations inside the network can sometimes display odd artifacts that hint at problems.

5. Fine tuning

You should now be “in the loop” with your dataset, exploring a wide model space for architectures that achieve low validation loss:

Random over grid search. For simultaneously tuning multiple hyperparameters it may sound tempting to use grid search to ensure coverage of all settings, but keep in mind that it is better to use random search instead. This is because neural networks are often much more sensitive to some parameters than to others. In the extreme case, if a parameter a matters but changing b has no effect, you would rather sample a more thoroughly than evaluate it at a few fixed points multiple times.
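The argument can be seen by simply counting. The sketch below spends the same budget of nine trials on a 3x3 grid and on random sampling, then counts how many distinct values of the important parameter `a` each strategy actually tries; the parameter ranges and the budget are arbitrary illustrative choices.

```python
import itertools
import random

budget = 9  # total number of training runs we can afford

# Grid search: 3 values per axis, 3 x 3 = 9 trials of (a, b).
grid_axis = [0.0, 0.5, 1.0]
grid_trials = list(itertools.product(grid_axis, grid_axis))

# Random search: 9 trials, each drawing a and b independently.
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

# If only `a` matters, what counts is how many distinct values of `a` we tested.
distinct_a_grid = len({a for a, _ in grid_trials})
distinct_a_random = len({a for a, _ in random_trials})

print("grid:  ", distinct_a_grid, "distinct values of a")
print("random:", distinct_a_random, "distinct values of a")
```

For the same cost, the grid repeats each value of `a` three times while random search probes a new value on every run.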

Hyperparameter optimization. There are many fancy Bayesian hyperparameter optimization toolboxes out there, and some examples of their successful use, but my personal experience is that the best way to explore a nice and wide space of models and hyperparameters is to find an intern. Haha, just kidding.

6. Performance Squeezing

After determining the best types of architectures and hyperparameters, you can still use a few more tricks to squeeze the last bits of performance out of the system:

Ensembles. Model ensembles are a pretty much guaranteed way to gain 2% of accuracy on anything. If you can’t afford the computation at test time, consider distilling your ensemble into a single network using dark knowledge.
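The simplest form of ensembling averages the class probabilities predicted by each member. The sketch below shows that averaging step only; the three probability vectors are made-up outputs of hypothetical ensemble members, and nothing here demonstrates the 2% figure from the text.

```python
def ensemble_predict(prob_lists):
    """Average per-class probabilities across models; return (argmax, averaged probs)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg

# Made-up softmax outputs of three models for one input with three classes.
member_probs = [
    [0.6, 0.3, 0.1],  # model 1 prefers class 0
    [0.2, 0.5, 0.3],  # model 2 prefers class 1
    [0.3, 0.5, 0.2],  # model 3 prefers class 1
]

label, avg = ensemble_predict(member_probs)
print(label, avg)
```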

Keep training. I often see people tempted to stop training a model when the validation loss seems to have leveled off. In my experience, networks keep improving for an unintuitively long time. One time I accidentally left a model training during the winter break, and when I came back in January I found that its performance had reached the SOTA level.

Closing words

At this point, I believe you have all the ingredients for success: you have a deep understanding of the technology, the dataset and the problem; you have set up the entire training/evaluation infrastructure and achieved high confidence in its accuracy; and you have explored increasingly complex models, gaining performance improvements in ways you predicted at each step. Now is the time to start reading a lot of papers, try a lot of experiments, and get ready to get SOTA results. Good luck!

Original title: Fei-Fei Li’s student, AI “Internet celebrity” Karpathy: 33 must-read skills for training neural networks

Article source: [WeChat ID: AI_era, WeChat public account: Xinzhiyuan] Welcome to follow! Please indicate the source when reprinting this article.
