Deep Reinforcement Learning with Continuous and Discrete Value Functions – An initial stage of the reinforcement learning task requires an initial set of objectives, which must fit under the optimal state distribution. One approach is to use a single objective for each goal, which is very much preferable to other strategies in that it avoids over-fitting. Then a policy learning scheme is proposed to learn a policy, and a policy selection algorithm is proposed to explore the optimal policy for the task. The algorithm is based on the principle of selecting the optimum policy for the task, which leads to a single policy. Experimental results show that the policy selection algorithm performs better than other policy learning methods.

As a powerful tool, deep learning can be used to discover the underlying structure of a computer’s input, and thus to model the dynamics of the input. In this work, we develop an iterative strategy for the deep learning to map input states into the input, as well as an iterative strategy for learning the output structure. To achieve this goal, in this work we construct an ensemble of deep network models, with weights on each model. Experimental results demonstrate that the weights have significantly different roles in the output structure and learned weights are more effective than other weights when applied to the same task.

Machine Learning Methods for Multi-Step Traffic Acquisition

# Deep Reinforcement Learning with Continuous and Discrete Value Functions

Learning the Structure of a Low-Rank Tensor with Partially-Latent Variables

Training of Convolutional Neural NetworksAs a powerful tool, deep learning can be used to discover the underlying structure of a computer’s input, and thus to model the dynamics of the input. In this work, we develop an iterative strategy for the deep learning to map input states into the input, as well as an iterative strategy for learning the output structure. To achieve this goal, in this work we construct an ensemble of deep network models, with weights on each model. Experimental results demonstrate that the weights have significantly different roles in the output structure and learned weights are more effective than other weights when applied to the same task.