Mildly Conservative Q-Learning in Offline Reinforcement Learning for Grid World Navigation