Directly applying BERT's transfer-learning advances yields poor accuracy in specialized domains such as law, because the word distribution shifts between general-domain and domain-specific corpora. In our project, we will demonstrate how the pre-trained language model BERT can be adapted to additional domains, such as contract la…
A coursework-style project from my MSc Machine Learning on Big Data (University of East London), using PySpark to compute word frequency distributions on a large English corpus (~9.5 million words) and to compare frequency estimates from small samples against the full dataset.
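The sample-versus-full comparison described above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than PySpark, and it is not the project's actual code: the function names `relative_frequencies` and `sample_vs_full`, the toy corpus, and the sample size are all assumptions made for the example.

```python
import random
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def sample_vs_full(tokens, sample_size, seed=0):
    """Pair each word's sample-based estimate with its full-corpus frequency."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(tokens, sample_size)  # sampling without replacement
    full = relative_frequencies(tokens)
    estimate = relative_frequencies(sample)
    # Words missing from the sample get an estimate of 0.0.
    return {word: (estimate.get(word, 0.0), freq) for word, freq in full.items()}


# Toy corpus (hypothetical stand-in for the ~9.5 million word dataset).
corpus = ("the cat sat on the mat and the dog sat on the log " * 200).split()
comparison = sample_vs_full(corpus, sample_size=500)

# Show the three most frequent words, sample estimate next to full value.
top = sorted(comparison.items(), key=lambda kv: -kv[1][1])[:3]
for word, (estimate, truth) in top:
    print(f"{word}: sample={estimate:.3f} full={truth:.3f}")
```

In a PySpark version the counting step would presumably run as a distributed aggregation, but the comparison logic (estimate from a sample, then check against the full counts) would stay the same.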