The task is to build a model that takes a word (at the character level) as input x and computes a state vector s. Using s, the network estimates p(w2 | w1), where w1 is the input word and w2 is the predicted word. Teams 2 and 3 will build the same language model, except that each uses a different cost function and a different approach to obtaining morphemes. The cost function for your team is the mutual information between the input x and the state s, and we need to maximize it. To maximize the mutual information (MI), a method called perturbation theory is used.
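
Below is a minimal sketch of such a model, assuming PyTorch; the character-CNN design follows this group's stated plan of deriving morpheme embeddings with CNNs, but every name and dimension here is a placeholder, not part of the course material.

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Maps the characters of w1 to a state vector s, then to logits over w2."""
    def __init__(self, n_chars, char_dim=32, n_filters=64, vocab_size=10000):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Convolutions over character n-grams act as crude morpheme detectors.
        self.convs = nn.ModuleList(
            [nn.Conv1d(char_dim, n_filters, kernel_size=k) for k in (2, 3, 4)]
        )
        self.out = nn.Linear(3 * n_filters, vocab_size)

    def forward(self, char_ids):                      # (batch, word_len)
        e = self.char_emb(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        # Max-pool each n-gram filter over positions and concatenate into s.
        s = torch.cat([c(e).relu().max(dim=2).values for c in self.convs], dim=1)
        return s, self.out(s)                         # state s, logits for w2
```

Training would combine a cross-entropy term on p(w2 | w1) with the MI term between x and s; the MI estimator is exactly the piece this group's cost function specifies, so it is left out of the sketch.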

Notes: Objectives/Methodology of our Group
⦁ You are trying to derive morpheme embeddings using CNNs
⦁ The others are doing language models too; ideally, you should agree on one model
⦁ While the groups will derive representations differently, the representations will be put on a common ground where they can both be tested
⦁ We should start by taking simple bigrams from the Brown corpus and trying to predict the next word using morphemes (see the sketch after this list)
⦁ The task will be the same for both groups, but how we do it is different (our group uses mutual-information CNNs, the others use self-attention)
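
A minimal sketch of the bigram extraction, assuming NLTK as the access route to the Brown corpus (the notes only name the corpus, not a library):

```python
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

# Lowercased token stream; each pair (w1, w2) gives the model input w1
# and the word w2 it should learn to predict.
words = [w.lower() for w in brown.words()]
bigrams = list(nltk.bigrams(words))
print(bigrams[:5])
```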
Details for Constructing our Network
⦁ The morpheme vectors will be features of the word vector.
⦁ These word vectors are then used to predict the following words. In this sense the morphemes are like random variables
⦁ Entropy will be a corpus-wide measurement of this uncertainty
⦁ Taking the expectation over the whole corpus, the conditional entropy relates two random variables, giving the mutual information between X and Y:
⦁ the entropy of X minus the conditional entropy of X given Y (this is symmetric, so also the entropy of Y minus the conditional entropy of Y given X)
We want to make good predictions using morpheme vectors (maximum probability), but we also want to maximize the information transferred from each word vector.
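
In symbols, the identity stated in the list above (the standard definition, spelled out here for reference):

```latex
\[
  I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)
         = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
\]
```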

⦁ The professor talked a bit about a number of papers/tools related to persistent homology and topological data analysis.
⦁ A walkthrough of an example of simplicial complexes:
⦁ showing that a boundary map, when composed with itself, always gives zero
⦁ how deriving the homology group (ker ∂_{n-1} / im ∂_n) results in a system of linear equations (and thus can be solved with Gaussian elimination); a small numerical check follows this list
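
A hedged numerical check of those two facts for a single filled triangle, using numpy with real coefficients (the course example may have used a different complex or coefficient field):

```python
import numpy as np

# Boundary matrices for the filled triangle on vertices {0, 1, 2}.
# d1: edges (01, 02, 12) -> vertices; an edge [a, b] maps to b - a.
d1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]])
# d2: face (012) -> edges; [0, 1, 2] maps to 12 - 02 + 01.
d2 = np.array([[ 1],
               [-1],
               [ 1]])

print(d1 @ d2)   # composition of boundary maps: all zeros

# Betti numbers via matrix ranks (Gaussian elimination under the hood):
# b_n = dim ker(d_n) - rank(d_{n+1}), with d_0 = 0.
b0 = 3 - np.linalg.matrix_rank(d1)
b1 = (3 - np.linalg.matrix_rank(d1)) - np.linalg.matrix_rank(d2)
print(b0, b1)    # 1 0: one connected component, no independent 1-cycles
```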

Useful links:

https://github.com/microsoft/nlp-recipes/blob/master/examples/model_explainability/interpret_dnn_layers.ipynb
This is Microsoft's mutual information Jupyter notebook.

http://gudhi.gforge.inria.fr/doc/latest/

https://gudhi.inria.fr/python/latest/
I believe ripser is faster. You can use ripser for computation and gudhi for plots.
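
A hedged sketch of that division of labor, assuming `pip install ripser gudhi` and a point-cloud input; the repackaging into gudhi's (dimension, (birth, death)) format is my own glue code:

```python
import numpy as np
import matplotlib.pyplot as plt
import gudhi
from ripser import ripser

X = np.random.rand(100, 2)              # toy point cloud
dgms = ripser(X, maxdim=1)["dgms"]      # one persistence diagram per dimension

# Repackage for gudhi's plotting helper.
persistence = [(dim, tuple(pt)) for dim, dgm in enumerate(dgms) for pt in dgm]
gudhi.plot_persistence_diagram(persistence)
plt.show()
```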

Questions from other students:

Q1) Are you currently thinking about what exactly our inputs and outputs should look like?
Should we use a corpus to train a language model and put out a language model (a vector space), or not? Should we incorporate information about subwords in there? Or should we just test whether subword information is actually included in the model?

A1) The language model is just a task to induce morphology. This basically means you can only use a plain text file to train on, without any annotations. It is up to you how to define the input and output. The simplest option could be to use pairs of consecutive words: the first word is the input and the second word is the output. The key part is that your model should predict the second word by first breaking the input word into subword units and using their vectors to make the prediction of what word follows next. This breaking should be done dynamically for each word, so it needs character-level processing. Each character would therefore be associated with a vector. So really, your input is a sequence of characters representing a word. The model then merges some characters together and uses those subsequences (their vector representations) to compute a probability over the following words.

Q2) And then we should share the weights of the model with the topology team?

A2) In the process of training the model, you will derive vector representations of morphemes and words. You just save those.

Q3) We are currently thinking of feeding subwords (all subwords from 3-grams to 6-grams) into a Transformer instead of characters. We believe that through self-attention, the actual morphemes among the subwords will be highlighted, so that morphological information will be derived at the top layer of the Transformer. But the thing is, we haven't figured out how we can merge the subword embeddings and use them as an input to predict the next word. Do you have any thoughts, or do you think it is possible to do so?

A3) There are more structured ways of merging them, but we want to do it in an unsupervised way, so one way would be to use an LSTM.
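
A minimal sketch of that LSTM suggestion, again assuming PyTorch; all names and dimensions are illustrative, not from the course:

```python
import torch
import torch.nn as nn

class SubwordMerger(nn.Module):
    """Merges a word's subword n-gram embeddings into one word vector."""
    def __init__(self, n_subwords, sub_dim=64, word_dim=128):
        super().__init__()
        self.sub_emb = nn.Embedding(n_subwords, sub_dim)
        self.lstm = nn.LSTM(sub_dim, word_dim, batch_first=True)

    def forward(self, subword_ids):        # (batch, n_grams_per_word)
        e = self.sub_emb(subword_ids)      # (batch, n_grams, sub_dim)
        _, (h, _) = self.lstm(e)           # final hidden state
        return h[-1]                       # merged word vector (batch, word_dim)
```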

Q4) Also, I don't really get how we can feed each subword or character into the Transformer layer. Because we want to do the self-attention operation within each word (each part will be a function of the other parts of the word), should we input words one by one? Or, if we feed many words at the same time, how can we do self-attention without looking at other words?

A4) You can have an architecture that takes one word at a time, but you could also have several words at a time and divide it all just as you would a single word; in that case, all the words would get split simultaneously.
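
The answer does not specify the mechanism, but one standard way (my assumption, not stated in the course) to keep self-attention within word boundaries while batching several words is a block-diagonal attention mask:

```python
import torch

word_lengths = [3, 5, 2]      # characters per word in one batched sequence
total = sum(word_lengths)
mask = torch.zeros(total, total, dtype=torch.bool)
start = 0
for n in word_lengths:
    mask[start:start + n, start:start + n] = True   # attend within the word only
    start += n
print(mask.int())
# Pass this mask (or its complement, depending on the attention API) so each
# character position only attends to positions inside its own word.
```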
