# A Siri-like AI Assistant

* Uses ChatGPT (or an alternative LLM) for general queries
* Uses the Wolfram Alpha API for anything math related
* Has built-in NLP (using an NLI model) to determine whether a query can be processed locally (skills system)
* Frontend/backend architecture so that lightweight clients can be deployed

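The routing described above (try a local skill first, otherwise fall back to the LLM) could look roughly like the sketch below. Everything in it is hypothetical: `Skill`, `route`, the example skills, and the threshold are illustrative names, and `keyword_scorer` is only a stand-in for the NLI model, not how the project scores queries.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Skill:
    """A locally handled capability (hypothetical structure)."""
    name: str
    keywords: Tuple[str, ...]
    handler: Callable[[str], str]


def keyword_scorer(query: str, skill: Skill) -> float:
    """Stand-in for the NLI model: fraction of skill keywords found in the query."""
    words = set(query.lower().split())
    if not skill.keywords:
        return 0.0
    return sum(1 for k in skill.keywords if k in words) / len(skill.keywords)


def llm_fallback(query: str) -> str:
    """Placeholder for the ChatGPT (or alternative LLM) call."""
    return f"[LLM would answer: {query}]"


def route(
    query: str,
    skills: Sequence[Skill],
    scorer: Callable[[str, Skill], float] = keyword_scorer,
    threshold: float = 0.5,
    fallback: Callable[[str], str] = llm_fallback,
) -> Tuple[str, str]:
    """Return (handler name, response), preferring a confident local skill."""
    best = max(skills, key=lambda s: scorer(query, s), default=None)
    if best is not None and scorer(query, best) >= threshold:
        return best.name, best.handler(query)
    return "llm", fallback(query)


# Example skills; the handlers are placeholders, not real implementations.
SKILLS = [
    Skill("timers", ("timer",), lambda q: "Timer started (placeholder)"),
    Skill("weather", ("weather",), lambda q: "Weather lookup (placeholder)"),
]
```

With these example skills, `route("set a timer for ten minutes", SKILLS)` is handled by `timers`, while a query that matches no skill goes to the fallback. Swapping in a real NLI model would only mean passing a different `scorer`.
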
## Skills

- [ ] Translations
- [ ] Alarms (potentially complete, if we reuse the Timers logic)
- [ ] Calendar
- [ ] Gmail
- [ ] ChatGPT
- [ ] Reminders
- [x] Timers - TODO: add sound notifications.
- [ ] Todos
- [ ] Weather
- [ ] Wolfram
- [x] NLP
- [x] Speech to Text (frontend for sure)
- [x] Phone
  - [x] Initial implementation where the number is sent to the phone
  - [ ] NLP name to check contact
  - [ ] iCloud Contact API
- [ ] API
  - [ ] Authentication
  - [ ] General API
- [ ] TTS
  - Generate audio on the backend or the frontend?
    - Perk of the backend: fast generation
    - Con of the backend: large file transfers between devices, so lots of internet usage
    - Perk of the frontend: less data transfer between devices, so less internet usage
    - Con of the frontend: slower generation
  - Current solution: https://github.com/synesthesiam/opentts
  - Currently hosted instance: [tts.imsam.ca](https://tts.imsam.ca)

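For the backend TTS option, a client would only need to request audio over HTTP. The sketch below just builds such a request URL. It assumes OpenTTS exposes a `GET /api/tts` endpoint taking `voice` and `text` query parameters, which is how its README appears to describe it and should be verified against the hosted instance. `build_tts_url` and the default voice `espeak:en` are hypothetical choices for illustration.

```python
from urllib.parse import urlencode


def build_tts_url(base_url: str, text: str, voice: str = "espeak:en") -> str:
    """Build a TTS request URL (endpoint and parameter names are assumptions)."""
    query = urlencode({"voice": voice, "text": text})
    return f"{base_url.rstrip('/')}/api/tts?{query}"
```

For example, `build_tts_url("https://tts.imsam.ca", "Timer finished")` yields a URL the client could fetch to get the generated audio.
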
## API Specs

Using WebSockets for communication allows two-way communication, where the server can send the client information at any point.

Example: https://stackoverflow.com/questions/53331127/python-websockets-send-to-client-and-keep-connection-alive

More examples (these include JWT authentication; they are written in Node.js, but are still useful for working out the approach): https://www.linode.com/docs/guides/authenticating-over-websockets-with-jwt/

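A rough sketch of that two-way pattern: one coroutine answers client queries while a background task pushes server-initiated events (for example, a finished timer). The handler only relies on `await connection.send(...)` and `async for message in connection`, which is the shape of connection objects in the Python `websockets` library. The JSON message format, the function names, and `InMemoryConnection` (a stand-in so the sketch runs without a network) are all assumptions, not the project's actual protocol.

```python
import asyncio
import json


async def push_events(connection, events: asyncio.Queue) -> None:
    """Forward server-initiated events to the client as they arrive."""
    while True:
        event = await events.get()
        await connection.send(json.dumps({"type": "event", "payload": event}))


async def handle_client(connection, events: asyncio.Queue) -> None:
    """Reply to client queries while a background task pushes events."""
    pusher = asyncio.create_task(push_events(connection, events))
    try:
        async for raw in connection:
            query = json.loads(raw)
            reply = {
                "type": "reply",
                "id": query.get("id"),
                "text": f"received: {query.get('text', '')}",
            }
            await connection.send(json.dumps(reply))
    finally:
        pusher.cancel()  # stop pushing once the client disconnects


class InMemoryConnection:
    """Stand-in for a WebSocket connection, used only for this demo."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    async def send(self, message: str) -> None:
        self.sent.append(message)

    def __aiter__(self):
        return self

    async def __anext__(self) -> str:
        if not self._incoming:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # yield so the pusher task gets a turn
        return self._incoming.pop(0)


async def demo():
    events: asyncio.Queue = asyncio.Queue()
    await events.put({"skill": "timers", "message": "Timer finished"})
    conn = InMemoryConnection([json.dumps({"id": 1, "text": "set a timer"})])
    await handle_client(conn, events)
    return [json.loads(m) for m in conn.sent]
```

Running `asyncio.run(demo())` returns both a pushed event and a reply over the same connection. With a real server, `handle_client` would receive the connection object from the WebSocket library instead of `InMemoryConnection`.
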
## Ideas

* Dashboard with API call counts (would require hooking into all active skills, maybe via callbacks with class inheritance?)
* Phone calls from the Jarvis speaker
* JARVIS, initiate the House Party Protocol (take over the screen and show a retro-style text interface, possibly showing data from the dashboard)

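One way the dashboard idea could work with callbacks and inheritance: a base class gives every skill a `report_api_call` hook, and the dashboard registers itself as the callback. This is only a sketch of that idea; `Dashboard`, `BaseSkill`, `WeatherSkill`, and their methods are hypothetical names, not existing project code.

```python
from collections import Counter
from typing import Callable, Optional


class Dashboard:
    """Collects API call counts per skill."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_api_call(self, skill_name: str) -> None:
        self.counts[skill_name] += 1


class BaseSkill:
    """Base class that gives every skill a reporting hook."""

    name = "base"

    def __init__(self, on_api_call: Optional[Callable[[str], None]] = None) -> None:
        self._on_api_call = on_api_call

    def report_api_call(self) -> None:
        # Skills call this whenever they hit an external API.
        if self._on_api_call is not None:
            self._on_api_call(self.name)


class WeatherSkill(BaseSkill):
    name = "weather"

    def handle(self, query: str) -> str:
        self.report_api_call()  # the real HTTP request would go here
        return "Weather lookup (placeholder)"
```

Usage would be along the lines of `dashboard = Dashboard()` followed by `WeatherSkill(dashboard.on_api_call).handle("weather in Toronto")`, after which `dashboard.counts["weather"]` reflects the call. Skills created without a callback simply skip reporting.
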
## Wants, but limitations prevent

* *tumbleweed bounces by* Oh, dear.